How widespread personal genomics could benefit molecular biology

While the majority of the buzz surrounding personal genomics has to do with prediction of disease risk and other medical applications, there’s clearly the potential for these sorts of technologies to influence basic science as well. In this post, I’ll lay out one such potential application: the use of personal genomics in understanding basic molecular biology, in particular the biology of transcriptional regulation in humans.

What’s the question?

Many readers will remember from their introductory biology classes the central dogma of molecular biology (so named by Francis Crick out of apparent misunderstanding of the word “dogma”): that information in the genome flows from DNA to RNA to protein. A major outstanding question in molecular biology centers around the first step in that process: how does the information in DNA tell the cell what RNAs to produce, and at what level to produce them?

In some sense, this question can be broken down into a few related, but somewhat independent questions: first, where in the genome is the information that tells RNA polymerase where to bind and begin creating RNA? Second, once transcription has begun, what DNA signals tell it to stop? And finally, once an RNA has been produced, where is the information encoded that directs splicing to form an mRNA that can now be translated into protein?

How does personal genomics help?

Let’s take a particular example: imagine you wanted to identify all the bases important for the inclusion of a particular exon in a processed mRNA. Ideally, what you’d want to do is alter each potentially important base and assay its effect. The standard way to do this would be to generate bits of DNA carrying each variant exon (a minigene), pop them into a cell, and measure the level of splicing of the exon. There are a number of problems with this approach, however: from a scientific point of view, splicing in this sort of assay might not be equivalent to what happens in vivo in the context of chromatin; and from a practical point of view, it’s simply a lot of work to do hundreds of such experiments.

So ideally, you want a system where you could mutate each base in the exon with relative ease and examine the effects of each mutation in its “natural environment”. The connection with personal genomics is then clear: evolution has done this exact experiment for us. Every generation, each base in the human genome is mutated about 70 times somewhere in the human population [1]. Some of these mutations never see the light of day because they’re lethal, and most will only stick around for a generation or two, but some will make it to reasonable frequency in the population. In any case, at most bases in the genome, if we could sample enough people, we could find a mutation to test.

One could imagine, then, at some point in the future, that the hypothetical investigator curious about the positions necessary for the efficient splicing of an exon could order up cell lines from individuals with mutations in each possible regulatory base and assay their effects (or better yet, simply download genome-wide splicing data). Is this far-fetched? Well, consider that the Personal Genome Project is generating cell lines from all of its participants, and that they have already had some success using these lines to study gene regulation. With a few cell lines, this approach is interesting, but with hundreds of thousands it will become enormously powerful.
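To make the "assay their effects" step concrete, here is a minimal sketch of how one might compare exon inclusion between a cell line carrying a variant and one without it. It assumes splicing is summarised as "percent spliced in" (the fraction of reads supporting inclusion of the exon), which is one common measure and not necessarily what any particular project uses. The function names and the read counts are hypothetical, chosen only for illustration; they are not real data.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Fraction of informative reads that support inclusion of the exon."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads cover this exon")
    return inclusion_reads / total


def psi_shift(carrier_counts, reference_counts):
    """Difference in exon inclusion: variant carrier minus reference line.

    Each argument is an (inclusion_reads, exclusion_reads) pair.
    A negative value means the exon is included less often in the carrier.
    """
    return percent_spliced_in(*carrier_counts) - percent_spliced_in(*reference_counts)


# Made-up read counts for illustration only (not real measurements).
carrier = (30, 70)      # cell line carrying the variant
reference = (90, 10)    # cell line without the variant

shift = psi_shift(carrier, reference)
print(f"Change in exon inclusion: {shift:+.2f}")
```

A real analysis would need many cell lines per variant and a statistical test, since a single pair of lines differs at far more than one base.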

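[1] For a back-of-the-envelope calculation, say the mutation rate is about 1×10⁻⁸ per base per generation and the human population size is about 7×10⁹. Multiplying the two gives about 70 new mutations at each base, per generation, across the whole population.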
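The arithmetic in footnote [1] can be checked with a short sketch. The two input figures are the ones given in the footnote; the `genome_copies` parameter is an added assumption, included only to show how the estimate would change if the rate were applied separately to each of the two genome copies a person carries.

```python
# Figures taken from footnote [1].
MUTATION_RATE = 1e-8     # mutations per base per generation
POPULATION_SIZE = 7e9    # people


def new_mutations_per_base(mutation_rate, population_size, genome_copies=1):
    """Expected new mutations at one base, per generation, population-wide.

    genome_copies is an illustrative assumption: 1 reproduces the footnote,
    2 treats the rate as applying to each copy of a diploid genome.
    """
    return mutation_rate * population_size * genome_copies


per_base = new_mutations_per_base(MUTATION_RATE, POPULATION_SIZE)
print(round(per_base))  # 70
```

Either way, the order of magnitude is the point: each base is hit many times per generation somewhere in the population.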

5 Responses to “How widespread personal genomics could benefit molecular biology”


  • So how cheap is it to curate cell lines? A million cell lines?

  • Another great GU. A small contribution: I heard an E. Lander lecture on iTunes U the other day in which he said Crick called it the “central dogma” because at the time he conceived of the idea it was based on nothing but faith (no data).

  • @Steve

    Keeping cell lines for hundreds of thousands of individuals is expensive, but not crazily so. The Coriell Institute holds about 300,000 cell lines, with a $10m (£6m) budget, much of which will go on research, not directly into keeping the collections. For comparison, this is somewhat less than the £7.7m budget of the WTCCC2.

  • @Joe

    This sounds pretty similar to, for instance, the Cambridge BioResource.

    The big advantage here really is the assumption that everyone will be sequenced as part of the default healthcare track. Once virtually everyone is sequenced by default by the health service, huge swathes of genetics become trivial. Genome-wide association studies are easy when all you need is signed consent from your case individuals to use their existing genome sequence, rather than needing to collect, store and sequence samples. Population genetics is also a lot easier when you know the genetics of the whole population.

  • Luke, thanks, I wasn’t aware of the Cambridge BioResource. Once all those individuals are sequenced, this sort of thing should be quite feasible.
