While the majority of the buzz surrounding personal genomics has to do with prediction of disease risk and other medical applications, there’s clearly the potential for these sorts of technologies to influence basic science as well. In this post, I’ll lay out one such potential application: the use of personal genomics in understanding basic molecular biology, in particular the biology of transcriptional regulation in humans.
What’s the question?
Many readers will remember from their introductory biology classes the central dogma of molecular biology (so named by Francis Crick out of apparent misunderstanding of the word “dogma”): that information in the genome flows from DNA to RNA to protein. A major outstanding question in molecular biology centers around the first step in that process: how does the information in DNA tell the cell what RNAs to produce, and at what level to produce them?
In some sense, this question can be broken down into a few related but somewhat independent questions. First, where in the genome is the information that tells RNA polymerase where to bind and begin creating RNA? Second, once transcription has begun, what DNA signals tell the polymerase to stop? And finally, once an RNA has been produced, where is the information encoded that directs splicing to form an mRNA that can then be translated into protein?
How does personal genomics help?
Let’s take a particular example: imagine you wanted to identify all the bases important for the inclusion of a particular exon in a processed mRNA. Ideally, what you’d want to do is alter each potentially important base and assay its effect. The standard way to do this would be to generate bits of DNA carrying each variant exon (a minigene), pop them into a cell, and measure the level of splicing of the exon. There are a number of problems with this approach, however: from a scientific point of view, splicing in this sort of assay might not be equivalent to what happens in vivo in the context of chromatin; and from a practical point of view, it’s simply a lot of work to do hundreds of such experiments.
So ideally, you’d want a system where you could mutate each base in the exon with relative ease and examine the effects of each mutation in its “natural environment”. The connection with personal genomics is then clear: evolution has done this exact experiment for us. Every generation, each base in the human genome is mutated about 70 times somewhere in the human population. Some of these mutations never see the light of day because they’re lethal, and most will only stick around for a generation or two, but some will make it to reasonable frequency in the population. In any case, at most bases in the genome, if we could sample enough people, we could find a mutation to test.
One could imagine, then, at some point in the future, that the hypothetical investigator curious about the positions necessary for the efficient splicing of an exon could order up cell lines from individuals with mutations in each possible regulatory base and assay their effects (or better yet, simply download genome-wide splicing data). Is this far-fetched? Well, consider that the Personal Genome Project is generating cell lines from all of its participants, and that they have already had some success using these lines to study gene regulation. With a few cell lines, this approach is interesting, but with hundreds of thousands it will become enormously powerful.
For a back of the envelope calculation, say the mutation rate is about 1×10⁻⁸ /base/generation, and the human population size is about 7×10⁹.
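To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The two constants are the figures quoted above; the function names and the sample size of 100,000 are hypothetical, chosen only for illustration. It also makes two simplifying assumptions that are mine rather than the post's: each person is counted as a single genome copy (counting both copies would double the numbers), and new mutations are treated as independent events, so a Poisson approximation gives the chance of seeing at least one.

```python
import math

# Figures quoted in the text above.
MUTATION_RATE = 1e-8    # new mutations per base per generation
POPULATION_SIZE = 7e9   # people alive

# Hypothetical sample size, for illustration only.
SAMPLE_SIZE = 100_000


def expected_new_mutations(mutation_rate, n_genomes):
    """Expected number of new mutations at one base across n_genomes."""
    return mutation_rate * n_genomes


def prob_at_least_one(mutation_rate, n_genomes):
    """Chance that at least one of n_genomes carries a new mutation at a base.

    Assumes mutations arise independently (Poisson approximation).
    """
    return 1.0 - math.exp(-expected_new_mutations(mutation_rate, n_genomes))


# Whole population: roughly 70 new mutations at every base, every generation.
per_base = expected_new_mutations(MUTATION_RATE, POPULATION_SIZE)
print(f"New mutations per base, whole population: {per_base:.0f}")

# A sample of people: chance that a given base has a new mutation in the sample.
p_sample = prob_at_least_one(MUTATION_RATE, SAMPLE_SIZE)
print(f"Chance of a new mutation at a given base in the sample: {p_sample:.4%}")
```

Note that this only counts mutations that are new in the current generation. Variants inherited from earlier generations are not included, so the number of testable variants at a base in any real sample would be higher than the sample-level figure suggests.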