Detecting positive natural selection from genetic data

As humans expanded out of Africa into the rest of the world, they adapted to a whole host of new habitats, pathogens, and food sources. In recent years, there has been an explosion of interest in identifying the specific genetic loci underlying these adaptations using whole-genome genotyping (and now sequencing). In this post, I’ll outline some of the basic principles of how these methods work.

Testing for traces of Neanderthal in your own genome

I’m guessing everyone reading this post is familiar with recent research from Svante Pääbo’s group indicating that modern humans interbred with Neanderthals during their long co-existence in Eurasia between 30,000 and 80,000 years ago. According to the researchers’ calculations, somewhere between 1 and 4% of the DNA in modern non-African humans is derived from these interbreeding events – in other words, many of us are walking around with Neanderthal DNA sitting in our genomes.

So how much of your genome is Neanderthal? Over at The Genetic Genealogist, Blaine Bettinger takes a look at the options currently available to those interested in digging for Neanderthal ancestry in their own genetic backyard. Blaine notes that one company is already offering a test labelled as looking for Neanderthal ancestry based on a limited number of variable (microsatellite) markers. However, this test doesn’t actually look directly for putative Neanderthal-derived variants; instead, it (rather quaintly) tests for “strong matches between your DNA fingerprint [...] and populations identified as ‘archaic,’ that is, whose composition retains the earliest earmarks of out‐of‐Africa genetics.” This is a very rough approach to the problem, to put it mildly.

Added in edit 15/07/10: John Hawks has a justifiably scathing review of the test on his blog; I’ve removed links to the company from this post to avoid giving them extra publicity.

People who have already had their genomes scanned by a company like 23andMe should, in theory, already have enough data to perform a much higher-resolution analysis. Sadly, there is not yet any readily available algorithm for doing this, despite (as Blaine notes) substantial interest in such a test among the 23andMe community.

Seems like there’s some real scope for a DIY genomics tool here. Is anyone out there already working on this? Let us know in the comments.

