As humans expanded out of Africa into the rest of the world, they adapted to a whole host of new habitats, pathogens, and food sources. In recent years, there has been an explosion of interest in identifying the specific genetic loci underlying these adaptations using whole-genome genotyping (and now sequencing). In this post, I’ll outline some of the basic principles of how these methods work.
Actually, I say “basic principles”, but what I mean is “basic principle”, because there’s only one: every method for detecting positive selection tries to identify alleles which have gone up in frequency unusually fast. The differences between methods for detecting selection all lie in the information they use to find such alleles (since for the moment, we don’t have good data on historical allele frequencies).
One class of methods makes a straightforward assumption: imagine that selection is acting in one population, but not another. In this case, the frequencies of the selected alleles in the first population will go up relatively quickly compared to the frequencies of those same alleles in the second population. The test, then, is simple: are there alleles that have unusually large allele frequency differences between two populations? This principle was recently used to identify the gene EPAS1 as potentially important in adaptation to high altitude in Tibetans (Yi et al., 2010). On the right is the relevant figure from that paper: a two-dimensional histogram of allele frequencies in a Han Chinese population (HAN) versus those in a Tibetan population (TIB). The points labeled EPAS1 are clear outliers in this distribution.
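To make the idea concrete, here is a minimal sketch of a frequency-difference scan in Python. It is not the method used in the EPAS1 paper. The function names, the 0/1/2 genotype coding, and the "top fraction" cutoff are all assumptions made for illustration, and a real analysis would need to account for sample size, population structure, and demographic history.

```python
def allele_frequency(genotypes):
    """Frequency of the alternate allele, given genotypes coded as 0, 1, or 2
    copies of that allele per diploid individual (assumed coding)."""
    return sum(genotypes) / (2 * len(genotypes))


def frequency_differences(pop_a, pop_b):
    """Absolute allele frequency difference for every SNP typed in both populations.

    pop_a, pop_b: dicts mapping a SNP identifier to a list of genotypes.
    """
    return {
        snp: abs(allele_frequency(pop_a[snp]) - allele_frequency(pop_b[snp]))
        for snp in pop_a
        if snp in pop_b
    }


def top_outliers(diffs, fraction=0.01):
    """SNPs in the top `fraction` of frequency differences (at least one SNP).

    The cutoff is a hypothetical, purely empirical choice: it ranks SNPs
    against the rest of the genome rather than against an explicit null model.
    """
    ranked = sorted(diffs, key=diffs.get, reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


# Toy data: three SNPs, four individuals per population.
highland = {"snp1": [0, 0, 0, 1], "snp2": [2, 2, 2, 1], "snp3": [1, 1, 0, 0]}
lowland = {"snp1": [0, 0, 1, 0], "snp2": [0, 0, 0, 1], "snp3": [1, 0, 1, 0]}

diffs = frequency_differences(highland, lowland)
candidates = top_outliers(diffs)
```

In this toy data set only `snp2` differs in frequency between the two populations, so it is the one candidate the scan reports.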
Another class of methods makes a different (and perhaps more restrictive) assumption: imagine that selection is acting on a mutation which is new (or at very low frequency) in a population. Again, selection will act to increase the frequency of the allele, with the result that there will be a young allele at relatively high frequency in the population. The age of an allele can be assessed by measuring the amount of genetic variation around the allele (as time passes, more mutations occur in the region) or the length of the haplotypes on which the allele sits (as time passes, recombination breaks up the association of the allele with nearby ones). Again the test is clear: find the young alleles at unusually high frequency. As an example of this, consider the genetic variation around the pigmentation gene KITLG pictured on the right (from Coop et al., 2009): each plot represents a population, each horizontal line in the plot represents a 500 kb haplotype in the population, and identical haplotypes are the same color. In the non-African populations, you can clearly see the large blocks of red, indicating that there is a long haplotype with very little variation (i.e., a relatively young allele) nearly at fixation in these populations. Indeed, this gene is known to contribute to lighter skin in non-African populations (Miller et al., 2007).
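The haplotype-length idea can be sketched in a few lines of Python. This is a simplified illustration and not the exact statistic from any published test: haplotypes are assumed to be phased and written as equal-length strings of "0" and "1", and the function names are made up for this example. For the chromosomes carrying a given allele, it asks how often two of them are identical over windows of increasing width around that allele.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Probability that two distinct haplotypes drawn at random are identical."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    counts = Counter(haplotypes)
    identical_pairs = sum(c * (c - 1) for c in counts.values())
    return identical_pairs / (n * (n - 1))


def homozygosity_profile(haplotypes, core_index, core_allele):
    """Homozygosity among carriers of `core_allele` as the window widens.

    Entry k of the result uses the window extending k sites on each side of
    the core site (clipped at the ends of the region).
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    length = len(haplotypes[0])
    max_reach = max(core_index, length - 1 - core_index)
    profile = []
    for reach in range(max_reach + 1):
        lo = max(0, core_index - reach)
        hi = min(length, core_index + reach + 1)
        profile.append(haplotype_homozygosity([h[lo:hi] for h in carriers]))
    return profile


# Toy data: six phased chromosomes, five sites, core site in the middle.
sample = ["00100", "00100", "00100", "10101", "01000", "11010"]

derived = homozygosity_profile(sample, core_index=2, core_allele="1")
ancestral = homozygosity_profile(sample, core_index=2, core_allele="0")
```

Here the chromosomes carrying allele "1" stay identical further from the core site than those carrying "0", which is the pattern expected for a young allele that has risen in frequency quickly.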
I’ve tried to avoid the alphabet soup of acronyms for tests for selection in the above discussion, but people who have read papers using these tests will recognize tests based on haplotype homozygosity or the site frequency spectrum as tests of the latter type, and tests based on FST as tests of the first type (there are also tests like XP-EHH which combine aspects of the two). Though all these sorts of statistics have their pluses and minuses (perhaps to be discussed in future posts), the general principle remains quite similar.
Yi et al. (2010) Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude. Science. DOI: 10.1126/science.1190371
Coop et al. (2009) The Role of Geography in Human Adaptation. PLoS Genetics. DOI: 10.1371/journal.pgen.1000500
Miller et al. (2007) cis-Regulatory changes in Kit ligand expression and parallel evolution of pigmentation in sticklebacks and humans. Cell. DOI: 10.1016/j.cell.2007.10.055