As I mentioned a few weeks ago, we recently published a large study into the genetics of inflammatory bowel disease (IBD), which included a number of analyses digging into the biology and evolutionary history of IBD genetic risk. Gratifyingly, our paper has stimulated a lot of discussion among other scientists, which has generated several ideas about future directions for this work. One question raised by several population-genetics experts at ASHG concerned our natural selection analysis, and in particular our claim to have discovered an enrichment of balancing selection in IBD loci. In the paper, we found clear signals of natural selection on IBD loci, a subset of which we interpreted as balancing selection. In this post I will set out how I came to this conclusion, and then outline an alternative explanation for the results: recent local positive selection in Europeans.
Tag Archive for 'GWAS'
Out in Nature this week is a paper by three Genomes Unzipped authors reporting 71 new genetic associations with inflammatory bowel disease (IBD). This breaks the record for the largest number of associations for any common disease, and includes many new and interesting biological insights that you should all go and read about in the paper itself (pay-to-access I’m afraid) or on the Sanger Institute’s website.
One thing that we did not discuss in the paper was genetic prediction of IBD (i.e. using the risk variants we have discovered to predict who will or will not develop the disease). In this post I want to outline some of the situations in which we have considered using genetic risk prediction of IBD, and discuss whether any of them would actually work in practice.
[Last week, Ed Yong at Not Exactly Rocket Science covered a paper positing an association between a genetic variant and an aspect of social behavior called prosociality. On Twitter, Daniel and Joe dismissed this study out of hand due to its small sample size (n = 23), leading Ed to update his post. Daniel and Joe were then contacted by Alex Kogan, the first author of the study in question. He kindly shared his data with us, and agreed to an exchange here on Genomes Unzipped. In this post, we expand on our point about the importance of sample size; Alex’s reply is here.
Edit 01/12/11 (DM): The original version of this post included language that could have been interpreted as an overly broad attack on more serious, well-powered studies in psychiatric disease genetics. I've edited the post to reduce the possibility of collateral damage. To be clear: we're against over-interpretation of results from small studies, not behavioral genetics as a whole, and I apologise for any unintended conflation of the two.]
In October of 1992, genetics researchers published a potentially groundbreaking finding in Nature: a genetic variant in the angiotensin-converting enzyme gene (ACE) appeared to modify an individual’s risk of having a heart attack. This finding was notable at the time for the size of the study, which involved a total of over 500 individuals from four cohorts, and for the effect size of the identified variant: in a population initially identified as low-risk for heart attack, the variant had an odds ratio of over 3 (with a corresponding p-value less than 0.0001).
Readers familiar with the history of medical association studies will be unsurprised by what happened over the next few years: initial excitement (this same polymorphism was associated with diabetes! And longevity!) was followed by inconclusive replication studies and, ultimately, disappointment. In 2000, 8 years after the initial report, a large study involving over 5,000 cases and controls found absolutely no detectable effect of the ACE polymorphism on heart attack risk. In the meantime, the same polymorphism had turned up in dozens of other association studies for traits ranging from obstetric cholestasis to meningococcal disease in children, virtually none of which have ever been convincingly replicated.
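To make the sample-size point concrete, here is a minimal sketch (not an analysis from any of the studies mentioned) of how the confidence interval around an odds ratio narrows as a case-control study grows. The counts are invented purely for illustration, the function name is our own, and the interval uses the standard Woolf (log odds ratio) approximation.

```python
import math

def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound).

    Uses the Woolf approximation: the standard error of log(OR) is the
    square root of the sum of reciprocals of the four cell counts.
    """
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )
    se_log_or = math.sqrt(
        1 / case_carriers + 1 / case_noncarriers
        + 1 / control_carriers + 1 / control_noncarriers
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: identical carrier proportions, 100x more people.
small_study = odds_ratio_ci(15, 10, 10, 15)
large_study = odds_ratio_ci(1500, 1000, 1000, 1500)

for label, (estimate, lower, upper) in [("small", small_study),
                                        ("large", large_study)]:
    print(f"{label} study: OR = {estimate:.2f}, "
          f"95% CI {lower:.2f} to {upper:.2f}")
```

Both hypothetical studies give the same point estimate, but only the larger one produces an interval that excludes an odds ratio of 1. A striking effect size from a small study is therefore weak evidence on its own.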
Continue reading ‘Size matters, and other lessons from medical genetics’
Over the last several years, the number of genetic variants unambiguously associated with disease risk has grown dramatically. However, interpreting these signals has been extremely difficult—most of the identified variants do not disrupt genes, and indeed many don’t fall anywhere near genes (this observation has even led some to discount these signals entirely). To an investigator interested in following up on these signals, this is somewhat depressing: how can we hope to explore how polymorphisms affect disease risk if they don’t seem to fall in any sort of genome annotation that we understand?
In this context, I thought I’d point to an important paper that, among many other things, gives the first systematic evidence that variants which influence disease are not just randomly scattered across the genome, but instead tend to fall in particular regions—in particular, enhancer elements (regions where DNA-binding proteins interact with DNA to influence gene expression).
The authors rely on the fact that, in the cell, DNA is wrapped around proteins called histones, which control how accessible the DNA is to things like transcription factors (see above figure). These proteins can be chemically modified, and it is now clear that particular patterns of modifications are predictive of the function of the DNA in the region—some modifications indicate transcribed genes, others regions of enhancer activity, others repressed regions, etc.
What the authors did in this study was generate genome-wide maps of several histone modifications in nine different cell types, and use this data to predict the function of each 200 base pair segment of the human genome in each cell type. There are a number of interesting analyses of these “maps” of genome function in the paper, but for our purposes here there’s one of particular interest: the authors took sets of SNPs associated with various diseases and simply asked, are these variants enriched in regions with any particular functional prediction? And indeed, for several phenotypes, there is a striking enrichment of association signals in enhancer elements in a relevant cell type. For example, SNPs which influence lipid levels are enriched in enhancers in a liver cancer cell line, and SNPs which influence the autoimmune disease lupus are enriched in enhancers in a lymphoblastoid cell line.
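This post does not go into which statistical test the authors used, so the following is only a generic sketch of what "asking about enrichment" can look like: a one-sided hypergeometric test of whether more disease-associated SNPs fall in enhancer-annotated segments than expected by chance. All counts and the function name are hypothetical, and a real analysis would also need to account for things like linkage disequilibrium and how SNPs are distributed across the genome.

```python
import math

def enrichment_test(total_segments, enhancer_segments,
                    snps_tested, snps_in_enhancers):
    """Return (fold enrichment, one-sided hypergeometric p-value).

    The p-value is the probability of seeing at least `snps_in_enhancers`
    hits when `snps_tested` segments are drawn at random without
    replacement from `total_segments`, of which `enhancer_segments`
    carry the enhancer annotation.
    """
    fold = (snps_in_enhancers * total_segments) / (
        snps_tested * enhancer_segments
    )
    non_enhancer = total_segments - enhancer_segments
    max_hits = min(snps_tested, enhancer_segments)
    # Sum the upper tail using exact integer arithmetic, then divide once.
    tail = sum(
        math.comb(enhancer_segments, hits)
        * math.comb(non_enhancer, snps_tested - hits)
        for hits in range(snps_in_enhancers, max_hits + 1)
    )
    p_value = tail / math.comb(total_segments, snps_tested)
    return fold, p_value

# Hypothetical example: 5% of segments are enhancers, yet 10 of 50 SNPs hit one.
fold, p_value = enrichment_test(10_000, 500, 50, 10)
print(f"fold enrichment = {fold:.1f}, p = {p_value:.2e}")
```

In this made-up example only 2.5 of the 50 SNPs would be expected to land in enhancers by chance, so observing 10 is a fourfold enrichment.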
As these types of functional maps are generated in more cell types, I imagine there will be more stories like this. The problem with interpreting disease association studies, it seems likely, is largely due to our lack of understanding of genome function.
Citation: Ernst et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. doi:10.1038/nature09906
Early last year David Goldstein and colleagues published a provocative paper claiming that many GWAS associations are driven not by common variants of modest effect (the canonical common disease – common variant hypothesis underpinning GWAS) but instead by a local cluster of lower frequency variants that have much bigger effects on disease risk. They dubbed this hypothesized phenomenon “synthetic association” and the term quickly became a genetics buzzword. The paper was widely discussed in both the specialist and mainstream media, and caused quite a stir among academic statistical geneticists.
That debate has been re-opened today by a set of Perspectives in PLoS Biology: a rebuttal by us (Carl & Jeff) and our colleagues at Sanger, a rebuttal by Naomi Wray, Shaun Purcell and Peter Visscher, a rebuttal to the rebuttals by David Goldstein and an editorial by Robert Shields to tie it all together.
Last week, a post went up on the Bioscience Resource Project blog entitled The Great DNA Data Deficit. This is another in a long string of “Death of GWAS” posts that have appeared over the last year. The authors claim that because GWAS has failed to identify many “major disease genes”, i.e. high frequency variants with large effect on disease, it was therefore not worthwhile; this is all old stuff that I have discussed elsewhere (see also my “Standard GWAS Disclaimer” below). In this case, the authors argue that the genetic contribution to complex disease has been massively overestimated, and in fact genetics does not play as large a part in disease as we believe.
The one particularly new thing about this article is that they actually look at the foundation for beliefs about missing heritability; the twin studies of identical and non-identical twins from which we get our estimates of the heritability of disease. I approve of this: I think all those who are interested in the genetics of disease should be fluent in the methodology of twin studies. However, in this case, the authors come to the rather odd conclusion that heritability measures are largely useless, based on a small statistical misunderstanding of how such studies are done.
I thought I would use this opportunity to explain, in relative detail, where we get our estimates of heritability from, why they are generally well-measured and robust, and what real issues need to be considered when interpreting twin study results. This post is going to contain a little bit of maths, but don’t worry if it scares you a little: you only really need to get the gist.
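As a taste of the kind of maths involved, here is a minimal sketch of the classical Falconer estimate, which compares how similar identical (MZ) and non-identical (DZ) twin pairs are. It is offered as an illustration rather than as the exact method used in the full post: the correlations are hypothetical, the function name is our own, and the decomposition assumes purely additive genetic effects and equally shared environments for both kinds of twins.

```python
def falconer_estimates(r_mz, r_dz):
    """Decompose trait variance from twin correlations.

    r_mz: trait correlation between identical (MZ) twin pairs
    r_dz: trait correlation between non-identical (DZ) twin pairs

    MZ twins share all of their segregating genetic variation and DZ
    twins share half on average, so doubling the difference in
    correlations estimates the additive genetic share of variance.
    """
    heritability = 2 * (r_mz - r_dz)       # additive genetic (A)
    shared_environment = 2 * r_dz - r_mz   # common environment (C)
    unique_environment = 1 - r_mz          # individual environment + error (E)
    return heritability, shared_environment, unique_environment

# Hypothetical correlations, chosen only to illustrate the arithmetic.
h2, c2, e2 = falconer_estimates(r_mz=0.8, r_dz=0.5)
print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

The three components sum to 1 by construction, which is a useful sanity check. For a disease rather than a quantitative trait, the correlations would typically be estimated on an underlying liability scale rather than taken directly from affected/unaffected status.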
Continue reading ‘Estimating heritability using twins’