Out in Nature this week is a paper by three Genomes Unzipped authors reporting 71 new genetic associations with inflammatory bowel disease (IBD). This breaks the record for the largest number of associations for any common disease, and includes many new and interesting biological insights that you should all go and read about in the paper itself (pay-to-access I’m afraid) or on the Sanger Institute’s website.
One thing that we did not discuss in the paper was genetic prediction of IBD (i.e. using the risk variants we have discovered to predict who will or will not develop the disease). In this post I want to outline some of the situations in which we have considered using genetic risk prediction of IBD, and discuss whether any of them would actually work in practice.
Continue reading ‘Dozens of new IBD genes, but can they predict disease?’
Over at Nature News, Erika Check Hayden has a post about a recent Science Translational Medicine paper by Bert Vogelstein and colleagues looking at the potential predictive power of genetics. The take-home message from the study (or at least the message that has been taken home by, e.g., this NYT article) is that DNA does not perfectly determine which disease or diseases you may get in the future. This take-home message is true, and to me relatively obvious (in the same way that smoking doesn’t perfectly determine lung cancer, or body weight and dietary health don’t perfectly determine diabetes status).
A lot of researchers have had a pretty negative reaction to this paper (see Erika’s Storify of the Twitter coverage). There are lots of legitimate criticisms (see Erika’s post for details), but to be honest I suspect that a lot of this is a mixture of indignation and sour grapes that this paper, a not particularly original or particularly well done attempt to answer a question that many other people have answered before, got so much press (including a feature in the NYT). A very large number of people have tried to quantify the potential predictive power of genetics for a number of years – why was there no news feature for me and Jeff, or David Clayton, or Naomi Wray and Peter Visscher, or any of the other large number of stat-gen folks who have been doing exactly these studies for years? ANGER RISING and so forth.
But of course, the reason is relatively obvious. Continue reading ‘Identical twins usually do not die from the same thing’
It's all a game of Risk!
The first thing I did when I received my genotyping results from 23andMe was log on to their website and take a look at my estimated disease risks. For most people, these estimates are one of the primary reasons for buying a direct-to-consumer (DTC) genetics kit. But how accurate are these disease risk estimates? How robust is the information that goes into calculating them? In a previous post I focused on how odds ratios (the ratio of the odds of disease if allele A is carried as opposed to allele B) can vary across different populations, environments and age groups and, as a consequence, affect disease risk estimates. It turns out that even if we forget about these concerns for a moment, getting an accurate estimate of disease risk is far from straightforward. One of the primary challenges is deciding which disease loci to include in the risk prediction, and in this post I will investigate the effect this decision can have on risk estimates.
To help me in my quest, I will use ulcerative colitis (UC) as an example throughout the post, estimating Genomes Unzipped members’ risk for the disease as I go. Ulcerative colitis is one of two common forms of autoimmune inflammatory bowel disease and I have selected it not on the basis of any special properties (either genetic or biological) but because I am familiar with the genetics of the disease, having worked on it extensively.
The table below gives our ulcerative colitis risks according to 23andMe. The numbers in the table represent the percentage of people 23andMe would expect to suffer from UC given our genotype data (after taking our sex and ethnicity into account). The colours highlight individuals who fall into 23andMe’s “increased risk” (red) or “decreased risk” (blue) categories based on comparisons with the average risk (males: 0.77%; females: 0.51%). As far as I am aware, none of us actually suffers from UC.
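To make the link between an average risk and a genotype-specific risk concrete, here is a minimal sketch of one common way to move between the two: convert the average risk to odds, scale by an odds ratio, and convert back. This is an illustration only, not a description of 23andMe's actual calculation. The function name and the combined odds ratio are hypothetical, and treating the population average as the baseline is a simplifying assumption.

```python
def risk_from_odds_ratio(average_risk, combined_odds_ratio):
    """Turn an average risk and a (hypothetical) combined odds ratio into a risk.

    Assumption: the population-average risk is used as the baseline,
    which is a simplification made for illustration.
    """
    if not 0.0 < average_risk < 1.0:
        raise ValueError("average_risk must be strictly between 0 and 1")
    if combined_odds_ratio <= 0:
        raise ValueError("combined_odds_ratio must be positive")
    baseline_odds = average_risk / (1.0 - average_risk)  # probability -> odds
    adjusted_odds = baseline_odds * combined_odds_ratio  # scale by the OR
    return adjusted_odds / (1.0 + adjusted_odds)         # odds -> probability


# Average male UC risk quoted above (0.77%), with a made-up combined OR of 1.5.
example_risk = risk_from_odds_ratio(0.0077, 1.5)
print(f"{example_risk:.4%}")
```

Because UC is rare, the adjusted risk comes out close to (but slightly below) the average risk multiplied by the odds ratio.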
Continue reading ‘At odds with disease risk estimates’
Razib Khan, better known for his detailed low-downs of population biology and history, has written an important post on Gene Expression, explaining in careful detail exactly how to run some simple population genetic analysis on public genomes, as well as on your own personal genomics data. The outcome of the tutorial is an ADMIXTURE plot (like the one to the left), showing what proportion of your genome comes from different ancestral populations. This sort of analysis is not difficult, but it can often be hard to know how to start, so Razib’s post gives a good landing point for people who want to dig deeper into their own genomes.
This tutorial also ties in to some political ideas that Razib has been talking about since the recent call to allow access to genomic information only via prescription. If you are worried about losing access to your genome, one option is to ensure that you do not require companies to generate and interpret your genome. As sequencing, genotyping and computing prices fall, DIY genetics becomes more and more plausible. Learn to discover things about your own genome, and no-one will be able to take that away from you. [LJ]
Continue reading ‘Analysing your own genome, bloggers respond to the FDA and more reporting on bogus GWAS results’
Last week, a post went up on the Bioscience Resource Project blog entitled The Great DNA Data Deficit. This is another in a long string of “Death of GWAS” posts that have appeared over the last year. The authors claim that because GWAS has failed to identify many “major disease genes”, i.e. high frequency variants with large effect on disease, it was therefore not worthwhile; this is all old stuff that I have discussed elsewhere (see also my “Standard GWAS Disclaimer” below). In this case, the authors argue that the genetic contribution to complex disease has been massively overestimated, and in fact genetics does not play as large a part in disease as we believe.
The one particularly new thing about this article is that they actually look at the foundation for beliefs about missing heritability; the twin studies of identical and non-identical twins from which we get our estimates of the heritability of disease. I approve of this: I think all those who are interested in the genetics of disease should be fluent in the methodology of twin studies. However, in this case, the authors come to the rather odd conclusion that heritability measures are largely useless, based on a small statistical misunderstanding of how such studies are done.
I thought I would use this opportunity to explain, in relative detail, where we get our estimates of heritability from, why they are generally well-measured and robust, and what real issues need to be considered when interpreting twin study results. This post is going to contain a little bit of maths, but don’t worry if it scares you a little; you only really need to get the gist.
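As a taste of the kind of maths involved, here is a minimal sketch of Falconer's formula, the classical approximation that estimates heritability as twice the difference between the trait correlations of identical (MZ) and non-identical (DZ) twin pairs. The full post may well use a different or more detailed approach. The function name and the example correlations below are made up for illustration.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ).

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between non-identical (dizygotic) twins
    """
    for name, r in (("r_mz", r_mz), ("r_dz", r_dz)):
        if not -1.0 <= r <= 1.0:
            raise ValueError(f"{name} must be a correlation in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


# Made-up correlations, purely for illustration.
h2 = falconer_heritability(0.6, 0.4)
print(round(h2, 3))
```

The intuition is that identical twins share all of their DNA and non-identical twins share half on average, so doubling the gap in correlation gives a rough estimate of the genetic contribution, on the assumption that both kinds of twins share their environments to the same degree.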
Continue reading ‘Estimating heritability using twins’
A quick note about the Reader Survey; we are going to stop taking responses at the end of Saturday (Pacific Time). If you haven’t already done so, please fill out the survey now.
A couple of interesting articles this week on the Personal Genome Project and public genomics in general. Mark Henderson at the Times has an opinion piece (behind a paywall, I’m afraid) about Misha Angrist’s book Here Is A Human Being (see also this review from The Intersection), and in the Duke Magazine Mary Carmichael has an in-depth feature on the work of George Church, with some interesting history of the early days of the PGP.
One aspect that comes out of these articles is how those who take part in public genomics projects are starting to own the unknown unknowns. They accept that they cannot anticipate all the risks of making their data public, but are willing to take the risk of exposing themselves to these unknown risks, and in doing so turn them into knowns. Another aspect is the sheer number of individuals who want to sign up to have their data published online: 15,000 people have expressed interest in being part of the PGP, despite initial NIH concerns that no-one would want to take part at all. This also chimes with research presented at ASHG this year, showing that members of the public are more concerned with contributing to scientific knowledge, and, crucially, getting access to their own genetic data than they are about the potential risks that such data could expose them to. [LJ]
Continue reading ‘Friday Links’
In the recent report from the US Government Accountability Office on direct-to-consumer genetic tests, much was made of the fact that risk predictions from DTC genetic tests may not be applicable to individuals from all ethnic groups. This observation was not new to the report – it has been commented on by numerous critics ever since the inception of the personal genomics industry.
So, why does risk prediction accuracy vary between individuals and what can be done to combat this? Are the DTC companies really to blame?
To explore these questions it is first necessary to understand what is meant by the odds ratio (OR). In genetic case-control association studies the OR typically represents the ratio of the odds of disease if allele A is carried compared to if allele B is carried. If all else is equal, genetic loci with a higher OR are more informative for disease prediction – so getting an accurate estimate is extremely important if prediction underpins your business model. However, getting an accurate estimate of OR is far from easy because many, often unmeasured, factors can cause OR estimates to vary. In this post I will try to break down the concept of a single, fixed odds ratio for a disease association, and highlight a number of factors that can cause odds ratios to vary using examples from the scientific literature.
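As a concrete illustration of the definition above, here is a minimal sketch that computes an allelic odds ratio from a 2x2 table of allele counts in cases and controls. The function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def allelic_odds_ratio(case_a, case_b, control_a, control_b):
    """Odds ratio for allele A versus allele B from case-control allele counts.

    Odds of disease given A = case_a / control_a
    Odds of disease given B = case_b / control_b
    """
    if min(case_a, case_b, control_a, control_b) <= 0:
        raise ValueError("all four counts must be positive")
    odds_given_a = case_a / control_a
    odds_given_b = case_b / control_b
    return odds_given_a / odds_given_b


# Hypothetical counts: 300 A and 700 B alleles in cases,
# 200 A and 800 B alleles in controls.
print(allelic_odds_ratio(300, 700, 200, 800))  # (300/200) / (700/800) = 12/7, about 1.71
```

An OR above 1 means allele A is more common among cases than controls. Swapping which allele is treated as the reference simply inverts the ratio.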
Continue reading ‘Getting even with the odds ratio’
(This is an extended version of a short piece written as part of a series organized by the excellent Mary Carmichael at Newsweek. Readers eager for more detail on the statistics behind risk prediction should read Kate’s excellent discussion posted yesterday.)
In 2003 Francis Collins, having just led the Human Genome Project to completion, made a prediction: within ten years, “predictive genetic tests will exist for many common conditions” and “each of us can learn of our individual risks for future illness”. The deadline of his prophecy is fast approaching, but how close are we to realizing his vision of being able to get a read-out of disease risk from a person’s DNA?
Continue reading ‘Why prediction is a risky business’