(This is an extended version of a short piece written as part of a series organized by the excellent Mary Carmichael at Newsweek. Readers eager for more detail on the statistics behind risk prediction should read Kate’s excellent discussion posted yesterday.)
In 2003 Francis Collins, having just led the Human Genome Project to completion, made a prediction: within ten years, “predictive genetic tests will exist for many common conditions” and “each of us can learn of our individual risks for future illness”. The deadline for his prophecy is fast approaching, but how close are we to realizing his vision of being able to get a read-out of disease risk from a person’s DNA?
To answer this question, it helps to look back at the state of human disease genetics before the Human Genome Project had even begun. Geneticists had become very efficient at pinpointing the roots of so-called “monogenic” disorders (e.g. sickle cell anemia, cystic fibrosis), where a rare defect in a single gene causes disease. These discoveries provided both insight into the biology of these diseases and highly predictive genetic tests. For instance, the disrupted gene in sickle cell anemia, called HBB, plays a key role in oxygen transportation by red blood cells, and the mutation which causes the disease is routinely used as a diagnostic test in at-risk babies in hospitals around the world.
By contrast, the more complex conditions that Collins hopes to predict (such as diabetes or multiple sclerosis) aren’t caused by a catastrophic problem with a single gene, but are instead subtly influenced by a combination of many different genes, as well as environmental factors such as diet and exercise. Progress in understanding the genetic part of that equation has accelerated rapidly in the last four years as genome-wide association studies (GWAS) have identified hundreds of locations in the human genome which influence a wide variety of diseases and traits. Much like the monogenic examples described above, the specific genes associated with each disease have told us a great deal about the underlying biology of disease. For example, the most recent GWAS of type 2 diabetes highlighted a previously unsuspected mechanism: many associated genes are involved in regulating the cell cycle (the fundamental process by which cells grow, replicate and divide throughout our lives).
Unlike monogenic disorders, however, the predictive power of the variants discovered by GWAS is generally very poor. These variants barely nudge someone’s overall risk, typically increasing it by a factor of 1.1–1.5. Effects this small can only be detected by studying tens of thousands of individuals, a fact that is critical to keep in mind when interpreting these findings in light of one person’s disease risk: a statistically significant association in a population does not translate into meaningful individual prediction. For instance, GWAS have found 38 genes affecting type 2 diabetes, but together these explain only about 10% of its observed heritability. This means that current prediction algorithms based solely on these genes are missing the majority of relevant genetic information, as well as all the environmental factors! The combination of the small genetic effects documented to date and the lack of key environmental information severely hampers the statistical models used to predict disease risk from genes.
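To see why a relative risk of 1.1–1.5 says so little about any one person, it can help to ask how much more common a risk variant is among people who get sick than in the population at large. The sketch below is an illustration only. It assumes a single variant whose relative risk is measured against non-carriers, and the 30% carrier frequency is a made-up figure, not one taken from any study.

```python
def carrier_fraction_among_cases(carrier_freq, relative_risk):
    """Fraction of people with the disease who carry the variant.

    Assumption: relative_risk compares carriers with non-carriers.
    By Bayes' rule the non-carrier disease rate cancels out, so only
    the carrier frequency and the relative risk are needed.
    """
    weighted_carriers = carrier_freq * relative_risk
    return weighted_carriers / (weighted_carriers + (1.0 - carrier_freq))


if __name__ == "__main__":
    carrier_freq = 0.30  # hypothetical: 30% of the population carries the variant
    for rr in (1.1, 1.2, 1.5):
        share = carrier_fraction_among_cases(carrier_freq, rr)
        print(f"relative risk {rr}: {share:.1%} of cases are carriers "
              f"(vs {carrier_freq:.0%} of everyone)")
```

Under these assumptions, carriers make up roughly 32% to 39% of cases compared with 30% of the whole population, so knowing that someone carries the variant barely separates people who will get sick from people who will not.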
Genetic risk prediction of these conditions is further clouded by the difficulty in translating results from the scientific literature to something more relevant to an individual. GWAS typically report something called a “relative risk,” which measures the increased chance someone with a particular genetic variant has of getting sick compared to the background rate of that disease in his community. Translating this information to a meaningful personal prediction can be tricky, because the background rate can vary widely around the world. If an individual isn’t well matched to the background in a study, basing his personal predictions on that study could yield highly inaccurate results. Furthermore, someone’s interpretation of a given relative risk could change dramatically depending on the underlying population risks for different diseases: a two-fold increase in predicted risk of multiple sclerosis would be a rounding error to most people (a change from 0.1% to 0.2%) but the same effect size for diabetes would represent an alarming increase from 20% to 40% lifetime risk.
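The arithmetic behind that last comparison is simple enough to sketch. The snippet below treats relative risk as a multiplier on the background rate, as described above; the function name and the cap at 100% are illustrative choices, and the background rates are the round figures quoted in the text rather than precise epidemiological estimates.

```python
def absolute_risk(background_rate, relative_risk):
    """Convert a relative risk into an absolute risk.

    Assumption: relative risk is a simple multiplier on the background
    rate of the disease in the person's population. The result is capped
    at 1.0 because a probability cannot exceed 100%.
    """
    return min(1.0, background_rate * relative_risk)


if __name__ == "__main__":
    # Round background lifetime risks as quoted in the text.
    examples = {"multiple sclerosis": 0.001, "diabetes": 0.20}
    for disease, background in examples.items():
        risk = absolute_risk(background, 2.0)
        print(f"{disease}: {background:.1%} -> {risk:.1%} "
              f"(absolute increase of {risk - background:.1%})")
```

The same two-fold relative risk adds a tenth of a percentage point in one case and twenty percentage points in the other, which is why a relative risk on its own is hard to interpret without the background rate for a well-matched population.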
Nevertheless, there is some hope for predicting our risk of getting sick from our genes. Genes have already been discovered for some traits, such as severe adverse reactions to certain drugs, which are essentially monogenic. These are already used clinically. There are also many types of genetic risk factor which are hidden from GWAS technologies (such as low frequency variants of intermediate effect size), and the rapid decrease in the cost of sequencing a person’s whole genome is likely to unleash a new wave of discoveries in coming years. These advances could be combined with prediction models which incorporate non-genetic information, or used in conjunction with specific symptoms to aid diagnosis. The clock is ticking, but time hasn’t quite run out for Collins’ prediction about prediction.