Society and the personal genome

Victory! Those of us involved in genomics research spend a lot of time thinking about how scientific and technological developments might influence personal genomics. For instance, does the falling cost of sequencing mean that medically useful personal genomics will likely be based on sequence rather than genotype data? (Yes.)

At the Sanger Institute we’ve recently launched (along with our friends at EBI) a project to look more deeply at a question which is less often on the lips of genomics boffins: “How does genomics affect us as people, both individually and in communities?” Because of the obvious resonance with Genomes Unzipped it should come as no surprise that many of us (including myself, Daniel and Luke) have been intimately involved in this initiative.

The actual line-up of events has been diverse, and a lot of fun. We’ve had two excellent debates, including one between Ewan Birney and Paul Flicek (pictured) on the value, or lack thereof, of celebrity genomes (covered in more detail here). A poet, Fiona Sampson, spent some time on campus and we’ve commissioned a book of poetry from her. This one raised some eyebrows, but I have to say that talking to her has given me some brand new ways of thinking about my own work. We’re also working on a more interactive project in the hope of making personal genomics a bit more personal. Stay tuned.

Size matters, and other lessons from medical genetics

Size really matters: prior to the era of large genome-wide association studies, the large effect sizes reported in small initial genetic studies often dwindled towards zero (that is, an odds ratio of one) as more samples were studied. Adapted from Ioannidis et al., Nat Genet 29:306-309.

[Last week, Ed Yong at Not Exactly Rocket Science covered a paper positing an association between a genetic variant and an aspect of social behavior called prosociality. On Twitter, Daniel and Joe dismissed this study out of hand due to its small sample size (n = 23), leading Ed to update his post. Daniel and Joe were then contacted by Alex Kogan, the first author of the study in question. He kindly shared his data with us, and agreed to an exchange here on Genomes Unzipped. In this post, we expand on our point about the importance of sample size; Alex’s reply is here.

Edit 01/12/11 (DM): The original version of this post included language that could have been interpreted as an overly broad attack on more serious, well-powered studies in psychiatric disease genetics. I've edited the post to reduce the possibility of collateral damage. To be clear: we're against over-interpretation of results from small studies, not behavioral genetics as a whole, and I apologise for any unintended conflation of the two.]

In October of 1992, genetics researchers published a potentially groundbreaking finding in Nature: a genetic variant in the angiotensin-converting enzyme ACE appeared to modify an individual’s risk of having a heart attack. This finding was notable at the time for the size of the study, which involved a total of over 500 individuals from four cohorts, and the effect size of the identified variant–in a population initially identified as low-risk for heart attack, the variant had an odds ratio of over 3 (with a corresponding p-value less than 0.0001).

Readers familiar with the history of medical association studies will be unsurprised by what happened over the next few years: initial excitement (this same polymorphism was associated with diabetes! And longevity!) was followed by inconclusive replication studies and, ultimately, disappointment. In 2000, 8 years after the initial report, a large study involving over 5,000 cases and controls found absolutely no detectable effect of the ACE polymorphism on heart attack risk. In the meantime, the same polymorphism had turned up in dozens of other association studies for a wide range of traits ranging from obstetric cholestasis to meningococcal disease in children, virtually none of which have ever been convincingly replicated.
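To make the sample-size point concrete, here is a minimal sketch of how the uncertainty around an odds ratio shrinks as a study grows. The counts are entirely hypothetical (they are not taken from the ACE studies or any other paper mentioned here), and the interval uses the standard log-scale (Woolf) approximation, which is only one of several ways to compute it.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio and approximate 95% interval (Woolf method) for a 2x2 table."""
    odds_ratio = (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                          + 1 / ctrl_carriers + 1 / ctrl_noncarriers)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: the same carrier frequencies at two study sizes.
small_study = odds_ratio_ci(12, 8, 8, 12)          # 40 people in total
large_study = odds_ratio_ci(1200, 800, 800, 1200)  # 4,000 people in total

for label, (or_, lo, hi) in [("small", small_study), ("large", large_study)]:
    print(f"{label} study: OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Both tables give the same point estimate, but in the small study the interval comfortably includes an odds ratio of one (no effect), while in the large study it does not. A striking estimate from a few dozen samples is therefore compatible with there being no effect at all.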

Genetic risk prediction in complex disease

I thought I’d point out a review article by Luke and myself on genetic risk prediction in complex disease, which has just come out in (open access) preprint form in Human Molecular Genetics. In it we discuss some of the strengths and weaknesses of genetic risk prediction compared to classical epidemiological predictors, different statistical modelling considerations, and the effect of GWAS on prediction. Readers of this space might find the conclusion of some interest, where we consider some of the societal aspects of trying to bring the interpretation of genomes into mainstream medical practice.

Analysing your own genome, bloggers respond to the FDA and more reporting on bogus GWAS results

Razib Khan, better known for his detailed low-downs of population biology and history, has written an important post on Gene Expression, explaining in careful detail exactly how to run some simple population genetic analysis on public genomes, as well as on your own personal genomics data. The outcome of the tutorial is an ADMIXTURE plot (like the one to the left), showing what proportion of your genome comes from different ancestral populations. This sort of analysis is not difficult, but it can often be hard to know how to start, so Razib’s post gives a good landing point for people who want to dig deeper into their own genomes.

This tutorial also ties in to some political ideas that Razib has been talking about since the recent call to allow access to genomic information only via prescription. If you are worried about losing access to your genome, one option is to ensure that you do not require companies to generate and interpret your genome. As sequencing, genotyping and computing prices fall, DIY genetics becomes more and more plausible. Learn to discover things about your own genome, and no-one will be able to take that away from you. [LJ]

Predicting lupus outcomes, US biomedical funding battles and telling children about genetic disease

There is a pair of papers in PLoS Genetics that shine some light on the effect of common GWAS variants on complex traits. The first investigates the effect of 22 common variants on sub-phenotypes of systemic lupus erythematosus, asking how well a model including clinical measures plus GWAS variants can predict specific complications of lupus, compared with a model including clinical measures alone. In some cases, there is little improvement (GWAS variants add nothing to prediction of renal failure, for instance), but in many there is a measurable improvement (such as for haematological disorder and oral ulcers, the former of which is largely unpredictable from clinical measures). The second paper is a breakdown of the effect of the common obesity-associated variant FTO on BMI across age ranges; we see an interesting effect, whereby the variant that causes an increase in BMI in older children actually causes a fall in BMI in children under the age of 2. [LJ]

It’s budget battle season in the United States, and biomedical research funding looks likely to be caught in the crossfire. President Obama has proposed a $745 x 10^6 increase in the NIH budget, bringing it to $31.8 x 10^9 in total. This wouldn’t quite keep up with inflation, leading to a slight decrease in grant success rates from America’s largest biomedical research funder. The Republican-controlled House of Representatives, by contrast, has slashed the NIH budget by $1.6 x 10^9 in their proposed budget (bill HR1), which would be a heavy blow to research funding. Of course, scientists, non-crazy editorial writers and activist groups have been rallying around protecting research funding (in the NIH and beyond). Unfortunately I wouldn’t expect a speedy resolution, as veteran US politics blogger Nate Silver likens the whole situation to Zugzwang. [JCB]

HiSeq doubles its output, a next-gen sequencing primer, and return of genetic data to patients

Illumina CEO Jay Flatley announced that an upgrade to their HiSeq 2000 platform expected this spring will allow users to generate 600 gigabases of sequence (the equivalent of 5 high quality human genomes) per one-week run of the machine. This would essentially double the current throughput of the platform and propel Illumina even further ahead in the arms race of delivering vast quantities of low cost sequence data. [JCB]
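For readers wondering how 600 gigabases translates into "5 high quality human genomes", the arithmetic can be sketched as below. The genome size of roughly 3.1 gigabases is our assumption for illustration, not a figure from the announcement, and real usable coverage would be lower once duplicate and unmappable reads are discarded.

```python
# Assumed haploid human genome size in bases (an illustrative round figure).
GENOME_SIZE_BASES = 3.1e9


def mean_coverage(total_bases, n_genomes, genome_size=GENOME_SIZE_BASES):
    """Average depth per genome if the run output is split evenly across genomes."""
    return total_bases / (n_genomes * genome_size)


run_output_bases = 600e9  # 600 gigabases per run, as announced
coverage = mean_coverage(run_output_bases, n_genomes=5)
print(f"Mean raw coverage per genome: about {coverage:.0f}x")
```

Under that assumption each of the five genomes would get a little under 40x raw coverage, which is in the range usually described as high quality.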

Over at Golden Helix, Gabe Rudy has just completed a three-part series introducing readers to the promise and challenges of new DNA sequencing technologies, which is well worth a read for those just starting out in the analysis of next-gen sequence data or who have a more-than-casual interest in the current state of the field. [DM]

This month’s edition of Trends in Genetics includes a review article by Bredenoord and colleagues on the ethical issues raised by the feedback of individual genetic data to research participants. This has long been a subject of debate, but the recent increase in studies that assay a large number of genetic variants (such as genome-wide association studies and whole-genome sequencing studies) has brought this issue to the fore. There is currently no consensus on how to deal with this, and in my experience the approach favoured has varied both between projects and between the ethics committees that have assessed them.

Are synthetic associations a man-made phenomenon?

Early last year David Goldstein and colleagues published a provocative paper claiming that many GWAS associations are driven not by common variants of modest effect (the canonical common disease – common variant hypothesis underpinning GWAS) but instead by a local cluster of lower frequency variants that have much bigger effects on disease risk. They dubbed this hypothesized phenomenon “synthetic association” and the term quickly became a genetics buzzword. The paper was widely discussed in both the specialist and mainstream media, and caused quite a stir among academic statistical geneticists.

That debate has been re-opened today by a set of Perspectives in PLoS Biology: a rebuttal by us (Carl & Jeff) and our colleagues at Sanger, a rebuttal by Naomi Wray, Shaun Purcell and Peter Visscher, a rebuttal to the rebuttals by David Goldstein and an editorial by Robert Shields to tie it all together.

Our favourite papers of 2010

To celebrate the end of the blogging year here at Genomes Unzipped, we wanted to spend a bit of time reminiscing about the papers we enjoyed the most in 2010. Feel free to add your own suggestions in the comments!

Joe: Mice, men, and PRDM9. A key goal in evolutionary biology is to identify the mechanisms leading to speciation. One way to get at that goal is to identify genes that cause sterility or reduced fitness in hybrids between species or diverged populations. In mammals, exactly one such gene has been identified to date: the DNA-binding protein PRDM9. This year, three groups working on a seemingly different problem–deciphering the molecular mechanisms by which recombination shuffles genetic variation between generations–stumbled across an important gene in this process: PRDM9. Variation in this gene influences recombination patterns in both mice and humans, and is responsible for the dramatic differences in recombination patterns between humans and chimpanzees. Is it a simple coincidence that a gene which influences recombination also appears to have a role in speciation? Time will tell.

Parvanov et al. (2010) Prdm9 Controls Activation of Mammalian Recombination Hotspots. Science. DOI: 10.1126/science.1181495.

Baudat et al. (2010). PRDM9 Is a Major Determinant of Meiotic Recombination Hotspots in Humans and Mice. Science. DOI: 10.1126/science.1183439.

Myers et al. (2010). Drive Against Hotspot Motifs in Primates Implicates the PRDM9 Gene in Meiotic Recombination. Science. DOI: 10.1126/science.1182363.

Daniel: Whole-genome sequencing to develop personalised cancer assays. The area of medicine where the transforming power of new DNA sequencing technologies is moving fastest is cancer diagnostics and therapy. There were many studies relevant to this field in 2010 (with a fair proportion featuring on the excellent MassGenomics blog), but this paper was a simple, elegant example: the authors performed low-coverage whole-genome sequencing of four tumour samples, identified large genomic rearrangements present in the tumour cells but not in the patient’s healthy tissue, and then designed personalised, quantitative assays measuring the proportion of cells carrying these rearrangements in the patients’ blood. These assays allowed them to track, almost in real time, how the patients’ cancers responded to various therapies.

Leary et al. (2010) Development of personalized tumor biomarkers using massively parallel sequencing. Science Translational Medicine. DOI: 10.1126/scitranslmed.3000702.

Friday links

Welcome to the inaugural Friday links post. We’ll be using these posts to share interesting articles stumbled across by Unzipped members during the week.

We’re still tweaking the format, but the basic idea will be a brief paragraph of commentary followed by the initials of the person who wrote it.

Dan Koboldt reviews a recent paper reporting the use of whole-genome sequencing to find the mutation responsible for a severe genetic disease. Interestingly, in this case the disease was undiagnosed, and the causal variant was used to produce a diagnosis of sitosterolemia; more interestingly, this diagnosis had already been ruled out by another test, which was shown to be a false negative. [DM]

ScienceNews reports that researchers from the University of Copenhagen have got permission to sequence the genome of Sitting Bull, the Native American war chief who led the battle of Little Bighorn. I don’t know exactly what they intend to learn from the genome scientifically, but it seems like this might serve primarily as a monument to a major figure in Native American resistance. So the question I have is this: how can we go from a genome sequence (which is generally just a text file on a computer) to a public remembrance, something akin to the 1989 postage stamp shown to the left? [LJ]

Two papers in the current issue of Nature Genetics highlight recent inroads made in understanding the genetics of infectious disease susceptibility. The first found an association between risk of meningococcal disease and CFH, a gene previously implicated in age-related macular degeneration. The second identified a susceptibility locus for tuberculosis in African samples. Paul de Bakker and Amalio Telenti have a nice News and Views piece about them as well, remarking on this welcome advance not only in understanding infection, but also in using GWAS to gain insight about disease risk in non-Europeans. [JCB]

Update: Dan Frost from the GoldenHelix blog has drawn our attention to a thought-provoking post on the future of GWAS studies. The post suggests that much of the missing heritability in complex disease is hiding in the set of variants that are badly tagged by existing chips, and proposes that GWAS studies in the future may include a sequencing phase to discover new variants in cases, followed by genotyping using custom genotype chips to capture this variation. The question, from my point of view, is how many common SNPs are there that aren’t well tagged by existing chips, and thus how much heritability could be hidden there? This is exactly the sort of question that the 1000 Genomes dataset was designed to answer. [LJ]

Setting the record straight

The current issue of Cell has some important correspondence in response to an essay published by Jon McClellan and Mary-Claire King in April. Daniel covered the original piece and hosted a guest post from Kai Wang which detailed some of the more obvious flaws in their argument. Now, Wang and his colleagues from Philadelphia have published an official response in Cell, in parallel with a similar letter from Robert Klein and colleagues from New York. Accompanying these is a further reply from McClellan and King. Read on for an overview of three contentious statements made in the original piece, and the rebuttals to each.

