Archive for the 'Journal Club' Category

Guidelines for finding genetic variants underlying human disease

Authors: Daniel MacArthur and Chris Gunter.

New DNA sequencing technologies are rapidly transforming the diagnosis of rare genetic diseases, but they also carry a risk: by allowing us to see all of the hundreds of “interesting-looking” variants in a patient’s genome, they make it potentially easy for researchers to spin a causal narrative around genetic changes that have nothing to do with disease status. Such false positive reports can have serious consequences: incorrect diagnoses, unnecessary or ineffective treatment, and reproductive decisions (such as embryo termination) based on spurious test results. In order to minimize such outcomes the field needs to decide on clear statistical guidelines for deciding whether or not a variant is truly causally linked with disease.

In a paper in Nature this week we report the consensus statement from a workshop sponsored by the National Human Genome Research Institute, on establishing guidelines for assessing the evidence for variant causality. We argue for a careful two-stage approach to assessing evidence, taking into account the overall support for a causal role of the affected gene in the disease phenotype, followed by an assessment of the probability that the variant(s) carried by the patient do indeed play a causal role in that patient’s disease state. We argue for the primacy of statistical genetic evidence for new disease genes, which can be supplemented (but not replaced) by additional informatic and experimental support; and we emphasize the need for all forms of evidence to be placed within a statistical framework that considers the probability of any of the reported lines of evidence arising by chance.
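As a minimal illustration of asking how likely one line of evidence is to arise by chance, consider seeing the same gene hit by de novo mutations in several patients. The sketch below is not a test prescribed by the guidelines paper; it assumes a Poisson model whose expected count is λ = 2·N·μ, where N is the number of parent-child trios and μ is a hypothetical per-gene mutation rate per haploid genome per generation.

```python
import math

def expected_de_novo(n_trios: int, mu_gene: float) -> float:
    """Expected de novo count in one gene: two transmitted haploid genomes per trio."""
    return 2 * n_trios * mu_gene

def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam == 0:
        return 0.0
    # Terms computed in log space so that large lam or k cannot overflow.
    cdf = math.fsum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k)
    )
    return max(0.0, 1.0 - cdf)

# Hypothetical numbers: 175 trios and a per-gene rate of 1e-5 per haploid genome.
lam = expected_de_novo(175, 1e-5)
print(f"expected {lam:.4f}; P(>=2 de novo hits by chance) = {poisson_sf(2, lam):.2e}")
```

Because many genes are examined at once, a per-gene probability like this would still need to be corrected for multiple testing before it counts as strong evidence.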

The paper itself is open access, so you can read the whole thing – we won’t rehash a complete summary here. However, we did want to discuss the back story and expand on a few issues raised in the paper.

The ENCODE project: lessons for scientific publication

The ENCODE Project has this week released the results of its massive foray into exploring the function of the non-protein-coding regions of the human genome. This is a tremendous scientific achievement, and is receiving plenty of well-deserved press coverage; for particularly thorough summaries see Ed Yong’s excellent post at Discover and Brendan Maher at Nature.

I’m not going to spend time here recounting the project’s scientific merit – suffice it to say that the project’s analyses have already improved the way researchers are approaching the analysis of potential disease-causing genetic variants in non-coding regions, and will have an even greater impact over time. Instead, I want to highlight what a tremendous feat of scientific publication the project has achieved.

Another “IQ gene”: new methods, old flaws

A very large genome-wide association study (GWAS) of brain and intracranial size has just been published in Nature Genetics. The study looked at brain scans and genetic information from over 20,000 individuals, and discovered two new genetic variants that affect brain and head morphology: one that affects the volume of the skull, and one that affects the size of the hippocampus.

The main study is very well carried out, and the two associations look to me to be well established. However, a few things about the paper, combined with some biased reporting in the press, have been bothering me. Firstly, the main result reported in the news is that the study found an “IQ gene”, but this was only a very small follow-on analysis in the study, and the evidence underlying it is relatively weak (certainly not the “Best evidence yet that a single gene can affect IQ”, as reported by New Scientist). Secondly, the authors report their statistics in a misleading way that obscures the fact that one of their associations could easily be caused by an (already well known) association with general body size.
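A small simulation shows why an association with head size can simply reflect general body size. Every number here is hypothetical and none of it reproduces the study’s analysis: a variant is assumed to affect height only, and head size is assumed to track height. The variant then shows a clear marginal association with head size that largely disappears once height is adjusted for (here by residualizing both variables on height, which by the Frisch–Waugh–Lovell theorem gives the same coefficient as a two-predictor regression).

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (model includes an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(y, x):
    """Residuals of y after regressing it on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = ols_slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def simulate(n=20000, maf=0.3, seed=3):
    """Hypothetical cohort: genotype -> height -> head size, no direct genotype effect."""
    rng = random.Random(seed)
    genotype, height, head = [], [], []
    for _ in range(n):
        g = (rng.random() < maf) + (rng.random() < maf)  # allele count 0, 1 or 2
        h = 0.3 * g + rng.gauss(0, 1)                    # variant affects height only
        genotype.append(g)
        height.append(h)
        head.append(0.5 * h + rng.gauss(0, 1))           # head size depends on height
    return genotype, height, head

genotype, height, head = simulate()
marginal = ols_slope(genotype, head)
adjusted = ols_slope(residualize(genotype, height), residualize(head, height))
print(f"genotype effect on head size: marginal {marginal:.3f}, height-adjusted {adjusted:.3f}")
```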


Guest post: Accurate identification of RNA editing sites from high-throughput sequencing data

[By Gokul Ramaswami and Robert Piskol. Gokul Ramaswami is a graduate student and Robert Piskol is a postdoctoral fellow in the Department of Genetics at Stanford University. Both study RNA editing with Jin Billy Li.]

Thank you to Genomes Unzipped for giving us the opportunity to write about our paper published in Nature Methods [1]. Our goal was to develop a method to identify RNA editing sites using matched DNA and RNA sequencing of the same sample. Looking at the problem initially, it seems straightforward enough to generate a list of variants using the RNA sequencing data and then filter out any variants that also appear in the DNA sequencing. In reality, one must pay close attention to the technical details in order to discern true RNA editing sites from false positives. In this post, we will highlight a couple of key strategies we employed to accurately identify editing sites.
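The naive first pass described above amounts to a set difference between RNA and DNA variant calls. This is a minimal illustration, not the method from the paper: the site representation, the hypothetical `dna_depth` lookup and the `min_dna_depth` threshold are assumptions, added so that a site only counts as “absent from DNA” when the DNA was actually well covered there.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int, str]  # (chromosome, 1-based position, alternate base)

def naive_editing_candidates(rna_variants: Set[Site], dna_variants: Set[Site],
                             dna_depth: Dict[Tuple[str, int], int],
                             min_dna_depth: int = 10) -> Set[Site]:
    """RNA variants not seen in matched DNA, keeping only sites where DNA coverage
    reaches a (hypothetical) threshold."""
    candidates = set()
    for site in rna_variants:
        chrom, pos, _alt = site
        if site in dna_variants:
            continue  # present in DNA: a genomic variant, not RNA editing
        if dna_depth.get((chrom, pos), 0) < min_dna_depth:
            continue  # too little DNA data to rule out a genomic variant
        candidates.add(site)
    return candidates
```

As the post stresses, a filter like this still leaves many false positives; it is only the starting point that the authors’ additional strategies refine.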

Misapplied statistics in the OXTR/Prosociality story

Out in the PNAS Early Edition is a letter to the editor from four Genomes Unzipped authors (Luke, Joe, Daniel and Jeff). We report a statistical error that drove the seemingly highly significant association between polymorphisms in the OXTR gene and prosocial behaviour. The original study involved a sample of 23 people, each of whom had their prosociality rated 116 times (giving a total of 2668 observations), but the authors inadvertently used a method that implicitly assumed there were actually 2668 different individuals in the study.

The authors kindly provided us with the raw data, and we ran what are called “null simulations” on their dataset to check whether their method could generate false positives. This involved randomly swapping around the genotypes of the 23 individuals, and then analysing these randomised datasets using the same statistical method as the paper. These “null datasets” are random, and have no real association between prosociality and OXTR genotype, so if the authors’ method was working properly it would almost never find an association in them. The plot below shows the distribution of the “p-value” from the authors’ method in the null datasets; if everything was working properly, all of the bars would be the same size.
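The null-simulation logic can be sketched in a few lines. Everything here is an assumption for illustration: the ratings are simulated rather than the study’s data, the genotype counts are made up, and a simple correlation test (with a Fisher z approximation) stands in for the original model. Each simulated person has a stable prosociality level plus rating-to-rating noise; shuffling genotypes among the 23 people creates null datasets, and we count how often each analysis reports p < 0.05.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_p(r, n):
    # Fisher z-transform: approximately N(0, 1) under no correlation with n independent points.
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_ratings(n_people=23, n_ratings=116, seed=1):
    # Hypothetical data: a stable per-person level plus noise on each rating.
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_people):
        level = rng.gauss(0, 1)
        ratings.append([level + rng.gauss(0, 1) for _ in range(n_ratings)])
    return ratings

def naive_p(genotypes, ratings):
    # Flawed: treats all 23 * 116 = 2668 ratings as 2668 independent people.
    g = [gt for gt, person in zip(genotypes, ratings) for _ in person]
    y = [score for person in ratings for score in person]
    return correlation_p(pearson_r(g, y), len(y))

def per_person_p(genotypes, ratings):
    # One data point per person: genotype vs. that person's mean rating.
    means = [sum(person) / len(person) for person in ratings]
    return correlation_p(pearson_r(genotypes, means), len(means))

def null_false_positive_rates(n_sims=200, alpha=0.05, seed=2):
    ratings = simulate_ratings()
    genotypes = [0] * 8 + [1] * 10 + [2] * 5  # hypothetical genotype counts for 23 people
    rng = random.Random(seed)
    naive_hits = per_person_hits = 0
    for _ in range(n_sims):
        shuffled = genotypes[:]
        rng.shuffle(shuffled)  # swap genotypes around: no real association remains
        naive_hits += naive_p(shuffled, ratings) < alpha
        per_person_hits += per_person_p(shuffled, ratings) < alpha
    return naive_hits / n_sims, per_person_hits / n_sims

print(null_false_positive_rates())
```

Under this setup the per-person analysis flags roughly 5% of null datasets, as it should, while the analysis that counts every rating as a separate individual flags far more, because repeated ratings of the same person are strongly correlated.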


Guest post by Ben Neale: Evaluating the impact of de novo coding mutation in autism

[Dr. Neale is currently an Assistant in Genetics in the Analytic and Translational Genetics Unit at Massachusetts General Hospital and Harvard Medical School and an affiliate of the Broad Institute of Harvard and MIT. Dr. Neale's research centers on statistical genetics and how to apply those methods to complex traits, with a particular focus on childhood psychiatric illness such as autism and ADHD.]

Today, in Nature, three letters (1, 2, 3) were published on the role of de novo coding mutations in the development of autism. I am lead author on one of these manuscripts, working in collaboration with the ARRA Autism Consortium. In this post, I’ll describe the main findings of our work as they relate to autism and how we approached the interpretation of de novo mutations. In essence, de novo point mutation is likely relevant to autism in ~10% of cases, but a single de novo event is not likely to be sufficient to cause autism. Underscoring this, fewer than half of the cases had an obviously functional point mutation in the exome. However, three genes, SCN2A, KATNAL2 and CHD8, have emerged as likely candidates for contributing to autism pathogenesis.

De novo is Latin for “from the beginning,” and when describing genetic variation or mutation means that the variant has spontaneously arisen and was not inherited from either parent. In autism, de novo copy number variants are among the earliest clearly identified genetic risk factors (see Sanders et al. and Pinto et al. for reviews). Given that these events are novel, natural selection has not acted on them, except for instances where the point mutation is lethal in early life. With next generation sequencing (NGS), we now have the opportunity to identify these events directly.

In this study we explored the impact of de novo mutations on autism by performing targeted sequencing of the protein-coding regions of the genome (known collectively as the exome, and comprising just 1.5% of the genome as a whole) in 175 mother-father-child trios in which the child was diagnosed as autistic. Having sequence from all three members of each family allowed us to find mutations that had arisen spontaneously in a patient’s genome, rather than being inherited from their parents.
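Finding de novo candidates in a trio comes down to looking for variants present in the child but absent from both parents. A minimal sketch, assuming genotype calls have already been made and are stored as alternate-allele counts (0, 1 or 2) keyed by a hypothetical site string; a real pipeline would also need to filter on read depth and genotype quality.

```python
from typing import Dict, List

def de_novo_candidates(child: Dict[str, int], mother: Dict[str, int],
                       father: Dict[str, int]) -> List[str]:
    """Sites where the child carries an alternate allele and both parents were
    called homozygous reference. A missing parental call (None) is treated as
    unknown, not as reference, so such sites are excluded."""
    return sorted(
        site for site, gt in child.items()
        if gt > 0 and mother.get(site) == 0 and father.get(site) == 0
    )

child = {"chr1:1000:C>T": 1, "chr1:2000:G>A": 1, "chr2:500:A>G": 2}
mother = {"chr1:1000:C>T": 0, "chr1:2000:G>A": 1, "chr2:500:A>G": 0}
father = {"chr1:1000:C>T": 0, "chr1:2000:G>A": 0}  # no call at chr2:500
print(de_novo_candidates(child, mother, father))
```

Here the chr1:2000 variant is inherited from the mother, and chr2:500 is skipped because the father has no genotype call there.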

We have made a pre-formatted version of our manuscript available here. In this post I just wanted to highlight some of the key lessons emerging from our study.

Questioning the evidence for non-canonical RNA editing in humans

In May of last year, Li and colleagues reported that they had observed over 10,000 sequence mismatches between messenger RNA (mRNA) and DNA from the same individuals (RDD sites, for RNA-DNA differences) [1]. This week, Science has published three technical comments on this article (one that I wrote with Yoav Gilad and Jonathan Pritchard; one by Wei Lin, Robert Piskol, Meng How Tan, and Billy Li; and one by Claudia Kleinman and Jacek Majewski). We conclude that at least ~90% of the Li et al. RDD sites are technical artifacts [2,3,4]. A copy of the comment I was involved in is available here, and Li et al. have responded to these critiques [5].

In this post, I’m going to describe how we came to the conclusion that nearly all of the RDD sites are technical artifacts. For a full discussion, please read the comments themselves.


Position biases in alignments around RDD sites. For each RDD site with at least five reads mismatching the genome, we calculated the fraction of reads with the mismatch (or the match) at each position in the alignment of the RNA-seq read to the genome (on the + DNA strand). Plotted is the average of this fraction across all sites, separately for the alignments which match and mismatch the genome.
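The calculation in the caption can be sketched as follows. This is an illustration rather than the code used for the comment: it assumes each site is given as a hypothetical list of (position in alignment, mismatches genome?) pairs on the + strand, and a fixed read length. If mismatches reflected genuine RNA differences, the mismatch and match profiles should look broadly similar; a pile-up of mismatches at particular read positions instead points to a technical artifact.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical input: one list per site, one (position, is_mismatch) pair per read,
# where position is the 0-based offset of the site within the read's + strand alignment.
Read = Tuple[int, bool]

def positional_profile(sites: List[List[Read]], read_length: int,
                       min_mismatch_reads: int = 5, mismatch: bool = True) -> List[float]:
    """Average, across qualifying sites, of the fraction of mismatching (or matching)
    reads that place the site at each alignment position."""
    per_site = []
    for reads in sites:
        if sum(1 for _, is_mm in reads if is_mm) < min_mismatch_reads:
            continue  # only sites with at least five reads mismatching the genome
        positions = [pos for pos, is_mm in reads if is_mm == mismatch]
        if not positions:
            continue
        counts = Counter(positions)
        per_site.append([counts[p] / len(positions) for p in range(read_length)])
    if not per_site:
        return [0.0] * read_length
    return [sum(column) / len(per_site) for column in zip(*per_site)]
```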


All genomes are dysfunctional: broken genes in healthy individuals

Breakdown of the number of loss-of-function variants in a "typical" genome

I don’t normally blog here about my own research, but I’m making an exception for this paper. There are a few reasons to single this paper out: firstly, it’s in Science (!); and secondly, no fewer than five Genomes Unzipped members (me, Luke, Joe, Don and Jeff) are co-authors. For me it also represents the culmination of a fantastic postdoc position at the Wellcome Trust Sanger Institute (for those who haven’t heard on Twitter, I’ll be starting up a new research group at Massachusetts General Hospital in Boston next month).

Readers who don’t have a Science subscription can access a pre-formatted version of the manuscript here. In this post I wanted to give a brief overview of the study and then highlight what I see as some of the interesting messages that emerged from it.

First, some background

This is a project some three years in the making – the idea behind it was first conceived by my Sanger colleague Bryndis Yngvadottir and me back in 2009, and it subsequently expanded into a very productive collaboration with several groups, most notably Mark Gerstein’s group at Yale University, and the HAVANA gene annotation team at the Sanger Institute.

The idea is very simple. We’re interested in loss-of-function (LoF) variants – genetic changes that are predicted to be seriously disruptive to the function of protein-coding genes. These come in many forms, ranging from a single base change that creates a premature stop codon in the middle of a gene, all the way up to massive deletions that remove one or more genes completely. These types of DNA changes have long been of interest to geneticists, because they’re known to play a major role in really serious diseases like cystic fibrosis and muscular dystrophy.
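The simplest kind of LoF variant mentioned here, a single base change that creates a premature stop codon, can be checked with a few lines of code. This sketch assumes a coding sequence given from its ATG start codon through its natural stop codon, a 0-based position and a single-base substitution; real LoF annotation has to handle multiple transcripts, splice sites, indels and much more, and this is not the pipeline used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def creates_premature_stop(cds: str, pos: int, alt: str) -> bool:
    """True if substituting `alt` at 0-based `pos` turns a sense codon into a stop
    codon upstream of the natural stop (a nonsense variant)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    offset = pos % 3
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    is_before_natural_stop = codon_start < len(cds) - 3
    return (is_before_natural_stop
            and ref_codon not in STOP_CODONS
            and alt_codon in STOP_CODONS)

cds = "ATGCAGTGGTAA"  # toy gene: ATG CAG TGG TAA
print(creates_premature_stop(cds, 3, "T"))  # CAG -> TAG
```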

But there’s also another reason that they’re interesting, which is more surprising: every complete human genome sequenced to date, including celebrities like James Watson and Craig Venter, has appeared to carry hundreds of these LoF variants. If those variants were all real, that would indicate a surprising degree of redundancy in the human genome. But the problem is we don’t actually know how many of these variants are real – no-one has ever taken a really careful look at them on a genome-wide scale.

Phantom Heritability: What it does and doesn’t mean

Just out in prepublication at PNAS is a paper from Eric Lander’s lab, entitled, somewhat provocatively, “The mystery of missing heritability: Genetic interactions create phantom heritability”. The authors suggest that certain types of gene-gene interactions could be causing us to underestimate how much of the heritability of complex traits has been uncovered by our genetic studies to date.

There has been an awful lot of talk about this research since Eric Lander presented it at ASHG a few years ago, and the paper itself has generated quite a bit of discussion on- and off-line. Razib Khan reported on the paper last week, giving a good summary. He mentioned a press release about the paper issued by the advocacy organisation GeneWatch, which confuses the additive heritability discussed in this paper with the total heritability of diseases (a distinction explained below), and uses this to draw conclusions about how this result alters the promise of personal genomics. This just goes to show how much confusion there already is out there about this subject.

I have a more detailed post up on Genetic Inference about this paper, the strength of the argument, and what it means for the field. Here I am just going to pull out what I think are some important take-home points about this paper:

1) Broad sense heritabilities (the kind that are clinically important for e.g. risk prediction) have NOT been significantly overestimated. The type of heritability we ultimately care about, the broad or total heritability, is how much total phenotypic variation is captured by genetics, or equivalently the correlation between identical twins in uncorrelated environments. The figure at the top of this post shows a plot that I made using Zuk et al’s equations, comparing true broad sense heritabilities against what would be estimated based on twin studies (I have matched the colouring etc to Figure 1 of the paper). The twin study estimator of heritability is a robust estimator of total heritability for heritabilities less than 0.5. Above that, LP epistasis causes growing overestimation: it can make a 50% heritable trait look 65% heritable, and a 70% heritable trait look 95% heritable. It does not make weakly heritable traits look strongly heritable; it just makes strongly heritable traits look very strongly heritable.

2) This paper is discussing additive heritability. This is a specific form of heritability that acts “simply” – half of it is passed on to offspring, siblings share an amount proportional to how related they are, and the genes that underlie it do not interact with each other. We do not know how much heritability acts like this, but various lines of evidence have made us think that it is a relatively good model, and most competing models have been incompatible with this evidence, or look contrived. What Zuk et al have done is produce a set of plausible, simple and non-contrived models (Limiting Pathway or LP models) that look pretty much indistinguishable from additivity using many of the tests we have run, but can act very differently in twin studies. Under these models, twin studies will overestimate the additive heritability (i.e. make us think that a larger proportion of heritability acts “simply”). The equivalent of the plot at the top of the page for additive heritability, which you can see here, shows massive overestimation of additive heritability across the spectrum.

3) There is no real evidence that these LP models apply (and in fact there are still a few reasons to believe additivity could broadly apply; see my other post for details). The issue is that we cannot conclusively rule these models (or models like these) out, and therefore the heritability explained by the genetic variants we have found so far is very uncertain.

4) This is important because our measures of “heritability explained” by the genetic variants we have found look at how much additive heritability is explained. These measures have in general told us that we have only explained a small proportion (generally < 25%) of additive heritability – but if in fact the heritability is largely not additive, and we are treating it as if it were, we could in fact have explained a higher proportion of heritability than we believe. This would mean that the “missing heritability” is missing not because we have not found the right genetic risk factors, but because we have not found the right model to use. This could be good news: the genetic variants we have discovered could in fact be used to predict disease a lot better than we can at the moment, if only we can find the right model to use them with.
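To make point 1 concrete, here is the arithmetic under one assumption the post does not state: that the twin-study estimator is Falconer’s classical formula, which compares identical (MZ) and non-identical (DZ) twin correlations. As in point 1, the MZ correlation equals the broad heritability H² when twin environments are uncorrelated.

```latex
% Falconer's twin estimator (assumed here; the post does not name the estimator)
\hat{h}^2_{\text{twin}} = 2\,(r_{MZ} - r_{DZ})

% Purely additive genetics, uncorrelated environments:
r_{MZ} = h^2, \quad r_{DZ} = \tfrac{1}{2} h^2
\;\Rightarrow\; \hat{h}^2_{\text{twin}} = 2\left(h^2 - \tfrac{1}{2} h^2\right) = h^2

% Gene-gene interaction pulls DZ similarity below half the MZ similarity:
r_{DZ} < \tfrac{1}{2} r_{MZ}
\;\Rightarrow\; \hat{h}^2_{\text{twin}} > r_{MZ} = H^2

% Example: H^2 = r_{MZ} = 0.50 with an estimate of 0.65 implies
r_{DZ} = 0.50 - \tfrac{0.65}{2} = 0.175
```

Under this assumption, the 50% → 65% example in point 1 corresponds to a DZ twin correlation of 0.175 instead of the 0.25 expected under additivity.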

Guest post from Alex Kogan: Size and populations matter–let’s understand why

[This is a guest post by Alex Kogan. Last week, Ed Yong at Not Exactly Rocket Science covered a paper positing an association between a genetic variant and an aspect of social behavior called prosociality. On Twitter, Daniel and Joe dismissed this study out of hand due to its small sample size (n = 23), leading Ed to update his post. Daniel and Joe were then contacted by Alex Kogan, the first author of the study in question. He kindly shared his data with us, and agreed to an exchange here on Genomes Unzipped. Our comments on the study are here; this is Alex’s reply.]

It’s a truism that resonates across science: Size matters when doing and interpreting the statistical (and practical) meaning of a study. But the size of what? Well, it’s quite a few things—all of which are very important in understanding what a study is ultimately telling us. One of the first numbers researchers focus on is the p-value. The p-value relies on a bit of counterintuitive logic: It represents the percentage of times you would get an effect as big as you got (or bigger) if there is really no effect in the general population. So we first assume that there is really no difference in some outcome between two groups across the general population (we call this the null hypothesis), and then we ask what are the chances of us finding the difference that we found (or bigger) given this assumption. If this percentage is low (many fields adopt a p = .05 standard, or a 5% chance that we’d get the effect we got or bigger if there is really no effect in the general population), then we can reject the initial idea that there is no difference in the general population. So what have we learned if the p-value is .05 or lower? That there is likely a difference in the general population—how big this difference is remains a mystery, however; the p-value never answers that question.
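The logic described above can be made concrete with a permutation test, one of several ways to estimate a p-value: shuffle the group labels so that the null hypothesis of no difference holds by construction, then count how often the shuffled difference is at least as large as the observed one. This is an illustrative sketch with made-up numbers, not the analysis used in the study.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling: no true group difference exists
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # Count the observed labelling as one of the permutations, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)

print(permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))
```

As the paragraph notes, the result says nothing about how large the difference is: a tiny difference measured in a very large sample can produce an equally small p-value.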
