[Dr. Neale is currently an Assistant in Genetics in the Analytic and Translational Genetics Unit at Massachusetts General Hospital and Harvard Medical School and an affiliate of the Broad Institute of Harvard and MIT. Dr. Neale's research centers on statistical genetics and how to apply those methods to complex traits, with a particular focus on childhood psychiatric illness such as autism and ADHD.]
Today, in Nature, three letters (1, 2, 3) were published on the role of de novo coding mutations in the development of autism. I am lead author on one of these manuscripts, working in collaboration with the ARRA Autism Consortium. In this post, I’ll describe the main findings of our work as they relate to autism and how we approached the interpretation of de novo mutations. In essence, de novo point mutation is likely relevant to autism in ~10% of cases, but a single de novo event is not likely to be sufficient to cause autism. Underscoring this, fewer than half of the cases had an obviously functional point mutation in the exome. However, three genes, SCN2A, KATNAL2 and CHD8, have emerged as likely candidates for contributing to autism pathogenesis.
De novo is Latin for “from the beginning,” and when used to describe genetic variation or mutation it means that the variant has arisen spontaneously and was not inherited from either parent. In autism, de novo copy number variants were among the earliest clearly identified genetic risk factors (see Sanders et al. and Pinto et al. for reviews). Because these events are novel, natural selection has not yet acted on them, except in instances where the mutation is lethal in early life. With next generation sequencing (NGS), we now have the opportunity to identify these events directly.
In this study we explored the impact of de novo mutations on autism by performing targeted sequencing of the protein-coding regions of the genome (known collectively as the exome, and comprising just 1.5% of the genome as a whole) in 175 mother-father-child trios in which the child was diagnosed as autistic. Having sequence from all three members of each family allowed us to find mutations that had arisen spontaneously in a patient’s genome, rather than being inherited from their parents.
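To make the trio logic concrete, here is a minimal sketch of the core comparison: flag any allele seen in the child that is absent from both parents. This is an illustration only, not the pipeline used in the paper. The function names, the genotype layout and the toy sites are all hypothetical, and a real analysis would add read-depth and quality filters and independent validation before calling anything de novo.

```python
def novel_alleles(child, mother, father):
    """Return alleles present in the child but in neither parent."""
    parental = set(mother) | set(father)
    return sorted(set(child) - parental)


def candidate_de_novo_sites(trio_genotypes):
    """Map each site to the child's alleles that neither parent carries.

    trio_genotypes: {site: {"child": (a1, a2), "mother": (...), "father": (...)}}
    Sites with no novel allele are omitted from the result.
    """
    hits = {}
    for site, gts in trio_genotypes.items():
        novel = novel_alleles(gts["child"], gts["mother"], gts["father"])
        if novel:
            hits[site] = novel
    return hits


# Toy genotypes at made-up positions (hypothetical data, for illustration only).
toy_trio = {
    ("chr2", 1000): {"child": ("A", "G"), "mother": ("A", "A"), "father": ("A", "A")},
    ("chr8", 2000): {"child": ("C", "T"), "mother": ("C", "T"), "father": ("C", "C")},
}

candidates = candidate_de_novo_sites(toy_trio)
print(candidates)
```

In the toy data only the first site is flagged: the child's G is seen in neither parent, whereas the T at the second site is inherited from the mother.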
We have made a pre-formatted version of our manuscript available here. In this post I just wanted to highlight some of the key lessons emerging from our study.
We must carefully calibrate our prior expectation to evaluate de novo mutation
To evaluate the observed de novo events, we calculated the expected number of events in the exome taking into account the sequence context. Basically, different sequences of bases have different levels of mutability. A key driver of this variation in mutation rate is the amount of GC content (the proportion of DNA that is C-G rather than A-T base pairs). The GC content of the exome is approximately 50% compared to the 40% genome-wide average. As a consequence, protein-coding sequences are inherently more mutable. Taking this into account, the expected number of de novo events per person in the exome is a shade over 1. However, current exome sequencing technologies do not capture all regions equally well (and some regions aren’t captured at all), which revises down the expectation to 0.87 per person.
It’s worth emphasizing these numbers: they mean that the majority of people who have their exome sequenced will be found to carry at least one de novo mutation in a protein-coding gene, even if they are perfectly healthy. Human geneticists must therefore be extremely cautious in assigning disease-causing status to such mutations.
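The "majority" claim can be checked with a quick back-of-the-envelope calculation. The sketch below assumes that the number of de novo events per exome follows a Poisson distribution with mean 0.87; the Poisson model is my simplifying assumption for illustration, and the function name is made up.

```python
import math


def prob_at_least_one(mean_events):
    """P(X >= 1) for X ~ Poisson(mean_events), i.e. 1 - P(X = 0)."""
    return 1.0 - math.exp(-mean_events)


# Expected de novo events per captured exome, from the text above.
expected_per_exome = 0.87
p_any = prob_at_least_one(expected_per_exome)
print(f"P(at least one de novo coding event) = {p_any:.2f}")  # prints 0.58
```

Under that assumption, a little under 60% of sequenced exomes would carry at least one de novo coding event purely from the background mutation rate.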
We observe only a modest increase in rate, suggesting a limited role of de novo coding mutation
Overall we observed an average of 0.92 events per trio, slightly higher than expected, but not significantly so. Furthermore, a majority of cases did not have an obviously deleterious point mutation in the exome. However, we did observe more nonsense mutations than expected, suggesting that some of the nonsense events are relevant. We also observed a significant excess of protein-protein interactions among the genes that harbor de novo missense, splice site or nonsense mutations.
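To see why 0.92 versus 0.87 events per trio is not a striking difference, here is a rough one-sided Poisson comparison. It is a sketch under stated assumptions rather than the test used in the paper: I treat the total event count across 175 trios as Poisson, and I reconstruct the observed count by rounding 0.92 × 175, which may not match the exact count in the manuscript.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), by summing the lower tail."""
    lower = sum(
        # Poisson pmf computed in log space to avoid overflow in k!
        math.exp(k * math.log(expected) - expected - math.lgamma(k + 1))
        for k in range(observed)
    )
    return max(0.0, 1.0 - lower)


n_trios = 175
observed_events = round(0.92 * n_trios)  # 161, reconstructed from the per-trio rate
expected_events = 0.87 * n_trios         # 152.25

p_value = poisson_upper_tail(observed_events, expected_events)
print(f"observed={observed_events}, expected={expected_events:.2f}, p={p_value:.3f}")
```

Under these assumptions the one-sided p-value sits comfortably above the conventional 0.05 threshold, consistent with the statement that the overall excess is not significant.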
Few genes are hit multiple times, highlighting the complex genetic basis of psychiatric illness
When we combine the events identified in our paper with those from the two companion papers, we identify 18 genes that have de novo functional mutations in two separate individuals, where we would expect ~12 by chance. These results reinforce the idea that many different genes are involved in the causation of autism, as has long been hypothesized for this disease and for other psychiatric traits such as schizophrenia.
In each of three genes we observe two loss-of-function (LoF) de novo events (for a nice overview of LoF mutations, Daniel’s previous post is a great resource). These three genes are SCN2A, CHD8 and KATNAL2. While our paper was in review, we also performed additional trio sequencing, identifying a third de novo LoF allele in SCN2A. To put this in perspective, across approximately 600 trios, we observe only one gene hit independently by three likely functional mutations. In other words, de novo mutations contributing to autism risk are not concentrated in just a few critical genes; they are spread across many genes, each contributing just a small proportion of the overall genetic risk of this disease.
We explored these three genes in an expanded exome sequencing dataset of 935 cases and 870 controls and in the Exome Variant Server (EVS). The EVS contains approximately 3,500 European Americans and 1,850 African Americans. For SCN2A no additional LoF alleles were observed in cases, controls, or the EVS. So across approximately 1,500 autism patients, we observe 3 cases with LoF mutations in SCN2A, which works out to be 0.2% of cases. For CHD8 we observe an additional 3 LoF alleles in the case cohort, but none in any control sample, bringing the total to 5 LoF alleles in 1,500 cases (0.33%). For KATNAL2 we observe 3 additional LoF alleles in cases, but also observe 3 LoF alleles in the control and EVS data, which works out to be an odds ratio of approximately 5, with again 0.33% of cases having such an allele. All three of these genes are now strong candidates for playing a role in autism, and their identification is certainly progress in understanding the biological basis of autism. The interpretation of these results was strongly informed by the inclusion of additional exome sequencing data, suggesting that further trio and case-control sequencing will inform gene identification efforts for autism.
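The carrier percentages quoted above are simple proportions, and a few lines of code reproduce them. The counts are taken from the text (for KATNAL2, the two de novo events plus three additional case alleles); the approximate case total of 1,500 is the rounded figure used in this post, and the function and variable names are made up for illustration.

```python
def carrier_percent(n_lof_alleles, n_cases):
    """Percentage of cases carrying a LoF allele, assuming one allele per case."""
    return 100.0 * n_lof_alleles / n_cases


n_cases = 1500  # approximate number of autism cases, as rounded in the text
lof_alleles_in_cases = {"SCN2A": 3, "CHD8": 5, "KATNAL2": 5}

percents = {
    gene: round(carrier_percent(count, n_cases), 2)
    for gene, count in lof_alleles_in_cases.items()
}
for gene, pct in percents.items():
    print(f"{gene}: {pct}% of cases")
```

Even for the strongest candidates, then, LoF alleles account for well under 1% of cases each.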
The identification of gene candidates for autism is still a clear priority for gaining insight into the biological basis of the disease. The genes highlighted by this work are just the first few pieces of the complex puzzle. Further efforts to fully integrate all of the sequencing data of cases, controls and trios are currently being facilitated by the Autism Sequencing Consortium (ASC), a collaboration organized by the NIMH. Clearly, more sequencing data must be generated to identify additional genetic effects that predispose to this disease.
(1) Neale et al. (2012) Patterns and rates of exonic de novo mutations in autism spectrum disorders. doi:10.1038/nature11011.
(2) Sanders et al. (2012) De novo mutations revealed by whole-exome sequencing are strongly associated with autism. doi:10.1038/nature10945.
(3) O’Roak et al. (2012) Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. doi:10.1038/nature10989.