A personal view of genetic diagnosis using modern DNA sequencing technologies

Medical and personalized Genetics

One of my research areas is the diagnosis of rare genetic conditions, and in that context I collaborate with David Kelsell at Queen Mary University of London. One of the interesting cases we have analysed recently is an extremely rare condition in which the patients suffer severe chronic skin and bowel inflammation. I was in charge of the analysis of the sequence data, so for readers interested in the technical aspects here is a short overview: this was a sequencing project of pooled DNA (from three samples, only one of them relevant to this study), using a capture array to enrich for regions of the genome showing evidence of being linked to the disease. I first came across a 4 bp deletion in the ADAM17 gene, which David Kelsell and his team rapidly identified as a likely cause of the disorder. Further functional work confirmed that this variant is almost certainly the disease-causing mutation.

This study was published last year, and an interesting aspect is that the individual in whom this mutation was first seen is now 19 years old, doing well, and entering his second year at the University of Cambridge. Daniel MacArthur and I met with him to discuss his thoughts and feelings about the whole experience.
Continue reading ‘A personal view of genetic diagnosis using modern DNA sequencing technologies’

Phantom Heritability: What it does and doesn’t mean

Just out in prepublication at PNAS is a paper from Eric Lander’s lab, entitled, somewhat provocatively, “The mystery of missing heritability: Genetic interactions create phantom heritability”. The authors suggest that certain types of gene-gene interactions could be causing us to underestimate how much of the heritability of complex traits has been uncovered by our genetic studies to date.

There has been an awful lot of talk about this research since Eric Lander talked about it at ASHG a few years ago, and the paper itself has generated quite a bit of discussion on- and off-line. Razib Khan reported on the paper last week, giving a good summary. He mentioned a press release about the paper issued by the advocacy organisation GeneWatch, which confuses the additive heritability discussed in this paper with the total heritability of diseases (a distinction explained below), and uses this to draw conclusions about how this result alters the promise of personal genomics. This just goes to show how much confusion there already is out there about this subject.

I have a more detailed post up on Genetic Inference about this paper, the strength of the argument, and what it means for the field. Here I am just going to pull out what I think are some important take-home points about this paper:

1) Broad sense heritabilities (the kind that are clinically important for e.g. risk prediction) have NOT been significantly overestimated. The type of heritability we ultimately care about, the broad or total heritability, is how much total phenotypic variation is captured by genetics, or equivalently the correlation between identical twins in uncorrelated environments. The figure at the top of this post shows a plot that I made using Zuk et al’s equations, comparing true broad sense heritabilities against what would be estimated based on twin studies (I have matched the colouring etc. to Figure 1 of the paper). The twin study estimator of heritability is a robust estimator of total heritability for heritabilities less than 0.5. Above that, LP epistasis causes growing overestimation: it can make a 50% heritable trait look 65% heritable, and a 70% heritable trait look 95% heritable. It does not make weakly heritable traits look strongly heritable; it just makes strongly heritable traits look very strongly heritable.

2) This paper is discussing additive heritability. This is a specific form of heritability that acts “simply”: half of it is passed on to offspring, siblings share an amount proportional to how related they are, and the genes that underlie it do not interact with each other. We do not know how much heritability acts like this, but various lines of evidence have made us think that it is a relatively good model, and most competing models have been incompatible with this evidence, or look contrived. What Zuk et al have done is produce a set of plausible, simple and non-contrived models (Limiting Pathway or LP models) that look pretty much indistinguishable from additivity using many of the tests we have run, but can act very differently in twin studies. Under these models, twin studies will overestimate the additive heritability (i.e. make us think that a larger proportion of heritability acts “simply”). The equivalent plot to the one at the top of the page, this time for estimating additive heritability, which you can see here, shows massive overestimation of additive heritability across the spectrum.

3) There is no real evidence that these LP models apply (and in fact there are still a few reasons to believe additivity could broadly apply; see my other post for details). The issue is that we cannot conclusively rule these models (or models like these) out, and therefore the proportion of heritability explained by the genetic variants we have found so far is very uncertain.

4) This is important because our measures of “heritability explained” by the genetic variants we have found look at how much additive heritability is explained. These measures have in general told us that we have only explained a small proportion (generally < 25%) of additive heritability, but if the heritability is in fact largely non-additive and we are treating it as additive, we could have explained a higher proportion of heritability than we believe. This would mean that the “missing heritability” is missing not because we have not found the right genetic risk factors, but because we have not found the right model to use. This could be good news: the genetic variants we have discovered could in fact be used to predict disease a lot better than we can at the moment, if only we can find the right model to use them with.
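
To see why the distinction between broad and additive heritability matters for twin studies, here is a small Python sketch using the classical Falconer twin estimator, 2(r_MZ - r_DZ), and the textbook sharing coefficients for non-additive variance. It is not the Limiting Pathway model of Zuk et al.; it is a simpler, standard example of additive-by-additive epistasis, and it assumes no shared environment. The function names are hypothetical.

```python
def twin_correlations(v_a, v_aa=0.0, v_d=0.0):
    """Expected MZ and DZ twin correlations for a trait scaled to total variance 1.

    Assumes no shared environment and the textbook sharing coefficients:
    MZ twins share all genetic variance; DZ twins share 1/2 of additive (A),
    1/4 of dominance (D) and 1/4 of additive-by-additive (AA) variance.
    """
    r_mz = v_a + v_d + v_aa          # equals the broad heritability here
    r_dz = 0.5 * v_a + 0.25 * v_d + 0.25 * v_aa
    return r_mz, r_dz


def falconer_h2(r_mz, r_dz):
    """Classical twin-study (Falconer) heritability estimate."""
    return 2 * (r_mz - r_dz)


# Purely additive trait: the estimate recovers the true value.
print(falconer_h2(*twin_correlations(v_a=0.5)))  # 0.5

# Same broad heritability (0.5), but 0.2 of it is epistatic.
r_mz, r_dz = twin_correlations(v_a=0.3, v_aa=0.2)
print(r_mz, falconer_h2(r_mz, r_dz))  # broad 0.5, estimate about 0.6, true additive only 0.3
```

In this simplified model the twin estimate overshoots the broad heritability by 0.5 × V_AA but the additive heritability by 1.5 × V_AA, which is the same qualitative pattern as points 1 and 2: modest bias for the total, much larger bias for the additive part. The LP models differ in detail (for example, in how the bias depends on the level of heritability).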

On bad genetics reporting

This short article on the Independent’s website may not be the worst piece of genetics reporting ever, but given its brevity it may well take a new record for the density of errors and misconceptions. (To save you the trouble of hunting down the article it’s actually referring to, which of course is not linked, it’s this online article in Molecular Psychiatry).

Let’s start with the headline:

Sleeping is all in the genes

No. Data from twin studies suggest that the length of time people sleep for is around 44% heritable – that is, around 44% of the variation in this trait is due to inherited (and presumably mostly genetic) factors. The article being discussed in the piece provides no new information about the heritability of this trait.

Scientists have found the reason why some people need more sleep than others lies in their genes.

Scientists have found that a variant in a non-coding region of the ABCC9 gene is possibly one of the reasons some people sleep longer than others. Even if this association is real (and the evidence in the article is less than compelling), it explains just 5% of the variation in sleep length between people.

A survey of more than 10,000 people …

A survey of 4,251 people found the association between sleep length and the ABCC9 variant. This association was not replicated in a separate set of 5,949 individuals. The authors have a potential explanation for this lack of replication (based on the season in which the sleep length measurements were collected), and then did a post hoc re-analysis of their combined sample accounting for season that produced positive results.

showed those carrying the gene ABCC9, present in one in five of us,

The gene ABCC9 is present in all of us (hell, it’s even present in fruit flies). However, there is a genetic variant in one region of the ABCC9 gene, and one version of this variant is present in 17.3% of Europeans.
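
To put the 5% figure in perspective: under a standard additive model, a biallelic variant with allele frequency p and per-allele effect β (in standard-deviation units) explains 2p(1 - p)β² of the variance of a standardized trait. The Python sketch below assumes that model and treats the quoted 17.3% as the variant's allele frequency, which is itself an assumption (the piece does not say whether it is an allele or carrier frequency). The function names are hypothetical.

```python
import math


def variance_explained(p, beta):
    """Fraction of variance of a standardized trait explained by one additive
    biallelic variant with allele frequency p and per-allele effect beta (SD units)."""
    return 2 * p * (1 - p) * beta ** 2


def effect_needed(p, fraction):
    """Invert variance_explained: per-allele effect that explains the given fraction."""
    return math.sqrt(fraction / (2 * p * (1 - p)))


p = 0.173                      # assumed allele frequency of the ABCC9 variant
beta = effect_needed(p, 0.05)  # effect implied by "explains 5% of the variation"
print(f"implied effect: {beta:.2f} SD of sleep length per copy")  # 0.42
print(f"share of the ~44% twin heritability: {0.05 / 0.44:.0%}")  # 11%
```

Under these assumptions, even a genuine association would account for only about a tenth of the heritable variation in sleep length, which is a long way from “all in the genes”.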
Continue reading ‘On bad genetics reporting’

Guest post from Alex Kogan: Size and populations matter–let’s understand why

[This is a guest post by Alex Kogan. Last week, Ed Yong at Not Exactly Rocket Science covered a paper positing an association between a genetic variant and an aspect of social behavior called prosociality. On Twitter, Daniel and Joe dismissed this study out of hand due to its small sample size (n = 23), leading Ed to update his post. Daniel and Joe were then contacted by Alex Kogan, the first author of the study in question. He kindly shared his data with us, and agreed to an exchange here on Genomes Unzipped. Our comments on the study are here; this is Alex’s reply.]

It’s a truism that resonates across science: Size matters when doing and interpreting the statistical (and practical) meaning of a study. But the size of what? Well, it’s quite a few things—all of which are very important in understanding what a study is ultimately telling us. One of the first numbers researchers focus on is the p-value. The p-value relies on a bit of counterintuitive logic: It represents the percentage of times you would get an effect as big as you got (or bigger) if there is really no effect in the general population. So we first assume that there is really no difference in some outcome between two groups across the general population (we call this the null hypothesis), and then we ask what are the chances of us finding the difference that we found (or bigger) given this assumption. If this percentage is low (many fields adopt a p = .05 standard, or a 5% chance that we’d get the effect we got or bigger if there is really no effect in the general population), then we can reject the initial idea that there is no difference in the general population. So what have we learned if the p-value is .05 or lower? That there is likely a difference in the general population—how big this difference is remains a mystery, however; the p-value never answers that question.
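
The null-hypothesis logic described above can be made concrete with a permutation test, which is one common way (among several) to compute a p-value: if there is really no difference between groups, the group labels are interchangeable, so reshuffling them shows how often a difference at least as large as the observed one turns up by chance. This is a minimal Python sketch; the helper name `permutation_p_value` and the scores are made up for illustration and are not from the study.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels carry no information, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1
    # Count the observed labelling itself so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_permutations + 1)


# Made-up scores for two groups of five.
low = [1, 2, 3, 4, 5]
high = [11, 12, 13, 14, 15]
print(permutation_p_value(low, high))  # small: only 2 of the 252 possible splits are this extreme
print(permutation_p_value(low, low))   # 1.0: no difference at all
```

Note that the function returns only how surprising the difference is under the null hypothesis; as the paragraph says, it does not tell you how big the difference is.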
Continue reading ‘Guest post from Alex Kogan: Size and populations matter–let’s understand why’

Size matters, and other lessons from medical genetics

Size really matters: prior to the era of large genome-wide association studies, the large effect sizes reported in small initial genetic studies often dwindled towards zero (that is, an odds ratio of one) as more samples were studied. Adapted from Ioannidis et al., Nat Genet 29:306-309.

[Last week, Ed Yong at Not Exactly Rocket Science covered a paper positing an association between a genetic variant and an aspect of social behavior called prosociality. On Twitter, Daniel and Joe dismissed this study out of hand due to its small sample size (n = 23), leading Ed to update his post. Daniel and Joe were then contacted by Alex Kogan, the first author of the study in question. He kindly shared his data with us, and agreed to an exchange here on Genomes Unzipped. In this post, we expand on our point about the importance of sample size; Alex’s reply is here.

Edit 01/12/11 (DM): The original version of this post included language that could have been interpreted as an overly broad attack on more serious, well-powered studies in psychiatric disease genetics. I've edited the post to reduce the possibility of collateral damage. To be clear: we're against over-interpretation of results from small studies, not behavioral genetics as a whole, and I apologise for any unintended conflation of the two.]

In October of 1992, genetics researchers published a potentially groundbreaking finding in Nature: a genetic variant in the gene encoding angiotensin-converting enzyme (ACE) appeared to modify an individual’s risk of having a heart attack. This finding was notable at the time for the size of the study, which involved a total of over 500 individuals from four cohorts, and the effect size of the identified variant–in a population initially identified as low-risk for heart attack, the variant had an odds ratio of over 3 (with a corresponding p-value less than 0.0001).

Readers familiar with the history of medical association studies will be unsurprised by what happened over the next few years: initial excitement (this same polymorphism was associated with diabetes! And longevity!) was followed by inconclusive replication studies and, ultimately, disappointment. In 2000, 8 years after the initial report, a large study involving over 5,000 cases and controls found absolutely no detectable effect of the ACE polymorphism on heart attack risk. In the meantime, the same polymorphism had turned up in dozens of other association studies for a wide range of traits ranging from obstetric cholestasis to meningococcal disease in children, virtually none of which have ever been convincingly replicated.
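
One reason small studies report inflated effects is the so-called winner's curse: when only “significant” results get noticed, the small studies that clear the bar are mostly the ones that overestimated the effect by chance. The Python simulation below is a minimal sketch of this, assuming a standardized trait and a simple two-sided z-test; it uses a difference in means rather than the odds ratios shown in the figure, and the function name and parameters are hypothetical.

```python
import random
import statistics


def mean_significant_estimate(true_effect, n_per_group, n_studies=10_000, seed=1):
    """Average effect estimate among simulated studies that reach p < 0.05.

    Assumes a standardized trait (SD 1 in both groups), so each study's difference
    in group means is normally distributed with standard error sqrt(2 / n).
    Returns (mean of significant estimates, fraction of studies that were significant).
    """
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5
    significant = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)  # one study's observed effect
        if abs(estimate / se) > 1.96:          # two-sided z-test at alpha = 0.05
            significant.append(estimate)
    return statistics.fmean(significant), len(significant) / n_studies


for n in (25, 2000):
    mean_est, power = mean_significant_estimate(true_effect=0.1, n_per_group=n)
    print(f"n = {n:4d} per group: power {power:.2f}, mean 'significant' effect {mean_est:.2f}")
```

With a small true effect, the few small studies that reach significance report effects several times larger than the truth, while large studies are usually significant and close to the true value: the same shrinkage towards no effect that the figure shows as sample sizes grow.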
Continue reading ‘Size matters, and other lessons from medical genetics’

Going green: lessons from plant genomics for human sequencing studies

This is a guest post by Jeffrey Rosenfeld. Jeff is a next-generation sequencing advisor in the High Performance and Research Computing group at the University of Medicine and Dentistry of New Jersey, working on a variety of human and microbial genetics projects. He is also a Visiting Scientist at the American Museum of Natural History where he focuses on whole-genome phylogenetics. He was trained at the University of Pennsylvania, New York University and Cold Spring Harbor Laboratory.

As human geneticists, it is all too easy to ignore papers published about non-human organisms – especially when those organisms are plants. After all, how much can the analysis of (say) Arabidopsis genome diversity possibly assist in my quest to better understand the human genome and determine which genes cause disease? Quite a bit, as it happens: a fascinating recent paper in Nature demonstrates a number of lessons that we can learn from our distant green relatives.

By exploiting the small genome size of Arabidopsis (~120 million bases, compared to the relatively gargantuan 3 billion bases of Homo sapiens), researchers were able to perform complete genome sequencing and transcriptome profiling in 18 different ecotypes of the plant (similar to what we would call strains of an animal).

In a normal genome re-sequencing experiment, the procedure is to obtain DNA from an individual, sequence the DNA, align it to a reference sequence and then to call variants (i.e. differences from the reference). This approach is used by the 1000 Genomes Project and basically all of the hundreds of disease-focused human sequencing projects currently underway around the world. This approach allows researchers to relatively easily identify single-base substitution (SNP) and small insertion/deletion (indel) differences between genomes. However, the amount of variability that can be identified is restricted by the use of a reference: regions where there is extreme divergence between the reference and sample genomes are often badly called, and more complex variants (e.g. large, recurrent rearrangements of DNA) can be missed. Additionally, and crucially, sequences that are not present in the reference genome will be completely missed by this approach.
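
The reference-based procedure, and its blind spot for sequence that is absent from the reference, can be illustrated with a deliberately tiny Python sketch. It assumes ungapped alignment by brute-force scanning and a simple allele-fraction threshold for calling; real pipelines use dedicated aligners and probabilistic variant callers, and every function name, parameter and sequence here is hypothetical.

```python
from collections import Counter, defaultdict


def place_read(reference, read, max_mismatches=1):
    """Ungapped alignment by brute force: return the offset with the fewest
    mismatches, or None if no offset has at most max_mismatches."""
    best = None  # (offset, mismatches)
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(r != g for r, g in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return None if best is None else best[0]


def call_snps(reference, reads, min_depth=2, min_alt_fraction=0.3):
    """Pile up placed reads and report (position, ref_base, alt_base) differences."""
    pileup = defaultdict(Counter)
    unmapped = []
    for read in reads:
        offset = place_read(reference, read)
        if offset is None:
            unmapped.append(read)  # e.g. sequence that is absent from the reference
            continue
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    calls = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        for base, count in pileup[pos].items():
            if base != reference[pos] and depth >= min_depth and count / depth >= min_alt_fraction:
                calls.append((pos, reference[pos], base))
    return calls, unmapped


# Toy data: the sample carries C->T at position 7 plus a stretch missing from the reference.
reference = "GATTACACCGTAGGCT"
reads = ["GATTACAT", "ACATCGTA", "ATCGTAGG", "CGTAGGCT", "TTTTTTTT"]
calls, unmapped = call_snps(reference, reads)
print(calls)     # [(7, 'C', 'T')]
print(unmapped)  # ['TTTTTTTT']: invisible to a reference-based caller
```

The single-base substitution is found easily, but the read from the novel sequence never aligns and so is silently dropped, which is exactly the limitation described above.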
Continue reading ‘Going green: lessons from plant genomics for human sequencing studies’

Report on clinical genome sequencing

The PHG Foundation, an independent genomics think-tank, has launched a new report on next generation sequencing and its impact on health and health systems. The Report, “Next steps in the sequence: the implications of whole genome sequencing for health in the UK”, can be freely downloaded and aims to provide a comprehensive overview of the many and varied issues relating to clinical genome sequencing.

When planning the work, we were motivated by the astonishingly rapid development of fast, affordable whole genome sequencing (WGS) technologies, which are set to change many aspects of health care. The sheer quantity and complexity of the information generated by genome sequencing, along with our ever-changing understanding of the function of genomes in health and disease, present new challenges for health systems.

The Report reviews the technologies, informatics pipeline and key clinical applications of WGS, as well as the economic, ethical, legal and social implications and organisational challenges of offering WGS within the UK NHS. The final two policy chapters outline different scenarios for testing, storing and returning results, and contain 10 key recommendations reached with the help of several expert stakeholder workshops.

Continue reading ‘Report on clinical genome sequencing’

Debating the future of genome sequencing in medicine

This is a cross-post from my more technical blog, Genetic Inference. However, I thought that it might be of interest to non-specialists who like to keep up with the ongoing debates about the role of genomics in health and medicine.

Last week many of us at Genomes Unzipped (along with over 7000 other geneticists) were at the International Congress of Human Genetics in Montreal. A highlight of the meeting was a large debate entitled “Current and Emerging Sequencing Technologies: Changing the Practice of Medical Genetics”. The panel and the audience were both packed with research scientists, clinicians and industry researchers (you can see the full list of panel participants here), and as you’d expect the discussion was at times pretty lively.

Different perspectives

Joris Veltman described his exome sequencing of 500 individuals with intractable disease, and noted that there has been much success, and very little evidence of harm. Ségolène Aymé mentioned NIH targets that hope to see almost all genetic diseases diagnosed by 2020, and new treatments for rare diseases to be developed simultaneously. There seemed to be a solid consensus across the panel that sequencing should be rolled out as a standard tool in the diagnosis of genetic diseases, provided that the approach is a targeted one, restricted to finding the pathogenic mutation(s) causing the disease.

More controversial was the role of sequencing of healthy individuals, and the general return of data to patients or doctors for any reason other than directly diagnosing a genetic disease. Rade Drmanac, chief scientific officer of Complete Genomics, was obviously strongly in favour of everyone having their genome sequenced, and made it clear that Complete Genomics intends to start offering sequencing to doctors in the future. In his vision, genomes are sequenced at birth, and an initial analysis of immediately actionable results (e.g. potential genetic diseases) is passed to the doctor and patient, with further analyses being carried out if and when they are required.

Michael Hayden immediately dismissed this as hype. He pointed out how ill-equipped the US is to handle medical sequencing, with no good systems of reimbursement, a massive shortage of genetic counselors, and a general lack of training in the medical profession. While more positive in general, Louanne Hudgins also expressed worries about the lack of knowledge of genetics among doctors, with some truly scary examples of MDs failing to understand even the most basic concepts in genetics.

Continue reading ‘Debating the future of genome sequencing in medicine’

Revisiting RNA-DNA sequence differences

A few months ago, I discussed a paper by Li and colleagues reporting a large number of sequence differences between mRNA and DNA from the same individual [1]. While some such differences are expected due to known mechanisms of RNA editing (e.g. A->I editing, see [2]), Li et al. reported an astonishingly high number of them, including thousands of events inconsistent with any known regulatory mechanism. These results implied at least one, and probably many, new mechanisms of gene regulation, and called into question some basic assumptions in molecular biology.

An alternative explanation for the observations of Li et al. is less exciting–imagine two genes with similar (but not identical) sequences, which produce similar (but not identical) mRNAs. If you accidentally attributed both mRNA sequences to the same gene, you could erroneously conclude that one of the two sequences arose via RNA editing of the other. According to a new paper by Schrider and colleagues [3], this banal artifact accounts for the majority of the reported RNA-DNA sequence differences in Li et al.

Schrider et al. show that RNA-DNA mismatches are enriched in genes with close paralogs or copy number variants, both of which are consistent with the technical artifact mentioned above. However, their most striking result is that, at many of the putative RNA editing sites, the “edited” base from the mRNA is actually present in genomic DNA. To show this, Schrider et al. took advantage of the fact that low-coverage DNA sequencing data is available for the individuals used in the Li et al. study. They searched through these data to find genomic sequences matching the “edited” mRNA form. If these sites were truly due to RNA editing, they shouldn’t find any. Instead, at ~75% of the tested sites, they could find a genomic match to the “edit” in at least one individual. There are some potential complications with the interpretation of this number (as they note, the genomic data could include sequencing errors that happen to be the same base as the “edit”), but this observation strongly suggests that a majority of the sites identified by Li et al. are false positives due to this single technical issue.
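
The core check described above (is the “edited” base already present in someone's genomic DNA?) is simple enough to sketch in a few lines of Python. This is only an illustration of the logic, not Schrider et al.'s actual pipeline; the data structures, coordinates and function names are hypothetical. The `min_reads` parameter reflects the caveat about sequencing errors: requiring more than one supporting DNA read makes a chance error less likely to count as a match.

```python
def has_genomic_match(site, genomic_bases, min_reads=1):
    """True if any individual's DNA reads at this site show the 'edited' RNA base
    at least min_reads times. site = (chrom, pos, dna_base, rna_base)."""
    chrom, pos, _dna_base, rna_base = site
    return any(
        bases.get((chrom, pos), []).count(rna_base) >= min_reads
        for bases in genomic_bases.values()
    )


def fraction_with_genomic_match(sites, genomic_bases, min_reads=1):
    """Fraction of putative editing sites whose 'edited' base is also seen in genomic DNA."""
    return sum(has_genomic_match(s, genomic_bases, min_reads) for s in sites) / len(sites)


# Hypothetical putative editing sites and low-coverage DNA observations per individual.
sites = [("chr1", 1000, "A", "G"), ("chr2", 5000, "C", "T"),
         ("chr3", 42, "G", "A"), ("chrX", 7, "T", "C")]
genomic_bases = {
    "ind1": {("chr1", 1000): ["A", "G", "A"], ("chr3", 42): ["G", "G"]},
    "ind2": {("chr2", 5000): ["T"], ("chr3", 42): ["G"]},
}
print(fraction_with_genomic_match(sites, genomic_bases))               # 0.5
print(fraction_with_genomic_match(sites, genomic_bases, min_reads=2))  # 0.0
```

With low-coverage data the answer is sensitive to this threshold, which is one reason the ~75% figure comes with the caveats mentioned above.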


[1] Li et al. (2011) Widespread RNA and DNA Sequence Differences in the Human Transcriptome. Science. doi:10.1126/science.1207018

[2] Levanon et al. (2004) Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nature Biotechnology. doi:10.1038/nbt996

[3] Schrider et al. (2011) Very Few RNA and DNA Sequence Differences in the Human Transcriptome. PLoS One. doi:10.1371/journal.pone.0025842

Friday Links: Studying association studies, and success at last in psychiatric genetics

In PLoS Genetics this week there is a viewpoint article on data sharing in disease genetics. The authors systematically looked at 643 genome-wide association studies published between 2002 and 2010, to see how easily available the results of the studies are now. They found that the availability of full study results has gone down over time, and many groups that do share data have put more restrictions in place on its use. They put this down to fears over the privacy of research subjects, and in particular to the Homer et al study. The Homer et al result is somewhat complicated, but in essence it says that if you have stolen someone’s genotype data, you can use it to figure out if they have participated in any given research study by looking at the full results of the study.
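
A much-simplified Python sketch of the idea behind the Homer et al. test, on simulated data only: for each SNP j, compare a person's genotype Y_j (coded 0, 0.5 or 1) with the allele frequency in an independent reference sample (Pop_j) and in the published study (Mix_j), and compute D_j = |Y_j - Pop_j| - |Y_j - Mix_j|. If the person took part, the study frequencies are pulled slightly towards their genotype, so D is positive on average and a t-like summary of D grows with the number of SNPs. All names and sizes are hypothetical, and real analyses must also contend with population structure and correlation between SNPs.

```python
import random
import statistics


def random_genotype(freqs, rng):
    """One diploid person's genotype at each SNP, coded as allele count / 2 (0, 0.5 or 1)."""
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]


def allele_freqs(people):
    """Per-SNP allele frequency in a group: the kind of summary a full results table reveals."""
    return [statistics.fmean(column) for column in zip(*people)]


def homer_t(y, pop, mix):
    """Mean of D_j = |Y_j - Pop_j| - |Y_j - Mix_j|, scaled to a t-like statistic."""
    d = [abs(yj - pj) - abs(yj - mj) for yj, pj, mj in zip(y, pop, mix)]
    return statistics.fmean(d) / (statistics.stdev(d) / len(d) ** 0.5)


rng = random.Random(7)
freqs = [rng.uniform(0.05, 0.95) for _ in range(5000)]        # hypothetical SNP panel
study = [random_genotype(freqs, rng) for _ in range(20)]      # study participants
reference = [random_genotype(freqs, rng) for _ in range(100)] # independent reference sample
outsider = random_genotype(freqs, rng)                        # never took part

mix, pop = allele_freqs(study), allele_freqs(reference)
t_member = homer_t(study[0], pop, mix)
t_outsider = homer_t(outsider, pop, mix)
print(f"participant T = {t_member:.1f}, non-participant T = {t_outsider:.1f}")
```

In this toy setting the participant stands out clearly from the non-participant, but only because the attacker already holds that person's genotypes; that prerequisite is what makes the risk discussed below rather esoteric.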

It certainly seems possible that worries about privacy are reducing the free flow of information within the research community. However, whether on balance the decrease in information flow is worth the increase in security is an open question. For my own view, I feel that having the genome-wide results of genome-wide association studies freely available is very important to the field, and is more important than the rather esoteric risk of someone stealing someone’s DNA and using it to figure out that they once took part in a research study of inflammatory bowel disease. [LJ]

Genome-wide association studies have been hugely successful in identifying dozens of common genetic risk factors for a large number of common diseases. However, one area in which GWAS has not had much success is psychiatric illness, where finding common risk factors that replicate across studies has been consistently difficult. But it looks like this is starting to change. The current issue of Nature Genetics has two papers from the Psychiatric GWAS Consortium, detailing some of the largest meta-analyses of schizophrenia and bipolar disorder ever published.

The schizophrenia study robustly replicated two previously implicated variants and discovered five new ones, and the bipolar disorder study replicated one and discovered a new one. The new variants give us some pretty startling insights into the genetics of the diseases, in particular revealing the importance of a non-coding gene, microRNA 137, in regulating a wide range of genes expressed in neurons. As always, these variants explain only a small proportion of the total genetic effect, but they show that psychiatric genetics has now truly entered the GWAS arena, with all the scientific benefits that this can bring to medical research. [LJ]

The images above, in order, are taken from the paper Temporal Trends in Results Availability from Genome-Wide Association Studies, and from Wikimedia Commons.
