We predict that in the future a large sum of money will be invested in recruiting highly trained and skilled personnel for data handling and downstream analysis. Physicians, bioinformaticians, biologists, statisticians, geneticists, and scientific researchers will all be required for genomic interpretation due to the ever-increasing volume of data.
Hence, for cost estimation, it is assumed that at least one bioinformatician ($75,000), physician ($110,000), biologist ($72,000), statistician ($70,000), geneticist ($90,000), and technician ($30,000) will be required for interpretation of one genome. The number of technicians required in the future will decrease as processes are predicted to be automated. Bioinformatics software costs will also plummet as computing costs fall in line with Moore’s law.
Thus, the cost in 2011 for data handling and downstream processing is $285,000 per genome, compared to a projected $517,000 per genome in 2017. These costs are calculated by tallying the salaries of each person involved as well as the software costs.
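For reference, simply tallying the six quoted annual salaries gives $447,000 before any software line item – a minimal back-of-the-envelope sketch, assuming each full salary is billed against a single genome, since the article doesn’t say what fraction of each specialist’s time is actually charged:

```python
# Back-of-the-envelope tally of the annual salaries quoted above.
# The article does not specify the software costs or what fraction of
# each specialist's time is charged per genome, so this reproduces
# only the raw salary total.
salaries = {
    "bioinformatician": 75_000,
    "physician":        110_000,
    "biologist":        72_000,
    "statistician":     70_000,
    "geneticist":       90_000,
    "technician":       30_000,
}

total = sum(salaries.values())
print(f"Total annual salaries: ${total:,}")  # Total annual salaries: $447,000
```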
These numbers would be seriously bad news for the future of genomic medicine if they were even remotely connected with reality. Fortunately, this is not the case. In fact, this article (and other alarmist pieces on the “$1000 genome, $1M interpretation” theme) wildly overstates the economic challenges of genomic interpretation.
Since this meme appears to be growing in popularity, it’s worth pointing out why genome analysis costs will go down rather than up over time:
Genome analysis will become increasingly automated
Right now, anyone who wants to provide a thorough clinical analysis of a genome sequence needs to prepare for some serious manual labour. After finding all of the possible sites of genetic variation, a clinical genomicist needs to identify those that are either known disease-causing variants or are found in a known disease gene, then check the published evidence supporting those associations, discuss their significance with clinical experts, perform experimental validation, and then generate a report explaining the findings to the patient and her doctor.
That’s all time-consuming stuff. But with every genome that we analyse, we get better at automating the easy steps, fix mistakes in our databases that might otherwise lead to wild goose chases, and obtain more unambiguous evidence about the clinical significance of each mutation.
The genome interpretation of 2017 won’t be a drawn-out process involving constant back-and-forth between highly paid specialists. It will be a complex but thoroughly automated series of analysis steps, yielding only a handful of potentially interesting findings to be passed on to geneticists and clinicians for manual checking and sign-off. Importantly, it will also be (at least for those who live in the right health systems, or have the right insurance) a dynamic process, in which your sequence is constantly checked against new information without the need for complex human intervention.
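To make the shape of that automated pipeline concrete, here is a minimal sketch of the triage step – hypothetical names and data structures throughout, assuming a curated set of known pathogenic variants and a disease-gene list, not any particular tool’s API:

```python
# Minimal sketch of automated variant triage: flag only the variants
# that need human sign-off. All names here are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: Optional[str] = None

def triage(variants, known_pathogenic, disease_genes):
    """Return the short list of variants needing manual review.

    known_pathogenic: set of (chrom, pos, ref, alt) tuples drawn from
        a curated database; disease_genes: set of gene symbols.
    """
    shortlist = []
    for v in variants:
        if (v.chrom, v.pos, v.ref, v.alt) in known_pathogenic:
            shortlist.append((v, "known pathogenic variant"))
        elif v.gene in disease_genes:
            shortlist.append((v, "novel variant in a known disease gene"))
        # Everything else is resolved automatically and never reaches
        # a geneticist or clinician.
    return shortlist
```

The expensive specialists only ever see the short list; everything upstream of it runs without human intervention.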
That’s not to say that clinicians and other specialists will be replaced by the machines – genomicists and informaticians will be constantly at work refining the interpretation systems, but their work will be scaled up to the analysis of hundreds of thousands of genomes. Clinicians will provide the same point-of-care attentiveness (or lack thereof, in some cases) as in the current medical system, but they will do so using carefully processed, filtered and validated information from upstream analysis systems. The idea that each of these specialists will play a time-consuming role in interpreting each individual genome is completely unrealistic, and unnecessary.
Finding known mutations and interpreting novel ones will be easier
Right now, publicly available databases of known disease-causing mutations are shockingly noisy and incomplete – a situation I’ve described in the past as the greatest failure of modern human genetics. This is due to a combination of factors: researchers who published alleged mutations without performing the necessary checks for causality, academics and commercial entities who maintain private monopolies over crucial information from disease-specific studies, and occasional transcription errors by the curators of public databases, to name just a few.
But this will change – or rather, if it doesn’t change then we should be deeply ashamed of ourselves as a research community. Right now it’s unclear which of the many competing efforts to catalogue disease mutations will emerge as the single go-to source, but I’m optimistic that by 2017 both funding bodies and journals will have applied sufficient pressure to ensure that there is at least one fully open, comprehensive, well-annotated and accurate resource containing these variants.
The list of well-established human disease genes will grow massively over the next 18-24 months as genome-wide approaches like exome sequencing are applied to increasingly large numbers of rare disease families. We will also be able to unambiguously discard many of the mutations currently in resources like OMIM, as it becomes clear from large-scale sequencing studies that these variants are in fact common in healthy individuals.
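The logic behind that second point is simple enough to sketch: a mutation reported as causing a severe, highly penetrant disease cannot plausibly be common in healthy populations, so reported mutations can be automatically flagged once large reference panels show otherwise. A minimal sketch, with a purely illustrative frequency cutoff:

```python
# Sketch of frequency-based curation: discard reported "disease-causing"
# mutations that large-scale sequencing shows are common in healthy
# individuals. The 0.5% threshold is illustrative; sensible cutoffs
# depend on disease prevalence, inheritance model, and penetrance.
MAX_CREDIBLE_FREQ = 0.005

def curate(reported_mutations, population_freqs):
    """Split reported mutations into plausible and implausible sets.

    reported_mutations: iterable of variant IDs from a literature database;
    population_freqs: dict mapping variant ID -> allele frequency observed
        in large healthy-population sequencing studies.
    """
    kept, discarded = [], []
    for m in reported_mutations:
        freq = population_freqs.get(m, 0.0)  # variants never seen pass through
        (discarded if freq > MAX_CREDIBLE_FREQ else kept).append(m)
    return kept, discarded
```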
The end result will be an open-access database that any clinical genomicist can tap into when interpreting their patient data – meaning far less time wasted chasing false leads, and fewer true disease-causing variants missed during the interpretation process. That also means clinicians will be handed increasingly clear, intuitive results to deliver to their patients, rather than a long list of “maybe interesting” variants that they are completely unequipped to make sense of.
Genome sequencing technology will be more accurate
Finally, it’s worth emphasising that a lot of the time and expense in clinical genomics right now stems from imperfections in the underlying sequence data. Current short-read sequencing technologies have been phenomenally good at driving sequencing costs down, and across a large swathe of the genome they do a pretty good job of finding important mutations. However, they are still subject to a distressing level of error, and also can’t access approximately 10-15% of the human genome that is highly repetitive or poorly mapped.
That’s all changing fast. The reads generated by these instruments are getting longer and more accurate, meaning they can be used to peer into previously off-limits segments of the genome. New technologies such as Oxford Nanopore promise even more rapid improvements to these parameters – or, at the very least, will drive even greater competition among existing providers to up their game. We can confidently expect that the genome of 2017 will be dramatically more accurate and complete than the genome of 2012. Importantly, because the underlying reads are longer and more accurate, it will also be possible to store the raw data underlying a genome sequence in a far smaller volume of disk space than is currently the case.
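The storage argument follows from rough arithmetic: most of the bulk in raw sequence data is per-base quality information and redundant coverage, both of which shrink as reads become more accurate. A back-of-the-envelope sketch, with purely illustrative parameters rather than vendor specifications:

```python
# Rough storage estimate for raw sequence data. All parameters are
# illustrative assumptions, not measurements from any real instrument.
GENOME_SIZE = 3.1e9  # haploid human genome, in bases

def raw_size_gb(coverage, bits_per_base):
    """Gigabytes needed for reads at a given coverage and encoding density.

    bits_per_base bundles the base call plus whatever per-base quality
    information is retained.
    """
    return GENOME_SIZE * coverage * bits_per_base / 8 / 1e9

# Noisy short reads: high coverage, full quality scores kept (~10 bits/base).
print(f"2012-style genome: {raw_size_gb(coverage=30, bits_per_base=10):.0f} GB")   # ~116 GB
# Longer, more accurate reads: lower coverage suffices and quality
# scores can be heavily quantised (~3 bits/base).
print(f"Accurate long reads: {raw_size_gb(coverage=15, bits_per_base=3):.0f} GB")  # ~17 GB
```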
Why the alarmism?
It’s worth bearing in mind that there are many people out there with strong incentives to make genome interpretation sound more challenging – and more lucrative – than it actually is. Right now there are dozens of companies launching in the genome interpretation space, and hundreds of venture capitalists who need to be convinced that the market size for genome interpretation is enormous. (I’m not claiming that the authors of the GEN piece have ulterior motives – perhaps they have simply been swayed by widespread exaggeration in the field.)
Let me be clear: in the next 5-10 years, millions of genomes will be sequenced in a clinical setting, and all of them will need some level of interpretation. We will need to build complex systems for securely managing large-scale data both from genomes and (more importantly) from many other high-throughput genomic assays, for accurately mining these data, and for returning results in a format that is easy for clinicians and patients to understand. Billions of dollars will be invested, and some people will get very rich developing companies to create these systems. But the idea that we will be looking at a $500,000-per-genome interpretation pipeline is completely absurd.
The annoying thing about this faux obstacle is that there are real challenges ahead. For instance: how can we integrate genome data with information from dynamic, real-time monitoring of patient health? How can we protect patient privacy and build rigorous systems without suppressing innovation? And how can we ensure that new technologies are used to actually improve health outcomes for everyone, rather than simply increasing healthcare costs? None of these questions has an easy answer, and we don’t have much time to figure them out – so let’s not waste it building costly, imaginary genome interpretation pipelines in the air.