Guest post by Adam Rutherford: Unknown unknowns and the human genome

07/06/2011
Categories: Guest Posts
Written by Guest Author

This is the second of three guest posts from panellists in the Race to the $1000 Genome session tomorrow at the Cheltenham Science Festival. Yesterday we heard from Oxford Nanopore’s Clive Brown about the disruptive effects of genomic technology; today’s instalment is from science broadcaster Adam Rutherford, presenter of the recent BBC series about the genome, The Gene Code. Tomorrow we’ll hear from Genomes Unzipped’s own Caroline Wright.

There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know. —Donald Rumsfeld.

The expectations of the Human Genome Project were Rumsfeldian. This much-mocked statement that the then US Secretary of Defense Donald Rumsfeld made in response to the continued absence of evidence for weapons of mass destruction was made almost exactly a year after the publication of the first results of the Human Genome Project (HGP). His oddly profound cod-philosophy resonates with that grand endeavour. The announcement, initially in June 2000, and the publication, were met with triumphalism in the media, fanned by our and its glorious leaders. President Clinton stood on a platform, flanked by Craig Venter and Francis Collins at the White House, and declared that:

Without a doubt, this is the most important, most wondrous map ever produced by humankind…

Today we are learning the language in which God created life.

Whatever your religious disposition, that is a bold statement. He and others went on to speculate that soon we would understand and be on the path to curing many, if not all, diseases. Geneticists bristled at this hubris. The fundamental problem was unknown unknowns. It turned out that humans have far fewer genes than we expected. The vast majority of the genome does not contain genes. So what is it doing?

Culturally this was and is a big deal: the history of genetics has conspired to reveal a simplistic view of inheritance. In the 19th century, Mendel revealed the rules of inheritance with his pea experiments: traits such as flower colour are carried by discrete units, genes, one from each parent. At the beginning of the 20th, Thomas Hunt Morgan cross-bred fruit flies and showed that these genes sat on chromosomes. Crick and Watson revealed the mechanism of genetics in the iconic DNA double helix. Crick went on to formulate the so-called central dogma, that DNA makes RNA makes protein. In the 1980s, with DNA sequencing in its infancy, the first human diseases to be understood genetically were ones that slavishly followed simple Mendelian inheritance patterns: cystic fibrosis, Huntington’s disease, Duchenne muscular dystrophy.

But humans are not simple. The unknown unknowns of human genetics in 2001 were that we didn’t know that these disorders were the outliers, and we didn’t know what was hidden in the rest of the genome. Whilst all of the tenets of genetics are still correct, we can’t currently account for human heritability or complexity using the straightforwardness of those models. To characterise this, as some have done, as a “crisis” in genetics is absurd. This is a perfectly valid scientific issue, which will be resolved with science.

This is why the drive for the conceptual $1000 genome is important. While we’re keen to point out that humans are unique, we’ve been slow to point out that that uniqueness will be reflected in our genomes. The HGP was a brilliant and necessary step in elucidating human complexity and disease. Characterising rare genetic differences between individuals, rather than the broad similarities, is a crucial step on that path, and this can only be achieved by sequencing many more people’s genomes. As such, the drive to reduce the cost of sequencing is absolutely critical. This is the process of science: converting unknown unknowns, via known unknowns, into known knowns. Who knew Donald Rumsfeld was such a clear thinker?

Tags: $1000 genome, adam rutherford, cheltenham science festival, donald rumsfeld, guest post.

11 Responses to “Guest post by Adam Rutherford: Unknown unknowns and the human genome”


Keith Grimaldi
07/06/2011 at 14:08

I never knew why that was mocked so much, it was about the most intelligent thing he said!

I look forward to the analyses of the vast amounts of data (emphasise “the analyses”: I don’t want any part in the wading through all the raw data, not brave or knowledgeable enough). I see how it could really change the treatments of cancers but I do wonder if we will really find the “causes” of common disease, or the missing heritability, in just a few rare variants.

One criticism of the GWAS approach is that it has reinforced the emphasis on genes for common diseases and I worry that funds will be sucked up to do the same with WGS while still ignoring for the most part gene x environment interactions.

I DO NOT have any support for the “crisis” or absurd “genes are not important, it’s all environment” claims, on the contrary, but I do think that we need to at least give them equal value in studies of complex diseases.

The “old” candidate gene studies have a bad name because they were so inconsistent. In fact they were, partly due to small studies throwing up chance results but also because often the study looked at just a gene and a disease. When studies included genes + disease + environment they tended to be much more consistent (e.g. http://bit.ly/bB2Efd).

It might be a few rare variants that are important. I haven’t seen any convincing arguments for this yet though, certainly not any that would be more convincing than rare combinations of common variants increasing disease risk under particular environmental circumstances.
Shane McKee @shanemuk
07/06/2011 at 20:16

Keith, you make good points, but I think the sheer mathematics of it point to rare variants being responsible for at least a significant proportion of the “missing heritability”. For one thing, we each pick up between 60 and 100 new mutations that weren’t in our parents, and over just a few generations over a broad population, that amounts to a fair degree of genetic variability (and of course it’s the seedcorn of evolution once selection kicks in). Of course the good thing about these competing hypotheses is that they are rapidly becoming testable, precisely because genomic level sequencing is dropping in price. In my lectures to med students & doctors I often remark that since the initial sequence, we’ve knocked a zero off the price tag every 2 years – makes Moore’s law look a bit pathetic, really!
The real value (and how we convert the unknown unknowns into known knowns – I do like that Rumsfeld quote too) is that we unpack the biology, and will get more clues towards understanding pathways. Part of where I *hope* this will lead is towards “minimally disruptive medicine” – much of our biology is geared towards keeping our systems in more or less equilibrium – at least for long enough for us to reproduce. This suggests that we have mechanisms for holding the show together, and it is certainly not a new idea that in our treatments for some aspects of systems-gone-awry we upset other areas.
So I’m rambling now, but the bottom line is that cheap widely-available sequencing will make many problems tractable, whether we are talking about rare diseases like Kabuki, or common conditions like heart disease or epilepsy.

We need to convert the Unknown into the Geknown!
Alex Ling
07/06/2011 at 21:39

Having followed Larry Moran’s blog, I can now quasi-authoritatively say that the central dogma is not “that DNA makes RNA makes protein,” but rather that information in proteins does not flow backwards to nucleic acids. DNA to RNA to protein is the sequence hypothesis. On a separate point, again courtesy of Larry, not everyone over-predicted the number of human genes.
Ken Rubenstein
08/06/2011 at 00:07

As a devotee of Denis Noble, author of many things, including “The Music of Life,” I have to consider DNA as an important member of the orchestra, but a member nonetheless. Biological causation is not only an upward process moving from DNA to higher organizational levels. The higher levels also feed back on lower ones, even the Holy Genome. For example, mechanical shifts in the extracellular matrix affect DNA replication and transcription, among other things. I certainly have nothing against genomics. On the other hand, when you’ve just been given a great new power hammer, everything starts to look like a nail.
Keith Grimaldi
08/06/2011 at 15:18

@Shane

one thing I am really looking forward to with whole genome sequencing is cataloguing where these “private” mutations are – way back I was working for a long time on DNA damage and repair and the evidence was that coding regions are repaired preferentially. It will be interesting to see if WGS data reflect this – there is some information but I don’t think enough has come out yet.

Even so, I am still not convinced that, despite the mutation rate, a few rare mutations are able to explain what we don’t know about 64% obesity and 45% coronary heart disease rates. But as you say, it is now possible to start the experiments. I just hope (repeating myself, sorry) that this time round it will not simply be sequencing a whole bunch of diseased people vs a healthy bunch (I don’t think that would work anyway, not without massive bunches…)

Afraid though that in another 10 years we will still be asking the big questions like “Why is there Something rather than Nothing?”, or “Why am I fat and not Thin?” (Plus the favourite of my teenage son “Why should I do Something rather than Nothing?”)
Shane McKee @shanemuk
08/06/2011 at 16:05

@Keith, LOL! The teenage gene then? :-)
I do think (and you probably agree!) that people occasionally mix up the “heritability” of traits with the overall risk factors, and undoubtedly there are other factors – environmental, epigenetic – at play that combine to produce an overall causative mesh. And you’re right – the biology of these traits is probably not going to be resolved by standard trial design. I envisage that we will be trying to tackle the biology by combining the wealth of genome sequence data that we are going to get, almost by-the-by, with phenotype/envt data on a population basis.

The Big Questions are the easy ones – “Why is there Something rather than Nothing” is because “Nothing” cannot exist :-) Max Tegmark has good things to say about all that, but I don’t think he blogs or tweets… Which is a shame.
Dave Kaufman
09/06/2011 at 17:48

Uncovering significant gene-environment interactions will be of critical importance in developing our understanding of both risk factors and biological mechanisms of many diseases. Unfortunately, at this point the measurement of the majority of relevant environmental exposures at the level of precision needed makes WGS look like child’s play. Sequencing is decades ahead of comparable environmental measurement in terms of cost, feasibility and accuracy. We may be able to elucidate many of the major environmental factors (or their surrogates) without precise measurements, the way we found HD, CF, etc. using linkage studies. However, the science of environmental measurement is another significant barrier to be reckoned with.
Ken Rubenstein
28/09/2011 at 15:59

I would advise Adam to carefully read Denis Noble’s The Music of Life and then reconsider the centrality of DNA.
Shane McKee @shanemuk
28/09/2011 at 16:26

I haven’t read Noble, but if we are talking about heritability of traits, then we most certainly are talking primarily about DNA. The capacities for any epigenetic mechanisms to influence heritability appear to be a great deal lower than the gaps that need to be filled. It’s not that they are not there, nor even that they are unimportant, but epigenetic descriptions just aren’t up to the job of explaining most biological variation, including susceptibility to disease.
Ken Rubenstein
28/09/2011 at 17:05

The book goes well beyond that. Worth a read.