Our genomes, unzipped

When we launched this website back in June, I welcomed readers with a promise that Genomes Unzipped would “ultimately be much more than just a group blog”. Indeed, the last four months of blogging have really just been a prelude of sorts to what comes next: the real Genomes Unzipped.

Today we’re launching an exciting new phase of the project. Although we’re not entirely sure where this journey will take us, we’re looking forward to finding out – and to bringing you along with us.

What are we doing?

Over the last year, all the members of Genomes Unzipped have had genome scans performed by personal genomics company 23andMe; several of us have also had additional tests done by other genetic testing companies (Counsyl, deCODEme). From today, we’ll be making all of our raw genetic data and the reports generated from these tests freely available online. As the project proceeds, we aim to obtain data from an ever larger array of tests – ultimately extending to whole-genome sequencing – and release it openly. Right now you can freely download the 23andMe data from everyone in the project from this website.

Over the next few weeks, each of the members will be writing about their own experiences with genetic testing, and what they’ve learnt from their own genetic data. We’ll be discussing analyses we’ve performed on our own raw data, using software written both by group members and other collaborators; and we’ll be releasing the code for that software in our new code repository. We’ll also be talking about the process of deciding to release our genetic data publicly, and how we discussed this decision with our families.
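
For readers who want to poke at a raw data download themselves, here is a minimal sketch in Python of reading a 23andMe-style file. This is not the project's own software. It assumes the export is tab-separated text with four columns (rsid, chromosome, position, genotype), that comment and header lines start with "#", and that no-calls are written as "--". The function names and the sample rows are invented for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Call(NamedTuple):
    """One genotype call from a raw data file."""
    chromosome: str
    position: int
    genotype: str


def parse_raw_data(lines: Iterable[str]) -> Dict[str, Call]:
    """Parse 23andMe-style lines into a dict keyed by rsid.

    Assumed layout (not taken from the post): tab-separated
    rsid, chromosome, position, genotype; '#' starts a comment line.
    """
    calls: Dict[str, Call] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and the commented header block.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip anything that does not look like a data row.
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        rsid, chromosome, position, genotype = fields
        calls[rsid] = Call(chromosome, int(position), genotype)
    return calls


def no_call_rate(calls: Dict[str, Call]) -> float:
    """Fraction of markers with no genotype reported (assumed to be '--')."""
    if not calls:
        return 0.0
    missing = sum(1 for call in calls.values() if "-" in call.genotype)
    return missing / len(calls)


# Made-up rows standing in for a real download; the rsids are placeholders.
SAMPLE = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAA",
    "rs0000002\t1\t2000\tAG",
    "rs0000003\tX\t3000\t--",
    "",
]

sample_calls = parse_raw_data(SAMPLE)
print(len(sample_calls), "markers parsed")
print("no-call rate:", no_call_rate(sample_calls))
```

To run it against a real download, pass an open file handle instead of SAMPLE, for example parse_raw_data(open("my_genome.txt")), where the filename is a placeholder for wherever you saved the file.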

To make it easier for us (and you) to explore our genomes, we have assembled a custom genome browser using JBrowse – this provides a visual interface that allows our 23andMe (and later, complete sequence) data to be viewed in the context of genes and other features. It’s still in prototype form, but we’ll be refining it and adding more data as the project proceeds.

Why are we doing this?

When I first started thinking about a new group blog back in late 2009, the idea was fairly simple: put together a group of people who were experts in fields related to personal genomics, help them get access to their own genetic data, and create a platform for them to talk about what they found. I quickly joined forces with Luke and we refined the idea further.

As we discussed the notion of a group of experts analysing their own genomes, one thing rapidly became clear: for maximum public benefit the analyses had to be open and reproducible, and that meant making the underlying data public. In other words, for this to work, members of the group had to be ready to spill their genetic secrets to the world.

In September 2009 I had an opportunity to purchase a sizeable number of kits from one testing company (23andMe) at a discount, and quickly contacted a group of the smartest people I knew in genomics – all initially based in the Cambridge area – to take advantage of the offer. Thus the project was born; fittingly, our first meeting was held over pints of ale in The Eagle, the pub frequented by Watson and Crick during their early work on the structure of DNA.

Initially there was some discussion about various models for partial anonymity – not linking people’s names to their data, or allowing people to write under a pseudonym, for instance. However, given there was no way for us to guarantee that we could protect the identity of the participants once their data were released, we decided that the only viable solution was for members to write under their own full names from the outset, and to have their genetic data transparently linked to their identity.

Remarkably, despite being given every opportunity to change their minds, nearly everyone I approached to join the group still chose to go ahead and share their genetic data online. We all made that decision for a variety of reasons, but there are some common threads:

  • we want to share the results of scientific analysis of our own genomes, and as proponents of open data access most of us believe that doing good science means releasing complete data for others to investigate;
  • we hope that releasing our data publicly will help to guide useful discussions about genetic privacy and the benefits, risks and limitations of genetic information in general;
  • some of us believe that large open-access, non-anonymous research databases such as the Personal Genome Project represent an ideal resource for genetic research, and that sharing genetic information with the scientific community is a public good – and we hope that our own experiences will spark discussion about the risks and benefits of open research projects;
  • we all believe that many of the fears expressed about the dangers of genetic information are exaggerated, and see this project as an opportunity to have a constructive public discussion about the truth behind these fears;
  • given the ease with which a dedicated snoop could obtain genetic information surreptitiously (via shed skin, hair or saliva, for instance), some of us argue that the whole notion of genetic privacy is illusory anyway – while releasing our data online makes it easier for people to get hold of it, this is a difference of degree rather than kind.

What about the risks?

We’re going into this process with our eyes wide open. Everyone in the group has a sound background knowledge of genetics: we know the sorts of things that can be found in a genome, and what such discoveries can mean for individuals and their families. However, like others willing to share their genetic data – such as the participants in the Personal Genome Project (PGP) – we simply feel that the potential benefits of this project outweigh its potential harms.

To ensure that everyone in the group is making a fully informed decision, we’ve put together a lengthy informed consent document (PDF; modified from the consent forms used for the PGP) that lays out the risks and issues involved in disclosing genetic information publicly. This document explains exactly what we’re putting on the line here: anyone in the world can now access our genetic data and infer information about our disease risks and our genetic relationships with other people, and it’s possible to imagine all sorts of ways in which that knowledge could be abused.

It also explains that these risks could apply to our families: our published data could be used, for instance, to infer the risk of serious disease in our parents, siblings or children. We have encouraged all of the members of the project to discuss these issues with their first-degree relatives to ensure they are as fully aware of the potential risks as possible.

Finally, the document points out that there is the class of unknown unknowns: risks that no-one knows about yet. To the best of our ability, the members of the group have weighed up all of these uncertainties and decided to go ahead.

We can’t fully predict what the future holds, but there are good reasons to be optimistic that the risks of disclosing genetic information will be minimal. As the passing of the Genetic Information Nondiscrimination Act (GINA) in the US shows, in Western countries there is strong public opposition to the idea of unfair discrimination against individuals on the basis of genetic information. While this opposition hasn’t yet been codified into law outside the US, there’s every reason to expect that individuals who try to abuse genetic information will ultimately pay a high legal or social price.

As a group we expect that our genomes will be joined by many, many others in the public domain over the next few years. As the sheer power of open databases of genetic and medical data becomes clear, we anticipate that participating in such studies will be increasingly viewed as something of a moral imperative. Already there are over 10,000 individuals signed up for disclosure of their genetic data as part of the Personal Genome Project, and that number is growing fast.

As we move towards a world where thousands of people release their genomes into the public domain, someone has to be the guinea pig. We take comfort from the fact that others have already paved the way for this project: Craig Venter, James Watson, and the members of the PGP-10, for instance. Like them, we feel we are well-equipped with the knowledge required to respond to any serious consequences that arise as a result of genetic disclosure. We hope that our experiences and those of other early disclosers will provide valuable lessons for those who follow.

What next?

Over the next few weeks we’ll be discussing the ways in which we’ve peered into our own genetic data, and providing you with some of the tools and background knowledge you would need to do the same. You’ll have a chance to ask people who work with genetic information for a living what they’ve gleaned from their own genomes.

Moving forward, we hope that we can use our own data as a resource for developing new tools for analysing personal genetic data. In addition to the data of core group members, pending further investigation into the legal and ethical obstacles, we plan to consider hosting data from others who are also willing to share their genomes. We will also be releasing the software for the analyses we perform here for others to use and modify, and will welcome submission of other people’s programs to the GNZ code repository. Ultimately, we hope that we can become a hub for a diverse community of people interested in building and using tools for exploring their own DNA.

We will continue to explore the personal genomics marketplace, obtaining and reviewing new products and services as resources permit. For the time being we are a collection of like-minded individuals with limited funding (if you’d like to help remedy that, please let us know). Our commitment to openness extends to our relationships with personal genomics companies and with our funders: our disclosures page will contain full information about how we have obtained personal genomics products and services and from whom we have obtained funding.

We will also be posting invited commentary from external experts in genomics, bio-ethics, philosophy and law about the issues surrounding open genetic data release, and around genetic testing in general. And importantly, we’ll be looking for help from you, our readers: by participating in discussion, suggesting new analyses, testing the software and resources we describe here, and contributing your own tools, you can help build the dynamic community we want this website to become.

Thanks

This post is already pretty long, but I couldn’t finish without broadcasting sincere thanks to a number of people for helping the project get to this stage.

Firstly, to all the members of Genomes Unzipped, most of all to Luke, who has worked tirelessly on every aspect of this project, and is almost single-handedly responsible for getting the website up and running; also particularly to Dan, who has made sure we steered clear of legal and ethical minefields and drafted the project’s informed consent form; Kate, who worked hard on the project’s internal site and designed the website banner; Caroline, always quick with practical feedback and advice; and my wife Ilana, for many hours of discussion about the project’s goals and logistics, and for agreeing to contribute her own genetic data. Special thanks also to Joe for working with Luke on the genome browser.

Outside the project, we were fortunate to receive guidance from many wise individuals. We are extremely grateful to Zoe McDougall from Oxford Nanopore for her incredibly useful advice on many diverse topics over the last year. We are also indebted to Mark Henderson from the Times for many useful discussions, and to David Hooper from Reynolds Porter Chamberlain LLP and Alison Hall from the PHG Foundation for informal legal advice. We’d also like to thank the PHG Foundation for their generous grant of £600 to build and maintain the website.

Finally, thanks to the readers who’ve joined us over the last four months while we honed our blogging skills, sharpened up the website and prepared for phase 2. We’ve enjoyed the discussions we’ve had with you – and we look forward to even more fruitful discussion as the project moves into this new era.

28 Responses to “Our genomes, unzipped”


  • Was that meeting in the Eagle by design or does it just provide a nice bit of revisionist history? :)

  • Nice! I think I still owe Joseph an email. The human demo at jbrowse.org was knocked offline by a disk failure, so I’ve been scrambling to get that back up, and I had to put responding to mailing list posts on the back burner for a few days.

    I see that the RefSeq genes track does have names now, though; maybe he got it working on his own?

    It’s very cool work that y’all are doing!

  • What an amazing project, I’m glad I came by your (group) blog.

    I would like to offer some help in the following way.

    1) Data analysis competitions – Please have a look at this website:
    http://kaggle.com/
    There are others like it.
    The premise will be perfect for both publicizing and analyzing your project and data.

    2) collaborations – I believe institutions of trans-humanist groups (like H+, mprize and others) could find interest (and thus, offer various kinds of support) to your project. Although you had probably thought about that already.

    3) Facebook marketing – I encourage you to go “social” on your blog. The simplest way will be to create a facebook “page” for your project, and put it on the sidebar.

    4) R – I know there are people in the R community (e.g: people who use the software on r-project.org) who are into genetic analysis. If you would get some example on analyzing your data using R, I’d love to republish such posts (either on my personal blog: r-statistics.com, and/or by adding that feed into r-bloggers.com, which has many many more readers)

    I admire your work – please keep it up!
    Best,
    Tal

  • p.s: please consider adding the WP plugin
    “subscribe to comments now!”

    Best,
    Tal

  • Mitch,

    Thanks again for your tips on setting up the browser; JBrowse is a brilliant piece of software. We were able to get a mostly-functional prototype up and running within about a day of downloading it, which is a testament to the fantastic work you guys have done in making it easy to understand and use.

    And yes, I figured out how to get the gene names by the refseq genes; turns out it wasn’t so hard after all :)

  • Perhaps we’ll have to take a swing at your data :)

    Great news!

  • Daniel MacArthur

    n/a,

    Your spreadsheet is locked for viewing (at least to me).

  • The link works for me (and I’m not the creator of the spreadsheet). Another link: http://bit.ly/cNDHf3

    I copied and pasted the contents here, but that post is probably caught in your spam filter.

  • Daniel MacArthur

    Worked for me that second time, thanks. Once we’ve resolved the ethical/legal issues we’ll be hoping to recruit some of these people into the club…

  • Daniel,

    Count me in. Just tell me where to sign.

    John

  • Daniel-

    PIONEERING IDEA; the whole concept is brilliant! Rationale behind blog TITLE now becomes clear. I’m looking forward WITH ENTHUSIASM to future GNZ posts. IMO, this is PGP for the rest of us…

    (BTW- got halfway through the PGP app and decided not to go through with it after all)

    Bob

  • …and special thanks for sharing your Counsyl report with us! Very useful for MedEd 2.0.

    BW

  • Good stuff – you need to work on this bit though: “We can’t fully predict what the future holds…”

    Apart from that as soon as I get any significant genetic data I’ll join in – you all need to start looking for some serious funding for all the research you could do once you get several thousand uploaded!

  • I appreciate your will to change the rules with this idea, but IMHO it is quite risky to have your data online, most of all because we have no idea of which discoveries could be made in the future and how these discoveries would be used. Those data would be in the internet forever, I wouldn’t have done it. I love reading this blog, but in this case I don’t agree.

  • When do you think you’ll be ready to accept additional data? I would be most interested in submitting my own (23andMe).

  • I was looking at Dienekes’ site this morning. For some reason, Joe Pickrell, Daniel MacArthur, and Jeff Barrett come up as having some Ashkenazi Jewish ancestors. Do any of you know of any Ashkenazi Jewish ancestors?

    Vincent Plagnol also comes up with significant Jewish ancestry, yet he doesn’t know of any.

    Comments?

  • So, can others who have had their genomes tested at 23andMe or elsewhere offer to have our genomes made public on this site and available for viewing and comparison? Will this truly be public? If so, I volunteer. Where do I send my raw data?

  • This is very interesting. I did not see any mention of licensing. What exactly do you mean by “freely available”? Perhaps a Creative Commons license would make sense here?

  • @Moreno – I’m sure that you are not alone, it’s the “unknown unknowns” Daniel mentions (why Rumsfeld was ridiculed for that I don’t know, one of the smartest things he said).

    I would not have any problems but when I saw the DNACALC examination on http://dienekes.blogspot.com/ it made me think a bit more. Ancestry is not a problem for me (although it could easily be for some) – but it would be a bit unsettling if my name was on that list, and all the rest to follow. Having said that I would still go ahead.

    Questions for the Unzipped (never to be zipped up again) group: Jim Watson famously did not want his APOE status revealed to himself or anyone else. So from the data you made public so far a) can the APOE – Alzheimers status be determined and b) would any of you have any problems with it for yourselves or family if it was made public?

  • Daniel MacArthur

    To those volunteering to share their own data: stay tuned, we’re currently working through the ethical/legal implications and will be getting back to you as soon as we can. This won’t be something we’ll be able to offer in the next few weeks, however: there’s lots of consultation and advice-seeking we need to do first.

    Markus, our data will be available under a CC0 license – you’re correct that we haven’t made that obvious yet, we’ll remedy this ASAP.

    Moreno: this certainly isn’t for everyone. We’ve weighed up the risks as best we can and decided that they are outweighed by the potential benefits. Only time will tell if we are right!

  • Questions for the Unzipped (never to be zipped up again) group: Jim Watson famously did not want his APOE status revealed to himself or anyone else

    I never understood Jim Watson’s position on this–dude’s like 80 years old and is still pretty sharp, so I’m not sure exactly what he was worried about.

  • @Daniel I hope time will tell you are right! This is one of the rare cases in which I hope to be wrong. :-) Good luck!

  • @Keith:

    Questions for the Unzipped (never to be zipped up again) group: Jim Watson famously did not want his APOE status revealed to himself or anyone else. So from the data you made public so far a) can the APOE – Alzheimers status be determined and b) would any of you have any problems with it for yourselves or family if it was made public?

    Answer: yes, I would want to know. It was also something I specifically discussed with my family.

  • Daniel MacArthur

    Hi Keith,

    Like Dan, I’d want to know, and I have no intention of following Watson’s example in redacting that portion of my genome.

    23andMe doesn’t explicitly test for APOE, but I gather it can be inferred to some extent – we’ll have more about this soon.

  • FWIW, I had my APOE status determined by Navigenics (via Scripps Genomic Health Initiative), and while I was certainly happy to see that I am E3/E3, it would NOT have deterred me from reporting if it was instead E4/E4. In Jim Watson’s case (above), maybe he was trying to honor the privacy of a family member (e.g. his son).

    I have far less difficulty in a public release of my genotype than of my complete health history. If not for the latter requirement, I would have joined the PGP-10K. That issue, along with the potential impact on family members, especially my daughters, is what prompted me to reconsider.

    I think Keith and Moreno raise good points. Nevertheless, I laud the Unzipped having the guts to go forward with this.

  • I’ll gladly throw my 23andMe and deCODE data into the mix as they have been freely available online for a couple of years now at http://www.genomealberta.ca/PersonalGenotyping/ Navigenics is also completed and should be added to the list soon and DNA Ancestry.com is lurking out there as well.
    I tried to get in on the PGP action but alas, Americans have all the fun!

    For those a little shy about peering too deeply into other people’s genes or who don’t want to share the real thing we also have 24 virtual genes to share at http://facebook.genomealberta.ca/en/cards.html
    More coming there as well – funding permitting. 15,000 or so have been shared to date and we’d like to keep ‘em coming.

    Anyone likely to be at the NASW get together next month in New Haven? A congratulatory toast is on me as oddly enough our virtual genes came to life the same way as yours though not in quite so notable a pub.
    Good luck with the project,

    Mike

  • Cool article, great job, helpful information. Thanks.

