Archive for the 'Analysis' Category

Identifying targets of natural selection in human and dog evolution

Over the course of the past year or so, I’ve been working (with Jonathan Pritchard) on a statistical method for learning about the history of a set of populations from genetic data. Much of this work is described in a paper we recently made available as a preprint [1]. However, as many readers will know, writing a paper involves deciding which results are important to the main point (and worth fleshing out in detail), and which aren’t. In this post, I’m going to describe some results and thoughts that didn’t quite make the cut, but which I think merit a small note. In particular, I’m going to discuss how having a demographic model for a large number of populations might be used to identify genes important in adaptation, and describe results from humans and dogs.

Background

Imagine you have genome-wide genetic data (from SNP arrays, genome sequencing, or whatever) from a number of populations in a species. A common way to visualize the relationship between your populations is to use a tree. For example, below I’ve built a tree of the 53 human populations from the Human Genome Diversity Panel (using the data from Li et al. [2]).


Maximum likelihood tree of 53 human populations built using TreeMix.


Notes on the evidence for extensive RNA editing in humans

UPDATE 3/17/12: A more extensive analysis of the paper discussed in this post is here. Several groups have concluded that at least 90% of the sites identified are technical artifacts.

The “central dogma” of molecular biology holds that the information present in DNA is transferred to RNA and then to protein. In a paper published online at Science yesterday, Li and colleagues report a potentially extraordinary observation: they show evidence that, within any given individual, there are tens of thousands of places where transcribed RNA does not match the template DNA from which it is derived [1]. This phenomenon, called RNA editing, is generally thought to be limited (in humans) to conversions of the base adenosine to the base inosine (which is read as guanine by DNA sequencers), and occasionally from cytosine to uracil. In contrast, these authors report that any type of base can be converted to any other type of base.

If these observations are correct, they represent a fundamental change in how we view the process of gene regulation. However, in this post I am going to point out a couple of technical issues that, if not properly taken into account, have the potential to cause a large number of false positives in this type of data. The main point can be summarized like this: RNA editing involves the production of two different RNA and/or protein sequences from a single DNA sequence. To infer RNA editing from the presence of two different RNA and/or protein sequences, then, one must be very sure that they derive from the same DNA sequence, rather than from two different copies of the DNA (due to, for example, paralogs or copy number variants). Although this issue has the potential to be a large source of false positives in a study like this, I will discuss an additional technical problem that could also result in false positives.


My Genome Online – A Challenge To You

[Editor's Note: This guest post is contributed by Blaine Bettinger. Blaine is the author of The Genetic Genealogist, a blog that examines the intersection of genetics and ancestry, and a patent attorney at Bond, Schoeneck & King in Syracuse, NY.]

As you may have heard, I recently made my 23andMe and Family Tree DNA autosomal testing results available for download online at “mygenotype,” and dedicated the information to the public domain (if dedicating DNA sequence to the public domain is even possible – I’m currently doing some research in this area and expect to write more in the future). [Editor's Note: see additional comments on personal genomics data in the public domain at the end of this post.]

At “mygenotype” you can download the following:

My Family Tree DNA Results:

  1. Affymetrix Autosomal DNA Results (2010)
  2. Affymetrix X-Chromosome DNA Results (2010)
  3. Illumina Autosomal DNA Results (2011)
  4. Illumina X-Chromosome DNA Results (2011)

My 23andMe Results:

  1. V2 Results (2008)
  2. V3 Results (2010)
  3. Y-DNA Results (2010)
  4. mtDNA Results (2010)

You can also find my SNPedia Promethease reports:

In addition to my genome, Razib Khan of Gene Expression has a spreadsheet of approximately 48 other genomes that are available for download online.

A Challenge To YOU

Now that the information is out there, available to anyone, it remains to be seen who might be interested in it.


At odds with disease risk estimates

It's all a game of Risk!

The first thing I did when I received my genotyping results from 23andMe was log on to their website and take a look at my estimated disease risks. For most people, these estimates are one of the primary reasons for buying a direct-to-consumer (DTC) genetics kit. But how accurate are these disease risk estimates? How robust is the information that goes into calculating them? In a previous post I focused on how odds ratios (the ratio of the odds of disease if allele A is carried as opposed to allele B) can vary across different populations, environments and age groups and, as a consequence, affect disease risk estimates. It turns out that even if we forget about these concerns for a moment, getting an accurate estimate of disease risk is far from straightforward. One of the primary challenges is deciding which disease loci to include in the risk prediction, and in this post I will investigate the effect this decision can have on risk estimates.
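
As a rough illustration of the odds ratio defined above, the sketch below computes one from a two-by-two table of case and control counts. The counts and the function name `odds_ratio` are invented for illustration; they are not taken from 23andMe or from any study of a real disease.

```python
def odds_ratio(cases_a, controls_a, cases_b, controls_b):
    """Odds of disease among carriers of allele A divided by
    the odds of disease among carriers of allele B."""
    odds_a = cases_a / controls_a  # odds of disease given allele A
    odds_b = cases_b / controls_b  # odds of disease given allele B
    return odds_a / odds_b


# Hypothetical counts, for illustration only.
example = odds_ratio(cases_a=30, controls_a=70, cases_b=20, controls_b=80)
print(round(example, 2))
```

An odds ratio above 1 means that carrying allele A is associated with higher odds of disease than carrying allele B. The same calculation done in a different population, environment or age group can give a different value.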

To help me in my quest, I will use ulcerative colitis (UC) as an example throughout the post, estimating Genomes Unzipped members’ risk for the disease as I go. Ulcerative colitis is one of two common forms of autoimmune inflammatory bowel disease, and I have selected it not on the basis of any special properties (either genetic or biological) but because I am familiar with the genetics of the disease, having worked on it extensively.

The table below gives our ulcerative colitis risks according to 23andMe. The numbers in the table represent the percentage of people 23andMe would expect to suffer from UC given our genotype data (after taking our sex and ethnicity into account). The colours highlight individuals who fall into 23andMe’s “increased risk” (red) or “decreased risk” (blue) categories based on comparisons with the average risk (males: 0.77%; females: 0.51%). As far as I am aware, none of us actually suffers from UC.

