In my last post, I discussed how I used 23andMe data to test hypotheses about my ancestry. In particular, I was intrigued by Dienekes Pontikos’s result suggesting that I (and my colleague Vincent) might be partly Ashkenazi Jewish. Ultimately, however, I concluded that his algorithm was not properly modeling my southern European ancestry (inherited from one Italian grandparent), and that this was leading to a spurious result.
I was wrong.
What did I conclude previously?
Let’s quickly recap my previous discussion: Dienekes’s program models individuals as being a mixture of Ashkenazi Jewish, northwest European, and southeast European ancestry. People, like Vincent and myself, who are not fully descended from those three populations will pose problems for this algorithm. I thought it was unlikely to be a coincidence that Vincent and I were the only two people to get confusing results. Indeed, when I included Italian and French individuals in my analysis, I saw no clear evidence for any Ashkenazi ancestry on either of our parts. Mystery solved.
Some inconsistencies revealed
After I published that previous post, however, a couple of things came up that seemed incongruous. First, a commenter recommended that I check out the Ancestry Finder tool on 23andMe. This tool identifies large segments of your genome that perfectly match the genomes of other people of known ancestry. If, for example, parts of my genome perfectly matched an individual who knows for a fact that s/he is Ashkenazi, this would be pretty strong evidence that those parts of my genome were descended from someone who was also Ashkenazi. Indeed, this is what I found: a moderate proportion (3-30%) of my genome does appear to be of recent Ashkenazi ancestry in this analysis. I was skeptical about this, but on reflection, I couldn’t come up with a good reason that this result would be spurious.
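To make the idea of "large matching segments" concrete, here is a toy sketch in Python. It is not how Ancestry Finder is implemented (I don't know its internals); it assumes genotypes coded as 0, 1, or 2 copies of an allele, and treats two people as compatible at a SNP unless they are opposite homozygotes. The function name and the `min_len` threshold are made up for illustration.

```python
def shared_segments(g1, g2, min_len):
    """Find long runs of SNPs where two people could share a chromosome copy.

    g1, g2: lists of genotypes coded 0/1/2 (copies of one allele), same SNP order.
    min_len: minimum run length, in SNPs, to report (an illustrative threshold).
    Returns a list of (start, end) index pairs, with end exclusive.
    """
    n = min(len(g1), len(g2))
    segments = []
    start = None
    for i in range(n):
        a, b = g1[i], g2[i]
        # Opposite homozygotes (0 vs 2) cannot share an allele at this SNP.
        compatible = not ((a == 0 and b == 2) or (a == 2 and b == 0))
        if compatible:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the data.
    if start is not None and n - start >= min_len:
        segments.append((start, n))
    return segments


me = [0, 1, 2, 2, 0, 1, 1, 2]
other = [0, 1, 1, 0, 0, 0, 1, 2]
print(shared_segments(me, other, min_len=3))
```

In real data the runs of interest span thousands of SNPs, and lengths are usually measured in genetic distance rather than SNP counts, so treat this only as a picture of the logic.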
Second, Dienekes followed up on his analysis of the ancestry of the GNZ participants with a much larger data set, including individuals of southwest European descent. As expected, when including more data, there was no evidence that Vincent has any Ashkenazi ancestry. Unexpectedly, this was not true for me—even in this larger analysis, the evidence for Ashkenazi ancestry didn’t disappear.
I followed up on this using a similar approach to Dienekes. I used the same dataset I assembled previously: a set of European populations from the Human Genome Diversity Panel, a set of Ashkenazi individuals, and the GNZ data. This time, instead of using principal components analysis (which averages information across the entire genome), I used the model implemented in the program admixture (Alexander et al. 2009), which models individuals as mixtures of different populations. With this model and these data, it’s relatively easy to find a component of ancestry that is essentially unique to the Ashkenazi population (details of the run are in the note at the end of this post). Below, I’ve plotted (in red) the estimated fraction of Ashkenazi ancestry for a subset of individuals from this analysis. As you can see, there are two GNZ individuals with any red: Dan (who knows he is fully of Ashkenazi descent) and, surprisingly, myself. Combined with the Ancestry Finder results, there are two possibilities: either all the algorithms are getting confused (one can imagine situations where this would be the case) or I’m confused myself.
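For readers who want to see what "modeling individuals as mixtures of populations" means, here is a minimal Python sketch of the kind of likelihood that admixture-style models maximize. It only evaluates the likelihood for given inputs; it does not do the fitting that the admixture program does, and the function name and toy numbers are my own assumptions. Each individual's expected allele frequency at a SNP is the ancestry-weighted average of the population frequencies, and the genotype is treated as two binomial draws from it.

```python
import math


def admixture_loglik(genotypes, q, p, eps=1e-9):
    """Log-likelihood of genotypes under a simple mixture-of-populations model.

    genotypes[i][j]: copies (0/1/2) of an allele for individual i at SNP j.
    q[i][k]: fraction of individual i's ancestry from population k (rows sum to 1).
    p[k][j]: frequency of the allele at SNP j in population k.
    The constant binomial coefficient is omitted.
    """
    total = 0.0
    for g_row, q_row in zip(genotypes, q):
        for j, g in enumerate(g_row):
            # Individual-specific allele frequency: ancestry-weighted average.
            f = sum(q_ik * p_k[j] for q_ik, p_k in zip(q_row, p))
            f = min(max(f, eps), 1.0 - eps)  # keep log() finite
            total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total


# Toy example: two populations with opposite frequencies at two SNPs,
# and one individual who is heterozygous at both.
p = [[0.9, 0.1], [0.1, 0.9]]
genotypes = [[1, 1]]
print(admixture_loglik(genotypes, [[1.0, 0.0]], p))  # all ancestry from population 0
print(admixture_loglik(genotypes, [[0.5, 0.5]], p))  # an even mixture fits better
```

The point of the toy example is that a heterozygous individual is better explained as a mixture than as a pure member of either population, which is the intuition behind reading ancestry fractions off the fitted model.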
As I was mulling over these sorts of issues, I sent the link to my previous analysis to a family member. I didn’t really expect this person to find it that interesting, but hey, you never know. I then got a phone call. I’ll summarize a couple of days’ worth of moderate confusion, second-hand reports of conversations with distant relatives, and family intrigue with this: as it turns out, one of my great-grandparents was indeed a Polish Ashkenazi Jew who immigrated to the United States around the turn of the century. I, obviously, was completely unaware of this.
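For a sense of scale, here is the back-of-the-envelope expectation for a single great-grandparent, assuming no other Ashkenazi ancestors. Each generation halves the expected contribution, and the share actually inherited varies around this value because of recombination.

```latex
\mathbb{E}[\text{fraction from one great-grandparent}]
  = \left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{8} = 12.5\%
```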
So to conclude, a tip of my hat to Dienekes and everyone else who looked at these data—this has been the first genuinely unexpected thing to come out of my genetic data.
Reference: Alexander et al. (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Research. doi: 10.1101/gr.094052.109
Note on the admixture run: I first thinned the data to remove SNPs in strong linkage disequilibrium. I then ran admixture using K=8, 9, and 10, looking for an ancestry component essentially specific to the Ashkenazi population. The program finds one at K=9. Plotted in red in the figure is the fraction of each individual’s ancestry predicted to be from this population (which I’m interpreting as Ashkenazi). I ran this on all the individuals, but am only plotting the GNZ individuals, and the Ashkenazi and Italian populations for comparison.
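One way to make "an ancestry component essentially specific to the Ashkenazi population" operational is sketched below in Python. This is an illustrative helper, not the procedure used for the figure: it assumes you already have per-individual ancestry fractions (one row per individual, one column per component, as in the Q matrix that admixture outputs) plus a population label for each row, and it scores each component by how much higher its average is in the target population than in any other. The function name, labels, and numbers are hypothetical.

```python
def population_specific_component(q_rows, labels, target):
    """Pick the ancestry component most specific to one population.

    q_rows[i][c]: fraction of individual i's ancestry assigned to component c.
    labels[i]: population label for individual i.
    target: the population of interest; at least one other population is assumed.
    Returns (component_index, score), where score is the target population's mean
    for that component minus the highest mean among the other populations.
    """
    k = len(q_rows[0])
    sums, counts = {}, {}
    for row, pop in zip(q_rows, labels):
        acc = sums.setdefault(pop, [0.0] * k)
        for c in range(k):
            acc[c] += row[c]
        counts[pop] = counts.get(pop, 0) + 1
    means = {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}

    best_c, best_score = None, float("-inf")
    for c in range(k):
        others = [means[pop][c] for pop in means if pop != target]
        score = means[target][c] - max(others)
        if score > best_score:
            best_c, best_score = c, score
    return best_c, best_score


# Made-up fractions for four labeled reference individuals and two components.
q_rows = [[0.90, 0.10], [0.80, 0.20], [0.10, 0.90], [0.05, 0.95]]
labels = ["Ashkenazi", "Ashkenazi", "Italian", "Italian"]
component, score = population_specific_component(q_rows, labels, "Ashkenazi")
print(component, round(score, 3))
```

Repeating this for each value of K and checking whether any component gets a score close to 1 is one way to see at which K such a component first appears.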