I’ve been following Razib Khan’s scholarly and analytical exploration of his family’s genetic history – using data from 23andMe – over at Gene Expression with increasing fascination. When last week he noted that his findings appeared to be (finally) converging on a consensus, I asked if he’d be willing to summarise his journey for Genomes Unzipped readers. Here it is. –DM.

I’ve always been interested in genetics, anthropology, and history. Many may perceive me to be a collector of obscure facts, but summing up infinitesimals does produce something substantial in aggregate. One of the most influential books in my life has been The History and Geography of Human Genes. So the shift from classical markers to uniparental lineages, and now to dense SNP chips, has been a boon for my intellectual interests, which lie in part at the intersection of history and population genetics.

However, I’ve never been deeply curious about the history of my own personal genome. I’m not adopted. All four of my grandparents were ethnic Bengalis, albeit from relatively diverse communal backgrounds. I look typically South Asian. Genealogy has never been a family fascination, and I’ll be honest and admit that until five years ago I didn’t even know the names of my grandparents (the Bengali language has distinct terms for maternal and paternal grandparents, so I never needed their names). Both sides of my family are from the Comilla district of Bengal, and that’s all I really cared about (and I didn’t care that much; I don’t put much stock in “heritage” as determinative).

As for the other yields of personal genomics, I was skeptical. My parents have many siblings, and I have many, many cousins. I already had a general sense of my disease risks from the pedigree of my family and their medical histories. Additionally, many of the risk alleles were identified in European study populations, and I wasn’t sure how well those inferences would carry over to other populations. And I won’t even address the fact that the effect sizes of many of the markers aren’t anything to write home about.

But last spring Daniel alerted me to the 23andMe “DNA Day” sale. It was affordable, and by that point enough readers of my weblog had been typed that I kept getting questions about my own background (e.g., my family has the title Khan, so there was a question as to whether I carried the “Genghis Khan haplotype”). So I bit. I recall emailing Dan at the time, excited that I’d be told I likely had brown eyes and was 75% “European” and 25% “Asian.” When my results came back, I was in for a mild surprise. The proportions to the left were calculated by 23andMe’s “ancestry painting” algorithm. As you can see, I’m more than 25% “Asian.” My initial reaction was that this seemed a touch high, but no worries: I would ask around and see which other South Asians had such a high value. After dozens of instances of “gene sharing,” the answer came back: none.

It seems that the “normal” range of “Asian” ancestry that the ancestry painting algorithm assigns to South Asians is 10-35%. The low values occur in people from the northwest of the Indian subcontinent, and the high values in those from the east and south. Only one person broke the magic 35% barrier…and he was Bangladeshi (38%). A quick review is in order. The paper Reconstructing Indian History offers a plausible model of why the 23andMe algorithm produces the values it does for South Asians. Basically, South Asians are a two-way admixture between a very European-like population and a marginally Asian-like population: “Ancestral North Indians” (ANI) and “Ancestral South Indians” (ASI). Under this two-way model, the ANI proportion declines from ~75% in the northwest of the Indian subcontinent to ~45% in the far south, and also from upper castes to lower castes. My initial hunch was that my elevated Asian ancestry was simply a function of having more ASI than the norm.
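The two-way model can be made concrete with allele frequencies. If a population is a mix of two sources in proportions α and 1 - α, its expected frequency at each SNP is α·p_ANI + (1 - α)·p_ASI, so α can be recovered with a one-parameter least-squares fit. The Python below is a toy sketch under that assumption only: the function name `two_way_admixture_proportion` and the source frequencies in the usage lines are hypothetical, and this is not the method used in Reconstructing Indian History, which had no unmixed ANI or ASI samples to work from.

```python
import numpy as np


def two_way_admixture_proportion(p_mix, p_a, p_b):
    """Least-squares estimate of alpha in p_mix = alpha * p_a + (1 - alpha) * p_b.

    Rewriting the model as (p_mix - p_b) = alpha * (p_a - p_b) turns it into
    a regression through the origin with a single coefficient.
    """
    p_mix = np.asarray(p_mix, dtype=float)
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    d_ab = p_a - p_b       # per-SNP difference between the two sources
    d_mix = p_mix - p_b    # how far the mixed population sits from source B
    alpha = np.dot(d_mix, d_ab) / np.dot(d_ab, d_ab)
    # A mixing proportion must lie in [0, 1].
    return float(np.clip(alpha, 0.0, 1.0))


# Hypothetical source frequencies at four SNPs, mixed 75:25.
source_a = np.array([0.9, 0.2, 0.6, 0.1])
source_b = np.array([0.3, 0.7, 0.2, 0.5])
mixed = 0.75 * source_a + 0.25 * source_b
print(round(two_way_admixture_proportion(mixed, source_a, source_b), 3))  # 0.75
```

With real data the frequencies are noisy, so the estimate is only approximate, and a third source (as turns out to matter below) would bias a fit that assumes only two.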

But there was reason to be skeptical of this. The image to the left is a PCA plot that places me on the distribution of a selection of Central/South Asian populations from the HGDP database. I’m the green flag, and all the black ones below and to my “southwest” are the South Asians with whom I share genes. These range from Punjabis to Tamils and Biharis. I am clearly an outlier. The only person close to me is the other Bangladeshi individual. I am shifted toward the Hazara/Uygur cluster. What does that mean? The Hazaras and Uygurs are hybrid populations, with East and West Eurasian ancestry. Given that Comilla district borders regions to the east inhabited by Tibeto-Burman tribes, an alternative to the hypothesis that I have exceptionally high levels of ASI is that I have a supplementary Tibeto-Burman element in my ancestry of which I was not aware. I immediately asked my parents about this. Photos of my father as a young man had always seemed to indicate that he had Mongoloid admixture. But neither of them knew of any such ancestry. Since 23andMe does not separate the signal produced by ASI from that of genuine East Asian ancestry, I was at an impasse.
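For readers curious about what sits behind a plot like this, the core computation is a principal component analysis of a genotype matrix. The sketch below assumes genotypes coded as 0/1/2 allele counts, centres each SNP and scales it by sqrt(p(1 - p)), similar in spirit to the normalization used by EIGENSOFT's smartpca, and then takes an SVD. The function name `genotype_pca` and the simulated populations are hypothetical; this is not 23andMe's or EIGENSOFT's actual code, and real analyses typically also prune linked SNPs and filter on data quality.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components of a genotype matrix.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    Returns an (n_individuals, n_components) array of PC coordinates.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0               # per-SNP allele frequency
    keep = (p > 0) & (p < 1)               # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    # Centre each SNP on its mean genotype (2p) and scale by sqrt(p(1 - p)).
    x = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))
    # X = U S V^T, so the projection X V equals U S.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Two simulated (hypothetical) populations with diverged allele frequencies.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.95, 500)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, 500), 0.05, 0.95)
sample = np.vstack([rng.binomial(2, freq_a, size=(20, 500)),
                    rng.binomial(2, freq_b, size=(20, 500))])
coords = genotype_pca(sample)
print(coords.shape)
```

In a plot of such coordinates, an individual with ancestry from a third, unsampled source tends to sit off the line joining the reference clusters, which is the kind of displacement toward the Hazara/Uygur cluster described above.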

That is, until projects like Dodecad, Eurogenes, and Harappa came on the scene. The table below is from Harappa, but the other two have noticed the same pattern (as have I, running ADMIXTURE myself):

I do have some Southeast Asian ancestry elevating my Asian quantum! So I returned to my suspicion about my father’s heritage. This was the rational move: my mother is a relatively light-skinned Bengali with no distinctively eastern aspects to her countenance. Additionally, she has a family oral history tracing her paternal grandfather to the cosmopolitan Muslim population of Delhi, with ultimate roots in the Middle East. My father’s mother’s family was from the conventional Bengali caste of Thakurs (many Muslims retain some caste identity after conversion). His father’s origins were not so clear. Without going into the details, there were aspects of their cultural background which indicated a non-Bengali origin. I had always assumed that this pointed to the west, as evidenced by the surname Khan, along with the fact that I carry the West Eurasian Y haplogroup R1a1a. But by a process of elimination I deduced that all I had thought, all my family had assumed, was false, and that perhaps my paternal grandfather’s family were from a group of Tibeto-Burmans who had converted to Islam (at least in part).

With that, I decided to genotype my parents, thanks to 23andMe’s holiday sale. While waiting for the results I wrote a long post inferring a paternal history tying me to the populations of Burma, based on numerous chromosomal segment matches with populations from Yunnan in China in the HGDP database. When I got the results back I found out that I was probably wrong. As it happens, my father is somewhat less Asian than my mother. Both the ancestry painting and PCA indicated this. I ran their data through ADMIXTURE and EIGENSOFT myself with various parameters…and over and over the results came back that my mother was more Asian than my father.

So I went back to the drawing board, and wrote a long post suggesting that my parents’ elevated Asian ancestry may have to do with the origins of the Muslims of eastern Bengal. I won’t bore you with the details, but the short of it is that these results, I think, tell me less about my own family than they do about the ethnogenesis of peoples on the eastern fringe of South Asia. My parents’ lack of oral history about Tibeto-Burman ancestors makes total sense in light of my revised model: if the admixture event was old, there is no reason to expect a cultural memory of it. By analogy, consider the low levels of African admixture in the contemporary Mexican population, which trace back to the 19th-century absorption of the colonial era’s slaves into that nation’s population. Today that history has been forgotten, and Mexicans self-identify as a “mestizo” society, that is, a mixture of Europeans and Amerindians.

Where does this leave me? I haven’t changed very much in terms of my own self-conception. At the margin, these shifts in my ancestral quanta have no metaphysical impact on me. But they do give me insight into broader models of ethnogenesis. I’m back where I began: scientific genealogy is simply not a great interest of mine, but historical population genetics is. My enthusiasm about Harappa is due in large part to the explicit clarification it has offered about my own ancestral background, and what that can tell us about South Asia.

More functionally, I did find out that I had brown eyes. One of my siblings was chagrined to learn that I was now aware of their dry earwax “condition” (their physician being unaware that more than 90% of Koreans have this “condition”). I am lactose intolerant. No surprise. Also, my family may carry a Neandertal variant of a haplotype on dystrophin. Very fun, but not too revelatory in the end.

But I think I’ll keep this hobby. It’s not that expensive. And with the proliferation of analytic tools there is always the possibility of squeezing more juice out of the fruit in the future.

