[Last week, Ed Yong at Not Exactly Rocket Science covered a paper positing an association between a genetic variant and an aspect of social behavior called prosociality. On Twitter, Daniel and Joe dismissed this study out of hand due to its small sample size (n = 23), leading Ed to update his post. Daniel and Joe were then contacted by Alex Kogan, the first author of the study in question. He kindly shared his data with us, and agreed to an exchange here on Genomes Unzipped. In this post, we expand on our point about the importance of sample size; Alex’s reply is here.
Edit 01/12/11 (DM): The original version of this post included language that could have been interpreted as an overly broad attack on more serious, well-powered studies in psychiatric disease genetics. I've edited the post to reduce the possibility of collateral damage. To be clear: we're against over-interpretation of results from small studies, not behavioral genetics as a whole, and I apologise for any unintended conflation of the two.]
In October of 1992, genetics researchers published a potentially groundbreaking finding in Nature: a genetic variant in the angiotensin-converting enzyme (ACE) gene appeared to modify an individual’s risk of having a heart attack. This finding was notable at the time both for the size of the study, which involved over 500 individuals from four cohorts, and for the effect size of the identified variant: in a population initially identified as low-risk for heart attack, the variant had an odds ratio of over 3 (with a corresponding p-value less than 0.0001).
Readers familiar with the history of medical association studies will be unsurprised by what happened over the next few years: initial excitement (this same polymorphism was associated with diabetes! And longevity!) was followed by inconclusive replication studies and, ultimately, disappointment. In 2000, 8 years after the initial report, a large study involving over 5,000 cases and controls found absolutely no detectable effect of the ACE polymorphism on heart attack risk. In the meantime, the same polymorphism had turned up in dozens of other association studies for traits ranging from obstetric cholestasis to meningococcal disease in children, virtually none of which have ever been convincingly replicated.
The ACE story is not unique; time and time again, initial reports of associations between candidate genes and complex diseases failed to replicate in subsequent studies. With the benefit of hindsight, the problem is clear: in general, common genetic polymorphisms have very small effects on disease risk. Detecting these subtle effects requires studying not dozens or hundreds, but thousands or tens of thousands of individuals. Smaller studies, which had no power to detect these small effects, were essentially random p-value generators. Sometimes the p-values were “significant” and sometimes not, without any correlation to whether a variant was truly associated. Additionally, since investigators were often looking at only a few variants (often just one!) in a single gene that they strongly believed to be involved in the disease, they were often able to subset the data (splitting males and females, for example) to find “significant” results in some subgroup. This, combined with a tendency to publish positive results and leave negative results in a desk drawer, resulted in a conflicting and confusing body of literature which actively retarded progress in medical genetics.
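The “random p-value generator” point is easy to demonstrate with a quick simulation. The sketch below is illustrative and not from the original post: it assumes a true effect of 0.1 standard deviations on a quantitative trait (a size typical of common variants on complex traits) and compares a small study to a large one. The small study almost never detects the effect, so any “significant” p-value it produces is essentially a lottery win rather than evidence.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_pvalues(n, effect, n_sims=1000):
    """Simulate association tests comparing carriers vs. non-carriers
    of a variant with a true (but small) effect on a quantitative trait."""
    pvals = []
    for _ in range(n_sims):
        carriers = rng.normal(effect, 1.0, size=n // 2)
        noncarriers = rng.normal(0.0, 1.0, size=n // 2)
        pvals.append(stats.ttest_ind(carriers, noncarriers).pvalue)
    return np.array(pvals)

# Hypothetical true effect: 0.1 standard deviations.
small = simulate_pvalues(n=100, effect=0.1)    # underpowered study
large = simulate_pvalues(n=10000, effect=0.1)  # well-powered study

print(f"fraction 'significant' at n=100:   {np.mean(small < 0.05):.2f}")
print(f"fraction 'significant' at n=10000: {np.mean(large < 0.05):.2f}")
```

The small study detects the (real!) effect only slightly more often than the 5% false-positive rate expected under the null, while the large study detects it almost every time.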
The problems that plagued underpowered candidate gene association studies, also endemic to other fields of science, are eminently soluble – in a sane world they should now be well behind us. And indeed, in the last four years the genetics community has identified thousands of associations between genetic variants and disease that consistently and robustly replicate, thanks to the crucial innovation of genome-wide association studies done on thousands of individuals. These studies were well powered to find tiny effects, and were not constrained to a particular starting hypothesis about which bits of the genome are involved in any particular disease.
Given this progress we find it frustrating to see researchers making two-decade-old mistakes today. Consider the paper in question by Alex Kogan and colleagues. The authors took a highly-studied candidate gene (the oxytocin receptor) and tested for association between a genetic variant in this gene and a trait called prosociality in a sample of 23 individuals. In light of what we know about complex trait genetics, this study design is hopelessly underpowered. If the effect sizes of genetic variants on relatively well-defined traits like diabetes and heart attack are small, the effect sizes of genetic variants on less well-defined traits like prosociality must be even smaller. This observation has been reinforced many times by the correlation between how easy it is to clinically define a disease or trait and how successful the GWAS approach has been. This study has produced a random p-value, perhaps “significant” (in this case, standard linear regression gives a p-value of 0.03), but ultimately meaningless.
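To put a rough number on “hopelessly underpowered”: a standard power calculation for a correlation-style test (a sketch using the Fisher z approximation; the assumed effect size of r ≈ 0.1, i.e. a single variant explaining about 1% of trait variance, is our illustrative assumption, not a figure from the paper) shows how little a sample of 23 can detect.

```python
import numpy as np
from scipy import stats

def correlation_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test for a Pearson correlation r
    at sample size n, via the Fisher z transformation."""
    z = np.arctanh(r) * np.sqrt(n - 3)
    crit = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.sf(crit - z) + stats.norm.cdf(-crit - z)

def n_for_power(r, power=0.8, alpha=0.05):
    """Sample size needed to detect correlation r at the given power."""
    za = stats.norm.ppf(1 - alpha / 2)
    zb = stats.norm.ppf(power)
    return int(np.ceil(((za + zb) / np.arctanh(r)) ** 2 + 3))

# Illustrative effect: a variant explaining ~1% of trait variance (r ~ 0.1).
print(f"power at n=23:          {correlation_power(0.1, 23):.2f}")
print(f"n needed for 80% power: {n_for_power(0.1)}")
```

Under these assumptions, a study of 23 individuals has well under 10% power – barely above the 5% false-positive rate – and the conventional 80% power would require a sample size in the high hundreds.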
The Kogan et al. study is just a symptom, in our opinion, of a habit among some research groups of ignoring the recent history of genetics. These groups continue to publish small studies on “sexy” genes like the serotonin transporter (known sometimes as “the depression gene”; this moniker is indeed depressing) and MAOA (unfortunately also known as the “warrior gene”). By historical analogy, most if not all of this literature is wrong, and will soon be forgotten. Signs of this are already starting to appear. For example, consider the “depression gene”. A recent large meta-analysis involving thousands of individuals found no detectable effect of the gene on the disease. And a recent large (~5,000 individuals) genome-wide association study of various personality traits found no significant effects of genetic variants anywhere in the genome on personality.
Our genes affect nearly every aspect of our lives, including our personality, so real genetic associations to these traits unquestionably exist. Indeed, genuine risk variants for serious psychiatric diseases like schizophrenia have been found, but only in very large, carefully performed studies involving tens of thousands of people. Unravelling the genetic basis of variation in more subtle human behavioural traits will be a fascinating process, but everything we know about both the genetics of complex traits and the complexity of human behavior indicates that this will not be easy, and that it will also require genome-wide approaches with sample sizes in the thousands, not the low dozens. In addition, it will require adhering to rigorous procedures for study design and statistical analysis, as followed by most large-scale disease genome-wide association studies but all too often ignored by behavioral geneticists.
Finally, we extend a plea to science writers: before writing about any article claiming a genetic association, it’s worth doing some simple sanity checks. Is the sample large enough to capture the typically tiny effect sizes we expect to see for complex human traits? (Unless there is some reason to believe that a trait has a comparatively simple genetic basis, that means a sample size in the thousands.) Have the authors performed an independent replication study in a separate cohort, using the same genetic model and statistical approach? And does the study show any of the telltale signs of “significance-hunting”, such as reporting of results from some subsets of their cohort but not others, or the use of an unnecessarily complex statistical model? If the answers to these questions are no, no, and yes, respectively, it is very likely that the study’s results are the outcome of artefact or chance rather than a genuine association, and you should report it with the appropriate caveats – or better yet, don’t report it at all until the crucial replication studies have been performed.
The author has been quoted as saying that “the number of observers and video clips observed actually makes for a larger sample size, providing greater statistical power”. This is incorrect. If I were interested in height and had 100 friends measure me, the large number of friends measuring my height obviously does not influence the sample size. I would still have a sample size of one, albeit with a very precise measurement.
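The height analogy can be made concrete with a two-line simulation (a sketch; the true height and measurement noise are made up for illustration): averaging many raters gives an extremely precise estimate for one person, but it is still one data point.

```python
import numpy as np

rng = np.random.default_rng(0)

true_height = 180.0  # cm, one person
n_raters = 100

# 100 friends each measure the same person, with some measurement noise.
measurements = true_height + rng.normal(0.0, 2.0, size=n_raters)

# Averaging the raters gives a very precise estimate for this ONE person...
print(f"estimated height: {measurements.mean():.2f} cm")

# ...but for any question about variation BETWEEN people (e.g. a genetic
# association with height), the sample size is still n = 1.
n_subjects = 1
```

More raters shrink the measurement error (by a factor of about the square root of the number of raters), but they add nothing to the number of independent subjects, which is what determines statistical power.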
It’s worth noting once again the phenomenal example set by Ed Yong, who registered critical comments about the article on Twitter and edited his post to accommodate them within a matter of minutes.