Parental relatedness through time revealed by runs of homozygosity in ancient DNA. Harald Ringbauer, John Novembre & Matthias Steinrücken. Nature Communications volume 12, Article number: 5425. Sep 14 2021. https://www.nature.com/articles/s41467-021-25289-w
Abstract: Parental relatedness of present-day humans varies substantially across the globe, but little is known about the past. Here we analyze ancient DNA, leveraging that parental relatedness leaves genomic traces in the form of runs of homozygosity. We present an approach to identify such runs in low-coverage ancient DNA data aided by haplotype information from a modern phased reference panel. Simulation and experiments show that this method robustly detects runs of homozygosity longer than 4 centimorgan for ancient individuals with at least 0.3× coverage. Analyzing genomic data from 1,785 ancient humans who lived in the last 45,000 years, we detect low rates of first cousin or closer unions across most ancient populations. Moreover, we find a marked decay in background parental relatedness co-occurring with or shortly after the advent of sedentary agriculture. We observe this signal, likely linked to increasing local population sizes, across several geographic transects worldwide.
Discussion
We developed a method for measuring ROH in low-coverage ancient DNA. Our algorithm follows a long line of previous work utilizing HMMs to infer such segments10,40,41,42. A key methodological advantage here is the use of hidden states that, within an ROH segment, copy from a reference panel of haplotypes to take advantage of haplotype information. This tool enabled us to screen aDNA data from 1785 individuals for ROH, an order of magnitude more ancient individuals than hitherto amenable to such analysis. We generated evidence for two key aspects of the human past: identifying long ROH (>20 cM) provided insight into the past prevalence of close kin unions such as cousin matings, whereas short ROH (4–8 cM) revealed changing patterns of past background relatedness that reflect local population sizes.
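To make the idea of copying hidden states concrete, the following is a minimal sketch of an HMM with one non-ROH state and one copying state per reference haplotype, evaluated with a scaled forward-backward pass. It is not the implementation used in this work: the function name `roh_posterior`, all parameter values, the constant per-SNP transition probabilities (no genetic map), and the use of panel allele frequencies for non-ROH emissions are simplifying assumptions made only for illustration.

```python
def roh_posterior(obs, ref_haps, p_in=0.01, p_out=0.01, p_switch=0.05, err=0.01):
    """Posterior probability of being in any ROH (copying) state at each SNP.

    obs      -- list of 0/1 alleles, one pseudo-haploid call per SNP
    ref_haps -- list of K phased reference haplotypes (lists of 0/1)
    All rate and error parameters are illustrative placeholders.
    """
    n_ref, n_snp = len(ref_haps), len(obs)
    n_state = n_ref + 1  # state 0 = non-ROH, states 1..K = copy haplotype k

    # Constant per-SNP transition matrix (assumption: a realistic model would
    # scale these probabilities by the map distance between adjacent SNPs).
    trans = [[0.0] * n_state for _ in range(n_state)]
    trans[0][0] = 1.0 - p_in
    for k in range(1, n_state):
        trans[0][k] = p_in / n_ref          # enter ROH, copy haplotype k
        trans[k][0] = p_out                 # leave ROH
        for j in range(1, n_state):
            trans[k][j] = (1.0 - p_out) * p_switch / n_ref  # switch template
        trans[k][k] += (1.0 - p_out) * (1.0 - p_switch)     # keep template

    def emission(site):
        # Non-ROH: the single read shows allele 1 with the panel frequency.
        freq = sum(h[site] for h in ref_haps) / n_ref
        freq = min(max(freq, err), 1.0 - err)
        probs = [freq if obs[site] == 1 else 1.0 - freq]
        # ROH: the read matches the copied haplotype unless there is an error.
        probs += [1.0 - err if h[site] == obs[site] else err for h in ref_haps]
        return probs

    def normalise(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass, rescaled at every SNP to avoid numerical underflow.
    init = [1.0 - p_in] + [p_in / n_ref] * n_ref
    fwd = [normalise([i * e for i, e in zip(init, emission(0))])]
    for site in range(1, n_snp):
        e, prev = emission(site), fwd[-1]
        fwd.append(normalise([
            e[s] * sum(prev[r] * trans[r][s] for r in range(n_state))
            for s in range(n_state)]))

    # Backward pass, rescaled in the same way.
    bwd = [[1.0] * n_state for _ in range(n_snp)]
    for site in range(n_snp - 2, -1, -1):
        e, nxt = emission(site + 1), bwd[site + 1]
        bwd[site] = normalise([
            sum(trans[s][r] * e[r] * nxt[r] for r in range(n_state))
            for s in range(n_state)])

    # Posterior of "any copying state" = 1 - posterior of the non-ROH state.
    post_roh = []
    for f, b in zip(fwd, bwd):
        post = normalise([x * y for x, y in zip(f, b)])
        post_roh.append(1.0 - post[0])
    return post_roh
```

In this toy model a single allele carries little information on its own; the signal comes from long stretches in which the pseudo-haploid calls match a mosaic of reference haplotypes with few template switches, which is why haplotype information helps at low coverage.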
We found that only 1 out of 1785 ancient individuals has long ROH typical for the offspring of first-degree relatives (e.g., brother–sister or parent–offspring). Historically, matings of first-degree relatives are only documented in royal families of ancient Egypt, Inca, and pre-contact Hawaii, where they were sporadic occurrences7. The only other example of an offspring of first-degree relatives found using aDNA to date is the recently reported case from an elite grave in Neolithic Ireland18. Our findings are in agreement that first-degree unions were generally rare in the human past.
Further, we find that only 54 out of 1785 ancient individuals (3.0%, CI: 2.3–3.9%) have long ROH typical for the offspring of first cousins (88%) and less commonly observed for second cousins (20%). Such long ROH can also arise as a consequence of small mating pools (e.g., 8% in randomly mating populations of size 500, which may explain the long ROH we observed in certain island populations). Therefore, the rate of long ROH is an upper bound for the rate of first-cousin unions. On the other hand, because of incomplete power, some long ROH may be missed in our empirical analysis; however, even if the method failed to detect half of all ROH > 20 cM, well below the power that we observed in our simulations, we would still detect 60% of first cousins (see Table S5). We conclude that in our ancient sample substantially less than 10% of all parental unions occurred on the level of first cousins.
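As a quick check on the reported proportion, the sketch below computes a 95% Wilson score interval for 54 out of 1785 individuals. The Wilson interval is an assumption made here for illustration; the text does not state which interval method was used, although this choice reproduces the quoted bounds after rounding. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(k, n, z=1.959964):
    """Wilson score interval for a binomial proportion k/n (default 95%)."""
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(54, 1785)
print(f"{54 / 1785:.1%} (CI: {low:.1%}-{high:.1%})")
# prints: 3.0% (CI: 2.3%-3.9%)
```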
In two specific regions with high levels of long ROH in the present-day2, the dataset contained a sufficient number of ancient individuals to allow analyzing time transects. For both transects (the Levant and present-day Northwest Pakistan), we observe a substantial shift in the levels of long ROH. In contrast to the high abundance of long ROH typical of close kin unions in the present-day individuals, long ROH was uncommon in the ancient individuals, including up to the Middle Ages. Additional data from these regions and others with high levels of long ROH today, such as North Africa as well as Central, South, and West Asia2, will help resolve with more precision the origin and spread of these well-studied kinship-based mating systems43,44. Overall, our results show how an ROH-based method can be used to inform understanding of shifts in cultural marriage/mating practices.
As a second major finding, we observed that human background relatedness as measured by short ROH (4–8 cM) decreased markedly over time in many geographic transects, with a significant drop occurring during or shortly after the local “Neolithic Transition”, the transition from a lifestyle of hunting and gathering to one of agriculture and settlement45,46,47. Assuming that early farmers had no increased individual mobility compared to foragers, which would agree with observations in present-day forager populations48, the substantial decrease of short ROH evidences markedly increasing local population sizes. This finding adds support to the long-held hypothesis of local population sizes increasing following the Neolithic transition45,46,47. Previous analysis of ancient genomes of foragers and early farmers already identified several lines of genomic evidence for farmers having larger population sizes than earlier hunter–gatherers, such as decreasing genome-wide diversity49,50, decreasing prevalence of ROH11,12,13,14,18 and decreasing coalescent rates estimated from high-coverage genomes27. Our analysis adds a refined level of geographic and temporal resolution by analyzing an order of magnitude more individuals (1785 ancient humans) and by organizing those individuals into several densely sampled time transects in different geographic regions.
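The link between short ROH and population size can be illustrated with a standard coalescent approximation, which is not necessarily the calculation used for the results reported here. Assume a randomly mating population of constant size N diploid individuals, so that the two haplotypes of an individual coalesce at rate c = 1/(2N) per generation, and assume that blocks of map length l (in Morgan) with a common ancestor t generations ago occur at density 4t² exp(−2lt) per Morgan, ignoring chromosome edges. Integrating over t and over the length range gives a closed form for the expected number of ROH. The function name `expected_roh_count` and the round genome length of 35 Morgan are assumptions for illustration only.

```python
def expected_roh_count(n_diploid, l_min=0.04, l_max=0.08, genome_morgan=35.0):
    """Expected number of ROH with map length in [l_min, l_max] (Morgan).

    Assumes a panmictic population of constant diploid size n_diploid and
    ignores chromosome edge effects. Derivation under these assumptions:
      density(l) = integral over t of 4 t^2 exp(-2 l t) * c exp(-c t)
                 = 8 c / (2 l + c)^3,            with c = 1 / (2 N)
      count      = G * integral of density(l) over [l_min, l_max]
                 = G * (1 / N) * (1/(2 l_min + c)^2 - 1/(2 l_max + c)^2)
    """
    c = 1.0 / (2.0 * n_diploid)
    per_morgan = (1.0 / n_diploid) * (
        1.0 / (2.0 * l_min + c) ** 2 - 1.0 / (2.0 * l_max + c) ** 2
    )
    return genome_morgan * per_morgan


# Illustrative population sizes only; these are not estimates from the data.
for n in (250, 500, 2000, 10000):
    print(n, round(expected_roh_count(n), 2))
```

For population sizes well above a few dozen individuals, the expected number of 4–8 cM ROH under this approximation scales roughly as 1/N, which is why a marked drop in short ROH is naturally read as growth in the local mating pool.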
For individuals from early Eurasian Steppe pastoralist groups, we observe an intermediate level of short ROH. These early cultures (e.g., the Yamnaya) have drawn much attention in archeological and ancient DNA studies to date, as archeological, linguistic, and genetic evidence suggest they played an important role in the origin of Indo-European languages and of several population expansions32,51,52,53,54. The elevated rate of short ROH we observed provides evidence that many matings occurred within and among small, related groups. An alternative interpretation for the abundance of short ROH could be that burial sites (Kurgans) represent a biased sample of societal classes with more short ROH than the general populace51. However, as short ROH probes parental ancestry up to several dozen generations into the past, this signal would require reproductive isolation between societal strata maintained over many generations. Therefore, it is likely that at least part of the signal is due to Steppe populations having comparatively low population densities or having experienced recent bottlenecks.
Our analysis is limited by several caveats. Importantly, skeletal remains accessible by archeological means often do not constitute a random cross-section of past populations. While levels of background relatedness are expected to be similar within a mixing population, rates of close kin unions can vary substantially because of social structure; e.g., elite dynasties may practice close kin unions despite them being uncommon in the general population. Another limitation is the incomplete sampling of the current aDNA record: for much of the world, we necessarily make inferences from small numbers and sparse sampling. Future work analyzing the rapidly growing ancient DNA record will help to resolve additional details of social and cultural factors operating at finer scales (e.g., leveraging more precise timings of shifts and more subtle shifts in ROH patterns). In particular, future studies focusing on specific localized questions will increasingly combine archeological and genetic evidence16, in ways that will empower the use of the genetic evidence about the past provided by the methodology presented here.
In addition to denser sampling, there are several ways in which our analysis can be improved upon by future work. Here we focused our analysis on long ROH (>20 cM) and short ROH (4–8 cM). While this dichotomy helped us to disentangle more clearly recent and distant parental relatedness, we expect that future work refining the downstream analysis of ROH will be able to extract more subtle signatures by looking across all ROH scales. Furthermore, we note that our application focused on a set of SNPs widely used for human ancient DNA (1240K SNPs). For whole-genome sequencing data (available for a subset of the data analyzed here), using all genome-wide variants would likely lower the requirements for coverage below the current limit of 400,000 of the 1240K SNPs covered at least once (corresponding to ca. 0.3× whole-genome sequencing coverage). Another improvement would be using a reference panel that includes ancient haplotypes. Currently, no long-range phased ancient haplotypes are available, but future work will likely produce such data.
One alternative approach to identify ROH in low-coverage ancient genomes could be to use imputation followed by screening for stretches of homozygous markers using standard ROH detection methods. This was recently done for ancient individuals with >10× coverage18. Since imputation of genomes was reported to work well down to a coverage similar to the low-coverage cutoff used here (ca. 0.5×)55,56 and most imputation methods are based on haplotype-copying methods related to the approach utilized here (the Li and Stephens model22), we expect any such approach to perform similarly to ours, after appropriate testing and calibration, as conducted for our method. We chose to develop a method utilizing several key advantages of pseudo-haploid data, which is more widely available and requires fewer assumptions about genotype quality, making subsequent analysis less prone to batch effects introduced by various isolation, sequencing, and genotyping protocols.
Identifying ROH can also be a starting point for other powerful applications: ROH consists of only a single haplotype (the main signal of our method), which is therefore perfectly phased, a prerequisite for powerful methods relying on haplotype copying57 or tree reconstruction26,58. Moreover, long ROH could be used to estimate contamination and error rates, an important task in ancient DNA studies20. ROH lacks heterozygotes, allowing one to identify heterozygous reads within ROH that must originate from contamination or genotyping error, similar to estimating contamination from the hemizygous X chromosomes in males59. Another promising future direction is the development of a method to identify long shared sequence blocks in ancient DNA not only within (ROH), but also between individuals, called identity-by-descent (IBD). Calling IBD between individuals would substantially increase power for measuring background relatedness since signals from every pair of individuals could be used. Moreover, a geographic IBD block signal is highly informative about patterns of recent migration35,60,61,62. Extending our method to similarly use haplotype information from a phased reference panel when detecting IBD could enable such analyses in low-coverage ancient individuals.
Finally, the analysis of ROH has additional implications beyond human demography and kinship-based mating systems. In many plant and animal species, ROH is more prevalent (due to different mating systems, small population sizes, or domestication), and the study of ROH may be particularly interesting for understanding early plant and animal breeding, as actively controlled mating among domesticates would be expected to alter ROH63. For aDNA from extinct or endangered species, ROH can shed light on the extinction and inbreeding processes, as is observed for example in aDNA from high-coverage Neanderthal individuals17,64,65,66, or modern DNA from Isle Royale wolves67. Moreover, as ROH exposes rare deleterious recessive alleles68, the temporal dynamics of ROH are relevant for understanding the evolutionary dynamics of deleterious variants and health outcomes67,69,70,71. We hope that the core ideas of our approach will inspire the analysis of low-coverage data from a wide range of natural populations.