There are many different ways to model your genetic ancestry. I prefer the Global25/nMonte method (see here). This is a step-by-step guide to modeling ancient ancestry proportions with this simple but powerful method, using my own genome as the example.
As far as I know, the vast majority of my recent ancestors came from the northern half of Europe. This may or may not be correct, but it gives me somewhere to start, so that I can come up with a coherent model. If you don’t have this sort of information, because, perhaps, you were adopted, then just look in the mirror, and work from there. Like I say, it’s not imperative that you know anything whatsoever about your ancestry, because your genetic data will do the talking, but you do need a model when modeling.
In the scientific literature nowadays, Northern Europeans are often described as a three-way mixture of Yamnaya-related pastoralists, Anatolian-derived early farmers, and Western European Hunter-Gatherers (WHG). So let’s see if this model works for me. Obviously, if it does, then it’ll confirm the information that I have about my origins, but it might also reveal finer details that I’m not aware of. The datasheet that I’m using for this model is available here.
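For anyone curious about what a fit like this boils down to, here’s a rough Python sketch. It is not nMonte, which is an R script with its own Monte Carlo search. This sketch swaps in a simple greedy search over proportions, and the function names, the `slots` parameter, and the three-dimensional coordinates are all made up for illustration (real Global25 rows have 25 columns). The idea is just this: find non-negative proportions that sum to 100% so that the weighted average of the reference coordinates lands as close as possible to the target, with closeness measured here as a Euclidean distance.

```python
import math


def mixture_distance(target, sources, weights):
    """Euclidean distance between the target and a weighted average of sources."""
    mixed = [
        sum(w * coords[d] for w, coords in zip(weights, sources))
        for d in range(len(target))
    ]
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mixed)))


def fit_mixture(target, sources, slots=200):
    """Greedy stand-in for a mixture fit (NOT the nMonte algorithm).

    `sources` maps a label to its coordinate list. Proportions move in steps
    of 1/slots: one slot is shifted from one source to another whenever that
    lowers the distance, until no single shift helps.
    """
    labels = list(sources)
    coords = [sources[name] for name in labels]
    counts = [slots // len(labels)] * len(labels)
    counts[0] += slots - sum(counts)  # hand the remainder to the first source

    def dist(c):
        return mixture_distance(target, coords, [n / slots for n in c])

    best = dist(counts)
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            for j in range(len(labels)):
                if i == j or counts[i] == 0:
                    continue
                counts[i] -= 1
                counts[j] += 1
                trial = dist(counts)
                if trial < best - 1e-12:
                    best, improved = trial, True
                else:  # undo the shift, it did not help
                    counts[i] += 1
                    counts[j] -= 1
    percents = {name: 100 * n / slots for name, n in zip(labels, counts)}
    return percents, best


# Hypothetical coordinates, invented for this example only.
SOURCES = {
    "steppe_proxy": [0.12, 0.02, -0.03],
    "farmer_proxy": [0.10, 0.15, 0.01],
    "forager_proxy": [0.13, -0.05, 0.08],
}
TARGET = [0.118, 0.046, 0.006]

percents, distance = fit_mixture(TARGET, SOURCES)
print(f"distance%={100 * distance:.4f} / distance={distance:.6f}")
for name, pct in sorted(percents.items(), key=lambda kv: -kv[1]):
    print(name, pct)
```

In the nMonte results in this post, distance% is simply the distance multiplied by 100, which is why the sketch prints both.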
[1] distance%=6.9025 / distance=0.069025
Davidski
Yamnaya_Samara 53.9
Barcin_N 30.75
Rochedane 15.35
Tepecik_Ciftlik_N 0
Yep, the model does work, with a fairly reasonable distance of almost 7%. The ancestry proportions more or less match those from the scientific literature and the plethora of analyses on the topic that I’ve featured on this blog. Please note that I’ve kept things very simple, using only four reference populations and individuals as proxies for three distinct streams of ancestry. That’s because I’ve put my own twist on this Neolithic/Bronze Age model by including two populations from Neolithic Anatolia (Barcin_N and Tepecik_Ciftlik_N), just to see what would happen. The WHG proxy is Rochedane. Admittedly, though, my Yamnaya cut of ancestry appears somewhat bloated at over 53%, and the model’s distance is a little higher than what I normally see for really strong models. So let’s check if I can get a better-fitting and more sensible result by adding a slightly more easterly forager proxy than Rochedane: Narva_Lithuania.
[1] distance%=5.9331 / distance=0.059331
Davidski
Yamnaya_Samara 45.75
Barcin_N 31.45
Narva_Lithuania 22.8
Rochedane 0
Tepecik_Ciftlik_N 0
The statistical fit does improve, and when given a choice between Rochedane and Narva_Lithuania, the algorithm picks the latter as the only source of extra forager input in my genome. What could this mean? It might mean that a large part of my ancestry derives from the Baltic region. Actually, I know for a fact that this is true. But even if I had no idea about my genealogy, this result would be a very strong hint about my genetic origins. Indeed, let’s follow this trail and try to further improve the fit of the model by adding a more relevant Yamnaya-related proxy, such as early Baltic Corded Ware (CWC_Baltic_early).
[1] distance%=5.444 / distance=0.05444
Davidski
CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0
Holy shit! To be honest, I wasn’t expecting this sort of resolution and accuracy, and I can’t promise that everyone using the Global25/nMonte method will see such incredibly nuanced outcomes, but this isn’t a fluke. It can’t be, because it gels so well with everything that I know about my ancestry. Please note also that I belong to Y-chromosome haplogroup R1a-M417, which is a lineage intimately associated with the Corded Ware expansion across Northern Europe (for instance, see here). But of course, the Baltic and nearby regions haven’t been isolated from migrations and invasions since Corded Ware times. For instance, at some point, probably during the Bronze Age, Uralic-speaking peoples moved west across the forest zone of Northeastern Europe and into the East Baltic and northern Scandinavia. It’s generally accepted that they brought Siberian admixture with them (see here). Moreover, from the Iron Age to the Middle Ages, East Central Europe was under intense pressure from a wide range of nomadic steppe groups with complex ancestry, such as the Sarmatians, Avars, Huns, and Mongols. Did any of these peoples leave their mark on my genome? At the risk of overfitting the model, let’s explore this possibility by adding a few more reference populations.
[1] distance%=5.444 / distance=0.05444
Davidski
CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Han 0
Mongolian 0
Nganassan 0
Rochedane 0
Sarmatian_Pokrovka 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0
Nothing changes when I add the Han Chinese, Mongolians, Nganassans (a Uralic people from Siberia), and Sarmatians to the model: all four get 0%, and the distance stays exactly the same. But what if I throw in the only ancient Slav in my datasheet?
[1] distance%=2.9904 / distance=0.029904
Davidski
Slav_Bohemia 85.9
CWC_Baltic_early 7.7
Narva_Lithuania 6.4
Barcin_N 0
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0
Considering that the vast majority of my recent ancestors were Poles, a Slavic-speaking people from near the Baltic, this outcome makes perfect sense. And check out the new distance! But the problem now is that I’m overfitting the model by using two very similar and probably very closely related references, CWC_Baltic_early and Slav_Bohemia. And overfitting should be avoided at all costs. So it might be useful to break up this effort into two models: one focusing on the Neolithic and Bronze Age, and the other on the Iron Age and Middle Ages. I’ll do that soon, but not just yet, because there are still too few Iron Age and Medieval samples available from the Baltic region and surrounding areas for meaningful analyses of this type. For a more technical guide to running Global25-type data with nMonte, please refer to this post by regular Eurogenes commentator Onur: An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests.
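To put the runs from this post side by side, here’s a small bookkeeping sketch in Python. The numbers are copied from the results above, with zero-weight references left out. The run with the extra East Eurasian and Sarmatian references is skipped because it’s identical to the Corded Ware run. The helper names are made up for this example, and the regular expression only assumes the `distance%=` field seen in the header lines shown here.

```python
import re

# (header line, non-zero proportions) for each distinct run in this post.
RUNS = [
    ("[1] distance%=6.9025 / distance=0.069025",
     {"Yamnaya_Samara": 53.9, "Barcin_N": 30.75, "Rochedane": 15.35}),
    ("[1] distance%=5.9331 / distance=0.059331",
     {"Yamnaya_Samara": 45.75, "Barcin_N": 31.45, "Narva_Lithuania": 22.8}),
    ("[1] distance%=5.444 / distance=0.05444",
     {"CWC_Baltic_early": 54.95, "Barcin_N": 26.7, "Narva_Lithuania": 18.35}),
    ("[1] distance%=2.9904 / distance=0.029904",
     {"Slav_Bohemia": 85.9, "CWC_Baltic_early": 7.7, "Narva_Lithuania": 6.4}),
]


def distance_pct(header):
    """Pull the distance% value out of a header line like the ones above."""
    match = re.search(r"distance%=([0-9.]+)", header)
    if match is None:
        raise ValueError(f"no distance% field in {header!r}")
    return float(match.group(1))


def summarize(runs):
    """Return (distance%, gain over first run, top reference, proportion sum)."""
    first = distance_pct(runs[0][0])
    rows = []
    for header, props in runs:
        d = distance_pct(header)
        top = max(props, key=props.get)  # reference with the largest share
        rows.append((d, round(first - d, 4), top, round(sum(props.values()), 2)))
    return rows


for d, gain, top, total in summarize(RUNS):
    print(f"distance%={d:<7} gain={gain:<7} top={top:<17} sum={total}")
```

A falling distance on its own doesn’t say whether a model is sensible, which is the overfitting caveat above, so the top reference is printed next to it as a reminder of what is actually driving the fit.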