[1] distance%=6.9025 / distance=0.069025

Davidski

Yamnaya_Samara 53.9
Barcin_N 30.75
Rochedane 15.35
Tepecik_Ciftlik_N 0

Yep, the model does work, with a fairly reasonable distance of almost 7%. The ancestry proportions more or less match those from the scientific literature and the plethora of analyses that I've featured on this blog on the topic.

Please note that I've kept things very simple, using only four reference populations and individuals as proxies for four distinct streams of ancestry. But I've put my own twist on this Neolithic/Bronze Age model by including two populations from Neolithic Anatolia (Barcin_N and Tepecik_Ciftlik_N), just to see what would happen. The WHG proxy is Rochedane.

Admittedly, though, my Yamnaya cut of ancestry appears somewhat bloated at over 53%, and the model's distance is a little higher than what I normally see for really strong models. So let's check if I can get a better fitting and more sensible result by adding a slightly more easterly forager proxy than Rochedane: Narva_Lithuania.
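As an aside, here is a minimal Python sketch of what a distance figure like this measures, assuming it is the Euclidean distance between the target's coordinates and the weighted average of the reference coordinates. The coordinates, the `Source_A`/`Source_B` names, and the `mix` and `distance` helpers are all invented for illustration; real Global25 samples have 25 dimensions, and this is not the nMonte code itself. The only thing taken from the output above is the relationship between the two figures: distance% is simply distance multiplied by 100.

```python
import math

def mix(sources, weights):
    """Weighted average of source coordinates; weights are percentages."""
    dims = len(next(iter(sources.values())))
    return [
        sum(weights[name] / 100.0 * sources[name][i] for name in weights)
        for i in range(dims)
    ]

def distance(target, modelled):
    """Euclidean distance between the target and the modelled mixture."""
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, modelled)))

# Toy 3-dimensional coordinates, invented for illustration only.
sources = {
    "Source_A": [0.10, 0.00, 0.02],
    "Source_B": [0.00, 0.10, 0.04],
}
target = [0.06, 0.05, 0.03]
weights = {"Source_A": 50.0, "Source_B": 50.0}

d = distance(target, mix(sources, weights))
# Same layout as the header line of the outputs in this post.
print(f"distance%={d * 100:.4f} / distance={d:.6f}")
```

The smaller the distance, the closer the modelled mixture sits to the target in this coordinate space, which is why a drop in distance is read as a better fit.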
[1] distance%=5.9331 / distance=0.059331

Davidski

Yamnaya_Samara 45.75
Barcin_N 31.45
Narva_Lithuania 22.8
Rochedane 0
Tepecik_Ciftlik_N 0

The statistical fit does improve, and when given a choice between Rochedane and Narva_Lithuania, the algorithm picks the latter as the only source of extra forager input in my genome. What could this mean? It might mean that a large part of my ancestry derives from the Baltic region. Actually, I know for a fact that this is true. But even if I had no idea about my genealogy, this result would be a very strong hint about my genetic origins.

Indeed, let's follow this trail and try to further improve the fit of the model by adding a more relevant Yamnaya-related proxy, such as early Baltic Corded Ware (CWC_Baltic_early).
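To illustrate how a fitting procedure can end up giving one of two competing proxies a weight of zero, here is a deliberately simple Python sketch. It is a hypothetical stand-in, not the nMonte algorithm, whose internals aren't described in this post. It treats the mixture as a batch of 100 picks from the references and keeps swapping one pick for another for as long as the distance drops. The `fit` function, the source names, and the toy coordinates are all assumptions made up for the example.

```python
import math

def distance(target, point):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, point)))

def fit(target, sources, slots=100):
    """Greedy slot-swapping fit (illustrative only, not nMonte).

    Starts from an even split of `slots` picks across the sources, then
    moves one pick at a time between sources whenever that lowers the
    distance, stopping when no single swap helps.
    """
    names = sorted(sources)
    dims = len(target)
    counts = {n: slots // len(names) for n in names}
    counts[names[0]] += slots - sum(counts.values())  # leftover picks

    def dist(c):
        mixed = [sum(c[n] * sources[n][i] for n in names) / slots
                 for i in range(dims)]
        return distance(target, mixed)

    best = dist(counts)
    improved = True
    while improved:
        improved = False
        for out in names:
            for into in names:
                if out == into or counts[out] == 0:
                    continue
                counts[out] -= 1
                counts[into] += 1
                d = dist(counts)
                if d < best - 1e-12:
                    best, improved = d, True  # keep the swap
                else:
                    counts[out] += 1          # undo the swap
                    counts[into] -= 1
    return {n: 100.0 * counts[n] / slots for n in names}, best

# Toy coordinates, invented for illustration only.
sources = {
    "Steppe": [1.0, 0.0, 0.0],
    "Forager_West": [0.0, 1.0, 0.0],
    "Forager_East": [0.0, 0.0, 1.0],
}
target = [0.6, 0.0, 0.4]  # built as 60% Steppe + 40% Forager_East

weights, best = fit(target, sources)
print(weights, best)
```

With this toy data the search settles on 60 for Steppe, 40 for Forager_East, and 0 for Forager_West: the western forager is available to the model, but every pick given to it makes the fit worse. Real references are far less cleanly separated than these, so a simple search like this one may stop short of the best possible mixture.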
[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Holy shit! To be honest, I wasn't expecting this sort of resolution and accuracy, and I can't promise that everyone using the Global25/nMonte method will see such incredibly nuanced outcomes, but this isn't a fluke. It can't be, because it gels so well with everything that I know about my ancestry. Please note also that I belong to Y-chromosome haplogroup R1a-M417, which is a lineage intimately associated with the Corded Ware expansion across Northern Europe (for instance, see here).

But of course, the Baltic and nearby regions haven't been isolated from migrations and invasions since Corded Ware times. For instance, at some point, probably during the Bronze Age, Uralic-speaking peoples moved west across the forest zone of Northeastern Europe and into the East Baltic and northern Scandinavia. It's generally accepted that they brought Siberian admixture with them (see here). Moreover, from the Iron Age to the Middle Ages, East Central Europe was under intense pressure from a wide range of nomadic steppe groups with complex ancestry, such as the Sarmatians, Avars, Huns, and Mongolians.

Did any of these peoples leave their mark on my genome? At the risk of overfitting the model, let's explore this possibility by adding a few more reference populations.
[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Han 0
Mongolian 0
Nganassan 0
Rochedane 0
Sarmatian_Pokrovka 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Nothing changes when I add the Han Chinese, Mongolians, Nganassans (a Uralic people from Siberia), and Sarmatians to the model. But what if I throw in the only ancient Slav in my datasheet?
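To put the runs so far side by side, this short Python sketch tabulates the distances reported in the outputs above and the fractional drop from each run to the next. The `relative_gain` helper and the run labels are just conveniences for this example; the numbers are copied from the outputs.

```python
# Distances (in percent) as reported in the nMonte outputs above.
runs = [
    ("Yamnaya_Samara, Barcin_N, Tepecik_Ciftlik_N, Rochedane", 6.9025),
    ("+ Narva_Lithuania", 5.9331),
    ("+ CWC_Baltic_early", 5.444),
    ("+ Han, Mongolian, Nganassan, Sarmatian_Pokrovka", 5.444),
]

def relative_gain(before, after):
    """Fractional drop in distance from one run to the next."""
    return (before - after) / before

# Pair each run with the one before it and report the change.
for (_, prev), (label, cur) in zip(runs, runs[1:]):
    gain = relative_gain(prev, cur)
    print(f"{label}: {cur:.4f}% (distance down by {gain:.1%})")
```

The last step shows no change at all, which is what you'd expect when every newly added reference gets a weight of zero: the mixture being scored is the same as in the previous run.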
[1] distance%=2.9904 / distance=0.029904

Davidski

Slav_Bohemia 85.9
CWC_Baltic_early 7.7
Narva_Lithuania 6.4
Barcin_N 0
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Considering that the vast majority of my recent ancestors were Poles, thus a Slavic-speaking people from near the Baltic, this outcome makes perfect sense. And check out the new distance!

But the problem now is that I'm overfitting the model by using two very similar and probably very closely related references, CWC_Baltic_early and Slav_Bohemia. And overfitting should be avoided at all costs. So it might be useful to break up this effort into two models: one focusing on the Neolithic and Bronze Age, and the other on the Iron Age and Middle Ages. I'll do that soon, but not just yet, because there are still too few Iron Age and Medieval samples available from the Baltic region and surrounds for meaningful analyses of this type.

See also...

Global25 workshop 1: that classic West Eurasian plot
Global25 workshop 2: intra-European variation
Global25 workshop 3: genes vs geography in Northern Europe
Getting the most out of the Global25
Genetic ancestry online store (to be updated regularly)
