Saturday, 16 January 2016

Does genetics prove the origin of IE-languages?

Continuing with my previous testing method, I tried to find out whether any conclusion can be drawn about where the IE language family came from. Did it come from the east to Central Europe, from the Middle East to South Europe, or via another southern route? To analyze this I made two tests, first dividing the data between the two major Bronze Age Central European cultures using Corded Ware and Bell Beaker samples.

It looks like Central and South Europeans have nothing in common that would explain the supposed linguistic root in Central European cultures. The non-IE speakers, Basques and Finns, are placed between those two major IE groups. However, because the plot of the difference between Corded Ware and Bell Beaker samples looked logical in a geographic sense, I combined those groups and used Anatolian Neolithic samples on the second axis, trying to reveal how early farmer ancestry changes the picture.

The result did not bring Central and South Europeans closer to each other, and the question still remains: if IE languages were brought to Central Europe via East Europe, how do we explain the lack of a distinct Corded Ware and Bell Beaker drift influence in the south? It looks more like Central Europe received a strong southern genetic impact than that the south received any strong northern influence.

Edit 20.1.16 10:15

Maybe it is not yet clear who those Local and MostCW Finns are. First, they sit in almost the same position on European, Eurasian and world-level PCA plots; there is no distinct trend separating them towards any other European or Asian population. Both groups behave very similarly on PCA plots constructed from present-day populations, but they clearly differ in genome-wide tests and in tests that reflect genetic drift, such as Dstat.

  1. While I reckon that there are many questions still open on the genetic side of things, I don't think that linguistics can be explained by genetics alone, or at least not the Indo-European language expansion.

    If we go by the Kurgan model (which I support), we have to consider several waves:

    1. Early Kurgans of the 4th millennium (weak but persistent genetic legacy)
    2. Corded Ware (strong but non-persistent genetic legacy)
    3. Expansion into Greece and related
    4. Western and Italian expansion (Urnfield to La Tène)
    5. Roman imperial expansion

    Each of them is separated from the previous one by a quite regular span of c. 1000 years and they have different centers:
    1. Eastern European steppe
    2. Poland (although with lesser Ukrainian influence)
    3. Somewhere in the Mid-Danube
    4. Hercynian Forest (SW reaches of Central Europe)
    5. Italy

    When we try to interpret all this (and more not described) complexity as a single wave, we fail miserably.

    Anyhow, I wouldn't use Bell Beaker as a component, because so far there are only a few samples from East Germany, which surely are not representative of the phenomenon at large; rather, they are extreme cases of admixture with CW-like genetics or a legacy of previous Kurgan waves (Baalberge). It's highly doubtful that BB were a "people"; more likely they were a multi-ethnic "guild" or "sect".

    1. Thanks for your comment. This work was more of an exercise, like the two previous ones. I wanted to show some principles, nothing startling indeed, but I thought that it is sometimes good to look at basic things too.

    2. I'm cool with it. No fundamental criticism intended, just trying to help you (and all the others trying to discern IE influence in genetics) hopefully define the terms of the problem better.

      Basically: you won't find the IE factor in Iberia unless you understand that it came from the Rhine (Celts) and Italy (Romans) and not directly from the Volga. This is an IE factor that is necessarily different from the IE factor that affected Central and Northern Europe.

      If Mexicans invaded someplace and imposed their variant of Spanish there now, it would be two other factors away from the original IE homeland: the Iberian center and the hypothetical future Mexican center. However, it's more complex than that, because Mexicans, Argentines, US-Americans and Haitians are not the same thing, yet all of them are products of the same wave from Europe in the last few centuries, even if some display largely Native ancestry and others largely African ancestry.

      If we naively looked at that puzzle even today (as archaeogeneticists totally blind to recent history but not to language relations), we'd be most puzzled. That's the kind of stuff we're trying to understand in Europe, mutatis mutandis.

    3. Importantly (sorry to insist, but I forgot): for me the problem with your question is that it's the wrong question. The question is not why there's so little IE component in southern and particularly SW Europe, which is within expectations; the problem is why there appears to be so much in NW Europe, particularly in the Atlantic Islands.

    4. Because of my very limited knowledge of IE languages I have no plans to start any big work on this theme. Actually, in recent weeks I have been struggling with computing resources and with new haplotype tests, but I have to admit I am very suspicious of ideas telling me that the ancient Greek and Latin languages came from Central Europe. According to linguists those languages were born around 3000 BC, as were almost all IE languages across a vast area of Europe and Asia. It sounds incredible that everything happened everywhere simultaneously. Uniparental markers surely give evidence for the same origin, somewhere in Europe, but I can't see genetic evidence for any Central European linguistic expansion. Anyway, this is a huge theme to deal with and, in my opinion, also a bit too sloppy a topic for me to dip my spoon into this soup :).

    5. Well, for whatever it's worth, I wouldn't say that "everything happened simultaneously". On the contrary, at the very least the Anatolian and Tocharian branches must have split earlier than the rest, which seems consistent with the Maykop and Afanasevo cultures respectively. Probably other branches also began diverging then, c. 4000 BCE, but they retained connections within the geography of Eastern and Central Europe (secondary E→W flows at the root of Corded Ware, then eastward expansion of Corded Ware and derived groups), which may explain why European IE and Indo-Iranian, as well as some lesser branches like Greek, seem closer than they "should" at a shallow look.

      I have drafted my own archaeology-based reconstruction HERE. IMO there was a Western IE branch that should have included the precursors of Celtic, Italic, Germanic, Balto-Slavic and probably also Illyric that coalesced in Central Europe within Corded Ware and precursors. Greek (and probably also Phrygio-Armenian) may be related to it but only at the very origin (via Vucedol) and also by contact. Indo-Iranian would be rooted in Yamna but got back-influences from Corded Ware, so it looks more similar to European lects than other Asian groups.

      It all happened since 4000 BCE but it's not a simple star-like diversification. Instead it has many interactions. While the expansion of Western IE can only probably be dated to c. 1300 BCE, diversification in Central Europe must be older, beginning around the end of the Corded Ware period c. 2500 BCE. We can imagine it looking at the map of early Bronze Age cultures, many of which were unified afterwards under Tumuli and Urnfield cultures but surely not to the point of losing their ethno-linguistic distinctiveness.

      Hope this helps.

    6. Many thanks. Maybe in the future I will have more time to learn more, but knowing myself, it would be frustrating to sort out the linguistic facts behind the theories.

  2. This PCA should give a general idea of where the MostCW and FinLocal groups stand in a European context.

    The FinLocal group may have a bit of very unusual cryptic ancestry. Compared to Vepsians they are equally or less related to farmer, hunter-gatherer and Bronze Age European groups. This happens despite the fact that Vepsians have significantly more East Eurasian ancestry (Chimp.DG Han Veps FinnLocal Z -3.947), which should pull them further away, especially from the farmers and BA groups.

    1. Yeah, the Local and MostCW groups are very close to each other on the PCA, apparently because here the Finnish samples are not separated by Siberian admixture; they are separated by a more significant difference, namely migrations in the Baltic Sea region. Usually the Siberian admixture (3-8% over the assumed Central and West European level, see Haak et al.) stretches Finnish samples into a long tail. Of course there are also other differences inside the Finnish sample group, but PCA "distills" everything that is not common to all selected samples and magnifies it. So there is a bigger difference within Finland if we use Dstat and real ancestral references. And we should use ancestral references especially when testing Finns, because Finland has not received as many newer migrations as its neighbors, including even the small Finnic groups in Russia. Slavic and German migrations after the Bronze Age have shaped our neighbors' genes much more, towards a more uniform look. On the European level, Sardinians and Basques have preserved the original history even better than Finns. So the small Siberian admixture distorts the results when the plot is made using present-day Europeans, but this error is corrected by ancient references.
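For readers unfamiliar with the notation, a result such as "Chimp.DG Han Veps FinnLocal Z -3.947" is a D-statistic D(W, X; Y, Z) with a Z-score. The following is only a minimal sketch of how such a number can be computed from per-SNP allele frequencies. It assumes the commonly used frequency-based definition, uses a simplified unweighted delete-one-block jackknife rather than whatever the actual software used in this thread does, and runs on purely synthetic frequencies (the arrays `w`, `x`, `y`, `z` and the 10% admixture fraction are invented for illustration).

```python
import numpy as np

def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (equal-length arrays)."""
    num = (w - x) * (y - z)
    den = (w + x - 2 * w * x) * (y + z - 2 * y * z)
    return num.sum() / den.sum()

def d_with_z(w, x, y, z, n_blocks=20):
    """Return (D, Z) using a simple, unweighted delete-one-block jackknife."""
    num = (w - x) * (y - z)
    den = (w + x - 2 * w * x) * (y + z - 2 * y * z)
    d_all = num.sum() / den.sum()
    num_blocks = np.array_split(num, n_blocks)
    den_blocks = np.array_split(den, n_blocks)
    # Recompute D with each contiguous block of SNPs left out in turn.
    loo = np.array([
        (num.sum() - nb.sum()) / (den.sum() - db.sum())
        for nb, db in zip(num_blocks, den_blocks)
    ])
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return d_all, d_all / se

# Synthetic toy data (assumption, not real genotypes).
rng = np.random.default_rng(0)
n_snps = 5000
west = rng.uniform(0.05, 0.95, n_snps)   # shared "western" frequencies
east = rng.uniform(0.05, 0.95, n_snps)   # unrelated "eastern" frequencies

w = np.zeros(n_snps)           # W: outgroup, fixed for the ancestral allele
x = east                       # X: eastern reference
y = 0.9 * west + 0.1 * east    # Y: western population with 10% eastern input
z = west                       # Z: western population without that input

d, z_score = d_with_z(w, x, y, z)
print(f"D = {d:.4f}, Z = {z_score:.2f}")
```

Under this sign convention a significantly negative value means that X shares more alleles with Y than with Z, which matches the reading given above: Han shares more with Vepsians than with the FinLocal group.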

  3. Long PCA tails like that are caused by drift away from Sardinians and early farmers.

    That's why some eastern 1000genomes Finns look as distant from Sardinians as Chuvashes who are really much more eastern. Eastern Finns should actually be closer to British than the northern Russian sample from HGDP in this PCA, using D-stats.

  4. Nope, on that plot Finns and Mozabites go in the same direction because two dimensions are not enough to capture all the differences. The long tail that appears on properly designed plots is created by Siberian admixture, because it usually doesn't exist in Europe. The plot you linked is an example of bad design.

  5. It is drift, otherwise they would not be more eastern than HGDP Russians. Even Saamis are actually closer to West Europeans than Maris. It doesn't matter whether the PCA has Mozabites: every time the kind of tail where 1kgenomes FIN is more eastern than HGDP Rus or Behar Mordva forms on a European PCA, the reason is drift.

    1. On this plot there are too many Finnish samples, leading to magnified Finnish components. Those Finnish components include local ancestry from deep history and also genetic drift, but it doesn't mean that Finns on this plot have more drift than, for example, Lithuanians. They simply took too many Finnish samples compared to other North and Northeast Europeans, making the Finnish component dominant. Again, bad design. You should avoid oversampling, and also avoid overloading two dimensions with too much diversity, as was the case on your first plot.

    2. In this special case the oversampling is unbelievable. Those LS-Finns are not any kind of average Finns; they are all from a small town/village of around 20000 inhabitants and are most probably cousins within 5-10 generations. The oversampling rate is 275. Really bad design.

    3. That's the point, oversampling causes these tails on a European PCA. Even Orcadians can create a northeastern tail in high numbers, while a small Finnish sample (3 here) isn't numerous enough to differentiate from Lithuanians. The Russians and Mordvas mentioned before don't form a tail but a pretty solid cluster though, even in higher numbers.

    4. There is not only one way to ruin your PCA plot, but many. Most of us want to believe that the truth is on the PCA because PCA plots are so nice :)
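The oversampling effect argued about in this thread can be illustrated with a toy simulation. The sketch below is not a reconstruction of any plot linked here: it assumes three hypothetical populations that have drifted equally far from a common ancestor (a crude normal perturbation of allele frequencies), invented sample sizes of 200 versus 10, and plain SVD-based PCA on centered genotypes. Under those assumptions, whichever population is oversampled ends up as the outlier on the first axis, although all three are equally differentiated by construction.

```python
import numpy as np

def simulate_genotypes(freqs, n_samples, rng):
    """Draw diploid genotypes (0/1/2) for n_samples individuals."""
    return rng.binomial(2, freqs, size=(n_samples, freqs.size)).astype(float)

def pc1_centroids(pop_freqs, sizes, rng):
    """Pool all samples, run PCA, return each population's mean PC1 score."""
    blocks = [simulate_genotypes(f, n, rng) for f, n in zip(pop_freqs, sizes)]
    x = np.vstack(blocks)
    x -= x.mean(axis=0)                        # center each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    pc1 = u[:, 0] * s[0]                       # sample scores on the first axis
    bounds = np.cumsum([0] + list(sizes))
    return [pc1[a:b].mean() for a, b in zip(bounds[:-1], bounds[1:])]

def outlier_gap(centroids, k):
    """PC1 gap from population k to the nearer of the other two,
    and the PC1 gap between those other two."""
    others = [c for i, c in enumerate(centroids) if i != k]
    return min(abs(centroids[k] - o) for o in others), abs(others[0] - others[1])

rng = np.random.default_rng(1)
n_snps = 3000
ancestral = rng.uniform(0.1, 0.9, n_snps)
# Three hypothetical populations, each drifted equally far from the ancestor.
pop_freqs = [
    np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
    for _ in range(3)
]

# Same populations, different sampling: first A is oversampled, then B.
over_a = pc1_centroids(pop_freqs, (200, 10, 10), rng)
over_b = pc1_centroids(pop_freqs, (10, 200, 10), rng)

gap_a, rest_a = outlier_gap(over_a, 0)
gap_b, rest_b = outlier_gap(over_b, 1)
print(f"A oversampled: A stands apart by {gap_a:.1f}, B vs C differ by {rest_a:.1f}")
print(f"B oversampled: B stands apart by {gap_b:.1f}, A vs C differ by {rest_b:.1f}")
```

The first axis here follows the sample counts rather than any real difference in drift, which is the design problem described above. It says nothing about whether drift also contributes to the tail in the real data; both effects can coexist.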