It has long been a common idea, especially among linguists, that the Baltic Finnic languages came from the Volga region, from the so-called Volga bend near Samara. It is a carefully cherished tradition in Finnish science, but there is still no genetic evidence for any movement of people from there to Finland. Now I am going to present something that contradicts this idea of the Volga origin of Finns, or at least gives a new view of it. I'll show plausible genetic evidence of a Volga-Saami connection using the Saami sample (Haak et al. 2015 and Lazaridis et al. 2014), which shows very high similarity with the ancient Eneolithic Samara sample (Mathieson et al.).
The other half of my Finnish story is about ancient Central European influence in Finland. Around 20% of the Finnish samples from the 1000 Genomes Project show Corded Ware similarity comparable to that of Estonians and Lithuanians, and Western Finnish project samples show Corded Ware similarity equal to that of Swedes, some even higher, despite the fact that they are much more "eastern" than present-day Swedes.
This Finnish duality doesn't tell where and when the mixing occurred, and so far I have not seen any genetic evidence about the Baltic Finnic origin. It looks quite possible that Baltic Finns were genetically born somewhere in the region from Estonia to the White Sea, no matter what the origin of the Baltic Finnic language may have been.
Saamis are genetically closer to the Eneolithic Samara people than Mordovians (Mordva) and Chuvashes are. It is worth noticing that Mordovians, who live near the Volga, are not closer to those ancient people who lived in Samara. Saami people live thousands of kilometers and thousands of years away from the suggested Volga homeland. The Siberian admixture of Chuvashes roughly equals that of the Saami. This statistic is, however, of very limited use, because Saami people are not Central Europeans, but it still shows them being comparable to Central Europeans when compared to ancient East European samples. What could be the best outcome?
Some readers may think that the Eneolithic Samara - Saami - Finnish genetic connection is based only on the amount of Siberian admixture. That is not true and is easily shown to be false. Chuvashes and Mansi people (and Komis, not included), with high Siberian admixture, are far from Eneolithic Samara, definitely not comparable to the Saamis. Similarly, those Finns who are closest to Eneolithic Samara have less Siberian admixture than Russians living in the Archangel and Pinega regions of Russia (see the project results).
Only people in northernmost Europe beat Saami_WGA in comparison with Eneolithic Samara. I have to admit that this is a somewhat complicated question. So let's look at the supposed Finnish ancestry from another perspective, the Corded Ware samples. It is less complicated.
Corded Ware results
Only Lithuanians beat the Finnish CW group (20% of the Finnish samples from the 1000 Genomes Project after removing outliers) when the test is done using over half a million SNPs. Even Lithuanians would be beaten by a more homogeneous Finnish sample group. There is every variation from very CW-looking to only moderately CW-looking. They don't look like they came from the Volga bend. Not really.
Next, I combine the Saami and CW results and add project members. To do this I have to use my smaller database, based on the Estonian Biocentre's data. The accuracy is somewhat poorer. The numbers show the difference between Eneolithic Samara and German Corded Ware affinities in Finland and in neighboring countries, as well as results for project members. By using Eneolithic Samara and CW samples, the Siberian-like admixture is excluded and the results show only affinities common to those two groups, even if the tested populations or project members have extra Siberian admixture. It is important to understand that this table alone doesn't tell how much of those two ancient affinities individuals and populations have (it gives only a ratio). To see the big picture you also have to take into account the two previous tables, which show how significant the relation between ancient and modern populations is.
Sunday, March 13, 2016
1. Hinxton2 is HI2 from the study "Iron Age and Anglo-Saxon genomes from East England reveal British migration history". HI2 Hinxton Male 170 BCE – 80 CE.
2. Rabrit3 is one of the Roman Age samples from the study "Genomic signals of migration and continuity in Britain before the Anglo-Saxons". I can't identify which of the six local samples from Driffield Terrace it is, because the study authors don't give the connection between sample labels and sample data. Rabrit3 was processed using sample files ERR1043145, ERR1043146, ERR1043147.
3. Iabrit is M1489 from the same study (Genomic signals...). M1489 Iron Age Melton, age estimate between 210 BC and 40 AD.
4. Anglosaxon/anglosaxon2 is NO3423, again from the same study. NO3423 Anglo-Saxon Norton on Tees. The age estimate is unknown, but the sample is described as Anglo-Saxon.
All four samples were reprocessed using BWA-MEM as described in my previous post. BWA-MEM trims (soft-clips) reads automatically and gives great results with a minimum of manual work and control; the process is fully automated. Before choosing BWA-MEM I tested three other programs.
I have also standardized the sample selection in this test to ensure the same SNP coverage for all samples and to avoid errors due to SNP qualification and differences in SNP counts. So each ancient sample is compared against modern populations in almost exactly the same way. This is fundamental, because differences in the SNP count in particular can cause severe biases in the results.
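Here are the Dstat results (columns after "result:" are the four populations, D, Z-score, BABA count, ABBA count and number of SNPs):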
result: CEU Mbuti_Pygmy iabrit Chimp.DG 0.4532 100.000 17762 6683 184450
result: CEU Mbuti_Pygmy anglosaxon Chimp.DG 0.4604 100.000 28390 10489 287312
result: French Mbuti_Pygmy iabrit Chimp.DG 0.4533 100.000 17714 6663 184450
result: French Mbuti_Pygmy anglosaxon Chimp.DG 0.4579 100.000 28268 10511 287312
result: FinnLocal Mbuti_Pygmy iabrit Chimp.DG 0.4491 100.000 17677 6721 184450
result: FinnLocal Mbuti_Pygmy anglosaxon Chimp.DG 0.4542 100.000 28218 10592 287312
result: FinnMostCW Mbuti_Pygmy iabrit Chimp.DG 0.4539 100.000 17757 6670 184450
result: FinnMostCW Mbuti_Pygmy anglosaxon Chimp.DG 0.4592 100.000 28352 10509 287312
result: IBS Mbuti_Pygmy iabrit Chimp.DG 0.4481 100.000 17602 6709 184450
result: IBS Mbuti_Pygmy anglosaxon Chimp.DG 0.4521 100.000 28066 10590 287312
result: Kent Mbuti_Pygmy iabrit Chimp.DG 0.4546 100.000 17771 6664 184450
result: Kent Mbuti_Pygmy anglosaxon Chimp.DG 0.4604 100.000 28375 10484 287312
result: Estonia Mbuti_Pygmy iabrit Chimp.DG 0.4536 100.000 17611 6620 183032
result: Estonia Mbuti_Pygmy anglosaxon Chimp.DG 0.4609 100.000 28197 10406 285211
result: Sardinian Mbuti_Pygmy iabrit Chimp.DG 0.4498 100.000 17634 6692 184450
result: Sardinian Mbuti_Pygmy anglosaxon Chimp.DG 0.4528 100.000 28105 10587 287311
result: Orcadian Mbuti_Pygmy iabrit Chimp.DG 0.4552 100.000 17766 6652 184450
result: Orcadian Mbuti_Pygmy anglosaxon Chimp.DG 0.4595 100.000 28345 10496 287311
result: TSI Mbuti_Pygmy iabrit Chimp.DG 0.4479 100.000 17597 6711 184450
result: TSI Mbuti_Pygmy anglosaxon Chimp.DG 0.4529 100.000 28076 10573 287312
result: North_Italian Mbuti_Pygmy iabrit Chimp.DG 0.4497 100.000 17653 6702 184450
result: North_Italian Mbuti_Pygmy anglosaxon Chimp.DG 0.4557 100.000 28173 10533 287311
result: Russian_Vologda Mbuti_Pygmy iabrit Chimp.DG 0.4483 100.000 17635 6718 184450
result: Russian_Vologda Mbuti_Pygmy anglosaxon Chimp.DG 0.4525 100.000 28129 10603 287311
result: CEU Mbuti_Pygmy hinxton2 Chimp.DG 0.4391 100.000 45931 17904 433006
result: CEU Mbuti_Pygmy rabrit3 Chimp.DG 0.4557 100.000 29993 11214 299676
result: French Mbuti_Pygmy hinxton2 Chimp.DG 0.4375 100.000 45806 17923 433006
result: French Mbuti_Pygmy rabrit3 Chimp.DG 0.4536 100.000 29889 11235 299676
result: FinnLocal Mbuti_Pygmy hinxton2 Chimp.DG 0.4327 100.000 45622 18064 433006
result: FinnLocal Mbuti_Pygmy rabrit3 Chimp.DG 0.4486 100.000 29784 11336 299676
result: FinnMostCW Mbuti_Pygmy hinxton2 Chimp.DG 0.4384 100.000 45896 17921 433006
result: FinnMostCW Mbuti_Pygmy rabrit3 Chimp.DG 0.4540 100.000 29933 11239 299676
result: IBS Mbuti_Pygmy hinxton2 Chimp.DG 0.4329 100.000 45492 18006 433006
result: IBS Mbuti_Pygmy rabrit3 Chimp.DG 0.4505 100.000 29732 11263 299676
result: Kent Mbuti_Pygmy hinxton2 Chimp.DG 0.4402 100.000 45988 17876 433006
result: Kent Mbuti_Pygmy rabrit3 Chimp.DG 0.4574 100.000 30041 11186 299676
result: Estonia Mbuti_Pygmy hinxton2 Chimp.DG 0.4394 100.000 45638 17775 430442
result: Estonia Mbuti_Pygmy rabrit3 Chimp.DG 0.4558 100.000 29766 11126 297499
result: Sardinian Mbuti_Pygmy hinxton2 Chimp.DG 0.4327 100.000 45516 18021 433005
result: Sardinian Mbuti_Pygmy rabrit3 Chimp.DG 0.4515 100.000 29768 11250 299675
result: Orcadian Mbuti_Pygmy hinxton2 Chimp.DG 0.4384 100.000 45892 17919 433005
result: Orcadian Mbuti_Pygmy rabrit3 Chimp.DG 0.4571 100.000 30016 11185 299675
result: TSI Mbuti_Pygmy hinxton2 Chimp.DG 0.4329 100.000 45487 18005 433006
result: TSI Mbuti_Pygmy rabrit3 Chimp.DG 0.4491 100.000 29704 11292 299676
result: North_Italian Mbuti_Pygmy hinxton2 Chimp.DG 0.4346 100.000 45645 17989 433005
result: North_Italian Mbuti_Pygmy rabrit3 Chimp.DG 0.4528 100.000 29838 11238 299675
result: Russian_Vologda Mbuti_Pygmy hinxton2 Chimp.DG 0.4337 100.000 45616 18017 433005
result: Russian_Vologda Mbuti_Pygmy rabrit3 Chimp.DG 0.4493 100.000 29754 11306 299675
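To read a listing like this more easily, the rows can be parsed and sorted per ancient sample. The following is only a minimal sketch, assuming the column order seen in the listing (four populations, D, Z, BABA, ABBA, SNP count) and the usual relation D = (BABA - ABBA) / (BABA + ABBA). The function names are hypothetical helpers for illustration and not part of any genetics toolkit; because the printed counts are rounded, recomputed D values can differ from the printed ones in the last digit.

```python
from collections import defaultdict

# Assumed column layout, taken from the listing above:
# result: pop1 pop2 pop3 pop4 D Z BABA ABBA nSNPs


def parse_result(line):
    """Parse one 'result:' line into a dict, or return None for other lines."""
    parts = line.split()
    if len(parts) != 10 or parts[0] != "result:":
        return None
    return {
        "pops": tuple(parts[1:5]),
        "d": float(parts[5]),
        "z": float(parts[6]),
        "baba": int(parts[7]),
        "abba": int(parts[8]),
        "snps": int(parts[9]),
    }


def d_from_counts(baba, abba):
    """Recompute D from the two site-pattern counts."""
    return (baba - abba) / (baba + abba)


def rank_by_ancient(lines):
    """Group rows by the ancient sample (third population), highest D first."""
    groups = defaultdict(list)
    for line in lines:
        row = parse_result(line)
        if row is not None:
            groups[row["pops"][2]].append((row["pops"][0], row["d"]))
    return {
        ancient: sorted(rows, key=lambda item: item[1], reverse=True)
        for ancient, rows in groups.items()
    }


# Three rows copied from the listing above.
SAMPLE = [
    "result: CEU Mbuti_Pygmy iabrit Chimp.DG 0.4532 100.000 17762 6683 184450",
    "result: IBS Mbuti_Pygmy iabrit Chimp.DG 0.4481 100.000 17602 6709 184450",
    "result: Kent Mbuti_Pygmy iabrit Chimp.DG 0.4546 100.000 17771 6664 184450",
]

if __name__ == "__main__":
    for ancient, ranked in rank_by_ancient(SAMPLE).items():
        print(ancient, ranked)
```

Sorting per ancient sample matters here because the SNP counts differ between the ancient samples, so D values are best compared only within rows that share the same ancient sample.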
And here are corresponding graphic maps:
The results differ somewhat from what I got earlier, evidently due to the stricter data preparation and more neutral outgroups.
Finally, I also made IBS statistics and a PCA plot using the same data. It is however fair to say that, due to the homozygosity error of ancient samples, the most homozygous modern populations get an extra boost and give results that are too high. This is typical for Balts, Irishmen and Scots. I don't know about Basque homozygosity.
I was able to include extra populations by using Plink with the --geno 0.01 option to standardize the SNP set as much as possible.
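The idea behind such a missingness filter can be sketched in a few lines. This is not Plink itself, only an illustration under the assumption that the filter drops every SNP whose share of missing calls across samples exceeds the threshold (0.01 here). The data layout (a dict of SNP id to a list of calls, with None for a missing call) and the SNP ids are made up for the example.

```python
def snp_missing_rate(calls):
    """Fraction of samples with no genotype call (None) at one SNP."""
    return sum(1 for g in calls if g is None) / len(calls)


def filter_snps(genotypes, max_missing=0.01):
    """Keep only SNPs whose missing-call rate is at most max_missing.

    genotypes: dict mapping a SNP id to a list of per-sample calls,
    where None marks a missing call (hypothetical layout).
    """
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if snp_missing_rate(calls) <= max_missing
    }


# Made-up example with 200 samples per SNP.
EXAMPLE = {
    "snp_a": [0] * 199 + [None],      # 0.5% missing, kept
    "snp_b": [1] * 197 + [None] * 3,  # 1.5% missing, dropped
    "snp_c": [2] * 200,               # no missing calls, kept
}

if __name__ == "__main__":
    print(sorted(filter_snps(EXAMPLE)))
```

With a threshold this strict, nearly every remaining SNP is called in nearly every sample, which is what makes results from different populations comparable.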
Creating a PCA needs more samples to pick proper and all-inclusive components, so it is done here using another data set with fewer SNPs and more populations.
edit 17.3.2016 23:05
I read a comment on a Finnish history forum that using two outgroups, as I did in this post, is not recommended and can distort results. I gladly admit that this is true. But the reason for using two outgroups is very clear: I did it this way to get a large number of comparable results instead of comparing only three populations. Using three target populations and one outgroup makes it impossible, or at least painful, to compare results from separate qpDstat runs. Of course the latter method, using three targets, gives better accuracy.
But there is no smoke without fire here; my tests using two outgroups look reliable. In my previous results (above) the FinnMostCW group was very close to the Iron Age British sample, closer than the French sample group. I made a new test using the same data, now with three target populations and one outgroup. It agrees in direction with my previous results, though the Z-score is low:
0 FinnMostCW 16
1 French 28
2 iabrit 1
3 Chimp.DG 1
jackknife block size: 0.050
snps: 605676 indivs: 46
number of blocks for jackknife: 551
nrows, ncols: 46 605676
result: FinnMostCW French iabrit Chimp.DG 0.0020 1.039 9159 9122 184450
Indeed, I will have to come back to this question with larger data.
edit 18.3.2016 12:40
Here is another qpDstat result using three target populations. I am quite disappointed by the way some people react when they are unhappy with some results. My only goal is to make objective tests using primarily European genetic data. My focus is not on Finnish results, nor do I try to avoid producing reliable results about Finns.
0 FinnMostCW 16
1 FinnLocal 15
2 iabrit 1
3 Chimp.DG 1
jackknife block size: 0.050
snps: 605676 indivs: 33
number of blocks for jackknife: 551
nrows, ncols: 33 605676
result: FinnMostCW FinnLocal iabrit Chimp.DG 0.0073 3.750 9120 8989 184450
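The Z column in the two three-population runs above is the D value divided by a standard error obtained from a block jackknife (the log mentions 551 blocks). The sketch below shows the principle with a simplified, unweighted delete-one-block jackknife on made-up per-block counts. It is an assumption-laden illustration: as far as I know the real program weights blocks by their SNP counts, so this will not reproduce its numbers. The threshold |Z| >= 3 is only a commonly used rule of thumb, and all function names are hypothetical.

```python
import math


def d_stat(baba, abba):
    """D = (BABA - ABBA) / (BABA + ABBA)."""
    return (baba - abba) / (baba + abba)


def jackknife_z(blocks):
    """Return (D, SE, Z) from per-block (BABA, ABBA) counts.

    Uses an unweighted delete-one-block jackknife: D is recomputed
    with each block left out, and the spread of those values gives SE.
    """
    n = len(blocks)
    total_baba = sum(baba for baba, _ in blocks)
    total_abba = sum(abba for _, abba in blocks)
    d_all = d_stat(total_baba, total_abba)
    leave_one_out = [
        d_stat(total_baba - baba, total_abba - abba) for baba, abba in blocks
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    z = d_all / se if se > 0 else float("nan")
    return d_all, se, z


def is_significant(z, threshold=3.0):
    """Rule-of-thumb check; the threshold is a convention, not a law."""
    return abs(z) >= threshold


# Made-up block counts, only to show the mechanics.
BLOCKS = [(20, 18), (22, 19), (19, 20), (21, 17), (23, 20)]

if __name__ == "__main__":
    print(jackknife_z(BLOCKS))
    # Z-scores reported in the two runs above.
    print(is_significant(1.039), is_significant(3.750))
```

Read this way, the FinnMostCW versus French comparison (Z = 1.039) stays below the rule-of-thumb threshold, while the FinnMostCW versus FinnLocal comparison (Z = 3.750) passes it.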
Saturday, March 5, 2016
Before going ahead with the Roman Age samples I want to publish a PCA plot including all the ancient Brits, excluding the Middle Eastern one. It looks like on the main axis all the Roman Age samples are very close to present-day Brits and Irishmen. The Anglo-Saxon sample and Roman Age sample 7 are closer to Swedes. All those samples turn somewhat towards Basques on the second axis. But the Iron Age sample is clearly different; it is located right between France and England. Maybe she was from Brittany (Bretagne). I don't know of anything in British history that would explain why just the Iron Age sample from Melton looks like this.