Wednesday, 23 March 2016

Two-fold ancestry of Finnish people

It has been a common idea, especially among linguists, that the Baltic Finnic languages came from the Volga region, from the so-called Volga river bend near Samara. It is a carefully cherished tradition in Finnish science, but any movement of people from there to Finland is still without genetic evidence. Now I am going to show something that contradicts this idea of the Volga origin of Finns, or at least gives a new view of it.  I'll present plausible genetic evidence of a Volga-Saami connection using the Saami sample (Haak et al. 2015 and Lazaridis et al. 2014), which shows very high similarity with the ancient Eneolithic Samara sample (Mathieson et al.).

The other half of my Finnish story is about ancient Central European influence in Finland.  Around 20% of the Finnish samples from the 1000 Genomes Project show Corded Ware similarity comparable to that of Estonians and Lithuanians, and Western Finnish project samples show as much Corded Ware similarity as Swedes, some even more, despite the fact that they are much more "eastern" than present-day Swedes.

This Finnish duality doesn't tell where and when the mixing occurred, and so far I have not seen any genetic evidence about the Baltic Finnic origin. It looks very possible that, genetically, Baltic Finns were born somewhere in the region from Estonia to the White Sea, no matter what the origin of the Baltic Finnic language may have been.

Saami results

The Saami are genetically closer to the Eneolithic Samara people than Mordovians (Mordva) and Chuvashes are.   It is worth noticing that Mordovians, who live near the Volga, are not closer to those ancient people who lived in Samara.  The Saami live thousands of kilometers and thousands of years away from the suggested Volga home range.  The Siberian admixture of Chuvashes roughly equals that of the Saami.  This statistic is, however, of very limited use, because the Saami are not Central Europeans, but it still shows them to be comparable to Central Europeans when compared with ancient East European samples.   What could be the best outcome?

Some readers may think that the Eneolithic Samara - Saami - Finnish genetic connection is based only on the amount of Siberian admixture.  That is not true and is easily proved false.  Chuvashes and Mansi people (and Komis, not included), with high Siberian admixture, are far away from Eneolithic Samara, definitely not comparable to the Saami.  Similarly, those Finns who are closest to Eneolithic Samara have less Siberian admixture than Russians living in the Archangel and Pinega regions of Russia (see the project results).

Only people in northernmost Europe beat Saami_WGA in the comparison with Eneolithic Samara.  I have to admit this is a somewhat complicated question. So let's look at another side of the supposed Finnish ancestry, the Corded Ware samples.  It is less complicated.

Corded Ware results

Only Lithuanians beat the Finnish CW group (20% of the Finnish samples from the 1000 Genomes Project after removing outliers) when the test is done using over half a million SNPs.  Even Lithuanians would be beaten by a more homogeneous Finnish sample group.  There is every variation from very CW-looking to only moderately CW-looking. They don't look like they come from the Volga bend.  Not really.

Next, I combine the Saami and CW results and the project members.  To do this I have to use my smaller database, based on the Estonian Biocentre's data.   The accuracy is somewhat poorer.   The numbers show the difference between Eneolithic Samara and German Corded Ware affinities in Finland and in neighboring countries, as well as results for project members.  Using the Eneolithic Samara and CW samples excludes the Siberian-like admixture, so the results show only affinities common to those two groups, even if the tested populations or project members have extra Siberian admixture.  It is important to understand that this table alone doesn't tell how much of those two ancient affinities individuals and populations have (it gives only a ratio).  To see the big picture you also have to take into account the two previous tables, which show how significant the relation between the ancient and modern populations is.

Project results

Sunday, 13 March 2016

Continuing tests with ancient Brits, better material and final results

Continuing with ancient British samples.  This is fascinating because these samples are of high scanning quality and give precise results.  I now use 4 samples:

1. Hinxton2 is HI2 from the study "Iron Age and Anglo-Saxon genomes from East England reveal British migration history". HI2 Hinxton Male 170 BCE – 80 CE.

2. Rabrit3 is one of the Roman Age samples from the study "Genomic signals of migration and continuity in Britain before the Anglo-Saxons".  I can't identify which of the six local samples from Driffield Terrace it is, because the study authors don't give the connection between sample labels and sample data.  Rabrit3 was processed from the sample files ERR1043145, ERR1043146, ERR1043147.

3. Iabrit is M1489 from the same study (Genomic signals...).  M1489 Iron Age Melton, age estimate between 210 BC and 40 AD.

4. Anglosaxon/anglosaxon2 is NO3423, again from the same study.  NO3423 Anglo-Saxon Norton on Tees. The age estimate is unknown, but the sample is described as Anglo-Saxon.

All four samples were reprocessed using BWA-mem as described in my previous post.  BWA-mem trims reads automatically and gives great results with minimal manual action and control; the process is fully automated.  Before choosing BWA-mem I tested three other programs.

I have also standardized the sample selection in this test to ensure the same SNP coverage for all samples and to avoid errors due to SNP qualification and differences in SNP counts.  So each ancient sample is compared against modern populations in almost exactly the same way. This is fundamental, because differences in the SNP count in particular can cause severe biases in the results.

Here are Dstat results:

result:        CEU Mbuti_Pygmy     iabrit   Chimp.DG      0.4532   100.000 17762   6683 184450
result:        CEU Mbuti_Pygmy anglosaxon   Chimp.DG      0.4604   100.000 28390  10489 287312
result:     French Mbuti_Pygmy     iabrit   Chimp.DG      0.4533   100.000 17714   6663 184450
result:     French Mbuti_Pygmy anglosaxon   Chimp.DG      0.4579   100.000 28268  10511 287312
result:  FinnLocal Mbuti_Pygmy     iabrit   Chimp.DG      0.4491   100.000 17677   6721 184450
result:  FinnLocal Mbuti_Pygmy anglosaxon   Chimp.DG      0.4542   100.000 28218  10592 287312
result: FinnMostCW Mbuti_Pygmy     iabrit   Chimp.DG      0.4539   100.000 17757   6670 184450
result: FinnMostCW Mbuti_Pygmy anglosaxon   Chimp.DG      0.4592   100.000 28352  10509 287312
result:        IBS Mbuti_Pygmy     iabrit   Chimp.DG      0.4481   100.000 17602   6709 184450
result:        IBS Mbuti_Pygmy anglosaxon   Chimp.DG      0.4521   100.000 28066  10590 287312
result:       Kent Mbuti_Pygmy     iabrit   Chimp.DG      0.4546   100.000 17771   6664 184450
result:       Kent Mbuti_Pygmy anglosaxon   Chimp.DG      0.4604   100.000 28375  10484 287312
result:    Estonia Mbuti_Pygmy     iabrit   Chimp.DG      0.4536   100.000 17611   6620 183032
result:    Estonia Mbuti_Pygmy anglosaxon   Chimp.DG      0.4609   100.000 28197  10406 285211
result:  Sardinian Mbuti_Pygmy     iabrit   Chimp.DG      0.4498   100.000 17634   6692 184450
result:  Sardinian Mbuti_Pygmy anglosaxon   Chimp.DG      0.4528   100.000 28105  10587 287311
result:   Orcadian Mbuti_Pygmy     iabrit   Chimp.DG      0.4552   100.000 17766   6652 184450
result:   Orcadian Mbuti_Pygmy anglosaxon   Chimp.DG      0.4595   100.000 28345  10496 287311
result:        TSI Mbuti_Pygmy     iabrit   Chimp.DG      0.4479   100.000 17597   6711 184450
result:        TSI Mbuti_Pygmy anglosaxon   Chimp.DG      0.4529   100.000 28076  10573 287312
result: North_Italian Mbuti_Pygmy     iabrit   Chimp.DG      0.4497   100.000 17653   6702 184450
result: North_Italian Mbuti_Pygmy anglosaxon   Chimp.DG      0.4557   100.000 28173  10533 287311
result: Russian_Vologda Mbuti_Pygmy     iabrit   Chimp.DG      0.4483   100.000 17635   6718 184450
result: Russian_Vologda Mbuti_Pygmy anglosaxon   Chimp.DG      0.4525   100.000 28129  10603 287311

result:        CEU Mbuti_Pygmy   hinxton2   Chimp.DG      0.4391   100.000 45931  17904 433006
result:        CEU Mbuti_Pygmy    rabrit3   Chimp.DG      0.4557   100.000 29993  11214 299676
result:     French Mbuti_Pygmy   hinxton2   Chimp.DG      0.4375   100.000 45806  17923 433006
result:     French Mbuti_Pygmy    rabrit3   Chimp.DG      0.4536   100.000 29889  11235 299676
result:  FinnLocal Mbuti_Pygmy   hinxton2   Chimp.DG      0.4327   100.000 45622  18064 433006
result:  FinnLocal Mbuti_Pygmy    rabrit3   Chimp.DG      0.4486   100.000 29784  11336 299676
result: FinnMostCW Mbuti_Pygmy   hinxton2   Chimp.DG      0.4384   100.000 45896  17921 433006
result: FinnMostCW Mbuti_Pygmy    rabrit3   Chimp.DG      0.4540   100.000 29933  11239 299676
result:        IBS Mbuti_Pygmy   hinxton2   Chimp.DG      0.4329   100.000 45492  18006 433006
result:        IBS Mbuti_Pygmy    rabrit3   Chimp.DG      0.4505   100.000 29732  11263 299676
result:       Kent Mbuti_Pygmy   hinxton2   Chimp.DG      0.4402   100.000 45988  17876 433006
result:       Kent Mbuti_Pygmy    rabrit3   Chimp.DG      0.4574   100.000 30041  11186 299676
result:    Estonia Mbuti_Pygmy   hinxton2   Chimp.DG      0.4394   100.000 45638  17775 430442
result:    Estonia Mbuti_Pygmy    rabrit3   Chimp.DG      0.4558   100.000 29766  11126 297499
result:  Sardinian Mbuti_Pygmy   hinxton2   Chimp.DG      0.4327   100.000 45516  18021 433005
result:  Sardinian Mbuti_Pygmy    rabrit3   Chimp.DG      0.4515   100.000 29768  11250 299675
result:   Orcadian Mbuti_Pygmy   hinxton2   Chimp.DG      0.4384   100.000 45892  17919 433005
result:   Orcadian Mbuti_Pygmy    rabrit3   Chimp.DG      0.4571   100.000 30016  11185 299675
result:        TSI Mbuti_Pygmy   hinxton2   Chimp.DG      0.4329   100.000 45487  18005 433006
result:        TSI Mbuti_Pygmy    rabrit3   Chimp.DG      0.4491   100.000 29704  11292 299676
result: North_Italian Mbuti_Pygmy   hinxton2   Chimp.DG      0.4346   100.000 45645  17989 433005
result: North_Italian Mbuti_Pygmy    rabrit3   Chimp.DG      0.4528   100.000 29838  11238 299675
result: Russian_Vologda Mbuti_Pygmy   hinxton2   Chimp.DG      0.4337   100.000 45616  18017 433005
result: Russian_Vologda Mbuti_Pygmy    rabrit3   Chimp.DG      0.4493   100.000 29754  11306 299675
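
The D values in these listings can be cross-checked from the two count columns. Below is a minimal sketch, assuming the columns follow the usual qpDstat layout (four population labels, D, Z, BABA count, ABBA count, SNP count) and that D = (BABA - ABBA) / (BABA + ABBA). The function names are hypothetical helpers, not part of ADMIXTOOLS.

```python
def parse_result_line(line):
    """Split one 'result:' line into labels and numeric fields.

    Assumed column order: pop1 pop2 pop3 pop4 D Z BABA ABBA nsnps.
    """
    fields = line.split()
    if len(fields) != 10 or fields[0] != "result:":
        raise ValueError("not a qpDstat result line: %r" % line)
    return {
        "pops": fields[1:5],
        "d": float(fields[5]),
        "z": float(fields[6]),
        "baba": float(fields[7]),
        "abba": float(fields[8]),
        "nsnps": int(fields[9]),
    }


def d_from_counts(baba, abba):
    """D statistic as the normalized difference of the two site-pattern counts."""
    return (baba - abba) / (baba + abba)


# First line of the listing above, copied verbatim.
line = ("result:        CEU Mbuti_Pygmy     iabrit   Chimp.DG"
        "      0.4532   100.000 17762   6683 184450")
rec = parse_result_line(line)
recomputed = d_from_counts(rec["baba"], rec["abba"])
print(rec["pops"], rec["d"], round(recomputed, 4))
```

Because the printed counts are rounded, the recomputed value can differ from the printed D in the last digit, most visibly when the two counts are nearly equal.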

And here are corresponding graphic maps:


Results differ somewhat from what I got earlier, obviously due to the stricter data preparation and more neutral outgroups.

Finally, I also computed IBS statistics using the same data and made a PCA plot.  It is, however, reasonable to state that, due to the homozygosity error of ancient samples, the most homozygous modern populations get an extra boost and give results that are too high.  This is typical for Balts, Irishmen and Scots.  I don't know about Basque homozygosity.

I was able to include extra populations by using Plink with the --geno 0.01 option to standardize the SNP set as much as possible.
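
To make the effect of that filter concrete: PLINK's `--geno` option removes every SNP whose share of missing genotype calls exceeds the given threshold. Below is a minimal sketch of the idea in plain Python, using a made-up toy genotype matrix (`None` marks a missing call). The function name and data are hypothetical, and this is an illustration, not PLINK's implementation.

```python
def filter_by_missingness(genotypes, snp_ids, max_missing=0.01):
    """Return the SNP ids whose missing-call rate is at most max_missing.

    genotypes: one row per individual; each row has one entry per SNP,
    either an allele count (0, 1, 2) or None for a missing call.
    """
    n_ind = len(genotypes)
    kept = []
    for j, snp in enumerate(snp_ids):
        missing = sum(1 for row in genotypes if row[j] is None)
        # Drop the SNP only when its missing rate exceeds the threshold.
        if missing / n_ind <= max_missing:
            kept.append(snp)
    return kept


# Toy data: 4 individuals x 4 SNPs (illustrative only).
toy = [
    [0, 1, None, 2],
    [1, 1, 2, 2],
    [0, None, 2, 1],
    [2, 0, 1, 1],
]
snps = ["rs1", "rs2", "rs3", "rs4"]
kept = filter_by_missingness(toy, snps, max_missing=0.01)
print(kept)
```

With a threshold as strict as 0.01, almost any missing call removes the SNP, which is why the surviving SNP set ends up nearly identical across all populations.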

Creating a PCA needs more samples to pick proper and all-inclusive components, so it is done here using another data set with fewer SNPs and more populations.

edit 17.3.2016 23:05

I read a comment on a Finnish history forum that using two outgroups, as I did in this post, is not recommended and can distort results.  I gladly admit that this is true.  But the reason for using two outgroups is very clear: I did it this way to get a large number of comparable results instead of comparing only three populations.  Using three target populations and one outgroup makes it impossible, or at least painful, to compare results from separate qpDstat runs.  Of course the latter method, using three targets, gives better accuracy.

But there is no smoke without fire here; my tests using two outgroups look reliable.  In my previous results (above) the FinnMostCW group was very close to the Iron Age British sample, closer than the French sample group.  I made a new test using the same data, now with three target populations and one outgroup.  It confirms my previous results:

  0           FinnMostCW   16
  1           French   28
  2           iabrit    1
  3           Chimp.DG    1
jackknife block size:     0.050
snps: 605676  indivs: 46
number of blocks for jackknife: 551
nrows, ncols: 46 605676
result: FinnMostCW     French     iabrit   Chimp.DG      0.0020     1.039  9159   9122 184450

Indeed, I will have to come back to this question with larger data.  

edit 18.3.2016 12:40

Here is another qpDstat result using three target populations.   I am quite disappointed by the way some people react when they are not happy with some results.   My only goal is to make objective tests using primarily European genetic data. My focus is not on Finnish results, but neither do I try to avoid producing reliable results about Finns.

  0           FinnMostCW   16
  1           FinnLocal   15
  2           iabrit    1
  3           Chimp.DG    1
jackknife block size:     0.050
snps: 605676  indivs: 33
number of blocks for jackknife: 551
nrows, ncols: 33 605676
result: FinnMostCW  FinnLocal     iabrit   Chimp.DG      0.0073     3.750  9120   8989 184450
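
The second number on a result line is a Z score: D divided by a standard error that qpDstat estimates with a block jackknife over the genome (the log above reports 551 blocks). Below is a minimal sketch of the principle, assuming a plain, unweighted delete-one jackknife over made-up per-block counts. ADMIXTOOLS itself uses a weighted variant as far as I know, so this is an illustration and will not reproduce its numbers.

```python
import math


def d_stat(baba, abba):
    """D statistic from the two site-pattern counts."""
    return (baba - abba) / (baba + abba)


def jackknife_z(block_counts):
    """Delete-one block jackknife for D (unweighted, simplified).

    block_counts: list of (baba, abba) pairs, one per genome block.
    Returns (D over all blocks, jackknife standard error, Z = D / SE).
    """
    n = len(block_counts)
    total_baba = sum(b for b, _ in block_counts)
    total_abba = sum(a for _, a in block_counts)
    d_all = d_stat(total_baba, total_abba)
    # Recompute D with each block left out in turn.
    loo = [d_stat(total_baba - b, total_abba - a) for b, a in block_counts]
    mean_loo = sum(loo) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in loo)
    se = math.sqrt(variance)
    return d_all, se, d_all / se


# Hypothetical counts for six blocks (not taken from the runs above).
blocks = [(12, 10), (9, 11), (14, 9), (10, 10), (13, 8), (11, 12)]
d, se, z = jackknife_z(blocks)
print(round(d, 4), round(se, 4), round(z, 2))
```

Using blocks rather than single SNPs matters because neighboring SNPs are correlated; treating them as independent would make the standard error too small.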

Saturday, 5 March 2016

Continuing tests with ancient Brits

Before going ahead with the Roman Age samples I want to publish a PCA plot including all the ancient Brits, excluding the Middle Eastern one.   It looks like on the main axis all Roman Age samples are very close to present-day Brits and Irishmen.  The Anglo-Saxon sample and Roman Age sample 7 are closer to Swedes.  All those samples turn somewhat towards Basques on the second axis.  But the Iron Age sample is clearly different; it is located just between France and England.  Maybe she was from Bretagne/Brittany.  I don't know enough British history to say why just the Iron Age sample from Melton would look like this.