Wednesday, 10 October 2018

More IBD-statistics, including Mordvas, Brits and Basques

I have now automated the process and can produce "quick and dirty" statistics for all the populations listed below.  Adding a new population takes about two days if I already have it in my backbone database; adding a population that is not yet in the database takes longer, around a week.  Here are the existing populations and sample counts.

The results are based on slightly changed phasing parameters, so the absolute values may differ slightly.  The programs used are Plink, Impute2 and Beagle version 5.







Sunday, 30 September 2018

Potential Kyrgyz admixture in Europe, shown using IBD results from Beagle 5

It is well known that during the first millennium Europe was threatened by invading Mongols called Huns.  The Huns themselves were probably not a homogeneous group but were made up of many ethnic groups.  Later, during the second millennium, Europeans, especially Eastern Slavs, were threatened by the army of Genghis Khan.  Those later rulers were called Tatars.  My results show that there is a subtle Mongol admixture in Europe but, surprisingly, not among present-day Tatars.  The results are based on the difference in IBD sharing between Nganasans and Kyrgyz.  Nganasans are an isolated group of Northwest Siberian people, often used to demonstrate Siberian admixture in Europe.  My previous test shows that Evens work even better for this purpose, but sharing with Evens is distributed much more widely in Europe, which may be due to European admixture among Evens rather than vice versa.  So it is better to use the more distinct Nganasans for this purpose.
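As a minimal sketch of the comparison described above, the per-population difference can be computed as below. The direction of the subtraction (Kyrgyz minus Nganasan), the function name and the population labels are my assumptions for illustration, and the numbers are placeholders, not results.

```python
def sharing_difference(kyrgyz_sharing, nganasan_sharing):
    """Per-population difference: sharing with Kyrgyz minus sharing with Nganasans.

    Both arguments map a population label to its average IBD sharing.
    Only populations present in both inputs are compared.
    """
    common = kyrgyz_sharing.keys() & nganasan_sharing.keys()
    return {pop: kyrgyz_sharing[pop] - nganasan_sharing[pop] for pop in sorted(common)}


# Placeholder values only, chosen to make the arithmetic easy to follow.
kyrgyz = {"PopA": 0.50, "PopB": 0.25}
nganasan = {"PopA": 0.25, "PopB": 0.75}
diff = sharing_difference(kyrgyz, nganasan)
```

Under this convention a positive value means a population shares more with Kyrgyz than with Nganasans, which is the pattern one would look for as a sign of Kyrgyz-like rather than general Siberian sharing.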



     

Saturday, 29 September 2018

Swedish IBD-sharings around the Baltic Sea and a little further

Unfortunately I do not have enough German, Danish and Norwegian samples, or their origin is unclear, but here is some evidence of Swedish influence in Finland or, conversely, of Finnish/Karelian influence in Sweden, if you prefer.  For better or worse, the IBD sharing is evident along the eastward route.  My goal is to add at least some German samples of known origin in the future.



Saami IBD-sharing

Continuing with the same phased data, I made two IBD statistics comparing the Saami (from Finnmark) with Northern Europeans and Siberians.  The result shows that the highest 1-2 cM IBD sharing is between the Saami and the Even samples from the SGDP data, so I made a follow-up using the SGDP Evens.  I do not have detailed information about the origin of those Evens.  The curve implies that the connection between the Saami and the SGDP Evens is old but strong.

Also notable is the strong 2-3 cM sharing between the Saami and Northern Baltic Finns, including Finns, Karelians, Vepsas and Ingrians.  This may be a consequence of the first contact between the Saami and Baltic Finns.  The timing of smaller segments is vaguer because of random segment breakdown and data inaccuracy.
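The 1-2 cM and 2-3 cM classes mentioned above can be illustrated with a small binning sketch. This is not my actual pipeline; the function name, the 1 cM bin width and the 1 cM lower cutoff are assumptions for illustration, and the lengths are toy values.

```python
import math
from collections import Counter


def bin_by_length(lengths_cm, min_cm=1.0):
    """Count IBD segments in 1 cM wide bins: [1, 2), [2, 3), and so on.

    Segments shorter than min_cm are dropped.
    """
    bins = Counter()
    for length in lengths_cm:
        if length < min_cm:
            continue
        lower = math.floor(length)
        bins[(lower, lower + 1)] += 1
    return dict(sorted(bins.items()))


# Toy segment lengths in cM, not real data.
toy_lengths = [1.2, 1.9, 2.4, 2.0, 3.7, 0.6]
histogram = bin_by_length(toy_lengths)
```

With the toy input, two segments fall in the 1-2 cM class, two in the 2-3 cM class and one in the 3-4 cM class; the 0.6 cM segment is discarded.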

















Friday, 28 September 2018

Differences in IBD-sharings between Estonians and Finns

These statistics are based on haploid IBD segments produced with Beagle 5 and Beagle Refined IBD.  Part of the data was imputed (IMPUTE2) to increase the SNP count of some populations from 300k to 500k SNPs, although most samples come from larger data sets.  The samples in this IBD data are listed here, together with sample sizes.  The accuracy of the results depends on the sample size, although the trends are quite reliable anyway, as the following tests show.  I'll continue testing with this data later.

Results show the average IBD-sharing (segments) between single individuals of each population.

(X-axis legends fixed, note that the minimum segment size was 1 cM)
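To make "average IBD-sharing between single individuals of each population" concrete, here is a small sketch that divides the segment count for a population pair by the number of individual pairs. The input layout, the function name and the sample IDs are assumptions for illustration, not the format of my files.

```python
from collections import defaultdict


def mean_pairwise_sharing(segments, pop_of):
    """Average number of IBD segments per pair of individuals, by population pair.

    segments: iterable of (sample_a, sample_b, length_cm) tuples
    pop_of:   dict mapping sample id -> population label
    """
    sizes = defaultdict(int)
    for pop in pop_of.values():
        sizes[pop] += 1

    counts = defaultdict(int)
    for a, b, _length in segments:
        key = tuple(sorted((pop_of[a], pop_of[b])))
        counts[key] += 1

    means = {}
    for (p, q), n_seg in counts.items():
        # Within a population count unordered pairs; between populations all cross pairs.
        n_pairs = sizes[p] * (sizes[p] - 1) // 2 if p == q else sizes[p] * sizes[q]
        means[(p, q)] = n_seg / n_pairs
    return means


# Toy data: two Finns, two Estonians, four shared segments.
pop_of = {"F1": "Finn", "F2": "Finn", "E1": "Estonian", "E2": "Estonian"}
segments = [("F1", "E1", 1.4), ("F1", "E2", 2.1), ("F2", "E1", 1.1), ("F1", "F2", 3.0)]
result = mean_pairwise_sharing(segments, pop_of)
```

Normalizing by the number of pairs is what makes populations with different sample sizes comparable, although small samples still give noisier averages.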















Sunday, 9 September 2018

Sigtuna, Sweden Viking Age

I was lucky to get new samples (ENA bam-files) from the study Genomic and Strontium Isotope Variation Reveal Immigration Patterns in a Viking Age Town, Maja Krzewińska et al. 2018.  The study included seven high-quality late Viking Age samples from the Swedish town of Sigtuna.  Sigtuna was an important market place, founded in 970 AD, and it remained important until the 13th century.  These seven samples are good enough for admixture analyses.  I have made admixture analyses using Dna.Land's software, look here, and I see no reason to change my methods.  These samples are only 900-1100 years old, and it is reasonable to suggest that modern references are suitable for this purpose, especially because the genome quality of modern references is much better than that of other ancient samples used as references.

Why admixture analyses, and why not tests that use genetic drift, like qp3Pop and qpDstat?  Both of those methods, and other tests such as IBD and IBS statistics and Fst, measure genetic distances, not genome structure, and often give a false picture of our ancestry.  Despite the weaknesses of admixture analyses with really old ancient samples (where no one can really work out the admixture history), in this particular case all the samples are only about 1000 years old, the link between them and us is significant, and the admixture history can be worked out.  Good admixture results require only two conditions to hold: the admixture history must be real and the references must be right.  In many cases these conditions are not fulfilled, and people, seeing nonsensical results, start barking up the wrong tree.

grt036

East_Scandinavian 24.7
Slavic 18.8
Mediterranean 14.3
Northeast_European 14.0
Central_European 11.0
Northwest_European 8.8
Saami 4.4
Baltic 3.3

grt035

Northwest_European 40.8
Central_European 16.7
Mediterranean 14.8
East_Scandinavian 10.8
Saami 9.8
Baltic 5.7
Slavic 1.3

kal006 (this looks like present-day Estonians)

Baltic 54.3
Finnic 20.4
East_Scandinavian 12.8
Northeast_European 9.5
Siberian 2.0

stg021 (this looks like present-day Swedes)

East_Scandinavian 43.5
Northwest_European 22.9
Mediterranean 12.8
Slavic 6.6
Saami 6.1
Northeast_European 2.5
Uralic 1.3
East_Asian 1.8
Baltic 1.7

84001 (this must be mainly British)

Northwest_European 64.2
Baltic 14.1
Uralic 5.0
Saami 4.7
East_Scandinavian 4.3
Slavic 2.1
Mediterranean 2.3
Finnic 2.8

84005 (a quarter Finn, maybe even more Finnish taking into account the Finnish demographic history after the Viking Age)

East_Scandinavian 39.6
Finnic 24.7
Baltic 13.6
Northwest_European 11.8
Slavic 3.1
Mediterranean 3.0
Central_European 3.6

For a comparison, a project sample of Finnish ancestry from Southwestern Finland:

Finnic 49.1
Northwest_European 25.1
East_Scandinavian 12.9
Northeast_European 5.1
Slavic 2.4
Saami 2.9
Baltic 2.3


urm160

Central_European 33.1
East_Scandinavian 29.9
Slavic 19.1
Finnic 6.8
Baltic 4.5
Uralic 2.9
Saami 1.8
Northwest_European 1.3



It is notable that many Viking Age Swedish samples show Saami without Finnish ancestry.  This probably means that the Finnish-Saami mixing was not yet as significant as today.  It is also possible that the Saami admixture here means common Swedish-Saami ancestry which is not clearly assignable using modern Saami references.

Saturday, 9 June 2018

QpAdm tests, Iron Age Scanian sample RISE174

I have long been waiting for ancient Finnish samples, and it is starting to look unlikely that I'll ever see them, because Finnish soil is acidic and destroys organic material within a millennium.  But never lose hope: one has been available on the Internet all along.  An old Finnish sample has been available in open databases for three years already.  It was found in Southern Sweden and was labeled RISE174, dated 427-611 AD, later Scania_IA.  That is right, for three years we have had an ancient Fennoscandinavian sample, common to all Fennoscandinavians and also to many Baltic people, but apparently because it was found in Southern Sweden its real value was not noticed before.  The reason could also partly be common prejudice.  I wouldn't be very surprised if Finnish researchers find similar Iron Age samples in Southwestern Finland in the future, if the Finnish soil allows it.  I'll show that present-day Swedes are not much more closely related to this sample than present-day Finns are.  Generally speaking, all the qpAdm results listed below are reliable, although there were many neighboring cultures in the past that were also genetically very similar, and in any test populations that are close enough can substitute for each other.

The qpAdm results were made using distant ancestral references (the right-file) to get the best possible coverage.  While these results are very reasonable, all of them could be fine-tuned by using carefully selected, less distant references.  So these results are indicative.

All Finnish groups are from the 1000 Genomes Project and cover the full 1M SNPs, as do all other groups on the second and third plots.  I have seen in my work that testing ancient and modern samples calls for equal coverage across all modern sample groups.

What kind of Fennoscandinavian was RISE174/Scania_IA?  She was definitely not like a present-day Swede, as we can see in the f3 result below and on the PCA plots.

Next is a simple PCA plot, somewhat imperfect and flat because of a large amount of Asian influence and too few European samples.  This plot shows a new Finnish group, Finnx, and ties it to Finnish ancestry.  Finnx forms an ordinary Finnish group with less present-day Swedish admixture than typical Southwest Finns, but also less genetic drift than typical East Finns.



Another plot (vectors 1/2 and 1/3), with almost all East Asian and Siberian samples removed.  There is still a touch of East Asian carried by South and Central Asians.  It was necessary to remove the East Asian and Siberian components to ignore later Siberian admixture in Finland and to see the possible Finnx-Scania_IA connection on PCA as well.  On the second plot (vectors 1/3) the East Asian influence is at its lowest and Scania_IA gives the best match with Finnx.




F3 test showing shared genetic drift with Scania_IA:



    

First, some European groups for demonstration purposes.

South Italians

Chisq and tail prob:   0.436        0.932651 

Populations:  Beaker_Central_Europe, Iran_LN, Levant_BA
best coefficients:     0.299     0.169     0.531
Jackknife mean:      0.299685849     0.169473438     0.530840713
      std. errors:     0.028     0.032     0.033



North Italians

Chisq and tail prob:   1.738        0.628609

Populations:  Beaker_Central_Europe, Levant_BA, Iran_LN
best coefficients:     0.542     0.416     0.042
Jackknife mean:      0.542422483     0.415568791     0.042008725
      std. errors:     0.019     0.020     0.020


Basques

Chisq and tail prob:   1.789        0.774528

Populations:  Beaker_Central_Europe,   Iberia_EN
best coefficients:     0.578     0.422
Jackknife mean:      0.579234989     0.420765011
      std. errors:     0.030     0.030




Latvians

Chisq and tail prob:   1.101         0.77684

Populations:  Scania_IA, Latvia_MN, Evenk
best coefficients:     0.920     0.092    -0.012
Jackknife mean:      0.918423990     0.093588254    -0.012012243
      std. errors:     0.042     0.042     0.016


Poles

Chisq and tail prob:   1.699        0.790857

Populations:  Poltavka, Germany_MN
best coefficients:     0.502     0.498
Jackknife mean:      0.502425062     0.497574938
      std. errors:     0.039     0.039


Estonians

Chisq and tail prob:   0.642        0.886697

Populations:  Scania_IA, Latvia_MN, Evenk
best coefficients:     0.933     0.052     0.015
Jackknife mean:      0.931374297     0.053427705     0.015197998
      std. errors:     0.039     0.038     0.015


Mordvas

Chisq and tail prob:   0.503        0.973198

Populations:   Sarmatian, Scania_IA
best coefficients:     0.472     0.528
Jackknife mean:      0.475543261     0.524456739
      std. errors:     0.079     0.079


Swedes

Chisq and tail prob:   1.003          0.9093

Populations:   Latvia_LN, England_N
best coefficients:     0.523     0.477
Jackknife mean:      0.520472837     0.479527163
      std. errors:     0.052     0.052


Finnx

Chisq and tail prob:    1.647        0.648853 

Populations:    Scania_IA, Latvia_MN, Evenk
best coefficients:     0.851     0.084     0.065
Jackknife mean:      0.849474735     0.085348593     0.065176672
      std. errors:     0.037     0.036     0.014 
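The "tail prob" printed next to each chi-square value can be reproduced by hand. The sketch below computes the upper tail of a chi-square distribution for an integer number of degrees of freedom. The degrees of freedom are not shown in the output above; 4 for the two-source models and 3 for the three-source models are values I inferred because they reproduce the printed numbers, so treat them as an assumption. The function name is also just illustrative.

```python
import math


def chi2_tail(chisq, dof):
    """Upper tail probability P(X > chisq) for a chi-square variable, integer dof >= 1."""
    half = chisq / 2.0
    if dof % 2 == 0:
        tail, k = math.exp(-half), 2                 # closed form for dof = 2
    else:
        tail, k = math.erfc(math.sqrt(half)), 1      # closed form for dof = 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < dof:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


# Chi-square values copied from the results above; dof values are inferred, not printed.
basques = chi2_tail(1.789, 4)         # printed tail prob: 0.774528
poles = chi2_tail(1.699, 4)           # printed tail prob: 0.790857
south_italians = chi2_tail(0.436, 3)  # printed tail prob: 0.932651
```

A tail probability close to 1 means the model is not rejected by the data; values below the usual 0.05 threshold indicate a poor fit. Small differences in the last digits come from the chi-square values being rounded to three decimals in the printout.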



Some "what if" analyses:


What if Mordvas are Polish

Chisq and tail prob:    11.941       0.0177937

Populations:    Poltavka, Germany_MN
best coefficients:     0.727     0.273
Jackknife mean:      0.724786064     0.275213936
      std. errors:     0.047     0.047


What if Latvians are Polish

Chisq and tail prob:     9.038       0.0601576

Populations:   Poltavka, Germany_MN
best coefficients:     0.563     0.437
Jackknife mean:      0.562264229     0.437735771
      std. errors:     0.049     0.049
 



What if Finnx's are Mordva

Chisq and tail prob:    4.728        0.316351

Populations:   Sarmatian, Scania_IA
best coefficients:     0.369     0.631
Jackknife mean:      0.377367376     0.622632624
      std. errors:     0.093     0.093


What if East Finns are Mordvas

Chisq and tail prob:    6.593           0.159

Populations:    Sarmatian, Scania_IA
best coefficients:     0.317     0.683
Jackknife mean:      0.329388055     0.670611945
      std. errors:     0.118     0.118


What if Swedes are Baltic

Chisq and tail prob:    1.867        0.867159

Populations:   Scania_IA, Latvia_MN, Evenk
best coefficients (Jackknife optimisation is negative):     1.000     0.000    -0.000
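One simple way to read the coefficients and standard errors listed above is to divide each coefficient by its standard error. The sketch below does this for the Evenk component of the three Scania_IA + Latvia_MN + Evenk models. The input values are copied from the results above; the function name and the cutoff of 3 standard errors are my own illustrative choices, not something qpAdm reports.

```python
def z_score(coefficient, std_error):
    """Coefficient expressed in units of its jackknife standard error."""
    return coefficient / std_error


# (best coefficient, std. error) of the Evenk source, copied from the results above.
evenk = {
    "Latvians": (-0.012, 0.016),
    "Estonians": (0.015, 0.015),
    "Finnx": (0.065, 0.014),
}

z = {pop: z_score(coef, se) for pop, (coef, se) in evenk.items()}

# Illustrative cutoff: treat |z| >= 3 as clearly different from zero.
clearly_nonzero = [pop for pop, value in z.items() if abs(value) >= 3.0]
```

With these inputs only Finnx has an Evenk coefficient several standard errors away from zero; for Latvians and Estonians the Evenk share is within about one standard error of zero, so a two-source model without Evenk might fit them equally well.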




The results above show the continuum of Scania_IA ancestry from Southern Sweden to Russia, to the area where Mordvas live.  The populations living in the eastern and western areas nevertheless have different admixtures: Mordvas lack Baltic Middle Neolithic ancestry and instead show late East European Steppe-like admixture, with the Sarmatian component suggesting incursions of Iranian speakers.  It is also remarkable that the best Swedish result doesn't include Scania_IA.  This doesn't mean that Swedes have no Iron Age ancestry from Scania; in reality they are very close to Scania_IA.  It means only that present-day Swedes are slightly different, with their own later admixtures and a lack of certain older Baltic admixtures.