I have now automated the process and can run "quick and dirty" statistics for all the populations listed below. Adding a new population takes about two days if it is already in my backbone database. Adding populations that are not yet in my database takes longer, around a week. Here are the existing populations and sample counts.
The results are based on slightly changed phasing parameters, so the absolute values can differ slightly. The programs used are Plink, Impute2 and Beagle version 5.
Kalevan ja Untamon geenit
Wednesday, 10 October 2018
Sunday, 30 September 2018
Potential Kyrgyz admixture in Europe, shown using IBD results from Beagle 5
It is well known that during the first millennium Europe was threatened by invading Mongols called Huns. The Huns themselves were probably not a homogeneous group but were made up of many ethnic groups. Later, during the second millennium, Europeans, especially Eastern Slavs, were threatened by the army of Genghis Khan. Those later rulers were called Tatars. My results show that there is a subtle Mongol admixture in Europe but, surprisingly, not among present-day Tatars. The results are based on the difference in IBD sharing between Nganasans and Kyrgyz. Nganasans are known as an isolated group of Northwest Siberian people, often used to demonstrate Siberian admixture in Europe. My previous test shows that Evens match even better for this purpose, but sharing with Evens is much more widely distributed in Europe, which may be due to European admixture among Evens rather than vice versa. So it is better to use the more distinct Nganasans for this purpose.
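The comparison described above can be sketched as follows. This is only a minimal illustration, not my actual script: the function name `sharing_difference`, the dictionary layout, the population names and all numbers are hypothetical and made up for the example.

```python
def sharing_difference(avg_sharing, ref_a, ref_b):
    """For every target population, return sharing with ref_a minus sharing with ref_b."""
    diff = {}
    for target, by_reference in avg_sharing.items():
        diff[target] = by_reference[ref_a] - by_reference[ref_b]
    return diff


# Hypothetical average IBD sharing per pair of individuals; NOT real results.
example = {
    "PopulationX": {"Kyrgyz": 1.50, "Nganasan": 1.00},
    "PopulationY": {"Kyrgyz": 0.75, "Nganasan": 1.25},
}

diff = sharing_difference(example, "Kyrgyz", "Nganasan")

# Rank targets from the largest Kyrgyz excess to the smallest.
for population, value in sorted(diff.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{population}: {value:+.2f}")
```

Under these assumptions, a positive difference would point to sharing with Kyrgyz beyond the general Siberian signal carried by Nganasans, and a negative one to the opposite.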
Saturday, 29 September 2018
Swedish IBD-sharings around the Baltic Sea and a little further
Unfortunately I do not have enough German, Danish and Norwegian samples, or their origin is unclear, but here is some evidence of Swedish influence in Finland or, conversely, Finnish/Karelian influence in Sweden, if you prefer. For better or worse, the IBD sharing is evident along the eastward route. My goal is to add at least some German samples of known origin in the future.
Saami IBD-sharing
Continuing with the same phased data, I made two IBD statistics comparing the Saami (from Finnmark) with Northern Europeans and Siberians. The result shows the highest sharing of 1-2 cM IBD segments between Saamis and the Even samples from the SGDP data, so I made a follow-up using the SGDP Evens. I do not have detailed information about the origin of those Evens. The curve implies that the connection between Saamis and SGDP Evens is old but strong.
Also notable is the strong 2-3 cM sharing between Saamis and Northern Baltic Finns, including Finns, Karelians, Vepsians and Ingrians. This may be a consequence of the first contact between Saamis and Baltic Finns. Smaller segments are vaguer in timing because of random segment breakdown and data inaccuracy.
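The length classes mentioned above (1-2 cM, 2-3 cM and so on) can be illustrated with a small binning sketch. The function name `bin_segments` and the segment lengths are hypothetical. The 1 cM minimum is the one used in these statistics, while the 1 cM bin width is an assumption taken from the classes named in the text.

```python
import math
from collections import Counter


def bin_segments(lengths_cm, min_cm=1.0, bin_width=1.0):
    """Count IBD segments per length bin [low, low + bin_width)."""
    counts = Counter()
    for length in lengths_cm:
        if length < min_cm:
            continue  # segments below the minimum size are ignored
        k = math.floor((length - min_cm) / bin_width)
        low = min_cm + k * bin_width
        counts[(low, low + bin_width)] += 1
    return dict(counts)


# Hypothetical segment lengths in cM between two populations; NOT real data.
lengths = [1.2, 1.8, 2.5, 2.9, 3.1, 0.7]
bins = bin_segments(lengths)

for (low, high), n in sorted(bins.items()):
    print(f"{low:.0f}-{high:.0f} cM: {n} segments")
```

Short bins reflect older shared ancestry on average, but as noted above their timing is vague, so the bins are better read as a rough ordering than as dates.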
Friday, 28 September 2018
Differences in IBD-sharings between Estonians and Finns
These statistics are based on haploid IBD segments produced with Beagle 5 and Beagle Refined IBD. The data were partly imputed (IMPUTE2) to increase the SNP count of some populations from 300k to 500k SNPs, although most samples come from larger data sets. The samples in this IBD data are listed here, including sample sizes. The accuracy of the results depends on the sample size, although the trends are quite reliable anyway, as the following tests show. I'll continue testing with this data later.
Results show the average IBD-sharing (segments) between single individuals of each population.
(X-axis legends fixed, note that the minimum segment size was 1 cM)
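As a rough sketch of what "average sharing between single individuals" means, the segment count between two populations can be divided by the number of possible pairs of individuals. The record layout `(population1, population2, length_cm)`, the function name `average_sharing` and the toy data are assumptions for illustration only, and the sketch counts pairs of individuals rather than haplotypes.

```python
from collections import defaultdict


def average_sharing(segments, sample_sizes, min_cm=1.0):
    """Average number of IBD segments per pair of individuals, per population pair."""
    counts = defaultdict(int)
    for pop1, pop2, length_cm in segments:
        if length_cm < min_cm:
            continue  # the minimum segment size in the plots is 1 cM
        counts[tuple(sorted((pop1, pop2)))] += 1

    averages = {}
    for (a, b), n_segments in counts.items():
        if a == b:
            # pairs of distinct individuals inside one population
            n_pairs = sample_sizes[a] * (sample_sizes[a] - 1) // 2
        else:
            n_pairs = sample_sizes[a] * sample_sizes[b]
        if n_pairs > 0:
            averages[(a, b)] = n_segments / n_pairs
    return averages


# Hypothetical populations, sample sizes and segments; NOT real data.
sizes = {"PopA": 2, "PopB": 4}
records = [
    ("PopA", "PopB", 1.5),
    ("PopB", "PopA", 2.2),
    ("PopA", "PopB", 0.6),  # dropped: shorter than 1 cM
    ("PopA", "PopA", 3.0),
    ("PopA", "PopB", 1.1),
    ("PopB", "PopA", 4.0),
]
averages = average_sharing(records, sizes)
print(averages)
```

Normalizing by the number of pairs is what makes populations with different sample sizes comparable, although small samples still give noisier averages.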
Sunday, 9 September 2018
Sigtuna, Sweden Viking Age
I was lucky to get new samples (ENA BAM files) from the study "Genomic and Strontium Isotope Variation Reveal Immigration Patterns in a Viking Age Town" (Maja Krzewińska et al. 2018). The study included seven high-quality late Viking Age samples from the Swedish town of Sigtuna. Sigtuna was an important marketplace, founded in 970 AD, and remained important until the 13th century. These seven samples are good enough for admixture analyses. I have made admixture analyses using DNA.Land's software, look here, and I see no reason to change my methods. These samples are only 900-1100 years old, and it is reasonable to suggest that modern references are suitable for this purpose, especially because the genome quality of modern references is much better than that of other ancient samples used as references.
Why admixture analyses, and not tests utilizing genetic drift, like qp3Pop and qpDstat? Those methods, like other tests such as IBD and IBS statistics and Fst, measure genetic distances, not genome structure, and often give a false image of our ancestry. Despite the weaknesses of admixture analyses for really old ancient samples (where no one can really work out the admixture history), in this particular case all samples are only about 1000 years old, the link between them and us is significant, and the admixture history can be worked out. Good admixture results only require two conditions to hold: the admixture history must be real and the references must be right. In many cases these conditions are not fulfilled, and people, seeing senseless results, start to bark up the wrong tree.
grt036
East_Scandinavian 24.7
Slavic 18.8
Mediterranean 14.3
Northeast_European 14.0
Central_European 11.0
Northwest_European 8.8
Saami 4.4
Baltic 3.3
grt035
Northwest_European 40.8
Central_European 16.7
Mediterranean 14.8
East_Scandinavian 10.8
Saami 9.8
Baltic 5.7
Slavic 1.3
kal006 (this looks like present-day Estonians)
Baltic 54.3
Finnic 20.4
East_Scandinavian 12.8
Northeast_European 9.5
Siberian 2.0
stg021 (this looks like present-day Swedes)
East_Scandinavian 43.5
Northwest_European 22.9
Mediterranean 12.8
Slavic 6.6
Saami 6.1
Northeast_European 2.5
Uralic 1.3
East_Asian 1.8
Baltic 1.7
84001 (this must be mainly British)
Northwest_European 64.2
Baltic 14.1
Uralic 5.0
Saami 4.7
East_Scandinavian 4.3
Slavic 2.1
Mediterranean 2.3
Finnic 2.8
84005 (a quarter Finn, maybe even more Finnish taking into account the Finnish demographic history after the Viking Age)
East_Scandinavian 39.6
Finnic 24.7
Baltic 13.6
Northwest_European 11.8
Slavic 3.1
Mediterranean 3.0
Central_European 3.6
For a comparison, a project sample of Finnish ancestry from Southwestern Finland:
Finnic 49.1
Northwest_European 25.1
East_Scandinavian 12.9
Northeast_European 5.1
Slavic 2.4
Saami 2.9
Baltic 2.3
urm160
Central_European 33.1
East_Scandinavian 29.9
Slavic 19.1
Finnic 6.8
Baltic 4.5
Uralic 2.9
Saami 1.8
Northwest_European 1.3
It is notable that many Viking Age Swedish samples show Saami without Finnish ancestry. This probably means that the Finnish-Saami mixing was not yet as significant as today. It is also possible that the Saami admixture here means common Swedish-Saami ancestry which is not clearly assignable using modern Saami references.
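The observation above can be checked directly against the listed percentages. In this small sketch only the Saami and Finnic values are transcribed from the seven Sigtuna results above (the Finnish comparison sample is left out). Treating a component that is not listed for a sample as 0 is an assumption, and the helper name `has_without` is made up for the example.

```python
# Saami and Finnic percentages transcribed from the results listed above.
# A component missing from a sample's list is treated as 0 (assumption).
samples = {
    "grt036": {"Saami": 4.4},
    "grt035": {"Saami": 9.8},
    "kal006": {"Finnic": 20.4},
    "stg021": {"Saami": 6.1},
    "84001": {"Saami": 4.7, "Finnic": 2.8},
    "84005": {"Finnic": 24.7},
    "urm160": {"Finnic": 6.8, "Saami": 1.8},
}


def has_without(results, present, absent):
    """Names of samples that show `present` but no `absent` component."""
    return [
        name
        for name, components in results.items()
        if components.get(present, 0.0) > 0.0 and components.get(absent, 0.0) == 0.0
    ]


saami_without_finnic = has_without(samples, "Saami", "Finnic")
finnic_without_saami = has_without(samples, "Finnic", "Saami")
print("Saami without Finnic:", saami_without_finnic)
print("Finnic without Saami:", finnic_without_saami)
```

Note that stg021 also shows a small Uralic component (1.3), which this check does not count as Finnic.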
Saturday, 9 June 2018
QpAdm tests, Iron Age Scanian sample RISE174
I've long been waiting for ancient Finnish samples, and it is starting to look unlikely that I'll ever see them, because Finnish soil is acidic and destroys organic material within a millennium. But never lose hope: one has been available on the Internet all along. An old Finnish sample has been available in open databases for three years already. It was found in Southern Sweden and labeled RISE174, dated 427-611 AD, later Scania_IA. That is right: for three years we have had an ancient Fennoscandinavian sample, common to all Fennoscandinavians and also to many Baltic people, but, apparently because it was found in Southern Sweden, its real value was not noticed before. The reason could also be partly a common prejudice. I wouldn't be much surprised if Finnish researchers in the future find similar Iron Age samples from Southwestern Finland, if the Finnish soil allows it. I'll show that present-day Swedes are not much more closely related to this sample than present-day Finns. Generally speaking, all qpAdm results listed below are reliable, although there were many neighboring cultures in the past that were genetically very similar, and in any test populations that are close enough can substitute for each other.
The qpAdm results were made using distant ancestral references (the right file) to get the best possible coverage. While these results are very reasonable, all of them could be fine-tuned by using carefully selected, less distant references. So these results are indicative.
All Finnish groups are from the 1000 Genomes Project and cover the full 1M SNPs, as do all other groups on the second and third plots. I have seen in my work that testing ancient and modern samples calls for equal coverage across all modern sample groups.
What kind of Fennoscandinavian was RISE174/Scania_IA? She was definitely not like a present-day Swede, as we can see in the f3 result below and on the PCA plots.
Next is a simple PCA plot, somewhat imperfect and flat because of a high amount of Asian influence and too few European samples. This plot shows a new Finnish group, Finnx, and ties it to Finnish ancestry. Finnx forms an ordinary Finnish group with less present-day Swedish admixture than typical Southwest Finns, but also less genetic drift than typical East Finns.
Another plot (vectors 1/2 and 1/3), with almost all East Asian and Siberian samples removed. There is still a touch of East Asian, carried by South and Central Asians. It was necessary to remove the East Asian and Siberian components to exclude later Siberian admixture in Finland and to see the possible Finnx-Scania_IA connection on the PCA as well. On the second plot (vectors 1/3) the East Asian influence is at its lowest and Scania_IA gives the best match with Finnx.
F3-test showing common genetic drift with Scania_IA:
First, some European groups for demonstration purposes.
South Italians
Chisq and tail prob: 0.436 0.932651
Populations: Beaker_Central_Europe, Iran_LN, Levant_BA
best coefficients: 0.299 0.169 0.531
Jackknife mean: 0.299685849 0.169473438 0.530840713
std. errors: 0.028 0.032 0.033
North Italians
Chisq and tail prob: 1.738 0.628609
Populations: Beaker_Central_Europe, Levant_BA, Iran_LN
best coefficients: 0.542 0.416 0.042
Jackknife mean: 0.542422483 0.415568791 0.042008725
std. errors: 0.019 0.020 0.020
Basques
Chisq and tail prob: 1.789 0.774528
Populations: Beaker_Central_Europe, Iberia_EN
best coefficients: 0.578 0.422
Jackknife mean: 0.579234989 0.420765011
std. errors: 0.030 0.030
Latvians
Chisq and tail prob: 1.101 0.77684
Populations: Scania_IA, Latvia_MN, Evenk
best coefficients: 0.920 0.092 -0.012
Jackknife mean: 0.918423990 0.093588254 -0.012012243
std. errors: 0.042 0.042 0.016
Poles
Chisq and tail prob: 1.699 0.790857
Populations: Poltavka, Germany_MN
best coefficients: 0.502 0.498
Jackknife mean: 0.502425062 0.497574938
std. errors: 0.039 0.039
Estonians
Chisq and tail prob: 0.642 0.886697
Populations: Scania_IA, Latvia_MN, Evenk
best coefficients: 0.933 0.052 0.015
Jackknife mean: 0.931374297 0.053427705 0.015197998
std. errors: 0.039 0.038 0.015
Mordvas
Chisq and tail prob: 0.503 0.973198
Populations: Sarmatian, Scania_IA
best coefficients: 0.472 0.528
Jackknife mean: 0.475543261 0.524456739
std. errors: 0.079 0.079
Swedes
Chisq and tail prob: 1.003 0.9093
Populations: Latvia_LN, England_N
best coefficients: 0.523 0.477
Jackknife mean: 0.520472837 0.479527163
std. errors: 0.052 0.052
Finnx
Chisq and tail prob: 1.647 0.648853
Populations: Scania_IA, Latvia_MN, Evenk
best coefficients: 0.851 0.084 0.065
Jackknife mean: 0.849474735 0.085348593 0.065176672
std. errors: 0.037 0.036 0.014
Some "what if" analyses:
What if Mordvas are Polish
Chisq and tail prob: 11.941 0.0177937
Populations: Poltavka, Germany_MN
best coefficients: 0.727 0.273
Jackknife mean: 0.724786064 0.275213936
std. errors: 0.047 0.047
What if Latvians are Polish
Chisq and tail prob: 9.038 0.0601576
Populations: Poltavka, Germany_MN
best coefficients: 0.563 0.437
Jackknife mean: 0.562264229 0.437735771
std. errors: 0.049 0.049
What if Finnx's are Mordva
Chisq and tail prob: 4.728 0.316351
Populations: Sarmatian, Scania_IA
best coefficients: 0.369 0.631
Jackknife mean: 0.377367376 0.622632624
std. errors: 0.093 0.093
What if East Finns are Mordvas
Chisq and tail prob: 6.593 0.159
Populations: Sarmatian, Scania_IA
best coefficients: 0.317 0.683
Jackknife mean: 0.329388055 0.670611945
std. errors: 0.118 0.118
What if Swedes are Baltic
Chisq and tail prob: 1.867 0.867159
Populations: Scania_IA, Latvia_MN, Evenk
best coefficients: 1.000 0.000 -0.000 (Jackknife optimisation is negative)
The results above show the continuum of Scania_IA ancestry from Southern Sweden to Russia, to the area where Mordvas live. Populations living in the eastern and western areas, however, have different admixtures: Mordvas lack Baltic Middle Neolithic ancestry and show instead late East European steppe-like admixture, with Sarmatians suggesting incursions of Iranian speakers. What is also remarkable is that the best Swedish result doesn't include Scania_IA. This doesn't mean that Swedes have no Iron Age ancestry from Scania; in reality they are very close to Scania_IA. It only means that present-day Swedes are slightly different, with their own later admixtures and a lack of certain older Baltic admixtures.
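For readers who want to check the "Chisq and tail prob" lines, the tail probability is the upper tail of a chi-square distribution. The sketch below is not part of qpAdm. The function name `chi2_tail` is made up, and the degrees of freedom (4 for the two-source models, 3 for the three-source models shown here) are not printed in the output above. They are an assumption, chosen because they reproduce the printed tail probabilities closely.

```python
import math


def chi2_tail(chisq, dof):
    """Upper tail probability P(X >= chisq) of a chi-square distribution
    with a positive integer number of degrees of freedom."""
    if chisq < 0 or dof < 1:
        raise ValueError("chisq must be >= 0 and dof must be >= 1")
    y = chisq / 2.0
    # Starting value of the regularised upper incomplete gamma function Q(a, y):
    # Q(1, y) = exp(-y) for even dof, Q(1/2, y) = erfc(sqrt(y)) for odd dof.
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# (label, chisq, ASSUMED dof, tail probability printed above)
models = [
    ("Swedes", 1.003, 4, 0.9093),
    ("What if Mordvas are Polish", 11.941, 4, 0.0177937),
    ("Finnx", 1.647, 3, 0.648853),
]
for label, chisq, dof, printed in models:
    print(f"{label}: computed {chi2_tail(chisq, dof):.4f}, printed {printed:.4f}")
```

A small tail probability, as in "What if Mordvas are Polish", means that the proposed sources fit poorly, while values well above 0.05 mean that the model cannot be rejected. It does not prove that the model is the true history.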