Time and again I see people drawing a connection between Finnish N1c1 and eastern admixture. Regardless of the eastern origin of N1c, there is no such correlation in Finland. The reality is even worse for those who cherish this fallacy: if we also count the Baltic countries, the correlation turns out to be negative. In Finland alone, all male haplogroups show an equal level of Asian admixture, and the only difference comes from locality, not from the male haplogroup. A rational person would conclude that the Asian admixture comes from a local source. This is a no-brainer and I don't even need to prove it. Everyone familiar with the matter knows it, but that doesn't prevent the biggest Finnish newspaper from spreading this urban myth. Google translation, click here.
Epilogue. The fallacy of the eastern origin of Finns has many causes. I am interested only in the opinions that concern Finnish people and researchers, because I don't care much about "public opinions" without a scientific basis. A common idea in Finland, which assumes different Finnish origins (along a Lappeenranta-Vaasa axis or whatever) and is driven by Finnish "race realists" who inherited their opinions from the old Swedish school, is that two "races" have lived here in Finland. Now some Finnish scientists have agreed with this and detached themselves from known historical facts.
Saturday, December 29, 2018
Thursday, December 27, 2018
QpAdm - what it means in practice
As we saw in my previous posts, the relationship between fit and standard error is very informative. We saw that the Basques are a loose mixture of East European Steppe and ancient Iberian people, but they are only distant descendants of those two groups, and we can't prove that these two are their only ancestors, although both definitely passed genes on to the Basques. I made a similar test showing that the Greeks are distant descendants of Iron Age Anatolians and Bronze Age Balkan people, but again we can't prove that those two were their only ancestors. Probably they were not.
Balkans_BronzeAge Anatolia_IA
best coefficients: 0.470 0.530
Jackknife mean: 0.475197121 0.524802879
std. errors: 0.077 0.077
fixed pat wt dof chisq tail prob
00 0 8 15.062 0.0579554 0.470 0.530
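The "tail prob" column can be cross-checked by hand. The sketch below assumes that it is the upper-tail probability of a chi-square distribution with "dof" degrees of freedom, which is consistent with the numbers printed above but is not taken from the qpAdm source. The function name chi2_tail_prob is made up for this illustration, and it uses the closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_tail_prob(chisq, dof):
    """Upper-tail probability P(X >= chisq) for a chi-square variable, integer dof."""
    if dof < 1 or chisq < 0:
        raise ValueError("dof must be >= 1 and chisq must be >= 0")
    half = chisq / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum over i < dof/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(chisq)
    term, total = 0.0, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        term = chi if r == 1 else term * chisq / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

# Greek model above: chisq 15.062 with 8 degrees of freedom.
print(chi2_tail_prob(15.062, 8))
```

If the assumption holds, the printed value should land close to the 0.0579554 reported for the Greek model. A small tail prob means the model fits the reference statistics poorly, a large one means the data do not contradict it.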
On the other hand, qpAdm showed that the Finns are quite strictly descendants of Iron Age Scanian, Iron Age Baltic and Iron Age Saami people, but we can't prove the exact proportions of those three admixtures, as the high standard errors showed. It is easy to understand that admixtures of close populations are harder to determine than admixtures of distant populations, because close relatives share much common ancestry.
But how accurate are results showing very distant ancestry and moderately low standard errors if the fit is poor? I tested it. The following tests show the admixtures of Iron Age Saami people from Levaluhta, Ostrobothnia.
We see that there is only a small difference between the admixtures of Iron Age Saami modeled with present-day Finns and those modeled with Iron Age Scandinavians, each in conjunction with the Bolshoy outlier. Chisq is high and tail prob. is below 0.4, but std. err. is only 6% at most. Nothing requires such similar admixture proportions, because the genetic distance between Finns and Scandinavians is rather large. Such similarity arises only from the large genetic distance of the Bolshoy outlier.
Another example, although not equally striking.
Chisq is between 10 and 21 and tail prob. between 0.006 and 0.24. FI21 shows the best fit. The std. error is 5% for FI4 and FI12 and highest (9%) for FI21.
Saturday, December 22, 2018
Exciting results of Basques and Estonians, updated: Southwest Finnish results
I recalibrated the qpAdm references to improve accuracy. Please read my previous post for my opinion and for the problems with qpAdm.
New references:
Kostenki14 813405
MA1 625746
WHG 703903
EHG 975726
CHG 889688
Ganj_Dareh_N 794892
West_Siberia_N 670626
Anatolia_Neolithic 889986
Mbuti1M 971767
Wichi1M 971774
At first I tried to work out the admixture of present-day Estonians, which is really challenging without proper Iron Age samples. So I had to use the best available modern samples.
Latvian1M FI12 AncFinn
best coefficients: 0.408 0.451 0.141
Jackknife mean: 0.404965620 0.441429933 0.153604447
std. errors: 0.248 0.159 0.203
fixed pat wt dof chisq tail prob
000 0 7 2.724 0.90932 0.408 0.451 0.141
001 1 8 3.392 0.907406 0.531 0.469 0.000
010 1 8 10.133 0.255821 2.398 -0.000 -1.398 infeasible
100 1 8 5.005 0.757038 0.000 0.626 0.374
Admixtures:
Latvian1M - Latvian samples covering 1 million SNPs
FI12 - this is me: I am one of my own individual samples covering 1 million SNPs, and in this case the one giving the best fit. So here I represent present-day Finns.
AncFinn - an ancient Finnish sample from Damgaard et al. 2018.
A reasonable fit for Basques was even more challenging. My test shows that the Basques are on average two thirds ancient people from the Iberian peninsula and one third of Steppe origin.
SE_Iberia_CA Yamnaya_Samara
best coefficients: 0.671 0.329
Jackknife mean: 0.670091689 0.329908311
std. errors: 0.022 0.022
fixed pat wt dof chisq tail prob
00 0 8 5.956 0.652145 0.671 0.329
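The "Jackknife mean" and "std. errors" rows in outputs like the one above come from a jackknife over blocks of the genome. As a rough illustration of the idea, here is a minimal delete-one jackknife. It is a simplification under stated assumptions: as far as I understand, qpAdm weights blocks by their size, which is left out here, and the function name and the input values are hypothetical, not taken from any real run.

```python
import math

def jackknife_mean_and_se(leave_one_out):
    """Mean and standard error from delete-one jackknife estimates.

    Each entry is the statistic re-estimated with one block left out.
    Unweighted version: all blocks are treated as equally large (assumption).
    """
    n = len(leave_one_out)
    if n < 2:
        raise ValueError("need at least two leave-one-out estimates")
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean) ** 2 for x in leave_one_out)
    return mean, math.sqrt(variance)

# Hypothetical leave-one-block-out estimates of a single mixture weight.
estimates = [0.50, 0.60, 0.40, 0.55, 0.45]
mean, se = jackknife_mean_and_se(estimates)
print(f"jackknife mean {mean:.3f}, std. error {se:.3f}")
```

The point of the sketch is only that the standard error measures how much the estimated weight moves when parts of the genome are dropped. If the weight barely moves, as in the Basque model, the standard error is small.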
edit 25.12.2018 12:55
A new Southwest Finnish result using recalibrated references. Recalibration here means better results for Asian and African admixtures. There was also an inconsistency between Iron Gate and WHG, so Iron Gate was removed.
Scania_IA Baltic_IA Levaluhta
best coefficients: 0.483 0.358 0.159
Jackknife mean: 0.450272111 0.373492239 0.176235650
std. errors: 0.182 0.183 0.111
fixed pat wt dof chisq tail prob
000 0 7 0.657 0.998647 0.483 0.358 0.159
001 1 8 2.229 0.97317 0.596 0.404 0.000
010 1 8 3.018 0.93322 0.803 0.000 0.197
100 1 8 5.884 0.660177 -0.000 0.849 0.151
011 2 9 4.219 0.896439 1.000 0.000 0.000
101 2 9 6.568 0.681978 0.000 1.000 0.000
110 2 9 21.991 0.00890644 0.000 0.000 1.000
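The "fixed pat" column in the table above can be read as a mask over the sources: judging from the coefficients printed on each row, a 1 marks a source whose weight is forced to zero. The small helper below follows that reading. It is my interpretation of the output and not taken from the qpAdm documentation, and both function names are made up for the illustration. The second helper flags rows like the "infeasible" one in the Estonian table, where a weight falls outside the range 0 to 1.

```python
def dropped_sources(pattern, sources):
    """Return the sources whose weight is fixed to zero by a pattern such as '011'."""
    if len(pattern) != len(sources):
        raise ValueError("pattern and source list must have the same length")
    return [name for flag, name in zip(pattern, sources) if flag == "1"]

def is_feasible(coefficients, tol=1e-9):
    """True when every mixture weight lies within [0, 1], up to a small tolerance."""
    return all(-tol <= c <= 1.0 + tol for c in coefficients)

sources = ["Scania_IA", "Baltic_IA", "Levaluhta"]
rows = [
    ("000", [0.483, 0.358, 0.159]),
    ("001", [0.596, 0.404, 0.000]),
    ("110", [0.000, 0.000, 1.000]),
]
for pattern, coefficients in rows:
    print(pattern, "drops", dropped_sources(pattern, sources),
          "feasible:", is_feasible(coefficients))
```

Read this way, the row with pattern 110 is the model where Levaluhta alone has to explain Southwest Finns, and it is the only row above with a clearly poor tail prob.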
Tuesday, December 18, 2018
Still not enough West European Iron Age samples to get proper qpAdm results of West Europeans
My attempt to model present-day Swedes did not turn out as I hoped, because of the lack of proper western Iron Age samples. Now I tried to find the best possible solution using Scania_IA and older samples. I noticed that in all possible variations we need currently unavailable and unknown Iron Age samples to achieve reasonable results. So I have to set such tests aside until West European Iron Age samples are available. Several Central European Late Copper Age samples turned out to be the best ones, but they did not give proper fits, for instance:
Scania_IA Protoboleraz_LCA
best coefficients: 0.949 0.051
Jackknife mean: 0.947619305 0.052380695
std. errors: 0.041 0.041
This is the best I can do right now.
An issue that goes beyond qpAdm is how to interpret standard errors. While we can consider a low standard error good, there is also good reason to consider a high standard error reasonable in many cases. When two or more source populations share much common ancestry (as is often the case today), qpAdm can't determine which one is the right source. For instance, with admixtures built from Swedes and Norwegians the standard error can be very high, because qpAdm is not able to apportion the ancestry that both populations have in common. So when we try to minimize the standard error, we in fact abandon the most obvious result. This dilemma is usually avoided in one of two ways: 1) using very ancient or distant samples to avoid common ancestry, or 2) accepting very high chisq and small tail prob values. In the latter case we actually accept a poorer fit in order to present results that falsely look better.
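Why closely related sources inflate the standard error can be shown with a toy model. This is not qpAdm's algorithm. It assumes a target whose statistics are a weighted average of two source vectors plus independent noise of equal size in every coordinate, and it fits the weight by ordinary least squares. Under those assumptions the standard error of the weight is the noise level divided by the distance between the two sources. All names and numbers below are hypothetical.

```python
import math

def weight_std_error(source_a, source_b, noise_sd):
    """Std. error of the least-squares weight w in target = w*a + (1-w)*b + noise.

    Toy model only: every coordinate gets independent noise with the same
    standard deviation, which is an assumption and not how qpAdm works.
    """
    dist_sq = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if dist_sq == 0.0:
        raise ValueError("identical sources: the weight cannot be determined")
    return noise_sd / math.sqrt(dist_sq)

# Hypothetical statistic vectors: one fixed source and two alternative partners.
reference = [0.00, 0.00, 0.00]
distant = [0.30, 0.10, 0.20]   # far from the reference
close = [0.03, 0.01, 0.02]     # ten times closer to the reference

se_distant = weight_std_error(distant, reference, noise_sd=0.01)
se_close = weight_std_error(close, reference, noise_sd=0.01)
print(f"distant sources: {se_distant:.3f}, close sources: {se_close:.3f}")
```

With the same noise, moving the sources ten times closer together makes the standard error ten times larger, although the model is no less correct. That is the sense in which a high standard error can still belong to a reasonable result.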
A result showing high standard errors:
Estonians:
Scania_IA Baltic_IA Poland_BA
best coefficients: 0.560 0.108 0.332
Jackknife mean: 0.253950408 0.349222728 0.396826864
std. errors: 0.532 0.634 0.389
In this case all admixtures overlap, which results in statistical shifts and uncertainty between admixtures and in high standard errors, but the chisq and tail prob values are still relatively good, 2.290 and 0.942093 respectively.
Another case shows low standard errors, but poorer coverage of admixtures:
Swedes:
Scania_IA Hungary_LCA
best coefficients: 0.948 0.052
Jackknife mean: 0.946235880 0.053764120
std. errors: 0.043 0.043
The chisq and tail prob values were 7.413 and 0.492767, respectively.
I could construct a more provocative example of the latter kind for similar target populations, in which the standard errors would be 1-2 percentage points and the chisq and tail prob values around 10-20 and 0.1-0.2.
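To make the contrast between the two cases concrete, the coefficients and standard errors can be turned into rough intervals. The sketch below uses coefficient plus or minus 1.96 standard errors, which assumes an approximately normal estimate. That is a convenience assumption for illustration and not something qpAdm reports. The interval is then clipped to the feasible range 0 to 1, and the helper name is hypothetical.

```python
def clipped_interval(coefficient, std_error, z=1.96):
    """Approximate interval coefficient +/- z*SE, clipped to the feasible range [0, 1]."""
    low = max(0.0, coefficient - z * std_error)
    high = min(1.0, coefficient + z * std_error)
    return low, high

models = {
    # name: (best coefficient, std. error), values copied from the outputs above
    "Estonians / Scania_IA": (0.560, 0.532),
    "Estonians / Baltic_IA": (0.108, 0.634),
    "Estonians / Poland_BA": (0.332, 0.389),
    "Swedes / Scania_IA": (0.948, 0.043),
    "Swedes / Hungary_LCA": (0.052, 0.043),
}
for name, (coefficient, std_error) in models.items():
    low, high = clipped_interval(coefficient, std_error)
    print(f"{name}: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

For the Estonian model every interval covers the whole range from 0 to 1, so the individual proportions are undetermined even though the fit is good. For the Swedish model the intervals are narrow, but as argued above that narrowness says nothing about whether the right sources were chosen.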
Monday, December 10, 2018
Finnish genetic composition: Iron Age Baltic, Iron Age Germanic and probably Iron Age Saami people
You have probably read my previous post regarding European genetic structures produced with Admixture from Eurasian data. My aim was to make an admixture analysis free of recent genetic drift. It gave the following admixtures for Southwestern Finns (Finnish, k=10):
- Saami 14%
- Baltic 49%
- Germanic 35%
The post is here.
Now I have tested the same admixtures using Iron Age samples and qpAdm. QpAdm allocates admixture proportions among given source populations; in these tests the source populations were Baltic Iron Age, Scanian Iron Age and Levaluhta Iron Age. Levaluhta consists of five Iron Age remains found in Ostrobothnia in Finland. The ID of the Scanian IA sample is RISE174 and that of the Baltic IA sample is DA171. Although my results are unambiguous, there is some uncertainty regarding the software and the chosen references, and being the first to explore something like this, I am curious to see the results of professional geneticists. I simply can't understand why this matter wouldn't interest researchers too. All samples I use here are publicly available.
Southwest Finns:
chisq tail prob
1.112 0.992818
Levaluhta 0.115
Scania_IA 0.437
Baltic_IA 0.448
Vepsa:
chisq tail prob
1.555 0.980354
Levaluhta 0.202
FIN_Southwest 0.614
Baltic_IA 0.184
Nganasan -0.000
and another result for Vepsa:
chisq tail prob
2.734 0.908488
Levaluhta 0.000
FIN_Southwest 0.764
Baltic_IA 0.164
Nganasan 0.072
The match is much worse when using present-day Baltic, Germanic and Saami counterparts.
Southwest Finns:
chisq tail prob
9.270 0.233881
Saami 0.156
Latvian 0.297
Swedish 0.548
Outgroups and coverages were
Kostenki14 813405
MA1 625746
WHG 703903
Iron_Gates_HG 884901
EHG 983976
CHG 889688
Ganj_Dareh_N 794892
West_Siberia_N 670626
Anatolia_Neolithic 889986
LBK_EN 880957
Typical SNP coverage of target groups
Vepsa1M 971774
Levaluhta 377788
FIN-Southwest 1102712
Baltic_IA 97311
Scania_IA 373352
Nganasan1M 971774
I have lost some SNPs due to allele mismatches or multiallelic conversion errors in Plink, but the coverage is still reasonable. Some ancient samples need to be reconverted.
edit 11.12.2018 16:30
Adding Russians from Pinega makes a perfect match for Vepsas. Not a big surprise.
chisq tail prob
0.825 0.991377
FIN-Southwest 0.672
RusPinega 0.130
Baltic_IA 0.093
Levaluhta 0.105
Tuesday, December 4, 2018
Finnish and North Russian ancestries derived from Levaluhta and Bolshoy Oleni Ostrov
Unfortunately, Lamnidis et al. 2018 did not test the origin of the Siberian ancestry in present-day Finns and Russians. I reveal it now. Ancient samples found at Bolshoy give high F3 results for both groups. I included five Finnish groups: three from my own sample selection and two smaller selections from two academic sources.
FinnMostCW - least drifted Finnish group (- outliers)
FinnLocal - most drifted Finnish group (- outliers)
Finn21M - most Scandinavian group (- outliers)
Both academic Finnish groups look incoherent. I wonder why.
It is obvious that the Levaluhta admixture added to the present-day European backbone in Finns (negative F3 values) doesn't explain all of the non-European ancestry, and there has to be an unknown third component.
Source1, source2, target, f_3, std.err, Z and SNP count
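For readers unfamiliar with the columns listed above, here is a minimal sketch of the admixture form of the F3 statistic and its Z score. It uses the textbook definition, the average over SNPs of (target - source1) * (target - source2) on allele frequencies. Real implementations such as ADMIXTOOLS add a correction for finite sample size and take the standard error from a block jackknife, both of which are left out here. The function names, the frequencies and the standard error in the example are hypothetical.

```python
def f3_naive(target, source1, source2):
    """Average of (c - a) * (c - b) over SNPs, without sample-size correction."""
    if not (len(target) == len(source1) == len(source2)) or not target:
        raise ValueError("need three non-empty frequency lists of equal length")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)

def z_score(estimate, std_error):
    """Estimate divided by its standard error."""
    return estimate / std_error

# Hypothetical allele frequencies at three SNPs; the target lies between the sources.
source1 = [0.1, 0.2, 0.9]
source2 = [0.9, 0.8, 0.1]
target = [0.5, 0.5, 0.5]

f3 = f3_naive(target, source1, source2)
print(f"f3 = {f3:.4f}, Z = {z_score(f3, 0.02):.1f}")  # 0.02 is a made-up std. error
```

A negative value arises when the target's frequencies sit between those of the two sources, which is why negative F3 values are read as a sign of admixture. A common rule of thumb treats Z below about -3 as significant.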
Additionally, here are the results of my old admixture analysis based on Dna.Land's program. Although it sounds like a weird idea to run an admixture analysis of ancient samples using present-day populations, it is normal practice in many academic studies. Studies usually use Nganasans to represent Siberians together with ancient samples. It is reasonable to say that gene flow can also be detected backwards in time. It is also very likely that gene flow between contemporary populations has been bidirectional. Nevertheless, I am not fully convinced about the direction of gene flow on the European side of Finns and Northern Russians. F3 statistics gave several European candidates, and the big picture is more complex.
Levaluhta (JK excluding JK2065 outlier)
Finnic 43.7
Saami 30.0
Uralic 12.1
Siberian 9.5
Baltic 2.2
East_Asian 1.9
Bolshoy
Uralic 38.9
Siberian 29.5
Saami 11.6
Finnic 7.1
East_Asian 6.1
Northeast_European 5.4
Baltic 1.5