XLS-sheet is available from here.
Monday, October 31, 2016
Project admixtures, fitted ancient proportions
Here are the ancient European proportions of project members and, for comparison, some academic present-day samples (not all of them fully covered by the references), one random sample per population. The results do not express primary proportions of Anatolian Neolithic and various hunter-gatherer populations, but add-ons on top of European LNBA (Late Neolithic/Bronze Age) samples. The European LNBA itself was already a genetic mixture, including admixture similar to the aforesaid West Eurasians and probably also from still unknown ancient populations. Similarly, "BA East European Steppe" already included Eastern Hunter-Gatherer admixture. My aim was not to fix all admixtures to the same time level, but to get good coverage and make project samples comparable to each other.
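The post does not say how the ancient proportions were fitted. Purely as an illustration, here is a minimal sketch of one common way to express a sample as a mixture of source populations, with non-negative proportions that sum to one. The least-squares objective, the projected-gradient solver and the synthetic source matrix are all assumptions, not necessarily the method behind the XLS sheet.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1)[0][-1]  # last index that stays positive
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_proportions(sources: np.ndarray, target: np.ndarray, n_iter: int = 5000) -> np.ndarray:
    """Fit target ~ sources @ w with w >= 0 and sum(w) == 1 (projected gradient descent).

    sources: (n_markers, n_sources) hypothetical per-source values, e.g. allele frequencies
    target:  (n_markers,) values for the sample being fitted
    """
    n_src = sources.shape[1]
    w = np.full(n_src, 1.0 / n_src)           # start from equal proportions
    # Step size 1/L, where L is the Lipschitz constant of the gradient of ||A w - b||^2
    lipschitz = 2.0 * np.linalg.eigvalsh(sources.T @ sources).max()
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = 2.0 * sources.T @ (sources @ w - target)
        w = project_to_simplex(w - step * grad)
    return w


# Synthetic check: the target is an exact 50/30/20 mix of three random "sources"
rng = np.random.default_rng(0)
sources = rng.uniform(0.05, 0.95, size=(200, 3))
target = sources @ np.array([0.5, 0.3, 0.2])
print(np.round(fit_proportions(sources, target), 3))
```

Because every proportion is forced to be non-negative and to sum to one, adding or removing a source shifts the others. This is one reason why proportions fitted on top of an already admixed LNBA source are not the same thing as primary proportions.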
XLS-sheet is available from here.
Saturday, October 29, 2016
Project admixture results
While preparing my ancient haplotyping analyses, I decided to test project members using Dna.Land's Ancestry program. Many thanks to its authors for distributing it. All you need to do is compile it and start your analyses.
All results are "as is", straight from the analyses. Some comments:
- Finns and Norwegians are easily identified.
- Swedes and Estonians (the latter do not belong to my project) can't be confidently identified with the academic reference I have used in this and in my previous analyses.
- Many Finns have minor Saami admixture. This makes sense, and Saami ancestry is the most likely source of the Finnish Siberian admixture. In most cases we can forget Nganasans and other distant and small Siberian populations. The minor Saami admixture among Finns is pervasive and points not only to Siberian ancestry but to the complex history of ancient Fennoscandinavians; otherwise we would see real Siberians in these results, since they were also included in my tests (Nganasans, TunNenets, Nenets, Yakuts and numerous "semi-Siberians" from more southern North Asian regions).
- I didn't get the weird "Finnish-South European" admixtures seen on FamilyTreeDna and Dna.Land result pages. This is because my Finnish reference is built from average Finns, not from Finnish minority groups.
- The ambiguous Balto-Slavic admixture among Finns is mostly from Latvia, Lithuania or Russian Tver. Russians living north of the Tver region are classified as "Northeast Europe", except Karelians and Veps, who are grouped with Estonians and Finns as Baltic-Finns. Saamis form their own group.
- The ambiguous Northwest European admixture among Finns is mostly Swedish.
- The ambiguous European admixture is usually some combination of the two above-mentioned groups.
- "Ambiguous" means that the results of several individual bootstrap runs were ambiguous, i.e. highly dispersed.
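The exact criterion behind the "Ambiguous" labels is not given. A minimal sketch of how high dispersion across bootstrap runs could be flagged, assuming each run yields a dictionary of percentages per reference population. The `max_sd` cut-off and the example numbers are hypothetical, and flagging is done per component rather than per regional group.

```python
import statistics


def summarize_bootstrap(replicates: list[dict[str, float]], max_sd: float = 5.0) -> dict[str, tuple[float, float, bool]]:
    """Mean and standard deviation (percentage points) of each component across bootstrap runs.

    A component is flagged as ambiguous when its standard deviation exceeds max_sd
    (a hypothetical threshold chosen for illustration).
    """
    labels = sorted({label for rep in replicates for label in rep})
    summary = {}
    for label in labels:
        values = [rep.get(label, 0.0) for rep in replicates]  # missing component counts as 0 %
        mean = statistics.fmean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[label] = (mean, sd, sd > max_sd)
    return summary


# Made-up runs: "Finland" is stable, while Latvia and Lithuania trade places between runs
reps = [
    {"Finland": 64.0, "Latvia": 20.0, "Lithuania": 16.0},
    {"Finland": 65.0, "Latvia": 2.0, "Lithuania": 33.0},
    {"Finland": 63.0, "Latvia": 30.0, "Lithuania": 7.0},
]
for label, (mean, sd, ambiguous) in summarize_bootstrap(reps).items():
    print(f"{label}: mean {mean:.1f}, sd {sd:.1f}, ambiguous={ambiguous}")
```

When two similar references swap weight from run to run, each one shows a large spread even though their sum is stable. That is the situation in which reporting a combined regional label is more honest than a single population.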
FI1 | |
Finland | 63,9 |
Ambiguous Northeast-Europe | 11,9 |
RU_Pinega | 8,9 |
Ambiguous Balto-Slavic | 6,9 |
Ambiguous Europe | 4,6 |
Iran_Jew | 2,9 |
FI2 | |
Finland | 42,5 |
Ambiguous Northwest-Europe | 15,9 |
Karelia | 9,7 |
Ambiguous Balto-Slavic | 9,5 |
Ambiguous Europe | 8,3 |
Ambiguous Northeast-Europe | 7,2 |
Ambiguous | 3,8 |
Saami | 3,1 |
FI3 | |
Finland | 69,2 |
Latvia | 13,0 |
Ambiguous Baltic-Finnic | 8,2 |
Ambiguous Northwest-Europe | 6,3 |
Saami | 1,7 |
Ambiguous | 1,4 |
FI4 | |
Finland | 51,8 |
Ambiguous Northwest-Europe | 22,7 |
RU_Smolensk | 9,8 |
Ambiguous Northeast-Europe | 7,1 |
RU_Pinega | 4,7 |
Ambiguous Europe | 3,6 |
FI5 | |
Finland | 52,4 |
Estonia | 17,4 |
Karelia | 15,3 |
Ireland | 11,0 |
Saami | 2,0 |
Ambiguous Europe | 1,1 |
FI6 | |
Finland | 43,8 |
Karelia | 12,3 |
Ambiguous Northwest-Europe | 11,7 |
Ambiguous Baltic-Finnic | 10,2 |
Lithuania | 9,5 |
Ambiguous Northeast-Europe | 7,4 |
Ambiguous Europe | 3,5 |
Ambiguous Balto-Slavic | 1,0 |
FI9 | |
Finland | 44,2 |
Karelia | 27,9 |
Latvia | 12,4 |
Ambiguous Europe | 10,4 |
Ambiguous Baltic-Finnic | 3,4 |
Ambiguous | 1,6 |
FI11 | |
Finland | 66,5 |
Karelia | 22,5 |
Ambiguous Europe | 8,3 |
Saami | 2,3 |
FI15 | |
Finland | 63,3 |
Karelia | 23,2 |
Ambiguous Europe | 8,1 |
Ambiguous Baltic-Finnic | 2,8 |
Ambiguous | 2,6 |
FI18 | |
Finland | 54,7 |
Karelia | 17,0 |
Ambiguous Baltic-Finnic | 15,9 |
Ambiguous Balto-Slavic | 5,8 |
Saami | 3,5 |
Ambiguous Europe | 3,1 |
FI7 | |
Finland | 84,3 |
Ambiguous Balto-Slavic | 8,0 |
TunNenets | 4,2 |
Ambiguous Baltic-Finnic | 3,5 |
FI8 | |
Finland | 63,6 |
Karelia | 24,9 |
Ambiguous Europe | 10,6 |
FI10 | |
Finland | 48,7 |
Saami | 22,0 |
Karelia | 12,2 |
Ambiguous | 6,0 |
Nenets | 4,0 |
Latvia | 3,2 |
Ambiguous Europe | 2,8 |
Ambiguous Siberian | 1,0 |
FI12 | |
Finland | 72,9 |
Ambiguous Balto-Slavic | 16,0 |
Ambiguous Europe | 6,6 |
Ambiguous Baltic-Finnic | 3,3 |
Ambiguous | 1,3 |
FI14 | |
Finland | 82,1 |
Ambiguous Europe | 17,0 |
FI16 | |
Finland | 44,1 |
Estonia | 26,5 |
Karelia | 10,2 |
Ambiguous Europe | 13,1 |
Ambiguous Baltic-Finnic | 4,2 |
Ambiguous | 1,9 |
FI17 | |
Finland | 32,7 |
Karelia | 17,7 |
Estonia | 15,2 |
Sweden | 14,6 |
Tatar | 7,0 |
Ambiguous Europe | 6,5 |
RU_Pinega | 5,5 |
SC2 | |
Utah_CEU | 18,4 |
Ambiguous Northwest-Europe | 18,2 |
Sweden | 17,6 |
Belarussia | 10,8 |
Welsh | 8,2 |
Ambiguous Baltic-Finnic | 8,1 |
Latvia | 5,9 |
GermanyAustria | 5,8 |
Ambiguous Balto-Slavic | 3,1 |
Ambiguous | 2,9 |
Ambiguous Europe | 1,1 |
SC5 | |
Sweden | 20,5 |
Ambiguous Northwest-Europe | 19,7 |
Ambiguous Baltic-Finnic | 19,3 |
GermanyAustria | 13,1 |
Ireland | 11,3 |
Latvia | 5,1 |
Ambiguous Central-Europe | 4,8 |
Ambiguous Europe | 4,6 |
Ambiguous Balto-Slavic | 1,5 |
SC7 | |
Norway | 20,0 |
Sweden | 19,9 |
Veps | 13,9 |
Kent | 12,9 |
Orcadian | 12,5 |
Ambiguous Europe | 9,3 |
Ambiguous Central-Europe | 7,0 |
Ambiguous Northwest-Europe | 2,3 |
Ambiguous Baltic-Finnic | 2,0 |
SC3 | |
Norway | 17,9 |
France | 17,5 |
Estonia | 16,7 |
Finland | 14,2 |
Utah_CEU | 14,0 |
Ambiguous Europe | 7,2 |
Ambiguous Northwest-Europe | 6,6 |
Scotland | 5,6 |
SC4 | |
Norway | 53,0 |
Ambiguous Northwest-Europe | 24,3 |
Ambiguous Central-Europe | 11,2 |
Ambiguous Europe | 5,5 |
Veps | 5,2 |
SC6 | |
Utah_CEU | 35,5 |
Finland | 17,5 |
Ambiguous Northwest-Europe | 14,2 |
Ambiguous Balto-Slavic | 9,5 |
Veps | 8,7 |
GermanyAustria | 7,7 |
Ambiguous Northeast-Europe | 4,3 |
Ambiguous | 1,6 |
Ambiguous Europe | 1,0 |
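The tables above list each sample's components as "label | value |" rows with decimal commas, and the totals do not always reach exactly 100. A minimal parser for this layout, useful for checking per-sample totals or loading the results elsewhere. It assumes that a row with an empty value (such as "FI11 | |") starts a new sample.

```python
def parse_results(text: str) -> dict[str, list[tuple[str, float]]]:
    """Parse 'Label | 63,9 |' rows; a row with an empty value starts a new sample."""
    samples: dict[str, list[tuple[str, float]]] = {}
    current = None
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) < 2 or not cells[0]:
            continue
        label, value = cells[0], cells[1]
        if not value:                      # header row such as 'FI11 | |'
            current = label
            samples[current] = []
        elif current is not None:
            # decimal comma -> decimal point
            samples[current].append((label, float(value.replace(",", "."))))
    return samples


example = """FI11 | |
Finland | 66,5 |
Karelia | 22,5 |
Ambiguous Europe | 8,3 |
Saami | 2,3 |
FI14 | |
Finland | 82,1 |
Ambiguous Europe | 17,0 |"""

for sample, rows in parse_results(example).items():
    print(sample, round(sum(v for _, v in rows), 1))
```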
Tuesday, October 18, 2016
European coarse population structure using 14.4 million markers
I had already made a Finestructure analysis before my previous Admixture-based work, but didn't publish it because it gave so little additional information. I used the same data as with Admixture. The workflow:
1 extracting chromosomes 1 and 6
2 phasing haplotypes (running HAPI-UR ten times and taking a consensus)
3 running Chromopainter in linked mode, without defining donor haplotypes
4 running Finestructure with burn-in 200000 and runtime 2000000
As a result we see a very obvious grouping: each ethnic group is grouped together. Some caution is needed with the Chromopainter-Finestructure combination:
- First of all, Finestructure doesn't really use specific haplotypes, but the number and lengths of haplotype segments shared between individuals. So there is no guarantee that in a three-sample case (individuals a, b and c) all three share common haplotypes, even when the Finestructure result shows haplotype sharing among all three samples. This can lead to pseudo-ancestry between individuals and also to a wrong tree grouping.
- Using donor haplotypes can be methodologically unreliable. We can assign donor haplotypes for people living in the Americas, but it is not equally reliable for people living in the Old World. It is a chicken-and-egg question: if we really knew the donors before testing, we would know the result before having it. I have seen methods creating donor types (selections of prepared haplotypes), but I can't see how they could work reliably. Note also that speaking about donor populations (I have seen it done) makes this an even more problematic question: to define donor populations we must already know the population grouping before the analysis, and we bind donor populations to something that exists today but did not necessarily exist thousands of years ago.
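A toy illustration, with made-up numbers, of the first caution: a pairwise summary of haplotype sharing cannot tell whether three individuals share the same haplotypes. The genomic "windows" here are hypothetical bins on one chromosome.

```python
# Hypothetical genomic windows (e.g. 1 Mb bins on one chromosome) in which each
# pair of individuals shares a haplotype segment; the numbers are made up.
ab = {1, 2, 3}      # a and b share windows 1-3
bc = {10, 11}       # b and c share windows 10-11
ac = {20, 21, 22}   # a and c share windows 20-22

# A pairwise summary (like a coancestry/chunk-count matrix) only sees these counts
pairwise_counts = {"a-b": len(ab), "b-c": len(bc), "a-c": len(ac)}
shared_by_all_three = ab & bc & ac

print(pairwise_counts)       # every pair shares something ...
print(shared_by_all_three)   # ... but no window is shared by all three
```

All three pairs look related, yet the sharing comes from three unrelated parts of the chromosome. A clustering built only on pairwise totals can therefore group a, b and c as if they had one common ancestral haplotype.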
While checking the data, I see one questionable sample group: Swedes. They look more eastern than can reasonably be expected.
In general, when looking at any result, the first question is "does the result look obvious?". If we have two different results based on any kind of supervised method (like using donor haplotypes/populations), it is only common sense to consider the more obvious result the better one. Here we have a philosophical question: what does "obvious" mean for you and for me? It makes sense, but the idea of "too obvious" leads us to tin-foil-hat theories. Perfection is suspicious; we don't want it, although it is possible in practice. Another, much more sensible question regarding donor haplotypes is whether we could assign donor haplotypes of Bronze Age Europeans based on ancient samples. That would make sense.
Download the Finestructure picture here.
Friday, October 14, 2016
Worldwide admixture analysis based on 14.4 million SNPs
The EGDP data, available from the Estonian Biocenter, made it possible to reach 15-30 times higher genome density than earlier available data. The new data lacks West European samples, but this was not a big problem thanks to the publicly available western data from the 1000 Genomes project, so I merged these two data sets. As a quality check I computed heterozygosity rates for all European samples in both data sets and found the two sets to be quite close to each other, although the read depth of the 1000 Genomes data is lower. In fact, Finnish samples in both sets showed exactly the same level of heterozygosity.
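The heterozygosity check is not described in detail. A minimal sketch, assuming genotypes coded 0/1/2 (copies of one allele) with -1 for missing calls, that computes observed heterozygosity per sample and averages it per population. The population labels below are hypothetical and are used only to show the same population compared across the two merged sets.

```python
import numpy as np


def heterozygosity_by_population(genotypes: np.ndarray, populations: list[str]) -> dict[str, float]:
    """Mean observed heterozygosity per population.

    genotypes: (n_samples, n_snps) array coded 0/1/2, with -1 for missing calls.
    populations: one label per sample (row).
    """
    called = genotypes >= 0
    het = genotypes == 1
    # Per-sample rate = heterozygous calls / non-missing calls
    per_sample = het.sum(axis=1) / called.sum(axis=1)
    pops = np.asarray(populations)
    return {pop: float(per_sample[pops == pop].mean()) for pop in dict.fromkeys(populations)}


# Tiny made-up example: the same population taken from two different data sets
g = np.array([
    [0, 1, 2, 1],
    [1, 1, -1, 0],
    [2, 2, 0, 1],
    [0, 0, 0, 0],
])
pops = ["Finland_EGDP", "Finland_EGDP", "Finland_1000g", "Finland_1000g"]
print(heterozygosity_by_population(g, pops))
```

Missing calls are excluded from the denominator, so a data set with lower read depth, and therefore more missing genotypes, is not automatically penalized.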
After the successful merge I had 14.4 million SNPs over all 22 chromosomes, which was far too much to process in a few days on my desktop (i7, 3.5 GHz, 32 GB memory). Instead of thinning the whole data set to 1-2 million SNPs, I decided to use chromosomes 1 and 6 and leave the genome density untouched. So I had two chromosomes with a bit over 2 million SNPs, still showing 15-30 times more genotype information per chromosome than other available genotype sets. Thinning over all chromosomes to make the data set manageable on my computer would likely have induced more algorithm-dependent bias, which I wanted to avoid.
The process
1 merging the EGDP and 1000g data sets
2 quality checks, including homozygosity/heterozygosity ratios per population
3 extracting chromosomes 1 and 6
4 thinning data with Plink: plink --file data --indep 50 5 2 (VIF-based pruning with a window of 50 SNPs, a step of 5 SNPs and a VIF threshold of 2), resulting in 1.1 million SNPs
5 running admixture analyses with K values from 3 to 13 in unsupervised mode and without reference populations (=projection).
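Plink's `--indep 50 5 2` prunes SNPs by variance inflation factor: within a window of 50 SNPs, shifted 5 SNPs at a time, it removes SNPs whose VIF, 1/(1 - R²) from regressing one SNP on all the others in the window, exceeds 2. The following is a simplified sketch of that idea, not Plink's actual implementation. The greedy "drop the worst SNP first" order and the treatment of monomorphic SNPs are assumptions.

```python
import numpy as np


def vif(X: np.ndarray, j: int) -> float:
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    y = X[:, j]
    sst = float(((y - y.mean()) ** 2).sum())
    if sst == 0.0:
        return np.inf                            # monomorphic SNP: treated as removable (assumption)
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - float(resid @ resid) / sst
    return np.inf if r2 >= 1.0 - 1e-12 else 1.0 / (1.0 - r2)


def prune_vif(G: np.ndarray, window: int = 50, step: int = 5, threshold: float = 2.0) -> np.ndarray:
    """Boolean mask of SNPs (columns of G, coded 0/1/2) kept after sliding-window VIF pruning."""
    n_snps = G.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    for start in range(0, n_snps, step):
        idx = [i for i in range(start, min(start + window, n_snps)) if keep[i]]
        while len(idx) > 1:
            X = G[:, idx].astype(float)
            vifs = [vif(X, j) for j in range(len(idx))]
            worst = int(np.argmax(vifs))
            if vifs[worst] <= threshold:
                break
            keep[idx.pop(worst)] = False         # drop the most collinear SNP, then re-check
    return keep


# Usage: SNP 1 is an exact copy of SNP 0, SNP 2 is independent
rng = np.random.default_rng(1)
g0 = rng.integers(0, 3, size=500)
g2 = rng.integers(0, 3, size=500)
G = np.column_stack([g0, g0, g2])
keep = prune_vif(G)
print(keep)
```

In real use Plink writes the retained and removed SNPs to `plink.prune.in` and `plink.prune.out`; the sketch only returns an in-memory mask.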
Each K value was run in unsupervised mode without reference data, because projection reference data is not available for this SNP set. You can see analyses using projection references, for example, in works analysing ancient and modern genomes together. Projection-based analyses are cool, because we have no other way to estimate the proportions of ancient populations in modern samples. I am not saying that unsupervised analysis without references is error-free, but that its errors are systematic and not user-dependent.
All analyses (K values from 3 to 13) were run as independent runs without user supervision, and for that reason the colors in the charts are not consistent (getting the colors consistent sounded like painful work). Each analysis is optimized separately by the Admixture algorithm. All this makes it more difficult to perceive differences between K values, but as soon as you get the idea, I am sure you can also see the big picture and understand the details.
Hopefully this test is helpful for you. In my opinion, it gives interesting hints about Finnish relations with other populations, but the analysis itself is worldwide.
- Mordvins seem to differ from other Volga-Finnic populations and belong to Balto-Slavic ancestry; they are probably language shifters from a Baltic to a Volga-Finnic language.
- Estonians are just what can be expected: some Estonians have Baltic ancestry, others Baltic-Finnic ancestry. We should, however, be cautious about using linguistic terms when we speak about ancestry.
- North Russian Finno-Ugric populations seem to be Baltic-Finnic people with Siberian admixture. The Siberian admixture is present in a smaller amount among Finns and Estonians (note that the amount of minor admixture depends on the data/populations used, because Admixture estimates each individual's proportions relative to the other components so that they sum to one).
- To some extent Swedes also show Baltic-Finnic ancestry, but the Swedish sample size is rather small for a firm conclusion. However, if this is true, we can assume that present-day Baltic-Finnic people have largely Fennoscandinavian ancestry.
- Ingrian samples show up as pure, unadmixed Baltic-Finnic people, which surprises me because of their long-lasting minority status in Russia. The sample collectors have done good work; those samples are valuable indeed.
- Considering all this and trying to reconstruct the history of Baltic-Finnic people, it looks like they lived north of the Latvia-Moscow axis (with Balts living to the south before the East Slavic expansion). Mixing between Baltic and Finnic people happened, and people also shifted language.
- Open questions are how strong the Baltic-Finnic influence is/was in Scandinavia and, conversely, how strong the Germanic influence is/was in Finland and Estonia. For certain political reasons it is a difficult topic today.
CV errors indicate quality in general: the lower the value, the better the quality. Absolute values, however, depend on the data used and can't be compared with other Admixture tests.
K3: 0.19708
K4: 0.19503
K5: 0.19480
K6: 0.19451
K7: 0.19432
K8: 0.19503
K9: 0.19508
K10: 0.19576
K11: 0.19708
K12: 0.19797
K13: 0.20221
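A common way to read the cross-validation errors above is to pick the K with the lowest value. Applied to the listed numbers, this picks K7 (0.19432), with K6 (0.19451) and K5 (0.19480) close behind. This is only a rule of thumb; differences this small between neighbouring K values should not be over-interpreted.

```python
# CV errors as listed above, keyed by K
cv_errors = {3: 0.19708, 4: 0.19503, 5: 0.19480, 6: 0.19451, 7: 0.19432,
             8: 0.19503, 9: 0.19508, 10: 0.19576, 11: 0.19708, 12: 0.19797, 13: 0.20221}

best_k = min(cv_errors, key=cv_errors.get)
print(f"lowest CV error: K={best_k} ({cv_errors[best_k]})")
```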
Population abbreviations: download here.
Analysis: download here.
You definitely need a picture viewer that can handle big GIF files.