Thursday, 16 February 2017

Rare alleles show: Baltic-Finnic people are Central Europeans with Saami admixture

When it comes to Finns, one of the most debated issues has been the origin of their minor Siberian admixture.  The debate has been lively, but in the end only tedious.  Researchers have mentioned Mongols, Chinese, Nganasans and Khanty, among others, but, as we like to say, one should not go farther than the sea to fish.  Using rare alleles, the method used by Schiffels et al. 2015, we see that the Siberian admixture is credibly explained by the common history of the Finnish and Saami people, and that the foundation of the Finnish people is in this sense in Central Europe.  Of course, we still need to compare the rare alleles of Finns and other European populations to find out who the closest relatives of Finns are and to see the details.  Volga-Finnic and Eastern Uralic people show a clearly different eastern admixture.  If we assume that the Finns came from the Volga or Ural regions, we have to explain the difference in Asian admixtures.  The simplest way to do this would be to determine the origin of the Saami-Siberian admixture and date it.  You can take this as a hint for Estonian and Finnish researchers :)
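For readers who want to see what counting shared rare alleles can look like in practice, here is a minimal sketch.  It is not the actual procedure of Schiffels et al. or the one behind this post.  The function name, the population labels and the cutoff of 9 copies are assumptions made for the illustration only.

```python
import numpy as np

def rare_allele_sharing(sample_carries, pop_allele_counts, max_total_count=9):
    """Count rare alleles that a test sample shares with each reference population.

    sample_carries: boolean array, True where the test sample carries the allele.
    pop_allele_counts: dict of population name -> allele counts per site.
    max_total_count: hypothetical cutoff; a site is "rare" if the allele is seen
    between 1 and this many times across all reference populations together.
    """
    carries = np.asarray(sample_carries, dtype=bool)
    counts = {pop: np.asarray(c, dtype=int) for pop, c in pop_allele_counts.items()}
    total = sum(counts.values())
    rare = (total >= 1) & (total <= max_total_count)
    # For each population, add up its copies of the rare alleles the sample carries.
    return {pop: int(c[rare & carries].sum()) for pop, c in counts.items()}

# Toy data: four sites, three reference populations (all values invented).
toy_counts = {
    "Finnish": [2, 0, 1, 30],
    "Saami": [1, 0, 0, 20],
    "Dutch": [0, 3, 0, 40],
}
toy_sharing = rare_allele_sharing([True, True, False, True], toy_counts)
print(toy_sharing)
```

The last site is common in every population, so it is ignored.  Only alleles that are rare overall contribute to the sharing counts, and this is what makes rare alleles informative about recent common history.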

Monday, 13 February 2017

Ancient Latvians, comparing to modern people

New ancient Latvian genomes were published in the new study by Jones et al.  Although all the new genomes are of rather low quality, I have now made some dstat comparisons against modern populations.  Present-day Latvians were used as a fixed point.

PCA, trying to locate three ancient samples

And three dstat figures for samples MN2, HG2 and HG3.  Most of the samples are of too low quality to give reasonable results, so I have only three results for now.  I also tried mapping the original fastq files and found it feasible, yielding more usable SNPs, but I decided not to use them to ensure full comparability with the study.
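As a reminder of what a dstat comparison measures, here is a minimal sketch of the D-statistic computed from allele frequencies.  It uses the commonly cited frequency-based form of the statistic.  It is not the tool I used for the figures, and the function name and toy frequencies are assumptions for illustration only.

```python
import numpy as np

def d_statistic(p1, p2, p3, p4):
    """D(P1, P2; P3, P4) from per-SNP allele frequencies in four populations.

    Values near zero mean P1 and P2 are symmetrically related to P3 and P4.
    A clearly non-zero value means one of the pair shares more alleles with P3.
    """
    p1, p2, p3, p4 = (np.asarray(p, dtype=float) for p in (p1, p2, p3, p4))
    numerator = (p1 - p2) * (p3 - p4)
    denominator = (p1 + p2 - 2 * p1 * p2) * (p3 + p4 - 2 * p3 * p4)
    return numerator.sum() / denominator.sum()

# Toy frequencies for two SNPs (invented values).
d_extreme = d_statistic([1, 0], [0, 1], [1, 0], [0, 1])
d_symmetric = d_statistic([0.3, 0.6], [0.3, 0.6], [0.8, 0.1], [0.2, 0.5])
print(d_extreme, d_symmetric)
```

In practice the statistic is computed over hundreds of thousands of SNPs and is reported together with a block-jackknife Z-score, which this sketch leaves out.  With low-quality ancient samples few SNPs overlap, and that is why only three results are usable here.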

Monday, 9 January 2017

Going ahead with the new data, clustering

My new data makes it possible to cluster samples better according to ethnicity. It is now possible to see at least

South European
West European
East European
Finnish dwelling zone
Baltic dwelling zone

Unfortunately, none of the new sample sources gives a reasonable South European view, which makes it impossible to look inside the Mediterranean area.  With better sampling I could probably create at least Balkan, South Italian, Iberian and Basque clusters.  It is probably now also possible to classify project individuals by PCA.

Europe, clustered by Saami, Mongolian, South-Asian and Middle-Eastern samples

Zoomed in

Europe, plotted on its own.  You can clearly see western and eastern clusters, as well as the Balts and the Baltic-Finnic group splitting into Scandinavian and East Slavic affinities.  With more suitable samples we could also see a clearly distinct Scandinavian group.  Unfortunately, the South European picture is fuzzy due to too few samples.  Because of the shortage of samples I narrowed each group down to four samples, except Tuscany, which I kept larger to strengthen the southern cluster.  It is quite possible that with a larger South European sample the European west and east would diverge even more than we see in the plot below.


Tuesday, 20 December 2016

New data gives 11 million SNPs

Thanks to the latest updates of free gene banks and the hard work of several projects, I have now been able to increase the SNP number to 11 million per sample.  A larger number of SNPs means better accuracy, whether I use all SNPs (especially in drift analyses) or only those left after LD pruning.  In my new database I have combined samples from three sources:  the 1000 Genomes Project, the Estonian Biocentre samples used by Pagani et al. 2016, and the Simons Genome Diversity Project (SGDP).  For the present the sample size is only 866.

Here, as a showcase of the new data, are two PCA plots and some comments.  Instead of making a central continental European picture I included four outgroups to see the effect.  Those outgroups are Armenians, Mongolians, Sardinians and Saami.  For the present, individual names are taken straight from the original sources and can be somewhat ambiguous.

As we can see, there are several clusters, which makes it possible to evaluate the data.  For example, the Scandinavians of SGDP and Pagani et al. cluster with East Europeans.

Personally I don't pay much attention to PCA figures, because the result depends on the selected samples, their number, the ratios between population sizes, how mixed the individuals are, etc.  My upcoming high-resolution tests will be much more interesting.
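To make the dependence on sample selection concrete, here is a minimal sketch of how PCA coordinates can be derived from a genotype matrix.  It is not the software behind the plots above.  The function name, the toy allele frequencies and the group sizes are assumptions for illustration.

```python
import numpy as np

def pca_coordinates(genotypes, n_components=2):
    """Project samples onto principal components of a 0/1/2 genotype matrix.

    genotypes: array of shape (n_samples, n_snps).
    Every SNP is centered and scaled using the samples that are present,
    so adding or removing samples changes the axes themselves.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]            # drop SNPs without variation
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy data: two groups of 10 samples with very different allele frequencies.
rng = np.random.default_rng(0)
group_a = rng.binomial(2, 0.1, size=(10, 200))
group_b = rng.binomial(2, 0.9, size=(10, 200))
toy_pcs = pca_coordinates(np.vstack([group_a, group_b]))
print(toy_pcs[:, 0])
```

Because the centering, the scaling and the decomposition all use the whole sample set, the first axes are driven by whichever contrasts dominate that set.  Adding four outgroups, or using four samples per population instead of forty, gives a different picture of the same individuals.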

Added at 12:50

In case someone is interested in where Mordvins fall on this map: they are very similar to North Russians and RusKU (Pagani et al.) and shift towards Mongolians.  Baltic Finns shift towards Saami.  Sorry, GIMP does something unwanted with the colors.


Wednesday, 16 November 2016

Ancient admixtures look shifty

It is hard to believe some ancestry results.  FamilyTreeDna's new Ancient Origins gives me the following results:

Metal Age Invader 12%
Farmer 30%
Hunter-Gatherer 54%
Non-European 4%

By Metal Age Invaders they refer to the Metal Age Yamnaya culture, by Farmers to the Neolithic Anatolian migration to Europe, and by Hunter-Gatherers to the ancient LaBrana, Loschbour and Motala samples.  For the non-European proportion they hint that one should look at myOrigins, which is FamilyTreeDna's admixture analysis based on present-day populations.  My myOrigins result gives me only one non-European group, Middle Easterners.  I doubt it; the non-European component in my Ancient Origins test is likely Asian.

To analyze the results further, I compared my Ancient Origins results with scientific papers; Haak et al. 2015 gives comparable results.  Haak et al. give the following results for Finns:

EN (Farmers) 31.5%
Nganasan (Asian) 10.2%
WHG (Hunter-Gatherer) 7.9%
Yamnaya (Metal Age Intruders) 50.4%

Norwegians, in turn, get in this study:
EN (Farmers) 48.2%
Nganasan (Asian) 4.2%
WHG (Hunter-Gatherer) 0%
Yamnaya (Metal Age Intruders) 47.5%

We can see a huge shift between Yamnaya/Metal Age Intruders and Hunter-Gatherers when comparing Ancient Origins and Haak et al.  I know something about the method used by Haak et al., but I have no idea what FamilyTreeDna did.  However, if I had to guess, I would say that they may have used very drastic LD pruning.  I can get similar differences with heavily pruned data, and it makes sense.  The Metal Age invasion of Europe happened during the Bronze Age, thousands of years later than the arrival of hunter-gatherers.  So it is reasonable to assume that we still carry much more Bronze Age genetic drift than drift from hunter-gatherers; removing LD thus removes more of the Metal Age Intruders' ancestry.  Pruning present-day samples doesn't have the same effect, because their genetic composition is more similar.

I also ran some admixture tests.  Pruning LD produces a big change in ancient admixtures.
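For readers unfamiliar with LD pruning, here is a minimal greedy sketch of the idea.  It is neither what FamilyTreeDna did (which I don't know) nor the exact algorithm of any particular tool.  The function name, the r² threshold of 0.2 and the window of 50 SNPs are assumptions for illustration.

```python
import numpy as np

def ld_prune(genotypes, r2_threshold=0.2, window=50):
    """Greedy LD pruning: return the column indices of the SNPs that are kept.

    genotypes: array of shape (n_samples, n_snps), SNPs in genomic order.
    A SNP is dropped if its squared correlation with an already kept SNP
    within `window` positions exceeds `r2_threshold`.
    """
    g = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(g.shape[1]):
        if g[:, j].std() == 0:
            continue                        # monomorphic SNP, nothing to correlate
        in_ld = False
        for k in reversed(kept):            # nearest kept SNPs first
            if j - k > window:
                break
            r = np.corrcoef(g[:, j], g[:, k])[0, 1]
            if r * r > r2_threshold:
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return kept

# Toy data: SNP 1 is a copy of SNP 0, SNP 2 is uncorrelated with SNP 0.
toy_genotypes = np.array([
    [0, 0, 2],
    [1, 1, 0],
    [2, 2, 1],
    [0, 0, 1],
    [1, 1, 0],
    [2, 2, 2],
])
toy_kept = ld_prune(toy_genotypes)
print(toy_kept)
```

The stricter the threshold, the more SNPs from long correlated blocks are thrown away.  If more recent ancestry is carried in longer blocks, as argued above, heavy pruning removes proportionally more of its signal.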

My result without pruning

Anatolian_Neolithic 31.4
BA_East_European_Steppe 44.8
East_and_Southeast_Asian 10.8
Western_Hunter_Gatherer 13

and after pruning

Anatolian_Neolithic 27.5
BA_East_European_Steppe 25.9
East_and_Southeast_Asian 7.8
Western_Hunter_Gatherer 38.8

I am not saying that the difference between the results of FamilyTreeDna and Haak et al. is caused by pruning, because I don't know that.  I only state that pruning ancient samples is risky.
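The size of the change is easier to judge as differences in percentage points.  The short sketch below simply subtracts my unpruned proportions from the pruned ones listed above; the variable names are chosen for the illustration only.

```python
# My admixture proportions (percent) before and after LD pruning, as listed above.
without_pruning = {
    "Anatolian_Neolithic": 31.4,
    "BA_East_European_Steppe": 44.8,
    "East_and_Southeast_Asian": 10.8,
    "Western_Hunter_Gatherer": 13.0,
}
after_pruning = {
    "Anatolian_Neolithic": 27.5,
    "BA_East_European_Steppe": 25.9,
    "East_and_Southeast_Asian": 7.8,
    "Western_Hunter_Gatherer": 38.8,
}

# Change in percentage points for every component.
change = {
    name: round(after_pruning[name] - without_pruning[name], 1)
    for name in without_pruning
}
for name, delta in change.items():
    print(f"{name}: {delta:+.1f}")
```

The steppe component loses 18.9 points and the hunter-gatherer component gains 25.8, while the other two components move by only 3 to 4 points each.  It is the same direction of shift as between Haak et al. and Ancient Origins.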

Wednesday, 9 November 2016

Project admix results, revised

My previous test lacked German reference samples.  Together with the fact that my Swedish reference samples seem to be somewhat off, this gave results biased towards Balto-Slavs.  I have now added the German samples available from Pagani et al. 2016 and have rerun all project samples, plus two new Finnish samples.  Additionally, I tested three Finnish samples introduced by the aforementioned study.  Soon after downloading those samples I realized that they don't represent average Finns, so this point is covered after the project results.

I had difficulties editing the columns, and after some fruitless efforts I copy-pasted everything in plain text format.

A new grouping, Karelian-Finnic, is the sum of Karelian and Veps people.

Finland     57,0
AMBIG_Europe     25,0
Balto-Slavic     12,9
Baltic-Finnic     2,5

Finland     37,2
AMBIG_Europe     28,0
Balto-Slavic     14,8
NW-Atlantic-Europe     10,6
Saami     3,9


Finland     62,3
AMBIG_Europe     33,0
Baltic-Finnic     2,3

Finland     47,2
AMBIG_Europe     18,9
NW-Atlantic-Europe     18,1
Northeast-Europe     15,8

Finland     53,8
AMBIG_Europe     33,1
Baltic-Finnic     11,7

Finland     43,0
AMBIG_Europe     36,0
Baltic-Finnic     12,5
NW-Atlantic-Europe     7,9


Finland     78,7
AMBIG_Europe     17,4
TunNenets     3,4


Finland     56,5
Karelia     25,4
AMBIG_Europe     17,4

Finland     42,1
AMBIG_Europe     27,7
Karelia     24,5
Karelian-Finnic     5,0

Finland     43,1
Saami     21,5
AMBIG_Europe     10,9
Karelian-Finnic     10,2
AMBIGUOUS     10,0
AMBIG_Siberian     4,3

Finland     63,7
AMBIG_Europe     31,7
Baltic-Finnic     1,8

Finland     71,6
AMBIG_Europe     18,0
Central-Europe     10,2


Finland     69,8
Balto-Slavic     16,0
AMBIG_Europe     11,3
Baltic-Finnic     1,6


Finland     62,0
Karelian-Finnic     21,2
AMBIG_Europe     14,9


Finland     43,1
AMBIG_Europe     22,9
Estonia     21,8
Karelia     10,3

Finland     33,9
Central-Europe     24,0
Karelia     13,8
Baltic-Finnic     9,8
AMBIG_Europe     9,5
RU_Pinega     5,6
Karelian-Finnic     1,3


Finland     46,1
Karelian-Finnic     19,7
Balto-Slavic     14,5
AMBIG_Europe     8,8
Baltic-Finnic     6,5
Saami     3,7


Finland    0,62
AMBIG_Europe    0,20
Northeast-Europe    0,08
RU_Pinega    0,05
Saami    0,03

Finland     57,8
AMBIG_Europe     21,8
Balto-Slavic     10,9
Baltic-Finnic     4,3

Finland     53,1
Karelia     28,0
AMBIG_Europe     10,7
Northeast-Europe     4,8
Karelian-Finnic     1,2

NW-Atlantic-Europe     32,8
Central-Europe     32,5
Balto-Slavic     19,3
AMBIG_Europe     13,3


Baltic-Finnic     27,6
Central-Europe     21,2
AMBIG_Europe     19,3
Norway     17,5
NW-Atlantic-Europe     12,9


Norway     53,0
Central-Europe     18,3
Balto-Slavic     13,7
NW-Atlantic-Europe     8,1
AMBIG_Europe     6,5

AMBIG_Europe     28,9
NW-Atlantic-Europe     18,3
Central-Europe     18,3
Ireland     14,1
GermanyAustria     11,5
Northeast-Europe     7,9

Central-Europe     31,5
NW-Atlantic-Europe     24,7
AMBIG_Europe     16,5
Finland     14,5
Balto-Slavic     11,9

AMBIG_Europe     29,7
NW-Atlantic-Europe     26,1
Sweden     20,5
Orcadian     11,0
Central-Europe     10,7

Additionally some freely available genomes, only for checking the method.

Genomes Unzipped, VXP
North-Italy     24,9
Central-Europe     20,7
AMBIG_Europe     18,4
Norway     13,7
NW-Atlantic-Europe     12,0
South-Europe     6,6

Genomes Unzipped, JKP
Central-Europe     28,9
South-Europe     19,8
NW-Atlantic-Europe     19,1
Spain     12,5
AMBIG_Europe     11,3
AMBIG_SEURASIA     2,0

Razib Khan, downloaded here.
Indian     35,6
Sindhi     22,3
Cambodian     12,8
AMBIGUOUS     10,6
Burusho     8,6
IndianJew     6,3
AMBIG_Southeast-Asian     2,4

Blaine Bettinger, downloaded here.         
He looks British, with a small portion of Native American.
Central-Europe     24,9
Kent     24,1
AMBIG_Europe     21,2
Welsh     9,3
Ireland     7,3
Atlantic-Europe     3,3
Native-American     1,9

Tests using Pagani et al. Finns as a Finnish reference   
Karelia    28,0
AMBIG_Europe    23,8
Central-Europe    17,8
Baltic-Finnic    12,6
Finland    12,1
Karelian-Finnic    3,4

Estonia    23,7
AMBIG_Europe    22,5
Karelia    18,6
Central-Europe    18,5
Finland    7,9
Karelian-Finnic    4,7

Karelia    46,3
AMBIG_Europe    16,1
Finland    10,4
Baltic-Finnic    8,7
Northeast-Europe    8,5
Saami    4,3
Karelian-Finnic    2,8

I tested three Finns, seen above: two of them are typical Western Finns without any obvious foreign admixture, and one should be a typical Finn from East Finland.  The first row below shows the average result using an average Finnish reference picked from the 1000 Genomes Project, and the second row shows the average result after changing the reference to the Finnish samples of Pagani et al.
FI12, FI14 and FI21, average Finnish result when using the average Finnish reference    64,8

FI12, FI14 and FI21, average Finnish result when using the Pagani Finnish samples as a reference    10,1

In this particular case, the Pagani Finns not only mismatch almost fully with average Finns; using them as the reference also eliminates the Finnish admixture from Swedish results, where it is present in analyses based on the average Finnish reference, in some cases replacing it with Karelian and Veps admixture.  This is really odd.
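The second average can be checked from the three result blocks listed under "Tests using Pagani et al. Finns as a Finnish reference", assuming those blocks are FI12, FI14 and FI21 in some order.  The variable names are chosen for the illustration only.

```python
# "Finland" percentages from the three result blocks above (Pagani Finns as reference).
finland_share_pagani_reference = [12.1, 7.9, 10.4]

mean_share = sum(finland_share_pagani_reference) / len(finland_share_pagani_reference)
print(round(mean_share, 1))  # 10.1
```

The per-sample results behind the first average (64,8) are not listed separately here, so that figure cannot be recomputed the same way.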

A map giving an estimate of admixture regions in Europe

Monday, 31 October 2016

Project admixtures, fitted ancient proportions

Here are the ancient European proportions of project members and, for comparison, some academic present-day samples (not all fully covered by references, though), one random sample per population.  The results don't express the primary proportions of Anatolian Neolithic and the various hunter-gatherer populations, but add-ons over the European LNBA samples.  The European LNBA itself was already a genetic mixture, including admixtures similar to the aforesaid West Eurasians and probably also from still unknown ancient populations.  Similarly, "BA East European Steppe" already included Eastern Hunter-Gatherer admixture.  My aim was not to fix all admixtures on the same time level, but to get good coverage and make project samples comparable to each other.

The XLS sheet is available from here.