Friday, January 19, 2018

I-CTS2208 update

This is a periodic update I do to confirm where we came from.

A basic tree connecting the Finnish Bothnian I1-clade (I-L258) to Scandinavian roots, copied from the project "I1 Suomi Finland & N-CTS8565":

- - - - - - Z74                   - 4100 BP
- - - - - - - L813
- - - - - - - - Y30806
- - - - - - - - - BY3474          - 400 BP
- - - - - - - - Y18927
- - - - - - - - - Y21736
- - - - - - - - - - Y20861        - 2100 BP
- - - - - - - - - - - Y23712      - 1650 BP
- - - - - - - CTS2208             - 2900 BP
- - - - - - - - Y20287
- - - - - - - - CTS7676
- - - - - - - - - L287            - 1900 BP
- - - - - - - - - - BY594         - 1450 BP
- - - - - - - - - - L258          - 1700 BP

Corresponding PCA including new CTS2208 samples and locations: 


Saturday, January 13, 2018

Shared IBD in North Europe

One of the most ambiguous things in genetic genealogy is IBD (identity by descent). People easily draw wrong conclusions by connecting individuals using IBD segments. In practice it is impossible to prove common ancestry from the Iron Age or earlier by a single IBD segment. But it gets worse: there are many chromosomal regions that give grossly false results. I have run into this problem many times, as have the companies on the market selling personal genetic genealogy. Sometimes a fix has worsened the outcome rather than improved it, because a fix made for business purposes can be less factual.

Phasing gives a more reliable result by decreasing recombination error, but the result can still be wrong or useless, with dating errors of thousands of years. See for instance Li et al. 2014. Even when an individual result is realistic, it doesn't tell the direction of gene flow and is useless for tracing ancient migrations. Another issue is the difference between IBD and allelic statistics. Allelic distances can become really bad for mixed individuals and populations, and most of the IBD statistics one sees dealing with the long-term origins of whole populations are false. I am going to show it, or not; it is your decision.

I use 800 thousand high-coverage SNPs combined from two well-known data sources. The data was improved by removing the bad regions shown in Li et al. 2014. The data was processed with the latest version of Beagle (v. 4.1), using haplotype reference panels from the 1000 Genomes Project and a recombination map from Beagle's own library. Beagle reports the likelihood of common ancestry for one IBD segment shared by two individuals as a LOD score. A LOD score of 3 means that the odds of common ancestry between two individuals shown by one IBD segment are 1000:1, which is considered strong evidence. Because my goal was to make statistics between populations rather than individuals, I accepted all positive LOD scores. LOD scores were summed by population pair and the sum was divided by the product of the sample numbers of both populations, except in intra-population cases, where it was divided by the product of the sample number and the sample number minus 1.
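The averaging described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the script I actually ran: the function names, the `(sample1, sample2, lod)` input layout and the sample identifiers are assumptions made for the example, and reading Beagle's IBD output into that layout is left out.

```python
from collections import defaultdict


def lod_to_odds(lod):
    """Convert a LOD score to odds, e.g. LOD 3 -> 1000 (i.e. 1000:1)."""
    return 10.0 ** lod


def average_lod(segments, population_of):
    """Average summed LOD scores per population pair.

    segments      : iterable of (sample1, sample2, lod) tuples (hypothetical layout)
    population_of : dict mapping sample id -> population label
    """
    # Number of samples in each population.
    sizes = defaultdict(int)
    for pop in population_of.values():
        sizes[pop] += 1

    # Sum all positive LOD scores by unordered population pair.
    sums = defaultdict(float)
    for s1, s2, lod in segments:
        if lod <= 0:
            continue  # only positive LOD scores are accepted
        pair = tuple(sorted((population_of[s1], population_of[s2])))
        sums[pair] += lod

    # Divide by n1 * n2, or by n * (n - 1) within one population.
    averages = {}
    for (p, q), total in sums.items():
        denom = sizes[p] * (sizes[p] - 1) if p == q else sizes[p] * sizes[q]
        if denom > 0:
            averages[(p, q)] = total / denom
    return averages


# Made-up toy data, only to show the call.
population_of = {
    "fi1": "Finnish", "fi2": "Finnish", "fi3": "Finnish",
    "se1": "Swedish", "se2": "Swedish",
}
segments = [
    ("fi1", "fi2", 3.0),
    ("fi1", "fi3", 1.5),
    ("fi1", "se1", 4.0),
    ("fi2", "se2", 2.0),
    ("fi3", "se1", -0.5),  # dropped: not positive
]
result = average_lod(segments, population_of)
print(result)
```

Dividing by the number of possible pairs rather than by the number of detected segments keeps populations with many samples from dominating the matrix simply because they contribute more pairs.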

Because of the small Swedish sample size (only 2), I ran two global PCA plots, one including the Swedish samples and another including Finnish samples, to make sure that the Swedish samples had no Finnish ancestry. This was easily done by checking the Asian/Siberian admixture. Both samples were South Swedish, without the eastern admixture typical of Finns, and were located among Orcadians, West Slavs etc. (A global PCA with Siberians, East Asians and SSA samples loses nuances in Europe, but shows global differences excellently.) This is interesting, because in this case the high Finnish IBD sharing in Sweden actually means Swedish admixture in Finland, not Finnish admixture in Sweden.

My previous blog entries disclosed the Finnish eastward expansion. Although IBD segments can't prove the origin of shared ancestry, the result indicates the same strong Finnish influence far to the east. This points to the obvious outcome of Swedish and Finnish influence in Northern Russia during the Iron Age.

Average LOD scores between populations.

   13.1.2018 fixed some colors in matrix and again 16.1.

Friday, January 5, 2018

Searching for the Finnish root

We are unlucky people in Finland, because the soil here is acidic and destroys all organic remains within a millennium. We will never know the genetic appearance of the people who lived here during the first millennium or earlier. This leaves us free to speculate about our ancient ancestors, and people do. The outcome depends pretty much on beliefs and myths. I try to bypass the exact solution to this problem by using modern genomes and working backwards. I split all 99 Finnish samples into 6 groups using Finestructure. Then I ran each group against global references using Globetrotter to find out which of the 6 Finnish groups shows the oldest admixture date. It turned out that the oldest Finnish mixture included three genetic elements: Scandinavians, Estonians and Saamis.

Sounds good so far, but it is not simple at all. Although the Finns are a relatively homogeneous group and removing outliers is quite a simple task, the same doesn't hold for Estonians. I am fairly sure that many things happened that changed Estonians, for example the Slavic expansion during the first millennium and the demolition of all the old kingdoms on the Eastern Baltic coastline by devastating German, East Baltic and Slavic armies at the beginning of the second millennium. What little can be done, can be done.

First the test showing the present mixture of the "Finnish root":

Estonian 0,518
Scandinavian 0,415
Saami 0,067

And then the Finnish root after searching for the most obvious admixture date:

59 generations or 1620 years ago
Scandinavian 0,730
Estonian 0,164
Saami 0,106

The following results were obtained using the Finnish root population:

Karelian-Vepsa 44 generations or 1200 years ago
Finnish-root 0,698
Baltic 0,154
Mari_Chuvash 0,057
Northeast_Asian 0,024
Mongola 0,015
Central_Siberian 0,013
Saami 0,011

Estonian x generations or x years ago (unclear)
Baltic 0,644
Finnish-root 0,194
Slavic 0,076
Scandinavian 0,038
Mari_Chuvash 0,028
Saami 0,011

Mordva 29 generations or 800 years ago
Slavic 0,340
Baltic 0,252
Karelian_Vepsa 0,118
Mari_Chuvash 0,067
West_Europe 0,061
Mongola 0,045
Finnish-root 0,023
Caucasian 0,022
Khanti-Mansi 0,018
Armenian 0,018

Swedish x generations or x years ago (unclear; probably very old, old enough that Finnish-root and Saami did not yet exist, so both designations stand for something undetermined)
West_Europe 0,577
Finnish-root 0,147
Saami 0,122
Slavic 0,111
British_Isles3 (Scottish) 0,033
Baltic 0,010

Tatar 30 generations or 825 years ago   
Slavic 0,209
Baltic 0,159
Mari_Chuvash 0,152
Balkan 0,093
Mongola 0,085
Karelian_Vepsa 0,065
West_Europe 0,050
Caucasian 0,043
Armenian 0,031
Finnish-root 0,028
Ulchi-Hezhen 0,029
Central_Siberian 0,015
East_Asian 0,011
North_Siberian 0,010
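
The admixture dates above are given both in generations and in years. The generation length behind the conversion is not stated in this post, so the following Python sketch only back-calculates it from the reported pairs; the dictionary layout and the function name are assumptions made for the example.

```python
# (generations, years) pairs as reported above.
reported = {
    "Finnish root": (59, 1620),
    "Karelian-Vepsa": (44, 1200),
    "Mordva": (29, 800),
    "Tatar": (30, 825),
}


def years_per_generation(generations, years):
    """Generation length implied by one reported date."""
    return years / generations


implied = {
    name: years_per_generation(g, y) for name, (g, y) in reported.items()
}
for name, value in implied.items():
    print(f"{name}: {value:.1f} years per generation")
```

All four reported dates imply a generation length between 27 and 28 years, so the year figures appear to be rounded conversions using roughly that value.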