Saturday, 13 January 2018

Shared IBD in North Europe

One of the most ambiguous things in genetic genealogy is IBD (identity by descent). People easily draw wrong conclusions when they connect individuals through IBD segments. In practice it is impossible to prove common ancestry from the Iron Age or earlier with a single IBD segment. It gets worse: many chromosomal regions give grossly false results. I have run into this problem many times, and so have the companies selling personal genetic genealogy. Sometimes an attempted fix has made the outcome worse, because a fix made for business purposes can be less factual.

Phasing gives a more reliable result by reducing recombination errors, but the result can still be wrong or useless, and the dating error can be thousands of years. See, for instance, Li et al. 2014. Even when an individual result is realistic, it says nothing about the direction of gene flow and is useless for tracing ancient migrations. Another issue is the difference between IBD and allelic statistics. Allelic distances can become really bad for mixed individuals and populations, and most of the IBD statistics one sees about the long-term origins of whole populations are mostly false. I am going to show it, or not; that is for you to decide.

I use 800 thousand high-coverage SNPs combined from two well-known data sources. I improved the data by removing the bad regions listed in Li et al. 2014. The data was processed with the latest version of Beagle (v. 4.1), using haplotype reference panels from the 1000 Genomes Project and a recombination map from Beagle's own library. Beagle reports the likelihood that an IBD segment shared by two individuals reflects common ancestry as a LOD score. A LOD score of 3 means that the odds of common ancestry between two individuals, as shown by one IBD segment, are 1000:1, which is considered strong evidence. Because my goal was to compute statistics between populations rather than between individuals, I accepted all positive LOD scores. LOD scores were summed by population pair, and the sum was divided by the product of the sample sizes of the two populations, except in intra-population cases, where it was divided by the product of the sample size and the sample size minus one.
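The averaging described above can be sketched in Python as follows. This is only a minimal illustration, not the script used for this post: the function names, the `(individual_a, individual_b, lod)` tuple format and the two lookup dictionaries are assumptions, and parsing of Beagle's actual IBD output is left out.

```python
from collections import defaultdict


def lod_to_odds(lod):
    """Convert a base-10 LOD score to odds, e.g. LOD 3 -> 1000 (i.e. 1000:1)."""
    return 10.0 ** lod


def average_lod(segments, population_of, sample_counts):
    """Average positive LOD scores per population pair.

    segments      -- iterable of (individual_a, individual_b, lod) tuples
                     (hypothetical format, one tuple per IBD segment)
    population_of -- dict mapping individual ID -> population label
    sample_counts -- dict mapping population label -> number of samples
    """
    sums = defaultdict(float)
    for a, b, lod in segments:
        if lod <= 0:
            continue  # only positive LOD scores are accepted
        # Sort the labels so (X, Y) and (Y, X) land in the same bucket.
        pair = tuple(sorted((population_of[a], population_of[b])))
        sums[pair] += lod

    averages = {}
    for (p, q), total in sums.items():
        if p == q:
            # Intra-population case: n * (n - 1)
            n = sample_counts[p]
            denominator = n * (n - 1)
        else:
            # Inter-population case: product of the two sample sizes
            denominator = sample_counts[p] * sample_counts[q]
        averages[(p, q)] = total / denominator
    return averages
```

With invented toy data, two Finnish and two Swedish samples with segments `("f1", "f2", 4.0)`, `("f1", "s1", 3.0)` and `("f2", "s1", 1.0)` would give a Finnish-Finnish average of 4.0 / (2 × 1) and a Finnish-Swedish average of 4.0 / (2 × 2).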

Because of the small Swedish sample size (only 2), I ran two global PCA plots, one including the Swedish samples and another including the Finnish samples, to make sure that the Swedes had no Finnish ancestry. This was easily done by checking the Asian/Siberian admixture. Both samples were South Swedish, without the eastern admixture typical of Finns, and were located among Orcadians, West Slavs etc. (A global PCA with Siberian, East Asian and SSA samples loses nuances within Europe, but shows global differences excellently.) This is interesting, because in this case the high Finnish IBD sharing in Sweden actually means Swedish admixture in Finland, not Finnish admixture in Sweden.

My previous blog entries showed the Finnish eastward expansion. Although IBD segments can't prove the origin of shared ancestry, the result indicates the same strong Finnish influence far to the east. This points to the obvious outcome: Swedish and Finnish influence in Northern Russia during the Iron Age.

Average LOD scores between populations.

Update 13.1.2018: fixed some colors in the matrix, and again on 16.1.
