Sunday, March 30, 2014

Iberian caveman from La Braña




Finally I downloaded the ancient La Braña genome. It is publicly available from the NCBI Sequence Read Archive. I processed it with GATK, a toolkit that reads genome sequence data and calls SNPs. A bit over 1 million SNPs were called, and over 100k of them overlapped with the Illumina 1M set used by 23andMe, but only 20k SNPs overlapped with my standard LD-pruned set. That is fewer than some other projects have been able to reach, but still usable, although some inaccuracy is seen in individual results. The overall picture and the positions of populations on the MDS plots are still correct.
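The overlap counts above come down to intersecting the set of called SNPs with each reference SNP list. Below is a minimal sketch, assuming both sides are identified by rsID strings. The function name and the example IDs are hypothetical, and a real pipeline may match on chromosome and position instead and must also handle allele and strand consistency.

```python
# Minimal sketch: count how many called SNPs overlap a reference SNP list.
# Assumption: SNPs are matched by rsID string only (hypothetical example IDs);
# strand/allele checks that a real pipeline needs are not shown here.

def snp_overlap(called_ids, reference_ids):
    """Return the sorted list of SNP IDs present in both collections."""
    return sorted(set(called_ids) & set(reference_ids))


if __name__ == "__main__":
    called = ["rs1", "rs2", "rs3", "rs4"]   # hypothetical calls from an ancient genome
    chip = ["rs2", "rs3", "rs5"]            # hypothetical genotyping-chip SNP list
    pruned = ["rs3", "rs6"]                 # hypothetical LD-pruned set
    print(len(snp_overlap(called, chip)))   # 2
    print(len(snp_overlap(called, pruned))) # 1
```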

Playing around with PCA, which I have used earlier, I found it poorly suited to this case, so I moved to MDS. Evidently the LaBrana1 sample, being different from any present-day population, did not share as many SNP-level principal components with them as MDS can retrieve from genetic distances.
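MDS works only from a matrix of pairwise distances, which is why it can place a sample even when it does not fit the principal components of the present-day populations. The post does not say which MDS implementation or distance was used (PLINK's `--mds-plot`, for example, runs MDS on IBS distances). The following is a sketch of classical (Torgerson) MDS, assuming a distance such as 1 minus IBS similarity; `classical_mds` is a hypothetical helper, not the tool used here.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed points so Euclidean distances approximate `dist`.

    dist: symmetric (n x n) distance matrix, e.g. 1 - IBS similarity (assumption).
    Returns an (n x k) array of coordinates.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so use eigh; keep the k largest eigenvalues
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]
    # Coordinates are eigenvectors scaled by sqrt of the (non-negative) eigenvalues
    return vecs * np.sqrt(np.clip(vals, 0, None))
```

Given an IBS similarity matrix `S`, calling `classical_mds(1 - S)` would give two plotting dimensions; coordinates are only defined up to sign and rotation.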

Compared with other online results, mine show some similarities and also differences. What made me ponder was the obvious Asian affinity of LaBrana1 on the Eurasian plot. This Asian affinity was at around the same level as in the easternmost Russians, such as the easternmost samples from Vologda and the easternmost Mordvins. This is very close to what I have seen in some other projects, but surprisingly Lazaridis et al. did not show any Asian affinity for LaBrana1. Unfortunately they did not state the number of SNPs used in their PCA plot (or I didn't find it), but the study says that only a 10k-SNP overlapping set was obtained for LaBrana1, and even fewer for some other ancient samples. But maybe I am wrong. Maybe I will know more after testing other ancient WHG samples.

The MDS plots and IBS statistics show the closest similarity with Northern European populations. My data lacks Basques, so I can't say how close to LaBrana1 they might be. In general, South Europeans are more distant from LaBrana1 than North Europeans, and Near Easterners are even more distant. The closest population by average IBS was Western/Southwestern Finns, but the difference to Estonians and Lithuanians was quite small.
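A population's "average IBS level" here means averaging the target's IBS similarity over all individuals of that population and then ranking the populations. A minimal sketch, assuming the per-individual values are already computed and labelled by population; `population_averages` and the labels in the example are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def population_averages(ibs_to_target):
    """Average a target's IBS similarity over individuals, grouped by population.

    ibs_to_target: iterable of (population_label, ibs_value) pairs.
    Returns (population, average) pairs sorted from most to least similar.
    """
    groups = defaultdict(list)
    for pop, value in ibs_to_target:
        groups[pop].append(value)
    averages = {pop: mean(values) for pop, values in groups.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Averaging smooths out individual noise, which matters when only about 20k SNPs overlap, but it also means a small gap between two populations' averages (as between Finns, Estonians and Lithuanians above) should not be over-interpreted.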


European MDS






Full size image available here


Eurasian MDS






Full size image available here


IBS statistics for the top ten individuals


Estonian            0.8101938
Lithuanian          0.8101357
FI0007              0.8099396
Belorussian         0.8098805
SC0001              0.8098763
Estonian            0.8098527
Lithuanian          0.8097566
FI0005              0.8096651
Lithuanian          0.809652
Russian             0.8094852
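The post does not define the IBS statistic exactly. One common definition is PLINK's IBS similarity (DST), (IBS2 + 0.5 × IBS1) / N, counted over the N SNPs called in both samples. The following is a sketch assuming that definition and genotypes coded as 0/1/2 allele copies; `ibs_similarity` is a hypothetical helper, and how the low-coverage ancient genotypes were called is not stated in the post.

```python
def ibs_similarity(g1, g2):
    """PLINK-style IBS similarity (DST) between two individuals.

    g1, g2: sequences of genotypes coded as 0/1/2 copies of one allele,
    with None for a missing call. Per SNP, identical genotypes score 1,
    genotypes differing by one allele score 0.5, opposite homozygotes 0.
    """
    shared, compared = 0.0, 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # only SNPs called in both samples are compared
        shared += (2 - abs(a - b)) / 2
        compared += 1
    if compared == 0:
        raise ValueError("no overlapping non-missing SNPs")
    return shared / compared
```

For example, `ibs_similarity([0, 1, 2, None], [0, 2, 0, 1])` compares three SNPs scoring 1, 0.5 and 0, giving 0.5.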



Average IBS statistics including all populations from my extended European data set (East and North Asians excluded)






 

Tuesday, March 4, 2014

Estonians, the genetic comparison





I succeeded in downloading 13 Estonian samples from the study “Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans”, released by the Estonian Biocenter. After quality control I had 11 samples; two were removed for being too closely related to other samples. I have only a few comments about the Finestructure analysis; otherwise I just hope you enjoy the results.


  • the Finestructure analysis is based on IBS similarity.
  • I removed Lithuanians from the Finestructure run because they show very high within-population IBS similarity, which distorts results in the aggregate mode. This happens even though the aggregation level is quite low in my analysis. In practice this means that Estonians are compared to Slavs (Belarusians).
  • processes using MCMC aggregation usually change individual results, so individual results usually can't be compared on Finestructure matrices. To see individual results you have to look at the raw data matrix, not the aggregate matrix, but in my run the difference between raw and aggregate is very small. In this context I refer to my previous analyses and to the effects of recently isolated populations, which tend to distort aggregated results.
  • as in my previous Finestructure analysis, the South European results seem somewhat inaccurate, but Northern Europe looks quite good.
  • the tree on the matrix seems to emphasize differences more than similarities between samples. For example, the Finns on the matrix are split into two or three groups. At least for me, it is impossible to estimate the balance of similarities and differences between individuals and populations with existing analysis software. How similarities and differences are weighted depends strongly on the selected tools, and accordingly on their authors. Some authors favour similarities, even very small ones, and disregard differences. This can be seen by comparing results and analysis tools. The question is interesting, and it is a reminder of the complex history of each population and of the data used, which can be inadequate.
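The quality-control step at the start of this post removed two of the 13 Estonian samples as too-close relatives, but the post does not say how. Below is a minimal sketch of one common approach: greedy pruning on pairwise similarity, assuming IBS (or a kinship estimate) per pair and a cutoff chosen by the analyst. `drop_close_relatives` and the threshold in the example are hypothetical, not the procedure actually used here.

```python
def drop_close_relatives(ids, sim, threshold):
    """Greedily drop samples until no remaining pair has similarity >= threshold.

    ids: list of sample IDs.
    sim: dict mapping frozenset({id1, id2}) -> pairwise similarity.
    The sample involved in the most over-threshold pairs is dropped first
    (ties go to the sample listed first).
    """
    kept = list(ids)
    while True:
        # Pairs among the remaining samples that look too closely related
        flagged = [pair for pair, s in sim.items()
                   if s >= threshold and pair <= set(kept)]
        if not flagged:
            return kept
        counts = {i: sum(i in pair for pair in flagged) for i in kept}
        worst = max(kept, key=lambda i: counts[i])
        kept.remove(worst)
```

Dropping the sample involved in the most flagged pairs first tends to keep more samples than removing both members of every related pair.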


PCA world





Full size image available here


PCA extended Europe


Dimensions 1 and 2:




Full size image available here

Dimensions 1 and 3:





Full size image available here



PCA Europe


Dimensions 1 and 2:



Full size image available here

Dimensions 1 and 3:


Full size image available here


Finestructure




Full size image available here