Kalevan ja Untamon geenit: March 2014

Sunday, March 30, 2014

Iberian caveman from La Braña

Finally I managed to download the ancient La Braña genome. It is publicly available from the NCBI Sequence Read Archive. I processed it with GATK, a toolkit that reads genome sequence data and calls SNPs. A little over 1 million SNPs were called. Over 100k of them overlapped with the 1M Illumina set used by 23andMe, but only 20k SNPs overlapped with my standard LD-pruned set. That is fewer than some other projects have reached, but still usable, although some inaccuracy is visible in individual results. The overall picture and the positions of populations on the MDS plots are still largely correct.
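Counting how many called SNPs overlap with a reference set, as in the figures above, is essentially a set intersection. The sketch below assumes SNPs are matched by rsID and uses made-up IDs. The post doesn't say how the overlap was computed, and real pipelines often match on chromosome and position and also check alleles.

```python
def snp_overlap(called, *panels):
    """Return the SNP IDs present in the called set and in every given panel.

    Assumption: SNPs are identified by rsID strings.
    """
    shared = set(called)
    for panel in panels:
        shared &= set(panel)  # keep only IDs also present in this panel
    return shared


# Toy IDs, purely illustrative (not real data)
called = {"rs1", "rs2", "rs3", "rs4", "rs5"}
chip = {"rs2", "rs3", "rs5", "rs9"}     # stands in for a genotyping chip
pruned = {"rs3", "rs5", "rs7"}          # stands in for an LD-pruned set

print(len(snp_overlap(called, chip)))          # 3
print(len(snp_overlap(called, chip, pruned)))  # 2
```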

While experimenting with PCA, which I have used before, I found it poorly suited to this case, so I switched to MDS. It was evident that LaBrana1, being different from any present-day population, shares fewer SNP-level principal components with them than the structure MDS can retrieve from genetic distances.
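For readers unfamiliar with MDS, here is a minimal sketch of classical (Torgerson) MDS in Python with NumPy. It assumes a symmetric genetic distance matrix such as 1 - IBS; the post doesn't state the exact distance or software behind the plots, so treat this as an illustration of the technique, not the method used for them. The toy matrix values are made up.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: place points so that their Euclidean
    distances approximate the given distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh handles symmetric matrices and returns ascending eigenvalues
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]  # indices of the k largest eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    # Coordinates: eigenvectors scaled by sqrt of the (non-negative) eigenvalues
    return vecs * np.sqrt(np.clip(vals, 0, None))


# Made-up IBS similarities for three samples; distance = 1 - IBS
ibs_matrix = np.array([[1.0, 0.81, 0.78],
                       [0.81, 1.0, 0.79],
                       [0.78, 0.79, 1.0]])
coords = classical_mds(1 - ibs_matrix, k=2)
print(coords.round(3))
```

Because it needs only pairwise distances, each pair can be compared on whatever SNPs both samples share, which is one plausible reason MDS copes better than PCA with a low-coverage ancient sample.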

Comparing with other results online shows some similarities and also differences. What made me ponder was the clear Asian affinity of LaBrana1 on the Eurasian plot. This Asian affinity was at about the same level as in the easternmost Russians, such as the easternmost samples from Vologda and the easternmost Mordvins. This is very close to what I have seen from some other projects, but surprisingly Lazaridis et al. didn't show any Asian affinity for LaBrana1. Unfortunately they didn't report the number of SNPs used in their PCA plot (or I didn't find it), but the study says that only a 10k SNP overlapping set was obtained for LaBrana1, and even fewer for some other ancient samples. But maybe I am wrong. Maybe I will know more after testing other ancient WHG samples.

MDS plots and IBS statistics show the closest similarity to Northern European populations. My data lacks Basques, so I can't say how close to LaBrana1 they might be. In general, South Europeans are more distant from LaBrana1 than North Europeans, and Near Easterners are even more distant. The closest population by average IBS was Western/Southwestern Finns, but the difference from Estonians and Lithuanians was quite small.
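The IBS figures discussed here measure allele sharing between samples. As a rough sketch, assuming genotypes coded as allele counts 0/1/2 and the common per-SNP score 1 - |g1 - g2| / 2 (the post doesn't give its exact formula or tool), pairwise IBS and a per-population average could be computed like this. Population labels and genotypes below are toy values.

```python
from collections import defaultdict


def ibs(g1, g2):
    """Mean identity-by-state over SNPs genotyped (non-None) in both samples.

    Per SNP: 1 for identical genotypes, 0.5 for one shared allele,
    0 for opposite homozygotes.
    """
    scores = [1 - abs(a - b) / 2 for a, b in zip(g1, g2)
              if a is not None and b is not None]
    if not scores:
        raise ValueError("no SNPs genotyped in both samples")
    return sum(scores) / len(scores)


def population_means(target, samples):
    """Average IBS of `target` to each population, highest first.

    samples: list of (population_label, genotypes) pairs.
    """
    per_pop = defaultdict(list)
    for pop, geno in samples:
        per_pop[pop].append(ibs(target, geno))
    means = {pop: sum(v) / len(v) for pop, v in per_pop.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: None marks a SNP missing in the ancient sample
ancient = [0, 1, 2, None, 2]
samples = [("PopA", [0, 1, 2, 0, 2]),
           ("PopA", [0, 2, 2, 1, 1]),
           ("PopB", [2, 1, 0, 0, 0])]
print(population_means(ancient, samples))  # [('PopA', 0.875), ('PopB', 0.25)]
```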

European MDS


Eurasian MDS


IBS statistics for the top ten individuals

Estonian 0.8101938
Lithuanian 0.8101357
FI0007 0.8099396
Belorussian 0.8098805
SC0001 0.8098763
Estonian 0.8098527
Lithuanian 0.8097566
FI0005 0.8096651
Lithuanian 0.809652
Russian 0.8094852

Average IBS statistics for all populations in my extended European data set (East and North Asians excluded)

Tuesday, March 4, 2014

Estonians: a genetic comparison

I managed to download 13 Estonian samples from the study “Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans”, released by the Estonian Biocenter. After quality control I had 11 samples; two were removed because they were too closely related to other samples. I have only a few comments about the Finestructure results; otherwise I just hope you enjoy them.
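Removing close relatives usually means dropping one sample from each pair whose relatedness exceeds some cut-off. The post doesn't say which statistic or threshold was used, so the scores, the sample IDs and the 0.83 cut-off below are all hypothetical; a greedy pass like this is only one simple way to do it.

```python
from itertools import combinations


def drop_close_relatives(ids, score, threshold):
    """Return the ids kept after removing one member of each close pair.

    score(a, b): pairwise relatedness (e.g. IBS or a kinship estimate).
    Pairs are processed from most to least related; when neither member
    has been removed yet, the later sample in `ids` is dropped.
    """
    pairs = sorted(((score(a, b), a, b) for a, b in combinations(ids, 2)),
                   reverse=True)
    removed = set()
    for value, a, b in pairs:
        if value <= threshold:
            break  # all remaining pairs are below the cut-off
        if a in removed or b in removed:
            continue  # this pair is already resolved
        removed.add(b)  # combinations() keeps `ids` order, so b is the later one
    return [s for s in ids if s not in removed]


# Hypothetical pairwise scores; unlisted pairs default to 0.80
ids = ["EST1", "EST2", "EST3", "EST4"]
pair_scores = {("EST1", "EST2"): 0.86, ("EST3", "EST4"): 0.84}
score = lambda a, b: pair_scores.get((a, b), 0.80)
print(drop_close_relatives(ids, score, 0.83))  # ['EST1', 'EST3']
```

Greedy removal can drop more samples than strictly necessary when relatives form chains (A-B and B-C), but for a handful of samples it is easy to check by hand.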

- The Finestructure analysis is based on IBS similarity.
- I removed Lithuanians from the Finestructure run because their very high within-population IBS similarity distorts the results in aggregate mode. This happens even though the aggregation level in my analysis is quite low. In practice this means that Estonians are compared with Slavs (Belorussians).
- Processes using MCMC aggregation usually change individual results, so individual results usually can't be compared on Finestructure matrices. To see individual results you have to look at the raw data matrix, not the aggregate matrix, but in my run the difference between the raw and aggregate matrices is very small. In this context I refer to my previous analyses and to the effects of young population isolates, which tend to distort aggregated results.
- As in my previous Finestructure analysis, the South European results seem somewhat inaccurate, but Northern Europe looks quite good.
- The tree on the matrix seems to emphasize differences over similarities between samples; for example, the Finns on the matrix are split into two or three groups. At least for me, it is impossible to judge the balance of similarities and differences between individuals and populations with existing analysis software. How similarities and differences are weighted depends strongly on the selected tool, and thus on its authors. Some authors favor similarities, even very small ones, and disregard differences. This can be seen by comparing the results of different tools. The question is interesting, and it is a reminder of the complex history of each population and of the data used, which can be inadequate.
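The raw-versus-aggregate distinction can be illustrated with a much simpler operation than Finestructure's MCMC: averaging an individual-level similarity matrix into population blocks. This is a hypothetical illustration (the function name and toy numbers are not from Finestructure), but it shows how averaging hides individual variation and how a group with unusually high internal similarity, as described for Lithuanians above, stands out in the aggregated view.

```python
def aggregate_matrix(matrix, labels):
    """Average an individual-by-individual similarity matrix into
    group-by-group block means.

    Self-comparisons (the diagonal) are skipped, so a group's within-group
    value is the mean over distinct pairs; a single-member group therefore
    gets NaN on its own diagonal block. Returns (groups, block_means).
    """
    groups = sorted(set(labels))
    index = {g: i for i, g in enumerate(groups)}
    sums = [[0.0] * len(groups) for _ in groups]
    counts = [[0] * len(groups) for _ in groups]
    for i, gi in enumerate(labels):
        for j, gj in enumerate(labels):
            if i == j:
                continue
            sums[index[gi]][index[gj]] += matrix[i][j]
            counts[index[gi]][index[gj]] += 1
    means = [[s / c if c else float("nan") for s, c in zip(srow, crow)]
             for srow, crow in zip(sums, counts)]
    return groups, means


# Toy symmetric IBS matrix for two Finns and two Estonians (made-up values)
labels = ["FIN", "FIN", "EST", "EST"]
m = [[1.00, 0.82, 0.80, 0.81],
     [0.82, 1.00, 0.79, 0.80],
     [0.80, 0.79, 1.00, 0.83],
     [0.81, 0.80, 0.83, 1.00]]
groups, means = aggregate_matrix(m, labels)
print(groups)
print([[round(x, 3) for x in row] for row in means])
```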