Tuesday, December 9, 2014

Big comparison of ancient samples

Outgroup f3-statistics measure the genetic drift shared by two populations. I did a simple run with the qp3Pop program (part of the ADMIXTOOLS package), comparing each of the ancient genomes below with present-day populations, and charted the results in Excel. The larger the value, the more drift the present-day population shares with the ancient sample.

Ancient genomes (sample, sex, population label and, where given, period):

Loschbour M Luxembourg_Mesolithic
LBK F GermanStuttgart_LBK
Otzi M Tyrolean_Iceman Copper Age
MA1 M Siberian_Upper_Paleolithic
AG2 M Siberian_Ice_Age
Scandinavian_HG M Swedish_HunterGatherer Neolithic
Scandinavian_farmer F Swedish_Farmer Neolithic
Motala12 M Swedish_Motala 7000 years old
LaBrana M LaBrana Mesolithic
AngloSaxon several samples UK Hinxton Iron Age
Briton several samples UK Hinxton Iron Age
NE1…NE7  Hungarian Neolithic
KO1 Hungarian Neolithic
CO1 Hungarian Copper Age
IR1 Hungarian Iron Age
BR2 Hungarian Bronze Age

The statistic is the outgroup f3(Mbuti; test1, test2), with the African Mbuti as the outgroup.
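To make the statistic concrete, here is a minimal Python sketch of an outgroup f3 computed from per-SNP allele frequencies, plus a ranking of present-day populations against one ancient sample. It is not how qp3Pop works internally: it assumes the frequencies are already calculated for the same SNPs with the same reference allele, and it skips the sample-size correction and standard errors that qp3Pop reports. The function names, population names and numbers are made up for illustration.

```python
def outgroup_f3(outgroup, pop1, pop2):
    """Naive f3(Outgroup; pop1, pop2) = mean over SNPs of (o - p1) * (o - p2).

    Inputs are per-SNP reference-allele frequencies in the same SNP order.
    A larger value means pop1 and pop2 share more drift since they split
    from the outgroup. No finite-sample bias correction is applied.
    """
    if not (len(outgroup) == len(pop1) == len(pop2)) or not outgroup:
        raise ValueError("frequency lists must be non-empty and of equal length")
    total = sum((o - a) * (o - b) for o, a, b in zip(outgroup, pop1, pop2))
    return total / len(outgroup)


def rank_by_shared_drift(outgroup, ancient, modern_pops):
    """Sort present-day populations by outgroup f3 with one ancient sample."""
    scores = {name: outgroup_f3(outgroup, ancient, freqs)
              for name, freqs in modern_pops.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy frequencies for three SNPs (hypothetical, not real data).
mbuti = [0.5, 0.1, 0.9]
ancient = [0.8, 0.4, 0.5]
modern = {"PopA": [0.7, 0.5, 0.6], "PopB": [0.5, 0.1, 0.9]}
print(rank_by_shared_drift(mbuti, ancient, modern))  # PopA ranks first
```

PopB is identical to the outgroup here, so it shares no drift with the ancient sample and its f3 is zero.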

All results can be downloaded and unzipped here.

I couldn’t resist looking into why I have many “cousins” at 23andMe in southern Russia and Ukraine but none in northern Russia. This seemed odd, because I had always heard that the Finns came from the north. Indeed, Western Finns like me share more drift with Ukrainians than with northern Russians (Mordva, and Russians living near the White Sea). Surprising.

Sunday, December 7, 2014

Starting with new data

I am starting over with a new data set, which will hopefully give more reliable results. It is based on the publicly available data from Lazaridis et al. 2014, with some additions:

 Finnish, British and CEU samples from the 1000 Genomes Project
 5 ancient British samples from Hinxton
 8 ancient Hungarian samples (from Y-Str server)

Moving to Lazaridis’ data also meant switching to the Affymetrix SNP set. Affymetrix overlaps poorly with most commercial SNP sets, but it seems to give better coverage for many ancient samples. The switch was not a problem for me: my private sample collection is small, so I can give it up and take advantage of the better data. I now have:

2244 samples
555268 SNPs

Most individual samples cover the whole 550k SNP set, and many ancient samples also reach over 500k SNPs.

Ancient genomes in the data set

I am not yet familiar with these ancient samples, and it will take time to uncover their secrets.

Along with the new data I am also starting with a new software toolkit. I find the EIGENSTRAT format handy because it is easy to handle with SQL tools. That choice led to another: using the Eigensoft software. Luckily, a new version was released just a few weeks ago.
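As an illustration of why EIGENSTRAT files are easy to work with in SQL, here is a minimal Python sketch using the standard sqlite3 module to turn genotypes into per-population allele frequencies. It assumes the plain-text layout: the .geno file has one line per SNP with one character per individual (0, 1 or 2 copies of the reference allele, 9 for missing), and the .ind file has one “ID sex population” line per individual. The function name and toy data are hypothetical, and packed binary geno files are not handled.

```python
import sqlite3


def population_frequencies(geno_lines, ind_lines):
    """Reference-allele frequency for every (SNP index, population) pair.

    geno_lines: EIGENSTRAT .geno text lines, one per SNP, one character per
    individual (0/1/2 = copies of the reference allele, 9 = missing).
    ind_lines: .ind text lines of the form "ID SEX POPULATION".
    """
    populations = [line.split()[2] for line in ind_lines if line.strip()]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE geno (snp INTEGER, pop TEXT, dosage INTEGER)")
    rows = (
        (snp, populations[i], int(code))
        for snp, line in enumerate(geno_lines)
        for i, code in enumerate(line.strip())
        if code != "9"  # skip missing genotypes
    )
    con.executemany("INSERT INTO geno VALUES (?, ?, ?)", rows)
    # Frequency = reference alleles / (2 x called individuals) per SNP and population.
    query = """
        SELECT snp, pop, SUM(dosage) * 1.0 / (2 * COUNT(*))
        FROM geno GROUP BY snp, pop
    """
    freqs = {(snp, pop): freq for snp, pop, freq in con.execute(query)}
    con.close()
    return freqs


ind = ["I1 M Finnish", "I2 F Finnish", "I3 M Mbuti"]
geno = ["221", "910"]
print(population_frequencies(geno, ind))
```

Once the genotypes sit in a table, filtering SNPs or merging populations becomes an ordinary SQL query.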

It is time to uncork the new data, starting with Eigensoft’s excellent PCA tool (smartpca). It can do many things, such as LD pruning, random sample selection and sample projection. For now I use only random sample selection, which ensures that no population is oversampled. I use the whole 550k SNP set without LD pruning, for two reasons. First, pruning is not needed to avoid excessive clustering: with LD-pruned data I saw no improvement in clustering. Second, LD pruning can be harmful, because populations do not all carry the same amount of drift to remove. LD pruning must generally be used with care and is not for dummies like me, so I use it seldom and rely on the original data.
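Eigensoft does the sample selection itself; the sketch below only illustrates the idea of capping every population at the same maximum size by random selection, so that large reference populations do not dominate the principal components. It is plain Python with hypothetical names, assumes unique sample IDs, and uses a fixed seed so the selection is reproducible.

```python
import random
from collections import defaultdict


def cap_population_sizes(samples, max_per_pop, seed=1):
    """Randomly keep at most max_per_pop samples per population.

    samples: list of (sample_id, population) pairs with unique sample IDs.
    Returns the kept pairs in their original order.
    """
    by_pop = defaultdict(list)
    for sample_id, pop in samples:
        by_pop[pop].append(sample_id)
    rng = random.Random(seed)  # fixed seed -> same selection on every run
    keep = set()
    for ids in by_pop.values():
        chosen = ids if len(ids) <= max_per_pop else rng.sample(ids, max_per_pop)
        keep.update(chosen)
    return [(s, p) for s, p in samples if s in keep]


samples = [(f"FIN{i}", "Finnish") for i in range(10)] + [("MBU1", "Mbuti"), ("MBU2", "Mbuti")]
kept = cap_population_sizes(samples, max_per_pop=3)
print(len(kept))  # 5: three Finns and both Mbuti
```

Small populations are kept whole; only the oversized ones are thinned.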

PCA including Europeans

Fst distance table. The average standard error is 0.000762745, so individual values may be off by roughly one thousandth.
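For readers who want to see what a pairwise Fst value is, here is a minimal Python sketch of Hudson’s Fst estimator, computed as a ratio of sums over SNPs. This is an assumption for illustration: Eigensoft’s own estimator may differ in detail, and this sketch does not compute the standard errors shown in the table.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst estimator, combined over SNPs as a ratio of sums.

    p1, p2: per-SNP allele frequencies in the two populations.
    n1, n2: per-SNP sample sizes in chromosomes (2 x individuals for diploids),
    each at least 2 so the sample-size correction is defined.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, nx, ny in zip(p1, p2, n1, n2):
        # Squared frequency difference minus the expected sampling noise.
        numerator += (x - y) ** 2 - x * (1 - x) / (nx - 1) - y * (1 - y) / (ny - 1)
        # Probability that one allele drawn from each population differs.
        denominator += x * (1 - y) + y * (1 - x)
    if denominator == 0:
        raise ValueError("populations are fixed for the same allele at every SNP")
    return numerator / denominator


print(hudson_fst([1.0, 0.0], [0.0, 1.0], [20, 20], [20, 20]))  # prints 1.0
```

Summing numerators and denominators before dividing keeps rare SNPs from dominating the estimate.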


Friday, November 21, 2014

Do-it-yourself Dodecad test for Finns (and Baltic Finns in general)

Curious about Finnish history and the migrations of the last 2000 years, I have put together the following do-it-yourself Dodecad test. My goal was a dedicated test for Finns, but it may also work for Estonians and other Baltic-Finnic peoples. It does not work for other nationalities because of the regional choice of reference populations. Compared with many other Dodecad Oracle tests, I differ not only in the reference selection but also in applying stricter qualification criteria to the Finnish samples. It is also worth mentioning that in some tests the genotype data have been preprocessed in a biased way. My data include 290000 SNPs with no preprocessing based on differences between populations, so they are as they are, straight from the source.

Reference populations:


You don’t need to worry about the “calculator effect” (results inflated because your own sample is part of the reference data), because all my data come from public academic sources.

To run the test, first download the DiyDodecad scripts. You can download them here

Please note that DiyDodecad is authored by Dienekes and is part of his Dodecad Ancestry Project.

After unpacking all the files into a directory of your own (for instance Kaleva), download and unpack the four Kaleva-specific files into the same directory.


Now everything is ready for the first analyses. Read README.txt and follow Dienekes’ instructions; the only difference is that you use KALEVA.par instead of the Dodecad dv3.par file.

Friday, November 14, 2014

Ancient British genomes from Hinxton reveal the eastern Iron Age frontier

It is time for ancient genomes. A month ago I read about new ancient samples from Hinxton, England, and saw that they could be interesting for Finnish history. The samples are around 1500-2000 years old, which makes them well suited for estimating the western connections of Finns. Finnish history in Finland is rather short: at best, bloodlines go back about 2000 years, a short history compared with that of many southern Europeans.


I am now using the same data as in my roll-off analyses. As a reminder, I applied very strict qualification criteria to the Finnish samples to remove all recent admixture, meaning admixture since the beginning of the Swedish era in Finland. All public western Finnish samples were selected by comparing them with my own genealogically verified samples, and outliers were removed.


I used Reich’s three-population test (qp3Pop) with default settings.

Before going further, some words of caution. After testing with larger data sets I realized that qp3Pop, too, implicitly assumes that less diverse populations are the sources of more diverse target populations; in other words, that diverse populations are usually composed of several less diverse ones. This is not always true; it is a rather mechanical view. In genetics the process can run in reverse: a more diverse population can become less diverse through genetic drift. This matters because drift is exactly what we are analyzing here. In particular, strong drift in the target after admixture pushes f3 upward, so a positive f3 does not rule out admixture.

Some general observations. The problem above does not affect ancient samples used as sources, because they lived long before us and cannot violate causality. However, a lack of diversity can lead to the admixture being overestimated.

I also have some results that use present-day source populations, and those results can be problematic. Nevertheless, although some Finnish samples come from young isolates, I assume that my Eastern Finnish samples are historically the least admixed Finnic speakers in my data, keeping in mind that I have no Finnic (Baltic-Finnic) speakers from Russia. Additional samples from Russia could shed light on possible admixture in East Finns.

AS: Hinxton Anglo-Saxons
BR: Hinxton Iron Age Britons
EF: East Finns
WF: West Finns
PL: Poles
LT: Lithuanians
EE: Estonians
MA: Maris
CH: Chuvashes
NR: Norwegians
MR: Mordvas
BU: Belarusians

A negative f3(target; source1, source2) indicates that the target is likely admixed from populations related to both sources.
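qp3Pop reports each f3 with a standard error and a Z-score, and a value is usually taken as evidence of admixture only when Z is clearly negative (a common rule of thumb is Z below -3). The sketch below shows the idea with a simplified delete-one block jackknife over contiguous, equal-sized SNP blocks. It is an assumption for illustration: qp3Pop itself uses blocks defined by genetic distance and a bias-corrected f3, so its numbers will differ. Names, the block count and the toy data are hypothetical.

```python
import math


def f3_with_jackknife(c, a, b, n_blocks=10):
    """Naive f3(C; A, B) with a delete-one block jackknife standard error.

    c, a, b: per-SNP allele frequencies in genome order. SNPs are split into
    n_blocks contiguous blocks of nearly equal size (a simplification).
    Returns (f3, se, z).
    """
    terms = [(ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)]
    m = len(terms)
    if n_blocks < 2 or m < n_blocks:
        raise ValueError("need n_blocks >= 2 and at least n_blocks SNPs")
    total = sum(terms)
    estimate = total / m
    # Integer boundaries; every block gets at least one SNP because m >= n_blocks.
    bounds = [k * m // n_blocks for k in range(n_blocks + 1)]
    leave_one_out = []
    for k in range(n_blocks):
        block = terms[bounds[k]:bounds[k + 1]]
        leave_one_out.append((total - sum(block)) / (m - len(block)))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((x - mean_loo) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Toy example: C sits halfway between A and B at every SNP, so f3 is negative.
a = [0.1 + 0.01 * i for i in range(20)]
b = [0.9 - 0.01 * i for i in range(20)]
c = [(x + y) / 2 for x, y in zip(a, b)]
print(f3_with_jackknife(c, a, b, n_blocks=5))
```

Because neighbouring SNPs are correlated, resampling whole blocks gives a more honest standard error than treating every SNP as independent.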



The West Finnish map shows strong ancient admixture; the Anglo-Saxon/East Finnish admixture in particular stands out! Estonians show admixture with almost all their neighbors, which may point to continuous migration into Estonia throughout history.

Another way to explore possible affinities to the source populations is to use the least plausible target population, in this case African Pygmies, as an outgroup. With this method the most Anglo-Saxon-like population is the Norwegians and the most Iron Age Briton-like is the Lithuanians. West Finns come third on the Anglo-Saxon axis. This probably means either that West Finns have Anglo-Saxon-like ancestry or that Anglo-Saxons shared Fennoscandian ancestry with West Finns. All populations with more Iron Age British ancestry than West Finns (NR, LT, EE, PL) likely have more ancient Celtic ancestry from Central Europe.

Edit, Sun 16.11.2014

I thought it would be interesting to learn more about the western outlier group: the Finns who plot further west on PCA than the genealogically verified West Finnish group. I did this by comparing both western Finnish groups with East Finns, which serve as a fixed landmark for the comparison. The results show mixture signals, with negative values of f3(WF; EF, test) and f3(WF2; EF, test), where WF2 is the western outlier group and "test" is Estonians, Swedes, Iron Age Britons or ancient Anglo-Saxons.

The western outliers show more Iron Age Briton, Estonian and Swedish ancestry than the genealogical western group, but slightly less ancient Anglo-Saxon ancestry. The results also show that both western Finnish groups have significant East Finnish-like ancestry.

The abbreviation "SE" stands for three Swedish samples that show very little Finnish ancestry in 23andMe's Ancestry Composition.

Edit, Mon 17.11.2014

Another graphic shows the ratio of Swedish to ancient Anglo-Saxon admixture among Finnish individuals, both estimated with qp3Pop. The East Finnish group is used as a fixed landmark. Individuals are sorted by the difference between AS and SE, and the trend line is a linear fit to that difference. I would also have compared other populations, such as Estonians, but differences in SNP sets made individual-level comparisons impossible.
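The sorting and trend line can be sketched in a few lines of Python. This assumes the per-individual AS and SE values are already available (for example as f3 values against East Finns) and fits an ordinary least-squares line to the sorted differences against rank, similar to a linear trend line in a spreadsheet. The function name and sample values are hypothetical.

```python
def sort_and_trend(rows):
    """Sort individuals by AS - SE and fit a least-squares line to that difference.

    rows: (name, as_value, se_value) tuples, at least two of them.
    Returns (sorted_rows, slope, intercept), using rank x = 0, 1, ... as x.
    """
    if len(rows) < 2:
        raise ValueError("need at least two individuals for a trend line")
    ordered = sorted(rows, key=lambda row: row[1] - row[2])
    diffs = [as_value - se_value for _, as_value, se_value in ordered]
    n = len(diffs)
    mean_x = (n - 1) / 2
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(diffs))
    slope = sxy / sxx
    return ordered, slope, mean_y - slope * mean_x


rows = [("ind1", 0.3, 0.1), ("ind2", 0.1, 0.1), ("ind3", 0.5, 0.1)]
ordered, slope, intercept = sort_and_trend(rows)
print([name for name, _, _ in ordered], slope, intercept)
```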

Although the ancient Anglo-Saxon and Swedish admixture signals track each other, my judgement is that the larger AS is relative to SE, the larger the ancient admixture, and vice versa.