Saturday 17 June 2017

Estonian Corded Ware was not Corded Ware

Despite the shared chronology, the Corded Ware in Estonia was genetically a historical misstep, if we believe D-statistics (dstat) computed with samples of the German Corded Ware and Bell Beaker cultures. All Northern Europeans are closer to the German samples than to the Estonian ones.
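For readers unfamiliar with dstat, below is a minimal sketch of the D-statistic D(W, X; Y, Z) in its allele-frequency form, as used in ADMIXTOOLS-style analyses. It assumes that per-SNP allele frequencies have already been computed for four populations over the same SNPs. The function name `d_statistic` is my own, and a real analysis also needs a block-jackknife standard error, which is left out here.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (values between 0 and 1).

    D = sum((w - x) * (y - z)) / sum((w + x - 2wx) * (y + z - 2yz))
    No standard error is computed (that would need a block jackknife).
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all four populations need the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for fw, fx, fy, fz in zip(w, x, y, z):
        numerator += (fw - fx) * (fy - fz)
        denominator += (fw + fx - 2 * fw * fx) * (fy + fz - 2 * fy * fz)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator
```

With a hypothetical outgroup as X, a positive D(Test, Outgroup; German_CW, Estonian_CW) would mean that the test population shares more alleles with the German Corded Ware samples than with the Estonian ones.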

Thursday 15 June 2017

Shared drift with ancient Latvian, Estonian and British samples

In brief: the shared drift of the Latvian samples from Jones et al. I have rebuilt all samples using BAM files taken directly from the study and a new genotyping algorithm designed for ancient samples.
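"Shared drift" is commonly measured with an outgroup f3-statistic, f3(O; A, B), the mean over SNPs of (o - a)(o - b). I am assuming that this is the kind of statistic meant here; the sketch below computes it from per-SNP allele frequencies and, unlike real tools, applies no finite-sample correction. The larger the value, the more drift A and B share since their split from the outgroup O.

```python
def outgroup_f3(o, a, b):
    """Outgroup f3(O; A, B) = mean over SNPs of (o - a) * (o - b).

    o, a, b: per-SNP allele frequencies for the outgroup and the two test
    populations, listed over the same SNPs. No bias correction is applied.
    """
    if not (len(o) == len(a) == len(b)) or len(o) == 0:
        raise ValueError("need the same, non-empty SNP set for all three populations")
    total = sum((fo - fa) * (fo - fb) for fo, fa, fb in zip(o, a, b))
    return total / len(o)
```

Ranking candidate modern populations B by f3(O; Ancient_Latvian, B) would then give an ordering of shared drift with the ancient sample.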

Friday 9 June 2017

British Viking Age samples placed on the genetic map

I recently got new samples from a fairly new study. It looks more like a technical test than a sampling designed to study history, but I used the data anyway. So far I have eleven Viking Age samples from the UK, and I have now tested them. The data consist of around ten samples per population, with the exception of Swedes: only two Swedish samples were available in my mega-SNP database, both from the study "Genomic analyses inform on migration events during the peopling of Eurasia" (Pagani et al. 2016). The first is from Nyköping; the second has no stated location.

The PCA lacks a few Viking Age samples whose quality was too poor, so they were removed by the outlier check. The British Viking Age samples look German, but I should remind you that a PCA is built from a few selected components rather than from genetic similarity across the whole genome. Let's see how these samples look in a formal analysis. I have run several tests to give different views, because populations cannot be placed on a single one- or two-dimensional axis.
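The outlier check can be illustrated with a minimal sketch in the spirit of smartpca's outlier removal: a sample is dropped if it lies more than `sigma` standard deviations from the mean on any principal component, and the check is repeated a few times. The default of 6.0 follows smartpca's `outliersigmathresh` option, but unlike smartpca this sketch does not recompute the PCA after each round; the PC scores are assumed to be given, and the function name is my own.

```python
from statistics import mean, pstdev


def remove_pc_outliers(scores, sigma=6.0, max_iter=5):
    """Iteratively drop samples lying more than `sigma` SDs from the mean on any PC.

    scores: dict mapping sample name -> list of PC coordinates (same length for all).
    Returns a new dict with the retained samples. PCs are NOT recomputed between rounds.
    """
    kept = dict(scores)
    for _ in range(max_iter):
        if len(kept) < 3:
            break
        n_pcs = len(next(iter(kept.values())))
        flagged = set()
        for k in range(n_pcs):
            column = [coords[k] for coords in kept.values()]
            mu, sd = mean(column), pstdev(column)
            if sd == 0:
                continue  # all samples identical on this PC
            for name, coords in kept.items():
                if abs(coords[k] - mu) > sigma * sd:
                    flagged.add(name)
        if not flagged:
            break
        kept = {name: coords for name, coords in kept.items() if name not in flagged}
    return kept
```

Note that with a single extreme sample the largest possible deviation grows with the number of samples, so a 6-sigma threshold only bites in reasonably large panels.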

In the formal tests, Swedes are closest to the British Viking Age samples, followed by the Irish and Scottish samples. One straightforward conclusion could be that these Viking Age people were a mix of Scandinavians and Celts/Britons. One curious remark: on the PCA the Swedish samples are shifted towards Finns, while the Norwegian samples show less of this similarity. Still, the Swedish samples are closer to the Viking Age samples from the UK. I have not tested this curiosity with formal analyses, but as far as I know it will hold in all tests.

edit 11.6.2017  12:50

German samples are from Leipzig.

Thursday 25 May 2017

PCA grouping of haplogroup N1c1

In an earlier analysis I used TMRCA (time to the most recent common ancestor) calculations to make PCA analyses of Y-DNA clades. Now I use the same method to group haplogroup N1c1. The data were gathered from FamilyTreeDNA's open project. TMRCA calculations give only estimates, but the result makes more sense because every cell in the TMRCA data is compared with every other cell. I used 67 markers to get the largest possible data set; only a few FTDNA kits have fewer markers.
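Comparing every cell with every other cell can be sketched as a pairwise distance matrix over Y-STR haplotypes. This illustration makes an assumption: it uses a simple stepwise distance (the sum of absolute repeat differences over shared markers) instead of a full TMRCA estimate, which would additionally need per-marker mutation rates and a generation time. The marker names in the example are placeholders. The resulting symmetric matrix is the kind of input a PCA or MDS step would take.

```python
from itertools import combinations


def str_distance(a, b):
    """Stepwise distance between two Y-STR haplotypes (dicts marker -> repeat count)."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no shared markers")
    return sum(abs(a[m] - b[m]) for m in shared)


def distance_matrix(kits):
    """Return (sorted kit names, symmetric matrix of pairwise stepwise distances)."""
    names = sorted(kits)
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for x, y in combinations(names, 2):
        d = str_distance(kits[x], kits[y])
        matrix[index[x]][index[y]] = d
        matrix[index[y]][index[x]] = d
    return names, matrix
```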


This time I had only a few Altaic and Ugric samples; more of them would make it possible to see the distance between the Altaic/Ugric and European groups. The result indicates three European groups: Baltic, Chud and Finnish. West Chuds are actually Finnish too, but as far as I know this branch is prehistorically shared with Estonians. The most distinct group is the Finnish one, implying a local origin despite its scattered distribution in northern Scandinavia and Russia.


The next picture shows what happens after removing the Finnish clades (regardless of location). West and East Chuds cluster together, and North Balts come close on the y-axis. West, East and Central Balts cluster again. The root group includes all samples not belonging to any named clade and does not indicate any specific branch.


After also removing all Chuds, the picture shows more detail. North Balts and Rurikids cluster together (with one sample classified as Fennoscandinavian), and the rest of the Balts form another cluster.

Thursday 6 April 2017

How the Estonian Comb Ceramic and Corded Ware cultures were passed on to us

Thanks to the new study "Extensive farming in Estonia started through a sex-biased migration from the Steppe", I now have great new samples from Estonia, dated to 4,500 to 6,300 years before present and representing the local Comb Ceramic and Corded Ware cultures. I have run dstat analyses showing the relative presence of these cultures among present-day populations. The data consisted of 11 million SNPs to ensure reasonable overlapping coverage between the ancient and present-day samples.

Tuesday 14 March 2017

Haplotype sharing analysis, part two: Asian connections in Europe

Chromopainter is software that groups phased data into so-called chunks, which are a practical implementation of haplotypes. Chromopainter is usually used together with Finestructure or Globetrotter. Finestructure reads the individual-level coancestry matrix created by Chromopainter, which is not the best way to analyze chunks shared between populations, because it does not let you assign a coancestry connection between populations; at least I did not find a way to do it. Chromopainter alone does this perfectly and lets you analyze the results with other software: you can assign donors and recipients at the population level. Of course this does not mean that chunks flow from donor to recipient, since that is only my labelling, but it captures well what population pairs have in common. Nor does it reveal admixture: for example, the sharing between population X and the Saamis tells only how many chunks they share, not how many of those shared chunks are also common with putative Siberians, if such Siberians even exist today.
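Assigning donors and recipients at the population level amounts to summing an individual-level chunk count matrix over each donor population and averaging over the recipient individuals of each population. The sketch below assumes the counts have already been read into a nested dict `chunks[recipient][donor]`; it does not parse Chromopainter's own output files, the function name is hypothetical, and "donor" and "recipient" remain only labels.

```python
from collections import defaultdict


def population_sharing(chunks, pop_of):
    """Aggregate individual chunk counts to (recipient_pop, donor_pop) level.

    chunks: dict recipient individual -> dict donor individual -> chunk count.
    pop_of: dict individual -> population label.
    Returns dict (recipient_pop, donor_pop) -> chunks per recipient individual.
    """
    totals = defaultdict(float)  # summed chunks per population pair
    n_recipients = defaultdict(int)  # recipient individuals per population
    for recipient, row in chunks.items():
        rec_pop = pop_of[recipient]
        n_recipients[rec_pop] += 1
        for donor, count in row.items():
            totals[(rec_pop, pop_of[donor])] += count
    return {pair: total / n_recipients[pair[0]] for pair, total in totals.items()}
```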

Unfortunately my data are rather limited: some populations are well represented, while others consist of only a few samples. In the future I will probably run more tests like this and try to improve the data. For now I consider this step a showcase of a new method.


edit 14.3.17 17:30

It looks like this works, and it is time to play with real data. The following small test shows how German, Icelandic and Polish haplotype references clearly separate German and Balto-Slavic speakers, implying a higher resolution than genotype data.

Wednesday 8 March 2017

Haplotype sharing analysis, part one: Europe

The following analysis was done with Shapeit, Chromopainter and Finestructure. Shapeit phasing was aided by the 1000 Genomes V3 phasing reference. The Finestructure run used chunk counts generated by Chromopainter. Before running Finestructure, I modified the chunk counts file to avoid "chunk leak" from populations with a low effective population size. I had tested this problem earlier and found that small populations that are oversampled relative to their effective population size give erroneous results due to "chunk leak" towards other populations. Both Shapeit and Chromopainter use a fixed effective population size across all populations. The remedy was to standardize the within-population chunk sharing to the average of all within-population sharings.
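One possible reading of the remedy against "chunk leak" is sketched below on a population-level matrix: each within-population value (the diagonal) is replaced by the mean of all within-population values, and between-population values are left untouched. Whether the original modification worked at this level or on individual rows of the chunk counts file is not stated, so treat this as a hypothetical illustration with a function name of my own.

```python
def standardize_within(matrix):
    """Replace every within-population value with the mean within-population value.

    matrix: dict recipient_pop -> dict donor_pop -> chunk sharing (e.g. mean chunk count).
    Between-population values are copied unchanged; the input is not modified.
    """
    pops = list(matrix)
    if not pops:
        raise ValueError("empty matrix")
    target = sum(matrix[p][p] for p in pops) / len(pops)
    result = {rec: dict(row) for rec, row in matrix.items()}
    for p in pops:
        result[p][p] = target
    return result
```

A small, oversampled population with inflated internal sharing is thus pulled down to the panel average instead of dominating the matrix.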

The Finestructure results also showed another weakness: it cannot handle large genetic distances in a way that gives readable graphical results. For that reason I left the East Uralic populations and Saamis out of this test. I'll come back to them later.

Test conditions

- 10 randomly selected samples per population
- includes only the first chromosome
- around 40,000 SNPs

Russians are from Kargopol.



I have also tested MixFit, a new program developed by Estonian researchers. MixFit is a small program that searches for the best fits using Chromopainter output. Its shortcoming is that it fits only three admixture components. I tested the FinnMostCW group with the same Chromopainter output as in my previous test (plus Saamis and East Uralics), and by running several samples and averaging their distributions I obtained more than three admixture components.

FinnLocal 0.3764146
West European 0.2435327
Estonian 0.2265092
Baltic 0.07271973
East FU / Saami 0.0841223
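Averaging several three-component fits into one larger distribution can be sketched as follows. The assumption is that each MixFit run has already been turned into a dict mapping source to proportion; a source missing from a run counts as zero, so the averaged proportions still sum to one when every run does. MixFit's actual output format is not modelled here, and the function name is hypothetical.

```python
from collections import defaultdict


def average_fits(fits):
    """Average per-sample admixture fits (dicts source -> proportion).

    A source absent from a fit contributes 0 for that fit.
    """
    if not fits:
        raise ValueError("no fits to average")
    sums = defaultdict(float)
    for fit in fits:
        for source, share in fit.items():
            sums[source] += share
    return {source: total / len(fits) for source, total in sums.items()}
```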

Friday 3 March 2017

A short view: Were Scythians behind the Asian admixture on the European side of Russia?

As I showed earlier, the Siberian admixture among Baltic Finns did not come with them from the east; it was already in Finland when the Baltic Finnic people reached Fennoscandinavia. My statistics showed that rare alleles found in Russia and Asia occur in Finland at the same level as in other European countries.

Looking closely at the Asian admixture in Russia, we can trace the rare-allele source to the Altay region. How did Altaian admixture end up in Mordvins? Was it brought by Scythians or by Mongols? I don't know, but the fact is that it is there.

Scythian sphere according to Wikipedia

For additional information about Mordva/Moksha, see supplementary figure 11.

Friday 24 February 2017

New members added to the project

Three members, FI20, FI21 and FI22, have been added to the data and are shown on the following PCA plots.

Wide European PCA including Asian references

Previous PCA zoomed in

PCA including only Europeans

If your position moves between the second and third PCA, the shift is due to your Saami admixture.


I am moving on with my targets and methods and beginning to use haplotypes, and possibly rare alleles, instead of genotype data.

Thursday 16 February 2017

Rare alleles show: Baltic-Finnic people are Central Europeans with Saami admixture

When speaking about Finns, one of the most speculated issues has been the origin of their minor Siberian admixture. The debate has been extensive but, in the end, merely boring. Researchers have mentioned Mongols, Chinese, Nganasans and Khanty, among others, but, as we say in Finland, one should not go farther than the sea to fish. Using rare alleles, the method used by Schiffels et al. 2015, we see that the Siberian admixture is credibly explained by the shared history of the Finnish and Saami peoples, and that the foundation of the Finnish people is in this sense in Central Europe. Of course, we need to compare the rare alleles of Finns and other European populations to find out who the closest relatives of Finns are and to see the details. Volga-Finnic and Eastern Uralic peoples show a clearly different eastern admixture. If we assume that the Finns came from the Volga or Ural regions, we have to explain the difference in Asian admixtures. The simplest way to do this would be to determine the origin of the Saami-Siberian admixture and date it. You can take this as a hint for Estonian and Finnish researchers :)
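Rare-allele sharing in the spirit of Schiffels et al. can be sketched as a simple tally: keep only variants whose total allele count in the panel is small, then count, for each pair of populations, how many of these rare alleles both carry. The cutoff `max_count=5`, the input layout (one dict of population to allele count per variant) and the function name are my own assumptions, and the sketch does not normalize for sample size as a real analysis should.

```python
from collections import Counter
from itertools import combinations


def rare_allele_sharing(variants, max_count=5):
    """Count rare alleles shared between population pairs.

    variants: list of dicts population -> count of the rare allele at that variant.
    Only variants with a total count between 2 and max_count are used.
    Returns Counter keyed by alphabetically ordered population pairs.
    """
    shared = Counter()
    for counts in variants:
        total = sum(counts.values())
        if total < 2 or total > max_count:
            continue  # singletons cannot be shared; common alleles are not rare
        carriers = sorted(pop for pop, c in counts.items() if c > 0)
        for a, b in combinations(carriers, 2):
            shared[(a, b)] += 1
    return shared
```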

Monday 13 February 2017

Ancient Latvians compared with modern people

New ancient Latvian genomes were published in the new study by Jones et al. Although all the new genomes are of rather low quality, I have now run some dstat comparisons against modern populations, using present-day Latvians as a fixed point.

PCA, trying to locate three ancient samples

And three dstat figures for the samples MN2, HG2 and HG3. Most of the samples are of too low quality to give reasonable results, so I have only these three results. I also tried mapping the original FASTQ files, which worked and gave more usable SNPs, but I decided not to use them to ensure full comparability with the study.

Monday 9 January 2017

Going ahead with the new data, clustering

My new data make it possible to cluster samples better by ethnicity. It is now possible to see at least the following clusters:

South European
West European
East European
Finnish dwelling zone
Baltic dwelling zone

Unfortunately none of the new sample sources gives a reasonable South European view, which makes it impossible to see structure within the Mediterranean area. With better sampling I could probably create at least Balkan, South Italian, Iberian and Basque clusters. It is probably now also possible to classify project members by PCA.

Europe, clustered by Saami, Mongolian, South-Asian and Middle-Eastern samples

Zoomed in

Europe plotted on its own. You can clearly see western and eastern clusters, as well as the Balts and the Baltic-Finnic group splitting towards Scandinavian and East Slavic relations. With more suitable samples we could also see a clearly distinct Scandinavian group. Unfortunately the South European picture is fuzzy because there are too few samples. Because of this shortage I narrowed each group down to four samples, except Tuscany, which I kept larger to strengthen the southern cluster. It is quite possible that with a larger South European sample the European west and east would diverge even more than on the plot below.
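Narrowing each group down to four samples while keeping Tuscany whole can be sketched as a seeded random subsample. The group names, the seed and the function name are placeholders; the fixed seed only makes the selection reproducible.

```python
import random


def subsample_groups(groups, n=4, keep_all=("Tuscany",), seed=1):
    """Randomly keep at most n samples per group, except groups listed in keep_all.

    groups: dict group name -> list of sample IDs. Returns a new dict.
    """
    rng = random.Random(seed)
    result = {}
    for name, samples in groups.items():
        if name in keep_all or len(samples) <= n:
            result[name] = list(samples)
        else:
            result[name] = rng.sample(list(samples), n)
    return result
```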