Saturday, 20 August 2016

Mitochondrial diversity in Europe


I have seen several mitochondrial statistics that use the main haplogroups: H, U, I and so on. Haplogroups, being tens of thousands of years old, are a very coarse way to analyze geographic areas where people have moved and mixed during recent centuries, or at most over the last few thousand years. Because of this I decided to use mutation information based on the RSRS reference (Reconstructed Sapiens Reference Sequence). The RSRS was introduced a few years ago and lists mitochondrial mutations relative to the so-called "mito-Eve", the reconstructed first woman in the human ancestral tree. Even the RSRS leaves a lot to be desired, because many mutations are shared by several mitochondrial branches.


The data was collected from publicly available FamilyTreeDNA projects and covers the two hypervariable regions, HVR1 and HVR2. HVR2 is not available for all samples; in those cases it is marked as "no call". Otherwise all mutations are included.
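To make the "no call" handling concrete, here is a minimal sketch of how such mutation lists could be turned into a presence/absence table. The kit names, the mutation lists and the `build_matrix` helper are all made up for illustration; this is not the actual processing pipeline, only one plausible way to keep an untested HVR2 apart from a tested HVR2 without the mutation.

```python
# Hypothetical sample records: HVR2 is None when it was not tested ("no call").
samples = {
    "kit1": {"HVR1": ["A16129G", "T16187C"], "HVR2": ["C146T", "C152T"]},
    "kit2": {"HVR1": ["A16129G"], "HVR2": None},
}

def build_matrix(samples):
    """Return (sorted mutation names, {kit: row}); each row holds
    1 (mutation present), 0 (absent) or None (region not tested)."""
    # Remember which region every observed mutation belongs to.
    region_of = {}
    for regions in samples.values():
        for region, muts in regions.items():
            for m in muts or []:
                region_of[m] = region
    columns = sorted(region_of)
    matrix = {}
    for kit, regions in samples.items():
        row = []
        for m in columns:
            called = regions.get(region_of[m])
            if called is None:
                row.append(None)  # no call: neither present nor absent
            else:
                row.append(1 if m in called else 0)
        matrix[kit] = row
    return columns, matrix

columns, matrix = build_matrix(samples)
for kit, row in matrix.items():
    print(kit, row)
```

Keeping `None` separate from `0` matters later on: a sample without HVR2 should be skipped at HVR2 positions rather than counted as carrying the ancestral state.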

Countries and sample sizes

The Finnish sample is probably the largest ever seen in academic or any other studies. Even allowing for some regional bias in how actively people get tested, this has to be the best sample data ever seen from Finland.

Some geographical areas, such as the White Sea Karelians, are underrepresented, but I expected some interest in them and included them anyway.


Fst distances

Since I was after country-level rather than individual-level statistics, I first computed Fst statistics between countries. Given the nature of mitochondrial data and mutations, one should not expect the results to give a strict breakdown of total ancestry; rather, they mirror European migrations over thousands of years.
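For readers unfamiliar with Fst, the following is a rough sketch of one common estimator, based on mean pairwise differences within and between two populations (in the spirit of Hudson's estimator). The post does not say which estimator or software produced the figures, so treat this purely as an illustration; the two toy populations and all function names are invented.

```python
from itertools import combinations, product

def pairwise_diff(a, b):
    """Number of mutations carried by exactly one of two haplotypes."""
    return len(a ^ b)

def mean_within(pop):
    """Mean pairwise difference among haplotypes of one population."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes per population")
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)

def mean_between(pop1, pop2):
    """Mean pairwise difference between haplotypes of two populations."""
    pairs = list(product(pop1, pop2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)

def fst(pop1, pop2):
    """Fst = 1 - (mean within-population diversity / between-population diversity)."""
    h_within = (mean_within(pop1) + mean_within(pop2)) / 2
    h_between = mean_between(pop1, pop2)
    if h_between == 0:
        return 0.0
    return 1 - h_within / h_between

# Toy populations: each haplotype is the set of mutations it carries.
pop_a = [{"A16129G"}, {"A16129G"}, {"A16129G", "T16187C"}]
pop_b = [{"C16223T"}, {"C16223T"}, {"C16223T", "T16311C"}]
print(round(fst(pop_a, pop_b), 3))
```

Values near 0 mean the two populations are hard to tell apart, and values near 1 mean they share almost no haplotype diversity. With small samples this estimator can come out slightly negative, which is usually read as zero.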

Fst distances

 A higher-resolution image can be downloaded here

 MDS-plot based on Fst-distances:

The two leftmost dots are Poland and Germany.
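As a reminder of what an MDS plot does with a distance matrix, here is a small sketch of classical (Torgerson) MDS written in plain Python. It is not the tool used for the plot above; the `classical_mds` function and the toy distances (four corners of a 3 x 4 rectangle) are assumptions for illustration, and a real analysis would normally use an existing statistics package.

```python
import math

def classical_mds(dist, dims=2, iters=500):
    """Embed a symmetric distance matrix in `dims` dimensions (classical MDS)."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]  # equals column means by symmetry
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant remaining eigenvector of B.
        v = [math.sin(i + k + 1) for i in range(n)]  # fixed start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the matching eigenvalue.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate B so the next round finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords

# Toy input: distances between the corners of a 3 x 4 rectangle.
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist)
for c in coords:
    print([round(x, 3) for x in c])
```

The recovered coordinates reproduce the input distances only up to rotation and reflection, which is why the orientation of an MDS plot carries no meaning by itself. Fst distances are generally not exactly Euclidean, so with real data a two-dimensional plot is an approximation.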

And a classical Euclidean tree plot:

edit 20.0.2016 13:40

Here I reconstructed the mitochondrial genome instead of using the hypervariable mutations directly. The reconstructed SNP data was analyzed with standard analysis tools. I am quite sure that analyses based only on mutation indicators will not be successful.
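The reconstruction step can be pictured roughly like this: start from the reference sequence and apply each listed substitution at its position. The sketch below uses a ten-base toy string instead of the real RSRS, handles only simple substitutions written as ancestral base, position, derived base (for example "A16129G"), and sets aside everything else (insertions, deletions, mismatches). The `apply_mutations` helper is hypothetical and is not the script used for the downloadable data.

```python
import re

# Simple substitutions only: ancestral base, 1-based position, derived base.
SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")

def apply_mutations(reference, mutations):
    """Return (reconstructed sequence, list of mutations that were skipped)."""
    seq = list(reference)
    skipped = []
    for m in mutations:
        match = SUBSTITUTION.match(m)
        if not match:
            skipped.append(m)  # e.g. insertions or deletions, not handled here
            continue
        anc, pos, der = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1 or pos > len(seq) or seq[pos - 1] != anc:
            skipped.append(m)  # position out of range or reference mismatch
            continue
        seq[pos - 1] = der
    return "".join(seq), skipped

# Toy reference (NOT the real RSRS) and a made-up mutation list.
toy_reference = "GATCACAGGT"
sequence, skipped = apply_mutations(toy_reference, ["A2G", "C4T", "315.1C", "G5A"])
print(sequence, skipped)
```

Checking the ancestral base against the reference is a cheap sanity test: a mismatch usually means the mutation list was written against a different reference sequence.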

22.9.2016 11:30

Added Fst and genome data. Note that the genome data was reconstructed with minimal manual effort, and the original kit ID numbers have been replaced by surrogates!

Fst-data download here
Genome data download here