Tuesday, 28 November 2017

Historical admixture results

I am running historical admixture analyses with the Globetrotter software. New results will be added to this same blog entry.

Western Volga-Finnic (mainly Mordvas)

time:  36 generations, near 1000 years ago

East_Slavic    0,4867536482
Baltic    0,2887079451
Central_European    0,0570663623
Mongola    0,051551692
East_Volga_Finnic_Chuvash    0,04939485
Mansi_Khanty    0,0270296952
East_Baltic_Finnic    0,0169141037
South_European    0,0098787291
Central_Siberian    0,0073332031
North_Siberian    0,0023436624
North_Baltic-Finnic_G4    0,0015424098
Saami    0,0014836991

East_Volga_Finnic_Chuvash stands for Maris, Chuvashes and Udmurts, East_Baltic_Finnic for Karelians and Vepsians, and North_Baltic-Finnic_G4 for Ingrians.
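The dates in this entry are given both in generations and in approximate years. The conversion can be sketched as below. The generation length of 28 years is an assumption made for this illustration (it roughly reproduces "36 generations, near 1000 years"), not a value reported by Globetrotter, and the function name is hypothetical.

```python
# Assumed generation length in years; an illustration value, not Globetrotter output.
YEARS_PER_GENERATION = 28.0


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation


if __name__ == "__main__":
    gens = 36  # the Western Volga-Finnic date quoted above
    print(f"{gens} generations ~ {generations_to_years(gens):.0f} years ago")
```

A different generation length (for example 25 or 30 years) shifts every date proportionally, so the year figures should be read as rough estimates.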


West_European    0,6080567694
Baltic    0,203714483
North_Baltic_Finnic_G3    0,092400554
North_Baltic_Finnic_G1    0,0431817267
East_Slavic    0,0387924755
Saami    0,0096464481
North_Siberian    0,0042075434

Globetrotter was not able to infer the admixture date. Groups North_Baltic_Finnic_G1 and North_Baltic_Finnic_G3 are both Finnish. I modified the references used in the previous blog entry: South_Baltic_Finnic is now split into Baltic and Finnish portions.


time: 56 generations, around 1500 years ago

East_Baltic_Finnic    0,2279573629
North_Baltic_Finnic_G1    0,1883732327
North_Baltic_Finnic_G3    0,134836117
Basque    0,1242665217
North_Siberian    0,1043483876
West_European    0,0686019039
Central_Siberian    0,0590794466
North_Baltic-Finnic_G4    0,0513959273
Baltic    0,0303972753
East_Volga_Finnic_Chuvash    0,0107438251

Groups North_Baltic_Finnic_G1 and North_Baltic_Finnic_G3 are Finnish, North_Baltic_Finnic_G4 is Ingrian and East_Baltic_Finnic is Karelian+Vepsian. Basque ancestry was higher than among modern Saamis.

 update 30.11.17

After separating Estonians from the main Baltic group it is also possible to test Estonians. First, however, the Finnish results, with Estonians still grouped with Balts.

North Baltic-Finnic, group 1

time: 70,6 generations, near 2000 years ago

Baltic    0,7403595543
Saami    0,1586845132
Mansi_Khanty    0,0874546485
North_Siberian    0,0117447072
Central_Siberian    0,0017565769

Basically this was expected: the Baltic part (including Estonians) is the largest and the rest is composed of Euro-Siberian populations. What surprises me is the Mansi-Khanty part; I would have expected even more Saami or some minor Mari-Chuvash-like admixture. However, the magnitude is right: three quarters Baltic plus one quarter mixed Euro-Siberian. 2000 years sounds credible, as it is near the time of the migration of Baltic Finns to Finland.

North Baltic-Finnic, group 2

time: 43,8 generations,  around 1200 years ago

Baltic    0,6802833838
Scandinavian    0,1470929006
Saami    0,1149188786
Mansi_Khanty    0,0503062607
Central_Siberian    0,0038576149
North_Siberian    0,0035409614

Scandinavian admixture appears around 1200 years ago. It is impossible to name any exact Scandinavian migration or demographic event in Finland, but the first Swedish crusade can be ruled out, because it took place several centuries later. Compared with group 1, the Scandinavian share increases while the Baltic and Euro-Siberian shares decrease.

North Baltic-Finnic, group 3

time: 69 generations, around 1900 years ago

Baltic    0,7973987698
Saami    0,1346869263
Mansi_Khanty    0,0644377493
North_Siberian    0,0032512571
Central_Siberian    0,0002252975

Group 3 is very similar to group 1; both are Finnish, as is group 2.

North Baltic-Finnic, group 4 (Ingrians)

time: 106 generations, around 2900 years ago

Baltic    0,7901064987
Saami    0,1221914423
North_Siberian    0,0845129477
Central_Siberian    0,0031891113

Surprisingly, this doesn't look like the East Finnish migration that I expected. It reminds me of an extinct Baltic-Finnic population.

Eastern Baltic-Finnic (Karelians and Vepsians)

time: 43,2 generations, near 1200 years ago

Baltic    0,8118374791
Saami    0,099396225
East_Volga_Finnic_Chuvash    0,0295501283
Mansi_Khanty    0,0263499058
Central_Siberian    0,0214967778
North_Siberian    0,0113694841

A date near 1200 years ago matches the formation of the East Baltic-Finnic people.

Estonians with Finnish references

time unclear

Baltic    0,3959754479
East_Slavic    0,3691411424
North_Baltic_Finnic_G1    0,1308129559
West_European    0,0752899347
Basque    0,014077665
Saami    0,0090402024
Mansi_Khanty    0,0037994745
Central_Siberian    0,0018631772

Friday, 17 November 2017

Introductory Globetrotter analysis

Globetrotter is a fairly new program that can estimate both admixture proportions and admixture dates. The analysis is based on autosomal haplotype data produced by Chromopainter, version 2. My job queue was Plink, Shapeit, ChromopainterV2 and Globetrotter. The Plink-format data consisted of 399000 SNPs and 254 individuals from across the Eurasian continent. I would have liked to have more individuals, but I can use only publicly available data, which is always my restriction.

In the first phase I made a phylogenetic tree using Chromopainter and Finestructure. Chromopainter was run in two phases: first to define the necessary run parameters, and then to generate the tree figure and ancestral matrices. In the next step the individual samples were grouped according to the phylogenetic tree, and this grouping was passed to the Chromopainter runs that precede the Globetrotter analysis. So there was no handmade grouping; all definitions were made by the software.



The deep past can't be reconstructed correctly from present-day populations. Names like Finnish, Polish and Eastern_Baltic_Finnic didn't exist thousands of years ago, and all group names should be understood as representing something now unknown. Another imperfection is that some populations appear unmixed. For example Balts and Basques cannot be modeled by any present-day populations other than themselves, which is not helpful at all if we want to see ancient migrations. In those cases there are surely unknown ancient admixtures without present-day proxies, and for example Balts are modeled as East Slavs.


Khanty_Mansi    0.00669230541442569
Saami    0.0318001424720861
Scandinavian    0.0406288973530398
Eastern_Baltic_Finnic    0.372195068297064
South_Baltic_Finnic    0.547727866737746


Basque    0.00627519770432461
West_Europe    0.0203347268787166
Mongola    0.0285387511835476
Nganasan    0.0312587449978488
Irish_Scottish    0.0348178173049934
West_Siberia    0.0372717151545831
Khanty_Mansi    0.058915141944102
Eastern_Volga_Finnic_Chuvash    0.108026814008228
Eastern_Baltic_Finnic    0.120473106582914
Finnish    0.554087984240742


Basque    0.0168485068643567
Southwest_European    0.085582697894791
West_Europe    0.897568795240852


Saami    0.00341643675195708
Nganasan    0.00501267346037914
RushanVanch_Tajikistan    0.00854395216066372
West_Siberia    0.0142527316035022
Irish_Scottish    0.0232322099579794
South_Baltic_Finnic    0.0375544108580537
Baltic    0.0616973387576695
Mongola    0.102871971437878
East_Slavic    0.119214847082816
South_European    0.12153228301688
Eastern_Volga_Finnic_Chuvash    0.184584853664197
Western_Volga_Finnic    0.317983363220266


RushanVanch_Tajikistan    0.00496570412383918
Western_Volga_Finnic    0.00973524498071095
Saami    0.0209045306129885
Scandinavian    0.0314001203298436
Mongola    0.0468217500718522
West_Europe    0.0512201282635752
Eastern_Volga_Finnic_Chuvash    0.322872736453914
West_Siberia    0.510808280436572


Baltic    0.0039132358786016
Saami    0.0040658994834093
South_Baltic_Finnic    0.331326055831535
West_Europe    0.660694808806454

Western Volga-Finnic

West_Siberia    0.001211877430627
Basque    0.00153548108955792
Mongola    0.00217721497484364
Irish_Scottish    0.00441065912271718
Nganasan    0.00489486172279255
Saami    0.00654435392074552
South_Baltic_Finnic    0.00873613821602628
Khanty_Mansi    0.011149101703621
Eastern_Volga_Finnic_Chuvash    0.0443170163661487
Tatar    0.170123511582196
East_Slavic    0.744899783870725


Saami    0.00434883756492699
East_Slavic    0.995651162435073

East Slavs

Western_Volga_Finnic    0.0180131830693537
Mediterranean-East    0.0941857330298036
Central_Europe    0.170383068761459
Baltic    0.717418015139384


Southwest_European 1

South Baltic-Finnic

Saami    0.0015403209540466
Basque    0.00277573750507665
Irish_Scottish    0.00668755305870894
Southwest_European    0.0131644799855559
Eastern_Volga_Finnic_Chuvash    0.0132143162745874
Eastern_Baltic_Finnic    0.0231748074310308
East_Slavic    0.152341012943326
Baltic    0.168097194491377
Scandinavian    0.203236330851964
Finnish    0.415614563156978

East Baltic-Finnic

Nganasan    0.00749839275609302
Khanty_Mansi    0.0101318772456883
Saami    0.0189334364744419
Eastern_Volga_Finnic_Chuvash    0.0341857812007321
Western_Volga_Finnic    0.0445466151938662
Baltic    0.259991450633669
Finnish    0.624712446495509

Finnish admixture dates and proportions.  

date in generations:  69.2367424689291


Khanty_Mansi 0,0290405745
Nganasan 0,0343370651
Saami 0,0360340021
Russian_Pinega 0,0402546721
South_Baltic_Finnic 0,8603336861

The software that infers admixture dates is quite sophisticated and I am still learning how to use it. Until I know more about it I can't comment on the previous results; they are given "as is".

Sunday, 29 October 2017

Tollense Valley Bronze Age battle field, standardized PCA-results

Using the same standardized data we get the following PCA plot, which differs from plots made using only partly overlapping SNP sets. I don't see any reason to use Mediterranean samples, because of the small SNP number of some samples. In general, most ancient samples fall between Germans and Poles. We also see that Finns, Russians, Poles and Norwegians show genetic drift. The most Polish ancient sample is WEZ56, and WEZ54 falls inside the British cluster. Samples WEZ39, WEZ40 and WEZ51 fall somewhat closer to Finns while still being Central European. WEZ56 is the most Polish sample in the original study's graphics too.

Saturday, 28 October 2017

Tollense Valley Bronze Age battle field, standardized F3-results

In my experience the f3 analysis (and dstat) generates errors due to differences in SNP numbers between individuals. Because the SNP number can vary depending on the sample source, I removed all "bad" SNPs from the study data and added Finnish samples, making them equal to the other sample groups. In fact the average SNP number in individual tests didn't change much from the original situation, with the exception of the samples gathered from the 1000genomes project. After this operation the SNP number in each test between ancient and modern samples was almost constant.
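The idea of checking SNP numbers per test can be illustrated with a minimal Python sketch. It assumes EIGENSTRAT-style genotype rows (one string per SNP, one character per individual, with 9 meaning a missing call) and counts the SNPs that are called both in one ancient sample and in every individual of a modern group. The function name and the toy data are hypothetical; this is an illustration, not the filtering actually used for these results.

```python
MISSING = "9"  # EIGENSTRAT genotype code for a missing call


def shared_snp_count(geno_rows, ancient_idx, modern_idxs):
    """Count SNPs called in the ancient sample and in all listed modern samples."""
    count = 0
    for row in geno_rows:
        if row[ancient_idx] == MISSING:
            continue
        if all(row[i] != MISSING for i in modern_idxs):
            count += 1
    return count


# Toy data: 4 SNPs (rows) x 3 individuals (columns); column 0 is the ancient sample.
rows = ["012", "921", "290", "110"]
print(shared_snp_count(rows, 0, [1, 2]))
```

Comparing such counts across the modern groups shows whether each test for a given ancient sample rests on roughly the same number of SNPs.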

Average SNP numbers per ancient sample; the difference within one test group is a few hundred at most

WEZ15    56606
WEZ16    5453
WEZ24    13749
WEZ35-2    28766
WEZ39    9343
WEZ40    21711
WEZ48    6392
WEZ51    15313
WEZ53    14758
WEZ54    29468
WEZ56    28034
WEZ57    34161
WEZ58    21152
WEZ59    28256
WEZ61    34657
WEZ63    11920
WEZ64-1    26150
WEZ71    15698
WEZ74    9891
WEZ77    15721
WEZ83    14999

Results of f3 tests using Mbuti as an outgroup:

All data, with the exception of the Finnish samples (1000genomes), are from the study

Addressing Challenges of Ancient DNA Sequence Data Obtained with Next Generation Methods


Thursday, 21 September 2017

Estimates of ancient mixture proportions in present-day Europeans / software

The first Linux script makes it possible to generate 23andMe-format files from EIGENSTRAT data, and the second one estimates ancient mixture proportions of European people. Download the full package (without the Python interpreter) here. Instructions are included in the README file.

Saturday, 16 September 2017

European ancient admixtures

This admixture test was done using the latest data from the Reich Laboratory and Dna.Land's Admixture program. Instead of following the usual rule and using only the mixture triangle Steppe-Farmer-HG, I also included the Central European Bronze Age to ensure the best coverage of European ancestry. The European Bronze Age was itself built up from all three of those components, so the results for the three roots will be somewhat lower than in tests without the European Bronze Age. This test works best for Europeans and has obvious blind spots in Africa, North America, the Near East and Central Asia. I also included some West Asian samples in the interest of seeing the Iranian and Armenian Neolithic proportions.

All samples were picked randomly, one sample per population, not as population averages. No consensus was run. Both averaging and consensus calculation would reduce the fluctuation in minor components, such as between Siberian and East Asian. The data was converted from the EIGENSTRAT format I usually use in my tests, with the exception, of course, of the project members' data. For that purpose I made a small Bash script which converts EIGENSTRAT to 23andMe format. The data and Dna.Land's Admixture program will soon be available here, and if anyone finds the EIGENSTRAT-to-23andMe conversion useful I'll also include it in the downloadable library.
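For readers who want to see what such a conversion involves, here is a minimal Python sketch. It is not the Bash script mentioned above. It assumes the usual EIGENSTRAT layout: a .snp file with the columns SNP id, chromosome, genetic position, physical position, reference allele and variant allele, and a .geno file with one row per SNP and one character per individual, giving the number of reference-allele copies (9 = missing). The function name and toy data are hypothetical, and chromosome codes are passed through unchanged.

```python
def eigenstrat_to_23andme(snp_lines, geno_lines, sample_idx):
    """Yield 23andMe-style lines: rsid, chromosome, position, genotype (tab-separated)."""
    for snp_line, geno_line in zip(snp_lines, geno_lines):
        rsid, chrom, _genpos, pos, ref, alt = snp_line.split()[:6]
        code = geno_line.strip()[sample_idx]
        # The code is the number of reference-allele copies; anything else is a no-call.
        genotype = {"2": ref + ref, "1": ref + alt, "0": alt + alt}.get(code, "--")
        yield "\t".join([rsid, chrom, pos, genotype])


# Toy input: two SNPs, two individuals.
snp = ["rs1 1 0.0 1000 A G", "rs2 2 0.0 2000 C T"]
geno = ["21", "09"]
for line in eigenstrat_to_23andme(snp, geno, sample_idx=0):
    print(line)
```

The output for one individual can then be written to a file and gzip-compressed, which is the input form the admixture script expects.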

Sunday, 13 August 2017

Project admixture analyses, revised

This time I used more SNPs with the method coded by the Dna.Land authors. It is now also possible to download all the necessary tools for DIY use. They work only on Linux and need Python to be installed. Here is a guide on how to install Python on Ubuntu.

Some comments to understand more about results:

- After a lot of testing I found that the Swedish sample set published in the study "No Evidence from Genome-Wide Data of a Khazar Origin for the Ashkenazi Jews" (Behar et al.) doesn't fit well with my Swedish project samples, all of which show more Northwest and Central European ancestry than that Swedish reference. This happens even when their self-declarations suggest some Finnish admixture. Therefore I decided to label the reference samples as East_Scandinavians, which seemed to be correct. I wonder where they are geographically from.

- The Saami reference samples (unfortunately too few were available, which increases the statistical error) cannot be considered a source of Siberian ancestry. Here they represent a much more diverse genetic history. The small Siberian admixture usually seen in Finnic results is built into the Finnish results, because the present-day Siberian-like ancestry among Finnic people is old and distinct and doesn't match present-day Siberians when Finnic reference samples are used at the same time.

The summarizing tree:


Finnic 54.6
East_Scandinavian 25.5
Saami 8.0
Northeast_European 5.7
Slavic 2.2
Northwest_European 1.0
Central_European 1.3
AMBIG_European 1.7

Finnic 52.7
Northwest_European 31.3
East_Scandinavian 6.1
Saami 3.8
Northeast_European 2.3
Slavic 1.9
Central_European 1.4

Finnic 72.9
East_Scandinavian 16.3
Saami 3.6
Baltic 3.4
Northwest_European 1.6
Northeast_European 1.7

Finnic 49.1
Northwest_European 25.1
East_Scandinavian 12.9
Northeast_European 5.1
Slavic 2.4
Saami 2.9
Baltic 2.3

Finnic 80.0
East_Scandinavian 13.4
Saami 3.4
Central_European 2.5

Finnic 54.2
East_Scandinavian 27.2
Baltic 9.3
Northwest_European 2.3
Saami 1.8
Northeast_European 1.7
Mediterranean 1.1
Central_European 2.0

Finnic 97.8
AMBIG_European 1.7

Finnic 95.1
East_Scandinavian 2.1
Baltic 1.7
AMBIG_European 1.1

Finnic 85.7
East_Scandinavian 11.9
Baltic 2.1

Finnic 64.0
Saami 31.5
Siberian 2.5
Uralic 1.0

Finnic 92.7
East_Scandinavian 5.2
Saami 2.1

Finnic 83.6
East_Scandinavian 15.2
AMBIG_European 1.1

Finnic 77.7
Baltic 16.8
East_Scandinavian 2.9
AMBIG_European 2.1

Finnic 97.8
AMBIG_European 1.7

Finnic 73.6
East_Scandinavian 14.6
Northwest_European 5.1
Central_European 5.4
Saami 1.0

Finnic 67.8
East_Scandinavian 14.8
Central_European 7.1
Slavic 5.1
Saami 1.4
Mediterranean 1.6
AMBIG_East_European 1.1

Finnic 82.0
East_Scandinavian 12.9
Saami 2.9
Baltic 1.5

Finnic 73.6
East_Scandinavian 14.3
Saami 6.9
Northwest_European 3.8
Slavic 1.0

Finnic 75.8
East_Scandinavian 17.3
Saami 4.6
Slavic 1.8

Finnic 94.0
Saami 2.1
AMBIG_European 2.0
Baltic 1.0

Finnic 94.5
Saami 1.3
Baltic 1.9
AMBIG_European 1.2
AMBIG_East_European 1.1

Finnic 68.8
East_Scandinavian 20.0
Saami 3.6
Northwest_European 3.3
Slavic 2.4
Central_European 1.7

Northwest_European 40.3
East_Scandinavian 22.4
Central_European 15.9
Finnic 9.0
Slavic 4.2
Baltic 3.9
Saami 1.6
Mediterranean 1.7
AMBIG_European 1.0

Northwest_European 52.1
East_Scandinavian 20.5
Finnic 14.8
Slavic 4.1
Central_European 4.3
Baltic 3.9

Northwest_European 59.5
East_Scandinavian 27.3
Central_European 5.4
Baltic 5.8
Saami 1.8

Northwest_European 38.1
East_Scandinavian 32.9
Finnic 11.5
Baltic 9.9
Northeast_European 3.4
Uralic 1.6
Central_European 1.9

Northwest_European 40.8
Finnic 20.7
Northeast_European 13.1
Central_European 9.0
East_Scandinavian 8.4
Slavic 4.0
Saami 2.1
Baltic 1.9

Northwest_European 45.8
East_Scandinavian 31.9
Finnic 11.5
Mediterranean 6.2
Slavic 2.2
Northeast_European 2.1

Although my primary goal was to find Finnic and Scandinavian admixtures, this obviously works fine for almost all Europeans, at least to some extent.

Other samples for verification purposes:

Irish sample
Northwest_European 90.0
East_Scandinavian 8.7
AMBIG_European 1.3

Western Polish sample
Slavic 49.8
Baltic 18.3
Central_European 14.4
Northwest_European 6.5
Northeast_European 3.5
East_Scandinavian 4.0
Mediterranean 2.3
Uralic 1.1

Sardinian sample
Mediterranean 93.2
Northwest_European 4.5
East_Scandinavian 1.3

Baltic sample
Baltic 70.6
East_Scandinavian 12.3
Slavic 7.9
Northeast_European 6.3
Central_European 2.6

Lithuanian/Yotvingian sample
Baltic 49.0
Slavic 37.5
Central_European 5.8
Mediterranean 4.1
Northeast_European 1.8
AMBIG_European 1.7

Estonian sample
Finnic 41.4
Baltic 19.6
Slavic 16.8
Central_European 9.7
East_Scandinavian 7.8
Saami 2.3
Northeast_European 2.3

Genomes Unzipped sample
Mediterranean 45.7
Northwest_European 19.2
Central_European 19.9
East_Scandinavian 12.3
Slavic 1.9

Genomes Unzipped sample
Mediterranean 37.4
Northwest_European 37.0
East_Scandinavian 15.5
Central_European 9.3

The admixture sums don't reach a full 100% because all components below 1% are ignored.
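The effect of the 1% cut-off can be shown with a small Python sketch. The function name is hypothetical, and the two components below 1% in the toy input are invented for illustration only; the two visible components match one of the results listed above.

```python
def report_components(proportions, threshold=1.0):
    """Keep components at or above the threshold (in percent), largest first."""
    kept = {name: share for name, share in proportions.items() if share >= threshold}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


# Saami and Baltic here are made-up sub-threshold values, not real results.
full = {"Finnic": 97.8, "AMBIG_European": 1.7, "Saami": 0.3, "Baltic": 0.2}
shown = report_components(full)
for name, share in shown.items():
    print(name, share)
print("sum of shown components:", round(sum(shown.values()), 1))
```

The shown components add up to less than 100 exactly by the total of the hidden ones.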

Program downloading and running

Download the programs here. Unzip them and place all programs in the same directory. To run the test, use the command line "bash ./ajo1.sh <sample-id>", where sample-id is the name of the file holding your genetic data in 23andme format. The sample file must be gzip-compressed and named <sample-id>.gz, but on the command line you give only the sample id, without the .gz extension. The test works with the following genome builds: HG18, HG19, GRCh36, GRCh37. If your genome file is in the FtDna format, you have to convert it to the 23andme style first. On Linux this is easily done with four command-line entries:

First unzip your genome file, and then run:

cp <original filename> <sample-id>
sed -i 's/\"//g' <sample-id>
sed -i 's/,/\t/g' <sample-id>
gzip <sample-id> 

If your data is already in the 23andme format but not compressed with the gz file extension, then you need to unzip it first and run only the first and fourth commands above.
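Since Python is needed for the test anyway, the same FtDna conversion can also be sketched in Python. This is only an equivalent of the four commands above under the same assumptions (quotes removed, commas turned into tabs, result gzip-compressed as <sample-id>.gz). The function name is hypothetical and it is not part of the downloadable package.

```python
import gzip


def ftdna_to_23andme_gz(src_path, sample_id):
    """Strip double quotes, turn commas into tabs, and write <sample_id>.gz."""
    out_path = sample_id + ".gz"
    with open(src_path, "r", encoding="utf-8") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        for line in src:
            # Same effect as the two sed commands: drop quotes, commas become tabs.
            dst.write(line.replace('"', "").replace(",", "\t"))
    return out_path
```

Calling `ftdna_to_23andme_gz("<original filename>", "<sample-id>")` on the unzipped FtDna file produces the compressed file, after which the test is run with the sample id alone.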

edit date 14.8.17 time 17:30

Another Estonian result. I can only say that it is plausible considering the history.

Baltic 37.2
Slavic 29.6
Finnic 22.8
East_Scandinavian 8.1
Saami 1.5

edit 15.8.17 time 17:45

A British result. It looks like the Irish one with more Mediterranean and minor Central European admixture.

Northwest_European 81.6
Mediterranean 10.3
Baltic 3.4
Central_European 2.9
AMBIG_European 1.8

Tuesday, 27 June 2017

Estonian Corded Ware enigma

The following simple dstat figure shows the mystery of the Estonian Corded Ware samples released this spring. There can't be any population continuity from them to present-day Balts, including Estonians. All hunter-gatherer samples, thousands of years older, are overwhelmingly closer to present-day Balts. The change in HG ancestry can be seen in Western and Central Europe, where there is a clear-cut decrease of HG ancestry, obviously caused by increasing real Corded Ware and Bell Beaker ancestries. We have to compare pure Neolithic populations against the Estonian CW samples to reach parity in the Baltic area. There is a tiny piece of evidence for the continuity in question: Finns are closer to German BA samples than Balts, hinting that there could be some subtle continuity.

Saturday, 24 June 2017

Yamnaya and Bell Beaker drift and ratio in present-day Europe

The following statistics give an insight into how Bronze Age Steppe ancestry transforms into a modern Northwest European genetic model, and give an idea of the differences seen in Europe today. I made three tests:

f3(Yamnaya Samara, X: Ju_hoan_North)
f3(Bell Beaker Germany, X: Ju_hoan_North)
dstat(Bell Beaker Germany, Yamnaya Samara: X, Ju_hoan_North)

All results are based on around 450000 SNPs.

The results of the f3 statistics were standardized to a common value of 1, and the dstat results were separately standardized to a value of 1. The results show that Yamnaya-type ancestry is still significant in Eastern Europe, and that the dividing line between Yamnaya and Bell Beaker runs from Western Finland to Lithuania and Belarus. European farmer or Middle Eastern ancestry becomes dominant in Southern Europe, leading to decreasing Bell Beaker ancestry in absolute terms.
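One simple way to bring a set of statistics to a common value of 1 is to divide each by the largest value in the set. The sketch below shows this under the assumption that "standardized to 1" means scaling by the maximum; the exact scaling used for the figures is not spelled out here. The function name and the toy values are hypothetical, and the approach presumes positive values, as with outgroup f3.

```python
def scale_to_max(values):
    """Rescale a dict of statistics so that the largest value becomes 1."""
    top = max(values.values())
    return {name: value / top for name, value in values.items()}


# Invented f3-like values for three placeholder populations.
toy_f3 = {"PopA": 0.20, "PopB": 0.25, "PopC": 0.10}
for name, value in scale_to_max(toy_f3).items():
    print(name, round(value, 3))
```

Scaling the f3 and dstat series separately, as described above, keeps each series on its own 0 to 1 range so that they can be drawn on one figure.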

Saturday, 17 June 2017

Estonian Corded Ware was not Corded Ware

Despite the shared chronology, the Corded Ware in Estonia was genetically a historical misstep, if we believe the dstat statistics using samples of the German Corded Ware and Bell Beaker cultures. All Northern Europeans are closer to the German than to the Estonian samples.

Thursday, 15 June 2017

Shared drift with ancient Latvian, Estonian and British samples

Briefly: the shared drift of the Latvian samples from Jones et al. I have rebuilt all samples using BAM files taken straight from the study and a new genotyping algorithm designed for ancient samples.

Friday, 9 June 2017

British Viking Age samples placed on the genetic map

I recently got new samples from quite a new study, link here. It looks more like a technical test than actual sampling for the purpose of studying history, but I sampled the data anyway. So far I have eleven Viking Age samples from the UK available and have now tested them. The data consist of around ten samples from each population, with the exception of Swedes. Only two Swedish samples were available for my mega-SNP database, both from the study "Genomic analyses inform on migration events during the peopling of Eurasia" (Luca Pagani 2016). The first one was from Nyköping; the second had no declared location.

The PCA lacks a few Viking Age samples whose quality was too poor, so they were dropped by the outlier check. The British Viking Age samples look German, but I should point out that a PCA is based on selected components rather than on genetic similarity across the whole genome. Let's see how those samples look in a formal analysis. I have made several tests to give different views, because populations don't fall on a one- or two-dimensional axis in such tests.

We see that in formal tests Swedes are closest to the British Viking Age samples, followed by Irish and Scottish samples. One straightforward conclusion could be that those Viking Age people were mixed Scandinavians and Celts/Britons. One odd remark: on the PCA the Swedish samples are biased towards Finns, and the Norwegian samples show less of this kind of similarity. Still, the Swedish samples are closer to those Viking Age samples from the UK. I have not tested this curiosity using formal analyses, but as far as I know it will hold in all tests.

edit 11.6.2017  12:50

German samples are from Leipzig.

Thursday, 25 May 2017

PCA grouping of N1c1-haplogroup

Earlier I used TMRCA (time to the most recent common ancestor) calculation in making PCA analyses of YDNA clades; the analysis is here. Now I use the same method for grouping haplogroup N1c1. The data was gathered from FamilyTreeDna's open project. TMRCA calculation gives only estimates, but the result makes more sense because every cell in the TMRCA data is compared to every other cell. I used 67 markers to get the largest possible data set. Only a few Ftdna kits have fewer markers.

Download original picture here

This time I had only a few Altaic and Ugric samples. More of those samples would make it possible to see the distance between the Altaic/Ugric and European groups. The result indicates three European groups: Baltic, Chuds and Finnish. Actually West Chuds are also Finnish, but as far as I know the group is prehistorically shared with Estonians. The most distinct group is the Finnish one, implying a local origin, despite its random distribution in Northern Scandinavia and Russia.

Download original picture here

The next picture shows what happens after removing the Finnish clades (regardless of location). West and East Chuds cluster together and North Balts come close on the y-axis. West, East and Central Balts cluster again. The root group includes all samples not belonging to any named clade, but doesn't indicate any specific branch.

Download original picture here

After also removing all Chuds the picture shows more details. We see that North Balts and Rurikids cluster together (with one sample classified as Fennoscandinavian) and all Balts make another cluster.

Thursday, 6 April 2017

Estonian Comb Ceramic and Corded Ware cultures inherited to us

Thanks to the new study "Extensive farming in Estonia started through a sex-biased migration from the Steppe" I now have great new samples from Estonia, dated to 4,500 to 6,300 years before present and representing the local Comb Ceramic and Corded Ware cultures. I have made dstat analyses showing the relative presence of those cultures among present-day populations. The data consisted of 11 million SNPs to ensure reasonable coverage between ancient and present-day samples.

Tuesday, 14 March 2017

Haplotype sharing analysis, part two: Asian connections in Europe

Chromopainter is a program that groups phased data into so-called chunks. The chunks it creates are a practical implementation of haplotypes. Usually Chromopainter is used with Finestructure or Globetrotter. Finestructure reads a coancestry matrix of individuals created by Chromopainter, which is not the best way to analyze shared chunks between populations, because it doesn't allow you to assign a coancestry connection between populations. At least I didn't find a way to do it. Chromopainter does this perfectly, and it leaves the option of using other software to analyze the results. You can assign donors and recipients at the population level. This of course doesn't mean that the chunk flow goes from donor to recipient, because that is only my definition, but it defines perfectly what is common between population pairs. Nor does it tell us admixtures: for example, the sharing between population x and Saamis tells only how many chunks they have in common, not how much of the shared chunks are common with putative Siberians, if those Siberians even exist today.

Unfortunately my data is rather limited: some populations are well represented, others consist of only a few samples. In future I will probably do more similar tests and try to improve the data. For now I consider this step a showcase of a new method.


edit 14.3.17 17:30

It looks like this works, and it is time to play with real data. The following small test shows how German, Icelandic and Polish haplotype references clearly sort out German and Balto-Slavic speakers, implying higher resolution than genotype data.

Wednesday, 8 March 2017

Haplotype sharing analysis, part one: Europe

The following analysis was done using Shapeit, Chromopainter and Finestructure. Shapeit phasing was aided by the 1000genomes V3 phasing reference. The Finestructure report was run using the chunk counts generated by Chromopainter. Before running Finestructure, the chunk counts file was modified to avoid "chunk leak" from populations with a low effective population size. I had tested this problem earlier and found that small populations that are oversampled relative to their effective population size give erroneous results due to "chunk leak" towards other populations. Both Shapeit and Chromopainter use a fixed effective population size over all populations. The remedy was to standardize the intrapopulation chunk sharing to the average of all intrapopulation sharings.
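The remedy can be sketched in Python as follows. This is a simplified population-level illustration: the real chunk counts file is a matrix of individuals, and the function name and toy numbers here are hypothetical. Each within-population value (the diagonal) is replaced by the mean of all within-population values, while sharing between different populations is left untouched.

```python
def standardize_diagonal(chunk_counts):
    """Return a copy in which every within-population value is the mean of all of them."""
    pops = list(chunk_counts)
    mean_within = sum(chunk_counts[p][p] for p in pops) / len(pops)
    return {
        donor: {
            recipient: (mean_within if recipient == donor else value)
            for recipient, value in row.items()
        }
        for donor, row in chunk_counts.items()
    }


# Toy matrix: population "A" shares far more chunks with itself than "B" does.
toy = {"A": {"A": 300.0, "B": 50.0}, "B": {"A": 60.0, "B": 100.0}}
print(standardize_diagonal(toy))
```

After this step a small, heavily sampled population no longer stands out only because of its high internal sharing.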

The Finestructure results also showed another weakness: it is not able to handle big genetic distances in a way that gives readable graphic results. For that reason I left East Uralic populations and Saamis out of this test. I'll come back to them later.

Test conditions

- 10 randomly selected samples per population
- includes only the first chromosome
- around 40000 SNPs

Russians are from Kargopol.

Here is a link to the original gif-file, click here.


I have also tested a new program developed by Estonian researchers, called MixFit. MixFit is a small program that searches for best fits using Chromopainter output. Its shortcoming is that it makes the fit for only three admixture sources. I tested the FinnMostCW group using the same Chromopainter output as in my previous test (plus Saamis and East Uralics), and by running several samples and calculating the average distributions I obtained more than three admixture sources.

FinnLocal 0,3764146
West European 0,2435327
Estonian 0,2265092
Baltic 0,07271973
East FU / Saami 0,0841223
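The averaging step can be illustrated with a short Python sketch. It assumes that each sample's fit is a mapping from source name to proportion with at most three sources, and that a source missing from a fit counts as zero for that sample. The function name and the toy fits are hypothetical and are not MixFit output.

```python
from collections import defaultdict


def average_fits(fits):
    """Average per-sample admixture fits; a source absent from a fit counts as 0."""
    totals = defaultdict(float)
    for fit in fits:
        for source, share in fit.items():
            totals[source] += share
    return {source: total / len(fits) for source, total in totals.items()}


# Two invented three-source fits that together involve four sources.
toy_fits = [
    {"A": 0.5, "B": 0.3, "C": 0.2},
    {"A": 0.4, "B": 0.4, "D": 0.2},
]
print(average_fits(toy_fits))
```

Because different samples can pick different third sources, the average can contain more than three sources while still summing to 1.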

Friday, 3 March 2017

A short view: Were Scythians behind the Asian admixture in the European side of Russia?

As I proved earlier, the Siberian admixture among Baltic Finns didn't come from the East with them; it was already in Finland at the time when the Baltic-Finnic people reached Fennoscandinavia. My statistics showed that rare alleles found in Russia and Asia are in Finland at just the same level as in other European countries.

Looking closely at the Asian admixture in Russia, we can trace the rare allele source to the Altai region. How did Altaian admixture end up in Mordvins? Was it brought by Scythians or Mongols? I don't know, but the fact is that it is there.

Scythian sphere according to Wikipedia

For adjacent information about Mordva/Moksha see the supplementary figure 11.