Saturday, August 16, 2025

Yakutian LNBA, the urheimat of Uralic languages?

While reading the study "Ancient DNA reveals the prehistory of the Uralic and Yeniseian peoples", I noticed that, according to the f4 analysis it presents, the genetic distance from Yakutia LNBA (LNBA = Late Neolithic and Bronze Age) to the Tatars is smaller than the distance from Cisbaikal LNBA to the Tatars. I found this strange, because geographically the Tatars are significantly closer to the Baikal region. My second observation concerns the connection between languages and genetics. The study argues that the genetic closeness between Yakutia LNBA and the speakers of the Uralic languages shows that Yakutia LNBA was the original population of the Uralic languages. However, the study does not compare all language groups against the Yakutia LNBA samples. Instead, it uses as evidence the occurrence of a single Yakutia LNBA-type sample in the assumed area of the Uralic language group. The emphasis is placed on the Seima Turbino phenomenon.


Seima Turbino was a Bronze Age multi-ethnic trade channel extending from northwestern China to the regions of Finland and Estonia, and it has been thought to be connected to the westward spread of the Uralic languages. However, linguistic theory does not define the eastern homeland of the Uralic languages on the basis of Seima Turbino, and the evidence for the presence of Seima Turbino is purely archaeological. Whether or not this link with the Uralic languages is real, Seima Turbino was a multi-ethnic commercial and cultural phenomenon in which speakers of Uralic languages may have been involved, so the evidence of a few ancient genomes cannot be significant.


I decided to run a similar f4 test to see the distances of the Yakutia LNBA and Cisbaikal LNBA samples to present-day samples from the different language groups. All the samples used in the test are available on the Reichlab website. The only difference from the study's Yakutia LNBA set is that I left out one sample that the study classifies as Yakutia LNBA but that is supposed to have a Seima Turbino origin, so I kept only the local Yakutia LNBA samples. Leaving out this sample presumably does not affect the f4 results.
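For readers who want to see what the statistic measures, here is a minimal Python sketch of an f4 calculation from per-SNP allele frequencies. It is not the qpDstat code: the function names, the toy frequencies and the plain equal-sized leave-one-block-out jackknife are assumptions made for illustration, whereas ADMIXTOOLS uses a weighted block jackknife over genomic blocks. The population order f4(Outgroup, Test; Yakutia LNBA, Cisbaikal LNBA) is also only an assumed layout, not necessarily the one used in the study.

```python
from math import sqrt


def f4(a, b, c, d):
    """Average of (a - b) * (c - d) over SNPs, given per-SNP allele frequencies."""
    products = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]
    return sum(products) / len(products)


def f4_block_jackknife(a, b, c, d, n_blocks):
    """Return (f4, standard error, Z) from a plain leave-one-block-out jackknife.

    Simplification: blocks are equal runs of consecutive SNPs and are unweighted.
    """
    n = len(a)
    size = n // n_blocks
    estimate = f4(a, b, c, d)
    leave_one_out = []
    for i in range(n_blocks):
        start = i * size
        end = n if i == n_blocks - 1 else start + size
        keep = [j for j in range(n) if j < start or j >= end]
        subsets = [[freqs[j] for j in keep] for freqs in (a, b, c, d)]
        leave_one_out.append(f4(*subsets))
    mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((v - mean) ** 2 for v in leave_one_out)
    se = sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Made-up allele frequencies for four SNPs (NOT real data).
outgroup = [0.5, 0.5, 0.5, 0.5]
test_pop = [0.7, 0.3, 0.6, 0.4]
yakutia = [0.7, 0.3, 0.6, 0.4]    # toy stand-in for Yakutia LNBA
cisbaikal = [0.5, 0.5, 0.5, 0.5]  # toy stand-in for Cisbaikal LNBA

print(f4_block_jackknife(outgroup, test_pop, yakutia, cisbaikal, n_blocks=2))
```

With this ordering, a negative value means the test population shares more drift with the third population (here the Yakutia LNBA stand-in), and a positive value means it shares more with the fourth.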


The final version of the study has been published in Nature, but it is unfortunately behind a paywall there. The 2023 preprint can be found at https://www.biorxiv.org/content/10.1101/2023.10.01.560332v2.full#ref-89.


All Yakutia LNBA and Cisbaikal LNBA samples are ready-made genotype files from the public samples shared by Reichlab. The study may include more samples from these two LNBA-era populations. I have downloaded the BAM-format samples used in the study, but I will leave them for later use.

The southern Uralic Mordovians and the Mishar Tatars seem to be more like Cisbaikal LNBA than like Yakutia LNBA. I don't know why my run, which uses the Reichlab genotypes and the Reichlab program qpDstat, gives results that differ from the study's. Maybe my readers can run tests to confirm which one is correct.

My results:







Study results:



















Yakutia LNBA samples:
N4a1
N4b2
YAK021
YAK022
YAK024

Cisbaikal LNBA samples:
irk022
irk025
irk036
irk057
irk033
irk034
irk040
DA358
DA360
irk071
irk075
irk008
irk061
irk068
irk017
mak026
DA334
DA336
DA337
DA339
KAG002 
KPT001
KPT002
KPT003
KPT004
KPT006
STB001
ZPL001
ZPL002
I1526
I7335
I7759
I7779
I7780
I7782
I8296
DA343
DA353
DA356
DA361

Update 17.8.25 09.30. The samples used are listed in the study's "Supplementary Data 3".

Friday, August 1, 2025

Late Iron Age and historic Finnish samples

I then processed the Finnish samples that are less than 1000 years old to see how close they are to the average of modern Finns. The processing took about 11 days on an antique Intel i7-4770K processor (old, but still powerful in sequential processing), with a core load of 4 x 95% and a core temperature of 50 °C (the outdoor temperature was around 30 °C). The processor is overclocked to 4 GHz and equipped with heatpipe cooling.


The Fst table is attached below. According to it, these ancient samples are Finnish, excluding poor-quality samples and one outlier. The first number is Fst and the second is its margin of error. Estonian and Finnish samples are included for comparison. For some reason the Swedish Human Origins samples are of low quality, so they were excluded from this test.

AncFinns Albanian.HO 0.011 0.0007 22.774
AncFinns Basque.HO 0.016 0.0006 33.853
AncFinns Belarusian.HO 0.005 0.0007 16.687
AncFinns Bulgarian.HO 0.009 0.0007 21.411
AncFinns CEU.SG 0.008 0.0004 34.300
AncFinns Croatian.HO 0.008 0.0007 20.183
AncFinns Estonian.HO 0.004 0.0007 14.664
AncFinns FIN 0.0 0.0004 15.356
AncFinns GBR.SG 0.007 0.0004 32.802
AncFinns Hungarian.HO 0.006 0.0006 19.687
AncFinns Italian_North.HO 0.011 0.0006 27.213
AncFinns Karelian.HO 0.002 0.0006 13.159
AncFinns Lithuanian.HO 0.007 0.0007 19.564
AncFinns Mordovian.HO 0.007 0.0006 20.099
AncFinns Norwegian.HO 0.007 0.0007 19.322
AncFinns Orcadian.HO 0.009 0.0007 23.084
AncFinns Romanian.HO 0.01 0.0007 23.574
AncFinns Rusnorth.HO 0.004 0.0006 17.414

Estonian.HO Albanian.HO 0.008 0.0005 17.099
Estonian.HO Basque.HO 0.013 0.0003 42.936
Estonian.HO Belarusian.HO 0.002 0.0003 4.656
Estonian.HO Bulgarian.HO 0.006 0.0004 16.452
Estonian.HO CEU.SG 0.005 0.0003 17.522
Estonian.HO Croatian.HO 0.005 0.0003 14.002
Estonian.HO FIN 0.004 0.0003 14.716
Estonian.HO GBR.SG 0.005 0.0003 18.039
Estonian.HO Hungarian.HO 0.003 0.0003 11.087
Estonian.HO Italian_North.HO 0.009 0.0003 28.091
Estonian.HO Karelian.HO 0.005 0.0003 15.439
Estonian.HO Lithuanian.HO 0.002 0.0004 5.323
Estonian.HO Mordovian.HO 0.003 0.0002 12.244
Estonian.HO Norwegian.HO 0.005 0.0004 15.060
Estonian.HO Orcadian.HO 0.007 0.0003 21.023
Estonian.HO Romanian.HO 0.006 0.0004 17.358
Estonian.HO Rusnorth.HO 0.001 0.0002 4.661

FIN Albanian.HO 0.011 0.0004 26.164
FIN Basque.HO 0.016 0.0002 64.162
FIN Belarusian.HO 0.006 0.0003 20.520
FIN Bulgarian.HO 0.009 0.0003 29.894
FIN CEU.SG 0.008 0.0002 47.108
FIN Croatian.HO 0.008 0.0003 29.320
FIN Estonian.HO 0.004 0.0003 14.716
FIN GBR.SG 0.008 0.0002 47.075
FIN Hungarian.HO 0.006 0.0002 33.404
FIN Italian_North.HO 0.011 0.0002 49.400
FIN Karelian.HO 0.002 0.0002 9.903
FIN Lithuanian.HO 0.007 0.0003 26.003
FIN Mordovian.HO 0.006 0.0002 37.043
FIN Norwegian.HO 0.007 0.0003 24.096
FIN Orcadian.HO 0.009 0.0003 36.037
FIN Romanian.HO 0.01 0.0003 33.117
FIN Rusnorth.HO 0.004 0.0002 27.333
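
For reference, one common way to estimate Fst from allele frequencies is Hudson's estimator in the ratio-of-averages form, as described by Bhatia et al. (2013). The sketch below is only an illustration under that assumption. The text does not say which estimator or program produced the table above, and the function name and inputs here are hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst (ratio of averages) from per-SNP allele frequencies.

    p1, p2: allele frequencies per SNP in population 1 and population 2.
    n1, n2: number of sampled chromosomes in each population
            (2 x the number of diploid individuals).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        denominator += a * (1 - b) + b * (1 - a)
    return numerator / denominator


# Made-up frequencies for three SNPs (NOT real data).
pop_a = [0.10, 0.50, 0.80]
pop_b = [0.15, 0.45, 0.70]
print(hudson_fst(pop_a, pop_b, n1=40, n2=40))
```

Because of the sample-size correction, two populations with nearly identical frequencies can give a value at or slightly below zero, so a value of 0.0 in a table like the one above is not unusual for very close populations.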


A short explanation of why I consider such samples completely pointless for the study of Finnish prehistory: I am enthusiastic about studying Finnish migrations, so the young age of these samples is of course disappointing. They may still provide additional information about local development in Finland.

Archaeologists and linguists estimate that the Baltic Finns came to the Baltic Sea region about 3000 years ago and that the Finnish language arrived in Finland about 1700 years ago. With this timing, these samples cannot possibly show a development trajectory different from that of modern Finns in relation to ancient migrations. Determining the origin, which has been described as important, would require much older samples. If we take the arrival in Finland 1700 years ago as the starting point, the samples from the Merovingian Age would represent 8-20 generations in Finland and a corresponding time of mixing with the populations already living here. This corresponds to the time that has passed in Kuusamo, Finland, since Finnish settlers arrived there.
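
The generation count above can be checked with simple arithmetic. The Python sketch below assumes that the Merovingian Age in Finland spans roughly 550 to 800 CE and that a generation is about 25 years. Neither figure comes from this post, so treat the result as a rough consistency check only.

```python
POST_YEAR = 2025
ARRIVAL_YEARS_AGO = 1700           # from the text above
MEROVINGIAN_START_CE = 550         # assumption, not from the post
MEROVINGIAN_END_CE = 800           # assumption, not from the post
YEARS_PER_GENERATION = 25          # assumption, not from the post


def generations_since_arrival(sample_year_ce):
    """Generations between the assumed arrival and a sample's date."""
    arrival_ce = POST_YEAR - ARRIVAL_YEARS_AGO
    return (sample_year_ce - arrival_ce) / YEARS_PER_GENERATION


earliest = generations_since_arrival(MEROVINGIAN_START_CE)
latest = generations_since_arrival(MEROVINGIAN_END_CE)
print(earliest, latest)
```

Under these assumptions the range falls inside the 8-20 generations mentioned above. A shorter or longer generation length shifts both ends accordingly.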

Edit 1.8.2025 20.45. Polish, IBS, TSI, German and Saami results removed due to low quality (IBS, TSI and Saami due to problems with the DG sample set).