Wednesday, 26 February 2020

Projecting or imputing

Purpose:  to test whether imputing missing SNPs gives better results than using projection algorithms.  Projection is generally understood to approximate closely the result obtained from data without missing SNPs.

Object:  the Finnish-Saami relationship in the context of their local admixture and of Siberian admixture in general.  I assume that these results can be generalized, although my mathematical skills are not strong enough to tell in which situations projection would be better than imputation, or vice versa.

Background:  there is scientific evidence of contacts between the Iron Age Saami and Finnish populations, and today the two populations share part of their ancestry.

Method:  running PCA plots using imputed and full-coverage present-day Finnish samples, full-coverage present-day Saami samples and ancient Saami samples, and running comparative plots using SmartPCA projection and imputation.  The imputation was done with the Beagle software using a combined reference panel that includes samples from the SGDP and 1000 Genomes projects.  The SmartPCA projection setup was lsqproject YES and autoshrink YES.  Both projection parameters were set to NO in the test that isolates the effect of projection itself.

The basic plot, based on the full Human Origins SNP set (no missing SNPs).  Finns are from the 1000 Genomes project.

The plot using FamilyTreeDNA and 23andMe Finnish samples with coverage of around 30% of the HO data set.  No projection algorithms.  The location of the Finnish samples is determined by filling the missing SNPs with the average values from the PCA analysis.
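
Why filling missing SNPs with averages pulls samples towards the center of the plot can be shown with a toy calculation.  The sketch below is only an illustration under stated assumptions, not SmartPCA's actual implementation: it uses a single made-up PC axis, noiseless made-up genotype values, and the hypothetical helper names project_mean_fill and project_least_squares.  A missing SNP set to the mean contributes nothing to the coordinate, so with about 30% coverage the sample keeps only part of its true distance from the origin, whereas a least-squares fit on the observed SNPs recovers it in this idealized case.

```python
import math
import random

def project_mean_fill(genotypes, means, loading):
    """Coordinate on one PC axis, with missing genotypes (None) set to the SNP mean."""
    return sum(
        w * ((g if g is not None else m) - m)
        for g, m, w in zip(genotypes, means, loading)
    )

def project_least_squares(genotypes, means, loading):
    """Least-squares coordinate on one PC axis, fitted on the observed SNPs only."""
    observed = [(g - m, w) for g, m, w in zip(genotypes, means, loading) if g is not None]
    return sum(d * w for d, w in observed) / sum(w * w for _, w in observed)

random.seed(1)
n_snps = 2000

# Toy PC axis of unit length (an assumption; real loadings come from the reference PCA).
raw = [random.gauss(0.0, 1.0) for _ in range(n_snps)]
norm = math.sqrt(sum(w * w for w in raw))
loading = [w / norm for w in raw]
means = [random.uniform(0.2, 1.8) for _ in range(n_snps)]

# Noiseless toy sample whose true coordinate on the axis is 3.0.
true_coord = 3.0
full = [m + true_coord * w for m, w in zip(means, loading)]

# Keep roughly 30% of the SNPs, mark the rest as missing.
masked = [g if random.random() < 0.3 else None for g in full]

full_coord = project_mean_fill(full, means, loading)       # no missing data
shrunk_coord = project_mean_fill(masked, means, loading)   # pulled towards 0
lsq_coord = project_least_squares(masked, means, loading)  # fitted on observed SNPs

print(full_coord, shrunk_coord, lsq_coord)
```

Real genotype data are noisy and the PC axes are themselves estimated, so a least-squares projection will not behave as cleanly as in this toy case.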

Same as above, but now with projection algorithms (lsqproject and autoshrink).  The Finns have still lost their Saami-specific admixture and group with North Russians.

Same as above, but now the Finnish FamilyTreeDNA and 23andMe samples are imputed.  The Finns are back in almost the same place as the lossless 1000 Genomes samples in the first picture.

Present-day Saami replaced by ancient Saami (Levänluhta and Chalmny Varre).  Finnish and ancient Saami samples both projected.

Same as above, but now Finnish samples imputed.  Projection used.

Same as above, but both sample groups imputed. No projection.


Thursday, 23 January 2020

Baltic Sea Vikings and wandering Germanic groups

Following the idea presented in my previous post, I compared, in a broad view, the shared genetic drift between modern Europeans and the loosely defined Viking groups reported in my results.  Before any 3Pop analysis some preparation is needed: all SNP sets have to be standardized.  I can guarantee that if you feed data sets into the analysis at random you will get nothing but garbage.  Dstat also calls for extra attention, but 3Pop in particular is a pitfall if you have not done any pre-analysis.  Even though in my previous Dstat run I was able to use commercial Finnish groups without data standardization, the same would not work with 3Pop.  To get valid Finnish samples that represent Southwestern Finns and cover the Human Origins data set, I ran a 3Pop pre-analysis and selected the 1000 Genomes samples that best correspond to my genealogically confirmed project samples.

Oeselian samples:

X04 dated 420BC
V12 dated 215BC

Wednesday, 22 January 2020

Trans-Volga Forest Steppes CWC and Finns

This topic came to my mind after seeing my personal results from a new service.  Without judging what they do, I simply borrow some of their ideas based on my results.  I see three components that they propose for me:

- Trans-Volga Forest Steppes CWC

- Viking Sweden

- Medieval Oeselian Saaremaa


In addition, the analysis included two Iron Age Germanic-speaking groups, the Alemannic and Lombardic groups.  Their method searches for IBD similarity between customer data and ancient samples.  No process description is available, so I can't say much more about it.  My tests below show Dstat similarity for two Finnish groups, Southwest and East Finns, compared with present-day Europeans with respect to those ancient samples.  This time I used Finnish samples from my own project; all samples are confirmed by genealogical research and are not defined by any approximation using 1000 Genomes data.


Ancient data:







The 7th century Alemannic burial site at Niederstotzingen in southern Germany, used from 580 to 630 AD. 13 human skeletal remains, 10 adults and 3 infants. 10 adult samples used.


Szolad, Hungary. Graves dated to the middle of the sixth century.  Two samples SZ4 and SZ16.

Bolshoy Oleni Ostrov

Five samples dated to around 3500 BP, with high Siberian admixture and carrying the N-L392/N-L1026 Y-chromosome haplogroup.

For comparison Iberian Bell-Beaker samples.

In the following statistics, positive numbers stand for more similarity between the Finnish groups and the ancient samples, negative values for more similarity between other Europeans and the ancient samples.
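
The sign convention can be illustrated with a toy calculation.  This is a minimal sketch, assuming the statistic is arranged as D(Outgroup, Ancient; Other European, Finnish); the exact population order used in the tests is not stated here, and the allele frequencies below are invented for illustration.  The function uses the usual allele-frequency form of the D-statistic and leaves out the block jackknife that real tools use to attach a Z-score.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) computed from per-SNP allele frequencies (values in 0..1)."""
    numerator = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    denominator = sum(
        (a + b - 2 * a * b) * (c + d - 2 * c * d)
        for a, b, c, d in zip(w, x, y, z)
    )
    return numerator / denominator

# Invented toy frequencies for four SNPs (assumption, not real data).
outgroup = [0.0, 0.0, 0.0, 0.0]
ancient = [1.0, 1.0, 0.0, 1.0]
other_european = [0.0, 1.0, 0.0, 0.0]
finnish = [1.0, 1.0, 0.0, 1.0]

# The Finnish toy group shares more alleles with the ancient sample: positive value.
d_finnish_closer = d_statistic(outgroup, ancient, other_european, finnish)

# Swapping the last two populations flips the sign.
d_swapped = d_statistic(outgroup, ancient, finnish, other_european)

print(d_finnish_closer, d_swapped)
```

Under this assumed order a positive value means the ancient sample shares more alleles with the Finnish group, and a negative value means it shares more with the other European group.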

We see that Swedes and Latvians fit well with the Lombards, but the Alemannic people fit only with Latvians, the Latvians being the top match.  Trans-Volga CWC and Sigtuna fit well with Finns and Latvians, and best with Latvians.  In fact, Sigtuna IA is a bit closer to Southwest Finns than to the two Swedish Human Origins samples.  The Bolshoy Oleni Ostrov samples are strongly related to Finns, so much so that East Finns are closer to the BOO samples than Southwestern Finns are, despite the fact that in my results the Levänluhta samples are closer to Southwestern Finns.  The Iberian BB samples don't fit well with present-day Iberians (1000 Genomes IBS).


edit 23.1.20 15:00

Added Dstat analyses showing results for the Western Finnish Iron Age Saami (Levänluhta).  Both Finnish groups are top matches, followed by the Swedish and Latvian sample groups.







Saturday, 28 December 2019

Some ancestral changes in Iron Age Estonia

QpAdm was designed to detect admixture and also reports probability and standard error statistics.  Two kinds of parameters are given as input: admixture candidates and outgroup populations.  The quality of the result depends on both input groups, and they can be incomplete in many ways.  Outgroups should represent ancestral junctions of the tested populations, so that they reveal ancestral differences.  If multiple outgroups represent the same ancestral junction, they should give a coherent picture.  That is not a foregone conclusion, because every ancestral branch has its own history, which is why choosing outgroups is not easy.  Many other problems exist too.  If the tested groups include several ancestrally close populations, the standard errors will be large because of the genetic drift those groups share.  This means that qpAdm is better suited to testing very ancient admixtures that are distinctly detectable.  The following tests, made using Iron Age populations in the Baltic area, are only indicative.

Iron Age Estonians

                     Estonian_BA    Baltic_IA      Scania_IA
best coefficients:   0.334          0.613          0.053
Jackknife mean:      0.325213405    0.531536401    0.143250193
std. errors:         0.326          0.452          0.490

chisq: 5.922
tail prob: 0.655969

The result implies that the best available model calls for all three ancestral populations, one of them from Sweden, but separating them is difficult because of their largely shared genetic drift.

Sample 0LS10 (Kunda, Lääne-Viru. EST IA 770–430 BC H13a1a1a N3a3'5)

                     Estonian_IA    Saami
best coefficients:   0.847          0.153
Jackknife mean:      0.846059225    0.153940775
std. errors:         0.059          0.059

chisq: 4.147
tail prob: 0.901439

The standard error is much lower because there is less shared genetic drift.  In this case the remaining error is probably generated by the Finnish admixture in the present-day Saami samples.  The eastern admixture of 0LS10 is very purely eastern and shows barely any Finnish ancestry.  I would say that this is a question of proto-Saami ancestry...
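
The difference between the two models above can be made concrete by dividing each best coefficient by its standard error.  This is only a rough rule-of-thumb check that I add for illustration, not something qpAdm prints in this form, and the helper name z_scores is hypothetical.  A ratio well below about 2 means the coefficient cannot be told apart from zero.

```python
def z_scores(coefficients, std_errors):
    """Ratio of each admixture coefficient to its standard error."""
    return {name: coefficients[name] / std_errors[name] for name in coefficients}

# Values copied from the three-source model for Iron Age Estonians.
three_way = z_scores(
    {"Estonian_BA": 0.334, "Baltic_IA": 0.613, "Scania_IA": 0.053},
    {"Estonian_BA": 0.326, "Baltic_IA": 0.452, "Scania_IA": 0.490},
)

# Values copied from the two-source model for sample 0LS10.
two_way = z_scores(
    {"Estonian_IA": 0.847, "Saami": 0.153},
    {"Estonian_IA": 0.059, "Saami": 0.059},
)

for label, result in (("three-way", three_way), ("two-way", two_way)):
    for name, z in result.items():
        print(f"{label:10s} {name:12s} {z:6.2f}")
```

In the three-source model every ratio stays below 2, which matches the difficulty of separating the sources, while in the two-source model both ratios are above 2.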

Wednesday, 11 December 2019

New "Viking sword find" in North Estonia turned out to be of local origin?

I read about this find a month ago and my first thought was that yes, it has to be Finnish.  Why?  Simply because Finland is the nearest place where we have seen plenty of these swords, and connections between Finland and North Estonia were still tight at that time.  Now Estonian researchers also deny the Scandinavian origin.  One reason for this new idea was the find of a brooch with a typical Finnish-North Estonian shape.  Not much, but obviously more will be published sooner or later.  The so-called crawfish brooches in particular were distinctive in Finland, but I have to wait for more detailed pictures before saying more, because Estonian terminology can differ from what I have seen.  The new article states:

"Crossbow-shaped brooches were usually worn by warriors from southwestern Finland and northwestern Estonia on the passage that intersects with the main thoroughfare of the Eastern Route," Kiudsoo said.

I'll write more after more details are revealed, including pictures of those finds.

Saturday, 7 December 2019

Increasing western influence in Iron Age Estonia

Despite premature rumors of eastern influence in Iron Age Estonia, the change at the beginning of the Iron Age came from the west.  Saag et al. 2019 seemingly gives contradictory information, because the published samples tell another story.  I divided those samples into two groups according to their dating.

Estonian BA

Estonian_BA V9_2 1060–850 BC R1a1’2
Estonian_BA V14_2 1280–1050 BC R1a1’2
Estonian_BA V16_1 730–390 BC R1a1’2
Estonian_BA X11_1 1030–890 BC R1a
Estonian_BA X14_1 780–430 BC R1a1c
Estonian_BA X20_1 900–800 BC R1a

Estonian IA

Estonian_IA V11_1 390–200 BC -
Estonian_IA V12_1 360–40 BC N3a3a
Estonian_IA VII3_1 380–180 BC ?
Estonian_IA VIII5_2 75–300 AD R1a
Estonian_IA VIII7_1 75–200 AD -
Estonian_IA VIII8_1 75–200 AD R1a1c
Estonian_IA VIII9_1 75–200 AD -

I made the following PCA to meet the needs of those who are interested in Siberian admixture and want to see possible Siberian connections, in the context of the Baltic Finnic and Volga Finnic languages, in genetic terms, if that is even possible.  PCA plots are always made for a certain purpose, and every individual plot tells a certain true story, but never the full story.
Notice also that the so-called Levänluhta outlier found in Iron Age Ostrobothnia fits well with the Viking group, and that the Vikings don't match well with present-day Scandinavian and British people.

Update 9.12.2019 21:00

Added a PCA plot showing the ancient Estonian N1c1 samples.

0LS10 Kunda, Lääne-Viru, EST IA 770–430 BC M XY H13a1a1a N3a3'5
V12 Kurevere, Saare, EST IA 360–40 BC M XY I1a1c N3a3a
VII4 Vohma, Lääne-Viru, EST IA 760–400 BC M XY T1a1b N3a3a
IIa Karja, Saare, EST MA 1230–1300 AD M XY H3h1 N3a3a
IIf Otepää, Valga, EST MA 1360–1390 AD M XY T2b N3a3a
IIg Pada, Lääne-Viru, EST MA 1210–1230/1240 AD M XY U4a2b N3a3a

0LS10 drifts towards the Siberian-admixed groups and IIa drifts towards Finns and Saami.  Both deviations are, however, too weak for the samples to be considered real migrants, and their positions probably reflect only admixture.  All N1c1 samples are more comparable to the more western Iron Age Estonians than to the more Baltic-like Bronze Age Estonians.

Sunday, 17 November 2019

Mitochondrial lineages in Iron Age Fennoscandinavia, comments

The distribution between lineages U and H in Finland has been a point of discussion lately.  Many commentators have underlined that, in the big picture, H is connected to Neolithic farmers and U to Mesolithic hunter-gatherers.  Although it is interesting to combine different points of view, people sometimes forget the context and tend to overreact, as has happened in this case too.  Overall, the transition to farming happened in Southern Europe around 7000-5000 BCE and in Northern Europe around 3000-2000 BCE.  In light of this history all settlements in Finland are extremely young, and in Southern Finland the differences between female and male haplogroups reflect migration routes from different compass points and from different farmer populations rather than the introduction of agriculture to Finland.  Although I don't believe that this fact has been missed entirely, the wrong emphasis does no good.  Populations and migrations varied over time, especially in Eastern Europe during the last 2000 years.  In my opinion researchers have lost the substance on this point, which is the question "where did they come from, and why so much later than in most other places in Europe?"  The answer is to be found by comparing female and male haplogroups and subtypes in Finland and in the possible contact areas.