Sunday, June 21, 2020

My MyTrueAncestry results

Before extending my own analyses to this area, I am briefly publishing some of my own results from MyTrueAncestry. I know that many readers here don't give much credit to MTA, and its output probably contains both false and true results. However, I have seen that the results are usually in line with the customers' known origins.

We see that my results focus on Estonian and Germanic aDNA, but in my opinion the most interesting part is my older Trans-Volga Forest Steppe - Sintashta - Andronovo aDNA. Were the Trans-Volga and Sintashta cultures one and the same culture speaking an IE language, or parallel cultures, the first speaking a Uralic and the latter an Indo-Iranian language?

Tuesday, May 5, 2020

Creating average genetic samples

Ancient samples in particular are often of poor quality, even when the number of samples is reasonable. I made a simple Linux shell script that builds an average sample from a sample group, which makes it possible to raise sample quality to a reasonable level for analyses based on allele frequencies. The script reads EIGENSTRAT format and forms the result by picking alleles randomly from the pooled samples. Unfortunately Linux shell scripts don't support indexed files, so I had to make some compromises to keep the run time reasonable.

The output of this script will not work with analyses based on IBD segments or principal components, so it cannot be used, for example, as an input file for the popular Eurogenes G25 test, but it should work with all analyses on the Dodecad platform. If you are not familiar with these semantics, the outcome may well be disappointing.

The script is freely available here. You also need an rs-id file, which is available here; it should be unzipped into the same directory as the script. Some results below, using available academic samples and based on my own models:

Medieval Nomads

East_Asian 42.4
Uralic 24.5
Siberian 15.9
Northeast_European 6.1
Mediterranean 4.4
East_Scandinavian 3.3
Fennoscandinavian 2.2
AMBIG_European 1.2

Iberian Chalcolithic

Mediterranean 80.4
Northwest_European 15.6
Central_European 2.7
East_Scandinavian 1.4

Hungarian Bronze Age

Mediterranean 86.4
Central_European 11.3
Fennoscandinavian 1.1
East_Scandinavian 1.2

Polish Bronze Age

Northwest_European 39.0
Slavic 37.8
Fennoscandinavian 11.9
Central_European 6.5
East_Scandinavian 3.6
Baltic 1.2

Estonian Iron Age

Baltic 57.1
Slavic 26.1
Fennoscandinavian 11.5
Finnic 2.8
East_Scandinavian 2.2

edit 8.5. 15:30

The script has been edited so that it also accepts rs-ids in the input; the original version accepted only concatenated ids (chr:location). Please note that only hg19/GRCh37 mappings are supported. The new version is available here.
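The pooling step described in this post can be sketched in Python as follows. This is not the shell script itself, only a minimal illustration under stated assumptions: the input is the content of an EIGENSTRAT .geno file, where each row is one SNP and each character is one individual (0, 1 or 2 copies of the reference allele, 9 for missing). The function names `average_genotype` and `average_sample` are made up for this example, and drawing two alleles without replacement is my assumption about how "picking alleles randomly" could be done.

```python
import random


def average_genotype(row, rng):
    """Return one pooled genotype character for a single .geno row."""
    pool = []
    for ch in row.strip():
        if ch == "9":              # 9 marks a missing genotype
            continue
        ref = int(ch)              # 0, 1 or 2 copies of the reference allele
        # Each individual contributes two alleles to the pool:
        # 1 = reference allele, 0 = variant allele.
        pool.extend([1] * ref + [0] * (2 - ref))
    if not pool:
        return "9"                 # nobody in the group was genotyped here
    # Assumption: draw two alleles without replacement to form one genotype.
    return str(sum(rng.sample(pool, 2)))


def average_sample(geno_rows, seed=0):
    """Build one pseudo-sample (list of genotype characters) from a group."""
    rng = random.Random(seed)
    return [average_genotype(row, rng) for row in geno_rows]


if __name__ == "__main__":
    rows = ["999", "222", "000", "9929", "0291"]
    print("".join(average_sample(rows)))
```

A SNP that is missing in every individual stays missing, while a SNP covered by even one individual gets a call, which is where the gain in coverage comes from. Because alleles are mixed across individuals, the result carries group allele frequencies but no real haplotypes, consistent with the warning above about IBD-based analyses.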

Wednesday, February 26, 2020

Projecting or imputing

Purpose: to test whether imputing missing SNPs works better than using projection algorithms. It is commonly assumed that projection closely reproduces the result obtained from data without missing SNPs.

Object: the Finnish-Saami relationship in the context of their local admixture and Siberian admixture in general. I assume that these results can be generalized, although my mathematical skills are not strong enough to tell in which situations projection would be better than imputation or vice versa.

Background: we have scientific evidence of contacts between the Iron Age Saami and the Finnish population, and today these two populations partially share common ancestry.

Method: running PCA plots using imputed and full-coverage present-day Finnish samples, full-coverage Saami samples and ancient Saami samples, and running comparative plots using SmartPCA projections and imputations. Imputation was done with the Beagle software, using combined reference data that included samples from the SGDP and 1000 Genomes projects. The SmartPCA projection setup was lsqproject YES and autoshrink YES. Both projection parameters were set to NO in the test that examines the effect of projection itself.
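For readers who want to repeat the setup, the two SmartPCA configurations can be written out with a small helper like the one below. It is only a sketch: the file prefix `finns_saami`, the population list file `pca_pops.txt` and the function name `smartpca_parfile` are hypothetical, and only the two projection switches come from the method described above.

```python
def smartpca_parfile(prefix, poplist, projection=True):
    """Return the text of a smartpca parameter file.

    prefix and poplist are hypothetical file names for this example.
    Populations listed in the poplist file define the principal
    components; all other samples are placed onto those components.
    """
    flag = "YES" if projection else "NO"
    lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"evecoutname: {prefix}.evec",
        f"evaloutname: {prefix}.eval",
        f"poplistname: {poplist}",
        f"lsqproject: {flag}",   # least-squares projection on or off
        f"autoshrink: {flag}",   # shrinkage correction on or off
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(smartpca_parfile("finns_saami", "pca_pops.txt", projection=True))
    print(smartpca_parfile("finns_saami", "pca_pops.txt", projection=False))
```

Writing each variant to its own file and running smartpca on both gives the "projection" and "no projection" plots from otherwise identical input.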

The basic plot, based on the full Human Origins SNP set (no missing SNPs). Finns are from the 1000 Genomes project.

The plot using FamilyTreeDNA and 23andMe Finnish samples, which cover around 30% of the HO data set. No projection algorithms. The location of the Finnish samples is determined by filling missing SNPs with the average values of the PCA analysis.

Same as above, but now with projection algorithms (lsqproject and autoshrink). The Finns have still lost their Saami-specific admixture and group with North Russians.

Same as above, but now the Finnish FamilyTreeDNA and 23andMe samples are imputed. The Finns are back in almost the same place as the full-coverage 1000 Genomes samples in the first picture.

Present-day Saami replaced by ancient Saami (Levaluhta and Chalmyvarre). Finnish and ancient Saami samples both projected.

Same as above, but now Finnish samples imputed.  Projection used.

Same as above, but both sample groups imputed. No projection.
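Why mean-filling pulls low-coverage samples toward the centre of the plot, and what a least-squares projection tries to do about it, can be shown with a toy example. This is a simplified sketch with a single principal component and noise-free synthetic data, not SmartPCA's actual code; all names (`unit_loadings`, `project_mean_fill`, `project_least_squares`) are invented for the illustration.

```python
import math
import random


def unit_loadings(n_snps, rng):
    """Random SNP loadings for one principal component, scaled to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def project_mean_fill(x, v, observed):
    """Missing SNPs are set to the mean, which is 0 after centring,
    so they simply contribute nothing to the coordinate."""
    return sum(v[i] * x[i] for i in observed)


def project_least_squares(x, v, observed):
    """Find the coordinate t that minimises sum((x_i - t * v_i)^2)
    over the observed SNPs only."""
    num = sum(v[i] * x[i] for i in observed)
    den = sum(v[i] * v[i] for i in observed)
    return num / den


rng = random.Random(1)
n_snps = 1000
loadings = unit_loadings(n_snps, rng)
true_coord = 5.0
genotypes = [true_coord * a for a in loadings]   # centred toy sample
observed = rng.sample(range(n_snps), 300)        # about 30% coverage

mean_filled = project_mean_fill(genotypes, loadings, observed)
least_squares = project_least_squares(genotypes, loadings, observed)
print(f"true {true_coord:.2f}  mean-fill {mean_filled:.2f}  lsq {least_squares:.2f}")
```

With mean-filling, the coordinate shrinks roughly in proportion to the coverage, while the least-squares estimate recovers the true value in this idealised case. Real data are noisy and the components are estimated from other samples, which may be part of the reason the projected Finns in the plots above still drift away from their full-coverage position.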


Thursday, January 23, 2020

Baltic Sea Vikings and wandering Germanic groups

Following the idea presented in my previous post, I made a broad comparison of the common genetic drift between modern Europeans and the loosely defined Viking groups reported in my results. Before any 3Pop analysis one thing has to be done: all SNP sets have to be standardized. I can guarantee that if you feed data sets into the analysis at random you will simply get garbage. Dstat also calls for extra attention, but 3Pop in particular is a pitfall if you have not done any pre-analysis. Even though in my previous Dstat analysis I was able to use commercial Finnish groups without data standardization, the same wouldn't work with 3Pop. To get valid Finnish samples that represent Southwestern Finns and cover the Human Origins data set, I ran a 3Pop pre-analysis selecting the 1000 Genomes samples that best correspond to my genealogically confirmed project samples.
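The idea behind the statistic can be sketched in a few lines. This is a simplified illustration, not the qp3Pop program: it uses plain allele frequencies without the finite-sample correction and without standard errors, and it interprets "standardizing" as restricting every population to the SNPs they all share, which is my assumption for the example. The names `shared_snps` and `f3` are hypothetical.

```python
def shared_snps(*pops):
    """SNP ids present in every population (dicts of snp_id -> frequency)."""
    common = set(pops[0])
    for pop in pops[1:]:
        common &= set(pop)
    return sorted(common)


def f3(target, source1, source2):
    """Simplified f3(target; source1, source2): the mean over shared SNPs
    of (target - source1) * (target - source2).

    With an outgroup as target, larger values mean more drift shared by
    source1 and source2. With a test population as target, a clearly
    negative value suggests it is a mixture of the two sources.
    """
    snps = shared_snps(target, source1, source2)
    if not snps:
        raise ValueError("no SNPs shared by all three populations")
    total = sum((target[s] - source1[s]) * (target[s] - source2[s]) for s in snps)
    return total / len(snps)


if __name__ == "__main__":
    mixed = {"rs1": 0.5, "rs2": 0.5, "rs3": 0.1}
    pop_a = {"rs1": 0.0, "rs2": 1.0}
    pop_b = {"rs1": 1.0, "rs2": 0.0, "rs4": 0.3}
    print(shared_snps(mixed, pop_a, pop_b))   # rs3 and rs4 are dropped
    print(f3(mixed, pop_a, pop_b))
```

SNPs that exist in only some of the data sets are dropped before anything is computed, so every population is compared over exactly the same sites.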

Oeselian samples:

X04 dated 420BC
V12 dated 215BC

Wednesday, January 22, 2020

Trans-Volga Forest Steppes CWC and Finns

This topic came to my mind after seeing my personal results provided by a new service. Without judging what they do, I simply borrow some of their ideas based on my results. I see three components that they propose to me:

- Trans-Volga Forest Steppes CWC

- Viking Sweden

- Medieval Oeselian Saaremaa


Additionally, the analysis included two Iron Age Germanic-speaking groups: Alemannic and Lombardic. Their method searches for IBD similarity between customer data and ancient samples. No process description is available, so I can't say much more about it. My tests below show Dstat similarity, comparing two Finnish groups, Southwest and East Finns, and present-day Europeans against those ancient samples. This time I used Finnish samples from my own project; all of them are confirmed by genealogical research and none is defined by approximation using 1000 Genomes data.


Ancient data:







The 7th-century Alemannic burial site at Niederstotzingen in southern Germany, used from 580 to 630 AD. It held 13 human skeletal remains, 10 adults and 3 infants. The 10 adult samples were used.


Szolad, Hungary. Graves dated to the middle of the sixth century.  Two samples SZ4 and SZ16.

Bolshoy Oleni Ostrov

Five samples dated to around 3500 BP, with high Siberian admixture and carrying the paternal haplogroup N-L392/N-L1026.

For comparison, Iberian Bell Beaker samples.

In the following statistics positive numbers indicate more similarity between the Finnish groups and the ancient samples; negative values indicate more similarity between other Europeans and the ancient samples.

We see that Swedes and Latvians fit well with the Lombards, but the Alemannic people fit only with Latvians, who are the top match. Trans-Volga CWC and Sigtuna fit well with Finns and Latvians, and best with Latvians. In fact Sigtuna IA is a bit closer to Southwest Finns than to the two Swedish Human Origins samples. The Bolshoy Oleni Ostrov samples are strongly related to Finns, to the extent that East Finns are closer to the BOO samples than Southwestern Finns are, despite the fact that in my results the Levaluhta samples are closer to Southwestern Finns. Iberian BB doesn't fit well with present-day Iberians (1000 Genomes IBS).
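The sign convention used in these statistics can be illustrated with the basic D-statistic formula computed from allele frequencies. This is a minimal sketch, not the qpDstat program: it gives no standard errors or Z-scores, and the population order D(Outgroup, Ancient; Other European, Finnish) is my assumption, chosen so that a positive value means the Finnish group shares more alleles with the ancient sample. The function name `d_statistic` is hypothetical.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies of four populations.

    Numerator:   sum of (w - x) * (y - z)
    Denominator: sum of (w + x - 2wx) * (y + z - 2yz)
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations must cover the same SNPs")
    num = 0.0
    den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        num += (wi - xi) * (yi - zi)
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


if __name__ == "__main__":
    # Toy frequencies: the ancient sample and the Finnish group carry the
    # same alleles, the outgroup and the other Europeans do not.
    outgroup = [0.0, 0.0]
    ancient = [1.0, 1.0]
    other_europeans = [0.0, 0.0]
    finns = [1.0, 1.0]
    print(d_statistic(outgroup, ancient, other_europeans, finns))   # positive
    print(d_statistic(outgroup, ancient, finns, other_europeans))   # negative
```

Swapping the last two populations flips the sign, so the order in which populations are listed has to be known before positive and negative values can be read correctly.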


edit 23.1.20 15:00

Added Dstat analyses showing results for the Western Finnish Iron Age Saami (Levaluhta). Both Finnish groups are top matches, followed by the Swedish and Latvian sample groups.







Saturday, December 28, 2019

Some ancestral changes in Iron Age Estonia

QpAdm was designed to detect admixture and also reports probability and standard error statistics. Two kinds of parameters are given as input: admixture candidates and outgroup populations. The quality of the result depends on both groups, and either can be incomplete in many ways. Outgroups should represent ancestral junctions of the tested populations, so that they reveal ancestral differences. If several outgroups represent the same ancestral junction, they should give a coherent picture. That is not a foregone conclusion, because every ancestral branch has its own history, which is why choosing outgroups is not easy. Many other problems exist too. If the tested groups include several ancestrally close populations, the standard errors will be large because of the genetic drift those groups share. This means that qpAdm is better suited to testing very ancient admixtures that are distinctly detectable. The following tests, made using Iron Age populations in the Baltic area, are only indicative.

Iron Age Estonians

                     Estonian_BA    Baltic_IA      Scania_IA
best coefficients:   0.334          0.613          0.053
Jackknife mean:      0.325213405    0.531536401    0.143250193
std. errors:         0.326          0.452          0.490

chisq: 5.922
tail prob: 0.655969

The result implies that the best available model calls for all three ancestral populations, one of them from Sweden, but telling them apart is difficult because they largely share genetic drift.

Sample 0LS10 (Kunda, Lääne-Viru. EST IA 770–430 BC H13a1a1a N3a3'5)

                     Estonian_IA    Saami
best coefficients:   0.847          0.153
Jackknife mean:      0.846059225    0.153940775
std. errors:         0.059          0.059

chisq: 4.147
tail prob: 0.901439

The standard error is much lower because the populations share less genetic drift. In this case the total error probably comes from the Finnish admixture in the present-day Saami samples. The eastern admixture of 0LS10 is very purely eastern and shows barely any Finnish ancestry. I am tempted to say that this is a question of proto-Saami ancestry...
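The difference between the two models is easy to see when each coefficient is divided by its standard error. The sketch below does only that arithmetic on the numbers printed above; the helper name `coefficient_ratios` is made up for the example, and the common rule of thumb that a ratio should exceed roughly 2 to 3 before a coefficient is taken as distinguishable from zero is a general convention, not something qpAdm outputs.

```python
def coefficient_ratios(coefficients, std_errors):
    """Divide each admixture coefficient by its standard error."""
    return [c / e for c, e in zip(coefficients, std_errors)]


# Estonian_BA / Baltic_IA / Scania_IA model for Iron Age Estonians
three_way = coefficient_ratios([0.334, 0.613, 0.053], [0.326, 0.452, 0.490])

# Estonian_IA / Saami model for sample 0LS10
two_way = coefficient_ratios([0.847, 0.153], [0.059, 0.059])

for label, ratios in (("three-way", three_way), ("two-way", two_way)):
    print(label, [round(r, 2) for r in ratios])
```

In the three-way model the ratios are roughly 1.0, 1.4 and 0.1, so none of the three sources is individually well resolved. In the two-way model they are roughly 14.4 and 2.6, which matches the remark that the standard error is much lower there.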

Wednesday, December 11, 2019

New "Viking sword find" in North Estonia turned out to be of local origin?

I read about this find a month ago and my first thought was that yes, it has to be Finnish. Why? Simply because Finland is the nearest place where plenty of these swords have been found, and connections between Finland and North Estonia were still close at that time. Now Estonian researchers also deny the Scandinavian origin. One reason for this new view was the find of a brooch with a typical Finnish-North Estonian shape. It is not much, but obviously more will be published sooner or later. So-called crawfish brooches in particular were distinctive in Finland, but I have to wait for more detailed pictures before saying more, because the Estonian vocabulary may differ from what I have seen. The new article states:

"Crossbow-shaped brooches were usually worn by warriors from southwestern Finland and northwestern Estonia on the passage that intersects with the main thoroughfare of the Eastern Route," Kiudsoo said.

I'll write more after more details are revealed, including pictures of those finds.