Sunday, June 21, 2020

My MyTrueAncestry results

Before extending my own analyses to this area, I will briefly publish some of my own results from MyTrueAncestry. I know that many readers here don't give much credit to MTA, and its results are likely a mix of false and true. However, I have seen that the results are usually in line with the customers' origins.

We see that my results focus on Estonian and Germanic aDNA, but in my opinion the most interesting part is my older Trans-Volga Forest Steppe - Sintashta - Andronovo aDNA. Were the Trans-Volga and Sintashta cultures one and the same culture speaking an IE language, or parallel cultures, the first speaking a Uralic and the latter an Indo-Iranian language?

Tuesday, May 5, 2020

Creating average genetic samples

Ancient samples in particular are often of poor quality, even when the number of samples is reasonable. I made a simple Linux shell script that builds an average sample from a sample group, making it possible to raise sample quality to a reasonable level for analyses based on allele frequencies. The script reads EIGENSTRAT format, and the result is formed by picking alleles randomly from the pooled samples. Unfortunately Linux shell scripts don't support indexed files, and I had to make some compromises to keep the run time reasonable. The result of this script will not work with analyses based on IBD or principal components, so it cannot be used, for example, as an input file for the popular Eurogenes G25 test, but it should work with all analyses on the Dodecad platform. If you are not familiar with these distinctions, the outcome may well be disappointing. The script is freely available here. I forgot: you also need an rs-id file, which is available here. It should be unzipped into the same directory as the script. Some results below, using available academic samples and based on my own models:
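The pooling idea can be illustrated with a minimal Python sketch. This is not the shell script itself, only one possible reading of "picking alleles randomly from pooled samples". It assumes the usual EIGENSTRAT .geno coding (one line per SNP, one character per sample, 0/1/2 = number of reference alleles, 9 = missing). The function names pool_snp and average_sample are made up for this illustration, and drawing two alleles without replacement is my assumption.

```python
import random


def pool_snp(genotypes, rng):
    """Build one pooled genotype code from a row of EIGENSTRAT codes.

    genotypes: string such as "2909", one character per sample.
    Returns "0", "1" or "2", or "9" if every sample is missing.
    """
    alleles = []
    for code in genotypes:
        if code == "9":  # missing genotype, contributes nothing to the pool
            continue
        ref = int(code)  # number of reference alleles (0, 1 or 2)
        alleles.extend([1] * ref + [0] * (2 - ref))
    if not alleles:
        return "9"
    # Every non-missing sample adds two alleles, so the pool has at least two.
    picked = rng.sample(alleles, 2)
    return str(sum(picked))


def average_sample(geno_lines, seed=0):
    """Return one pooled genotype code per SNP line (hypothetical helper)."""
    rng = random.Random(seed)
    return [pool_snp(line.strip(), rng) for line in geno_lines]
```

A SNP that is missing in one low-coverage sample is then filled from the other samples of the group, which is why the pooled sample suits allele-frequency methods but not IBD-based ones.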

Medieval Nomads

East_Asian 42.4
Uralic 24.5
Siberian 15.9
Northeast_European 6.1
Mediterranean 4.4
East_Scandinavian 3.3
Fennoscandinavian 2.2
AMBIG_European 1.2

Iberian Chalcolithic

Mediterranean 80.4
Northwest_European 15.6
Central_European 2.7
East_Scandinavian 1.4

Hungarian Bronze Age

Mediterranean 86.4
Central_European 11.3
Fennoscandinavian 1.1
East_Scandinavian 1.2

Polish Bronze Age

Northwest_European 39.0
Slavic 37.8
Fennoscandinavian 11.9
Central_European 6.5
East_Scandinavian 3.6
Baltic 1.2

Estonian Iron Age

Baltic 57.1
Slavic 26.1
Fennoscandinavian 11.5
Finnic 2.8
East_Scandinavian 2.2

edit 8.5. 15:30

The script has been edited so that it also accepts rs-ids in the input; the original version accepted only concatenated ids (chr:location). Please note that only hg19/GRCh37 mappings are supported. The new version is available here.

Wednesday, February 26, 2020

Projecting or imputing

Purpose:  to test whether imputing missing SNPs is better than using projection algorithms.  It is generally understood that projection gives a result closely corresponding to the one obtained from data without missing SNPs.

Object:  the Finnish-Saami relation in the context of their local admixture and Siberian admixture in general.  I assume that these results can be generalized, although my mathematical skills are not strong enough to tell in which situations projection would be better than imputation or vice versa.

Background:  we have scientific evidence of contacts between the Iron Age Saami and the Finnish population, and today these two populations share partly common ancestry.

Method:  running PCA plots using imputed and full-coverage present-day Finnish samples, full-coverage Saami samples and ancient Saami samples, and running comparative plots using SmartPCA projections and imputations.  Imputation was done with the Beagle software, using combined reference data including samples from the SGDP and 1000 Genomes projects.  The SmartPCA projection setup was lsqproject YES and autoshrink YES.  Both projection parameters were set to NO in the test that checks the effect of projection itself.
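To show why the projection setting matters for samples with many missing SNPs, here is a small Python sketch for a single PCA axis. It is not SmartPCA's code, only the general idea as I understand it. Treating a missing SNP as the average means it contributes zero after centering, so the sample is pulled towards the origin. A least-squares fit, which is the idea behind lsqproject, uses only the observed SNPs. The function names and the toy numbers are invented for illustration.

```python
def mean_fill_coordinate(centered, loadings):
    """Coordinate on one axis when missing SNPs are treated as the mean.

    centered: mean-centered genotype values, None where the SNP is missing.
    loadings: SNP weights of the axis (assumed to have unit length).
    """
    return sum(x * v for x, v in zip(centered, loadings) if x is not None)


def lsq_coordinate(centered, loadings):
    """Least-squares coordinate on one axis using only observed SNPs."""
    observed = [(x, v) for x, v in zip(centered, loadings) if x is not None]
    numerator = sum(x * v for x, v in observed)
    denominator = sum(v * v for _, v in observed)
    if denominator == 0:
        raise ValueError("no informative SNPs observed")
    return numerator / denominator


# Toy axis with four SNPs; with full data the sample sits at coordinate 2.0.
axis = [0.5, 0.5, 0.5, 0.5]
full = [1.0, 1.0, 1.0, 1.0]
half = [1.0, 1.0, None, None]  # same sample with half of the SNPs missing
```

In this toy case mean_fill_coordinate(half, axis) drops to 1.0, while lsq_coordinate(half, axis) stays at 2.0. With several axes the least-squares step becomes a small linear system, which this sketch does not cover.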

The basic plot, based on the full Human Origins SNP space (no missing SNPs).  Finns are from the 1000 Genomes project.

The plot using FamilyTreeDNA and 23andMe Finnish samples with coverage of around 30% of the HO data set.  No projection algorithms. The location of the Finnish samples is determined by filling missing SNPs with the average values of the PCA analysis.

Same as above, but now with projection algorithms (lsqproject and autoshrink). Finns have still lost their Saami-specific admixture and group with North Russians.

Same as above, but now the Finnish FamilyTreeDNA and 23andMe samples are imputed.  Finns are back in almost the same place as the lossless 1000 Genomes samples in the first picture.

Present-day Saami replaced by ancient Saami (Levaluhta and Chalmyvarre).  Finnish and ancient Saami samples both projected.

Same as above, but now Finnish samples imputed.  Projection used.

Same as above, but both sample groups imputed. No projection.


Thursday, January 23, 2020

Baltic Sea Vikings and wandering Germanic groups

Following the idea presented in my previous post, I compared, in a broad picture, the shared genetic drift between modern Europeans and the loosely defined Viking groups reported in my results.  Before any 3Pop analysis something has to be done: all SNP sets have to be standardized.  I can guarantee that if you put data sets into the analysis at random you will simply get garbage.  Dstat also calls for extra attention, but 3Pop in particular is a pitfall if you have not done any pre-analysis.  Even though in my previous Dstat run I was able to use commercial Finnish groups without data standardization, the same wouldn't work with 3Pop.  To get valid Finnish samples representing Southwestern Finns and covering the Human Origins data set, I made a 3Pop pre-analysis selecting the 1000 Genomes samples that best correspond to my genealogically confirmed project samples.
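One simple part of such standardization can be sketched in Python: restricting every data set to the SNPs they all share. This is only my illustration of the general idea, not a description of the full pre-analysis. It assumes EIGENSTRAT .snp lines with the SNP id in the first column, and the helper names snp_ids and shared_snps are hypothetical.

```python
def snp_ids(snp_lines):
    """Extract SNP ids (first column) from EIGENSTRAT .snp lines."""
    return [line.split()[0] for line in snp_lines if line.strip()]


def shared_snps(*datasets):
    """Return ids present in every data set, in the order of the first one."""
    common = set(datasets[0]).intersection(*datasets[1:])
    return [snp for snp in datasets[0] if snp in common]


# Toy example: a reference panel and a commercial test with fewer SNPs.
panel = snp_ids([
    "rs1 1 0.0 100 A G",
    "rs2 1 0.0 200 C T",
    "rs3 1 0.0 300 G A",
])
commercial = ["rs3", "rs1"]
keep = shared_snps(panel, commercial)
```

Here keep holds rs1 and rs3, and all groups entering the analysis would then be cut down to that list.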

Oeselian samples:

X04 dated 420BC
V12 dated 215BC

Wednesday, January 22, 2020

Trans-Volga Forest Steppes CWC and Finns

This topic came to my mind after seeing my personal results provided by a new service.  Without judging what they do, I simply borrow some of their ideas based on my results.  I see three components they propose for me:

- Trans-Volga Forest Steppes CWC

- Viking Sweden

- Medieval Oeselian Saaremaa


Additionally the analysis included two Iron Age Germanic-speaking groups: Alemannic and Lombardic.   Their method searches for IBD similarity between customer data and ancient samples.  No process description is available, so I can't say much more about it.   My following tests show Dstat similarity, comparing two Finnish groups, Southwest and East Finns, with present-day Europeans in relation to those ancient samples.  This time I used Finnish samples from my own project; all samples are confirmed by genealogical research and are not defined by any approximation using 1000 Genomes data.


Ancient data:

The 7th-century Alemannic burial site at Niederstotzingen in southern Germany, used from 580 to 630 AD. 13 human skeletal remains, 10 adults and 3 infants. The 10 adult samples were used.


Szolad, Hungary. Graves dated to the middle of the sixth century.  Two samples SZ4 and SZ16.

Bolshoy Oleni Ostrov

Five samples dated to around 3500 BP, with high Siberian admixture and carrying the male haplogroup N-L392/N-L1026.

For comparison, Iberian Bell Beaker samples.

In the following statistics, positive numbers stand for more similarity between the Finnish groups and the ancient samples; negative values stand for more similarity between other Europeans and the ancient samples.
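The sign convention can be made concrete with a small Python sketch of a D-statistic computed from allele frequencies. This is only an illustration, not the program used for the results here. It assumes the populations are arranged as D(Finns, other Europeans; ancient, outgroup), which gives the signs described above, and it leaves out the standard errors that real software estimates. The function name d_statistic is made up.

```python
def d_statistic(w, x, y, z):
    """D-statistic from per-SNP allele frequencies of four populations.

    Assumed arrangement: w = Finns, x = other Europeans,
    y = ancient samples, z = outgroup. All lists cover the same SNPs.
    """
    numerator = 0.0
    denominator = 0.0
    for fw, fx, fy, fz in zip(w, x, y, z):
        numerator += (fw - fx) * (fy - fz)
        denominator += (fw + fx - 2 * fw * fx) * (fy + fz - 2 * fy * fz)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Finns share the allele with the ancient samples at the first SNP.
finns_closer = d_statistic([1, 0], [0, 0], [1, 0], [0, 0])
# The other Europeans share the allele with the ancient samples.
others_closer = d_statistic([0, 0], [1, 0], [1, 0], [0, 0])
```

In these toy cases finns_closer is 1.0 and others_closer is -1.0. Real data give values much closer to zero, and only those with a large enough Z-score are meaningful.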

We see that Swedes and Latvians fit well with Lombards, but the Alemannic samples only with Latvians, the Latvians being the top match. Trans-Volga CWC and Sigtuna fit well with Finns and Latvians, and best with Latvians.   Actually Sigtuna IA is a bit closer to Southwest Finns than to the two Swedish Human Origins samples.  Bolshoy Oleni Ostrov samples are strongly related to Finns, to the extent that East Finns are closer to the BOO samples than Southwestern Finns are, despite the fact that in my results the Levaluhta samples are closer to Southwestern Finns. Iberian BB doesn't fit well with present-day Iberians (1000 Genomes IBS).


edit 23.1.20 15:00

Added Dstat analyses showing the results for the Western Finnish Iron Age Saami (Levaluhta).  Both Finnish groups are top matches, followed by the Swedish and Latvian sample groups.