Wednesday, February 26, 2020

Projecting or imputing

Purpose:  to test whether imputing missing SNPs gives better results than using projection algorithms.  It is generally understood that projection closely reproduces the result obtained from data without missing SNPs.

Object:  the Finnish-Saami relationship in the context of their local admixture and of Siberian admixture in general.  I assume that these results can be generalized, although my mathematical skills are not strong enough to tell in which situations projection would be better than imputation, or vice versa.

Background:  there is scientific evidence of contacts between Iron Age Saami and Finnish populations, and today the two populations share part of their ancestry.

Method:  running PCA plots with imputed and full-coverage present-day Finnish samples, full-coverage Saami samples and ancient Saami samples, and comparing plots made with SmartPCA projection against plots made with imputation.  Imputation was done with the Beagle software, using a combined reference panel that includes samples from the SGDP and 1000 Genomes projects.  The SmartPCA projection setup was lsqproject YES and autoshrink YES.  Both projection parameters were set to NO in the test that isolates the effect of projection itself.
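The two projection settings can be written down as a SmartPCA parameter file. The following is only a minimal sketch of how such a file could be generated: the file prefix, the population list name and the helper function `smartpca_par` are hypothetical and not taken from the actual runs, and only the lsqproject and autoshrink values come from the setup described above.

```python
def smartpca_par(prefix, poplist, project=True):
    """Return the text of a smartpca parameter file.

    prefix and poplist are hypothetical names used for illustration only.
    project=True writes lsqproject/autoshrink as YES, project=False as NO.
    """
    flag = "YES" if project else "NO"
    lines = [
        f"genotypename: {prefix}.geno",   # EIGENSTRAT genotype file
        f"snpname: {prefix}.snp",         # SNP list
        f"indivname: {prefix}.ind",       # sample list with population labels
        f"evecoutname: {prefix}.evec",    # output: eigenvectors (PCA coordinates)
        f"evaloutname: {prefix}.eval",    # output: eigenvalues
        f"poplistname: {poplist}",        # populations used to compute the axes
        f"lsqproject: {flag}",            # least-squares projection of other samples
        f"autoshrink: {flag}",            # shrinkage correction for projected samples
    ]
    return "\n".join(lines) + "\n"


# Projection run, and the control run with both parameters set to NO.
print(smartpca_par("ho_finns_saami", "axis_pops.txt", project=True))
print(smartpca_par("ho_finns_saami", "axis_pops.txt", project=False))
```

Samples from populations not listed in the poplistname file are the ones that get projected onto the axes, so in this sketch the low-coverage Finnish samples would be left out of that list.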

The basic plot, based on the full Human Origins SNP set (no missing SNPs).  Finns are from the 1000 Genomes project.

The plot using FamilyTreeDNA and 23andMe Finnish samples, which cover around 30% of the HO data set.  No projection algorithms.  The location of the Finnish samples is determined by filling in the missing SNPs with average values in the PCA.

Same as above, but now with the projection algorithms (lsqproject and autoshrink).  Finns have still lost their Saami-specific admixture and group with North Russians.
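The difference between filling missing SNPs with averages and a least-squares projection can be illustrated with a toy calculation on a single PCA axis. This is a simplified sketch, not SmartPCA's actual code: the loading vector, the SNP means and the sample are all simulated, the sample is noise-free, and the function names are made up for the example. It only shows why mean-filling pulls a low-coverage sample toward the origin; it does not model the noise in real genotypes, which is why it cannot explain the result in the plot above, where projection did not restore the Finnish samples.

```python
import math
import random


def mean_fill_score(x, mean, axis, observed):
    # Missing SNPs are set to the SNP mean, so after centering they
    # contribute zero and only the observed SNPs add to the score.
    return sum(axis[i] * (x[i] - mean[i]) for i in observed)


def lsq_score(x, mean, axis, observed):
    # Least-squares fit of the centered observed genotypes to the axis
    # loadings restricted to the observed SNPs.
    num = sum(axis[i] * (x[i] - mean[i]) for i in observed)
    den = sum(axis[i] ** 2 for i in observed)
    return num / den


random.seed(0)
n_snps = 1000

# Simulated unit-length loading vector for one principal axis.
raw = [random.gauss(0, 1) for _ in range(n_snps)]
norm = math.sqrt(sum(r * r for r in raw))
axis = [r / norm for r in raw]

# Simulated per-SNP means and a noise-free sample lying exactly on the axis.
mean = [random.uniform(0.2, 1.8) for _ in range(n_snps)]
true_score = 5.0
x = [m + true_score * a for m, a in zip(mean, axis)]

# Keep about 30% of the SNPs, as for the FamilyTreeDNA and 23andMe samples.
observed = random.sample(range(n_snps), 300)

print("true score:     ", true_score)
print("mean-fill score:", mean_fill_score(x, mean, axis, observed))
print("lsq score:      ", lsq_score(x, mean, axis, observed))
```

In this idealized case the mean-filled score is smaller than the true score, because the missing SNPs contribute nothing, while the least-squares score recovers the true value up to rounding.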

Same as above, but now the Finnish FamilyTreeDNA and 23andMe samples are imputed.  Finns are back in almost the same place as the full-coverage 1000 Genomes samples in the first picture.

Present-day Saamis replaced by ancient Saamis (Levaluhta and Chalmyvarre).  Finnish and ancient Saami samples both projected.

Same as above, but now Finnish samples imputed.  Projection used.

Same as above, but both sample groups imputed. No projection.