Controlling data

What we get when we look for genetic samples for our analyses is to some extent a matter of chance. We don't know whether our samples are typical representatives of the people they are labelled as representing. Usually researchers confirm only that the grandparents of the sampled individuals belonged to the named group. But whether they are third-generation immigrants, come from the same village, speak the same language or belong to a particular cultural group, we usually don't know. We have to place a good deal of trust in the researchers and in how they have coordinated their work around the world. It would be a good idea to report some key figures about the samples used and, going further, to compare these key figures between public databases and studies (using the same SNP sets, etc.). After that we could see whether the results are comparable.

Here are two key figures for my European samples.

1. Similarity

This graph shows the similarity within each population, measured as the average IBS (identity by state) shared between the samples of that population (136835 SNPs):
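As a rough illustration of how such a within-population similarity figure could be computed, here is a minimal Python sketch. It is not the tool actually used for the graph. It assumes genotypes coded as 0, 1 or 2 copies of one allele (with None for a missing call) and takes the proportion of alleles shared identical by state for every pair of samples in a population, then averages over the pairs. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def ibs_similarity(g1, g2):
    """Proportion of alleles shared identical by state between two samples.

    Assumed coding: 0, 1, 2 = copies of one allele; None = missing call.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs where either call is missing
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this SNP
        compared += 2
    if compared == 0:
        raise ValueError("no SNPs called in both samples")
    return shared / compared


def mean_within_population_ibs(samples):
    """Average pairwise IBS over all pairs of samples in one population."""
    return mean(ibs_similarity(a, b) for a, b in combinations(samples, 2))


# Hypothetical toy data: three samples per population, four SNPs each.
populations = {
    "pop_A": [[0, 1, 2, 2], [0, 1, 2, 1], [0, 2, 2, 2]],
    "pop_B": [[0, 0, 2, 1], [2, 1, 0, 1], [1, 2, 1, 0]],
}

within_ibs = {
    name: mean_within_population_ibs(samples)
    for name, samples in populations.items()
}
for name, value in within_ibs.items():
    print(f"{name}: {value:.3f}")
```

In this toy data the samples of pop_A are more alike than those of pop_B, so pop_A gets the higher average. With real data the same SNP set has to be used for every population, otherwise the averages are not comparable.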

2. Level of homozygosity

This graph shows the average homozygosity of each population (same data as above):
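A minimal Python sketch of one way to obtain such a figure follows. It is an assumption about the definition, not necessarily the one used for the graph: for each sample it takes the fraction of called SNPs that are homozygous, then averages that fraction over the samples of the population. Genotypes are assumed to be coded as 0, 1 or 2 copies of one allele, with None for a missing call; the function names and the toy data are hypothetical.

```python
from statistics import mean


def sample_homozygosity(genotypes):
    """Fraction of called SNPs that are homozygous (coded 0 or 2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("sample has no called SNPs")
    return sum(1 for g in called if g != 1) / len(called)


def mean_population_homozygosity(samples):
    """Average of the per-sample homozygosity over one population."""
    return mean(sample_homozygosity(s) for s in samples)


# Hypothetical toy data: three samples per population, four SNPs each.
populations = {
    "pop_A": [[0, 1, 2, 2], [0, 1, 2, 1], [0, 2, 2, 2]],
    "pop_B": [[0, 0, 2, 1], [2, 1, 0, 1], [1, 2, 1, 0]],
}

homozygosity = {
    name: mean_population_homozygosity(samples)
    for name, samples in populations.items()
}
for name, value in homozygosity.items():
    print(f"{name}: {value:.3f}")
```

Because the result depends on which SNPs are included, figures from different studies are comparable only when they are computed on the same SNP set.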

In both cases the Finns are from the selected old-settlement group, and the CEU samples are a selected subset with low genetic drift; most CEU samples show significant genetic drift.

