Tuesday, 20 December 2016

New data gives 11 million SNPs

Thanks to the latest updates of free gene banks and the hard work of several projects, I have now been able to increase the SNP count to 11 million per sample. More SNPs means better accuracy, whether I use all SNPs (especially in drift analyses) or only those left after LD pruning. In my new database I have combined samples from three sources: the 1000 Genomes Project, the Estonian Biocentre samples used by Pagani et al. 2016, and the Simons Genome Diversity Project (SGDP). For the present the sample size is only 866.
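For readers unfamiliar with LD pruning, here is a minimal sketch of the idea in Python. It is not the pipeline used for this data set. The function name `ld_prune`, the `r2_threshold` and `window` parameters, and the toy genotype matrix are all assumptions made for illustration, and the window is counted in SNP indices rather than base pairs. The sketch walks along the SNPs and drops any SNP that is strongly correlated (high r²) with one already kept nearby.

```python
import numpy as np

def ld_prune(genotypes, r2_threshold=0.2, window=50):
    """Greedy LD pruning (illustrative only).

    genotypes: array of shape (n_samples, n_snps) holding 0/1/2 allele counts.
    A SNP is kept only if its squared correlation with every previously
    kept SNP within `window` positions is below `r2_threshold`.
    Returns the column indices of the kept SNPs.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(genotypes.shape[1]):
        col = genotypes[:, j]
        if col.std() == 0:
            continue  # monomorphic SNP: no information, and r is undefined
        independent = True
        for k in reversed(kept):  # nearest kept SNPs first
            if j - k > window:
                break  # everything further back is outside the window
            r = np.corrcoef(col, genotypes[:, k])[0, 1]
            if r * r >= r2_threshold:
                independent = False
                break
        if independent:
            kept.append(j)
    return kept

# Hypothetical toy data: 6 samples x 4 SNPs.
# SNP 1 duplicates SNP 0, SNP 2 is uncorrelated with SNP 0, SNP 3 is monomorphic.
toy = np.array([
    [0, 0, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 0, 1],
    [0, 0, 2, 1],
    [1, 1, 0, 1],
    [2, 2, 2, 1],
])
kept_snps = ld_prune(toy)
print(kept_snps)  # [0, 2]
```

With millions of SNPs a dedicated tool would be used instead of a Python loop; the sketch only shows what "pruning" removes.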

Here, as a showcase of the new data, are two PCA plots and some comments. Instead of making a picture centered on continental Europe, I included four outgroups to see their effect. Those outgroups are Armenians, Mongolians, Sardinians and Saamis. For the present, individual names are taken straight from the original sources and can be somewhat ambiguous.

As we can see, there are several clusters, which makes it possible to evaluate the data. For example, the Scandinavians of SGDP and Pagani et al. cluster with East Europeans.

Personally I don't pay much attention to PCA figures, because the result depends on which samples are selected, how many there are, the ratios between population sizes, how admixed the individuals are, and so on. My upcoming high-resolution tests will be much more interesting.
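The dependence on sample ratios can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of the data above: the two populations, their allele frequencies, the helper names `pca_scores` and `simulate_panel`, and the panel sizes are all hypothetical. The same two populations are sampled once in equal numbers and once in a 38:2 ratio, and the share of variance captured by PC1 is compared.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """PCA via SVD on mean-centered genotypes.

    Returns (scores, explained) where scores has shape
    (n_samples, n_components) and explained holds the fraction of
    total variance captured by each returned component.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]

def simulate_panel(rng, freqs_a, freqs_b, n_a, n_b):
    """Draw 0/1/2 genotypes for n_a samples from population A and n_b from B."""
    a = rng.binomial(2, freqs_a, size=(n_a, freqs_a.size))
    b = rng.binomial(2, freqs_b, size=(n_b, freqs_b.size))
    return np.vstack([a, b])

rng = np.random.default_rng(0)
n_snps = 500
freqs_a = rng.uniform(0.1, 0.9, n_snps)  # hypothetical allele frequencies
freqs_b = rng.uniform(0.1, 0.9, n_snps)

balanced = simulate_panel(rng, freqs_a, freqs_b, 20, 20)
skewed = simulate_panel(rng, freqs_a, freqs_b, 38, 2)

scores_bal, share_bal = pca_scores(balanced)
scores_skw, share_skw = pca_scores(skewed)
print("PC1 share, balanced panel:", round(share_bal[0], 3))
print("PC1 share, skewed panel:  ", round(share_skw[0], 3))
```

The populations themselves are identical in both panels; only the sampling ratio changes, yet the leading component looks much weaker in the skewed panel.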

Added at 12:50

In case someone is interested in where Mordvas fall on this map: they are very similar to North Russians and RusKU (Pagani et al.) and shift towards Mongolians. Baltic Finns shift towards Saamis. Sorry, GIMP does something unwanted to the colors.


2 comments:

  1. The Greek sample clustering with Armenians must be from an Anatolian Greek (most probably an eastern Pontian Greek, given its strong clustering with Armenians). This explains why it is not in the South European genetic cluster.

    Note: I have seen many Anatolian Greek genetic results and know well how genetically different they are from Balkan Greeks.

    1. Very possible. These high-resolution samples are quite diverse and in many cases atypical. Another example: Pagani's Finns actually live in Estonia and are likely Estonian citizens, and no exact ancestral information is given for them. SGDP's samples, by contrast, are likely from the 1000 Genomes Project, newly sequenced of course. Pretty good work. Pagani's decision is inexplicable, like looking for Brits in America instead of using the British samples already available. They also give wrong locations (geographic coordinates).