The Baltic question

Now that I have Latvian samples, it is a good time to try to define the genetic position of the Balts among their neighbors. As long as I had only Lithuanians, I was not sure whether they were representative of all Balts. In numerous tests Lithuanians appear as the least admixed North Europeans. This may be true, but even so they should resemble their neighbors to the south, east and north. I try to find this out using PCA, but PCA has some general problems. The position of each sample on a PCA plot depends mainly on two factors: 1) components it shares with other samples and 2) components that set it apart from other samples. The first factor is no problem, because it reveals exactly what we are searching for, but the second one is problematic. In practice it consists of genetic drift within local samples or of distinct admixtures. Genetic drift makes a population differ from all other populations, because it is a purely local attribute. So, to get an objective view of the genetic relations between populations, we should reduce the effect of genetic drift. There is a simple way to do it: reduce the population's sample size on the PCA. Those with more math skills can give a rigorous explanation of why this works. Roughly speaking, PCA places its axes where the total variance is largest, so the more samples a population contributes, the more weight its private drift gets. I can only say that in practice you can reduce the effect of local genetic drift by reducing the sample size, and increase it by inflating the sample size.
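To make the sample-size argument concrete, here is a small simulation sketch in Python with NumPy. Everything in it is an illustrative assumption, not real data: three hypothetical reference populations (A, B, C) with mild drift and one hypothetical population D with strong private drift, with drift modelled by the Balding-Nichols beta model. It runs a plain PCA (an SVD of centered, scaled genotypes, not SmartPCA itself) and reports how much of PC1 is explained by separating D from everyone else.

```python
import numpy as np


def simulate_genotypes(n_drifted, n_ref=20, n_snps=5000, f_ref=0.05, f_drifted=0.15, seed=1):
    """Three mildly drifted reference populations plus one strongly drifted population D.

    All parameter values are illustrative assumptions, not estimates for real populations.
    """
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.2, 0.8, n_snps)  # ancestral allele frequencies

    def drift(f):
        # Balding-Nichols model: p_pop ~ Beta(p(1-F)/F, (1-p)(1-F)/F), mean p, variance F*p*(1-p)
        return rng.beta(p_anc * (1 - f) / f, (1 - p_anc) * (1 - f) / f)

    blocks, labels = [], []
    for name, f, n in [("A", f_ref, n_ref), ("B", f_ref, n_ref),
                       ("C", f_ref, n_ref), ("D", f_drifted, n_drifted)]:
        p_pop = drift(f)
        # diploid genotypes coded 0/1/2 = count of the reference allele
        blocks.append(rng.binomial(2, p_pop, size=(n, n_snps)))
        labels += [name] * n
    return np.vstack(blocks).astype(float), np.array(labels)


def pc1_scores(genotypes):
    """PC1 score of every sample from an SVD of the centered, scaled genotype matrix."""
    p_hat = genotypes.mean(axis=0) / 2
    keep = (p_hat > 0) & (p_hat < 1)  # drop SNPs that are monomorphic in the sample
    g, p = genotypes[:, keep], p_hat[keep]
    x = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # center and scale each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, 0] * s[0]


def drifted_share_of_pc1(n_drifted):
    """Share of PC1 variance explained by the split 'D versus everyone else'."""
    genotypes, labels = simulate_genotypes(n_drifted)
    scores = pc1_scores(genotypes)
    is_drifted = (labels == "D").astype(float)
    # squared correlation with a 0/1 group indicator = between-group share of variance
    return np.corrcoef(scores, is_drifted)[0, 1] ** 2


for n in (20, 4):
    print(f"D sample size {n}: share of PC1 explained by D vs rest = {drifted_share_of_pc1(n):.3f}")
```

With these made-up parameters, a large D sample should let D's private drift take over PC1, while a small D sample should leave PC1 to the structure shared among A, B and C. The exact numbers mean nothing for real populations; only the direction of the effect is the point.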

We see that the Balts are very close to the Slavs, but they also sit at the minimum of the y-axis. This means that the Balts show something more than any other population on the plot (with the exception of two Belarusian samples). By reducing the Baltic sample size, we can see what happens when we get rid of this local and excessive Baltic attribute, which does not directly imply any large-scale commonality with other Europeans. On the other hand, the eastern Finnic groups are placed at the maximum of the y-axis. Can we assume that populations living almost next to each other, like the Balts and the Karelians, form genetic extremes within the whole of Europe? I don't believe it; it is a question of local genetic drift, which makes things look weird. To reduce the drift, I reduced the Baltic sample size from 16 to 8, and here is the result:

You can see a new loose cluster including the Balts and the Finnic groups, except the Finns, who are closer to the Scandinavians. The Balts are still close to the Slavs, but now they lie between the Slavs and the Finnic groups, in a way that corresponds quite well to geography.
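The post does not say how the 8 Baltic samples were chosen, so the following Python sketch simply assumes a random choice. The function name `downsample_population` and the example labels are hypothetical; the function returns the row indices to keep, so the same selection can be applied to a genotype matrix or a sample list before rerunning the PCA.

```python
import numpy as np


def downsample_population(labels, population, keep, seed=0):
    """Return sorted row indices with `population` reduced to `keep` randomly chosen samples.

    Hypothetical helper: all other populations are kept untouched.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    target = np.flatnonzero(labels == population)
    others = np.flatnonzero(labels != population)
    if keep > target.size:
        raise ValueError("cannot keep more samples than the population has")
    chosen = rng.choice(target, size=keep, replace=False)  # random subset, no repeats
    return np.sort(np.concatenate([others, chosen]))


# Hypothetical example: 16 Baltic samples reduced to 8, other groups unchanged
labels = ["Balt"] * 16 + ["Slav"] * 10 + ["Finnic"] * 6
idx = downsample_population(labels, "Balt", keep=8, seed=42)
print(len(idx), [str(x) for x in np.asarray(labels)[idx]].count("Balt"))
```

Because the kept subset is random, rerunning with a few different seeds is a cheap check that the shifted position is not an artifact of which eight individuals happened to be kept.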

Here is a bit more information on the divergence of the tested populations. If any sample group shows smaller divergence than average and is clearly overrepresented, the PCA result will be biased. The following numbers are taken from the SmartPCA output and sorted by divergence. (These numbers represent only the available academic data, not the real populations. I am quite sure that, for example, the divergence in Sweden is higher.)

