After getting Latvian samples, it is a good time to try to define the Baltic genetic position among their neighbors. As long as I had only Lithuanians, I was not sure whether they were representative of all Balts. In numerous tests Lithuanians are seen as the least mixed North Europeans.
This may be true, but even then they have to resemble their neighbors to the south, east and north. I try to find this out using PCA, but there are some general problems with it. The position of each sample on a PCA plot mainly depends on two factors: 1) components shared with other samples and 2) components differing from other samples. There is no problem with the first factor, because it reveals just what we are searching for, but the second factor is problematic. In practice the second one consists of genetic drift in local samples or of distinct admixtures. Genetic drift is something that makes a population differ from all other populations, because it is a fully local attribute. So, to get an objective view of the genetic relations between populations we should decrease the effect of genetic drift. There is a simple way to do it: reducing the population's sample size in the PCA.
Those who have more math skills can give rigorous explanations of why this is true; I can only say that in practice you can reduce the effect of local genetic drift by reducing the sample size, and increase it by inflating the sample size.
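The thinning step described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the SmartPCA workflow itself: the function names `downsample_group` and `pca_scores`, the toy labels and the random genotype matrix are all made up for the example, and the PCA here is a plain SVD on column-centered data without SmartPCA's normalization.

```python
import numpy as np

def downsample_group(labels, group, keep, seed=0):
    """Return sorted row indices that keep every sample, except that
    `group` is randomly thinned to `keep` members (hypothetical helper)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    in_group = np.flatnonzero(labels == group)
    if keep > in_group.size:
        raise ValueError("keep exceeds the number of samples in the group")
    kept = rng.choice(in_group, size=keep, replace=False)
    others = np.flatnonzero(labels != group)
    return np.sort(np.concatenate([others, kept]))

def pca_scores(genotypes, n_components=2):
    """Plain PCA via SVD on column-centered data.
    Assumption: this is NOT SmartPCA's exact normalization."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Toy data only: random 0/1/2 genotype counts, not real samples.
rng = np.random.default_rng(1)
labels = np.array(["Balt"] * 16 + ["Slav"] * 10 + ["Finnic"] * 10)
genotypes = rng.integers(0, 3, size=(labels.size, 50))

# Thin the Baltic group from 16 to 8 before running the PCA.
idx = downsample_group(labels, "Balt", keep=8)
scores = pca_scores(genotypes[idx])
```

With real data the same index array would be used to pick the individuals written to the input files before running SmartPCA again.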
We see that the Balts are very close to Slavs, but also at the minimum of the y-axis. This means that the Balts show more of something than any other population on the plot (with the exception of two Belarusians). By reducing the Baltic sample size we can see what happens if we get rid of this local and excessive Baltic attribute, which doesn’t directly imply any large-scale commonality with other Europeans.
On the other hand, eastern Finnic groups are placed at the maximum of the y-axis. Can we assume that populations living almost next to each other, like Balts and Karelians, could form the genetic extremes of the whole of Europe? I don’t believe so; it is a question of local genetic drift, which makes things look weird. To reduce the drift I reduced the Baltic sample size from 16 to 8, and here is the result:
Unfortunately SmartPCA flipped the x-axis.
You can see a new loose cluster including Balts and Finnic groups, excluding Finns, who are closer to Scandinavians. The Balts are still close to Slavs, but now lie between them and the Finnic groups in a manner that corresponds pretty well with geography.
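The flipped x-axis mentioned above is harmless, because the sign of a principal component is arbitrary. If you want two plots to be directly comparable, you can flip the signs afterwards. A minimal sketch, assuming both score tables have been restricted to the same samples in the same order (the function name `align_signs` and the toy numbers are mine, not part of SmartPCA):

```python
import numpy as np

def align_signs(reference, scores):
    """Flip each column of `scores` that points the opposite way
    from the matching column of `reference` (hypothetical helper)."""
    reference = np.asarray(reference, dtype=float)
    aligned = np.array(scores, dtype=float)  # copy, input is left untouched
    for j in range(aligned.shape[1]):
        # A negative dot product means the axis came out mirrored.
        if np.dot(reference[:, j], aligned[:, j]) < 0:
            aligned[:, j] *= -1
    return aligned

# Toy scores for three shared samples; the second run has PC1 mirrored.
ref = np.array([[1.0, 0.5], [-1.0, 0.2], [0.3, -0.7]])
flipped = ref * np.array([-1.0, 1.0])
restored = align_signs(ref, flipped)
```

Only the orientation changes; the distances between samples on the plot stay the same.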
edit 24.8.2015 18:40
I have here a bit more information regarding the divergence of the tested populations. If any sample group shows smaller divergence than average and is clearly overrepresented, the PCA result will be biased. The following numbers are gathered from the SmartPCA output and sorted by divergence. (These numbers represent only the available academic data, not real populations. I am quite sure that, for example, the divergence in Sweden is higher.)