If you have followed my blog, you may remember what I wrote about the Finnish settlements (Testing 23andme's Ancestry Composition continues / Finnish results), here:
“The term late settlement is used by Finnish historians and refers to areas that were mostly populated during the Swedish era through administrative transactions (the king’s orders) or by occupying areas in wars between Sweden and Novgorod (later Moscow). This actually means that the age of the Finnish reference group (used by 23andme) is around 500-700 years, and people living in older settlements …”
Analyzing young, expanding settlements and populations is, of course, a problem, because they show more genetic drift than old populations. Old rural populations also show genetic drift, but only very locally. I encountered the same problem with the Utah-CEU samples and resolved it by selecting the least drifted of them. Obviously I can’t do the same with the Finnish samples without my neutrality being questioned. I have also written about genetic drift here (the chapter is in English):
“At first I found that all groups with high genetic drift due to isolation will strongly distort the result. It was easy to see the effect of genetic drift and the resulting distortion; for example, I dropped a lot of HGDP-CEU samples for being too homogeneous or drifted. Young genetic drift, which generates its own genetic components in analyses within one sample group, doesn’t reflect the group’s older common history with other groups. This is why groups with genetic drift are useless when searching for the common history of populations. They also distort the root population they come from. This kind of genetic drift can be found in rapid expansions of some subpopulation, for example in villages or in smaller cultural communities.”
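The post does not say how the least drifted Utah-CEU samples were chosen. One simple way to do it, shown here only as an assumption, is to drop the samples that are most similar to the rest of their own group, using pairwise allele-sharing (IBS) distance on 0/1/2 genotypes. The function names `ibs_distance_matrix` and `least_drifted` are hypothetical, and the data is simulated, not the real samples:

```python
import numpy as np


def ibs_distance_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Pairwise allele-sharing (IBS) distance for 0/1/2 genotypes (samples x SNPs).

    Distance = mean over SNPs of |g_i - g_j| / 2, so it lies between 0 and 1.
    """
    g = genotypes.astype(float)
    diff = np.abs(g[:, None, :] - g[None, :, :])  # shape (n, n, n_snps)
    return diff.mean(axis=2) / 2.0


def least_drifted(genotypes: np.ndarray, keep: int) -> np.ndarray:
    """Indices of the `keep` samples least similar to the rest of their group.

    Close relatives or members of a small isolate share more alleles with the
    group than average, so they get a low mean distance and are dropped first.
    """
    d = ibs_distance_matrix(genotypes)
    n = d.shape[0]
    mean_to_others = d.sum(axis=1) / (n - 1)  # diagonal is zero, so this averages the others
    most_distinct_first = np.argsort(mean_to_others)[::-1]
    return np.sort(most_distinct_first[:keep])


# Toy example with simulated data: sample 1 is a copy of sample 0 (an extreme "relative")
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, 2000)
group = rng.binomial(2, freqs, size=(6, 2000))
group[1] = group[0]
print(least_drifted(group, keep=4))
```

The pairwise array grows with samples squared times SNPs, so this is only practical for one modest sample group at a time; it is a heuristic, not a formal test of drift.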
So what to do?
If we want to see past the birth of local settlements, we should remove genetic drift from the results and prevent the PCA from generating drift components.
This is not difficult at all. We only need to be aware of the sample size (number of samples) that triggers the formation of young drift components. It can be anything from two to tens of samples, depending on the sample set. I don’t know exactly how many late settlement Finns it takes to reach this threshold, because I don’t have enough of them. But I don’t need to know, because even a few late settlement Finns from the same root population show the trend: where they belong without genetic drift and where they came from (or at least where a significant part of their ancestors came from). I can add more samples and nothing will change until I reach the threshold at which the PCA starts generating drift components on the dimensions we want to see.
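The threshold effect can be illustrated with a small simulation. This is a sketch under invented assumptions, not the data or software used here: two reference populations A and B (30 samples each) split from a common ancestor with drift F = 0.02 under the Balding-Nichols model, and a young offshoot D of B gets extra drift F = 0.02. PCA is done as an eigen-decomposition of the sample-by-sample covariance of standardized genotypes. PC1 is the A/B split; D only gets its own component (PC2 standing clearly above PC3, the top of the noise) once it has enough samples. Random-matrix theory predicts such a threshold, and it moves with drift strength and the number of SNPs. The functions `balding_nichols` and `pca_eigenvalues` and all parameter values are hypothetical:

```python
import numpy as np


def balding_nichols(rng, parent_freqs, f):
    """Daughter-population allele frequencies after drift of strength F (Balding-Nichols model)."""
    a = parent_freqs * (1 - f) / f
    b = (1 - parent_freqs) * (1 - f) / f
    return rng.beta(a, b)


def pca_eigenvalues(n_drifted, n_ref=30, n_snps=5000, f_split=0.02, f_drift=0.02, seed=0):
    """Eigenvalues (largest first) of the sample covariance of a simulated panel.

    Populations A and B (n_ref samples each) split from a common ancestor;
    D (n_drifted samples) is a young offshoot of B with extra drift f_drift.
    """
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.1, 0.9, n_snps)
    p_a = balding_nichols(rng, p_anc, f_split)
    p_b = balding_nichols(rng, p_anc, f_split)
    p_d = balding_nichols(rng, p_b, f_drift)

    # One row of allele frequencies per sample, then 0/1/2 genotypes
    freqs = np.vstack([np.tile(p_a, (n_ref, 1)),
                       np.tile(p_b, (n_ref, 1)),
                       np.tile(p_d, (n_drifted, 1))])
    genotypes = rng.binomial(2, freqs)

    # Standardize each SNP by its pooled allele frequency, dropping monomorphic SNPs
    p_hat = genotypes.mean(axis=0) / 2
    keep = (p_hat > 0) & (p_hat < 1)
    x = (genotypes[:, keep] - 2 * p_hat[keep]) / np.sqrt(2 * p_hat[keep] * (1 - p_hat[keep]))

    cov = x @ x.T / x.shape[1]            # samples x samples covariance
    return np.linalg.eigvalsh(cov)[::-1]  # eigvalsh is ascending, so reverse


for k in (2, 5, 10, 20, 40):
    ev = pca_eigenvalues(n_drifted=k)
    # A PC2/PC3 ratio near 1 means D is still hidden in the noise
    print(f"{k:2d} drifted samples: PC2/PC3 eigenvalue ratio = {ev[1] / ev[2]:.3f}")
```

With stronger drift in D the ratio jumps at a smaller group size, and with weaker drift only at a larger one, which fits the point that the threshold depends on the sample set.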
Results
The first PCA includes the same Eurasian samples I used when analyzing the old settlement Finns. Here the Finnish samples (SK0001, SK0002 and SK0003) fall clearly inside the North Russian cluster, but on the opposite side from the Slavic Belarusians. With more late settlement Finns added, this would look even more dramatic. I can’t avoid concluding that Northern Russians (Vologda people and Mordvas) are a mixture of Finnish look-alike people and Slavs.
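Reading a population as a mixture from its PCA position rests on a known property: under a simple two-way admixture model, an admixed group’s expected PC coordinates lie on the line between its two sources. A crude sketch of turning that position into a mixing fraction is to project onto the line joining the two source centroids. The function `mixture_fraction` and all coordinates below are invented for illustration; drift in either source, or a missing third source, would bias the estimate:

```python
import numpy as np


def mixture_fraction(sample_pcs, source1_pcs, source2_pcs):
    """Fraction of source 1 implied by a position between two source clusters in PC space.

    0 = at the source-2 centroid, 1 = at the source-1 centroid. Values outside
    [0, 1] mean the sample does not lie between the two sources.
    """
    c1 = np.mean(source1_pcs, axis=0)
    c2 = np.mean(source2_pcs, axis=0)
    axis = c1 - c2
    return np.dot(np.asarray(sample_pcs) - c2, axis) / np.dot(axis, axis)


# Invented PC1/PC2 coordinates, for illustration only
finnish_like = np.array([[0.0, 1.0], [0.2, 1.2]])
slavic = np.array([[1.0, 0.0], [1.2, -0.2]])
north_russian = np.array([[0.6, 0.5], [0.4, 0.7]])
print(mixture_fraction(north_russian, finnish_like, slavic))
```

Only the components kept in the plot enter the calculation, so a fraction read from two PCs can differ from one read from more.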
You can see an image with better resolution here.
Second, here is the same European plot I used before for the old settlement Finns. Compared with the Eurasian plot, many Caucasian and Eurasian components are now missing, and the Finnish samples move towards the Lithuanians, who represent the gene pool of the eastern Baltic Sea region. This effect would be stronger with Estonians, and strongest with late settlement Finns. It is due to gene flow between late settlement Finns and old settlement Finns, and between both of them and Estonians. I don't know whether this gene exchange happened before or after they adopted North Russian genes. Perhaps the Baltic-Finnic gene pool was much more widespread before the Slavic expansion.
You can see an image with better resolution here.
My last graphic shows how the three tested Finns relate to the effective PCA components. Sorry, it is available only for the European PCA; I was too lazy to work with the larger Eurasian dataset.
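The post does not say how the component graphic was produced. One common way to get each sample’s position on the components is to fit the PCA on a reference panel and project the tested samples onto those axes; projected samples cannot form a drift component of their own. A sketch, with hypothetical function names and random stand-in data rather than the real samples, using a common per-SNP standardization (the author’s tool may scale differently):

```python
import numpy as np


def fit_reference_pca(ref_genotypes, n_components):
    """PCA axes from a reference panel of 0/1/2 genotypes (samples x SNPs)."""
    p = ref_genotypes.mean(axis=0) / 2
    keep = (p > 0) & (p < 1)  # drop SNPs monomorphic in the reference
    p = p[keep]
    scale = np.sqrt(2 * p * (1 - p))
    x = (ref_genotypes[:, keep] - 2 * p) / scale
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return keep, p, scale, vt[:n_components]  # rows of vt are SNP loadings per PC


def project_samples(genotypes, keep, p, scale, axes):
    """Scores of samples on the reference PCs, standardized with reference frequencies."""
    x = (genotypes[:, keep] - 2 * p) / scale
    return x @ axes.T


# Random stand-in data: 50 reference samples and 3 "tested" samples, 1000 SNPs
rng = np.random.default_rng(1)
freqs = rng.uniform(0.1, 0.9, 1000)
reference = rng.binomial(2, freqs, size=(50, 1000))
tested = rng.binomial(2, freqs, size=(3, 1000))

keep, p, scale, axes = fit_reference_pca(reference, n_components=4)
for i, scores in enumerate(project_samples(tested, keep, p, scale, axes), start=1):
    print(f"sample {i}:", np.round(scores, 2))
```

Projecting the reference panel itself reproduces its ordinary PC scores, so reference and tested samples land in the same coordinate system.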