Late settlement Finns

If you have followed my blog, you may remember what I wrote about the Finnish settlements (Testing 23andme's Ancestry Composition continues / Finnish results), here:

“The term late settlement is used by Finnish historians and means areas which were mostly populated during the Swedish era by administrative transactions (king’s orders) or by occupying areas in wars between Sweden and Novgorod/ later Moscow.   This means actually that the age of Finnish reference group (used by 23andme) is around 500-700 years and people living in older settlements …”


Analyzing young, expanding settlements and populations is problematic, because they carry more genetic drift than old populations. Old rural populations also show genetic drift, but only very locally. I ran into the same problem with the Utah-CEU samples and resolved it by selecting the least drifted of them. Obviously I can't do the same with the Finnish samples without my neutrality being questioned. I also wrote about genetic drift here (the chapter written in English):


“At first I found that all groups with high genetic drift due to isolation will strongly distort the result. It was easy to see the effect of genetic drift and the consequential distortion, for example I dropped out a lot of HGDP-CEU samples being too homogeneous or drifted. Young genetic drift generating own genetic components in analyses inside one sample group doesn’t figure their older common history with other groups, the reason why groups with genetic drift are useless in searching the common history of populations. They will also affect the root population where they come from. This kind of genetic drift can be found from rapid expansions in some subpopulation, like in villages or in smaller cultural communities.”


So what to do?


If we want to see behind the birth of local settlements, we should get rid of the genetic drift in the results and prevent the PCA from generating drift components. This is not difficult at all. We only need to be aware of the sample size (number of samples) that triggers the formation of young drift components. It can be anything from two to several tens of samples, depending on the sample set. I don't know exactly how many late settlement Finns are needed to reach the threshold, because I don't have enough of them. But I don't need to know, because even a few late settlement Finns belonging to the same root population show the trend: where they fall without genetic drift and where they came from (or at least where a significant part of their ancestors came from). I can add more samples and nothing will change until I reach the threshold at which the PCA starts generating drift components on the dimensions we want to see.
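The threshold effect described above can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not the data or software used for the plots in this post: the allele frequencies, the drift strength (0.02), the group sizes, the SNP count and all function names are hypothetical. It simulates a root population, a second reference population, and a young group drifted away from the root, then asks how strongly any of the first three principal components separates the young group from its root as the number of young samples grows.

```python
import numpy as np

def drift(freqs, fst, rng):
    """Perturb allele frequencies to mimic genetic drift of strength fst (toy model)."""
    sd = np.sqrt(fst * freqs * (1.0 - freqs))
    return np.clip(freqs + rng.normal(0.0, sd), 0.01, 0.99)

def genotypes(freqs, n, rng):
    """Draw n diploid genotypes (allele counts 0/1/2) for every SNP."""
    return rng.binomial(2, freqs, size=(n, freqs.size)).astype(float)

def pca_scores(x, k):
    """Scores on the first k principal components (SVD of column-centred data)."""
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, :k] * s[:k]

def drift_signal(n_drifted, n_ref=40, n_snps=3000, k=3, seed=1):
    """Largest squared correlation between any of the top-k PCs and the
    'young drifted group vs. root' label, using root and young samples only.
    Values near 0: no PC is spent on the drift. Values near 1: a drift
    component has formed. All parameter values are hypothetical."""
    rng = np.random.default_rng(seed)
    f0 = rng.uniform(0.2, 0.8, n_snps)       # ancestral allele frequencies
    f_root = drift(f0, 0.02, rng)            # root population
    f_other = drift(f0, 0.02, rng)           # unrelated reference population
    f_young = drift(f_root, 0.02, rng)       # young settlement drifted from root
    x = np.vstack([
        genotypes(f_root, n_ref, rng),
        genotypes(f_other, n_ref, rng),
        genotypes(f_young, n_drifted, rng),
    ])
    scores = pca_scores(x, k)
    # Compare only root samples (first block) with young samples (last block).
    keep = np.concatenate([np.arange(n_ref),
                           np.arange(2 * n_ref, 2 * n_ref + n_drifted)])
    label = np.concatenate([np.zeros(n_ref), np.ones(n_drifted)])
    r = [np.corrcoef(scores[keep, j], label)[0, 1] for j in range(k)]
    return float(np.max(np.square(r)))

if __name__ == "__main__":
    for n in (2, 5, 10, 20, 40):
        print(n, "young samples -> drift signal", round(drift_signal(n), 2))
```

In this toy setup a handful of young samples stays attached to the root population in the leading components, while a large young group gets a component of its own. Where the switch happens depends on the drift strength, the number of SNPs and the rest of the sample set, so the numbers say nothing about the real threshold for late settlement Finns.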




The first PCA includes the same Eurasian samples that I used when analyzing old settlement Finns. Here the Finnish samples (SK0001, SK0002 and SK0003) are located clearly inside the North Russian cluster, but on the opposite side from the Slavic Belarusians. Adding more late settlement Finns would make this look more dramatic. I can't avoid concluding that Northern Russians (Vologda people and Mordvas) are a mixture of Finnish look-alike people and Slavs.





You can see an image with better resolution here


Secondly, here is the same European plot as before with old settlement Finns. Many of the Caucasian and Eurasian components are now missing compared to the Eurasian plot, and the Finnish samples move towards the Lithuanians, who represent the gene pool around the eastern Baltic Sea region. This effect would be stronger with Estonians, and strongest with late settlement Finns. It is due to gene flow between late settlement Finns and old settlement Finns, and between them and Estonians. I don't know whether this gene exchange happened before or after they adopted North Russian genes. Maybe the Baltic-Finnic gene pool was much more widespread before the Slavic expansion.





You can see an image with better resolution here


My last graphic shows how the three Finns under test relate to the effective PCA components. Sorry, this is available only for the European PCA; I was too lazy to work with the bigger Eurasian data.




