Since the last post I have done a lot of testing, trying both to find the limitations of the analysis tool and to improve my own understanding of what all the results mean. There is still much work to do, but I am moving forward piece by piece, trying to shed light on Finnish genetic history. For this purpose I started my shared-LD tests with present-day populations rather than ancient ones, although it would be more intriguing to resolve the big historical questions of our deep past.
Data
The data mainly consists of publicly available academic samples that anyone can download over the internet. Additionally, I have a few genealogically classified Finnish samples. I use them to categorize the public Finnish samples, because the public data includes some Finns with foreign admixture.
Population      Samples  Source
Finns           96       1000 Genomes
Finns           7        my own collection
Norwegians      15       other sources
Poles           10       other sources
Belarussians    9        Est.BC
Chuvashes       16       Est.BC
Estonians       13       Est.BC
Lithuanians     10       Est.BC
Maris           15       Est.BC
Mordvas         14       Est.BC
Ukrainians      16       Est.BC
Swedes          3        my own collection
Preparing data
I found the maximum SNP overlap in my data to be around 550,000 SNPs and the minimum around 290,000 SNPs. The number of SNPs under test varies depending on the selected reference and target populations.
I also found that the minimum SNP space for reliable results is over 20 million SNPs. It is, however, likely that larger individual sample sizes (more SNPs per individual) would give steadier LD sharing, smoother roll-off curves and lower standard errors than a larger number of samples would. It would be better to have millions of SNPs per individual, but I did not see big quality differences when comparing the curves in this test with similar results obtained by other authors using the same programs, so I suggest that our data is comparable in terms of producing reasonable results.
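The SNP overlap described above is simply the set of markers genotyped in every population that takes part in a given run, which is why it shrinks or grows when references are swapped. A minimal sketch, assuming each population's panel is available as a set of SNP IDs; the `shared_snps` helper and the `rs…` IDs are hypothetical toy values, not real data:

```python
from functools import reduce


def shared_snps(panels, selected):
    """SNP IDs present in every selected population's panel (set intersection)."""
    # Any SNP missing from a single panel drops out of the whole run.
    return reduce(set.intersection, (panels[name] for name in selected))


# Hypothetical toy panels; real panels hold hundreds of thousands of SNP IDs.
panels = {
    "Finns": {"rs1", "rs2", "rs3", "rs4"},
    "Mordvas": {"rs1", "rs2", "rs3", "rs4"},
    "Belarussians": {"rs2", "rs3", "rs4"},
    "Swedes": {"rs1", "rs4", "rs5"},
}

for refs in (["Mordvas", "Belarussians"], ["Belarussians", "Swedes"]):
    overlap = shared_snps(panels, ["Finns", *refs])
    print(refs, len(overlap))
```

The same target gets three shared SNPs with one reference pair and only one with the other, mirroring how the overlap in the post ranges between roughly 290,000 and 550,000 SNPs.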
Preparing the Finnish data
In the first step I ran a Europe-wide PCA to detect possible foreign admixture and removed 13 Finnish samples that plotted to the west of my genealogical West Finns. Next I ran a new PCA including only Finnish samples and divided them into three groups: the 19 most eastern samples (after excluding 11 outliers), the 17 most western samples (including my genealogical West Finns), and the rest forming an intermediate group. This arrangement gave distinct eastern and western groups as well as a working Finnish reference (56 samples); the intermediate group probably consists of the purest present-day Finns.
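The grouping step can be sketched as ranking samples along one principal component and cutting off the two tails. This is a minimal sketch on simulated toy genotypes, not the author's actual PCA tool or settings; the helper names are hypothetical, the outlier exclusion is not shown, and the west/east cut sizes are illustrative. Because PC signs are arbitrary, one sample known to be western orients the axis, the role the genealogical West Finns play above:

```python
import numpy as np


def pca_scores(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project samples (rows of 0/1/2 allele counts) onto the top principal components."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the column-centred matrix: columns of U scaled by S are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def split_east_west(pc1: np.ndarray, n_east: int, n_west: int, west_anchor: int):
    """Rank samples on one PC; the two tails become East/West, the rest is intermediate."""
    # PC signs are arbitrary, so flip the axis if the known western sample sits high.
    if pc1[west_anchor] > np.median(pc1):
        pc1 = -pc1
    order = np.argsort(pc1)  # most western sample first
    west = order[:n_west]
    east = order[len(order) - n_east:]
    intermediate = order[n_west:len(order) - n_east]
    return east, west, intermediate


# Toy data only: 30 samples on a simulated west-to-east allele-frequency cline.
rng = np.random.default_rng(0)
n_samples, n_snps = 30, 2000
t = np.linspace(0.0, 1.0, n_samples)  # 0 = far west, 1 = far east
f_west = rng.uniform(0.05, 0.95, n_snps)
f_east = 1.0 - f_west
freqs = np.outer(1.0 - t, f_west) + np.outer(t, f_east)
genotypes = rng.binomial(2, freqs).astype(float)

pc = pca_scores(genotypes)
east, west, intermediate = split_east_west(pc[:, 0], n_east=5, n_west=5, west_anchor=0)
print(sorted(west.tolist()), sorted(east.tolist()), len(intermediate))
```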
Software
My aim is to use Reich's programs first, starting with Rolloff. Rolloff is a program that measures LD sharing in a target population, weighted by allele-frequency differences between two reference populations. You can search for different admixture routes into the target by changing the references. It also gives an estimate of the admixture time. This dating assumes a single pulse of admixture between populations related to the references, so continuous gene flow will give erroneous admixture times, although it still reveals real admixture.
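The single-pulse assumption corresponds to a weighted-LD curve that decays as A·exp(-n·d) + c, where d is genetic distance in Morgans and n is the number of generations since admixture. The sketch below fits that model by a simple grid search over n; it is an illustrative stand-in under that assumption, not Rolloff's own fitting code, and `fit_single_pulse` is a hypothetical name. Continuous gene flow would instead produce a mix of decay rates that this model squeezes into one n, which is why such dates can mislead:

```python
import math


def fit_single_pulse(dist_morgans, weighted_ld, max_generations=500):
    """Fit weighted_ld ~ A * exp(-n * d) + c; returns (n, A, c) with n in generations."""
    best = None
    for n in range(1, max_generations + 1):
        x = [math.exp(-n * d) for d in dist_morgans]
        # For a fixed n the model is linear in A and c, so solve it by least squares.
        mx = sum(x) / len(x)
        my = sum(weighted_ld) / len(weighted_ld)
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, weighted_ld)) / sxx
        c = my - a * mx
        sse = sum((yi - (a * xi + c)) ** 2 for xi, yi in zip(x, weighted_ld))
        if best is None or sse < best[0]:
            best = (sse, n, a, c)
    return best[1], best[2], best[3]


# Toy curve: a pulse 40 generations ago, sampled every 0.5 cM (0.005 Morgans).
dist = [0.005 * i for i in range(1, 61)]
curve = [0.02 * math.exp(-40 * d) + 0.001 for d in dist]
n, amp, const = fit_single_pulse(dist, curve)
print(n, round(amp, 4), round(const, 4))
```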
Results
All analyses were run using Rolloff's defaults, with the exception of the resolution, which was 0.5 cM instead of 1 cM. I tested both values and did not notice the smaller value increasing the standard error; on the contrary, it reduced it a bit. I also noticed that ALDER (another roll-off program) uses this smaller value. The smaller the value, the more detail of the LD decay we capture. Too fine a resolution, however, increases statistical noise.
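The resolution trade-off comes from binning SNP pairs by genetic distance: halving the bin width doubles the number of points on the curve but leaves about half as many pairs behind each point. The sketch below is a simplified version of a roll-off statistic, assuming each SNP pair is given as (distance in cM, LD in the target, product of the two SNPs' reference-frequency weights) and that the per-bin statistic is the correlation between LD and weight. It uses simulated toy pairs, and the helper names are hypothetical:

```python
import math
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation; NaN when either side has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def binned_weighted_ld(pairs, bin_cm):
    """pairs: (distance_cM, target_ld, weight_product) -> {bin_start_cM: (n_pairs, corr)}."""
    bins = defaultdict(list)
    for dist, ld, w in pairs:
        bins[int(dist // bin_cm)].append((ld, w))
    result = {}
    for k in sorted(bins):
        lds, ws = zip(*bins[k])
        result[round(k * bin_cm, 3)] = (len(lds), pearson(lds, ws))
    return result


# Toy SNP pairs: LD tracks the weight product and decays with distance (20 per Morgan).
random.seed(1)
pairs = []
for _ in range(4000):
    d = random.uniform(0.0, 10.0)
    w = random.gauss(0.0, 1.0)
    ld = 0.5 * w * math.exp(-0.2 * d) + random.gauss(0.0, 0.3)
    pairs.append((d, ld, w))

coarse = binned_weighted_ld(pairs, 1.0)
fine = binned_weighted_ld(pairs, 0.5)
print(len(coarse), len(fine))
```

With 0.5 cM bins each point rests on roughly half as many pairs as with 1 cM bins, so at some point finer bins start adding noise faster than detail.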
These results were surprising, but similar shared-LD tests have apparently never been done on Finns before, so I had no expectations. If someone finds these results unexpected, please do not shoot the messenger; I am happy to repeat my tests, perhaps under tighter quality control, if you wish. I would also be happy to see new results to evaluate possible differences.
These results suggest that the Finnish genetic makeup is the outcome of several migrations and admixture events, more than I could have expected from PCA and formal admixture analyses based on ordinarily LD-pruned data. The large genetic difference (in Fst distances) between East and West Finland might be due more to migration history than to genetic drift. Eastern Finnish results show a rather young southeastern or eastern admixture history (Mordvas), while western results show older southern admixture (present-day Belarussians). Both groups also show northeastern admixture (Mari -> Saami?). It is possible that all three populations are proxies; this is most likely true in the case of the Belarussians.
The shared history with present-day Scandinavians is smaller and older than expected, but this does not rule out possible ancient regional migrations from there to Finland. Unfortunately I do not have enough samples to check this, and regarding Scandinavian migrations to Finland before the Swedish era in Finland, my expectations rest more on ancient genomes. It is worth noting that I removed all known foreign admixture, including obvious Finland-Swedish samples. This was possible thanks to my genealogical western Finnish data.
It looks like there has been no particular Estonian migration to Finland since the common language diverged, and southern migrations to Finland bypassed Estonia.
I am going to estimate admixture proportions in the following analyses.
Admixture times for Finnish people
Related roll-off graphics
Related PCA plots
PCA dimensions 1 and 3
Edit 9.11.14
Yesterday I got feedback suggesting that I could verify my results by checking for French admixture among Finns. My first thought was: oh no, I am not going to start validating software that has been used in several academic studies. It is in principle unfair to ask me to do such a thing. But then I reconsidered. Why not? Using Spaniards, I could check whether the admixture time fits the Stone Age and the times when southern migration waves expanded to Finland. Here is the result:
Admixture time: 197.139 ± 55.497 generations = 5914 ± 1665 years.
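The post does not state the generation time, but the figures match 30 years per generation exactly (197.139 × 30 ≈ 5914, 55.497 × 30 ≈ 1665), so the sketch below assumes that value. To help compare the date with the Stone Age, it also converts the ±1 standard-error window into approximate calendar years; counting back from 2014, the year of this edit, is a further assumption (the samples were collected somewhat earlier), and the helper names are hypothetical:

```python
def generations_to_years(generations, std_error, years_per_generation=30.0):
    """Convert a Rolloff date and its standard error from generations to years.

    years_per_generation=30 is inferred from the post's numbers, not stated there.
    """
    return round(generations * years_per_generation), round(std_error * years_per_generation)


def calendar_range(years_ago, std_error, reference_year=2014):
    """Approximate calendar window (negative = BC), ignoring the missing year 0.

    reference_year=2014 is an assumption: the year of the edit, not the sampling year.
    """
    return reference_year - (years_ago + std_error), reference_year - (years_ago - std_error)


years, se = generations_to_years(197.139, 55.497)
print(years, se)                  # 5914 1665
print(calendar_range(years, se))  # (-5565, -2235)
```

Under these assumptions the ±1 SE window spans roughly 5565 to 2235 BC.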