Friday, 7 November 2014

The long and dark shadow of history

Since the last post I have done a lot of testing. I have tried to find the limitations of the analysis tool and to improve my own understanding of what the results mean. There is still much work to do, but I am moving forward piece by piece, trying to shed light on Finnish genetic history. For this purpose I started my shared-LD tests with present-day populations rather than ancient ones, although it would be more intriguing to resolve the big historical questions of our deep past.


The data consists mainly of publicly available academic samples; anyone can download the same samples over the internet. In addition I have a few genealogically classified Finnish samples. I use them to categorize the public Finnish samples, because the public data includes some Finns with foreign admixture.

Population     Samples  Source
Finns          96       1000genomes
Finns          7        my own collection
Norwegians     15       other sources
Poles          10       other sources
Belarusians    9        Est.BC
Chuvashes      16       Est.BC
Estonians      13       Est.BC
Lithuanians    10       Est.BC
Maris          15       Est.BC
Mordvas        14       Est.BC
Ukrainians     16       Est.BC
Swedes         3        my own collection

Preparing data

In my data the SNP overlap was at most around 550,000 SNPs and at least around 290,000 SNPs. The number used in a given test varies with the selected reference and target populations. I also found that the minimum SNP space for reliable results is over 20 million SNPs. It is likely, however, that more SNPs per individual would give steadier LD sharing, smoother roll-off curves and smaller standard errors than a larger number of samples would. It would be better to have millions of SNPs per individual, but I did not see big quality differences when comparing the curves in this test with similar results obtained by authors using the same programs, so I suggest that my data is of comparable quality in terms of giving reasonable results.
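
The overlap figures above come from intersecting the SNP lists of the datasets that take part in a given test. Here is a minimal Python sketch of that idea. It is not the procedure I actually ran, and the function name and the rsIDs are made up for illustration.

```python
def shared_snps(*snp_lists):
    """Return the set of SNP IDs that occur in every dataset."""
    common = set(snp_lists[0])
    for ids in snp_lists[1:]:
        common &= set(ids)
    return common

# Hypothetical SNP IDs, for illustration only (not real data)
dataset_a = ["rs1", "rs2", "rs3", "rs4"]
dataset_b = ["rs2", "rs3", "rs4", "rs5"]
dataset_c = ["rs3", "rs4", "rs6"]

overlap = shared_snps(dataset_a, dataset_b, dataset_c)
print(sorted(overlap))
```

Adding a dataset genotyped on a different chip can only shrink the intersection, which is why the usable SNP count changes with the choice of references and target.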

Preparing  the Finnish data 

As a first step I ran a European-level PCA to look for possible foreign admixture and removed 13 Finnish samples located to the west of my genealogical West Finns. Next I ran a new PCA including only Finnish samples and divided them into three groups: the 19 most eastern samples (excluding 11 outliers), the 17 most western samples (including my genealogical West Finns), and the rest, forming an intermediate group. This arrangement gave distinct eastern and western groups as well as a working Finnish reference (56 samples), on the assumption that the intermediate group probably consists of the purest present-day Finns.


My aim is to use Reich's programs first, starting with Rolloff. Rolloff is software that outputs the LD a target population shares with two reference populations. By changing the references you can explore different admixture routes for the target. It also gives an estimate of the admixture time. The dating assumes a single pulse of admixture between the sources represented by the references, so continuous gene flow will give erroneous admixture times, while still indicating real admixture.
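
To make the dating idea concrete: under a single admixture pulse n generations ago, the weighted LD between markers separated by a genetic distance d (in Morgans) is expected to decay roughly as A * exp(-n * d). Below is a minimal Python sketch, not Rolloff's own code, that recovers n from a synthetic, noise-free curve with a log-linear least-squares fit. The function name, the amplitude, the distance grid and the value of 60 generations are illustrative assumptions only. Real curves are noisy and need a proper non-linear fit with standard errors.

```python
import math

def fit_generations(distances_morgans, weighted_ld):
    """Estimate n in LD(d) = A * exp(-n * d) from the slope of ln(LD) against d."""
    xs = list(distances_morgans)
    ys = [math.log(v) for v in weighted_ld]  # requires strictly positive LD values
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope is -n, so negate it

# Synthetic, noise-free curve (assumed values): bins every 0.5 cM up to 20 cM
true_n = 60.0
distances = [0.005 * i for i in range(1, 41)]  # 0.5 cM = 0.005 Morgans
curve = [0.02 * math.exp(-true_n * d) for d in distances]

estimated_n = fit_generations(distances, curve)
print(round(estimated_n, 3))
```

With continuous gene flow the observed curve is a mixture of many such exponentials, so a single fitted n no longer corresponds to one real date.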


All analyses were run with Rolloff's defaults, except that the resolution was 0.5 cM instead of 1 cM. I tested both values and did not notice the lower value increasing the standard error; on the contrary, it reduced it slightly. I also noticed that Alder (another roll-off program) uses this lower value. The lower the value, the more LD sharing we pick up. Too fine a resolution, however, increases statistical noise.

These results were surprising, but the truth is that similar shared-LD tests have apparently never been done for Finns before, so I had no expectations. If someone finds these results unexpected, please do not shoot the messenger; I am happy to repeat my tests, perhaps under tighter quality control, if you wish. I would also be happy to see new results and evaluate any differences.

These results suggest that the Finnish genetic makeup is the outcome of several migrations and admixture events, more than I would have expected from PCA and formal admixture analyses, which are typically based on LD-pruned data. The large genetic difference (in Fst distances) between East and West Finland might be due more to migration history than to genetic drift. The eastern Finnish results show rather recent southeastern or eastern admixture (Mordvas), while the western results show older southern admixture (present-day Belarusians). Both groups also show northeastern admixture (Mari -> Saami?). It is possible that all three populations are only proxies; this is most likely true of the Belarusians.

The shared history with present-day Scandinavians is smaller and older than expected, but this does not rule out ancient regional migrations from there to Finland. Unfortunately I do not have enough samples to check this, and for Scandinavian migrations to Finland before the Swedish era my hopes rest more on ancient genomes. It is worth noting that I removed all known foreign admixture, including obvious Finland-Swedish samples. This was possible thanks to my genealogical western Finnish data.

It looks as if there was no particular Estonian migration to Finland after the common language diverged, and as if southern migrations to Finland bypassed Estonia.

I am going to estimate the admixture proportions in the following analyses.

Admixture times for Finnish people

Related roll-off graphics

Related PCA-plots

PCA dimensions 1 and 2

PCA dimensions 1 and 3

edit 9.11.14

Yesterday I received feedback suggesting that I could verify my results by checking for French admixture among Finns. My first thought was: oh no, I am not going to start validating software that has been used in several academic studies. In principle it is unfair to ask me to do such a thing. But then I reconsidered. Why not? Using Spaniards instead, I could check whether the admixture time fits the Stone Age and the times when southern migration waves expanded to Finland. Here is the result.
Admixture time: 197.139 generations +- 55.497 = 5914 years +- 1665 years.
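
The conversion from generations to years above is a plain multiplication. The figures imply 30 years per generation (5914 / 197.139 is 30.0), so that is the value assumed in this small Python sketch; the function and constant names are only illustrative.

```python
YEARS_PER_GENERATION = 30  # assumption implied by the figures in the post

def generations_to_years(generations, std_error, years_per_generation=YEARS_PER_GENERATION):
    """Scale an admixture date and its standard error from generations to years."""
    return generations * years_per_generation, std_error * years_per_generation

years, years_se = generations_to_years(197.139, 55.497)
print(f"{years:.0f} years +- {years_se:.0f} years")
```

A different generation length, for example 25 or 29 years, scales both the date and its error by the same factor.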
