Thursday, February 20, 2014

Looking closer: The Finnish cluster in Europe

New PCAs

A new PCA analysis, this time including only Europeans. The first plot, showing dimensions 1 and 2, reveals both East European and Scandinavian affinities for Finns. More precisely, the East European affinity of Eastern Finns is closer to the eastern Vologda samples, while Western Finns match the western Vologda samples and Belarusians more closely. The Scandinavian affinity on the second dimension looks fairly constant for all Finns. This could mean that the shared affinity between Finns and Scandinavians is older than the present genetic shape of these two populations, and that the difference in East European affinity among Finns indicates different origins in Eastern Europe for Eastern and Western Finns. This explanation is also supported by the position of the Scandinavian and some CEU samples, which seem to diverge from Central Europeans toward Finns on dimension 2. The second plot, showing dimensions 1 and 3, confirms the East European affinity of Finns, while Scandinavians keep their position near Central Europeans.

[Figure: PCA of Europeans, dimensions 1 and 2]

[Figure: PCA of Europeans, dimensions 1 and 3]
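The post does not say which software produced these plots. For readers who want to see what a PCA of genotype data involves, here is a minimal Python sketch using NumPy, assuming a complete 0/1/2 genotype matrix with no missing calls. It centres each SNP, scales it by its binomial standard deviation (a common normalization in population-genetic PCA) and takes the singular value decomposition. The function name and the simulated two-group data are purely illustrative, not the author's pipeline.

```python
import numpy as np


def genotype_pca(genotypes, n_components=3):
    """Return PC scores and explained-variance ratios for a 0/1/2 genotype matrix.

    genotypes: array of shape (n_individuals, n_snps); missing data not supported.
    """
    g = np.asarray(genotypes, dtype=float)
    # Sample allele frequency of each SNP.
    p = g.mean(axis=0) / 2.0
    # Monomorphic SNPs carry no information and would cause division by zero.
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    # Centre each SNP and scale it by its binomial standard deviation.
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Singular value decomposition: PC scores are the columns of U scaled by S.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Synthetic example: two groups whose allele frequencies differ slightly.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(20, n_snps))
group_b = rng.binomial(2, freq_b, size=(20, n_snps))
scores, explained = genotype_pca(np.vstack([group_a, group_b]))
print(scores.shape, explained.round(3))
```

With real data the rows would be individuals such as the Finnish, Vologda and Scandinavian samples, and plotting column 0 of `scores` against column 1 (or column 2) gives dimension 1 vs 2 (or 1 vs 3) plots of the kind discussed here.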

Retesting Finestructure

I have run a new and interesting experiment with Finestructure. I tried it a couple of years ago, testing it for several days, and found some problems. It seemed to exaggerate young, small, isolated populations, increasing their similarity with other populations in direct proportion to the haplotypes shared within the isolate. This effect is difficult to describe precisely in words, although testing clearly revealed it. After seeing this problem I gave up on Finestructure. Perhaps the problem lay partly in Chromopainter and its data selection. In my new experiment I used IBS data as the input to Finestructure. The authors of Finestructure claim that IBS data is much worse than the data generated by Chromopainter. My results show that IBS data gives fairly good results, better than my previous tests with Chromopainter data, which were heavily distorted by the problem described above. Much more testing would be needed before this could be proved beyond doubt, and even then many readers would not believe me, so for now it is enough for me to know it. I think the effect is basically the same one I have seen many times in population genetic tests: many analyses “infer” that the most homogeneous populations, those with the highest shared genomic similarity, are root populations. This means that those analyses often pick their typical genetic features from the most homogeneous subpopulations. But this is a big mistake; in many cases the most homogeneous subpopulation is younger than the root population it once diverged from. In other words, a portrait does not get better by increasing the contrast: we are likely to lose nuances, and sometimes the profile ends up resembling the neighbor's.
So I spent a little more time testing, and I think I found a way to avoid the false gene pumping between populations described above. This test done with IBS data definitely looks very clean. Some problems can be seen in sorting out Mediterranean populations, but in Northern and Eastern Europe the grouping is amazing. The Finns cluster with Northern Russians, which is of course correct and follows the known history: Northwest Russia was the homeland of many Finnic-speaking groups before russification took place. A tree model cannot simultaneously show that the Finns also belong to the Fennoscandinavian group.

The graphic also shows affinities between individuals.

[Figure: Finestructure result based on IBS data]
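The post does not describe how the IBS data was computed or converted for Finestructure. As a rough illustration only, this Python sketch computes the usual allele-sharing IBS similarity between every pair of individuals: at each SNP an identical genotype counts 1, one shared allele 0.5 and none 0, averaged over SNPs. As far as I know this is the same quantity PLINK reports as DST in its `--genome` output. It assumes no missing data, and the function name and toy genotypes are hypothetical.

```python
import numpy as np


def ibs_similarity(genotypes):
    """Pairwise identity-by-state (IBS) similarity from 0/1/2 genotype counts.

    At each SNP two individuals share 2 - |g_i - g_j| alleles by state, so
    1 - mean(|g_i - g_j|) / 2 is the average fraction of shared alleles.
    Assumes no missing data.
    """
    g = np.asarray(genotypes, dtype=float)
    # Broadcasting builds an (n, n, n_snps) array of absolute differences;
    # for large panels a chunked loop over individuals would use less memory.
    diff = np.abs(g[:, None, :] - g[None, :, :])
    return 1.0 - diff.mean(axis=2) / 2.0


genotypes = [
    [0, 1, 2, 2],
    [0, 1, 2, 1],
    [2, 1, 0, 0],
]
print(ibs_similarity(genotypes).round(3))
```

In the toy example the first two individuals share 0.875 of their alleles by state, while the first and third share only 0.25.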

Friday, February 7, 2014

PCA update: one more Scandinavian sample

I have now finished gathering samples and will go on to more specific analyses. Here is the last update of the PCA plots with the basic data. I think all the PC analyses have been quite good, matching well with first-rate academic studies I have seen.

Europe, dimensions 1 and 2:

[Figure: PCA of Europe, dimensions 1 and 2]

Europe, dimensions 1 and 3:

[Figure: PCA of Europe, dimensions 1 and 3]

Thursday, February 6, 2014

Mordovian affinity in Finland

The genetic origin of Finns is unknown to the general public. If you follow the public discussion, including assorted scientific views, you find ideas that resemble a conspiracy theory more than common sense. It is hard to find neutral scientific work with natural explanations. I am going to bite off a small piece of this cake. Unfortunately I can't do very comprehensive work, because I don't have the same resources as professional scientists, but I have the freedom to do my own work. It takes nothing away from anyone. It is possible to do fair work if you familiarize yourself with the methods and data, and know something about Finnish history.

Test goals

My goal is to see what Eastern Finns have in common with two key populations in Russia: Mordovians and North Asians. Finnish scientists define Eastern Finns as a mixture of Karelians and Tavastians. The history of the Tavastians from the first millennium onward is quite well documented by linguists and archaeologists, and there is no reason to ponder who they are: Tavastians are ancient Finns who have lived in Finland for a very long time. The history of the Karelians is less clear. They appeared on the historical scene around 1000 years ago, just after the Slavic eastward expansion started. In those times numerous Finnic-speaking groups lived in the area of present-day Russia. There are many possibilities for the origin of the Karelians: they may be of their own origin, or a mix of several Finnic groups east of Finland or in the neighborhood of the ancient Tavastians in Finland. Today Finnish enthusiasts have done great work with East Finnish Y-DNA and resolved the routes by which their paternal lines arrived. Most Eastern Finnish paternal results suggest an origin in Russia, within the area from Finland to near Moscow. But my goal is to look at autosomal ancestry, which tells more about admixture between populations.

Additionally, I try to find out the possible Asian admixture among Eastern Finns. As I found earlier, the North Asian admixture usually seen among Finns is vague; see here and here. It appears only if we use samples from the Mediterranean or Atlantic regions, or from Central Europe with a prominent Mediterranean or Atlantic affinity. Obviously those samples dominate the test arrangement and assign some Northeast European genetic attributes to the North Asian component whenever those attributes are shared with North Asians. The lesson would be: don't use presumably non-ancestral samples if you want a good analysis within a certain time span. Easier said than done.


Test arrangement


  • Han Chinese – defining East Asian affinity
  • Nganassans  – defining North Asian affinity
  • Mordovians – defining Northeast European / East-Finnic ancestry
  • Belorussians – defining Slavic ancestry
  • Scandinavians – defining Fennoscandinavian ancestry
  • East Finns – 3 samples
  • West Finns – 5 samples for comparison

Data selections and preparation

  • Asian and North/Northeast European groups are roughly the same size, to prevent under- or oversampling when searching for Asian admixture.
  • Mordovians were selected in a pre-analysis that kept the individuals with the most Finnish-looking genetic profiles. The other end of the Mordovian range resembles Slavs.
  • Nganassans are pruned to exclude outliers, which had around 10-50% European admixture.
  • Scandinavians are the group with the least Finnish admixture, selected in a pre-analysis. Two of the 7 individuals in my Scandinavian sample group were dropped for this reason.
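The balancing rule in the first bullet can be checked against the "Individuals" column of the result tables in this post: 19 Asian individuals (Han and Nganassans) against 18 North/Northeast European ones. The Python sketch below does that count and adds a hypothetical outlier filter for the Nganassan pruning step. The 0.10 cutoff, the function names and the Nganassan IDs and values are illustrative assumptions, not the author's actual data or procedure.

```python
def drop_admixed(individuals, max_european=0.10):
    """Keep IDs whose European ancestry proportion is below the cutoff.

    individuals: list of (sample_id, european_proportion) pairs taken from a
    pre-analysis. The 0.10 cutoff is an assumption based on the 10-50%
    European admixture reported for the excluded Nganassan outliers.
    """
    return [sid for sid, q in individuals if q < max_european]


def block_sizes(sizes, asian_pops=("Han", "Ngan")):
    """Total sample counts for the Asian block and the European block."""
    asian = sum(n for pop, n in sizes.items() if pop in asian_pops)
    european = sum(n for pop, n in sizes.items() if pop not in asian_pops)
    return asian, european


# Group sizes as listed in the "Individuals" column of the East Finn tables.
sizes = {"Han": 10, "Ngan": 9, "Mrd": 5, "EFinn": 3, "Belarus": 5, "Scand": 5}
print(block_sizes(sizes))  # (19, 18)

# Hypothetical pre-analysis values, for illustration only.
ngan_pre = [("ngan01", 0.02), ("ngan02", 0.31), ("ngan03", 0.00)]
print(drop_admixed(ngan_pre))  # ['ngan01', 'ngan03']
```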



  • Structure 2.3.4 (rel. 2012)
  • Run parameters: 20000 initial (burn-in) cycles, 200000 analysis cycles
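Assuming that "initial" cycles means the burn-in period, these run settings correspond to three entries of Structure's `mainparams` file: `MAXPOPS` (K), `BURNIN` and `NUMREPS`. The small Python helper below, whose name is hypothetical, just prints those lines for K = 3, 4 and 5, the values used in the East Finn runs. A real `mainparams` file also needs the input file, `NUMINDS`, `NUMLOCI` and the model settings, which are left out here.

```python
def mainparams_lines(k, burnin=20000, numreps=200000):
    """Run-length settings of a Structure 'mainparams' file for one value of K."""
    return [
        f"#define MAXPOPS {k}",       # number of clusters (K)
        f"#define BURNIN {burnin}",    # burn-in: the "initial" cycles
        f"#define NUMREPS {numreps}",  # MCMC cycles kept for the analysis
    ]


# One settings block per K used in the East Finn runs (K=3, 4 and 5).
for k in (3, 4, 5):
    print("\n".join(mainparams_lines(k)))
    print()
```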




East Finns, K=3

Pop      1      2      3      Individuals
Han      0.996  0.003  0.000  10
Ngan     0.971  0.029  0.000   9
Mrd      0.003  0.279  0.718   5
EFinn    0.004  0.303  0.693   3
Belarus  0.001  0.014  0.985   5
Scand    0.001  0.016  0.983   5

East Finns, K=4

Pop      1      2      3      4      Individuals
Han      0.022  0.977  0.000  0.001  10
Ngan     0.044  0.955  0.000  0.000   9
Mrd      0.297  0.000  0.698  0.005   5
EFinn    0.304  0.001  0.692  0.002   3
Belarus  0.027  0.000  0.973  0.000   5
Scand    0.027  0.000  0.970  0.003   5

East Finns, K=5

Pop      1      2      3      4      5      Individuals
Han      0.007  0.680  0.313  0.000  0.000  10
Ngan     0.011  0.249  0.740  0.000  0.000   9
Mrd      0.808  0.000  0.001  0.013  0.178   5
EFinn    0.790  0.000  0.000  0.209  0.000   3
Belarus  0.999  0.000  0.000  0.001  0.000   5
Scand    0.997  0.000  0.000  0.002  0.001   5


 West Finns, K=3


For Western Finns the result shows around half of the supposed Northeast European admixture, placing them halfway between Scandinavians and Mordovians. This fits well with Finnish genetic studies, in which Western Finns fall halfway between Eastern Finns (Eastern Finns and Mordovians have an equal NE European affinity) and Scandinavians. Western Finns trigger a slightly different Northeast European cluster than Eastern Finns, one that includes some common affinity between Western Finns, Mordovians, Belorussians and Scandinavians.



Pop      1      2      3      Individuals
Han      0.022  0.978  0.000  10
Ngan     0.045  0.955  0.000   9
Mrd      0.347  0.000  0.653   5
WFinn    0.170  0.000  0.830   5
Belarus  0.059  0.000  0.941   5
Scand    0.059  0.000  0.941   5
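A quick arithmetic check of the "around a half" statement, using cluster 1 (the Northeast European component) from the West Finn K=3 table above. This is only arithmetic on the reported population averages, not a formal admixture estimate, and the two ways of measuring give somewhat different answers.

```python
# Cluster 1 (the Northeast European component) in the West Finn K=3 run.
ne = {"Mrd": 0.347, "WFinn": 0.170, "Belarus": 0.059, "Scand": 0.059}

# Share of the Mordovian level carried by West Finns.
ratio_to_mrd = ne["WFinn"] / ne["Mrd"]

# Position on a line where Scandinavians = 0 and Mordovians = 1.
position = (ne["WFinn"] - ne["Scand"]) / (ne["Mrd"] - ne["Scand"])

print(round(ratio_to_mrd, 2), round(position, 2))  # 0.49 0.39
```

West Finns carry about half of the Mordovian level of this component; measured from the Scandinavian baseline of 0.059, their position is about 0.39 rather than exactly 0.5.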


Saturday, February 1, 2014

What’s wrong with population genetics? An example of how things can go wrong

This question is provocative, but still rational, and it justifies its existence. If something looks wrong despite being science, alarm bells should ring. It is your responsibility. The world trusts that science is based on objectivity and evidence, not on subjective visions. If something contradicts other observations that have already been proved scientifically, that is a reason to be suspicious until the contradiction is resolved. Population genetics should follow these rules. Unfortunately today this is not the case: population genetics is sometimes a confused bunch of subjective visions and does not follow the healthy logic in which today comes before tomorrow and yesterday came before today.

Making claims like these requires evidence. I can prove my claims methodologically and with facts. Because the methods are often somewhat complex and require more learning and special terminology, I'll present only simple facts based on data here and point out where things go wrong. Anyone who knows the background of these facts can see the problems, and everyone is free to point out my possible mistakes. I would appreciate it. Unfortunately I believe the discussion will be negligible, because population genetics is in certain cases already so deep in its own swamp that it can't help itself.

Here are two principles to follow:

1  Everyone knows that if we derive something from something else, there must be a correlation between the two. Everyone, even a child, intuitively knows that there is a basic law of causation between events that are related. Causation means that things are bound to time: something happens first and leads to subsequent events.

2  Everyone knows that statistics are based on sampling. Everyone also knows that sampling in statistics must be comprehensive and representative; it cannot be biased. If we want to make statistics of the plants in a certain forest, we should sample the whole forest by some method, not only a small part of it, such as a patch near a meadow.

You may think that of course scientists know this and do their best. Perhaps, perhaps not. Maybe they do their best but don't know what they are doing. They break both principles listed above, not slightly but brutally. After doing so they can't go back: thanks to failed scientific peer review, there is no way back.

So what is so badly wrong that I have reason to worry about the competence of scientists? Keeping in mind the two principles above, I can present the following peculiarities. Remember that these are only my observations; somebody else may find other peculiarities.

1  Half of the Finnish samples represent a settlement that is around 300 years old and has a new genetic structure. The old Finnish settlements, home to around 70% of the whole Finnish population, have roots over 1000 years old (according to written history), over 1500 years old (according to archaeology) and thousands of years old (according to the uniparental markers I1 and N1c1). Scientists use a younger population to define older populations. This is the causation error I found in many studies.

2  Around 50% of the Finnish samples used represent only 0.32% of the Finnish population, living far away from the population centers. It is like using a small village in Sicily to represent 50% of Italians, including northern Italians. This means that samples from a sparsely populated area are overrepresented by a factor of about 150! This is the sampling error I found in studies.
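The overrepresentation factor in point 2 is the share of samples divided by the share of population. A minimal Python sketch, using the post's own figures (about 50% of samples, 0.32% of the population) and the Kuusamo and Finland population counts quoted further down; the function name is illustrative.

```python
def overrepresentation(sample_share, population_share):
    """How many times more often a group appears in the sample than in the population."""
    return sample_share / population_share


kuusamo_pop_2008 = 16_779
finland_pop_2014 = 5_450_614

# Population share from the two population counts quoted in this post.
share = kuusamo_pop_2008 / finland_pop_2014
print(f"{share:.2%}")  # 0.31%

print(round(overrepresentation(0.50, 0.0032)))  # 156
print(round(overrepresentation(0.50, share)))   # 162
```

With the quoted 0.32% the factor is about 156; with the two population counts (0.31%) it is about 162. The small gap presumably comes from the counts referring to different years (2008 and 2014). Either way the result matches the factor of about 150 given above.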

Maybe someone will try to defend the causation and sampling errors by saying that it is more interesting to see population diversity, but this is only a desperate, if predictable, way to defend an unscientific personal opinion. Following that logic, we could just as well gather samples from Helsinki suburbs where 20-30% of residents are immigrants with Finnish citizenship. From a statistical perspective the error is the same; only the place and the age of the population differ. The only acceptable way is to follow statistical rules and include minorities if they pass these rules.



What I found is unbelievable. Although I am not going to deal with methods such as admixture and PCA analyses here, it is clear to anyone who has digested these methods that this sampling error will affect studies, flattening, impoverishing and distorting the results obtained with other samples as well.

I can’t understand three things:  

  • how Finnish researchers could release these samples, given that they would be used to represent Finns in worldwide studies
  • how the researchers making these studies did not have a single doubt about a problem that could ruin their studies
  • how peer review did not reveal anything. Obviously no bells rang. Why?


Sampling of Finnish settlements in studies (late settlement or drifted population)

Until the 17th century, the area of Kuusamo was inhabited by the semi-nomadic Sami.

  • From the 15th century Finnish fishermen also took advantage of fishing grounds on the lower reaches of the river Iijoki near Kuusamo. They took regular trips of a few weeks from Kuusamo, but because the land could not provide hay for cattle anywhere other than near the river, they founded no permanent settlements. Only in 1673, when the Swedish government granted all settlers in Lapland a 15-year tax exemption, did settlers from Savo and Kainuu settle in Kuusamo.

  •  The first parish in Kuusamo was founded in 1685. In 1687 a temporary chapel was built, in 1695 the first church. From the end of the 17th century the area around the lake Kuusamojärvi began to be called Kuusamo. The precise etymology of the name is unclear, however, one possible derivation is from a Sami word for "spruce forest".

Kuusamo population 2008: 16 779

2  Finnish population, age and location

                     Finland population 2014: 5 450 614

Population density today:

Today around 70% of the Finnish population lives in the old settlements, and most of the rest in the cities of the late settlement areas. Only a few percent live outside these areas. The old regions are Ahvenanmaa, Etelä-Karjala, Etelä-Savo, Kanta-Häme, Kymenlaakso, Pirkanmaa, Etelä-Pohjanmaa, Päijät-Häme, Satakunta, Uusimaa and Varsinais-Suomi.

                      Settlements during the Iron Age:

Agricultural regions around 1000AD (brown and red spots):

Finnish settlements 1540AD according to old tax catalogues (each small black spot represents 50 houses, every house around 5-10 occupants):


Examples of studies using biased Finnish data:

Olalde et al.
Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European

Khrunin et al.
A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe

Nelis et al. 
Genetic structure in Europeans