Monday, 25 August 2014

Fst-correlation of Admix-analyses

The aim of admix analyses is to find similarities in large data sets and to divide each sample into its most probable ancestry proportions, assuming admixture. There are likely several ways to see differences: one is overall similarity, which forms the groups, and another is pure genetic distance. It is likely that the "industry standard" admix program most widely used today uses both methods, group similarities and Fst distances. The question is how the result depends on the method. I ran a test using one of the best Gedmatch analyses, compared the standard results with Fst-corrected results, and built Euclidean trees from both to see how they correlate. I am not going to reveal which Gedmatch admix analysis this is; I can only say that it is far from being the worst one. I noticed the following differences:

- in the standard output Siberians don't group with East Asians. The corrected output clusters Siberians and East Asians together.

- Somalis and Ethiopians group together in both outputs, but their common branch groups with SSA in the std output and with North Africans in the corrected one. In both cases the distance between the Somali-Ethiopian group and the others is moderate.

- in the std output Armenians group with Turks and in the corrected one with Assyrian-Mandean.

- Finns group with North Russians in the std output and, after Fst correction, with the group containing Estonians, Balts and North Slavs.

Some other differences also appeared.

This small test probably gives some answers about how the K-grouping in admix analyses works. In my opinion it is in practice a question of two genetic distances: one between ancestral groups and another inside ancestral groups.
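To make the idea of two distances concrete, here is a minimal sketch in Python. It is not the calculation used for the trees in this post. It rests on one loud assumption: the Fst value between two ancestral components is treated as a squared distance, so the distance between two populations is computed from the difference of their admix proportions weighted by the Fst matrix. The three components, the three populations and all Fst values below are made up for illustration.

```python
from math import sqrt

def plain_distance(q1, q2):
    """Euclidean distance between two vectors of admix proportions."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))

def fst_weighted_distance(q1, q2, fst):
    """Distance between two populations, weighted by component Fst.

    ASSUMPTION (not from the original analysis): fst[a][b] is treated as
    a squared distance between ancestral components a and b. Because the
    proportion differences sum to zero, the squared distance between the
    two mixtures is then -1/2 * sum_ab d_a * d_b * fst[a][b].
    """
    delta = [a - b for a, b in zip(q1, q2)]
    k = len(delta)
    squared = -0.5 * sum(
        delta[a] * delta[b] * fst[a][b] for a in range(k) for b in range(k)
    )
    # Real Fst matrices are not perfectly Euclidean, so clamp at zero.
    return sqrt(max(squared, 0.0))

# Hypothetical Fst matrix: components 0 and 1 are close, 2 is distant.
FST = [
    [0.00, 0.01, 0.10],
    [0.01, 0.00, 0.10],
    [0.10, 0.10, 0.00],
]

# Hypothetical populations, each 100 % one component.
POPS = {
    "P1": [1.0, 0.0, 0.0],
    "P2": [0.0, 1.0, 0.0],
    "P3": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    names = sorted(POPS)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            plain = plain_distance(POPS[x], POPS[y])
            weighted = fst_weighted_distance(POPS[x], POPS[y], FST)
            print(f"{x}-{y}: plain={plain:.3f} fst-weighted={weighted:.3f}")
```

In this toy case the plain distance sees all three populations as equally far apart, because it only counts how much the proportions differ. The Fst-weighted distance puts P1 and P2 much closer to each other than to P3, because it also knows that their components are related. A tree built from one distance matrix can therefore group populations differently from a tree built from the other.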

Plain admix: