This test shows genetic similarities in the Baltic Sea region. It works well for people from anywhere between Northern Russia and Norway, including all areas along the Baltic Sea. It doesn’t work if your origin is further south or east, or if you are an immigrant with genetic admixture from other regions. The test doesn’t tell you exactly where you or your ancestors are from, not even within the zone mentioned above, and it doesn’t tell you about ancient migrations. What it does show reliably is genetic similarity between people.
I implemented this test within the DIYDodecad framework (created by Dienekes), but to reach my goal I changed the original run procedure to get more resolution. It is clear to me that the original procedure loses genetic signal.
Another question is how to select reference samples so that local genetic differences show up. I used PCA to find the selection criteria. The selection is based on the premise that all ancient migrations reached Northern Europe along the same routes. It is easy to see how this happened by studying Europe-wide PCA plots.
This PCA also shows the samples used in this admixture analysis. Distances on the plot don't correspond to genetic distances: the main European cluster is compressed because the Uralic groups lie so far from it.
Using PCA I cropped all Near Eastern and Caucasian samples, as well as everything that lies behind them as seen from the European perspective. Almost all Near Eastern gene flow reached Northern Europe through Southern and Western Europe, which can be seen in the PCA plots. Some Asian gene flow came to the Baltic Sea from the east, especially to Russia and Finland, so I included some Uralic populations to represent it. I didn’t take any East or North Asian samples, because the eastern Uralic peoples of Europe, speakers of Uralic and Turkic languages, already show up to 30% Siberian gene flow, and adding Asians would have reduced local resolution.
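One possible way to read the cropping step is sketched below. It is only an illustration, not the procedure actually used here: it assumes PCA coordinates have already been computed elsewhere, and the function `crop_beyond`, the group labels and the coordinates are all hypothetical. The idea is to draw an axis from the European centroid to the Near Eastern centroid and to drop every sample that lies at or beyond the nearest Near Eastern sample along that axis.

```python
def centroid(points):
    """Mean position of a list of equally long coordinate tuples."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def crop_beyond(samples, keep_group, crop_group):
    """Return the ids of samples that survive the cropping.

    samples: list of (sample_id, group_label, (pc1, pc2, ...)).
    Hypothetical rule: project every sample onto the axis running from the
    keep_group centroid to the crop_group centroid, then drop anything at or
    beyond the nearest crop_group sample on that axis.
    """
    keep_c = centroid([xy for _, g, xy in samples if g == keep_group])
    crop_c = centroid([xy for _, g, xy in samples if g == crop_group])
    axis = [c - k for c, k in zip(crop_c, keep_c)]

    def projection(xy):
        # Unnormalized projection; only the ordering along the axis matters.
        return sum((x - k) * a for x, k, a in zip(xy, keep_c, axis))

    cutoff = min(projection(xy) for _, g, xy in samples if g == crop_group)
    return [sid for sid, _, xy in samples if projection(xy) < cutoff]


# Made-up coordinates for illustration only, not data from this test.
samples = [
    ("eu1", "Europe", (0.0, 0.0)),
    ("eu2", "Europe", (0.2, 0.0)),
    ("ur1", "Uralic", (0.0, 1.0)),
    ("ne1", "NearEast", (1.0, 0.0)),
    ("ne2", "NearEast", (1.2, 0.1)),
    ("x1", "Other", (1.5, 0.0)),
]
print(crop_beyond(samples, "Europe", "NearEast"))
```

In this toy input the Uralic sample is kept because it lies off the Europe to Near East axis, while the sample behind the Near Eastern cluster is dropped together with the cluster itself.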
- I tried to balance the size of ethnic groups and avoid oversampling local populations.
- I don't do population-dependent LD processing. IMO LD pruning is often the reason for poor results, along with oversampling of test groups.
- I am still missing North German and Saami samples. It would also be nice to have those Baltic-Finnic minorities of Russia that are available to Russian researchers.
- I use only public data distributed by universities, so all individual DIY results are free of the “calculator effect”.
- ADMIXTURE runs are carried out in two phases. In the initial phase the samples were run in UNSUPERVISED mode. The output phase was run in SUPERVISED mode using all homogeneous populations as control groups, regardless of the admixture rate they showed in the preceding unsupervised run. In this way I hope to have fixed the usual problem of homogeneous populations ruining admixture analyses. In practice, any population whose individuals all show similar results, such as 30%/70% across two k groups, is treated as a control group; a population whose individuals show varying admixture under the same label is not used as a control in the supervised output phase.
- I cannot publish the k distribution for the samples used, because of overfitting in the admixture runs and because I don't have enough samples to produce meaningful results free of the “calculator effect”. Here are some example results:
North Russian k6
North Russian k7
North American k5
North American k6
North American k7
Southwestern Finnish k6
Southwestern Finnish k7
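The control-group rule from the list above (a population qualifies when all of its individuals show similar proportions in the unsupervised run, whatever those proportions are) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script used for this test: the function name, the `max_spread` threshold of 0.10 and the example values are all hypothetical, and the input is assumed to be per-individual proportions already paired with population labels.

```python
from collections import defaultdict


def select_control_populations(q_rows, max_spread=0.10):
    """Pick populations whose members agree on their admixture proportions.

    q_rows: iterable of (population_label, [q_1, ..., q_K]) taken from an
    unsupervised run. max_spread is an assumed tolerance: the largest
    allowed difference between the highest and lowest value of any single
    component within one population.
    """
    by_pop = defaultdict(list)
    for label, proportions in q_rows:
        by_pop[label].append(proportions)

    controls = []
    for label, members in by_pop.items():
        k = len(members[0])
        # Range of each component across the individuals of this population.
        spreads = [
            max(m[i] for m in members) - min(m[i] for m in members)
            for i in range(k)
        ]
        if max(spreads) <= max_spread:
            controls.append(label)
    return sorted(controls)


# Made-up example values, not results from this test.
example = [
    ("PopA", [0.30, 0.70]),
    ("PopA", [0.32, 0.68]),
    ("PopA", [0.28, 0.72]),
    ("PopB", [0.10, 0.90]),
    ("PopB", [0.55, 0.45]),
]
print(select_control_populations(example))  # ['PopA']
```

Here PopA is admixed at roughly 30%/70% but consistent across individuals, so it qualifies as a control; PopB is rejected because its members differ widely under the same label.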
You can download all the DIYDodecad files needed to run your own tests from here. When running an analysis, use the parameter file “k<n>.par”, where n is the desired k value: 5, 6 or 7. More detailed instructions on DIYDodecad and its installation can be found here, and the original DIYDodecad files are downloadable here. My download package, however, includes everything needed to run the tests.
You can download the genotype data used in this test in EIGENSTRAT format here. It can be converted to PED format with, for example, Eigensoft's CONVERTF; the result is usable, although you will probably lose the original allele pairs.
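As a rough sketch of the conversion, the snippet below builds a CONVERTF parameter file for an EIGENSTRAT to PED conversion. The file prefixes `baltic` and `baltic_ped` are placeholders, not the names in the download package, and the input extensions (`.geno`, `.snp`, `.ind`) are assumed; adjust them to match the files you actually have.

```python
def convertf_par(in_prefix, out_prefix):
    """Return the text of a CONVERTF parameter file (EIGENSTRAT -> PED).

    in_prefix and out_prefix are hypothetical file name prefixes.
    """
    lines = [
        f"genotypename:    {in_prefix}.geno",
        f"snpname:         {in_prefix}.snp",
        f"indivname:       {in_prefix}.ind",
        "outputformat:    PED",
        f"genotypeoutname: {out_prefix}.ped",
        f"snpoutname:      {out_prefix}.pedsnp",
        f"indivoutname:    {out_prefix}.pedind",
    ]
    return "\n".join(lines) + "\n"


par_text = convertf_par("baltic", "baltic_ped")
print(par_text)
```

Saving the printed text as, say, `par.EIGENSTRAT.PED` and running `convertf -p par.EIGENSTRAT.PED` should then produce the PED files.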