This test shows genetic similarities in the Baltic Sea region. It works well for people from Northern Russia to Norway, including all areas around the Baltic Sea. It doesn’t work if your origin is further south or east, or if you are an immigrant with genetic admixture from other regions. The test doesn’t tell exactly where you or your ancestors are from, not even within the above-mentioned zone, and it doesn’t tell about ancient migrations. It tells reliably about genetic similarities between people.
I implemented this test in the DIYDodecad procedure (made and authorized by Dienekes), but to reach the goal I made some changes to the original run procedure to get more resolution. It is clear to me that the original concept loses genetic signals.
Another question is how to select reference samples to capture local genetic differences. I used PCA to find the criteria. The selection is based on the fact that all ancient migrations came to Northern Europe through the same routes. It is easy to see how this happened by analyzing European-wide PCA plots.
[Picture: European-wide PCA plot]
This PCA also includes the samples used in this admixture analysis. Distances on the plot don't correspond to genetic distances: the main European cluster is shrunk by the distance effect of the Uralic groups.
Using PCA I cropped all Near Eastern and Caucasian samples, as well as all samples lying behind them as seen from the European perspective. Almost all Near Eastern gene flow came to Northern Europe through Southern and Western Europe, which can be seen on the PCA plots. Some Asian gene flow came from the east to the Baltic Sea, especially to Russia and Finland, so I included some Uralic populations to represent it. I didn’t take any East or North Asian samples, because eastern Uralic people, speaking Uralic and Turkic languages, show up to 30% Siberian gene flow in Europe, and adding Asians would have reduced local resolution.
Some notes
- I tried to balance the size of ethnic groups and avoid oversampling local populations.
- I don't do population-dependent LD processing. IMO LD pruning is often the reason for poor results, along with oversampling of test groups.
- I still miss North German and Saami samples. It would also be nice to have those Baltic-Finnic minorities of Russia whose samples are available to Russian researchers.
- I use only public data distributed by universities, thus all individual DIY results are free of the “calculator effect”.
- ADMIXTURE runs are carried out in two phases. In the initial phase the samples were run in UNSUPERVISED mode. The output phase was run in SUPERVISED mode using all homogeneous populations as control groups, regardless of the admixture rate they showed in the preceding unsupervised run. By this means I was hopefully able to fix the usual problem of homogeneous populations ruining admixture analyses. In practice, any population whose individuals all show similar results, like 30%/70% in two k-groups, is treated as a control group; if individuals under the same population label show varying admixtures, the population is not used as a control in the supervised output phase.
- I cannot publish the k-distribution data for the samples used, because of overfitting in the admixture runs and because I don't have enough samples to produce meaningful “calculator effect” free results. But here are some results, for instance:
Lithuanian k6
17.46% West-Europe
16.49% North-Baltic
2.56% South-Europe
4.50% East-Europe&Volgaic
51.92% E-Cntral-Euro&S-Balt
7.08% Southeast-Europe
Lithuanian k7
17.11% West-Europe
14.76% North-Baltic
2.43% South-Europe
16.82% North-Russia
2.07% East-Europe&Volgaic
39.94% E-Cntral-Euro&S-Balt
6.88% Southeast-Europe
North Russian k6
15.69% West-Europe
14.97% North-Baltic
5.81% South-Europe
19.05% East-Europe&Volgaic
37.83% E-Cntral-Euro&S-Balt
6.65% Southeast-Europe
North Russian k7
15.49% West-Europe
12.58% North-Baltic
5.68% South-Europe
20.67% North-Russia
15.55% East-Europe&Volgaic
23.70% E-Cntral-Euro&S-Balt
6.33% Southeast-Europe
North American k5
42.61% West-Europe
9.36% North-Baltic
25.47% South-Europe
2.79% East-Europe&Volgaic
19.77% E-Cntral-Euro&S-Balt
North American k6
39.04% West-Europe
10.26% North-Baltic
17.24% South-Europe
3.02% East-Europe&Volgaic
19.60% E-Cntral-Euro&S-Balt
10.84% Southeast-Europe
North American k7
38.80% West-Europe
9.24% North-Baltic
17.13% South-Europe
8.42% North-Russia
1.91% East-Europe&Volgaic
13.84% E-Cntral-Euro&S-Balt
10.67% Southeast-Europe
Southwestern Finnish k6
25.25% West-Europe
27.50% North-Baltic
3.94% South-Europe
11.39% East-Europe&Volgaic
26.53% E-Cntral-Euro&S-Balt
5.38% Southeast-Europe
Southwestern Finnish k7
24.99% West-Europe
26.20% North-Baltic
3.67% South-Europe
11.76% North-Russia
9.48% East-Europe&Volgaic
18.59% E-Cntral-Euro&S-Balt
5.32% Southeast-Europe
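The control-group rule from the notes (a population is used as a control in the supervised phase only if all its individuals show similar proportions in the unsupervised run) can be sketched roughly like this. This is only an illustration, not the procedure actually used: the function names, the population labels "PopA" and "PopB", the example fractions and the 0.05 spread threshold are all assumptions of mine.

```python
def is_homogeneous(individual_fractions, max_spread=0.05):
    """Return True if every ancestry component varies by at most
    max_spread across the individuals of one population.

    individual_fractions: list of per-individual lists of k fractions
    from an unsupervised run. max_spread is an assumed threshold.
    """
    k = len(individual_fractions[0])
    for component in range(k):
        values = [ind[component] for ind in individual_fractions]
        if max(values) - min(values) > max_spread:
            return False
    return True


def select_control_groups(unsupervised, max_spread=0.05):
    """Pick the population labels that qualify as control groups."""
    return sorted(
        label
        for label, individuals in unsupervised.items()
        if is_homogeneous(individuals, max_spread)
    )


# Hypothetical unsupervised output at k=2, grouped by population label.
example = {
    # Everyone is close to 30%/70%: admixed, but homogeneous.
    "PopA": [[0.30, 0.70], [0.31, 0.69], [0.29, 0.71]],
    # Individuals differ widely: not usable as a control.
    "PopB": [[0.10, 0.90], [0.55, 0.45], [0.80, 0.20]],
}

controls = select_control_groups(example)
print(controls)
```

Note that the rule looks only at how uniform the individuals are, not at how admixed the population is, which is why the 30%/70% population still qualifies.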
You can download all the DIYDodecad files needed to perform your own tests from here. When running analyses use the parameter file “k<n>.par”, where n is the desired k value: 5, 6 or 7. More detailed instructions about DIYDodecad and its installation can be found here, and the original DIYDodecad files are downloadable here. My download package, however, includes everything necessary to run the tests.
You can download the genotype data used in this test in EIGENSTRAT format here. It can be converted to PED format using, for example, Eigensoft's CONVERTF; the result is usable, although you'll probably lose the original allele pairs.
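For the conversion, CONVERTF reads a small parameter file. The helper below is a minimal sketch that builds such a file as text. The file prefixes "baltic" and "baltic_ped" are hypothetical, and I assume the downloaded data consists of the usual .geno/.snp/.ind triple, so adjust the names to the actual files.

```python
def convertf_par(in_prefix, out_prefix, output_format="PED"):
    """Build the text of a CONVERTF parameter file.

    in_prefix and out_prefix are file name prefixes (hypothetical here);
    the input is assumed to be an EIGENSTRAT .geno/.snp/.ind triple.
    """
    lines = [
        f"genotypename: {in_prefix}.geno",
        f"snpname: {in_prefix}.snp",
        f"indivname: {in_prefix}.ind",
        f"outputformat: {output_format}",
        f"genotypeoutname: {out_prefix}.ped",
        f"snpoutname: {out_prefix}.pedsnp",
        f"indivoutname: {out_prefix}.pedind",
    ]
    return "\n".join(lines) + "\n"


par_text = convertf_par("baltic", "baltic_ped")
print(par_text)
```

Save the printed text to a file and pass that file to CONVERTF with its -p option.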