I have seen numerous PCA and ADMIXTURE analyses that try to demonstrate who the full-blooded Europeans are, as well as many analyses claiming to prove real or false migrations into or out of Europe. Such analyses can be misleading and can hide actual European ancestry, because the admixtures revealed by selective tests can be very small and become detectable only when detached from the main historical events in Europe. My aim here is to find large-scale similarities inside Europe. This can be done with dstat analyses, which compare whole genomes without dropping meaningful genetic proportions. My tests look for differences between suggested non-European ancestries and actual European ancestries.
I suggest the following non-European populations:
- Nganasans, representing pure Siberians, found in North Siberia and Northeast Europe
- Mongolians, representing the medieval Mongol invasion of Europe
- Bedouins, representing present-day Middle Easterners, ruling out Early European Farmers
Any comparison needs a baseline, here the Europeans suggested to be the least admixed. The British live on an island isolated from mainland Europe. People in Kent are thought to have their origins in Iron Age and medieval continental Western Europe. My previous analyses also show that they have very little recent non-European admixture, less than the French and Germans.
I use the original Haak et al. 2015 and Lazaridis et al. 2014 data, with additional British Kent and Finnish samples downloaded from the 1000 Genomes Project. Each sample consists of 555,268 SNPs. West Finns were filtered in three steps using PCA on the 1000 Genomes data: 1) removing the 20 westernmost samples to get rid of possible Finland-Swedes, 2) splitting the remaining 80 into eastern and western groups, and finally 3) randomly picking 13 western samples. The Kent samples were randomly sampled as well.
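The three filtering steps can be sketched roughly as follows. This is only an illustration under stated assumptions: it supposes that one principal component (here called PC1) orders the samples from west to east with lower values being more western, that the remaining 80 samples are split into two equal halves, and it uses synthetic sample IDs and coordinates rather than the real 1000 Genomes data. The function name `pick_west_finns` and its parameters are hypothetical.

```python
import random

def pick_west_finns(pc1_by_sample, n_drop=20, n_pick=13, seed=0):
    """Return (picked, western, eastern) lists of sample IDs.

    Assumption: a lower PC1 value means a more western sample.
    """
    # Order samples from west to east along the assumed PC1 axis.
    ordered = sorted(pc1_by_sample, key=pc1_by_sample.get)
    # Step 1: drop the westernmost samples (possible Finland-Swedes).
    remaining = ordered[n_drop:]
    # Step 2: split the rest into western and eastern groups
    # (an even split is an assumption, not stated in the text).
    half = len(remaining) // 2
    western, eastern = remaining[:half], remaining[half:]
    # Step 3: randomly pick samples from the western group.
    picked = random.Random(seed).sample(western, n_pick)
    return picked, western, eastern

# Synthetic stand-in for 100 Finnish samples with made-up PC1 values.
rng = random.Random(1)
pc1 = {f"FIN{i:03d}": rng.uniform(-1.0, 1.0) for i in range(100)}

picked, western, eastern = pick_west_finns(pc1)
print(len(picked), len(western), len(eastern))
```

Fixing the seed makes the random pick repeatable, which helps if someone wants to reproduce the same sample set.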
The data is available if someone wants to repeat my tests or run their own. Please contact me in that case.
The first task is to verify the data. For this purpose I made three PCA plots:
Before testing admixtures it is a good idea to look at genome-wide distances between British Kents and other Europeans. I do this using two outgroups: the first one (Chimp) is the extreme, and the second one (Ju-hoan-North) is set as the baseline.
The admixture dstat analyses follow the formula:
dstat(Kent,non-European population:Outgroup,European population).
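As a minimal sketch, the test list for this formula could be enumerated like this, assuming the qpDstat list format of four population labels per line (W X Y Z). The population labels and the choice of Ju_hoan_North as the outgroup are illustrative placeholders and may not match the labels in the actual dataset; `build_quadruples` is a hypothetical helper.

```python
def build_quadruples(baseline, non_europeans, outgroup, europeans):
    """Build one 'W X Y Z' line per (non-European, European) pair."""
    return [
        f"{baseline} {non_eur} {outgroup} {eur}"
        for non_eur in non_europeans
        for eur in europeans
    ]

# Placeholder labels; replace with the labels used in your own data.
non_europeans = ["Nganasan", "Mongolian", "Bedouin"]
europeans = ["French", "German", "Finnish_West"]

lines = build_quadruples("Kent", non_europeans, "Ju_hoan_North", europeans)
for line in lines:
    print(line)
```

Each printed line corresponds to one dstat(Kent,non-European population:Outgroup,European population) test.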
If the result is negative, the European population is closer to Kent than it is to the non-European population on the axis; the larger the negative value, the closer it is to Kent relative to the non-European population. Be aware that this test doesn't show how much of the non-European admixture in question the tested population has, but rather the genome-wide genetic distance between populations, which depends mainly on the shared history of each population pair. If the tested European population is “multimixtured”, the result may surprise a reader who has only seen analyses showing minor admixtures. In other words, your genetic profile can be A1+b or A2+c, where b and c are minor admixtures. With PCA or ADMIXTURE you can't determine the overlap between A1 and A2 without knowing both minor admixtures, but you can use dstat to determine genome-wide similarity.
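The sign convention can be illustrated with a toy calculation. The sketch below uses the standard allele-frequency form of the D-statistic, D(W,X;Y,Z) = sum((w-x)(y-z)) / sum((w+x-2wx)(y+z-2yz)); it leaves out the standard errors and Z-scores that qpDstat also reports. All frequencies are invented for illustration and the function `d_stat` is a hypothetical helper, not part of any package.

```python
def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies in [0, 1]."""
    num = 0.0
    den = 0.0
    for a, b, c, d in zip(w, x, y, z):
        num += (a - b) * (c - d)
        den += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    return num / den

# Invented allele frequencies at three SNPs.
kent = [0.9, 0.1, 0.8]
non_european = [0.1, 0.9, 0.3]
outgroup = [0.0, 0.0, 0.0]

european_near_kent = [0.8, 0.2, 0.7]
european_near_non_eur = [0.2, 0.8, 0.3]

d_near_kent = d_stat(kent, non_european, outgroup, european_near_kent)
d_near_non_eur = d_stat(kent, non_european, outgroup, european_near_non_eur)

print(d_near_kent)     # negative: the tested population is closer to Kent
print(d_near_non_eur)  # positive: closer to the non-European population
```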
D stats are quite tricky to estimate proportions from.
One thing with them is that, taking a Dstat like D(Kent,Mongolian;Ju_Hoan_North,European), a population could tend towards 0 because it has balanced effect sizes from both Kent and Mongolian ancestry, as you would expect from a population with a roughly (probably not exactly) equal mix of Kent and Mongolian ancestry, *or* it could tend to 0 because it is equally unrelated to both (as a population like Yoruba would be, as an extreme example, descending from many people who were totally neutral to the founding of Eurasia).
You could sort of visually control for this by using your innovative Wide genome similarity measure as the X axis and stats like D(Kent,Mongolian;Ju_Hoan_North,European) as the Y axis. That would visually distinguish populations like Cypriots, who tend more towards 0 on D(Kent,Mongolian;Ju_Hoan_North,European) but don't share much drift with Kent overall, from populations who do share more drift with Kent and yet still tend more towards 0 on D(Kent,Mongolian;Ju_Hoan_North,European) than others.
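The two ways of getting a value near 0 can be shown with a toy calculation. This is only a sketch with invented allele frequencies, using the standard allele-frequency form of the D-statistic; `d_stat` is a hypothetical helper and the numbers do not come from any real population.

```python
def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies in [0, 1]."""
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum(
        (a + b - 2 * a * b) * (c + d - 2 * c * d)
        for a, b, c, d in zip(w, x, y, z)
    )
    return num / den

# Invented frequencies at two SNPs.
kent = [0.8, 0.2]
mongolian = [0.2, 0.8]
outgroup = [0.0, 0.0]

# Case 1: an equal mix of Kent and Mongolian frequencies.
balanced_mix = [(k + m) / 2 for k, m in zip(kent, mongolian)]
# Case 2: a population that resembles neither Kent nor Mongolian.
unrelated = [0.1, 0.1]

d_mix = d_stat(kent, mongolian, outgroup, balanced_mix)
d_unrelated = d_stat(kent, mongolian, outgroup, unrelated)
print(d_mix, d_unrelated)
```

Both values come out at essentially zero even though the two tested populations have very different relationships to Kent, which is why a second axis such as overall similarity to Kent is needed to tell them apart.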
Of course, you are right: this method doesn't show admixture amounts, only a balance between baselines (e.g. Mongolians and Kents). Every method has weaknesses. As far as I know, today there is only one freely available software package that tries to give absolute results: qpAdm. If you have used it, you know how painful it is in practice, precisely because it tries to estimate real admixture amounts.
Neither PCA nor ADMIXTURE gives reliable results for admixture amounts in proportion to ALL tested populations and individuals. However, qpDstat is a wonderful tool because it simply makes genome-wide pairwise comparisons between pooled populations, unlike PCA and ADMIXTURE. PCA and ADMIXTURE analyse all populations simultaneously, but those tools have no idea about the directions of gene flow, so the result can be wrong. We have one tool that gives gene flow directions, Chromopainter, but it is YOU who makes the decisions; the software itself can't do it. What is important when using qpDstat is that you don't place two "unknowns" on the same side of the colon. If you put two "unknowns", such as an ancient sample set and the modern populations you are testing, on the same side, you can't expect reliable results.
In other words, we have many software tools, but none of them is infallible. So those of us who make analyses have some responsibility not to misinform readers.
One more point. While the formula dstat(Kent,non-European population:Outgroup,European population) doesn't show admixture amounts, only a relative position between baselines (because we have two variables and two constants, one of them an outgroup), another formula, dstat(Ju_hoan-North,X:Chimp,Kent), shows the exact genetic distance between Kents and other Europeans from the baselines, because it uses one variable and three constants, two of which are outgroups, i.e. distant ancestral observation points.