1. Hinxton2 is HI2 from the study "Iron Age and Anglo-Saxon genomes from East England reveal British migration history". HI2: Hinxton, male, 170 BCE – 80 CE.
2. Rabrit3 is one of the Roman-era samples from the study "Genomic signals of migration and continuity in Britain before the Anglo-Saxons". I can't identify which of the six local samples from Driffield Terrace it is, because the study authors don't give the mapping between sample labels and sample data. Rabrit3 was processed from the sample files ERR1043145, ERR1043146 and ERR1043147.
3. Iabrit is M1489 from the same study ("Genomic signals..."). M1489: Iron Age, Melton, age estimate between 210 BC and 40 AD.
4. Anglosaxon/anglosaxon2 is NO3423, again from the same study. NO3423: Anglo-Saxon, Norton on Tees. The age estimate is unknown, but the sample is described as Anglo-Saxon.
All four samples were reprocessed using BWA-mem as described in my previous post. BWA-mem trims reads automatically and gives great results with minimal manual intervention; the process is fully automated. Before choosing BWA-mem I tested three other programs.
I have also standardized the sample selection in this test to ensure the same SNP coverage for all samples and to avoid errors due to SNP qualification and differences in SNP counts. So each ancient sample is compared against modern populations in almost exactly the same way. This is fundamental, because differences in SNP count in particular can severely bias the results.
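Here are the Dstat results (after the four population labels, the columns are D, Z, BABA count, ABBA count and number of SNPs):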
result: CEU Mbuti_Pygmy iabrit Chimp.DG 0.4532 100.000 17762 6683 184450
result: CEU Mbuti_Pygmy anglosaxon Chimp.DG 0.4604 100.000 28390 10489 287312
result: French Mbuti_Pygmy iabrit Chimp.DG 0.4533 100.000 17714 6663 184450
result: French Mbuti_Pygmy anglosaxon Chimp.DG 0.4579 100.000 28268 10511 287312
result: FinnLocal Mbuti_Pygmy iabrit Chimp.DG 0.4491 100.000 17677 6721 184450
result: FinnLocal Mbuti_Pygmy anglosaxon Chimp.DG 0.4542 100.000 28218 10592 287312
result: FinnMostCW Mbuti_Pygmy iabrit Chimp.DG 0.4539 100.000 17757 6670 184450
result: FinnMostCW Mbuti_Pygmy anglosaxon Chimp.DG 0.4592 100.000 28352 10509 287312
result: IBS Mbuti_Pygmy iabrit Chimp.DG 0.4481 100.000 17602 6709 184450
result: IBS Mbuti_Pygmy anglosaxon Chimp.DG 0.4521 100.000 28066 10590 287312
result: Kent Mbuti_Pygmy iabrit Chimp.DG 0.4546 100.000 17771 6664 184450
result: Kent Mbuti_Pygmy anglosaxon Chimp.DG 0.4604 100.000 28375 10484 287312
result: Estonia Mbuti_Pygmy iabrit Chimp.DG 0.4536 100.000 17611 6620 183032
result: Estonia Mbuti_Pygmy anglosaxon Chimp.DG 0.4609 100.000 28197 10406 285211
result: Sardinian Mbuti_Pygmy iabrit Chimp.DG 0.4498 100.000 17634 6692 184450
result: Sardinian Mbuti_Pygmy anglosaxon Chimp.DG 0.4528 100.000 28105 10587 287311
result: Orcadian Mbuti_Pygmy iabrit Chimp.DG 0.4552 100.000 17766 6652 184450
result: Orcadian Mbuti_Pygmy anglosaxon Chimp.DG 0.4595 100.000 28345 10496 287311
result: TSI Mbuti_Pygmy iabrit Chimp.DG 0.4479 100.000 17597 6711 184450
result: TSI Mbuti_Pygmy anglosaxon Chimp.DG 0.4529 100.000 28076 10573 287312
result: North_Italian Mbuti_Pygmy iabrit Chimp.DG 0.4497 100.000 17653 6702 184450
result: North_Italian Mbuti_Pygmy anglosaxon Chimp.DG 0.4557 100.000 28173 10533 287311
result: Russian_Vologda Mbuti_Pygmy iabrit Chimp.DG 0.4483 100.000 17635 6718 184450
result: Russian_Vologda Mbuti_Pygmy anglosaxon Chimp.DG 0.4525 100.000 28129 10603 287311
result: CEU Mbuti_Pygmy hinxton2 Chimp.DG 0.4391 100.000 45931 17904 433006
result: CEU Mbuti_Pygmy rabrit3 Chimp.DG 0.4557 100.000 29993 11214 299676
result: French Mbuti_Pygmy hinxton2 Chimp.DG 0.4375 100.000 45806 17923 433006
result: French Mbuti_Pygmy rabrit3 Chimp.DG 0.4536 100.000 29889 11235 299676
result: FinnLocal Mbuti_Pygmy hinxton2 Chimp.DG 0.4327 100.000 45622 18064 433006
result: FinnLocal Mbuti_Pygmy rabrit3 Chimp.DG 0.4486 100.000 29784 11336 299676
result: FinnMostCW Mbuti_Pygmy hinxton2 Chimp.DG 0.4384 100.000 45896 17921 433006
result: FinnMostCW Mbuti_Pygmy rabrit3 Chimp.DG 0.4540 100.000 29933 11239 299676
result: IBS Mbuti_Pygmy hinxton2 Chimp.DG 0.4329 100.000 45492 18006 433006
result: IBS Mbuti_Pygmy rabrit3 Chimp.DG 0.4505 100.000 29732 11263 299676
result: Kent Mbuti_Pygmy hinxton2 Chimp.DG 0.4402 100.000 45988 17876 433006
result: Kent Mbuti_Pygmy rabrit3 Chimp.DG 0.4574 100.000 30041 11186 299676
result: Estonia Mbuti_Pygmy hinxton2 Chimp.DG 0.4394 100.000 45638 17775 430442
result: Estonia Mbuti_Pygmy rabrit3 Chimp.DG 0.4558 100.000 29766 11126 297499
result: Sardinian Mbuti_Pygmy hinxton2 Chimp.DG 0.4327 100.000 45516 18021 433005
result: Sardinian Mbuti_Pygmy rabrit3 Chimp.DG 0.4515 100.000 29768 11250 299675
result: Orcadian Mbuti_Pygmy hinxton2 Chimp.DG 0.4384 100.000 45892 17919 433005
result: Orcadian Mbuti_Pygmy rabrit3 Chimp.DG 0.4571 100.000 30016 11185 299675
result: TSI Mbuti_Pygmy hinxton2 Chimp.DG 0.4329 100.000 45487 18005 433006
result: TSI Mbuti_Pygmy rabrit3 Chimp.DG 0.4491 100.000 29704 11292 299676
result: North_Italian Mbuti_Pygmy hinxton2 Chimp.DG 0.4346 100.000 45645 17989 433005
result: North_Italian Mbuti_Pygmy rabrit3 Chimp.DG 0.4528 100.000 29838 11238 299675
result: Russian_Vologda Mbuti_Pygmy hinxton2 Chimp.DG 0.4337 100.000 45616 18017 433005
result: Russian_Vologda Mbuti_Pygmy rabrit3 Chimp.DG 0.4493 100.000 29754 11306 299675
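The result lines above can be checked by hand: the reported D should equal (BABA - ABBA) / (BABA + ABBA). Below is a minimal Python sketch that parses one line and recomputes D. The function names (`parse_result`, `d_from_counts`) and the regular expression are my own illustration, not part of qpDstat, and the column order is assumed to be D, Z, BABA, ABBA, SNP count.

```python
import re

# Assumed column order after the four population labels:
# D, Z, BABA count, ABBA count, number of SNPs.
LINE = re.compile(
    r"result:\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+"
    r"(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
)

def parse_result(line):
    """Split one 'result:' line into labelled fields (hypothetical helper)."""
    m = LINE.search(line)
    if m is None:
        raise ValueError(f"not a result line: {line!r}")
    return {
        "pops": m.group(1, 2, 3, 4),
        "d": float(m.group(5)),
        "z": float(m.group(6)),
        "baba": int(m.group(7)),
        "abba": int(m.group(8)),
        "nsnps": int(m.group(9)),
    }

def d_from_counts(baba, abba):
    """D statistic from the two site-pattern counts."""
    return (baba - abba) / (baba + abba)

sample = "result: CEU Mbuti_Pygmy iabrit Chimp.DG 0.4532 100.000 17762 6683 184450"
rec = parse_result(sample)
recomputed = d_from_counts(rec["baba"], rec["abba"])
print(rec["pops"], rec["d"], round(recomputed, 4))
```

If the recomputed value disagrees with the reported D beyond rounding, the column order assumed here is probably wrong for that output.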
And here are corresponding graphic maps:
Results differ somewhat from what I got earlier, obviously due to the stricter data preparation and more neutral outgroups.
Finally, I also computed IBS statistics using the same data, and made a PCA plot. It is, however, reasonable to note that because of the homozygosity error in ancient samples, the most homozygous modern populations get an extra boost and give results that are too high. This is typical for Balts, Irishmen and Scots. I don't know about Basque homozygosity.
I was able to include extra populations by using Plink with the --geno 0.01 option to standardize the SNP set as much as possible.
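To show what a missingness filter of this kind does, here is a small Python sketch on toy data. It is not Plink itself: the function `filter_by_missingness`, the SNP ids and the genotype calls are all hypothetical, and the only assumption carried over is that a variant is dropped when its share of missing calls exceeds the threshold.

```python
def filter_by_missingness(genotypes, max_missing=0.01):
    """Keep SNPs whose missing-call rate is at most max_missing.

    genotypes: dict mapping SNP id -> list of calls per sample,
               where a call is 0, 1, 2 or None (missing).
    """
    kept = {}
    for snp_id, calls in genotypes.items():
        missing_rate = sum(c is None for c in calls) / len(calls)
        if missing_rate <= max_missing:
            kept[snp_id] = calls
    return kept

# Toy data: four samples, three SNPs (all values invented for illustration).
toy = {
    "rs1": [0, 1, 2, 1],
    "rs2": [0, None, 2, 1],   # 25 % missing, dropped at 0.01
    "rs3": [2, 2, 1, 0],
}
kept = filter_by_missingness(toy, max_missing=0.01)
print(sorted(kept))
```

With a threshold as strict as 0.01, almost every remaining SNP is called in every sample, which is what keeps the SNP set nearly identical across populations.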
Creating a PCA needs more samples to pick proper and all-inclusive components, so it is done here using another data set with fewer SNPs and more populations.
edit 17.3.2016 23:05
I read a comment on a Finnish history forum that using two outgroups, as I did in this post, is not recommended and can distort results. I gladly admit that this is true. But the reason for using two outgroups is very clear: I did it this way to get a large number of comparable results instead of comparing only three populations. Using three target populations and one outgroup makes it impossible, or at least painful, to compare results from separate qpDstat runs. Of course the latter method, using three targets, gives better accuracy.
But there is no smoke without fire here: my tests using two outgroups look reliable. In my previous results (above) the FinnMostCW group was very close to the Iron Age British sample, closer than the French sample group. I made a new test using the same data, now using three target populations and one outgroup. It confirms my previous results:
0 FinnMostCW 16
1 French 28
2 iabrit 1
3 Chimp.DG 1
jackknife block size: 0.050
snps: 605676 indivs: 46
number of blocks for jackknife: 551
nrows, ncols: 46 605676
result: FinnMostCW French iabrit Chimp.DG 0.0020 1.039 9159 9122 184450
Indeed, I will have to come back to this question with larger data.
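For reading the Z column: Z is D divided by its standard error, which is estimated with a block jackknife, and a common rule of thumb treats |Z| above about 3 as significant. The Python sketch below shows the idea with a plain delete-one-block jackknife. It is a simplification (unweighted blocks), and the per-block counts are invented toy numbers, not taken from my runs.

```python
import math

def jackknife_d(blocks):
    """Delete-one-block jackknife for D.

    blocks: list of (baba, abba) counts, one pair per genomic block.
    Returns (D, standard error, Z).
    """
    n = len(blocks)
    total_baba = sum(b for b, _ in blocks)
    total_abba = sum(a for _, a in blocks)
    d_full = (total_baba - total_abba) / (total_baba + total_abba)

    # Recompute D with each block left out in turn.
    leave_one_out = []
    for baba, abba in blocks:
        rest_baba = total_baba - baba
        rest_abba = total_abba - abba
        leave_one_out.append((rest_baba - rest_abba) / (rest_baba + rest_abba))

    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    return d_full, se, d_full / se

# Hypothetical per-block counts, for illustration only.
toy_blocks = [(20, 18), (17, 19), (22, 16), (15, 17), (19, 18)]
d, se, z = jackknife_d(toy_blocks)
print(f"D={d:.4f} SE={se:.4f} Z={z:.3f}")
```

A real run uses hundreds of blocks (551 in the output above) and weights them by size, so the numbers from this sketch are not comparable to the qpDstat values.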
edit 18.3.2016 12:40
Here is another qpDstat result using three target populations. I am quite disappointed by the way some people react when they are unhappy with certain results. My only goal is to make objective tests using primarily European genetic data. My focus is not on Finnish results, nor do I try to avoid producing reliable results about Finns.
0 FinnMostCW 16
1 FinnLocal 15
2 iabrit 1
3 Chimp.DG 1
jackknife block size: 0.050
snps: 605676 indivs: 33
number of blocks for jackknife: 551
nrows, ncols: 33 605676
result: FinnMostCW FinnLocal iabrit Chimp.DG 0.0073 3.750 9120 8989 184450
You could do the IBS stats for French, Kargopol Russians, Orcadians and Estonians too. If the ancients' inflated homozygosity has an effect compared to modern populations, then e.g. Basques should no longer be very close to everyone in comparisons with moderns.
Yes, it would be helpful. I can easily add populations to the IBS tests from the HGDP data. Now, after building a new database for the IBS tests, I could also add HGDP populations to the Dstat tests. I don't know yet; I have to check whether it is possible without losing the SNP parity.