Sunday, 13 March 2016

Continuing tests with ancient Brits, better material and final results

Continuing with the ancient British samples.  This is fascinating because these samples have high sequencing quality, which gives precise results.  I now use four samples:

1. Hinxton2 is HI2 from the study "Iron Age and Anglo-Saxon genomes from East England reveal British migration history". HI2 Hinxton Male 170 BCE – 80 CE.

2. Rabrit3 is one of the Roman Age samples from the study "Genomic signals of migration and continuity in Britain before the Anglo-Saxons".  I can't identify which of the six local samples from Driffield Terrace it is, because the study authors don't state how the sample labels map to the sample data.  Rabrit3 was processed from the sample files ERR1043145, ERR1043146 and ERR1043147.

3. Iabrit is M1489 from the same study (Genomic signals...).  M1489 Iron Age Melton, age estimate between 210 BC and 40 AD.

4. Anglosaxon/anglosaxon2 is NO3423, again from the same study.  NO3423 Anglo-Saxon Norton on Tees. The age estimate is unknown, but the sample is described as Anglo-Saxon.

All four samples were reprocessed using BWA-mem as described in my previous post.  BWA-mem trims reads automatically and gives great results with minimal manual intervention and control; the process is fully automated.  Before choosing BWA-mem I tested three other programs.

I have also standardized the sample selection in this test to ensure the same SNP coverage for all samples and to avoid errors caused by SNP qualification and differences in SNP counts.  Each ancient sample is therefore compared against modern populations in almost exactly the same way. This is fundamental, because differences in the SNP count in particular can severely bias the results.

Here are the Dstat results:

result:        CEU Mbuti_Pygmy     iabrit   Chimp.DG      0.4532   100.000 17762   6683 184450
result:        CEU Mbuti_Pygmy anglosaxon   Chimp.DG      0.4604   100.000 28390  10489 287312
result:     French Mbuti_Pygmy     iabrit   Chimp.DG      0.4533   100.000 17714   6663 184450
result:     French Mbuti_Pygmy anglosaxon   Chimp.DG      0.4579   100.000 28268  10511 287312
result:  FinnLocal Mbuti_Pygmy     iabrit   Chimp.DG      0.4491   100.000 17677   6721 184450
result:  FinnLocal Mbuti_Pygmy anglosaxon   Chimp.DG      0.4542   100.000 28218  10592 287312
result: FinnMostCW Mbuti_Pygmy     iabrit   Chimp.DG      0.4539   100.000 17757   6670 184450
result: FinnMostCW Mbuti_Pygmy anglosaxon   Chimp.DG      0.4592   100.000 28352  10509 287312
result:        IBS Mbuti_Pygmy     iabrit   Chimp.DG      0.4481   100.000 17602   6709 184450
result:        IBS Mbuti_Pygmy anglosaxon   Chimp.DG      0.4521   100.000 28066  10590 287312
result:       Kent Mbuti_Pygmy     iabrit   Chimp.DG      0.4546   100.000 17771   6664 184450
result:       Kent Mbuti_Pygmy anglosaxon   Chimp.DG      0.4604   100.000 28375  10484 287312
result:    Estonia Mbuti_Pygmy     iabrit   Chimp.DG      0.4536   100.000 17611   6620 183032
result:    Estonia Mbuti_Pygmy anglosaxon   Chimp.DG      0.4609   100.000 28197  10406 285211
result:  Sardinian Mbuti_Pygmy     iabrit   Chimp.DG      0.4498   100.000 17634   6692 184450
result:  Sardinian Mbuti_Pygmy anglosaxon   Chimp.DG      0.4528   100.000 28105  10587 287311
result:   Orcadian Mbuti_Pygmy     iabrit   Chimp.DG      0.4552   100.000 17766   6652 184450
result:   Orcadian Mbuti_Pygmy anglosaxon   Chimp.DG      0.4595   100.000 28345  10496 287311
result:        TSI Mbuti_Pygmy     iabrit   Chimp.DG      0.4479   100.000 17597   6711 184450
result:        TSI Mbuti_Pygmy anglosaxon   Chimp.DG      0.4529   100.000 28076  10573 287312
result: North_Italian Mbuti_Pygmy     iabrit   Chimp.DG      0.4497   100.000 17653   6702 184450
result: North_Italian Mbuti_Pygmy anglosaxon   Chimp.DG      0.4557   100.000 28173  10533 287311
result: Russian_Vologda Mbuti_Pygmy     iabrit   Chimp.DG      0.4483   100.000 17635   6718 184450
result: Russian_Vologda Mbuti_Pygmy anglosaxon   Chimp.DG      0.4525   100.000 28129  10603 287311


result:        CEU Mbuti_Pygmy   hinxton2   Chimp.DG      0.4391   100.000 45931  17904 433006
result:        CEU Mbuti_Pygmy    rabrit3   Chimp.DG      0.4557   100.000 29993  11214 299676
result:     French Mbuti_Pygmy   hinxton2   Chimp.DG      0.4375   100.000 45806  17923 433006
result:     French Mbuti_Pygmy    rabrit3   Chimp.DG      0.4536   100.000 29889  11235 299676
result:  FinnLocal Mbuti_Pygmy   hinxton2   Chimp.DG      0.4327   100.000 45622  18064 433006
result:  FinnLocal Mbuti_Pygmy    rabrit3   Chimp.DG      0.4486   100.000 29784  11336 299676
result: FinnMostCW Mbuti_Pygmy   hinxton2   Chimp.DG      0.4384   100.000 45896  17921 433006
result: FinnMostCW Mbuti_Pygmy    rabrit3   Chimp.DG      0.4540   100.000 29933  11239 299676
result:        IBS Mbuti_Pygmy   hinxton2   Chimp.DG      0.4329   100.000 45492  18006 433006
result:        IBS Mbuti_Pygmy    rabrit3   Chimp.DG      0.4505   100.000 29732  11263 299676
result:       Kent Mbuti_Pygmy   hinxton2   Chimp.DG      0.4402   100.000 45988  17876 433006
result:       Kent Mbuti_Pygmy    rabrit3   Chimp.DG      0.4574   100.000 30041  11186 299676
result:    Estonia Mbuti_Pygmy   hinxton2   Chimp.DG      0.4394   100.000 45638  17775 430442
result:    Estonia Mbuti_Pygmy    rabrit3   Chimp.DG      0.4558   100.000 29766  11126 297499
result:  Sardinian Mbuti_Pygmy   hinxton2   Chimp.DG      0.4327   100.000 45516  18021 433005
result:  Sardinian Mbuti_Pygmy    rabrit3   Chimp.DG      0.4515   100.000 29768  11250 299675
result:   Orcadian Mbuti_Pygmy   hinxton2   Chimp.DG      0.4384   100.000 45892  17919 433005
result:   Orcadian Mbuti_Pygmy    rabrit3   Chimp.DG      0.4571   100.000 30016  11185 299675
result:        TSI Mbuti_Pygmy   hinxton2   Chimp.DG      0.4329   100.000 45487  18005 433006
result:        TSI Mbuti_Pygmy    rabrit3   Chimp.DG      0.4491   100.000 29704  11292 299676
result: North_Italian Mbuti_Pygmy   hinxton2   Chimp.DG      0.4346   100.000 45645  17989 433005
result: North_Italian Mbuti_Pygmy    rabrit3   Chimp.DG      0.4528   100.000 29838  11238 299675
result: Russian_Vologda Mbuti_Pygmy   hinxton2   Chimp.DG      0.4337   100.000 45616  18017 433005
result: Russian_Vologda Mbuti_Pygmy    rabrit3   Chimp.DG      0.4493   100.000 29754  11306 299675
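For readers who want to check the numbers, the D value on each line can be reproduced from the two count columns as (first count - second count) / (first count + second count). The Python sketch below parses one result line and recomputes D. It is only an illustration: the names `DstatRow`, `parse_result_line` and `d_from_counts` are hypothetical, and reading the two count columns as BABA and ABBA is an assumption about the qpDstat output layout.

```python
from typing import NamedTuple


class DstatRow(NamedTuple):
    """One 'result:' line of the output shown above (hypothetical container)."""
    pop1: str
    pop2: str
    pop3: str
    pop4: str
    d: float       # reported D statistic
    z: float       # reported Z score
    baba: int      # first count column (assumed to be BABA)
    abba: int      # second count column (assumed to be ABBA)
    n_snps: int    # number of SNPs used


def parse_result_line(line: str) -> DstatRow:
    """Split a whitespace-separated 'result:' line into typed fields."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "result:":
        raise ValueError(f"not a result line: {line!r}")
    return DstatRow(
        fields[1], fields[2], fields[3], fields[4],
        float(fields[5]), float(fields[6]),
        int(fields[7]), int(fields[8]), int(fields[9]),
    )


def d_from_counts(baba: int, abba: int) -> float:
    """Recompute D from the two site-pattern counts."""
    return (baba - abba) / (baba + abba)


if __name__ == "__main__":
    line = ("result:        CEU Mbuti_Pygmy     iabrit   Chimp.DG"
            "      0.4532   100.000 17762   6683 184450")
    row = parse_result_line(line)
    # Compare the reported D with the value recomputed from the counts.
    print(row.pop1, row.pop3, row.d, f"{d_from_counts(row.baba, row.abba):.4f}")
```

Because every line uses the same layout, the same function can be applied to the whole list to sort the modern populations by D for each ancient sample.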



And here are the corresponding maps:

 


Results differ somewhat from what I got earlier, obviously due to the stricter data preparation and more neutral outgroups.


Finally, I also computed IBS statistics and made a PCA plot using the same data.  It is reasonable to note, however, that because of the homozygosity error in ancient samples, the most homozygous modern populations get an extra boost and give results that are too high.  This is typical for Balts, Irishmen and Scots.  I don't know about Basque homozygosity.

I was able to include extra populations by using Plink with the --geno 0.01 option to standardize the SNP set as much as possible.
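To show what such a missingness threshold does, here is a small Python sketch. It is not Plink itself and not my actual pipeline, only an illustration of the rule that a SNP is dropped when its missing call rate exceeds the threshold. The function name `filter_by_missingness`, the SNP identifiers and the genotype values are all made up for the example.

```python
from typing import Dict, List, Optional

# Genotype calls per SNP: 0, 1 or 2 copies of an allele, None when missing.
Genotypes = Dict[str, List[Optional[int]]]


def filter_by_missingness(genotypes: Genotypes,
                          max_missing: float = 0.01) -> Genotypes:
    """Keep only SNPs whose missing call rate is at most max_missing."""
    kept: Genotypes = {}
    for snp_id, calls in genotypes.items():
        missing_rate = sum(call is None for call in calls) / len(calls)
        if missing_rate <= max_missing:
            kept[snp_id] = calls
    return kept


if __name__ == "__main__":
    # Hypothetical data: four individuals, three SNPs.
    data: Genotypes = {
        "rs_a": [0, 1, 2, 1],
        "rs_b": [0, None, 2, 1],   # one missing call out of four
        "rs_c": [2, 2, 1, 0],
    }
    print(sorted(filter_by_missingness(data)))
```

With fewer than a hundred individuals, a threshold of 0.01 means that a single missing call is enough to drop a SNP, which is why the remaining SNP set is nearly identical for every sample.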

Creating a PCA needs more samples to pick proper, all-inclusive components, so it is done here using another data set with fewer SNPs and more populations.

edit 17.3.2016 23:05

I read a comment on a Finnish history forum saying that using two outgroups, as I did in this post, is not recommended and can distort results.  I gladly admit that this is true.  But the reason for using two outgroups is very clear: it gives me a large number of mutually comparable results instead of a comparison of only three populations.  Using three target populations and one outgroup makes it impossible, or at least painful, to compare results from separate qpDstat runs.  Of course the latter method, using three targets, gives better accuracy.

But there is no smoke here without fire: my tests using two outgroups look reliable.  In my previous results (above) the FinnMostCW group was very close to the Iron Age British sample, closer than the French sample group.  I made a new test using the same data, now with three target populations and one outgroup.  It confirms my previous results:

  0           FinnMostCW   16
  1           French   28
  2           iabrit    1
  3           Chimp.DG    1
jackknife block size:     0.050
snps: 605676  indivs: 46
number of blocks for jackknife: 551
nrows, ncols: 46 605676
result: FinnMostCW     French     iabrit   Chimp.DG      0.0020     1.039  9159   9122 184450

Indeed, I will have to come back to this question with a larger data set.
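The output above mentions a jackknife over 551 blocks, and the Z column is the D value divided by its standard error. As a rough sketch of the idea, the Python code below runs a plain delete-one block jackknife on per-block counts. Note the assumptions: qpDstat uses a weighted block jackknife, whereas this simplified version treats all blocks equally, and the function names and the block counts are invented for the example.

```python
import math
from typing import List, Tuple


def d_stat(baba: float, abba: float) -> float:
    """D statistic from the two site-pattern counts."""
    return (baba - abba) / (baba + abba)


def jackknife_d(blocks: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (D, standard error, Z) from per-block (BABA, ABBA) counts.

    Simplified, unweighted delete-one block jackknife.
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    total_baba = sum(baba for baba, _ in blocks)
    total_abba = sum(abba for _, abba in blocks)
    d_full = d_stat(total_baba, total_abba)

    # Recompute D with each block left out in turn.
    leave_one_out = [
        d_stat(total_baba - baba, total_abba - abba) for baba, abba in blocks
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    if se == 0.0:
        raise ValueError("zero standard error; blocks are identical")
    return d_full, se, d_full / se


if __name__ == "__main__":
    # Hypothetical counts for five genomic blocks.
    example_blocks = [(30, 25), (28, 31), (35, 27), (22, 24), (31, 29)]
    d, se, z = jackknife_d(example_blocks)
    print(f"D={d:.4f} SE={se:.4f} Z={z:.3f}")
```

Leaving out whole blocks rather than single SNPs is meant to account for the correlation between neighbouring SNPs, so the standard error is not underestimated.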


edit 18.3.2016 12:40

Here is another qpDstat result using three target populations.   I am quite disappointed by the way some people react when they are not happy with some results.   My only goal is to make objective tests using primarily European genetic data. My focus is not on Finnish results, nor do I try to avoid producing reliable results about Finns.

  0           FinnMostCW   16
  1           FinnLocal   15
  2           iabrit    1
  3           Chimp.DG    1
jackknife block size:     0.050
snps: 605676  indivs: 33
number of blocks for jackknife: 551
nrows, ncols: 33 605676
result: FinnMostCW  FinnLocal     iabrit   Chimp.DG      0.0073     3.750  9120   8989 184450

2 comments:

  1. You could do the IBS stats for French, Kargopol Russians, Orcadians and Estonians too. If the ancients' inflated homozygosity has an effect compared to modern populations, e.g. Basques should no longer be very close to everyone in comparisons with moderns.

    1. Yes, it would be helpful. I can easily add populations from the HGDP data to the IBS tests. Now, after building a new database for the IBS tests, I could also add HGDP populations to the Dstat tests. I don't know yet; I have to check whether it is possible without losing the SNP parity.


English preferred, because readers are international.

No more Anonymous posts. Do not act like folks on poorly moderated forums.