Wednesday, February 24, 2016

Iron Age Briton and Anglo-Saxon genomes tested using dstat

After a long testing period I now have tools to process FASTQ files, and I can create PED and EIGENSTRAT samples from the original sequencing data.  The workflow is based on BWA and GATK; in outline:

1. mapping each FASTQ file separately with BWA-MEM
2. sorting and merging (samtools)
3. extracting mapped reads above a certain mapping quality (samtools)
4. removing duplicates (Picard tools)
5. recalibrating base quality scores (GATK)
6. calling genotypes (GATK), checking the base quality
7. updating rs IDs
8. extracting PED from VCF (vcftools)
9. converting PED to EIGENSTRAT
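
The first steps of the outline can be sketched as a small Python helper that only builds the shell command lines, without running anything. This is a rough sketch and not my exact scripts: the file names, the `picard.jar` path, the MAPQ threshold of 30 and the function name `build_pipeline` are all assumptions made for illustration. The GATK steps (5-6), the rs ID update (7) and the EIGENSTRAT conversion (9) are left out because their commands depend on the tool version; the VCF in the last command is assumed to come from them.

```python
from typing import List


def build_pipeline(sample: str, fastqs: List[str], ref: str = "ref.fa",
                   threads: int = 8, min_mapq: int = 30) -> List[str]:
    """Return shell command lines for steps 1-4 and 8 of the outline.

    All file names are hypothetical; nothing is executed here.
    """
    cmds: List[str] = []
    sorted_bams: List[str] = []

    # Steps 1-2: map each FASTQ file separately with BWA-MEM and sort it.
    for i, fq in enumerate(fastqs):
        bam = f"{sample}.{i}.sorted.bam"
        cmds.append(f"bwa mem -t {threads} {ref} {fq} | samtools sort -o {bam} -")
        sorted_bams.append(bam)

    # Step 2 (continued): merge the sorted per-file BAMs.
    merged = f"{sample}.merged.bam"
    cmds.append(f"samtools merge {merged} " + " ".join(sorted_bams))

    # Step 3: keep mapped reads (-F 4) with mapping quality >= min_mapq.
    filtered = f"{sample}.q{min_mapq}.bam"
    cmds.append(f"samtools view -b -F 4 -q {min_mapq} -o {filtered} {merged}")

    # Step 4: remove duplicates with Picard.
    dedup = f"{sample}.dedup.bam"
    cmds.append(
        f"java -jar picard.jar MarkDuplicates I={filtered} O={dedup} "
        f"M={sample}.dup_metrics.txt REMOVE_DUPLICATES=true"
    )

    # Step 8: convert the called genotypes (VCF from the GATK steps) to PED/MAP.
    cmds.append(f"vcftools --vcf {sample}.vcf --plink --out {sample}")
    return cmds


if __name__ == "__main__":
    for line in build_pipeline("sample1", ["lane1.fq", "lane2.fq"]):
        print(line)
```

Keeping the commands as plain strings makes it easy to print and check them before handing them to a shell.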

Processing one sample takes 6-12 hours on my laptop (i7, 3.5 GHz, 8 threads used, 32 GB memory).

After checking all samples from the study release I was sure that I could find more information using qpDstat, which compares shared genetic drift, rather than IBS, which was used in the original study.  I considered this possible because I have seen in my own work that IBS often gives high results for unmixed and drifted populations, and such a result does not necessarily imply common ancestry to the same degree.  Mixed populations are correspondingly underestimated.  In this sense qpDstat beats IBS statistics.
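
To make concrete what a D-statistic measures, here is a minimal Python sketch of the standard four-population formula, D(W, X; Y, Z), computed from per-SNP allele frequencies. It is only an illustration under simplifying assumptions: the function name `d_statistic` and the toy frequencies are mine, and unlike qpDstat it gives no standard error or Z-score (qpDstat gets those from a block jackknife).

```python
from typing import Sequence


def d_statistic(w: Sequence[float], x: Sequence[float],
                y: Sequence[float], z: Sequence[float]) -> float:
    """D(W, X; Y, Z) from per-SNP allele frequencies in four populations.

    Numerator:   sum over SNPs of (w - x) * (y - z)
    Denominator: sum over SNPs of (w + x - 2wx) * (y + z - 2yz)
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations must cover the same SNPs")
    num = 0.0
    den = 0.0
    for a, b, c, d in zip(w, x, y, z):
        num += (a - b) * (c - d)
        den += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


if __name__ == "__main__":
    # Toy data: W always carries the same allele as Y, and X the same as Z.
    print(d_statistic([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]))
```

A positive value means that W shares more alleles with Y (and X with Z) than the other way round, and a value near zero means no such asymmetry. Because the statistic is built from differences between populations, it reflects shared drift and not the overall allele similarity that IBS counts.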

A few comments about the results.  I used the Kent samples as a baseline for comparison with the ancient Anglo-Saxon and Iron Age Briton, assuming that present-day Brits should be the closest relatives of their ancestors.  This was not true in all cases.  Apparently Brits are more mixed than some other North Europeans.

I used my new Finnish grouping, which splits Finns into two genetic patterns, one with more local ancestry and another resembling the German Corded Ware samples published last year.  Each group represents around 20% of the original 1000genomes Finnish sample set, after removing outliers.  It looks like the local 1000g group differs in this test particularly from the East Finnish samples, which I have gathered directly from volunteers.  The difference between East and West Finland is explained more by the Iron Age British sample than by the Anglo-Saxon sample.  The Anglo-Saxon shows high similarity with present-day Scandinavians and looks more widespread than the Iron Age Briton everywhere in northernmost Europe.  Of course more Iron Age West European samples could tell more and perhaps confirm my results.  Hopefully British researchers will soon dig up more Iron Age samples to fulfill my dreams.

I have two samples from my project members (ISX and LSX), added to give a better picture of the Finnish 1000genomes samples.  Both project samples are from genealogists of Finnish-speaking ancestry.

I have two databases: the smaller one holds 1 million SNPs but only a few populations, and the larger one holds 0.25 million SNPs and around 3000 samples.  In this particular case the first allows around 200k SNPs to be used; the latter gives 112368 SNPs (Anglo-Saxon) and 71478 SNPs (Iron Age sample).

edit 25.2.2016 23:25

Replacing Kent with an outgroup (Mbuti) we get absolute distances with reasonable accuracy.  Closest to the Anglo-Saxon sample are

Sweden
Norway
Kent

Sounds good.

And closest to the Iron Age Briton are

France
Norway
Welsh

followed by Ireland, FinnsMostCW and Kent.

edit 26.2.2016 13:40

Ranking of ancient genomes released last year by the Reich Lab.  It should be noted that some results are based on small numbers of SNPs.  It is likely that Hungary_MBA and Germany_Bronze_Age get too high scores because they have fewer SNPs.

The second column is the calculated difference between Yoruba and the Anglo-Saxon or Iron Age Briton, compared to the difference between the ancient population and the Anglo-Saxon or Iron Age Briton.  The third column is the number of SNPs.

Anglo-Saxon:

Hungary_MBA.SG    0.4435    23477
Remedello_BA.SG    0.4254    178071
Germany_Bronze_Age.SG    0.4209    26827
Bell_Beaker_Germany.SG    0.4137    136565
Sintashta_MBA_RISE.SG    0.4127    245264
Andronovo.SG    0.4016    285166
Corded_Ware_Estonia.SG    0.4002    154859
Bell_Beaker_Czech.SG    0.3981    190277



Iron Age Brit:

Hungary_MBA.SG    0.4531    15019
Germany_Bronze_Age.SG    0.4109    16253
Bell_Beaker_Germany.SG    0.4071    86809
Remedello_BA.SG    0.4061    114783
Sintashta_MBA_RISE.SG    0.3982    154956
Nordic_LBA.SG    0.3839    11704
Andronovo.SG    0.3764    182788
Maros.SG    0.3758    59040
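
Since some of the top scores rest on few SNPs, a small Python sketch like the following can flag the rows that deserve caution. It uses the Iron Age Brit rows exactly as listed above; the cut-off of 50000 SNPs and the names `parse_rows` and `flag_low_snp_rows` are my own assumptions for illustration, not a recommended threshold.

```python
from typing import List, Tuple

# Iron Age Brit rows copied from the list above: population, score, SNP count.
IRON_AGE_ROWS = """\
Hungary_MBA.SG    0.4531    15019
Germany_Bronze_Age.SG    0.4109    16253
Bell_Beaker_Germany.SG    0.4071    86809
Remedello_BA.SG    0.4061    114783
Sintashta_MBA_RISE.SG    0.3982    154956
Nordic_LBA.SG    0.3839    11704
Andronovo.SG    0.3764    182788
Maros.SG    0.3758    59040
"""


def parse_rows(text: str) -> List[Tuple[str, float, int]]:
    """Split whitespace-separated rows into (population, score, snps)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, score, snps = line.split()
        rows.append((name, float(score), int(snps)))
    return rows


def flag_low_snp_rows(rows: List[Tuple[str, float, int]],
                      min_snps: int = 50000) -> List[str]:
    """Return populations whose score is based on fewer than min_snps SNPs."""
    return [name for name, _score, snps in rows if snps < min_snps]


if __name__ == "__main__":
    for name in flag_low_snp_rows(parse_rows(IRON_AGE_ROWS)):
        print(name, "is based on few SNPs; treat its rank with caution")
```

With this assumed cut-off the flagged rows are Hungary_MBA.SG, Germany_Bronze_Age.SG and Nordic_LBA.SG.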






14 comments:

  1. IBS-results from "Genomic signals of migration and continuity in Britain before the Anglo-Saxons" seemed quite reasonable, the major difference compared to these D-stats is that the Finnish sample was relatively closer to the Anglo-Saxon than to the Iron Age samples but in absolute terms it was still closer than Central and East European samples.

    Could you run an IBS similarity test for the IA and AS samples using the larger database of your first figure so we could compare the difference between IBS and D-stat?

    Replies
    1. The difference between those IBS results and my dstat result is that my results show less similarity between ancient samples and drifted populations, like Scots. I do have Scots, Irish and Welsh samples. Unfortunately I have no real German samples. Another shortcoming in the study is that it uses only one Finnish sample, and because the Finns are a very diverse (mixed) population I have to say that one sample is not a statistic.


      I am going to continue with the other ancient British samples. After I am done I can also do IBS statistics, although IBS is not as reliable as dstat and genetic drift for finding common ancestry.

    2. Sorry for the typos, due to a too small touchscreen.

    3. The low amount of Finnish samples in the original study is just another reason to do the IBS stats with your dataset. Only that way we can see if mostcw, local and maybe your project references show different results than what D-stats give here and how will those results differ.

      We can already see that there is massive differentiation, far more than between many individual European countries, between different Finnish samples and groups using D-stats, and I figure the result will be same using IBS. But can't be absolutely sure before the test is done.

    4. I would really like to do it and will do it. But there are some basics to follow before one can understand the difference between IBS and genetic drift. IBS is a simple statistic showing allele similarity, and more mixed populations usually get low results regardless of the ancestry, whereas genetic drift shows smaller dedicated proportions of common ancestry. In the Finnish results this means that the intrapopulation difference can grow, because some Finnish "tribes" can lack some admixtures, even though the Finns can partly have the same root. It also follows that some Finns show high drift similarity with Iron Age Brits and some show almost no common drift with those Brits. When we then look at IBS we see no similar sorting, because IBS sees the whole genome as one big lump. I hope you got the gist of my explanation.

  2. I think we would still see pretty significant differences. In the study these ancient British samples are from, the IBS results of Scots and Welsh differed from each other more than those of Lithuanians and French did, and differences between Finnish subpopulations should be greater than those between subpopulations from the British Isles.

    Replies
    1. We will see. From what I have seen at 23andme, the Finnish intrapopulation IBS difference is puzzling; many pure Finns are very far away from each other, but in general people living in late Finnish settlements get high IBS numbers. This is what happens when people share a common root and are less mixed. In the old settlements, by contrast, they share less common IBS than, for example, Lithuanians. But as I wrote, IBS is not directly comparable with genetic drift. Comparing ancient genomes to modern populations using IBS is especially problematic. For that reason we use qpDstat and qpAdm.

    2. I mean that the IBS sharing of Scots and Welsh, Lithuanians and French with ancient Britons differed. Not Scots sharing with other Scots or Welsh, or Lithuanians with other Lithuanians.

    3. One more thing, you might want to put the 1000genomes IDs for the mostcw and local groups (hg00X etc) up in a post. That way others with the data can repeat/verify the results too.

    4. Of course, you're welcome

      FinnLocal

      HG00176
      HG00182
      HG00185
      HG00187
      HG00266
      HG00276
      HG00282
      HG00304
      HG00309
      HG00313
      HG00326
      HG00330
      HG00338
      HG00365
      HG00378


      FinnMostCW

      HG00174
      HG00177
      HG00178
      HG00190
      HG00268
      HG00274
      HG00277
      HG00285
      HG00311
      HG00320
      HG00353
      HG00362
      HG00364
      HG00376
      HG00381
      HG00384


      Originally I had a few more, but some samples turned out to be outliers, likely with Slavic admixture (although those outliers are classified as pure Finns in 1000g project).

    5. Whoever tries to generate PCAs using those samples will be surprised after seeing the ordinary West Finn / East Finn groupings. PCA is not the way to go. I have said it hundreds of times. Why? Because PCA does things wrongly; it makes a bifurcation based on the 4-8% Siberian admixture and the genetic drift common in East Finland. It gives less importance to the remaining 80% or so.

  3. I don't think the difference between "local" and East Finns is related to Siberian because we did that comparison before using verified East Finns and the difference was insignificant, a near-zero result.

    EastFinnish FinnLocal Nganassan Chimp.DG 0.0005 Z 0.256 SNP 231799

    So it has to be something else, drift or, if admixture, something unrelated to Nganasans.

  4. Sorry, my previous message meant that USUALLY PCA makes the Finnish bifurcation on the Siberian admixture and the genetic drift of late Finnish settlements. I don't do it; my grouping is based on the main historical events in Finland. I don't use PCA in this sense. I only stated that other people do it using PCA, and if you use PCA with my dedicated Local and MostCW samples you get nothing informative if you expect the "old-fashioned results" typically obtained in tests. So your previous dstat is just as expected; there is practically no Siberian difference, but it is not a question of genetic drift.

    My FinnLocal is not made by extracting the highest Siberian among Finns, so that is not the question. The local Finnish group seems to be a local group, as I named it. But most people (other bloggers and those doing business, like 23andme and FTDNA) create the Finnish or East Finnish group by extracting Siberian admixture and/or genetic drift. I don't do it. My FinnLocal group represents local ancestry from the beginning of times. In other words, I am very well aware of what other people do and what has gone wrong, historically speaking. I am not bragging, I only know more about Finnish history and can follow it. Frankly speaking, insistently following a presumption that expects only Siberian admixture and/or genetic drift is a simple, ready-made answer for what people don't know. Sorry to say that, but I have to be honest.

