Wednesday, February 24, 2016

Iron Age Briton and Anglo-Saxon genomes tested using dstat

After a long testing period I now have tools to process fastq files, and I can create PED and EIGENSTRAT samples from the original sequencing data.  The workflow is based on BWA and GATK; in outline:

1. mapping fastq files separately using BWA-MEM
2. sorting and merging (samtools)
3. extracting mapped reads above a certain mapping quality (samtools)
4. removing duplicates (Picard tools)
5. recalibrating base quality scores (GATK)
6. calling genotypes (GATK), checking the base quality
7. updating RS IDs
8. extracting PED from VCF (vcftools)
9. converting PED to EIGENSTRAT
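
As a rough illustration of steps 1-3 and 8 above, here is a minimal Python sketch that only assembles the command lines and does not run them. The helper name `pipeline_commands`, the file naming scheme and the mapping-quality cutoff of 30 are assumptions made for the example, not the settings actually used. The Picard and GATK steps (4-6) and the RS ID update are left out because their exact invocations depend on the tool versions.

```python
import shlex


def pipeline_commands(sample, reference, fastqs, min_mapq=30, threads=8):
    """Build shell command strings for steps 1-3 and 8 (hypothetical file names)."""
    ref = shlex.quote(reference)
    cmds = []
    sorted_bams = []
    for i, fastq in enumerate(fastqs):
        sam = f"{sample}.{i}.sam"
        bam = f"{sample}.{i}.sorted.bam"
        # Step 1: map each fastq file separately with BWA-MEM
        cmds.append(f"bwa mem -t {threads} {ref} {shlex.quote(fastq)} > {sam}")
        # Step 2 (first half): coordinate-sort each alignment
        cmds.append(f"samtools sort -o {bam} {sam}")
        sorted_bams.append(bam)
    # Step 2 (second half): merge the sorted files into one BAM
    merged = f"{sample}.merged.bam"
    cmds.append(f"samtools merge {merged} " + " ".join(sorted_bams))
    # Step 3: keep reads at or above the mapping-quality cutoff (assumed value)
    filtered = f"{sample}.q{min_mapq}.bam"
    cmds.append(f"samtools view -b -q {min_mapq} -o {filtered} {merged}")
    # Steps 4-7 (Picard, GATK, RS ID update) are intentionally omitted here
    # Step 8: write PED/MAP files from the final VCF
    cmds.append(f"vcftools --vcf {sample}.vcf --plink --out {sample}")
    return cmds


if __name__ == "__main__":
    for cmd in pipeline_commands("sample1", "ref.fa", ["run1.fq", "run2.fq"]):
        print(cmd)
```

Printing the commands first makes it easy to check them by eye before running a job that takes several hours.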

Processing one sample takes 6-12 hours on my laptop (i7, 3.5 GHz, 8 threads used, 32 GB memory).

After checking all samples from the study release I was sure that I could find more information using qpDstat, which compares shared genetic drift rather than IBS, which was used in the original study.  I considered this possible because I have seen in my own work that IBS often gives high results for unmixed and drifted populations, and such a result doesn't necessarily imply common ancestry to the same degree.  Mixed populations evidently become underestimated.  In this sense qpDstat beats IBS statistics.
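
To make the idea concrete, here is a minimal sketch of the D-statistic computed from allele frequencies, in the form it is usually written for ADMIXTOOLS: D(W, X; Y, Z) is the sum over SNPs of (w - x)(y - z), divided by the sum of (w + x - 2wx)(y + z - 2yz). The function name and the toy frequencies are made up for illustration. qpDstat itself also estimates standard errors with a block jackknife, which this sketch leaves out.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies given as equal-length lists."""
    numerator = 0.0
    denominator = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        # Correlation of the allele frequency differences W-X and Y-Z
        numerator += (pw - px) * (py - pz)
        # Normalisation term, so that D stays between -1 and 1
        denominator += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


if __name__ == "__main__":
    # Toy frequencies at three SNPs (invented, not real data)
    w = [0.9, 0.2, 0.5]
    x = [0.1, 0.8, 0.5]
    y = [0.8, 0.3, 0.6]
    z = [0.2, 0.7, 0.4]
    print(d_statistic(w, x, y, z))
```

A positive value means that W shares more drift with Y (and X with Z); a negative value means the opposite pairing, and a value near zero means that W and X are equally related to Y and Z.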

A few comments about the results.  I used the Kent samples as a baseline in comparison to the ancient Anglo-Saxon and Iron Age Briton samples, assuming that present-day Brits should be the closest relatives of their ancestors.  This was not true in all cases.  Apparently Brits are more mixed than some other North Europeans.

I used my new Finnish grouping, which splits Finns into two genetic patterns, one consisting of more local ancestry and another resembling the German Corded Ware samples published last year.  Both groups represent around 20% of the original 1000 Genomes Finnish sample set, after removing outliers.  It looks like the local 1000g group differs in this test particularly from the East Finnish samples, which I have gathered directly from volunteers.  The difference between East and West Finland is explained more by the Iron Age British sample than by the Anglo-Saxon sample.  The Anglo-Saxon sample shows high similarity with present-day Scandinavians and looks more widespread than the Iron Age Briton everywhere in northernmost Europe.  Of course more Iron Age West European samples could tell more and perhaps confirm my results.  Hopefully British researchers will soon dig up more Iron Age samples to fulfill my dreams.

I have two samples from my project members (ISX and LSX), added to better illustrate the Finnish 1000 Genomes samples.  Both project samples are from genealogists of Finnish-speaking ancestry.

I have two databases: the smaller one holds 1 million SNPs but only a few populations, and the larger one holds 0.25 million SNPs and around 3000 samples.  In this particular case the first one makes it possible to use around 200k SNPs; the latter gives 112368 SNPs (Anglo-Saxon) and 71478 SNPs (Iron Age sample).


edit 25.2.2016 23:25

Replacing Kent with an outgroup (Mbuti), we get absolute distances with reasonable accuracy.  Closest to the Anglo-Saxon sample are


sounds good.

And closest to the Iron Age Briton are


followed by Ireland, FinnsMostCW and Kent.

edit 26.2.2016 13:40

Ranking of ancient genomes released last year by the Reich Lab.  It should be noted that some results are based on small numbers of SNPs.  It is likely that Hungary_MBA and Germany_Bronze_Age get too high scores due to fewer SNPs.

The second column is the calculated difference between Yoruba and the Anglo-Saxon or Iron Age Briton sample, compared to the difference between each ancient population and the Anglo-Saxon or Iron Age Briton sample.  The third column is the number of SNPs.

Anglo-Saxon:

Hungary_MBA.SG    0.4435    23477
Remedello_BA.SG    0.4254    178071
Germany_Bronze_Age.SG    0.4209    26827
Bell_Beaker_Germany.SG    0.4137    136565
Sintashta_MBA_RISE.SG    0.4127    245264
Andronovo.SG    0.4016    285166
Corded_Ware_Estonia.SG    0.4002    154859
Bell_Beaker_Czech.SG    0.3981    190277

Iron Age Briton:

Hungary_MBA.SG    0.4531    15019
Germany_Bronze_Age.SG    0.4109    16253
Bell_Beaker_Germany.SG    0.4071    86809
Remedello_BA.SG    0.4061    114783
Sintashta_MBA_RISE.SG    0.3982    154956
Nordic_LBA.SG    0.3839    11704
Andronovo.SG    0.3764    182788
Maros.SG    0.3758    59040
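
Because some scores rest on few SNPs, it can help to mark such rows when reading the rankings above. The following is a minimal sketch that parses rows in the same three-column layout and flags those below a cutoff. The cutoff of 30000 SNPs and the function names are arbitrary choices for illustration, not a tested threshold.

```python
def parse_rows(text):
    """Parse lines of 'population score snp_count' into tuples."""
    rows = []
    for line in text.strip().splitlines():
        name, score, n_snps = line.split()
        rows.append((name, float(score), int(n_snps)))
    return rows


def flag_low_snp_rows(rows, min_snps=30000):
    """Sort by score (highest first) and mark rows with fewer than min_snps SNPs."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(name, score, n, n < min_snps) for name, score, n in ranked]


if __name__ == "__main__":
    # Three rows copied from the Iron Age Briton ranking above
    table = """
    Hungary_MBA.SG    0.4531    15019
    Bell_Beaker_Germany.SG    0.4071    86809
    Nordic_LBA.SG    0.3839    11704
    """
    for name, score, n, low in flag_low_snp_rows(parse_rows(table)):
        note = "  (few SNPs, treat with caution)" if low else ""
        print(f"{name}\t{score:.4f}\t{n}{note}")
```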