Ancient samples in particular often have poor quality, even when the
number of samples is reasonable. I wrote a simple Linux shell script
that builds an average sample from a group of samples, which makes it
possible to raise sample quality to a reasonable level for analyses
based on allele frequencies. The script reads EIGENSTRAT format, and
the result is formed by picking alleles randomly from the pooled
samples. Unfortunately Linux shell scripts don't support indexed files,
so I had to make some compromises to keep the run time reasonable. The
result of this script will not work with analyses based on IBD or
principal components, so it cannot be used, for example, as an input
file for the popular Eurogenes G25 test, but it should work with all
analyses on the Dodecad platform. If you are not familiar with these
distinctions, the outcome may well be disappointing. The script is
freely available here. I forgot to mention that you also need an rs-id
file, which is available here. It should be unzipped into the same
directory as the script.
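To illustrate the pooling idea, here is a minimal Python sketch. It is not the shell script itself, and the details are my assumptions: each row of an EIGENSTRAT .geno file holds one character per sample (0, 1 or 2 reference alleles, 9 for missing), the alleles of all genotyped group members are pooled, and two alleles are drawn at random with replacement to form the averaged genotype. The function names pool_snp and pool_group are hypothetical.

```python
import random

MISSING = "9"  # EIGENSTRAT code for a missing genotype


def pool_snp(genotypes, rng):
    """Build one pooled genotype from the group's calls at a single SNP.

    genotypes: string of EIGENSTRAT codes, one character per sample
    (0, 1 or 2 = number of reference alleles, 9 = missing).
    """
    alleles = []
    for g in genotypes:
        if g == MISSING:
            continue  # this sample contributes nothing at this SNP
        ref = int(g)
        # 1 marks a reference allele, 0 the alternative allele.
        alleles.extend([1] * ref + [0] * (2 - ref))
    if not alleles:
        return MISSING  # nobody in the group was genotyped here
    # Assumption: two independent draws (with replacement) form the call.
    return str(rng.choice(alleles) + rng.choice(alleles))


def pool_group(geno_lines, columns, seed=None):
    """Return one pooled genotype code per SNP for the samples in `columns`."""
    rng = random.Random(seed)
    return [
        pool_snp("".join(line[i] for i in columns), rng)
        for line in geno_lines
    ]


if __name__ == "__main__":
    # Four made-up SNP rows for four samples; samples 0, 1 and 3 form the group.
    geno = ["2919", "0090", "9999", "1292"]
    print(pool_group(geno, columns=[0, 1, 3], seed=1))
```

A SNP where every group member is missing stays missing. Everywhere else the pooled sample gets a call, which is where the quality gain comes from. Because each call is a random draw from the group, the output reflects group allele frequencies and does not correspond to any real individual.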
Below are some results from available academic samples, based on my own
models:
Medieval Nomads
East_Asian 42.4
Uralic 24.5
Siberian 15.9
Northeast_European 6.1
Mediterranean 4.4
East_Scandinavian 3.3
Fennoscandinavian 2.2
AMBIG_European 1.2

Iberian Chalcolithic
Mediterranean 80.4
Northwest_European 15.6
Central_European 2.7
East_Scandinavian 1.4

Hungarian Bronze Age
Mediterranean 86.4
Central_European 11.3
Fennoscandinavian 1.1
East_Scandinavian 1.2

Polish Bronze Age
Northwest_European 39.0
Slavic 37.8
Fennoscandinavian 11.9
Central_European 6.5
East_Scandinavian 3.6
Baltic 1.2

Estonian Iron Age
Baltic 57.1
Slavic 26.1
Fennoscandinavian 11.5
Finnic 2.8
East_Scandinavian 2.2

edit 8.5. 15:30
The script has been edited so that it now also accepts rs-ids in the input; the original version accepted only concatenated ids (chr:location). Please note that only hg19/GRCh37 mappings are supported. The new version is available
here.
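Accepting both id styles can be pictured with the small Python sketch below. It is only an illustration under assumptions, not the script's actual code: I assume a lookup file with one "rsID chr:position" pair per line (the real layout of the rs-id file may differ), and the names load_rsid_map and to_position_id as well as the ids in the example are made up.

```python
def load_rsid_map(lines):
    """Parse lines of 'rsID chr:position' into a dict (hypothetical layout)."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or incomplete lines
            mapping[parts[0]] = parts[1]
    return mapping


def to_position_id(snp_id, rsid_map):
    """Return chr:position for either id style, or None if it is unknown."""
    if snp_id.startswith("rs"):
        return rsid_map.get(snp_id)  # None when the rs-id is not in the map
    return snp_id if ":" in snp_id else None


if __name__ == "__main__":
    # Made-up ids and positions, for illustration only.
    rsid_map = load_rsid_map(["rs0000001 1:1000", "rs0000002 2:2000"])
    for snp_id in ["rs0000001", "2:2000", "rs9999999"]:
        print(snp_id, "->", to_position_id(snp_id, rsid_map))
```

Since the positions in such a lookup belong to one genome build, input using coordinates from another build would silently match the wrong SNPs, which is why only hg19/GRCh37 is supported.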