Sunday 13 August 2017

Project admixture analyses, revised

This time I used more SNPs with the method coded by the DNA.Land authors.  It is now also possible to download all the necessary tools for DIY use.  The tools work only on Linux and need Python to be installed.  Here is help on installing Python on Ubuntu.

Some comments to help in understanding the results:

- After a lot of testing I found that the Swedish sample set published with the study "No Evidence from Genome-Wide Data of a Khazar Origin for the Ashkenazi Jews" (Behar et al.) doesn't fit well with my Swedish project samples, all of which show more Northwest and Central European ancestry than that Swedish reference.  This holds even when the project members' self-declarations presume some Finnish admixture.  Therefore I decided to label the reference samples East_Scandinavian, which seemed to be correct.  I wonder where they come from geographically.

- The Saami reference samples cannot be considered a source of Siberian ancestry.  Unfortunately too few of them were available, which increases the statistical error.  Here they represent a much more diverse genetic history.  The small Siberian admixture usually seen in Finnic results is built into the Finnish results, because the present-day Siberian component among Finnic people is old and distinct and doesn't match present-day Siberians when Finnic reference samples are used at the same time.

The summarizing tree:

[Figure: summarizing tree]

Results:

FI1
Finnic 54.6
East_Scandinavian 25.5
Saami 8.0
Northeast_European 5.7
Slavic 2.2
Northwest_European 1.0
Central_European 1.3
AMBIG_European 1.7

FI2
Finnic 52.7
Northwest_European 31.3
East_Scandinavian 6.1
Saami 3.8
Northeast_European 2.3
Slavic 1.9
Central_European 1.4

FI3
Finnic 72.9
East_Scandinavian 16.3
Saami 3.6
Baltic 3.4
Northwest_European 1.6
Northeast_European 1.7

FI4
Finnic 49.1
Northwest_European 25.1
East_Scandinavian 12.9
Northeast_European 5.1
Slavic 2.4
Saami 2.9
Baltic 2.3

FI5
Finnic 80.0
East_Scandinavian 13.4
Saami 3.4
Central_European 2.5

FI6
Finnic 54.2
East_Scandinavian 27.2
Baltic 9.3
Northwest_European 2.3
Saami 1.8
Northeast_European 1.7
Mediterranean 1.1
Central_European 2.0

FI7
Finnic 97.8
AMBIG_European 1.7

FI8
Finnic 95.1
East_Scandinavian 2.1
Baltic 1.7
AMBIG_European 1.1

FI9
Finnic 85.7
East_Scandinavian 11.9
Baltic 2.1

FI10
Finnic 64.0
Saami 31.5
Siberian 2.5
Uralic 1.0

FI11
Finnic 92.7
East_Scandinavian 5.2
Saami 2.1

FI12
Finnic 83.6
East_Scandinavian 15.2
AMBIG_European 1.1

FI14
Finnic 77.7
Baltic 16.8
East_Scandinavian 2.9
AMBIG_European 2.1

FI15
Finnic 97.8
AMBIG_European 1.7

FI16
Finnic 73.6
East_Scandinavian 14.6
Northwest_European 5.1
Central_European 5.4
Saami 1.0

FI17
Finnic 67.8
East_Scandinavian 14.8
Central_European 7.1
Slavic 5.1
Saami 1.4
Mediterranean 1.6
AMBIGUOUS 1.1
AMBIG_East_European 1.1

FI18
Finnic 82.0
East_Scandinavian 12.9
Saami 2.9
Baltic 1.5

FI19
Finnic 73.6
East_Scandinavian 14.3
Saami 6.9
Northwest_European 3.8
Slavic 1.0

FI20
Finnic 75.8
East_Scandinavian 17.3
Saami 4.6
Slavic 1.8

FI21
Finnic 94.0
Saami 2.1
AMBIG_European 2.0
Baltic 1.0

FI22
Finnic 94.5
Saami 1.3
Baltic 1.9
AMBIG_European 1.2
AMBIG_East_European 1.1

FI23
Finnic 68.8
East_Scandinavian 20.0
Saami 3.6
Northwest_European 3.3
Slavic 2.4
Central_European 1.7

SC2
Northwest_European 40.3
East_Scandinavian 22.4
Central_European 15.9
Finnic 9.0
Slavic 4.2
Baltic 3.9
Saami 1.6
Mediterranean 1.7
AMBIG_European 1.0

SC3
Northwest_European 52.1
East_Scandinavian 20.5
Finnic 14.8
Slavic 4.1
Central_European 4.3
Baltic 3.9

SC4
Northwest_European 59.5
East_Scandinavian 27.3
Central_European 5.4
Baltic 5.8
Saami 1.8

SC5
Northwest_European 38.1
East_Scandinavian 32.9
Finnic 11.5
Baltic 9.9
Northeast_European 3.4
Uralic 1.6
Central_European 1.9

SC6
Northwest_European 40.8
Finnic 20.7
Northeast_European 13.1
Central_European 9.0
East_Scandinavian 8.4
Slavic 4.0
Saami 2.1
Baltic 1.9

SC7
Northwest_European 45.8
East_Scandinavian 31.9
Finnic 11.5
Mediterranean 6.2
Slavic 2.2
Northeast_European 2.1

Although my primary goal was to detect Finnic and Scandinavian admixture, the test evidently works for almost all Europeans, at least to some extent.

Other samples for verification purposes:
 
Irish sample
Northwest_European 90.0
East_Scandinavian 8.7
AMBIG_European 1.3

Western Polish sample
Slavic 49.8
Baltic 18.3
Central_European 14.4
Northwest_European 6.5
Northeast_European 3.5
East_Scandinavian 4.0
Mediterranean 2.3
Uralic 1.1

Sardinian sample
Mediterranean 93.2
Northwest_European 4.5
East_Scandinavian 1.3

Baltic sample
Baltic 70.6
East_Scandinavian 12.3
Slavic 7.9
Northeast_European 6.3
Central_European 2.6
 
Lithuanian/Yotvingian sample
Baltic 49.0
Slavic 37.5
Central_European 5.8
Mediterranean 4.1
Northeast_European 1.8
AMBIG_European 1.7

Estonian sample
Finnic 41.4
Baltic 19.6
Slavic 16.8
Central_European 9.7
East_Scandinavian 7.8
Saami 2.3
Northeast_European 2.3

Genomes Unzipped sample
Mediterranean 45.7
Northwest_European 19.2
Central_European 19.9
East_Scandinavian 12.3
Slavic 1.9

Genomes Unzipped sample
Mediterranean 37.4
Northwest_European 37.0
East_Scandinavian 15.5
Central_European 9.3

The admixture percentages don't always sum to a full 100% because all components below 1% are omitted.
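To see how much is left out for a given sample, the listed percentages can simply be added up. The snippet below is only a small sketch of that arithmetic, not part of the downloadable tools. The function name `unreported_share` is my own, and the two samples (FI1 and FI7) are copied from the results above.

```python
# Reported components for two project samples, copied from the results above.
results = {
    "FI1": {
        "Finnic": 54.6, "East_Scandinavian": 25.5, "Saami": 8.0,
        "Northeast_European": 5.7, "Slavic": 2.2, "Northwest_European": 1.0,
        "Central_European": 1.3, "AMBIG_European": 1.7,
    },
    "FI7": {"Finnic": 97.8, "AMBIG_European": 1.7},
}


def unreported_share(components):
    """Return the percentage not covered by the listed components.

    The sum is done in whole tenths of a percent so that floating-point
    noise cannot leak into the result.
    """
    tenths = sum(round(value * 10) for value in components.values())
    return (1000 - tenths) / 10


for sample, components in results.items():
    print(sample, unreported_share(components))
```

For FI1 the listed components already add up to 100.0, while FI7 leaves 0.5 percentage points unlisted. Because the reported values are rounded to one decimal, a small part of any remainder can also be plain rounding.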

Program downloading and running

Download the programs here.  Unzip them and place all programs in the same directory.  To run the test, use the command line "bash ./ajo1.sh <sample-id>", where sample-id is the name of the file holding your genetic data in 23andMe format.  The sample file must be gzip-compressed and carry the .gz extension (sample-id.gz), but on the command line you give only the sample id, without the extension.  The test works with the following genome builds: HG18, HG19, GRCh36, GRCh37.  If your genome file is in the FTDNA format, you first have to convert it to the 23andMe style.  On Linux this is easily done with four commands:

First unzip your genome file, then run:

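# make a working copy named after the sample id
cp <original filename> <sample-id>
# remove the double quotes around each field
sed -i 's/\"//g' <sample-id>
# turn the comma separators into tabs
sed -i 's/,/\t/g' <sample-id>
# compress; this produces <sample-id>.gz
gzip <sample-id>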

If your data is already in the 23andMe format but not compressed as a .gz file, unzip it first and then run only the first and fourth commands above (cp and gzip).
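Since Python is needed anyway, the same conversion can also be sketched in Python. This is only an illustration of what the commands above do (strip the double quotes, turn commas into tabs, write a gzip file); it is not part of the downloadable tools. The function name `ftdna_to_23andme_gz` and the two-line demo file, including its rsID, are made up for the example.

```python
import gzip
import os
import tempfile


def ftdna_to_23andme_gz(src_path, sample_id):
    """Write <sample_id>.gz with quotes removed and commas turned into tabs."""
    out_path = sample_id + ".gz"
    # newline="" keeps the original line endings untouched, as sed would.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
            gzip.open(out_path, "wt", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line.replace('"', "").replace(",", "\t"))
    return out_path


# Small demonstration on a made-up two-line file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "ftdna.csv")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write('RSID,CHROMOSOME,POSITION,RESULT\n"rs0000001","1","12345","AG"\n')
    out = ftdna_to_23andme_gz(src, os.path.join(tmp, "mysample"))
    with gzip.open(out, "rt", encoding="utf-8", newline="") as f:
        converted = f.read()
    print(converted)
```

Unlike `gzip <sample-id>`, this sketch leaves the original file in place and writes the compressed copy next to it.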

Edit 14.8.17, 17:30

Another Estonian result.  I can only say that it is plausible considering the history.

Baltic 37.2
Slavic 29.6
Finnic 22.8
East_Scandinavian 8.1
Saami 1.5

Edit 15.8.17, 17:45

A British result.  It looks like the Irish sample, with more Mediterranean and minor Central European admixture.

Northwest_European 81.6
Mediterranean 10.3
Baltic 3.4
Central_European 2.9
AMBIG_European 1.8
 








2 comments:

  1. Wow super interesting. I'll pay you money to put mine and my father's DNA through this test. Is $10 ok?

    We both have a drop of Native American and African ancestry, due to Latin American ancestry, which I see you don't have a reference for. Will that mess up our European scores at all?

    Replies
    1. A drop of non-European ancestry wouldn't mess up the result. If there is more, the Native American part will show up as East Asian or Siberian. The job queue on my laptop is full right now and will be for the next three weeks. I plan to make a worldwide test too. You can send a message to a temporary email, mjxa at luukku.com, and I'll contact you later. Please don't send data yet.
