Tuesday, October 18, 2016

European coarse population structure using 14.4 million markers

I had already run a Finestructure analysis before my previous Admixture-based work, but didn't publish it because it gave so little additional information. I used the same data as with Admixture. The workflow:

1 extracting chromosomes 1 and 6
2 phasing haplotypes (running HAPI-UR ten times and making a consensus)
3 running Chromopainter in linked mode, without defining donor haplotypes
4 running Finestructure with burn-in 200000 and runtime 2000000
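
The consensus idea in step 2 can be sketched in Python. This is not the procedure actually used for this post, only a naive illustration under stated assumptions: each phasing run of one individual is given as a pair of 0/1 strings, each run is flipped as a whole so that it agrees with the first run as well as possible, and then every site is decided by majority vote. The function names `orient` and `consensus_phase` are hypothetical, and a real consensus would also have to handle switch errors inside a chromosome.

```python
from collections import Counter


def orient(run, reference):
    """Return the run with its two haplotypes swapped if that matches the reference better."""
    h1, h2 = run
    same = sum(a == b for a, b in zip(h1, reference[0]))
    swapped = sum(a == b for a, b in zip(h2, reference[0]))
    return (h1, h2) if same >= swapped else (h2, h1)


def consensus_phase(runs):
    """Majority vote per site over several phasing runs of one individual.

    Assumption: every run is a (haplotype1, haplotype2) pair of equal-length
    strings over the same sites. Ties go to the allele pair seen first.
    """
    reference = runs[0]
    oriented = [orient(run, reference) for run in runs]
    hap1, hap2 = [], []
    for site in range(len(reference[0])):
        # Vote on the ordered allele pair so the genotype at the site stays intact.
        votes = Counter((run[0][site], run[1][site]) for run in oriented)
        (a, b), _ = votes.most_common(1)[0]
        hap1.append(a)
        hap2.append(b)
    return "".join(hap1), "".join(hap2)


# Toy input: three runs, the second one reported in the opposite haplotype order.
toy_runs = [("0101", "1010"), ("1010", "0101"), ("0111", "1000")]
print(consensus_phase(toy_runs))
```

With ten runs instead of three the vote works the same way; an even number of runs makes ties possible, which this sketch resolves arbitrarily.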

As a result we see a very obvious grouping: each ethnic group clusters together. Some cautions are in order about the Chromopainter-Finestructure combination:

-  First of all, Finestructure doesn't really use dedicated haplotypes, but the number of shared haplotypes and the haplotype lengths between individuals. So in a three-sample case (individuals a, b and c) there is no guarantee that all three share common haplotypes, even when the Finestructure result shows haplotype sharing among all three samples. This can lead to pseudo-ancestry between individuals and also to a wrong tree grouping.

- Using donor haplotypes can be methodologically unreliable. We can assign donor haplotypes for people living in the Americas, but it is not equally reliable for people living in the Old World. It is a chicken-and-egg question: if we really know the donors before testing, we know the result before we have the result. I have seen methods that create donor types (selections of prepared haplotypes), but I can't see how they could work reliably. Note also that speaking about donor populations (I have seen it) makes the question even more problematic: to name donor populations we must already know the population grouping before the analysis, and we bind the donor populations to something that exists today but did not necessarily exist thousands of years ago.
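
The first caution above can be made concrete with a toy example. The data below are invented for illustration and this is not what Chromopainter actually computes: each individual is reduced to a set of hypothetical segment labels on which it carries a haplotype shared with someone else. Every pair of individuals shares something, yet nothing is shared by all three.

```python
from itertools import combinations

# Hypothetical toy data: segments on which each individual carries a shared haplotype.
shared_segments = {
    "a": {"seg1", "seg3"},
    "b": {"seg1", "seg2"},
    "c": {"seg2", "seg3"},
}


def pairwise_counts(segments):
    """Number of segments shared by each pair of individuals."""
    return {
        (x, y): len(segments[x] & segments[y])
        for x, y in combinations(sorted(segments), 2)
    }


def shared_by_all(segments):
    """Segments shared by every individual at once."""
    return set.intersection(*segments.values())


counts = pairwise_counts(shared_segments)
common = shared_by_all(shared_segments)
print(counts)
print(common)
```

A summary built only from the pairwise counts would show a, b and c as mutually related, although the three-way intersection is empty.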

While checking the data I see one questionable sample group: Swedes. They look more eastern than can reasonably be suggested.

In general, when looking at any results the first question is "does the result look obvious?". If we have two different results based on any kind of supervised method (like using donor haplogroups/populations), it is only common sense to regard the more obvious result as the better one. Here we have a philosophical question: what "the obvious" means for you and for me. It makes sense, but an idea like "too obvious" leads us to tin foil hat theories. Perfection is suspicious. We don't want it, although in practice it is also possible. Another, much more sensible question regarding donor haplotypes would be whether we could assign donor haplotypes of Bronze Age Europeans based on ancient samples. That would make sense.

Download Finestructure picture here.

11 comments:

  1. Assigning ancient European samples as donors for modern samples is possible, and has been done in a few studies.

    The Swedes have additional Northeastern European affinity compared to English or Germans, but if they're in Northeastern European clusters they should have extra affinity to Northwest Europeans relative to other members of the cluster, visible in individual coancestry heatmap like this:
    http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002453

    The picture you linked is just the aggregate, which shows the cluster's average values.

    1. "Assigning ancient European samples as donors for modern samples is possible, and has been done in a few studies".

      If we want to use haplotypes.

      "The picture you linked is just the aggregate, which shows the cluster's average values".

      Yes, it is a coarse grouping, but still a grouping which shows, for example, that the Swedish samples are not so average, in my opinion.


    2. I read the Pagani et al. paper about this dataset and the Swedes looked typical. Can you link the table for individuals' values so we see if they get more donations from the West European cluster than others in their grouping, as should be expected? That W-Euro cluster btw has Italians and Iberians which should never cluster with Swedes. If you remove those and rerun the data, the Swedes should change cluster.

    3. I made a PCA including all data (Pagani+1000g). The two Swedish samples are far away from each other, not forming a cluster as the other ethnic groups with two or three samples in the Pagani data do. Compared to TSI, Kent, Polish and Belarusian people, the first Swede is very eastern and the other is on the other side of the Slavic group, which is not at all what I expected. I would never pass them through my quality control. It is sad to say this, and I wouldn't have if you hadn't asked.

    4. Pagani et al. has a PCA too, and Swedes cluster as expected. Figure 1 is how they should cluster with 1000genomes populations http://biorxiv.org/content/biorxiv/early/2016/10/17/081505.full.pdf

      The two Swedes cluster together in their Chromopainter runs too. Why don't you provide the Chromopainter data I asked, since you've done that with your previous runs? I don't want to conclude you're actually trying to discredit a method by withholding data...

    5. I have other plans just now, and a discussion about the Swedish samples doesn't move them forward. What I wrote about those two samples is true anyway. Sorry.

    6. Riiight...

      Anyway, they pretty much behave as Swedish samples are expected to. There's too few of them to make their own cluster in this run of yours, but their chunkcount/chunklength sharing, which you don't want to show, would show their affinity to Northwest Europe is easily higher than other samples in their clusters.

  2. Yes, the purpose of this blog is after all to examine the genes of Kaleva and Untamo, not those of Sven and Frida, so how optimally the Swedes' genes are drawn in the pictures is not essential.

    It is great that someone examines specifically the genes of Finns.

    I believe that every ethnic group has its own intricate genetic history, and that the current interpretations of prehistory, which are based on a very one-sided selection of samples from ancient cultures, build too simple a picture of the past.

    I suppose, for example, that my genetic ancestors spoke above all the language of Finland's Comb Ceramic period, the language of the Corded Ware and Kiukainen cultures, the Saami of inland Finland, and most recently Finnish. Our prehistory is entirely our own history and belongs to no one else.

  3. This question is not directly related to the topic, but I'll ask anyway. :-)

    What do you think of these percentages for Rise00: https://s18.postimg.org/n7saz3s15/Admixtures.png

    I find it quite interesting if RISE00 has no Finnish at all. The sample in question is dated to 2575–2349 cal BC. If Corded Ware came to Finland via the Baltics and Estonia, one would think that the Finnish share of an Estonian Corded Ware sample would not be zero. Is there again some technical explanation for this, for example that the Sopen gene flow ran only from Estonia towards Finland and not the other way, so that RISE00 has nothing of the ancestry typical specifically of Finns? On the other hand, the Swedish Battle Axe individual nevertheless has 2.5% Finnish, and the Esperstedt Unetice and Corded Ware samples have as much as 5.9% and 4.5% Finnish.
    The picture was posted on Anthrogenica by Tomenable, who writes that it shows “DNA Land results for some ancient samples (sorted from highest to lowest "North Slavic").”

    1. I don't know how this test was done, but I learned that it is based on DNA.Land's method. As far as I know, DNA.LAND itself does not run tests like this. DNA.LAND's software can however be downloaded to one's own computer (I did that myself about a year ago), so it is possible to run one's own tests with one's own data. Of course DNA.LAND should not be attached to them as an authority. The program is comparable to Dienekes' admixture calculator. I can't comment on the results; they depend on many things.

    2. The latest information is that the ancient genomes were fed into DNA.LAND's process. The result is puzzling as far as Finns are concerned, and I can't explain it.

