Tuesday, 3 March 2015

Starting with new data

This is just the very beginning: new data downloaded and compiled.


edit 04.03.15: A reminder to readers that this plot was not made by projecting the ancient genomes onto the present-day ones, so it is not directly comparable to Haak's original plot (which projects the ancient genomes). All ancient genomes take part in computing the principal components. I'll make a projected version in the next update.

edit 05.03.15: Reduced the number of Finnish samples to match the average North European sample size.
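The post doesn't say how the Finnish subset was chosen. Here is a minimal sketch that assumes a simple random subsample without replacement, down to the mean size of the other North European populations. The function name and the toy population labels and counts are hypothetical.

```python
import random


def downsample_population(samples_by_pop, target_pop, reference_pops, seed=0):
    """Randomly reduce target_pop to the mean size of reference_pops.

    samples_by_pop maps a population label to a list of sample IDs.
    Returns a new dict; the input dict and its lists are not modified.
    """
    sizes = [len(samples_by_pop[p]) for p in reference_pops]
    target_n = round(sum(sizes) / len(sizes))
    rng = random.Random(seed)  # fixed seed so the chosen subset is reproducible
    result = dict(samples_by_pop)
    members = samples_by_pop[target_pop]
    if len(members) > target_n:
        # Sample without replacement, then sort for a stable sample order.
        result[target_pop] = sorted(rng.sample(members, target_n))
    return result


if __name__ == "__main__":
    pops = {  # toy labels and sizes, not the real dataset
        "Finnish": [f"FIN{i:03d}" for i in range(99)],
        "pop_A": [f"A{i}" for i in range(10)],
        "pop_B": [f"B{i}" for i in range(14)],
    }
    reduced = downsample_population(pops, "Finnish", ["pop_A", "pop_B"])
    print(len(reduced["Finnish"]))  # mean of 10 and 14 -> 12
```

A fixed seed keeps the same subset across reruns, so successive PCA plots stay comparable.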

I also made a new map showing the projected ancient genomes. In this version the principal components are computed from the present-day populations only. The ancient genomes are then placed in that coordinate system, so they have no effect on the generated principal components.
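The sketch below shows the difference between the projected and the non-projected map. It assumes a complete genotype matrix: rows are samples, columns are SNPs coded 0/1/2, and there are no missing calls, which real ancient data rarely satisfies. It is not the tool used for these plots, which the post doesn't name, and the function names are hypothetical. The components are found by power iteration in plain Python only to keep it dependency-free. Genetics PCA tools usually also normalise each SNP first, and that step is omitted here.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def center(rows, means):
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    # Sample covariance matrix (SNPs x SNPs) of already-centred rows.
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_components(cov, k=2, iters=500, seed=0):
    # Leading eigenvectors of a symmetric matrix by power iteration + deflation.
    rng = random.Random(seed)
    m = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in m]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])
        comps.append(v)
        # Remove this component so the next pass finds the next-largest one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return comps


def fit_pcs(rows, k=2):
    means = column_means(rows)
    return means, top_components(covariance(center(rows, means)), k)


def score(rows, means, comps):
    # Coordinates of each row on the given components.
    return [[dot(r, c) for c in comps] for r in center(rows, means)]


def projected_pca(modern, ancient, k=2):
    # Axes come from present-day samples only; ancient samples are just placed on them.
    means, comps = fit_pcs(modern, k)
    return score(modern, means, comps), score(ancient, means, comps)


def joint_pca(modern, ancient, k=2):
    # Axes come from all samples, so the ancient genomes also shape the components.
    means, comps = fit_pcs(modern + ancient, k)
    return score(modern, means, comps), score(ancient, means, comps)


if __name__ == "__main__":
    modern = [[0, 0, 1], [0, 1, 1], [2, 2, 1], [2, 1, 1]]  # toy genotypes
    ancient = [[1, 2, 1]]
    print(projected_pca(modern, ancient)[1])
    print(joint_pca(modern, ancient)[1])
```

With `projected_pca`, adding or removing ancient samples leaves the axes unchanged. With `joint_pca`, the axes shift. Adding more present-day samples changes the axes in both cases. Eigenvector signs are arbitrary, so a plot can come out mirrored.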

edit 06.03.15: Following advice sent to me, I ran new PCAs with and without projection of the ancient samples using Haak's original data (exactly the same SNPs), and I saw no difference compared to my previous plots. On my plots the Yamna samples still form a long chain between North Europe and the Turkish/Armenian/Middle Eastern populations, rather than lying north of North Europe as on Haak's plot.

5 comments:

  1. Does the new data have all the markers for each sample?

  2. It includes all SNPs shared with the 1000 Genomes data, 332,000 SNPs; the original data contained 354,000 SNPs. I merged the Finnish, Tuscan, CEU, Kent and Cornwall samples into the Haak data and also added ancient Anglo-Saxons and Britons. Nothing else was done. I could have imputed the 1000 Genomes data to match Haak's, but didn't see it as necessary, especially because this same 1000 Genomes data matches the Lazaridis data exactly with no loss. These PCAs are just a raw first pass I made before going further. No samples were removed, unlike in Haak's PCA. I haven't analysed why he left some ancient samples out. There must be a reason, and I will probably find it in further analysis with f4-statistics.

  3. I think my Yamna samples sit in a different place than in the Haak paper because my present-day sample set is different: I have Uralic peoples and more Asian populations, and it looks like the Uralic or Asian samples push them. For the same reason, MA1 moves to different places depending on whether projection is used. Additional present-day samples affect the results whether or not projection is used. Further analyses will reveal more about the possible connection between the Yamna and Uralic samples. I also have more Scandinavian HGs than Haak (he shows fewer Scandinavian HGs, although those samples are included in his data), and some of them are closer to Estonians and Western Finns.

  4. Can you make a global PCA using the same data?

  5. Illya, I can use all the Haak et al. data. I'll be back in business in a few days and can try, but I can't promise consistent results, because two dimensions are not always enough to figure things out, and more than two dimensions are often difficult to digest.
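Comment 2 describes keeping only the SNPs shared between the Haak data and the 1000 Genomes samples. The post doesn't say which tool did the merge. Here is a minimal sketch of that intersection, assuming each dataset is a list of (SNP ID, ref, alt) in column order plus per-sample rows of alt-allele counts (0/1/2) with no missing calls and no duplicate sample names. All names and the toy data are hypothetical. SNPs with mismatched alleles are simply dropped, and strand flips (e.g. A/G vs T/C) are not handled.

```python
def merge_panels(panel_a, geno_a, panel_b, geno_b):
    """Merge two genotype sets on the SNPs they share.

    panel_x: list of (snp_id, ref, alt) in column order.
    geno_x:  dict sample -> list of alt-allele counts (0/1/2) in that order.
    Returns (shared SNP IDs, merged genotypes coded against panel_a's alleles).
    """
    index_b = {sid: (i, ref, alt) for i, (sid, ref, alt) in enumerate(panel_b)}
    keep = []  # (snp_id, column in a, column in b, needs allele flip)
    for col_a, (sid, ref, alt) in enumerate(panel_a):
        hit = index_b.get(sid)
        if hit is None:
            continue  # SNP missing from panel_b
        col_b, ref_b, alt_b = hit
        if (ref_b, alt_b) == (ref, alt):
            keep.append((sid, col_a, col_b, False))
        elif (ref_b, alt_b) == (alt, ref):
            keep.append((sid, col_a, col_b, True))  # same SNP, ref/alt swapped
        # Any other allele combination is dropped.
    merged = {}
    for sample, row in geno_a.items():
        merged[sample] = [row[ca] for _, ca, _, _ in keep]
    for sample, row in geno_b.items():
        # Recode swapped SNPs so counts refer to panel_a's alt allele.
        merged[sample] = [2 - row[cb] if flip else row[cb] for _, _, cb, flip in keep]
    return [sid for sid, _, _, _ in keep], merged


if __name__ == "__main__":
    panel_a = [("rs1", "A", "G"), ("rs2", "C", "T"), ("rs3", "A", "C")]
    panel_b = [("rs2", "T", "C"), ("rs1", "A", "G")]
    ids, merged = merge_panels(panel_a, {"anc1": [0, 1, 2]}, panel_b, {"mod1": [2, 1]})
    print(len(ids), "of", len(panel_a), "SNPs shared")
```

Comparing `len(ids)` with the original panel size gives the kind of count quoted in comment 2: 332,000 of 354,000.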