Monday, 27 June 2016

Global ROH-results

ROH (runs of homozygosity) analysis estimates individual autozygosity within a subpopulation. After reading the study "Genetic characterization of northeastern Italian population isolates in the context of broader European genetic diversity" I stopped to ponder its statistics, because Figure 5 shows country-level ROH averages with decimals. Since each individual's ROH count is an integer, a group average below 1 is possible only if some individuals have zero segments, and in practice a zero means that some data has been lost. It therefore seemed necessary to count shorter ROH segments to get more precise results. Although my statistics look reasonable in general, I can't take responsibility for possible bad sampling of some ethnic groups.
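The averaging argument can be checked with a few lines of Python. This is only a sketch: the counts below are hypothetical numbers made up for illustration, not values from the study or from my data, and `mean_count` is just a helper name I chose.

```python
def mean_count(counts):
    """Arithmetic mean of per-individual ROH segment counts."""
    return sum(counts) / len(counts)

# Hypothetical counts, for illustration only (not data from the study).
everyone_has_a_segment = [1, 1, 2, 3]
some_have_none = [0, 0, 1, 2]

# When every count is >= 1, the mean cannot drop below 1.
print(mean_count(everyone_has_a_segment))

# The mean falls below 1 only because some individuals have zero segments.
print(mean_count(some_have_none))
```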

Data and processes

Primary data: 600k SNPs, with a very low no-call rate
LD-pruning:  ./plink --noweb --bfile LARGEDATA --indep-pairwise 200 25 0.4
Pruned data: 160k SNPs
ROH process: ./plink --noweb --bfile LARGEDATA --extract --homozyg --homozyg-window-kb 5000 --homozyg-window-snp 25 --homozyg-snp 50 --homozyg-window-het 1 --homozyg-window-missing 1  --homozyg-density 50 --homozyg-window-threshold 0.05 --homozyg-gap 100 --homozyg-kb 1000

My goal was to find shorter ROH segments, which I did by changing three parameters: --homozyg-density, --homozyg-snp and --homozyg-kb. The changes are not big, but they are enough. There is an optimum combination of SNP and base-pair lengths; compared with the study, I picked a shorter minimum base-pair length (1500 -> 1000 kb) and a larger minimum SNP count (25 -> 50). This did the trick.
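To make the trade-off between the two minimums concrete, here is a minimal Python sketch of how a minimum length in kb and a minimum SNP count act together as a filter. It is not PLINK's implementation: it ignores the sliding window, density and gap settings, and the candidate segments are invented for illustration. The names `Segment` and `passes` are my own.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    kb: float   # segment length in kilobases
    snps: int   # number of SNPs inside the segment

def passes(segment, min_kb, min_snps):
    """Keep a segment only if it meets both minimums."""
    return segment.kb >= min_kb and segment.snps >= min_snps

# Invented candidate segments, not real output.
candidates = [
    Segment(1200.0, 60),   # shorter than 1500 kb but SNP-rich
    Segment(1600.0, 70),
    Segment(1600.0, 30),   # long enough, but fewer than 50 SNPs
    Segment(2500.0, 120),
    Segment(800.0, 55),    # below both length minimums
]

# Minimums attributed to the study above: 1500 kb and 25 SNPs.
study_settings = [s for s in candidates if passes(s, 1500, 25)]
# Minimums used in this post: 1000 kb and 50 SNPs.
my_settings = [s for s in candidates if passes(s, 1000, 50)]

print(study_settings)
print(my_settings)
```

With these made-up candidates, the 1200 kb segment is kept only under my settings, while the 1600 kb segment with 30 SNPs is kept only under the study's settings.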

The plots show ROH count on the X-axis and total ROH length in base pairs on the Y-axis.
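For readers who want to produce this kind of plot data themselves, the sketch below averages segment count and total length per group from text in the layout of PLINK's .hom.indiv report (columns FID, IID, PHE, NSEG, KB, KBAVG). It rests on two assumptions of mine: that the family ID (FID) column carries the population label, and that simple per-group means are wanted. The rows are made up and are not my results; `population_means` is a hypothetical helper name.

```python
import io
from collections import defaultdict

# Made-up rows in the .hom.indiv column layout; not results from this post.
SAMPLE = """\
 FID    IID  PHE  NSEG      KB   KBAVG
 POP_A  a1    -9     2  2600.0  1300.0
 POP_A  a2    -9     0     0.0     0.0
 POP_B  b1    -9     5  9000.0  1800.0
 POP_B  b2    -9     3  4200.0  1400.0
"""

def population_means(handle):
    """Return {FID: (mean segment count, mean total length in kb)}."""
    header = handle.readline().split()
    col = {name: i for i, name in enumerate(header)}
    # Per group: [sum of NSEG, sum of KB, number of individuals]
    totals = defaultdict(lambda: [0, 0.0, 0])
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        group = totals[fields[col["FID"]]]
        group[0] += int(fields[col["NSEG"]])
        group[1] += float(fields[col["KB"]])
        group[2] += 1
    return {fid: (nseg / n, kb / n) for fid, (nseg, kb, n) in totals.items()}

means = population_means(io.StringIO(SAMPLE))
for fid, (mean_nseg, mean_kb) in sorted(means.items()):
    print(fid, mean_nseg, mean_kb)
```

To read a real report, replace `io.StringIO(SAMPLE)` with an opened file. The KB column is in kilobases, so multiply by 1000 if the axis should be in base pairs.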

Large picture:

Small picture covering the bottom-left corner:

Pictures with better resolution:


Zoom in