ROH (runs of homozygosity) are used to estimate individual autozygosity within a subpopulation. After reading the study "Genetic characterization of northeastern Italian population isolates in the context of broader European genetic diversity", I stopped to think about its statistics, because Figure 5 of the study shows country-level ROH averages with decimals. ROH counts are integers, so an average below 1 is only possible if some individuals have zero ROH, and in practice zero values mean that some information has been lost. Counting shorter ROH segments therefore seemed necessary to get more precise results. Although my statistics look reasonable in general, I can't take responsibility for possible bad sampling of some ethnic groups.
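The averaging argument can be checked with a few lines of Python. This is a minimal sketch with made-up counts (not data from the study): every individual with at least one ROH adds at least 1 to the total, so the share of individuals with zero ROH must be at least 1 minus the mean count. The function names are hypothetical.

```python
from fractions import Fraction

def roh_summary(counts):
    """Return (mean ROH count, share of individuals with zero ROH) as exact fractions."""
    n = len(counts)
    mean = Fraction(sum(counts), n)
    zero_share = Fraction(sum(1 for c in counts if c == 0), n)
    return mean, zero_share

def min_zero_share(mean):
    """Lower bound on the zero share implied by a mean of non-negative integer counts.

    Individuals with at least one ROH contribute >= 1 each, so
    mean >= 1 - zero_share, i.e. zero_share >= 1 - mean.
    """
    return max(Fraction(0), 1 - Fraction(mean))

# Hypothetical ROH counts for five individuals from one country.
counts = [0, 0, 1, 0, 2]
mean, zero_share = roh_summary(counts)
print(mean, zero_share, min_zero_share(mean))  # prints: 3/5 3/5 2/5
```

With a mean of 3/5, at least 2/5 of the individuals must have no ROH at all; here 3/5 actually do.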
Data and processes
Primary data: 600k SNPs, with a very low no-call rate
LD-pruning: ./plink --noweb --bfile LARGEDATA --indep-pairwise 200 25 0.4
Pruned data: 160k SNPs
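The --indep-pairwise 200 25 0.4 step can be pictured as a window of 200 SNPs sliding along the data in steps of 25 SNPs, with one SNP of every pair whose r² exceeds 0.4 removed. The Python sketch below is a simplified illustration, not PLINK's implementation: it assumes a complete dosage matrix (0/1/2, no missing genotypes) and simply drops the later SNP of a correlated pair, whereas PLINK applies its own rule for which SNP to remove. The function names are hypothetical.

```python
import numpy as np

def r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    if denom == 0:  # monomorphic SNP: treat as uncorrelated
        return 0.0
    return float((x * y).sum() / denom) ** 2

def indep_pairwise(G, window=200, step=25, threshold=0.4):
    """Greedy pairwise LD pruning over an individuals x SNPs dosage matrix.

    Returns a boolean mask of SNPs to keep (the analogue of plink.prune.in).
    Simplification: when a pair exceeds the r^2 threshold, the later SNP is dropped.
    """
    n_snps = G.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    start = 0
    while start < n_snps:
        end = min(start + window, n_snps)
        idx = [j for j in range(start, end) if keep[j]]
        for pos, a in enumerate(idx):
            if not keep[a]:
                continue
            for b in idx[pos + 1:]:
                if keep[b] and r2(G[:, a], G[:, b]) > threshold:
                    keep[b] = False
        if end == n_snps:
            break
        start += step
    return keep

# Toy data: 4 individuals x 3 SNPs; SNP 1 duplicates SNP 0, so it is pruned.
G = np.array([[0, 0, 2], [1, 1, 0], [2, 2, 1], [0, 0, 1]])
print(indep_pairwise(G, window=3, step=1))
```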
ROH process: ./plink --noweb --bfile LARGEDATA --extract plink.prune.in --homozyg --homozyg-window-kb 5000 --homozyg-window-snp 25 --homozyg-snp 50 --homozyg-window-het 1 --homozyg-window-missing 1 --homozyg-density 50 --homozyg-window-threshold 0.05 --homozyg-gap 100 --homozyg-kb 1000
My goal was to find shorter ROH segments, and I did it by changing three parameters: --homozyg-density, --homozyg-snp and --homozyg-kb. The changes were not big, but they were enough. There is an optimal combination of SNP count and base-pair length; compared with the study, I chose a shorter minimum length (1500 kb -> 1000 kb) and a larger minimum SNP count (25 -> 50). This did the trick.
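To make the parameters of the ROH command less abstract, here is a simplified Python sketch of the sliding-window scan that the --homozyg options describe, for a single chromosome. It is an illustration under stated assumptions, not PLINK's code: genotypes are coded 0 = homozygous, 1 = heterozygous, -1 = missing; --homozyg-window-kb is not modelled; and call_roh is a hypothetical helper whose defaults mirror the command above. A window counts as homozygous if it has at most window_het heterozygous and window_missing missing calls; a SNP joins a candidate run when at least window_threshold of the windows covering it are homozygous; runs are split at gaps above max_gap_kb and kept only if they pass the SNP-count, length and density limits.

```python
def call_roh(geno, pos_bp, window_snp=25, window_het=1, window_missing=1,
             window_threshold=0.05, min_snp=50, min_kb=1000.0,
             max_density_kb=50.0, max_gap_kb=100.0):
    """Simplified ROH scan for one chromosome (hypothetical helper, not PLINK's code).

    geno: genotype codes per SNP (0 = homozygous, 1 = heterozygous, -1 = missing)
    pos_bp: SNP positions in base pairs, sorted ascending
    Returns a list of (start_bp, end_bp, n_snps) tuples.
    """
    n = len(geno)
    hom_windows = [0] * n  # homozygous windows overlapping each SNP
    all_windows = [0] * n  # all windows overlapping each SNP
    for s in range(n - window_snp + 1):
        w = geno[s:s + window_snp]
        is_hom = w.count(1) <= window_het and w.count(-1) <= window_missing
        for j in range(s, s + window_snp):
            all_windows[j] += 1
            hom_windows[j] += is_hom
    # A SNP is a candidate if enough of the windows covering it are homozygous.
    in_run = [all_windows[j] > 0 and hom_windows[j] / all_windows[j] >= window_threshold
              for j in range(n)]

    segments = []
    j = 0
    while j < n:
        if not in_run[j]:
            j += 1
            continue
        start = j
        # Extend the run while the next SNP is a candidate and the gap is small enough.
        while (j + 1 < n and in_run[j + 1]
               and (pos_bp[j + 1] - pos_bp[j]) / 1000 <= max_gap_kb):
            j += 1
        n_snps = j - start + 1
        length_kb = (pos_bp[j] - pos_bp[start]) / 1000
        # Keep runs long enough in SNPs and kb, with at most max_density_kb per SNP.
        if n_snps >= min_snp and length_kb >= min_kb and length_kb / n_snps <= max_density_kb:
            segments.append((pos_bp[start], pos_bp[j], n_snps))
        j += 1
    return segments

# Synthetic chromosome: 300 SNPs spaced 10 kb apart, fully homozygous at
# SNPs 100-199 and alternating heterozygous/homozygous elsewhere.
geno = [0 if 100 <= j < 200 else (1 if j % 2 == 0 else 0) for j in range(300)]
pos = [j * 10_000 for j in range(300)]
print(call_roh(geno, pos))  # prints: [(980000, 2000000, 103)]
```

The called run reaches slightly past the homozygous block (to the lone heterozygous SNPs at indices 98 and 200), an effect of the low window threshold. Raising min_kb above the run length (here 1020 kb) makes it disappear, which is the lever used above to admit shorter segments.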
[Figures: ROH count (X-axis) versus total ROH length in base pairs (Y-axis); a full view and a close-up of the lower-left corner.]
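The per-individual values behind plots like these come from PLINK's plink.hom.indiv output, whose columns include NSEG (number of ROH) and KB (total ROH length in kb). A minimal Python sketch for averaging them per population follows. It assumes, hypothetically, that the FID column holds the population label, which depends on how the data set was prepared; the group_of argument lets you map individuals to populations some other way. The sample rows are invented.

```python
from collections import defaultdict
from statistics import mean

def summarize_hom_indiv(lines, group_of=lambda fid, iid: fid):
    """Average NSEG and KB per group from the lines of a plink.hom.indiv file.

    Columns are located by header name; group_of maps (FID, IID) to a group label.
    Returns {group: (mean NSEG, mean KB)}.
    """
    rows = iter(lines)
    header = next(rows).split()
    i_fid, i_iid = header.index("FID"), header.index("IID")
    i_nseg, i_kb = header.index("NSEG"), header.index("KB")
    groups = defaultdict(list)
    for line in rows:
        f = line.split()
        if not f:  # skip blank lines
            continue
        groups[group_of(f[i_fid], f[i_iid])].append((int(f[i_nseg]), float(f[i_kb])))
    return {g: (mean(n for n, _ in v), mean(kb for _, kb in v)) for g, v in groups.items()}

# Invented example rows in the plink.hom.indiv layout.
sample = """FID IID PHE NSEG KB KBAVG
POP1 ind1 -9 2 3000.5 1500.25
POP1 ind2 -9 0 0 0
POP2 ind3 -9 1 1200 1200
"""
print(summarize_hom_indiv(sample.splitlines()))
```

For a real run, an open file handle such as open("plink.hom.indiv") can be passed directly as lines.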