Evaluation of three genome-wide association methods for identifying gene loci using simulated data

Document Type: Research Paper

Authors

1 Student

2 Professor

3 Teacher

Abstract

Introduction: Because SNPs are distributed throughout the genome, they are widely used in livestock breeding research. They have been used to predict disease risk in humans, to localize genetic variants responsible for complex traits through genome-wide association studies (GWAS), and to predict the genetic values of economically important traits in plant and animal breeding (Zhang et al 2015). Most whole-genome scanning methods fall into two groups: single-SNP genome-wide association studies (SSGWAS) and multiple-marker methods. SSGWAS can identify a large number of common variants affecting quantitative traits; however, a large proportion of the genetic variance remains unexplained (Shirali et al 2018). For quantitative traits, the proportion of phenotypic variance explained by SNPs is related to the number of adjacent SNPs in a genomic region, and the heritability attributable to such a region is defined as its regional heritability. Regional heritability mapping (RHM) is used to identify small genomic regions that contribute to trait variance and can capture more of the missing genetic variation (Nagamine et al 2012). RHM uses a mixed-model framework based on restricted maximum likelihood (REML) in which two variance components, one contributed by the whole genome and one by a specific genomic region, are fitted to estimate genomic and regional heritabilities, respectively (Uemoto et al 2013). fastBAT (fast and flexible set-Based Association Test) performs fast set-based association analysis using GWAS summary data (Bakshi et al 2016). The purpose of this study was to compare the SNPs and regions identified by the three genome-wide association methods (SSGWAS, RHM and fastBAT), to compare these results with the simulated QTLs, and to determine the false-positive results of each method.
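The two-component RHM model described above can be written as a standard mixed model. This is a sketch assuming one record per individual; the symbols (y, X, b, g, r, e, G, G_r and the variance terms) are introduced here for illustration and are not defined in the original text.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{g} + \mathbf{r} + \mathbf{e},\\
\mathbf{g} &\sim N(\mathbf{0}, \mathbf{G}\,\sigma^2_g), \quad
\mathbf{r} \sim N(\mathbf{0}, \mathbf{G}_r\,\sigma^2_r), \quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\,\sigma^2_e),\\
h^2_g &= \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_r + \sigma^2_e}, \qquad
h^2_r = \frac{\sigma^2_r}{\sigma^2_g + \sigma^2_r + \sigma^2_e}
\end{aligned}
```

Here G is the whole-genome relationship matrix, G_r is the relationship matrix built from the SNPs of one genomic region, and the three variance components are estimated by REML, giving the genomic (h²_g) and regional (h²_r) heritabilities.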
Material and methods: Markers and populations were simulated with a forward-in-time process using QMSim software (Sargolzaei and Schenkel 2009). In this population, 27,586 single nucleotide polymorphisms (SNPs) were located on three pairs of autosomes. Simulation was performed under three scenarios with 75, 150 and 300 quantitative trait loci (QTLs), and 10 replicates were simulated for each scenario. After quality control, the number of SNPs retained for analysis ranged from 19,662 to 23,817. In all scenarios, heritability was 0.2, contributed equally by polygenic and QTL effects. Genomic and pedigree-based relationship matrices were used in all three methods to estimate genetic parameters. The whole-genome additive effect was estimated with a genomic relationship matrix built from all SNPs, and the additive effect of each genomic region was estimated with a regional genomic relationship matrix. Both matrices were computed from the SNP genotypes of individuals with GCTA software (Yang et al 2011). The pedigree-based relationship matrix was built from the kinship between individuals using the pedigree package (Coster 2013) in RStudio (RStudio Inc 2013). For RHM, variance components were estimated in windows of 50 genotyped SNPs, with consecutive windows overlapping by 25 SNPs throughout the genome. SSGWAS was performed with the mixed linear model association (MLMA) method (Yu et al 2006) in GCTA, and significance was declared at the 5% level after Bonferroni correction. fastBAT, also run in GCTA, was then applied to the SSGWAS results.
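The relationship matrices and window layout described above can be sketched in Python. This is a minimal pure-Python illustration, not the GCTA implementation: `gcta_grm` follows the GRM formula published for GCTA (Yang et al 2011), `sliding_windows` assumes 50-SNP windows that advance by 25 SNPs (so neighboring windows share 25 SNPs) within a single chromosome, and `bonferroni_threshold` divides the 5% level by the number of tests. All function names and the genotype layout are hypothetical.

```python
from typing import List, Sequence

# Hypothetical in-memory layout: one row per individual, one column per SNP,
# coded 0/1/2 as the count of the counted allele.
Genotypes = List[List[int]]


def allele_freqs(geno: Genotypes, snps: Sequence[int]) -> List[float]:
    """Frequency of the counted allele at each selected SNP."""
    n = len(geno)
    return [sum(row[i] for row in geno) / (2 * n) for i in snps]


def gcta_grm(geno: Genotypes, snps: Sequence[int]) -> List[List[float]]:
    """Genomic relationship matrix from the SNPs in `snps` (GCTA formula).

    Passing every SNP gives the whole-genome matrix; passing one window's
    SNPs gives a regional matrix. Monomorphic SNPs (p = 0 or 1) must be
    removed beforehand, otherwise the denominator is zero.
    """
    p = allele_freqs(geno, snps)
    n_ind, n_snp = len(geno), len(snps)
    grm = [[0.0] * n_ind for _ in range(n_ind)]
    for j in range(n_ind):
        for k in range(j, n_ind):
            total = 0.0
            for pi, i in zip(p, snps):
                xj, xk = geno[j][i], geno[k][i]
                denom = 2.0 * pi * (1.0 - pi)
                if j == k:
                    # Diagonal: accumulates an inbreeding estimate F (entry = 1 + F)
                    total += (xj * xj - (1.0 + 2.0 * pi) * xj + 2.0 * pi * pi) / denom
                else:
                    # Off-diagonal: standardized genotype cross-product
                    total += (xj - 2.0 * pi) * (xk - 2.0 * pi) / denom
            value = total / n_snp + (1.0 if j == k else 0.0)
            grm[j][k] = grm[k][j] = value
    return grm


def sliding_windows(n_snps: int, size: int = 50, step: int = 25) -> List[range]:
    """Index ranges of `size` consecutive SNPs, each starting `step` SNPs after
    the previous one, so neighbors share size - step SNPs. The last window is
    shortened so every SNP is covered. Intended to be applied per chromosome."""
    windows = []
    start = 0
    while True:
        end = min(start + size, n_snps)
        windows.append(range(start, end))
        if end == n_snps:
            break
        start += step
    return windows


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests
```

With the smallest post-QC panel of 19,662 SNPs, `bonferroni_threshold(19662)` is about 2.5e-6, and the regional matrix for the first window of a chromosome with `n` SNPs would be `gcta_grm(geno, sliding_windows(n)[0])`.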
Results and discussion: For each replicate, after significant SNPs were identified, the genetic variance they explained was estimated using the equation of Falconer and Mackay (1996). Table 1 reports the number of QTLs detected by SSGWAS, their minor allele frequency (MAF), and the range and mean of the genetic variance explained by significant SNPs and QTLs. Across the 30 simulation replicates, SSGWAS detected 16 QTLs: 2 with MAF ≤ 0.1 and the remaining 14 with MAF > 0.1. fastBAT identified 107 significant regions and detected 120 QTLs across the three scenarios, 52 of them with MAF ≤ 0.1. All QTLs detected by fastBAT and SSGWAS were also detected by RHM. RHM detected 612 regions containing simulated QTLs, and 316 QTLs with MAF ≤ 0.1. In all replicates, the variance explained by SNPs was equal to the variance explained by QTLs. SSGWAS detected fewer QTLs than the other two methods, and the maximum variance explained by its QTLs was 14.9%. A detected QTL was classified as a false positive when no significant QTL was found in the windows immediately before and after the significant window containing it. The percentage of false-positive QTLs was higher for SSGWAS than for the other two methods, whereas fastBAT, unlike the other two methods, produced no false-positive QTLs. Table 5 shows, for fastBAT, the number of detected QTLs, their MAF range, and the range and mean of the genetic variance explained by detected QTLs and SNPs. Many QTLs and regions detected by RHM were detected by neither SSGWAS nor fastBAT. The genetic variance explained by QTLs detected with RHM ranged from 7.26% to 46.86%, higher than with the other two methods. Table 6 compares the three methods by the number of detected QTLs, false-positive QTLs, stable QTLs, and detected QTLs with MAF ≤ 0.1.
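As a hedged sketch, assuming the variance estimate above uses the standard additive-variance result of Falconer and Mackay (1996), a biallelic locus with allele frequency p, additive effect a and dominance deviation d contributes 2pq[a + d(q - p)]² of additive variance, which reduces to 2pqa² when d = 0. The function names, the (p, a) input format and the use of total genetic variance as the denominator are assumptions for illustration.

```python
from typing import Iterable, Tuple


def additive_variance(p: float, a: float, d: float = 0.0) -> float:
    """Additive genetic variance of one biallelic locus: 2pq[a + d(q - p)]^2.

    p is the frequency of the allele whose homozygote has genotypic value +a;
    d is the dominance deviation (d = 0 gives 2pq a^2).
    """
    q = 1.0 - p
    alpha = a + d * (q - p)  # average effect of an allele substitution
    return 2.0 * p * q * alpha ** 2


def proportion_explained(loci: Iterable[Tuple[float, float]],
                         total_genetic_var: float) -> float:
    """Share of total genetic variance explained by loci given as (p, a) pairs.

    Summing per-locus variances assumes the loci are unlinked and in linkage
    equilibrium, and treats every effect as purely additive.
    """
    explained = sum(additive_variance(p, a) for p, a in loci)
    return explained / total_genetic_var
```

For example, two hypothetical loci with (p, a) = (0.5, 1.0) and (0.1, 2.0) contribute 0.5 and 0.72 units of variance; against a total genetic variance of 2.44 they explain half of it. These numbers are illustrative and not taken from the study.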
We found that QTLs with MAF ≤ 0.1 were detected more frequently by RHM than by the other two methods. These results confirm that RHM was able to identify more of the QTLs affecting trait variance.

Keywords


References

Bakshi A Zhu Z Vinkhuyzen AAE Hill WD McRae AF Visscher PM and Yang J, 2016. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits. Scientific Reports 6: 32894.
Brito FV Braccini Neto J Sargolzaei M Cobuci JA and Schenkel FS, 2011. Accuracy of genomic selection in simulated populations mimicking the extent of linkage disequilibrium in beef cattle. BMC Genetics 12: 80.
Cebamanos L Gray A Stewart I and Tenesa A, 2014. Regional Heritability Advanced Complex Trait Analysis for GPU and Traditional Parallel Architectures. Bioinformatics 30(8): 1177-1179.
Coster A, 2013. pedigree: Pedigree functions. R package. https://CRAN.R-project.org/package=pedigree.
Eaves LJ Last KA Young PA and Martin NG, 1978. Model-fitting approaches to the analysis of human behaviour. Heredity (Edinb) 41: 249-320.
Evans LM Tahmasbi R Vrieze SI Abecasis GR Das S Gazal S and Keller MC, 2018. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nature Genetics 50(5): 737-745.
Falconer DS and Mackay TFC, 1996. Introduction to Quantitative Genetics, 4th edn. Pearson Education Limited: Harlow, UK.
Fisher RA, 1918. The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh 52: 399-433.
Hayes BJ Visscher PM and Goddard ME, 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genetics Research 91: 47-60.
Hill WG and Robertson A, 1968. Linkage disequilibrium in finite populations. Theoretical and Applied Genetics 38: 226-231.
Hindorff LA Sethupathy P Junkins HA Ramos EM Mehta JP Collins FS and Manolio TA, 2009. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences 106(23): 9362-9367.
Keller MC and Coventry WL, 2005. Quantifying and addressing parameter indeterminacy in the classical twin design. Twin Research and Human Genetics 8: 201-213.
Lee JJ and Chow CC, 2014. Conditions for the validity of SNP-based heritability estimation. Human Genetics 133(8): 1011-1022.
Nagamine Y Pong-Wong R Navarro P Vitart V Hayward C Rudan I Campbell H Wilson J Wild S Hicks AA Pramstaller PP Hastie N Wright AF and Haley CS, 2012. Localising loci underlying complex trait variation using Regional Genomic Relationship Mapping. PLOS ONE 7(10): e46501.
Nolte IM Jansweijer JA Riese H Asselbergs FW van der Harst P Spector TD Pinto YM Snieder H and Jamshidi Y, 2017. A Comparison of Heritability Estimates by Classical Twin Modeling and Based on Genome-Wide Genetic Relatedness for Cardiac Conduction Traits. Twin Research and Human Genetics 20(6): 489-498.
Purcell S Neale B Todd-Brown K Thomas L Ferreira MA Bender D Maller J Sklar P de Bakker PI Daly MJ and Sham PC, 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics 81(3): 559-575.
RStudio Inc, 2013. shiny: web application framework for R. http://CRAN.R-project.org/package=shiny.
Sargolzaei M and Schenkel FS, 2009. QMSim: a large-scale genome simulator for livestock. Bioinformatics 25: 680-681.
Shirali M Pong-Wong R Navarro P Knott S Hayward C Vitart V Rudan I Campbell H Hastie ND Wright AF and Haley CS, 2016. Regional heritability mapping method helps explain missing heritability of blood lipid traits in isolated populations. Heredity (Edinb) 116: 333-338.
Shirali M Knott SA Pong-Wong R Navarro P and Haley CS, 2018. Haplotype Heritability Mapping Method Uncovers Missing Heritability of Complex Traits. Scientific Reports 8(1): 4982.
Simeone R Misztal I Aguilar I and Legarra A, 2011. Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabeled genotyped animals in a broiler chicken population. Journal of Animal Breeding and Genetics 128(5): 386-393.
Tropf FC Lee SH Verweij RM Stulp G van der Most PJ de Vlaming R Bakshi A Briley DA Rahal C Hellpap R Nyman A Iliadou AN Esko T Metspalu A Medland SE Martin NG Barban N Snieder H Robinson MR and Mills MC, 2017. Hidden heritability due to heterogeneity across seven populations. Nature Human Behaviour 1(10): 757-765.
Uemoto Y Pong-Wong R Navarro P Vitart V Hayward C Wilson JF Rudan I Campbell H Hastie ND Wright AF and Haley CS, 2013. The power of regional heritability analysis for rare and common variant detection: simulations and application to eye biometrical traits. Frontiers in Genetics 4: 232.
Valdisser PAMR Pereira WJ Almeida Filho JE Müller BSF Coelho GRC de Menezes IPP Vianna JPG Zucchi MI Lanna AC Coelho ASG de Oliveira JP da Cunha Moraes A Brondani C and Vianello RP, 2017. In-depth genome characterization of a Brazilian common bean core collection using DArTseq high-density SNP genotyping. BMC Genomics 18: 423.
Yang J Benyamin B McEvoy B Gordon S Henders AK Nyholt DR and Visscher PM, 2010. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics 42: 565-571.
Yang J Lee SH Goddard ME and Visscher PM, 2011. GCTA: a tool for genome-wide complex trait analysis. American Journal of Human Genetics 88(1): 76-82.
Yu J Pressoir G Briggs WH Vroh Bi I Yamasaki M Doebley JF McMullen MD Gaut BS Nielsen DM Holland JB Kresovich S and Buckler ES, 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics 38: 203-208.
Zaitlen N Kraft P Patterson N Pasaniuc B Bhatia G Pollack S and Price AL, 2013. Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLOS Genetics 9: e1003520.
Zeng Y Navarro P Fernandez-Pujals AM Hall LS Clarke TK and Thomson PA, 2017. A combined pathway and regional heritability analysis indicates NETRIN1 pathway is associated with major depressive disorder. Biological Psychiatry 81(4): 336-346.
Zhang Z Li X Ding X Li J and Zhang Q, 2015. GPOPSIM: a simulation tool for whole-genome genetic data. BMC Genetics 16: 10.