Genome-wide association study based on gene-set enrichment analysis associated with milk yield in Holstein cattle

Document Type : Research Paper

Authors

Abstract

Introduction: Genomic selection has provided the dairy industry with a powerful tool to increase genetic gains on economically important traits such as milk production (Taylor et al. 2016). One way to identify new loci and confirm existing QTL is through genome-wide association analysis (GWAA). In addition, identifying of genes' loci with large effects on economically important traits has been one of the important goals to dairy cattle breeding. Quantitative Trait Loci assisted selection and genomic regions affecting the production traits have been considered to increase the efficiency of selection and improve production performance. Genome wide association studies typically focus on genetic markers with the strongest evidence of association. However, single markers often explain only a small component of the genetic variance and hence, offer a limited understanding of the trait under study. A solution to tackle the aforementioned problems, and deepen the understanding of the genetic background of complex traits, is to move up the analysis from the SNP to the gene and gene-set levels. In a gene-set analysis, a group of related genes that harbor significant SNP previously identified in GWAS, is tested for over-representation in a specific pathway. The aim of the present study was genome wide association studies (GWAS) based on gene set enrichment analysis for identifying the loci associated with milk yield and somatic cell score traits in Holstein cattle breed using the high-confidence SNPs that enable us to study 164312 SNP markers simultaneously.
Material and methods: In this study, the 1092 Holstein cattle and 164312 markers were analyzed with milk yield, fat percentage, and somatic cell score using plink software with no corrections to adjust the error rate. The gene set analysis consists basically in three different steps: (1) the assignment of SNPs to genes, (2) the assignment of genes to functional categories, and finally (3) the association analysis between each functional category and the phenotype of interest.
In brief, for each trait, nominal P-values < 0.05 from the GWAS analyses were used to identify significant SNP. Using the biomaRt R package the SNP were assigned to genes if they were within the genomic sequence of the gene or within a flanking region of 20 kb up- and downstream of the gene, to include SNP located in regulatory regions. For the assignment of the genes to functional categories, the gene ontology and Kyoto encyclopedia of genes and genome pathway databases were used. The GO database designates biological descriptors to genes based on attributes of their encoded products and it is further partitioned into three components: biological process, molecular function, and cellular component. The KEGG pathway database contains metabolic and regulatory pathways, representing the actual knowledge on molecular interactions and reaction networks. Finally, a Fisher’s exact test was performed to test for overrepresentation of the significant genes for each gene-set. The gene enrichment analysis was performed with the goseq R package. In the next step, a bioinformatics analysis was implemented to identify the biological pathways performed in BioMart, Panther, DAVID, and GeneCards databases.
Results and discussion: Gene set enrichment analysis has proven to be a great complement of genome-wide association analysis (Gambra et al. 2013; Abdalla et al. 2016). Among available gene set databases, GO is probably the most popular, whereas KEGG is a relatively new tool that is gaining ground in livestock genomics (Morota et al. 2015, 2016). It was hypothesized that the use of gene set information could improve prediction. However, neither of the gene set SNP classes outperformed the standard whole-genome approach. Gene sets have been primarily developed using data from model organisms, such as mice and flies; so, it is possible that some of the genes included in these terms are irrelevant for milk production. It is likely that a better understanding of the biology underlying milk production specifically, plus an advance in the annotation of the bovine genome, can provide new opportunities for predicting production using gene set information. Eleven SNP markers  were identified on  chromosomes  5,  6,  7,  8, 14, 19, 22, 24, 25, 27, and 28 located  in  ASIC2,  ANXA3CCL2CCL11CCL24, IL33, TLR3, WWOX, EGFR, PRKCA, CAMK2A, KCNMA1, FABP2, SPP1, THBS4, HSP90B1, and ITPR1 genes. Some of the found genes are consistent with some previous studies and are involved in biological pathways related to milk yield and immune systems. According to pathway analysis, 25 pathways from gene ontology and KEGG pathway were associated with the milk yield and somatic cell score traits (P˂0.01). Among those pathways, the sensory perception of chemical stimulus, positive regulation of inflammatory response, and defense response biological pathway have an important role in the immune system and somatic cell score. Also, the GnRH signaling pathway, PPAR signaling pathway, oxytocin signaling pathway, and focal adhesion had a significant association with milk yield and fat percentage traits. Some of these regulatory regions, such as enhancers, are located far from the genes. Therefore, the gene might be part of the analysis, but the relevant variant would probably not be included in the gene set SNP class. Finally, linkage disequilibrium interferes with the use of biological information in prediction, because irrelevant regions (regions without any biological role) capture part of the information encoded in relevant regions, causing both regions to exhibit similar predictive abilities. The use of very high density SNP data or even whole genome sequence data could alleviate some of these issues. Finally, it worth’s noting that our gene-set enrichment analysis was conducted using a panel of SNP obtained from a single marker regression GWAS, which relies on a simplified theory of the genomic background of traits, without considering for instance the joint effect of SNP. Hence, other approaches (e.g. GWAS exploring SNP by SNP interactions) might provide a better basis for biological pathway analysis.
Conclusion: This study supported previous results from GWAS of milk yield and somatic cell score, also revealed additional regions in the cattle genome associated with these economically important traits. So, using these findings can accelerate the genetic progress in the breeding programs.

Abdalla EA, Peñagaricano F, Byrem TM, Weigel KA and Rosa GJ, 2016. Genome-wide association mapping and pathway analysis of leukosis incidence in a US Holstein cattle population. Animal Genetics 47: 395–407.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM and Sherlock G, 2000. Gene ontology: Tool for the unification of biology. Nature Genetics 25: 25–29.
Boleckova J, Matejickova J, Stipkova M, Kyselova J, Bartonand L and Vyzkumny J, 2012. The association of five polymorphisms with milk production traits in Czech Fleckvieh cattle. Czech Journal of Animal Science (2): 45–53.
Bohlouli M, Mohammadi H and Alijani S, 2013. Genetic evaluation and genetic trend of growth traits of Zandi sheep in semi-arid Iran using random regression models. Small Ruminant Research 114: 195–201.
Chen Z, Yao Y, Ma P, Wang Q and Pan Y, 2018. Haplotype-based genome-wide association study identifies loci and candidate genes for milk Yield in Holsteins. PLoS ONE 13(2): e0192695.
Chen X, Cheng Z, Zhang S, Werling D and Wathes DC, 2015. Combining Genome Wide Association Studies and Differential Gene Expression Data Analyses Identifies Candidate Genes Affecting Mastitis Caused by Two Different Pathogens in the Dairy Cow. Open Journal of Animal Sciences 5: 358-393.
Daetwyler HD, Schenkel FS, Sargolzaei M and Robinson JA, 2008. A genome scan to detect quantitative trait loci for economically important traits in Holstein cattle using two methods and a dense single nucleotide polymorphism map. Journal of Dairy Science 91: 3225–3236.
Dadousis C, Pegolo S, Rosa GJM, Gianola D, Bittante G and Cecchinato A, 2017. Pathway-based genome-wide association analysis of milk coagulation properties, curd firmness, cheese yield, and curd nutrient recovery in dairy cattle. Journal of Dairy Science 100: 1223-1231.
Durinck S, Spellman PT, Birney E and Huber W, 2009. Mapping identifiers for the integration of genomic datasets with the R/bioconductor package biomaRt. Nature Protocols 4: 1184–1191.
Gerlando R, Sutera AM, Mastrangelo S, Tolone M, Portolano B, Sottile G, Bagnato A, Strillacci MG and Sardina MT, 2019. Genome-wide association study between CNVs and milk production traits in Valle del Belice sheep. PLoS ONE 14: e0215204.
Goddard ME and Hayes BJ, 2009. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nature Reviews Genetics 10: 381-388.
Kanehisa M and Goto S, 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28: 27–30.
Kiser JN, Neupane M, White SN and Neibergs HL, 2018. Identification of loci associated with susceptibility to Mycobacterium Avium subspecies paratuberculosis (Map) tissue infection in cattle. Mammalian Genome 298: 539-549.
Kułaj D, Pokorska J, Ochrem A, Dusza M and Makulska J, 2019. Effects of the c.8514C > T polymorphism in the osteopontin gene (OPN) on milk production, milk composition and disease susceptibility in Holstein Friesian cattle. Italian Journal of Animal Science 18: 546-553.
May K, Scheper C, Brügemann K, Yin T, Strube C, Korkuć P, Brockmann GA and König S, 2019. Genome-wide associations and functional gene analyses for endoparasite resistance in an endangered population of native German Black Pied cattle. BMC Genomics 20: 277-236.
Marete A, Lund MS, Boichard D and Ramayo-Caldas Y, 2018. A system-based analysis of the genetic determinism of udder conformation and health phenotypes across three French dairy Cattle breeds. PLoS ONE 13: e0199931.
Mooney MA and Wilmot B, 2015. Gene Set Analysis: A Step-By-Step Guide. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics 168: 517-527.
Mohammadi H, Rafat SA, Moradi shahrebabak H, Shodja J and Moradi MH, 2018. An assessment of population stratification and haplotype based Genome-wide association for wool quality traits in Zandi sheep breed. Journal of Animal Science Researches (Agricultural science) 28(2): 193-204.
Ogorevc J, Kunej T, Razpet A and Dovc P, 2009. Database of cattle candidate genes and genetic markers for milk production and mastitis. Animal Genetics 40: 832–851.
Pacheco HA, da Silva S, Sigdel A, Mak CK, Galvão KN, Texeira RA, Dias LT and Peñagaricano F, 2018. Gene Mapping and Gene-Set Analysis for Milk Fever Incidence in Holstein Dairy Cattle. Frontiers Genetics 9: 465-478.
Peñagaricano F, Weigel KA, Rosa GJ and Khatib H, 2013. Inferring quantitative trait pathways associated with bull fertility from a genome-wide association study. Frontiers Genetics 3: 307-314.
Peng G, Luo L, Siu H, Zhu Y, Hu P, Hong S, Zhao J, Zhou X, Reveille JD, Jin L, Amos CI and Xiong M, 2010. Gene and pathway-based second wave analysis of genome-wide association studies. European Journal of Human Genetics 18: 111–117.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR and Bender D, 2007. PLINK: a toolset for whole-genome association and population-based linkage analysis. The American Journal of Human Genetics 81: 559-575.
Rincón G, Islas-Trejo A, Casellas J, Ronin Y, Soller M, Lipkin E and Medrano JF, 2009. Fine mapping and association analysis of a quantitative trait locus for milk production traits on Bos taurus autosome 4. Journal of Dairy Science 92: 758–764.
Sordillo LM and Streicher KL, 2002. Mammary gland immunity and mastitis susceptibility. Journal Mammary Gland Biology 7: 135-146.
Wang L, Jia P, Wolfinger RD, Chen X and Zhao Z, 2011. Gene set analysis of genome-wide association studies: Methodological issues and perspectives. Genomics 98: 1–8.
Yang Y, Wang Q, Chen Q, Liao R, Zhang X, Yang H, Zheng Y, Zhang Z and Pan Y, 2014. New genotype imputation method with tolerance to high missing rate and rare variants. PLoS ONE 9: e101025.
Young MD, Wakefield MJ, Smyth GK and Oshlack A, 2010. Method gene ontology analysis for RNA-seq: Accounting for selection bias. Genome Biology 11: 14-23.
Zhang H, Liu A, Li X, Xu W, Shi R, Luo H, Su G, Dong G, Guo G and Wang Y, 2019. Genetic analysis of skinfold thickness and its association with body condition score and milk production traits in Chinese Holstein population. Journal of Dairy Science 102: 2347-2352.