Genome-wide association study for milk protein composition

Document Type : Research Paper

Authors

1 Assistant Professor, Department of Animal Sciences, Faculty of Agriculture, University of Ilam. Ilam, Iran

2 Professor, Key Laboratory of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, China Agricultural University, Beijing, China

Abstract

Introduction: Genomic selection has provided the dairy industry with a powerful tool to increase genetic gain in economically important traits such as milk production (Taylor et al. 2016). One way to identify new loci and confirm existing QTL is genome-wide association analysis (GWAA). Identifying loci with large effects on economically important traits has been one of the main goals of dairy cattle breeding, and QTL-assisted selection and genomic regions affecting production traits have been considered as means to increase the efficiency of selection and improve production performance. Genome-wide association studies typically focus on the genetic markers with the strongest evidence of association. However, single markers often explain only a small part of the genetic variance and hence offer a limited understanding of the trait under study. One way to address this problem, and to deepen the understanding of the genetic background of complex traits, is to move the analysis up from the SNP level to the gene and gene-set levels. In a gene-set analysis, a group of related genes harboring significant SNPs previously identified by GWAS is tested for over-representation in a specific pathway.
Material and methods: The present study aimed to conduct a genome-wide association study (GWAS) based on gene-set enrichment analysis to identify loci associated with milk protein composition traits. For each cow, eight traits were recorded: protein yield, protein percentage, αs1-casein, αs2-casein, β-casein, κ-casein, α-lactalbumin and β-lactoglobulin. The association analysis was carried out with PLINK software, with no correction applied to adjust the error rate. The gene-set analysis consists of three steps: (1) the assignment of SNPs to genes, (2) the assignment of genes to functional categories, and (3) the association analysis between each functional category and the phenotype of interest.
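Step (1) can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the study used the biomaRt R package); it only shows the idea of keeping SNPs with a nominal P-value below 0.05 and assigning each to a gene when it lies within the gene or within 15 kb up- or downstream, the window used in this study. The gene names, coordinates and SNP records below are hypothetical toy data, and the class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass

FLANK_BP = 15_000  # 15 kb flanking window on each side of the gene


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # first base of the gene (inclusive)
    end: int    # last base of the gene (inclusive)


@dataclass(frozen=True)
class Snp:
    snp_id: str
    chrom: str
    pos: int
    p_value: float  # nominal P-value from the single-marker GWAS


def significant_snps(snps, alpha=0.05):
    """Keep SNPs whose nominal P-value is below alpha."""
    return [s for s in snps if s.p_value < alpha]


def assign_snps_to_genes(snps, genes, flank=FLANK_BP):
    """Return {gene name: [SNP ids]} for SNPs inside the gene or its flanks."""
    hits = {}
    for g in genes:
        lo, hi = g.start - flank, g.end + flank
        for s in snps:
            if s.chrom == g.chrom and lo <= s.pos <= hi:
                hits.setdefault(g.name, []).append(s.snp_id)
    return hits


# Hypothetical toy data, not taken from the study.
genes = [
    Gene("GENE_A", "6", 100_000, 120_000),
    Gene("GENE_B", "6", 200_000, 210_000),
]
snps = [
    Snp("snp1", "6", 90_000, 0.01),    # 10 kb upstream of GENE_A
    Snp("snp2", "6", 150_000, 0.001),  # outside both flanking windows
    Snp("snp3", "6", 205_000, 0.20),   # inside GENE_B but not significant
    Snp("snp4", "11", 105_000, 0.03),  # significant, different chromosome
]

significant_genes = assign_snps_to_genes(significant_snps(snps), genes)
print(significant_genes)
```

In this toy example only `snp1` is both significant and inside a gene window, so `GENE_A` is the only gene that would be carried forward as a significant gene.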
In brief, for each trait, SNPs with nominal P-values < 0.05 in the GWAS analyses were considered significant. Using the biomaRt R package, SNPs were assigned to a gene if they were located within the genomic sequence of the gene or within a flanking region of 15 kb up- and downstream of the gene, in order to include SNPs located in regulatory regions. For the assignment of genes to functional categories, the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases were used. The GO database assigns biological descriptors to genes based on attributes of their encoded products and is partitioned into three components: biological process, molecular function, and cellular component. The KEGG pathway database contains metabolic and regulatory pathways that represent current knowledge of molecular interactions and reaction networks. Finally, Fisher's exact test was used to test for over-representation of significant genes in each gene set. The gene enrichment analysis was performed with the goseq R package. In the next step, a bioinformatics analysis using the BioMart, Panther, DAVID and GeneCards databases was carried out to identify the biological pathways.
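The over-representation test can be sketched as follows. For a 2×2 table of genes (significant or not, in the gene set or not), the one-sided Fisher's exact test is equivalent to the upper tail of a hypergeometric distribution. This is a minimal standard-library Python sketch of that calculation, not the goseq implementation used in the study; the function name, argument names and gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_significant, n_in_set, n_sig_in_set):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= n_sig_in_set), where X is the number of gene-set members
    among n_significant genes drawn without replacement from n_background
    genes, n_in_set of which belong to the gene set.
    """
    total = comb(n_background, n_significant)
    upper = min(n_in_set, n_significant)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_significant - k)
        for k in range(n_sig_in_set, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 annotated genes, 5 of them significant;
# the gene set contains 4 genes, 3 of which are significant.
p = enrichment_p_value(n_background=20, n_significant=5,
                       n_in_set=4, n_sig_in_set=3)
print(f"{p:.4f}")
```

With these toy counts the tail probability is 496/15504, about 0.032, so the set would be called enriched at the nominal 0.05 level used here.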
Results and discussion:
Gene set enrichment analysis has proven to be a valuable complement to genome-wide association analysis (Gambra et al., 2013; Abdalla et al., 2016). Among the available gene set databases, GO is probably the most popular, whereas KEGG is a relatively new tool that is gaining ground in livestock genomics (Morota et al., 2015, 2016). We had hypothesized that the use of gene set information could improve prediction. However, neither of the gene set SNP classes outperformed the standard whole-genome approach. Gene sets have been developed primarily from data on model organisms, such as mice and flies, so some of the genes included in these terms may be irrelevant for milk production. A better understanding of the biology underlying milk production, together with advances in the annotation of the bovine genome, is likely to provide new opportunities for predicting production using gene set information. According to the gene set enrichment analysis, 20 categories from the gene ontology and KEGG pathway databases were associated with the studied traits (P < 0.05). Among those categories, the oxytocin signaling pathway, glycerolipid metabolism, response to progesterone, detection of calcium ion, complement and coagulation cascades, and amino acid binding, which include the candidate genes CDH13, P4HTM, SPP1, CSN1S1, CSN2, SERPINA1, SLC35A3, ODC1 and PAEP, were significantly associated with protein yield and content, phosphorylation of glycoproteins, milk coagulation and curd firming, and lactose synthesis. Some regulatory regions, such as enhancers, are located far from the genes they regulate. Therefore, although a gene might be part of the analysis, the relevant variant would probably not be included in the gene set SNP class.
In addition, linkage disequilibrium interferes with the use of biological information in prediction, because irrelevant regions (regions without any biological role) capture part of the information encoded in relevant regions, so that both kinds of region show similar predictive ability. The use of very high density SNP data, or even whole-genome sequence data, could alleviate some of these issues.
Finally, it is worth noting that our gene-set enrichment analysis was conducted using a panel of SNPs obtained from a single-marker regression GWAS, which relies on a simplified model of the genomic background of traits and does not consider, for instance, the joint effects of SNPs. Hence, other approaches (e.g., GWAS exploring SNP-by-SNP interactions) might provide a better basis for biological pathway analysis.
Conclusion: Our findings indicate that there is potential for genetic selection to improve milk quality through milk protein composition. Because genetic improvement is heritable, cumulative, and permanent, such gains would be lasting and beneficial.

Keywords: pathway analysis, casein, α-lactalbumin, β-lactoglobulin, candidate gene


Aimutis WR, 2004. Bioactive properties of milk proteins with particular focus on anticariogenesis. Journal of Nutrition 134(4):89-95.
Aleandari R, Buttazoni G, Schneider JC, Caroli A and Davoli R, 1990. The effect of milk protein polymorphisms on milk components and cheese producing ability. Journal of Dairy Science 73:241-255.
Alinaghizadeh R, Mohammad Abadi MR and Moradnasab Badrabadi S, 2007. Kappa-casein gene study in Iranian Sistani cattle breed (Bos indicus) using PCR-RFLP. Pakistan Journal of Biological Sciences 10(23):4291-4294.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM and Sherlock G, 2000. Gene ontology: Tool for the unification of biology. Nature Genetics 25:25–29.
Boleckova J, Matejickova J, Stipkova M, Kyselova J, Bartonand L and Vyzkumny J, 2012. The association of five polymorphisms with milk production traits in Czech Fleckvieh cattle. Czech Journal of Animal Science (2):45–53. 
Browning SR and Browning BL, 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. The American Journal of Human Genetics (5):1084-97.
Bu D, Luo H, Huo P, Wang Z, Zhang S, He Z, Wu Y, Zhao L, Liu and Guo J, 2021. KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Research 49:317–325.
Dadousis C, Pegolo S, Rosa GJM, Gianola D, Bittante G and Cecchinato A, 2017. Pathway-based genome-wide association analysis of milk coagulation properties, curd firmness, cheese yield, and curd nutrient recovery in dairy cattle. Journal of Dairy Science 100:1223-1231.
Durinck S, Spellman PT, Birney E and Huber W, 2009. Mapping identifiers for the integration of genomic datasets with the R/bioconductor package biomaRt. Nature Protocols 4:1184–1191.
Kanehisa M and Goto S, 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28:27–30.
Kułaj D, Pokorska J, Ochrem A, Dusza M and Makulska J, 2019. Effects of the c.8514C > T polymorphism in the osteopontin gene (OPN) on milk production, milk composition and disease susceptibility in Holstein Friesian cattle. Italian Journal of Animal Science 18:546-553.
Kim S, Lim B, Cho J, Lee S, Dang CG, Jeon JH, Kim JM and Lee J, 2021. Genome-Wide Identification of Candidate Genes for Milk Production Traits in Korean Holstein Cattle. Animals (Basel) 11(5):1392.
Kishore A, Mukesh M, Sobti RC, Mishra BP and Sodhi M, 2013. Variations in the Regulatory Region of Alpha S1-Casein Milk Protein Gene among Tropically Adapted Indian Native (Bos Indicus) Cattle. ISRN Biotechnology 14:926025.
Kolenda M, Sitkowska B, Kamola D and Lambert BD, 2021. Composite genotypes of progestogen-associated endometrial protein gene and their association with composition and quality of dairy cattle milk. Animal Bioscience 34(8):1283-1289.
Li C, Cai W, Zhou C, Yin H, Zhang Z, Loor JJ, Sun D, Zhang Q, Liu J and Zhang S, 2016. RNA-Seq reveals 10 novel promising candidate genes affecting milk protein concentration in the Chinese Holstein population. Scientific Reports 6:26813.
Li Q, Liang R, Li Y, Gao Y, Li Q, Sun D and Li J, 2020. Identification of candidate genes for milk production traits by RNA sequencing on bovine liver at different lactation stages. BMC Genetics 21(1):72.
Liu JJ, Liang AX, Campanile G, Plastow G, Zhang C, Wang Z, Salzano A, Gasparrini B, Cassandro M and Yang LG, 2018. Genome-wide association studies to identify quantitative trait loci affecting milk production traits in water buffalo. Journal of Dairy Science 101(1):433–444.
Jassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, Sidiropoulos K, Cook J, Gillespie M and Haw R, 2020. The reactome pathway knowledgebase. Nucleic Acids Research 48:498–503.
Mohammadi H, Rafat SA, Moradi Shahrbabak H, Shodja J and Moradi MH, 2020. Genome-wide association study and gene ontology for growth and wool characteristics in Zandi sheep. Journal of Livestock Science and Technologies 8(2):45-55.
Najafi MH, Mohammadi Y, Najafi A, Shamsolahi M and Mohammadi H, 2020. Lairage time effect on carcass traits, meat quality parameters and sensory properties of Mehraban fat-tailed lambs subjected to short distance transportation. Small Ruminant Research 188:106122.
Pedrosa VB, Schenkel FS, Chen SY, Oliveira HR, Casey TM, Melka MG and Brito LF, 2021. Genome wide Association Analyses of Lactation Persistency and Milk Production Traits in Holstein Cattle Based on Imputed Whole-Genome Sequence Data. Genes (Basel) 12(11):1830.
Peng G, Luo L and Siu H, 2010. Gene and pathway-based second wave analysis of genome-wide association studies. European Journal of Human Genetics 18:111–117.
Peñagaricano F, Weigel KA, Rosa GJ and Khatib H, 2013. Inferring quantitative trait pathways associated with bull fertility from a genome-wide association study. Frontiers in Genetics 3:307-314.
Playne ML, Bennett L and Smithers G, 2003. Functional dairy foods and ingredients. Australian Journal of Dairy Technology 58(3):242-64.
Rezvannejad E, Asadollahpour Nanaei H and Esmailizadeh A, 2022. Detection of candidate genes affecting milk production traits in sheep using whole-genome sequencing analysis. Veterinary Medical Science 8(3):1197-1204.
Singh A, Kumar A, Gondro C, Pandey AK, Dutt T and Mishra BP, 2022. Genome Wide Scan to Identify Potential Genomic Regions Associated With Milk Protein and Minerals in Vrindavani Cattle. Frontiers in Veterinary Science 9:760364.
Sulimova GE, Abani Azari M, Rostamzadeh J, Mohammad Abani MR and Lazebnyĭ OE, 2007. Allelic polymorphism of kappa-casein gene (CSN3) in Russian cattle breeds and its informative value as a genetic marker. Genetika 43(1):88-95.
Wang L, Jia P and Wolfinger RD, 2011. Gene set analysis of genome-wide association studies: Methodological issues and perspectives. Genomics 98:1–8.
Zhang J, Sun D, Womack JE, Zhang Y, Wang Y and Zhang Y, 2007. Polymorphism identification, RH mapping and association of α-lactalbumin gene with milk performance traits in Chinese Holstein. Asian-Australasian Journal of Animal Science 20(9):1327-1333.
Zhang H, Liu A, Li X, Xu W, Shi R, Luo H, Su G, Dong G, Guo G and Wang Y, 2019. Genetic analysis of skinfold thickness and its association with body condition score and milk production traits in Chinese Holstein population. Journal of Dairy Science 102:2347-2352.
Zhou JP and Dong CH, 2013. Association between a polymorphism of the α-lactalbumin gene and milk production traits in Chinese Holstein cows. Genetics and Molecular Research 12(3):3375-3382.
Zhou C, Li C, Cai W, Liu S, Yin H, Shi S, Zhang Q and Zhang S, 2019. Genome-Wide Association Study for Milk Protein Composition Traits in a Chinese Holstein Population Using a Single-Step Approach. Frontiers in Genetics 10:72.