Identification of key genes in bovine mammary epithelial cells challenged with Escherichia coli and Staphylococcus aureus by integrated bioinformatics analysis

ABSTRACT This study integrated four microarray datasets by Robust Rank Aggregation (RRA) method to identify the differentially expressed genes (DEG) in bovine mammary epithelial (BME) cells in response to Escherichia coli and Staphylococcus aureus infection. Furthermore, the GO function and KEGG pathway enrichment analysis of the integrated DEG were performed. Finally, the protein-protein interaction (PPI) network was constructed. A total of 72 integrated DEG were identified from the four datasets. The most significantly enriched terms within the integrated DEG were mainly involved in the immune response. The PPI network of DEG was constructed with 53 nodes. Seventeen genes, which constitute a significant module, were identified as hub genes. Among them, CD40, CXCL6, and NFKBIZ were further screened as the key genes and have the potential to become biomarkers of E. coli and S. aureus mastitis, considering the specificity of biomarkers for diseases. The identified key genes and pathways in this study can assist in the search for biomarkers for mastitis diagnosis and disease resistance breeding.


Introduction
Cattle breeding, including dairy and beef cattle breeding, is the dominant livestock industry in many countries.It provides a significant source of nutrition to humans.However, pathogenic microbial infection is an important obstacle hindering the development of cattle breeding, causing significant losses and animal welfare problems.For example, bovine mastitis, an inflammatory disease caused mostly by the dysbiosis of the mammary gland microbiota (Steinberg et al., 2022), is the costliest disease in dairy farming.Bovine mastitis can cause a series of problems to the production system, such as a decrease in milk quality and production, changes in milk composition, and increased costs for treatments, labor and culling.These problems not only lead to huge economic losses but also create concerns for animal welfare and human health (Islam et al., 2020).
Escherichia coli and Staphylococcus aureus are the common pathogenic bacteria that lead to mastitis in dairy cows (Lin et al., 2021).A large number of studies have confirmed E. coli often leads to an acute and severe form of mastitis, while S. aureus leads to chronic but sub-clinical mastitis (Petzl et al., 2018).In addition, mastitis caused by S. aureus is frequently infectious (Leuenberger et al., 2019), whereas E. coli is usually environmental (Zaatout, 2022).However, despite many efforts dedicated to understanding the pathogenesis of E. coli and S. aureus mastitis, the pathogenic mechanism is still poorly understood.In addition, the current treatment of mastitis caused by pathogens is based on antibiotic therapy, which has disadvantages including the development of resistance, causing drug residues in milk (Younis et al., 2016) and increasing milk production costs.Hence, new alternative options are urgently needed for mastitis treatment and prevention.Better knowledge on the pathogenic mechanism of mastitis is required to allow the development of new effective curative strategies.Moreover, the comprehensive molecular investigation on host-pathogen interactions may aid in the identification of new biomarkers for the rapid detection of mastitis caused by pathogens.
The expression profile of genes associated with E. coli and S. aureus mastitis infection in cows have provided very useful information for the understanding of the infectious process and screening differentially expressed genes (DEG) as rapid detection biomarkers.Many studies on data analysis of gene expression profile on mastitis infection of cows have been carried out, and hundreds of DEG have been identified (Han, 2019).For instance, a study showed that 447 and 520 DEG are involved in bovine mammary epithelial (BME) cells line developed by the authors after lipopolysaccharide and S. aureus stimulation, respectively (Islam et al., 2020).Another study identified 92 and 81 DEG in the primary BME cells challenged with E. coli or S. aureus in all periods (Han, 2019).It can be seen that in different studies, the DEG identified are not completely consistent.The reason is that different independent experiments are always limited by the heterogeneity of experimental samples with the environment, breed, population, and specific animal differences.Therefore, it is difficult to fully elucidate the molecular mechanism of this disease by using an expression profile obtained in a single independent study.The Robust Rank Aggregation (RRA) method is a suitable and effective solution to obtain integrated DEG from different datasets performed by different technological platforms (Võsa et al., 2014).This method checks the ranking of each gene in each dataset and is based on the assumption that each gene identified in each experiment is randomly arranged.Therefore, the RRA method compares the ranking of the randomly sorted list with the baseline case, and a higher gene ranking is associated with a lower P-value (Gao et al., 2018).The RRA method has been used in many studies to integrate different genes under different conditions (Gao et al., 2018;Subramanian and Natarajan, 2021).However, this method has not been used to integrate and analyze the key genes in the state of mastitis.Hence, this study used the RRA method to integrate and analyze the results of multiple gene expression datasets to better understand the molecular mechanism of mastitis.
In this study, two Gene Expression Series, namely GSE24560 and GSE25413, were selected from the Gene Expression Omnibus (GEO) databases.Through the analysis of the database, this study identified the key genes and pathways in E. coli and S. aureus mastitis of dairy cows, to further clarify the immune response mechanism of dairy cows and explore candidate genes for dairy cow mastitis biomarkers.Therefore, this research focuses on finding the common points of the two bacterial infections, such as finding the common triggering mechanism and the common key genes in the body's response to the two bacterial infections.

Dataset collection and analysis
The GSE24560 and GSE25413 were selected from the GEO databases (https://www.ncbi.nlm.nih.gov/geo/).Each dataset contains the expression profile of normal and challenged BME cells with E. coli or S. aureus, respectively.Each dataset is divided into two subsets according to whether it is processed by E. coli or S. aureus and renamed as GSE24560E, GSE24560S, GSE25413E, and GSE25413S.These studies were based on the platform of the GPL2112 (Affymetrix Bovine Genome Array).In total, 24 normal data, 26 challenged with E. coli and 26 challenged with S. aureus, were obtained (Table 1).These studies were all generated using primary bovine mammary epithelial cells.Datasets GSE24560E and GSE24560S were obtained after cells (3×10 5 cells/well) were treated with heat-inactivated E. coli isolates and S. aureus M60 (100 μL of bacterial-solution) derived from bovine milk samples of mastitis for 6 and 24 h, respectively.Datasets GSE25413E and GSE25413S Identification of key genes in bovine mammary epithelial cells challenged with Escherichia coli and... Chen et al.
Then, the GEO2R online analysis tool (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to normalize the data and compare the DEG in the samples of the normal control group and the treatment group.The significant DEG were selected by a threshold of P-value <0.05 and |logFC|>0.585.The upregulated and downregulated DEG lists were saved as TXT files independently for subsequent integration analysis.

Integration of microarray data
The TXT files of all DEG lists sorted by |logFC| were integrated using the RRA R package (version 1.1), and the integrated DEG were saved for subsequent analysis.This method aggregates the provided gene rankings into a list, and each gene is assigned a P-value.P<0.05 was chosen as a threshold for defining significant genes which were identified as the integrated DEG.

Enrichment analyses of DEG
The Gene Ontology (GO) annotation and notably relevant Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the integrated DEG were performed using DAVID 6.8 database (https://david.ncifcrf.gov/).Gene counts >5 and P<0.01 were chosen as the threshold for defining statistical significance.R software (version 4.1.0)and ggplot2 package (version 3.3.5)were used for data visualization.

Protein-protein interaction network construction and analysis of the module
The STRING database v11.5 (https://string-db.org/)was used to analyze the interaction relationships between integrated DEG.Based on this information, a protein-protein interaction (PPI) network was visualized by Cytoscape, and Centiscap, a plugin in Cytoscape, was used to calculate the degree of each node.Then, the PPI network was analyzed by another Cytoscape plugin-MCODE to search the module of highly interconnected nodes.

Identification of primitive DEG
By employing the GEO2R tool, the expression datasets were first normalized.Then, the primitive DEG were identified in the E. coli-infected (26 samples) and the S. aureus-infected (26 samples) group, respectively.Compared with the normal control group, the infected group in GSE24560E contained Identification of key genes in bovine mammary epithelial cells challenged with Escherichia coli and... Chen et al.
390 upregulated genes, 209 downregulated genes, and a total of 599 DEG; the infected group in GSE24560S contained 227 upregulated genes, 195 downregulated genes, and a total of 422 DEG; the infected group in GSE25413E contained 387 upregulated genes, 497 downregulated genes, and a total of 884 DEG; the infected group in GSE25413S contained 179 upregulated genes, 90 downregulated genes, and a total of 269 DEG (Figure 1).

Identification of integrated DEG
The primitive DEG of the four datasets were saved in a TXT file according to the upregulated and downregulated genes separately, and the genes were sorted by |logFC| order.These primitive DEG were analyzed by the RRA package in R software.If the gene ranks high in as many datasets as possible, its associated P-value will be lower.In this study P<0.05 was chosen as a threshold.Through A: GSE24560E (heat-inactivated E. coli isolates); B: GSE24560S (heat-inactivated S. aureus M60); C: GSE25413E (heat-inactivated E. coli strain 1303); D: GSE25413S (heat-inactivated S. aureus strain 1027).Red dots represent the upregulated genes, green dots represent the downregulated genes, and black spots represent stable genes.the RRA analysis, 59 upregulated genes, 13 downregulated genes, and a total of 72 integrated DEG were identified (Table 2).The top 10 upregulated genes and the top 10 downregulated genes were mapped on a heat map (Figure 2).
The ordinate represents the gene symbol, the abscissa represents the datasets, red represents upregulated genes, green represents downregulated genes; NA -not identified as DEG in this dataset; the number in the box represents the Log FC value.

Gene ontology enrichment analysis
The GO biological process (BP) terms and KEGG pathways were enriched for the 72 integrated DEG using the DAVID database.In the infected group, 18 BP terms and 24 KEGG pathways were enriched with P<0.01 and gene counts >5 as the cutoff points (Figure 3).From this, the integrated DEG were mainly involved in the immune process such as inflammatory response, chemokine-mediated signaling pathway, immune response, and innate immune response.The top 20 KEGG pathways were analyzed according to the P-value of the integrated DEG (Figure 4).The integrated DEG were mainly enriched in five pathways (counts>10), namely, TNF signaling pathway, Toll-like receptor signaling pathway, Rheumatoid arthritis, Influenza A, and Cytokine-cytokine receptor interaction.

Protein-protein interaction network construction and hub gene identification
Protein interactions among integrated DEG were predicted with STRING and Cytoscape stools.A total of 53 nodes and 710 edges were involved in the PPI network (Figure 5).The top 10 genes with high degrees identified by the Cytoscape were marked with red (Figure 5).In addition, the most significant module, which has the highest score through MCODE analysis, was screened from the PPI network (Figure 6).There are 17 nodes in this module, which are mainly related to inflammatory response, immune response, TNF signaling pathway, and Legionellosis, etc. (Table 3).In addition, all the 17 genes in this module belong to high degree genes and include the top 10 genes with degree value.These 17 genes were identified as hub genes (Table 4).
In the infected group, 18 BP terms were enriched terms with P<0.01 and gene counts >5 as the cutoff points.The integrated DEG mainly involved in the immune process.Identification of key genes in bovine mammary epithelial cells challenged with Escherichia coli and... Chen et al. 9

Discussion
Despite many efforts dedicated to understanding the pathogenesis of mastitis, the infectious process is still poorly understood (Tartaglia et al., 2018), and the detrimental economic losses caused by cow mastitis remain high (Petzl et al., 2018).The detailed investigation of gene expression response of BME cells to pathogen infections can be of great importance for understanding the innate immune defense mechanisms of dairy cows, and for exploring the candidate genes as biomarkers of mastitis.
Numerous studies on gene expression analysis of cow mastitis have been carried out, which often focus on studying the gene expression response of single bacterial mastitis alone (Sharifi et al., 2018;Li et al., 2019), or the difference between S. aureus mastitis and E. coli mastitis (Gunther et al., 2017;Han, 2019;Islam et al., 2020).To the best of our knowledge, there is no research on gene expression after co-stimulation of mixed bacteria in BME cells.However, in reality, a large part of dairy cow mastitis may be caused by mixed bacteria.The latest research found that mastitis in dairy cows is a dysbiosis of the mammary gland microbiota (Steinberg et al., 2022).The results of studying the difference between the two types of bacterial mastitis may have a limited effect on the treatment of infections caused by mixed bacteria.To obtain unique genes and pathways associated with the two types of bacterial mastitis, for the first time, we used the RRA method to integrate the results of multiple gene expression datasets about mastitis.
Our results showed that the integrated DEG were mainly enriched to the GO terms of the immune response, inflammation, and TNF signaling pathway.This result is similar to the biological process caused by E. coli stimulation (Li et al., 2019).Moreover, these integrated DEG associated with S. aureus and E. coli mastitis were involved in signaling pathways which were also similar to those caused by E. coli mastitis (Li et al., 2019).In addition to this consistency, more importantly, we found that these integrated DEG were enriched in the RIG-I-like receptor pathway and ranked high, but in the study of E. coli mastitis, this signaling pathway did not appear in the top 20 (Li et al., 2019).RIG-I like receptors are a type of pattern recognition receptor that can induce the production of interferon and proinflammatory cytokines, and are critical for the development of anti-pathogen immunity (Kato et al., 2011).The result of this study indicates RIG-I-like receptor pathway may be closely associated with the response of BME cells to these two pathogens.
In addition to the pathway, this study also identified 17 hub genes associated with S. aureus and E. coli mastitis namely TNF, IL1B, CXCL8, IL6, GRO1, NFK-BIA, CCL2, CXCL2, CCL20, TLR2, IL1A, CCL5, CD40, CXCL6, NFKB2, NFKBIZ, and NOS2.Some of these genes have been identified as key genes for mastitis in other studies.For example, Sharifi et al. (2018) found CXCL8, CXCL2, and GRO1 were the top three genes related to E. coli mastitis, and the study of Han (2019) confirmed that CXCL2 and GRO1 were key genes in both the E. coli and S. aureus-mastitis.These similar results further verify the correctness of our integrated analysis.
Among these 17 hub genes, CD40, CXCL6, and NFKBIZ, through our analysis, were identified as key genes and had the potential to be biomarkers of E. coli and S. aureus mastitis.An important reason for choosing these three genes is that KEGG pathways analysis proved that other genes were involved in signaling pathways triggered by other diseases (Table 3).For example, NFKBIA, IL6, CXCL8, TNF, and TLR2 were involved in Chagas disease, legionellosis, and so on.Cattle can also get these diseases in some cases; for example, Legionella pneumophila can cause cattle natural pneumonia (Mondino et al., 2020).This result indicates that those genes may not be suitable as a molecular biomarker of mastitis because biomarker should be specific for disease and should remain unchanged by unrelated disorders (Duarte et al., 2015).Therefore, after removing genes that cross with other diseases, CD40, CXCL6, and NFKBIZ have the potential to be biomarkers of E. coli and S. aureus mastitis in this study.
CD40 belongs to the tumor necrosis factor receptor family and can promote CD8+ T cell-mediated immunity against tumor cells and some intracellular pathogens (Wallemacq et al., 2012).A recent study showed that S. aureus superantigens can stimulate epithelial cells through CD40 to produce chemokines (Schlievert et al., 2019), which play a crucial role in regulating a variety of inflammatory reactions.Wallemacq et al. (2012) successfully reduced the severity of S. aureus mastitis by blocking CD40 function (Wallemacq et al., 2012).Consistent with the above research, in this study, CD40 was identified as a key gene.But how CD40 plays an important role in E. coli mastitis is worthy of further study.
CXCL6 is a member of the C-X-C motif chemokine family and has chemotactic granulocytes and immune functions.There was evidence that indicated the CXCL6 level is increased during E. coli mastitis (Li et al., 2019) and S. aureus mastitis (Kiku et al., 2016).CXCL6 is secreted through a signaling cascade of ligand/receptor reactions and serves as important mediator of the immune reaction in the innate immune system response.Combined with the analysis results of this experiment, it is further confirmed that CXCL6 plays critical roles in the response to pathogen infection in BME cells.
IκBζ, encoded by the NFKBIZ gene, is a member of the nuclear IκB protein family and plays a role in inflammatory responses through interaction with NFκB proteins (Capoferri et al., 2021).Several previous studies revealed a critical role of IκBζ signaling in the regulation of immune responses in cow mastitis.For example, IκBζ is crucial for induced IL-6 expression but is also known to block TNF-synthesis in BME cells challenged with S. aureus (Gunther et al., 2011).Moreover, a study identified NFKBIZ as a highly connected gene with Streptococcus uberis mastitis and introduced this gene as a potential marker of mastitis resistance in dairy heifers (Bakhtiarizadeh et al., 2020).In this study, NFKBIZ was also identified as a hub gene with a continuous increase in expression in the BME cells to cope with E. coli and S. aureus infection.These results suggest NFKBIZ is critical for mastitis pathogenesis and has the potential to be a biomarker of E. coli and S. aureus mastitis.
Due to the significant expenses of mastitis, early detection is critical.Most existing detection technologies, however, do not match the high accuracy necessary for clinical diagnosis of mastitis (Jensen et al., 2016).As a result, it is critical to build a new diagnosis system that can be converted to quick, on-farm diagnostic methods.The development of novel biomarkers for mastitis diagnosis Identification of key genes in bovine mammary epithelial cells challenged with Escherichia coli and... Chen et al.

11
has been identified as a mastitis detection trend (Duarte et al., 2015).A biomarker is a characteristic that can be measured and evaluated as an indicator of pathological processes, or pharmacological responses to therapeutic interventions.Genes coding for proteins such as Haptoglobin and Lingual antimicrobial peptide have been identified as potential biomarkers for mastitis detection (Sharifi et al., 2018).The genes identified in this experiment have the potential to build test strips that comprise many biomarkers on a single test strip, thereby improving diagnostic efficiency.

Conclusions
Our research identified key candidate genes (CD40, CXCL6, and NFKBIZ) and pathway (RIG-I-like receptor pathway) that have critical roles in both the E. coli and S. aureus mastitis.This study lays the foundation for further mechanistic studies about this disease and will assist in the search for biomarkers for this disease diagnosis and disease resistance breeding.
Identification of key genes in bovine mammary epithelial cells challenged with Escherichia coli and... Chen et al.

Figure 1 -
Figure 1 -Volcano plot of the primitive differentially expressed genes between the two groups of samples in each dataset.

Figure 2 -
Figure 2 -Log FC heatmap of top 10 upregulated and downregulated genes of the integrated differentially expressed genes (DEG) in each expression microarray.

Figure 3 -
Figure 3 -Enriched Gene Ontology (GO) terms of the integrated differentially expressed genes (DEG).
genes in bovine mammary epithelial cells challenged with Escherichia coli and... Chen et al.
Encyclopedia of Genes and Genomes.The integrated DEG were mainly enriched in five pathways (counts>10), namely, TNF signaling pathway, Toll-like receptor signaling pathway, Rheumatoid arthritis, Influenza A, and Cytokine-cytokine receptor interaction.

Figure 4 -
Figure 4 -Top 20 KEGG pathways analyzed according to the P-value of the integrated differentially expressed genes (DEG).
-I-like receptor signaling pathway bta04621:NOD-like receptor signaling pathway bta04620:Toll-like receptor signaling pathway bta04380:Osteoclast differentiation bta04064:NF-kappa B signaling pathway bta04062:Chemokine signaling pathway bta04060:Cytokine-cytokine receptor interaction Top The top 10 genes with high degree identified by the Cytoscape are marked with red.A total of 53 nodes and 710 edges were involved in the PPI network.

Figure 5 -
Figure 5 -Protein-protein interaction (PPI) network of the integrated differentially expressed genes.

Figure 6 -
Figure 6 -Protein-protein interaction (PPI) network of the module (= 17 nodes) that has the highest score through MCODE analysis.

Table 1 -
Summary of the microarray datasets included in the analysis

Table 3 -
Gene ontology (GO; BP) and KEGG pathway analysis of genes in selected module KEGG -Kyoto Encyclopedia of Genes and Genomes.

Table 4 -
Hub genes in selected module with degree values and mean LogFC Identification of key genes in bovine mammary epithelial cells challenged with Escherichia coli and... Chen et al.