Identification of hub genes and potential molecular mechanisms in gastric cancer by integrated bioinformatics analysis

Similar documents
CircHIPK3 is upregulated and predicts a poor prognosis in epithelial ovarian cancer

Expression of mir-1294 is downregulated and predicts a poor prognosis in gastric cancer

RNA-Seq profiling of circular RNAs in human colorectal Cancer liver metastasis and the potential biomarkers

Downregulation of serum mir-17 and mir-106b levels in gastric cancer and benign gastric diseases

Prognostic significance of overexpressed long non-coding RNA TUG1 in patients with clear cell renal cell carcinoma

Expression of lncrna TCONS_ in hepatocellular carcinoma and its influence on prognosis and survival

Decreased expression of mir-490-3p in osteosarcoma and its clinical significance

Overexpression of long-noncoding RNA ZFAS1 decreases survival in human NSCLC patients

Circular RNA_LARP4 is lower expressed and serves as a potential biomarker of ovarian cancer prognosis

Low levels of serum mir-99a is a predictor of poor prognosis in breast cancer

Downregulation of long non-coding RNA LINC01133 is predictive of poor prognosis in colorectal cancer patients

Mir-595 is a significant indicator of poor patient prognosis in epithelial ovarian cancer

Expression of long non-coding RNA linc-itgb1 in breast cancer and its influence on prognosis and survival

Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies susceptibility loci at PLCE1 and C20orf54

Plasma Bmil mrna as a potential prognostic biomarker for distant metastasis in colorectal cancer patients

Original Article Tissue expression level of lncrna UCA1 is a prognostic biomarker for colorectal cancer

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

Original Article CREPT expression correlates with esophageal squamous cell carcinoma histological grade and clinical outcome

mir-218 tissue expression level is associated with aggressive progression of gastric cancer

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

Review Article Long Noncoding RNA H19 in Digestive System Cancers: A Meta-Analysis of Its Association with Pathological Features

Expression of mir-146a-5p in patients with intracranial aneurysms and its association with prognosis

A meta-analysis of the effect of microrna-34a on the progression and prognosis of gastric cancer

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

CURRICULUM VITA OF Xiaowen Chen

Long non-coding RNA Loc is a potential prognostic biomarker in non-small cell lung cancer

Clinical significance of up-regulated lncrna NEAT1 in prognosis of ovarian cancer

Association between downexpression of mir-1301 and poor prognosis in patients with glioma

Down-regulation of long non-coding RNA MEG3 serves as an unfavorable risk factor for survival of patients with breast cancer

Clinicopathologic and prognostic relevance of mir-1256 in colorectal cancer: a preliminary clinical study

Original Article Reduced serum mir-138 is associated with poor prognosis of head and neck squamous cell carcinoma

95% CI: , P=0.037).

Long non-coding RNA ROR is a novel prognosis factor associated with non-small-cell lung cancer progression

SUPPLEMENTARY FIGURE LEGENDS

Supporting Information

Original Article Increased LincRNA ROR is association with poor prognosis for esophageal squamous cell carcinoma patients

Down-regulation of mir p is correlated with clinical progression and unfavorable prognosis in gastric cancer

Decreased mir-198 expression and its prognostic significance in human gastric cancer

Up-regulation of long non-coding RNA SNHG6 predicts poor prognosis in renal cell carcinoma

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies

Reduced mirna-218 expression in pancreatic cancer patients as a predictor of poor prognosis

Association of mir-21 with esophageal cancer prognosis: a meta-analysis

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Supplementary. properties of. network types. randomly sampled. subsets (75%

Increased long noncoding RNA LINP1 expression and its prognostic significance in human breast cancer

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

Enterovirus 71 Outbreak in P. R. China, 2008

Long noncoding RNA LINC01510 is highly expressed in colorectal cancer and predicts favorable prognosis

Research Article Expression and Clinical Significance of the Novel Long Noncoding RNA ZNF674-AS1 in Human Hepatocellular Carcinoma

Long non-coding RNA CCHE1 overexpression predicts a poor prognosis for cervical cancer

Original Article Up-regulation of mir-10a and down-regulation of mir-148b serve as potential prognostic biomarkers for osteosarcoma

Bin Liu, Lei Yang, Binfang Huang, Mei Cheng, Hui Wang, Yinyan Li, Dongsheng Huang, Jian Zheng,

Long noncoding RNA linc-ubc1 promotes tumor invasion and metastasis by regulating EZH2 and repressing E-cadherin in esophageal squamous cell carcinoma

A General Workflow to Analyze Key Cancer-Related Genes Based on NCBI illustrated by the case of gastric cancer

Wei-Chung Cheng ( 鄭維中 )

Study on the expression of MMP-9 and NF-κB proteins in epithelial ovarian cancer tissue and their clinical value

Original Article MicroRNA-27a acts as a novel biomarker in the diagnosis of patients with laryngeal squamous cell carcinoma

Title:Identification of a novel microrna signature associated with intrahepatic cholangiocarcinoma (ICC) patient prognosis

Increasing expression of mir-5100 in non-small-cell lung cancer and correlation with prognosis

Gene-microRNA network module analysis for ovarian cancer

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

Clinical significance of serum mir-196a in cervical intraepithelial neoplasia and cervical cancer

High AU content: a signature of upregulated mirna in cardiac diseases

SUPPLEMENTARY INFORMATION

Models of cell signalling uncover molecular mechanisms of high-risk neuroblastoma and predict outcome

The diagnostic value of determination of serum GOLPH3 associated with CA125, CA19.9 in patients with ovarian cancer

Cancer incidence and patient survival rates among the residents in the Pudong New Area of Shanghai between 2002 and 2006

Original Article Association of osteopontin polymorphism with cancer risk: a meta-analysis

Cancer Cell Research 19 (2018)

Long non-coding RNA FEZF1-AS1 is up-regulated and associated with poor prognosis in patients with cervical cancer

Original Article High expression of mir-20a predicated poor prognosis of hepatocellular carcinoma

Evaluation of altered expression of mir-9 and mir-106a as an early diagnostic approach in gastric cancer

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA LEO TUNKLE *

Original Article Reduced expression of mir-506 in glioma is associated with advanced tumor progression and unfavorable prognosis

Analysis of circulating long non-coding RNA UCA1 as potential biomarkers for diagnosis and prognosis of osteosarcoma

Identification of novel gene expression signature in lung adenocarcinoma by using next-generation sequencing data and bioinformatics analysis

Analysis of paired mirna-mrna microarray expression data using a stepwise multiple linear regression model

Original Article Down-regulation of tissue microrna-126 was associated with poor prognosis in patients with cutaneous melanoma

Cancer Cell Research 17 (2018)

Supplementary Information

LncRNA GHET1 predicts a poor prognosis of the patients with non-small cell lung cancer

Reduced expression of mir-564 is associated with worse prognosis in patients with osteosarcoma

Correlation between expression and significance of δ-catenin, CD31, and VEGF of non-small cell lung cancer

microrna PCR System (Exiqon), following the manufacturer s instructions. In brief, 10ng of

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

Research Article Plasma HULC as a Promising Novel Biomarker for the Detection of Hepatocellular Carcinoma

Integrative Omics for The Systems Biology of Complex Phenotypes

Clinical impact of serum mir-661 in diagnosis and prognosis of non-small cell lung cancer

The 8th National Gastric Cancer Academic Conference: focus on specification, translational research, and plan

R2: web-based genomics analysis and visualization platform

EMPEROR'S COLLEGE MTOM COURSE SYLLABUS HERB FORMULAE II

Long noncoding RNA expression signatures of bladder cancer revealed by microarray

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities

Original Article Downregulation of microrna-26b functions as a potential prognostic marker for osteosarcoma

DIAGNOSTIC SIGNIFICANCE OF CHANGES IN SERUM HUMAN EPIDIDYMIS EPITHELIAL SECRETORY PROTEIN 4 AND CARBOHYDRATE ANTIGEN 125 IN ENDOMETRIAL CARCINOMA

A study on clinicopathological features and prognostic factors of patients with upper gastric cancer and middle and lower gastric cancer.

Identification of key pathways and genes in the progression of cervical cancer using bioinformatics analysis

Bootstrapped Integrative Hypothesis Test, COPD-Lung Cancer Differentiation, and Joint mirnas Biomarkers

Objective research on tongue manifestation of patients with eczema

High Expression of Forkhead Box Protein C2 is Related to Poor Prognosis in Human Gliomas

Transcription:

Identification of hub genes and potential molecular mechanisms in gastric cancer by integrated bioinformatics analysis Ling Cao 1,2, *, Yan Chen 3, *, Miao Zhang 2, De-quan Xu 2, Yan Liu 4, Tonglin Liu 5, Shi-xin Liu 2 and Ping Wang 1 1 Department of Radiation Oncology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer; Key Laboratory of Cancer Prevention and Therapy, Tianjin; Tianjin s Clinical Research Center for Cancer, Tianjin, Tianjin, China 2 Department of Radiation Oncology, Cancer Hospital of Jilin Province, Changchun, Jilin, China 3 Department of Gastrointestinal Surgery, First Hospital of Jilin University, Changchun, Jilin, China 4 Medical Oncology Translational Research Lab, Cancer Hospital of Jilin Province, Changchun, Jilin, China 5 Information Centre, Cancer Hospital of Jilin Province, Changchun, Jilin, China * These authors contributed equally to this work. Submitted 24 February 2018 Accepted 18 June 2018 Published 2 July 2018 Corresponding authors Shi-xin Liu, liushixin1964@sina.com Ping Wang, wangping@tjmuch.com Academic editor Li Shen Additional Information and Declarations can be found on page 12 DOI 10.7717/peerj.5180 Copyright 2018 Cao et al. Distributed under Creative Commons CC-BY 4.0 ABSTRACT Objective: Gastric cancer (GC) is the fourth most common cause of cancer-related deaths in the world. In the current study, we aim to identify the hub genes and uncover the molecular mechanisms of GC. Methods: The expression profiles of the genes and the mirnas were extracted from the Gene Expression Omnibus database. The identification of the differentially expressed genes (DEGs), including mirnas, was performed by the GEO2R. Database for Annotation, Visualization and Integrated Discovery was used to perform GO and KEGG pathway enrichment analysis. The protein protein interaction (PPI) network and mirna-gene network were constructed using Cytoscape software. The hub genes were identified by the Molecular Complex Detection (MCODE) plugin, the CytoHubba plugin and mirna-gene network. Then, the identified genes were verified by Kaplan Meier plotter database and quantitative real-time PCR (qrt-pcr) in GC tissue samples. Results: A total of three mrna expression profiles (GSE13911, GSE79973 and GSE19826) were downloaded from the Gene Expression Omnibus (GEO) database, including 69, 20 and 27cases separately. A total of 120 overlapped upregulated genes and 246 downregulated genes were identified. The majority of the DEGs were enriched in extracellular matrix organization, collagen catabolic process, collagen fibril organization and cell adhesion. In addition, three KEGG pathways were significantly enriched, including ECM-receptor interaction, protein digestion and absorption, and the focal adhesion pathways. In the PPI network, five significant modules were detected, while the genes in the modules were mainly involved in the ECM-receptor interaction and focal adhesion pathways. By combining the results of MCODE, CytoHubba and mirna-gene network, a total of six hub genes including COL1A2, COL1A1, COL4A1, COL5A2, THBS2 and ITGA5 were chosen. The Kaplan Meier plotter database confirmed that higher expression levels of these genes How to cite this article Cao et al. (2018), Identification of hub genes and potential molecular mechanisms in gastric cancer by integrated bioinformatics analysis. PeerJ 6:e5180; DOI 10.7717/peerj.5180

were related to lower overall survival, except for COL5A2. Experimental validation showed that the rest of the five genes had the same expression trend as predicted. Conclusion: In conclusion, COL1A2, COL1A1, COL4A1, THBS2 and ITGA5 may be potential biomarkers and therapeutic targets for GC. Moreover, ECMreceptor interaction and focal adhesion pathways play significant roles in the progression of GC. Subjects Bioinformatics, Gastroenterology and Hepatology, Data Mining and Machine Learning Keywords Gastric cancer, Bioinformatic analysis, Differentially expressed genes, Protein protein interaction network, Bioinformatic mining, mirna-gene network, KEGG pathway INTRODUCTION As the fourth most common cancer in the world, gastric cancer (GC) is a worldwide health problem. An estimated 24,590 new GC cases and 10,720 GC related deaths occurred in the United States alone in 2015 (Siegel, Miller & Jemal, 2015). The mortality and incidence rates of GC is even higher in South America and Eastern Europe (Jemal et al., 2011). A large number of biomarkers involved in transcriptional, and post-transcriptional regulation via methylation and phosphorylation, have been implicated in cancer development (Blee, Gray & Brook, 2015). However, the findings related to GC are not consistent, and specific targets for the diagnosis and treatment of GC require confirmation (Fakhri & Lim, 2017). Gene Expression Omnibus (GEO) database provides the opportunity for the bioinformatics mining of gene expression profiles in various cancers (Jiang & Liu, 2015). After conducting an interactive analysis on the GEO database datasets, we extracted a set of differentially expressed genes (DEGs) and mirnas (DE mirnas), which are potentially involved in tumorigenesis and cancer progression. In addition, we also compared the gene expression profiles of GC tissues and adjacent normal tissues. By analyzing the GO (Thomas, 2017) and KEGG pathway enrichment (Xing et al., 2016), along with the construction of protein protein interaction (PPI) network (Khunlertgit & Yoon, 2016) and the mirna-gene network (De Cecco et al., 2017), we identified the key genes, and the underlying signaling pathways playing a significant role in GC development. MATERIALS AND METHODS Gene expression profile data A total of three mrna expression datasets (GSE13911, GSE79973 and GSE19826) and one mirna expression (GSE93415) dataset were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) (Clough & Barrett, 2016). All mrna profiles were based on the GPL570 platform (Affymetrix Human Genome U113 Plus 2.0 Array) (Agilent Technologies, Santa Clara, CA, USA). The GSE13911 dataset included 69 patients with GC, 31 non-cancerous tissues and 38 tumor tissues. The datasets of GSE79973 and GSE19826 included 10 non-cancerous tissues and 10 tumor tissues and 15 noncancerous tissues and 12 tumor tissues respectively. In addition, the GSE93415 is based Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 2/14

on the GPL19071 platform, and included 20 tumor tissues and 20 corresponding healthy gastric mucosal tissues. DEG identification GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to screen DEGs and DE mirnas between GC tissues and corresponding healthy gastric tissues. As an R programming language-based dataset analysis tool, GEO2R is based on the analysis of variance or t-test. Therefore, over 90% GEO datasets can be accessed and analyzed with this approach. Furthermore, GEO2R consists of a large number of experimental data types and designs, and it applies an adjusted P-value (adj. P) to help correct false-positives. Fold-change (FC) in gene expression was calculated with a threshold criteria of log 2 FC 1andtheadj.P <0.05 was set for DEGs and DE mirnas selection. Funrich Software (Version 3.0, http://funrich.org/ index.html) was used to analyze the overlapping DEGs in the three datasets. Functional network establishment of DEG candidates To determine the functions of the overlapping DEGs, an enrichment analysis was performed on KEGG and GO pathways using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Version 6.7, https://david.ncifcrf.gov/). DAVID is a reliable program for demonstrating and integrating biological functional annotations of proteins or genes (Dennis et al., 2003). In addition, the cutoff value for pathway screening and significant functionality was set to P < 0.01. PPI network construction and app analysis The Search Tool for the Retrieval of Interacting Genes database (Version 10.0, http:// string-db.org) was used to predict potential interactions between gene candidates at the protein level. A combined score of >0.4 (medium confidence score) was considered significant. Additionally, Cytoscape software (Version 3.4.0, http://www.cytoscape.org/) was utilized for constructing the PPI network. Degree 20 was set as the cutoff criterion. The Molecular Complex Detection (MCODE) app was used to analyze PPI network modules (Bandettini et al., 2012), and MCODE scores >3 and the number of nodes >5 were set as cutoff criteria with the default parameters (Degree cutoff 2, Node score cutoff 2, K-core 2 and Max depth = 100). DAVID was utilized to perform pathway enrichment analysis of gene modules. Finally, CytoHubba, a Cytoscape plugin, was utilized to explore PPI network hub genes; it provides a user-friendly interface to explore important nodes in biological networks and computes using eleven methods, of which MCC has a better performance in the PPI network (Chin et al., 2014). MiRNA-gene network construction and prognosis analysis The DE mirnas target genes were predicted using three established programs: TargetScan (Lewis, Burge & Bartel, 2005), mirtarbase (Chou et al., 2016) and mirdb (Wong & Wang, 2015), most popular databases of experimentally validated mirna interactions. The genes that were predicted by all three programs were chosen as the targets of DE mirnas and the mirna-gene network was constructed. To identify the hub genes, we combined the results of MCODE, CytoHubba and mirna-gene networks. Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 3/14

Table 1 Primer sequences of PCR. cdna Forward primer (5 3 ) Reverse primer (5 3 ) COL1A2 GCCCTCAAGGTTTCCAAG CCTTCAATCCATCCAGACC COL1A1 GACGAGACCAAGAACTGCC CACGAGGACCAGAGGGA COL4A1 AGGATTTCCTGGTACATCTCTG GACATTCCACAATTCCATTTG THBS2 GCATCAAGGATAACTGCCCCCATCT TTCATTGAAGACATCGTCCCCATCA ITGA5 TAATACCAGCCAGCCAGGAGTG TGTCAAATTCAATGGGGGTGC Note: Primer sequences for five hub genes. The prognostic significance of the identified hub genes was analyzed using Kaplan Meier plotter (http://kmplot.com), an online database which includes both clinical and expression data (Hou et al., 2017). Patients and tissue specimens We analyzed samples from 20 GC patients who underwent tumor resection at the Department of Abdominal Surgery. Our study was approved by the Ethical Committee and Institutional Review Board of Jilin Cancer Hospital, Jilin, China (Approval number: 201711-045-01). The patients had not received any preoperative radiation or chemotherapy. GC and corresponding normal mucosa (at least 5 cm distant from the tumor edge) were immediately frozen in liquid nitrogen and stored at -80 C until further analysis. Every specimen was anonymously handled based on ethical standards. All participants provided written informed consent and our study had full approval from the hospital s Ethical Review Committee. RNA extraction and quantitative real-time PCR Trizol reagent (Invitrogen, Carlsbad, CA, USA) was used to extract total RNA from the tissue samples according to the manufacturer s protocol. Superscript II reverse transcriptase and random primers were used to synthesize cdna (Toyobo, Osaka, Japan). Quantitative real-time PCR (qrt-pcr) was conducted on the ABI 7900HT Sequence Detection System with SYBR-Green dye (Applied Biosystems, Foster City, CA, USA; Toyobo, Osaka, Japan). The reaction parameters included a denaturation program (5 min at 95 C), followed by an amplification and quantification program over 40 cycles (15 s at 95 C and 45 s at 65 C). Each sample was tested in triplicates, and each sample underwent a melting curve analysis to check for the specificity of amplification. Table 1 illustrates the primer sequences of hub genes. The expression level was determined as a ratio between the hub genes and the internal control GAPDH in the same mrna sample, and calculated by the comparative CT method. Levels of COL1A2, COL1A1, COL4A1, THBS2 and ITGA5 expression were calculated by the 2 -Ct method. RESULTS Identification of overlapping DEGs We identified 1,191, 502, 640 upregulated DEGs, and 2,887, 930, 1,162 downregulated DEGs in the GSE13911, GSE19826 and GSE79973 datasets, respectively. As shown in Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 4/14

Figure 1 Identification of overlapping DEGs. Identification of DEGs (A) Venn diagram of 120 overlapping upregulated genes in GSE13911, GSE19826 and GSE79973; (B) Venn diagram of the 246 overlapping downregulated genes in the same datasets. Full-size DOI: 10.7717/peerj.5180/fig-1 Fig. 1, 120 upregulated genes and 246 downregulated genes overlapped across the three datasets. Functional enrichment analysis of overlapped DEGs GO analysis revealed 366 overlapping genes that are involved in a number of biological processes (BP), including extracellular matrix organization, collagen metabolism and fibril organization, and cell adhesion (Fig. 2A). In terms of cellular components, DEGS were mostly enriched in the extracellular space, extracellular region and the extracellular matrix (Fig. 2B). The overlapping DEGs were mainly associated with the structural organization of the extracellular matrix, platelet-derived growth factor binding, integrin binding and extracellular matrix binding in terms of molecular functions (Fig. 2C). In addition, the DEGs were enriched in three KEGG pathways, which included ECMreceptor interaction, protein digestion and absorption, focal adhesion and the PI3K-Akt signaling pathway (Fig. 2D). PPI network construction and analysis of modules The overlapping DEGs indicated a distinct set of interactions and networks. The PPIs with combined scores greater than 0.4 were selected for constructing PPI networks. The entire PPI network was analyzed using MCODE, following which, five modules were chosen (Fig. 3). In addition, the KEGG pathway enrichment analysis of the module genes showed enrichment in ECM-receptor interaction and focal adhesion (Fig. 4). The first 25 genes in the MCC method were chosen by CytoHubba plugin and sequentially ordered as follows: COL1A2, COL1A1, COL3A1, COL5A1, COL4A1, COL2A1, COL4A2, COL5A2, COL6A1, SERPINH1, COL6A3, COL11A1, COL12A1, COL10A1, COL8A1, FN1, SPARC, THBS1, FBN1, THBS2, ITGA5, ADAMTS2, TIMP1, BGN, and BMP1 (Fig. 5). MiRNA-gene network A total of 32 DE mirnas were identified by analyzing the mirna expression dataset (GSE93415). Furthermore, the genes predicted by all three programs (TargetScan, mirtarbase, and mirdb) were identified as DE mirnas target genes. In addition, to Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 5/14

Figure 2 GO analysis of the overlapped DEGs. Black bars represent the number of DEGs. Here only show the top 10: (A) Biological processes (BP); (B) Cellular components (CC); (C) Molecular function (MF); (D) KEGG pathway analysis of DEGs. Full-size DOI: 10.7717/peerj.5180/fig-2 Figure 3 PPI network construction and module analysis. PPI network construction and module analysis. The light blue nodes in the middle represent the DEGs. The nodes of the different colors around represent the genes involved in modules and the lines represent the interaction between two nodes. Full-size DOI: 10.7717/peerj.5180/fig-3 Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 6/14

Figure 4 KEGG pathway of genes in five modules. KEGG pathway analysis of genes in five modules. Bars represent the number of DEGs. Full-size DOI: 10.7717/peerj.5180/fig-4 identify reliable hub genes, the target genes were compared to the DEGs and only overlapping genes were selected as the hub genes. After combining the results of MCODE, CytoHubba and mirna-gene network, six hub genes were chosen and all of them were upregulated DEGs, including COL1A2, COL1A1, COL4A1, COL5A2, THBS2 and ITGA5 (Fig. 6). Kaplan Meier survival analysis The overall survival (OS) was analyzed for 876 patients with GC using the Kaplan Meier survival plot. Briefly, the six genes (COL1A2, COL1A1, COL4A1, COL5A2, THBS2 and ITGA5) were uploaded to the database and Kaplan Meier curves were plotted. High expression of COL1A1 (P = 8.2e-05; adjust. P = 4.92e-04), COL1A2 (P = 1.4e-07; adjust. P = 8.40e-0.7), COL4A1 (P = 6.4e-07; adjust. P = 384e-0.6), ITGA5 (P < 1E-16; adjust. P < 9.60e-16), and THBS2 (P = 1.4e-07; adjust. P = 8.40e-0.7) were correlated with significantly worse OS in GC patients, while COL5A2 expression was not relevant to survival (P = 0.19; adjust. P =1)(Fig. 7). The hub genes were verified within GC tissues To further verify the results of bioinformatics analysis, the mrna levels of the five hub genes (COL1A2, COL1A1, COL4A1, ITGA5, and THBS2) were determined in 20 paired tumor and adjacent healthy gastric tissues with qrt-pcr. As illustrated in Fig. 8, each of the five identified genes was significantly upregulated in tumor tissue (P < 0.001), as predicted by the bioinformatics analysis. DISCUSSION We identified 366 overlapping DEGs 120 upregulated and 246 downregulated in the three GC expression profile datasets. Furthermore, results from the GO analysis Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 7/14

Figure 5 The first 25 genes network. The first 25 genes of the MMC method were chosen using CytoHubba plugin. The more forward ranking is represented by a redder color. Full-size DOI: 10.7717/peerj.5180/fig-5 indicated that the overlapping genes were mostly involved in extracellular matrix organization, collagen catabolism and fibril organization, and cell adhesion at the level of BP. In addition, KEGG pathway enrichment analysis showed that the overlapping DEGs were enriched significantly within ECM-receptor interaction, protein digestion and absorption, focal adhesion pathway and PI3K-Akt signaling pathway. ECM-receptor interaction and focal adhesion were identified as the major pathways of the important modules of overlapping DEGs. These enriched pathways provide insights into the molecular mechanism of GC initiation and progression, and can therefore be useful for the development of new therapeutic strategies. In recent years, bioinformatics analysis have been increasingly used for finding new therapeutic targets and diagnosis markers for various cancers (Jiang & Liu, 2015). Tan, Jin & Wang (2017) analyzed the potential biomarkers for prostate cancer by integrated bioinformatics analysis and identified BZRAP1-AS1 as a novel biomarker. In addition, another study (Yin, Chang & Xu, 2017) showed the vital role of the G2/M checkpoint Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 8/14

Figure 6 mirna-gene network. Regulation of six hub genes in mirna-gene network. Circular nodes stand for DEGs, red circular nodes stand for the six gene biomarkers and purple rhombus nodes represent DE mirnas. The lines represent the regulation of relationship between two nodes. Full-size DOI: 10.7717/peerj.5180/fig-6 in early hepatocellular carcinoma indicating the potential prognostic and diagnostic importance of the genes involved in the checkpoint. Similar studies have been conducted for GC as well. Sun et al. (2017) have found the core genes involved in GC by bioinformatics analysis. However, when compared to our study, their study only analyzed a profile, and only used the module method to select the gene with a high degree of connectivity. In addition, their selected genes were validated only via the Kaplan Meier plotter database. Our study integrated three profile datasets, and then combined the results of MCODE, CytoHubba and mirna-gene network for the identification of the hub genes. Furthermore, we verified our results by Kaplan Meier plotter database and RT-PCR, thus increasing the reliability of our results. We predicted five hub genes including COL1A2, COL1A1, COL4A1, THBS2 and ITGA5. Previous studies have reported some of these genes. For example, Rong et al. Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 9/14

Figure 7 Prognostic curve of six hub genes. The prognostic significance of the hub genes in patients with GC, according to the Kaplan Meier plotter database. (A) COL1A2; (B) COL1A1; (C) COL4A1; (D) COL5A2: (E) THBS2; (F) ITGA5. The red lines represent patients with high gene expression, and black lines represent patients with a low gene expression. Full-size DOI: 10.7717/peerj.5180/fig-7 (2018) have reported high expression of COL1A2 in GC tissues, which was significantly correlated to the histological type and the lymph node status. Li, Ding & Li (2016) hypothesized a diagnostic use of COL1A1 to screen for early GC. They also considered COL1A1 and COL1A2 as predictors of poor clinical outcomes in GC patients. In contrast, Sun et al. (2014) showed that THBS2 expression was significantly lower in GC tissues compared to normal tissues, and that patients with higher levels of THBS2 had better prognosis. However, Zhuo et al. (2016) found higher expression of THBS2 and COL1A2 in tumor tissues and better prognosis in patients with lower THBS2 expression. Furthermore, COL4A1 and ITGA5 are not specific to GC, and studying these hub genes can therefore increase the understanding of other cancers. To verify the results of bioinformatics analysis (Chavan, Shaughnessy & Edmondson, 2011), we used Kaplan Meier plotter database to predict the prognostic value of these hub genes, and analyzed their expression levels in 20 pairs of GC and adjacent normal tissue samples by qrt-pcr. All hub genes, except COL5A2, were significantly correlated Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 10/14

Figure 8 PCR results of five hub genes. Quantitative real-time PCR results for the five gene biomarkers. (A) COL1A2; (B) COL1A1; (C) COL4A1; (D) THBS2; (E) ITGA5. Expression of these DEGs was normalized against GAPDH expression. The statistical significance of differences was calculated by the Student s t-test. P < 0.001. Full-size DOI: 10.7717/peerj.5180/fig-8 with worse OS for GC patients. In addition, they showed the same trend in expression as predicted by bioinformatics, thereby verifying the accuracy of our method. CONCLUSION A total of five genes including COL1A2, COL1A1, COL4A1, THBS2 and ITGA5, were identified as GC biomarkers, and ECM-receptor interaction and focal adhesion were revealed to be important mechanisms of GC. However, our study has certain limitations such as low number of primary tissues samples, and the use of only the Kaplan Meier plotter database to predict the prognostic value of the hub genes. The prognostic utility of these markers have to explored in GC patients and further studies need to be conducted to dissect the underlying mechanisms and related pathways of these genes. Since cancer is a result of multiple and highly complex molecular mechanisms, a single pathway is insufficient to explain cancer pathogenesis (Zeng et al., 2013). However, our findings provide novel insights into the occurrence and progression of GC. In addition, mrna expression profiling and the interactive network of mirnas and mrna are highly complex, and our bioinformatics methods are relatively new (Xu et al., 2017). Therefore, more experimental studies are necessary to confirm the present findings. Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 11/14

ADDITIONAL INFORMATION AND DECLARATIONS Funding The authors received no funding for this work. Competing Interests The authors declare that they have no competing interests. Author Contributions Ling Cao conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft, financial support. Yan Chen conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft, financial support. Miao Zhang conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft, financial support. De-quan Xu conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft, financial support. Yan Liu conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/ or tables, authored or reviewed drafts of the paper, approved the final draft, financial support. Tonglin Liu conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft, financial support. Shi-xin Liu conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft, financial support. Ping Wang conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft, financial support. Human Ethics The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers): This study was approved by the Ethical Committee and Institutional Review Board of Jilin Cancer Hospital, Jilin, China: 201711-045-01. Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 12/14

Data Availability The following information was supplied regarding data availability: The raw data are provided in the Supplemental Files. Supplemental Information Supplemental information for this article can be found online at http://dx.doi.org/ 10.7717/peerj.5180#supplemental-information. REFERENCES Bandettini WP, Kellman P, Mancini C, Booker OJ, Vasu S, Leung SW, Wilson JR, Shanbhag SM, Chen MY, Arai AE. 2012. MultiContrast Delayed Enhancement (MCODE) improves detection of subendocardial myocardial infarction by late gadolinium enhancement cardiovascular magnetic resonance: a clinical validation study. Journal of Cardiovascular Magnetic Resonance 14(1):83 DOI 10.1186/1532-429x-14-83. Blee TK, Gray NK, Brook M. 2015. Modulation of the cytoplasmic functions of mammalian posttranscriptional regulatory proteins by methylation and acetylation: a key layer of regulation waiting to be uncovered? Biochemical Society Transactions 43(6):1285 1295 DOI 10.1042/bst20150172. Chavan SS, Shaughnessy JD Jr, Edmondson RD. 2011. Overview of biological database mapping services for interoperation between different omics datasets. Human Genomics 5(6):703 708 DOI 10.1186/1479-7364-5-6-703. Chin CH, Chen SH, Wu HH, Ho CW, Ko MT, Lin CY. 2014. cytohubba: identifying hub objects and sub-networks from complex interactome. BMC Systems Biology 8(Suppl 4):S11 DOI 10.1186/1752-0509-8-s4-s11. Chou CH, Chang NW, Shrestha S, Hsu SD, Lin YL, Lee WH, Yang CD, Hong HC, Wei TY, Tu SJ, Tsai TR, Ho SY, Jian TY, Wu HY, Chen PR, Lin NC, Huang HT, Yang TL, Pai CY, Tai CS, Chen WL, Huang CY, Liu CC, Weng SL, Liao KW, Hsu WL, Huang HD. 2016. mirtarbase 2016: updates to the experimentally validated mirna-target interactions database. Nucleic Acids Research 44(D1):D239 D247 DOI 10.1093/nar/gkv1258. Clough E, Barrett T. 2016. The Gene Expression Omnibus database. Methods in Molecular Biology 1418:93 110 DOI 10.1007/978-1-4939-3578-9_5. De Cecco L, Giannoccaro M, Marchesi E, Bossi P, Favales F, Locati LD, Licitra L, Pilotti S, Canevari S. 2017. Integrative mirna-gene expression analysis enables refinement of associated biology and prediction of response to cetuximab in head and neck squamous cell cancer. Genes 8(1):35 DOI 10.3390/genes8010035. Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA. 2003. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biology 4:R60 DOI 10.1186/gb-2003-4-9-r60. Fakhri B, Lim KH. 2017. Molecular landscape and sub-classification of gastrointestinal cancers: a review of literature. Journal of Gastrointestinal Oncology 8(3):379 386 DOI 10.21037/jgo.2016.11.01. Hou GX, Liu P, Yang J, Wen S. 2017. Mining expression and prognosis of topoisomerase isoforms in non-small-cell lung cancer by using Oncomine and Kaplan Meier plotter. PLOS ONE 12(3): e0174515 DOI 10.1371/journal.pone.0174515. Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D. 2011. Global cancer statistics. CA: A Cancer Journal for Clinicians 61(2):69 90 DOI 10.3322/caac.20107. Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 13/14

Jiang P, Liu XS. 2015. Big data mining yields novel insights on cancer. Nature Genetics 47(2):103 104 DOI 10.1038/ng.3205. Khunlertgit N, Yoon BJ. 2016. Incorporating topological information for predicting robust cancer subnetwork markers in human protein protein interaction network. BMC Bioinformatics 17(S13):351 DOI 10.1186/s12859-016-1224-1. Lewis BP, Burge CB, Bartel DP. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microrna targets. Cell 120(1):15 20 DOI 10.1016/j.cell.2004.12.035. Li J, Ding Y, Li A. 2016. Identification of COL1A1 and COL1A2 as candidate prognostic factors in gastric cancer. World Journal of Surgical Oncology 14(1):297 DOI 10.1186/s12957-016-1056-5. Rong L, Huang W, Tian S, Chi X, Zhao P, Liu F. 2018. COL1A2 is a novel biomarker to improve clinical prediction in human gastric cancer: integrating bioinformatics and meta-analysis. Pathology & Oncology Research 24(1):129 134 DOI 10.1007/s12253-017-0223-5. Siegel RL, Miller KD, Jemal A. 2015. Cancer statistics, 2015. CA: A Cancer Journal for Clinicians 65(1):5 29 DOI 10.3322/caac.21254. Sun C, Yuan Q, Wu D, Meng X, Wang B. 2017. Identification of core genes and outcome in gastric cancer using bioinformatics analysis. Oncotarget 8(41):70271 70280 DOI 10.18632/oncotarget.20082. Sun R, Wu J, Chen Y, Lu M, Zhang S, Lu D, Li Y. 2014. Down regulation of Thrombospondin2 predicts poor prognosis in patients with gastric cancer. Molecular Cancer 13(1):225 DOI 10.1186/1476-4598-13-225. Tan J, Jin X, Wang K. 2017. Integrated bioinformatics analysis of potential biomarkers for prostate cancer. Epub ahead of print 19 December 2017. Pathology & Oncology Research DOI 10.1007/s12253-017-0346-8. Thomas PD. 2017. The gene ontology and the meaning of biological function. Methods in Molecular Biology 1446:15 24 DOI 10.1007/978-1-4939-3743-1_2. Wong N, Wang X. 2015. mirdb: an online resource for microrna target prediction and functional annotations. Nucleic Acids Research 43(D1):D146 D152 DOI 10.1093/nar/gku1104. Xing Z, Chu C, Chen L, Kong X. 2016. The use of Gene Ontology terms and KEGG pathways for analysis and prediction of oncogenes. Biochimica et Biophysica Acta 1860(11):2725 2734 DOI 10.1016/j.bbagen.2016.01.012. Xu Z, Zhou Y, Shi F, Cao Y, Dinh TLA, Wan J, Zhao M. 2017. Investigation of differentiallyexpressed micrornas and genes in cervical cancer using an integrated bioinformatics analysis. Oncology Letters 13(4):2784 2790 DOI 10.3892/ol.2017.5766. Yin L, Chang C, Xu C. 2017. G2/M checkpoint plays a vital role at the early stage of HCC by analysis of key pathways and genes. Oncotarget 8(44):76305 76317 DOI 10.18632/oncotarget.19351. Zeng T, Sun SY, Wang Y, Zhu H, Chen L. 2013. Network biomarkers reveal dysfunctional gene regulations during disease progression. FEBS Journal 280(22):5682 5695 DOI 10.1111/febs.12536. Zhuo C, Li X, Zhuang H, Tian S, Cui H, Jiang R, Liu C, Tao R, Lin X. 2016. Elevated THBS2, COL1A2, and SPP1 expression levels as predictors of gastric cancer prognosis. Cellular Physiology and Biochemistry 40(6):1316 1324 DOI 10.1159/000453184. Cao et al. (2018), PeerJ, DOI 10.7717/peerj.5180 14/14