Integrated Cox's model for predicting survival time of glioblastoma multiforme


Original Article
Tumor Biology, April 2017: 1-8
The Author(s) 2017
DOI: https://doi.org/10.1177/1010428317694574

Zhibing Ai 1*, Longti Li 2*, Rui Fu 3, Jing-Min Lu 4, Jing-Dong He 5 and Sen Li 6

1 Department of Neurology, Taihe Hospital, Hubei University of Medicine, Shiyan, P.R. China
2 Department of Development and Planning, Taihe Hospital, Hubei University of Medicine, Shiyan, P.R. China
3 Department of Neurosurgery, Taihe Hospital, Hubei University of Medicine, Shiyan, P.R. China
4 Department of Neurology, The Affiliated Huai'an Hospital of Xuzhou Medical University and The Second People's Hospital of Huai'an, Huai'an, P.R. China
5 Department of Clinical Oncology, Huai'an First People's Hospital, Nanjing Medical University, Huai'an, P.R. China
6 Department of Spinal Surgery, Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, China
*These authors contributed equally to this work.

Corresponding authors:
Jing-Ming Lu, Department of Neurology, The Affiliated Huai'an Hospital of Xuzhou Medical University and The Second People's Hospital of Huai'an, Huai'an 223002, P.R. China. Email: lujmhy6@sina.com
Sen Li, Department of Spinal Surgery, Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, 182 Chunhui Road, Longmatan District, Luzhou 646000, China. Email: Senlimd@126.comm

Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 3.0 License (http://www.creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Abstract
Glioblastoma multiforme is the most common primary brain tumor and is highly lethal. This study aimed to identify signatures for predicting the survival time of patients with glioblastoma multiforme. Clinical information, messenger RNA expression, microRNA expression, and single-nucleotide polymorphism array data of patients with glioblastoma multiforme were retrieved from The Cancer Genome Atlas. Patients were separated into two groups using 1 year as a cutoff, and a logistic regression model was used to identify variables that could predict whether a patient would live longer than 1 year. Cox's model was then used to find features correlated with survival time. Finally, a Cox model integrating the significant clinical variables, messenger RNA expression, microRNA expression, and single-nucleotide polymorphisms was built. Although the classification method failed, signatures of clinical features, messenger RNA expression levels, and microRNA expression levels were identified using Cox's model. However, no single-nucleotide polymorphisms related to prognosis were found. The selected clinical features were age at initial diagnosis, Karnofsky score, and race, all of which had previously been suggested to correlate with survival time. Both significant microRNAs, microRNA-221 and microRNA-222, target the p27 Kip1 protein, which implies an important role for p27 Kip1 in the prognosis of glioblastoma multiforme patients. Our results suggest that survival modeling is more suitable than classification for identifying prognostic biomarkers for patients with glioblastoma multiforme. An integrated model containing clinical features, messenger RNA levels, and microRNA expression levels was built. It has the potential to be used in clinics and thus to improve the survival of glioblastoma multiforme patients.

Keywords
Glioblastoma multiforme, survival analysis, Cox's model, logistic regression, messenger RNA expression, microRNA expression, single-nucleotide polymorphism

Date received: 24 June 2016; accepted: 23 December 2016

Introduction
Glioblastoma multiforme (GBM) is the most frequent and the most aggressive primary brain tumor. GBM is classified as a Grade IV astrocytoma, the highest grade [1]. It develops primarily in the cerebral hemispheres but can also develop in other parts of the brain, brainstem, or spinal cord. The current standard of care for GBM patients prolongs survival, but the median survival is still only 15 months [2]. That standard of care includes surgical resection, adjuvant radiation therapy, chemotherapy, and the oral alkylating agent temozolomide. This poor prognosis motivates research into ways to improve the survival of GBM patients. One approach is to identify prognostic biomarkers, which can be used to determine the cancer subtype and to guide personalized therapy.

There are three main approaches to studying cancer prognosis: classification, survival modeling, and clustering. Because this article aimed to find biomarkers for the prognosis of GBM, clustering is unsuitable, as it is commonly used to identify cancer subtypes. The other two methods, classification and survival modeling, can both be used to select biomarkers. This study used classification and survival modeling to find prognostic biomarkers for GBM and to build a logistic regression model or a Cox's model with the potential to be applied in practice.

Methods and materials

TCGA GBM dataset
Clinical information, messenger RNA (mRNA) expression, microRNA (miRNA) expression, and single-nucleotide polymorphism (SNP) array data of patients with GBM were retrieved from The Cancer Genome Atlas (TCGA). This dataset contains 596 patients. Level 3 mRNA and miRNA expression data from the Agilent expression arrays were used to identify the mRNAs and miRNAs related to patients' survival. SNP and mutation information is available from the Affymetrix Genome-Wide Human SNP Array 6.0 platform.

Classification
Patients were separated into two groups using 1 year as a cutoff. One group contains patients who lived longer than 1 year, regardless of whether they died after 1 year. The other group contains patients who died within 1 year. As a result, patients whose data were censored before 1 year were removed from this study.
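The 1-year grouping rule, and the exclusion of patients censored before 1 year, can be expressed as a small labelling function. This is a minimal Python sketch, not the authors' code. It assumes that follow-up is recorded in days with 365 days as the cutoff. It counts a patient observed for exactly 365 days as having lived 1 year. It also uses a seeded shuffle for the 8:2 training/testing split described in the next paragraph. The `Patient` record and the function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

CUTOFF_DAYS = 365  # assumption: follow-up time is recorded in days


@dataclass
class Patient:
    pid: str           # hypothetical patient identifier
    time_days: float   # survival time or last follow-up time
    died: bool         # True = death observed, False = censored


def one_year_label(p: Patient, cutoff: float = CUTOFF_DAYS) -> Optional[int]:
    """1 = died within the cutoff, 0 = known alive at the cutoff, None = excluded."""
    if p.time_days >= cutoff:
        return 0  # lived at least 1 year, whether or not death came later
    if p.died:
        return 1  # death observed before 1 year
    return None   # censored before 1 year: status at 1 year is unknown


def label_and_split(patients: List[Patient], train_frac: float = 0.8,
                    seed: int = 0) -> Tuple[list, list, int]:
    """Drop patients that cannot be labelled, then shuffle and split into train/test."""
    labelled = [(p, one_year_label(p)) for p in patients]
    kept = [(p, y) for p, y in labelled if y is not None]
    n_dropped = len(patients) - len(kept)
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_train = round(train_frac * len(kept))
    return kept[:n_train], kept[n_train:], n_dropped


cohort = [
    Patient("a", 500, True),   # lived > 1 year, then died  -> 0
    Patient("b", 200, True),   # died within 1 year         -> 1
    Patient("c", 100, False),  # censored at 100 days       -> excluded
    Patient("d", 400, False),  # censored after 1 year      -> 0
    Patient("e", 30, True),    # died within 1 year         -> 1
]
train, test, n_dropped = label_and_split(cohort)
print(len(train), len(test), n_dropped)  # 3 1 1
```

Applied to the TCGA cohort, the same rule accounts for the drop from 596 to 484 patients, that is, 112 excluded patients (about 18.8%).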
Such filtering left 484 patients. Of these, mRNA expression data were available for 471 and miRNA expression data for 461. The filtered patients were randomly split into training and testing datasets at a ratio of 8:2. Analysis of variance (ANOVA) was used to identify clinical variables that differed between the two groups. The R package limma [3] was used to identify mRNAs and miRNAs that were differentially expressed between the two groups in the training dataset. Two cutoffs, 0.0005 and 0.05, were applied to the raw p value. The different cutoffs were used to control the number of variables entering the classification models. Logistic regression models were built with these mRNA or miRNA data and evaluated on the testing dataset. The performance of the classification models is shown with receiver operating characteristic (ROC) curves and heat maps.

Cox's model
A total of 39 patients without mRNA or miRNA expression data were filtered out, leaving 557 patients for analysis. Among these patients, the Karnofsky score was unavailable for 137. Each clinical variable was used to build a univariate Cox proportional hazards model [4], and patients with unavailable data were excluded. Hazard ratios for the linear coefficients were plotted as quantities of interest using the R package simPH [5]. Each mRNA or miRNA was also fitted to a univariate Cox's model [4]. False discovery rate (FDR) adjustment was carried out on the results of the miRNA expression data, and significant miRNAs were selected with a cutoff of 0.05. A multivariate Cox's model was built on the training dataset with the expression levels of the significant mRNAs or miRNAs. Patients were separated into high-risk and low-risk groups according to their predicted hazard ratio. In this case, the median hazard ratio was used as the cutoff, so patients were split evenly into the high-risk and low-risk groups.
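The FDR adjustment and the median split can each be sketched in a few lines. In the following hedged Python sketch, `benjamini_hochberg` implements the standard Benjamini-Hochberg step-up adjustment, the quantity R's `p.adjust(method = "BH")` returns. We assume this is the FDR method used, since the text does not name one. `median_split` is a hypothetical helper. Because exp() is monotone, splitting at the median linear predictor gives the same groups as splitting at the median hazard ratio. The text does not state whether the testing set reused the training median, so the optional `cutoff` argument supports either choice.

```python
import statistics
from typing import List, Optional


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p value so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def median_split(scores: List[float], cutoff: Optional[float] = None) -> List[str]:
    """Label scores above the cutoff as 'high' risk; the cutoff defaults to the median."""
    if cutoff is None:
        cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # approximately [0.04, 0.0533, 0.0533, 0.5]

train_scores = [0.1, 2.0, -0.5, 1.2]        # hypothetical linear predictors
train_cut = statistics.median(train_scores)  # 0.65
print(median_split(train_scores))                  # ['low', 'high', 'low', 'high']
print(median_split([0.7, 0.3], cutoff=train_cut))  # ['high', 'low']
```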
Kaplan-Meier plots and log-rank tests comparing the high-risk and low-risk groups were produced for both the training and testing datasets.

SNP and mutations
As described above, patients were separated into two groups using an arbitrary 1-year cutoff. After silent mutations were removed, the number of patients with any SNP or mutation in each gene was counted in the two groups. Only genes whose SNPs or mutations appeared in more than 10% of patients were considered. The chi-square test was used to identify genes whose occurrence differed between the two groups. For all patients with SNP data, a matrix was built recording whether each gene carried a SNP in each patient. Cox's model was fitted to study whether the occurrence of a SNP in a gene was associated with different survival times.

Integrated model
A Cox's model was built that integrated the clinical variables, mRNA expression, miRNA expression, and SNPs identified by survival modeling. Stepwise model selection by the Akaike information criterion (AIC) was carried out to avoid over-fitting. The integrated model was built on the training dataset and validated on the testing dataset.
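When two groups are compared, both tests named above reduce to a chi-square statistic with one degree of freedom. The following pure-Python sketch is our illustration, not the authors' R code. It computes the two-group log-rank statistic and a Pearson chi-square for a 2x2 table of gene hit counts. The upper-tail probability of a 1-df chi-square is erfc(sqrt(x/2)). The sketch omits the Yates continuity correction, which R's `chisq.test` applies to 2x2 tables by default, so p values may differ slightly from R output. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def logrank_test(times: List[float], events: List[int],
                 groups: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    events: 1 = death observed, 0 = censored; groups: 0 or 1 (e.g. low/high risk).
    Returns (chi-square statistic, p value).
    """
    death_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:              # still at risk just before time t
                n += 1
                n1 += gi
                if ti == t and ei:   # death at time t
                    d += 1
                    d1 += gi
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = observed_minus_expected ** 2 / variance
    return stat, chi2_1df_pvalue(stat)


def frequent_enough(n_with_hit: int, n_patients: int, min_frac: float = 0.10) -> bool:
    """Keep a gene only if its SNP/mutation appears in more than 10% of patients."""
    return n_with_hit / n_patients > min_frac


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: the two survival groups; columns: patients with / without a hit in the gene.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_1df_pvalue(stat)


# Group 0 dies early and group 1 dies late, so the test should flag a difference.
stat, p = logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1])
print(round(stat, 3), round(p, 4))  # statistic about 5.05, p below 0.05
```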

Results

Clinical parameters related to prognosis
Two methods were used to identify clinical parameters related to prognosis: classification and the Cox model. The p values of each method are summarized in Table 1. No clinical parameter was significant when ANOVA was carried out on the two groups. By contrast, the log-likelihood test of Cox's model found that age at initial diagnosis, race, and Karnofsky score were significantly related to survival time. The relations between the hazard ratio and the numeric variables are shown in Figure 1. The younger the patient is at diagnosis with GBM, or the larger the Karnofsky score, the lower the hazard ratio.

Table 1. p values for the association of clinical variables with survival time.
Variable                    Classification (a)   Cox's model (b)
Gender                      3.56E-01             2.13E-01
Age at initial diagnosis    1.70E-01             7.77E-16
Race                        1.51E-01             2.10E-02
Ethnicity                   1.79E-01             9.09E-01
Karnofsky score (c)         3.74E-01             3.18E-07
(a) One-way analysis of variance (ANOVA) was used to compare whether these variables differed between patients who lived longer than 1 year and patients who died within 1 year.
(b) These p values were obtained using the log-likelihood test of Cox's model.
(c) The Karnofsky score is an index describing a patient's functional impairment; a smaller Karnofsky score means more serious impairment.

The plots in Figure 1 show the 95% (lightest blue) and 50% (light blue) probability intervals of the simulations [5]. The 95% probability interval was relatively stable across Karnofsky scores. For age at initial diagnosis, by contrast, the interval widened dramatically with increasing age. This suggests that the prognosis of patients who are older at initial diagnosis is much more variable than that of younger patients. As a result, it may be more difficult to predict the survival time of older patients and to give them personalized treatment.
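Figure 1 was drawn with simPH, which plots simulated quantities of interest rather than a single point estimate. Under stated assumptions, the idea can be sketched in three steps. First, draw the age coefficient from a normal distribution centred on its estimate. Second, compute the hazard ratio exp(beta x delta), where delta is the distance from a reference value. Third, report quantiles of the simulated values. The coefficient and standard error below are hypothetical placeholders, not estimates from this study, and simPH's own implementation differs in detail. With a linear term, the interval width also depends on delta, because uncertainty in beta is multiplied by delta.

```python
import math
import random
import statistics
from typing import Tuple


def simulate_hr_interval(beta_hat: float, se: float, delta: float,
                         n_sims: int = 1000, level: float = 0.95,
                         seed: int = 1) -> Tuple[float, float, float]:
    """Simulated (lower, median, upper) hazard ratio for a change of `delta` units.

    beta ~ Normal(beta_hat, se); HR = exp(beta * delta) relative to the reference value.
    """
    rng = random.Random(seed)
    sims = sorted(math.exp(rng.gauss(beta_hat, se) * delta) for _ in range(n_sims))
    lo_idx = int(round((1 - level) / 2 * n_sims))
    hi_idx = int(round((1 + level) / 2 * n_sims)) - 1
    return sims[lo_idx], statistics.median(sims), sims[hi_idx]


# Hypothetical coefficient and standard error per year of age (not from the paper).
BETA_AGE, SE_AGE = 0.03, 0.005
for years_above_reference in (10, 20, 40):
    lo, mid, hi = simulate_hr_interval(BETA_AGE, SE_AGE, years_above_reference)
    print(f"+{years_above_reference} y: HR {mid:.2f} (95% interval {lo:.2f}-{hi:.2f})")
```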
Performance of classification
The R package limma [3] was used to calculate p values comparing patients who lived longer than 1 year with those who died within 1 year. The mRNAs with p values less than 5.0E-4 are listed in Table 2. Although the raw p values of these mRNAs seemed significant, all FDR-adjusted p values were larger than 0.05. Thus, among the 17,417 mRNAs available, no mRNA had expression levels that differed significantly between the two groups. The ROC plots (Figure 2) and the heat maps (Supplementary Figures 1 and 2) both confirmed that these mRNAs were poor biomarkers for classification. A logistic regression model was then built on the training dataset using these 14 mRNAs, and stepwise model selection was carried out to find a better model with lower AIC. The feature-selected model contained OPA3, PARP10, E4F1, CHCHD4, COG3, UNC93B1, IFI27, and SELV. The ROC plots of these models on the training and testing datasets are shown in Figure 2. The area under the curve (AUC) on the training dataset was only 0.71, suggesting that it was not a good model. Moreover, the AUC on the testing dataset was only 0.40, which is worse than a random model (AUC 0.5).

Figure 1. Quantities of interest: hazard ratios for age at initial diagnosis (left) and Karnofsky score (right) from Cox's model. The 95% and 50% probability intervals of the simulations are shown in different shades of blue. The x-axis represents the patients' ages at initial diagnosis, and the y-axis represents the hazard ratio.

The same analysis was carried out with miRNA expression levels. The p values and adjusted p values are shown in Table 3. Similarly, some miRNAs had p values less than 0.05, but their FDRs were not significant. The ROC plots (Figure 3) and the heat maps (Supplementary Figures 3 and 4) also suggested that these miRNAs could not classify patients by whether they would live longer than 1 year.

Table 2. The p value and FDR of top mRNAs whose expression levels were related to survival time.
Gene        logFC    p value    FDR
IRF7        0.207    3.41E-05   0.401
COG3        0.161    5.24E-05   0.401
DES         0.351    9.45E-05   0.401
C19orf25    0.197    1.22E-04   0.401
ZNF575      0.240    1.26E-04   0.401
OPA3        0.136    1.38E-04   0.401
E4F1        0.137    1.83E-04   0.416
PKN2        0.189    2.47E-04   0.416
CHCHD4      0.123    2.89E-04   0.416
PARP10      0.230    3.10E-04   0.416
ARTN        0.191    3.49E-04   0.416
UNC93B1     0.224    3.90E-04   0.416
SELV        0.389    4.35E-04   0.416
IFI27       0.379    4.37E-04   0.416
FDR: false discovery rate; mRNA: messenger RNA; FC: fold change.

Performance of Cox's model
A univariate Cox's model was built for each mRNA or miRNA. Their p values and adjusted p values are summarized in Table 4. Each mRNA or miRNA was classified by the sign of its coefficient. With a positive coefficient, a higher expression level meant a larger hazard ratio, so the mRNA or miRNA was a risky biomarker. With a negative coefficient, it was a protective biomarker. A multivariate Cox's model was built on the training dataset with the expression levels of the significant mRNAs. Patients were separated into high-risk and low-risk groups according to their predicted hazard ratio.
In this case, the median hazard ratio was used as the cutoff, so patients were split evenly into the high-risk and low-risk groups; the cutoff can be chosen according to the specific goal. The Kaplan-Meier plots of the high-risk and low-risk groups in both the training and testing datasets are shown in Figure 4. The log-rank test measures whether two or more groups have different survival times. Its p value was 3.0E-11 on the training dataset and 9.7E-03 on the testing dataset. The significant p value on the testing dataset showed that these mRNA signatures were able to separate patients into high-risk and low-risk groups. However, the difference between the p values of the training and testing datasets might imply that the model was over-fitted. A similar model was built using miRNA expression levels (Figure 5). Its p values on the training and testing datasets were 1.6E-3 and 2.9E-3, respectively, suggesting that these two miRNAs were also good biomarkers.

Table 3. The p value and FDR of top miRNAs whose expression levels were related to survival time.
miRNA               logFC    p value    FDR
hsa-miR-769-5p      0.079    5.01E-03   0.978
hsa-miR-548b        0.015    2.16E-02   0.978
hsa-miR-409-5p      0.041    2.35E-02   0.978
hsa-miR-422b        0.121    3.13E-02   0.978
hsa-miR-200c        0.106    3.49E-02   0.978
ebv-miR-BART3-3p    0.010    3.86E-02   0.978
hsa-miR-597         0.010    3.93E-02   0.978
hsa-miR-550         0.073    4.31E-02   0.978
hsa-miR-199a        0.210    4.60E-02   0.978
FDR: false discovery rate; miRNA: microRNA.

Figure 2. The ROC plot of logistic regression using mRNAs with p value less than 0.001 on both training (left) and testing (right) datasets by classification.

Figure 3. The ROC plot of logistic regression using miRNAs with p value less than 0.05 on both training (left) and testing (right) datasets by classification.

Table 4. The p value and FDR of significant mRNAs and miRNAs.
Symbol         p value    FDR        Type
mRNA
RANBP17        1.01E-08   1.75E-04   Protective
KIAA0495       4.02E-07   2.34E-03   Risky
UBE2Z          4.03E-07   2.34E-03   Risky
HPGD           1.15E-06   5.00E-03   Protective
CYB561         2.05E-06   7.16E-03   Risky
CLEC5A         3.60E-06   8.98E-03   Risky
IZUMO1         3.11E-06   8.98E-03   Protective
COL22A1        4.43E-06   9.66E-03   Risky
PDSS1          4.99E-06   9.67E-03   Risky
RAB36          5.68E-06   9.90E-03   Protective
miRNA
hsa-miR-222    1.49E-06   7.95E-04   Risky
hsa-miR-221    2.76E-05   7.37E-03   Risky
FDR: false discovery rate; mRNA: messenger RNA; miRNA: microRNA.

SNP related to prognosis
No gene carrying a SNP was associated with different survival times. Although some genes had p values less than 0.05, their FDRs were still not significant (Table 5). One reason is that, to simplify the question, all SNPs in a gene were treated equally in this study. It therefore cannot be concluded that no SNP is related to the prognosis of GBM patients.

Multivariate Cox's model
Three significant clinical variables, 14 mRNAs, and two miRNAs were used to build the final integrated multivariate Cox's model.
After patients with any unavailable data were removed, data from 304 patients were used to train the model, and data from 92 patients were available for the testing dataset. The final model was as follows:

$$
\log(\text{hazard ratio}) = 0\,[\text{if Asian}] + 0.199\,[\text{if black}] + 0.716\,[\text{if white}] + 0.024\,\text{age\_at\_initial\_diagnosis} - 0.018\,\text{Karnofsky\_score} + 0.168\,\text{COL22A1} - 0.628\,\text{HPGD} + 0.416\,\text{UBE2Z} - 0.184\,\text{miR-221} + 0.303\,\text{miR-222}
$$

The analysis of the hazard ratio is shown in Supplementary Figures 5 and 6. The log-rank test and the Kaplan-Meier plot were applied to both the training and testing datasets (Figure 6).
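A fitted Cox model is applied through its linear predictor, the weighted sum of a patient's covariates. The hazard ratio between two patients is the exponential of the difference between their linear predictors, so any term on which the patients agree cancels. The sketch below illustrates this with two values printed in the final model: the age coefficient (0.024 per year) and the white-race indicator (0.716, with Asian as the reference level). The helper names and the two example patients are hypothetical. The remaining terms are left out because, for two patients who share those values, they would cancel anyway.

```python
import math
from typing import Dict

# Two terms taken from the printed model; other terms are omitted because they
# cancel when the two patients being compared share the same values.
COEFS: Dict[str, float] = {
    "age_at_initial_diagnosis": 0.024,  # per year
    "white": 0.716,                     # indicator; Asian is the reference level
}


def linear_predictor(coefs: Dict[str, float], patient: Dict[str, float]) -> float:
    """Sum of coefficient * covariate value (the log relative hazard)."""
    return sum(beta * patient[name] for name, beta in coefs.items())


def hazard_ratio(coefs: Dict[str, float], a: Dict[str, float],
                 b: Dict[str, float]) -> float:
    """Hazard of patient a relative to patient b: exp(eta_a - eta_b)."""
    return math.exp(linear_predictor(coefs, a) - linear_predictor(coefs, b))


older = {"age_at_initial_diagnosis": 70, "white": 1}    # hypothetical patient
younger = {"age_at_initial_diagnosis": 50, "white": 1}  # hypothetical patient
print(round(hazard_ratio(COEFS, older, younger), 2))     # 1.62
```

Ranking patients by the linear predictor and splitting at its median gives the same high-risk and low-risk groups as ranking by the hazard ratio itself.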

Figure 4. The Kaplan-Meier plot using mRNAs with FDR less than 0.01 on both training (left) and testing (right) datasets by Cox's model.

Figure 5. The Kaplan-Meier plot using miRNAs with FDR less than 0.01 on both training (left) and testing (right) datasets by Cox's model.

Table 5. The p value and FDR of SNPs using classification and Cox's model.
Symbol      p value    FDR
Classification
SYNE1       4.85E-02   1.000
Cox's model
DUX4L18     3.06E-03   0.208
UBR4        1.53E-02   0.387
CCDC151     1.71E-02   0.387
NF1         3.71E-02   0.631
FDR: false discovery rate; SNP: single-nucleotide polymorphism.

Eight signatures selected from the clinical features, mRNA expression levels, and miRNA expression levels performed well on both the training and testing datasets. The separation of the low-risk and high-risk groups suggests that this model might be used in practice.

Discussion
We performed classification and survival modeling to identify biomarkers for the survival time of GBM. Both methods are commonly used to study prognosis. However, this study found that survival modeling was more suitable than classification for GBM, and probably for other cancers as well.

For classification, surprisingly, no significant clinical variables, mRNAs, miRNAs, or SNPs were identified. In contrast, survival modeling found successful biomarkers, which were validated on the testing dataset. This suggests that survival modeling might be a better method than classification for biomarker selection. Classification has two shortcomings. First, classification requires two or more groups, and when patients are separated into groups, some censored data must be removed. In this study, for example, 112 censored cases with follow-up time less than 1 year were excluded, accounting for 18.8% of all available cases. Moreover, because GBM has a poor prognosis, its data contain far fewer censored cases than data for cancers such as prostate cancer or breast cancer. Classification will therefore lose even more cases in studies of those cancers. Second, it is hard to determine cutoffs or criteria for classifying patients. For example, in a study of non-small cell lung cancer, Patnaik et al. [6] separated patients with recurrence from patients without recurrence. In their study, the median recurrence time of patients with recurrence was much shorter than the median follow-up time of the other patients, which is not the case in this study. Another study of non-small cell lung cancer used patients who survived more than 30 months or less than 25 months to find a pool of potential miRNA biomarkers. These were later trained and tested by survival modeling [7]. Sboner et al. [8] used 10 years as the cutoff in a study of prostate cancer, but they failed to improve the prediction of disease progression with mRNA biomarkers. Marko et al. [9] studied the survival of GBM patients by separating them into groups living longer than 24 months or shorter than 9 months, which excluded many patients.
For these two reasons, the classification methods in this study failed to find prognostic biomarkers, although different criteria for separating the patients might succeed. Survival modeling, using Cox's model or other models such as the accelerated failure time model [10], does not have these shortcomings. It succeeded in obtaining potential signatures for predicting GBM prognosis. Cox's model does have its own weaknesses, however, including the difficulty of translating a hazard ratio into actual survival time. This study identified a group of clinical features, mRNAs, and miRNAs as potential prognostic biomarkers for GBM patients, and these performed successfully on the testing dataset. Some of the biomarkers revealed by this study accord with known mechanisms, but few studies have used them as biomarkers. Three clinical variables were significantly related to survival time: age at initial diagnosis, Karnofsky score, and race. Interestingly, the two variables other than race were both used as prognostic factors in the recursive partitioning analysis of GBM [11]. That model separated patients into classes III, IV, and V + VI, defined by age, performance status, extent of resection, and neurologic function [12]. Here, performance status was measured by the Karnofsky score. Race was another significant variable. Surprisingly, white patients had a higher hazard ratio than black patients, and Asian patients had the lowest hazard ratio. Multiple studies of GBM reported no significant difference between white and black patients, but longer survival in Asian patients [13,14]. These earlier studies support the use of the significant features in predicting the prognosis of GBM. Both the selected mRNAs and the selected miRNAs succeeded in predicting prognosis. However, most of these mRNAs have never been reported to be related to any cancer, and none of them is targeted by miR-221 or miR-222.
However, both miR-221 and miR-222 are oncogenic miRNAs.

Figure 6. The Kaplan-Meier plot using the optimal models on both training (left) and testing (right) datasets by multivariate Cox's model.

The biological functions of the mRNA signatures were also examined. Interestingly, the genes encoding some of these mRNAs are related to inflammation. CLEC5A has been reported to regulate inflammatory reactions and to control neuroinflammation through DAP12 [15]. HPGD is the main enzyme of prostaglandin degradation, a process involved in inflammation, and its activity has anti-inflammatory effects. These results may suggest that inflammation is one cause of poor prognosis in patients with GBM. The unrelated biological functions of the remaining mRNAs raise the possibility that these mRNAs reached low p values by chance. When a single-variable Cox's model prediction was carried out, only CLEC5A, IZUMO1, and RANBP17 had significant p values in the testing group. This suggests that the other mRNAs might have been related to survival status in the training group by chance. Possible causes are the high heterogeneity of patients with cancer and the smaller sample size of the testing group. Furthermore, the large difference between the p values of the training group and the testing group (3.0E-11 vs 9.7E-3) implied over-fitting on the training group. In contrast, the two miRNA signatures, miR-221 and miR-222, have been reported to be oncogenic miRNAs. Interestingly, the genes encoding miR-221 and miR-222 occupy adjacent sites on the same chromosome. Moreover, their expression levels appear to be co-regulated, and they also seem to have the same target specificity [16]. For example, both miR-221 and miR-222 are important regulators of p27 Kip1, a tumor suppressor and cell cycle inhibitor. Higher activity and levels of miR-221 and miR-222 correlated with lower levels of p27 Kip1 protein [16]. It has also been reported that miR-221 and miR-222 promote the aggressive growth of GBM by suppressing p27 Kip1 [16,17]. These studies support miR-221 and miR-222 as potential prognostic biomarkers for GBM.
Overall, these eight signatures of clinical features, mRNA levels, and miRNA levels, which had significant p values in both the training and testing groups, need further validation in independent studies.

In conclusion, this article used two approaches, classification and survival modeling, to identify prognostic biomarkers for GBM. Although the classification approach failed, signatures based on clinical features, mRNA expression levels, and miRNA expression levels were identified using Cox's model. A multivariate model integrating all of this information was built on the training dataset and validated on the testing dataset. The model was validated successfully and has the potential to be used clinically to guide personalized therapy for GBM patients.

Acknowledgements
Z.A. and L.L. are co-first authors.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

References
1. Louis DN, Ohgaki H, Wiestler OD, et al. The 2007 WHO classification of tumours of the central nervous system. Acta Neuropathol 2007; 114: 97-109.
2. Parsons DW, Jones S, Zhang X, et al. An integrated genomic analysis of human glioblastoma multiforme. Science 2008; 321: 1807-1812.
3. Ritchie ME, Phipson B, Wu D, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015; 43: e47.
4. Andersen PK and Gill RD. Cox's regression model for counting processes: a large sample study. Ann Stat 1982; 10: 1100-1120.
5. Gandrud C. SimPH: an R package for showing estimates for interactive and nonlinear effects from Cox proportional hazard models. J Stat Softw 2013, https://www.jstatsoft.org/article/view/v065i03
6. Patnaik SK, Kannisto E, Knudsen S, et al. Evaluation of microRNA expression profiles that may predict recurrence of localized stage I non-small cell lung cancer after surgical resection. Cancer Res 2010; 70: 36-45.
7. Hu Z, Chen X, Zhao Y, et al. Serum microRNA signatures identified in a genome-wide serum microRNA expression profiling predict survival of non-small-cell lung cancer. J Clin Oncol 2010; 28: 1721-1726.
8. Sboner A, Demichelis F, Calza S, et al. Molecular sampling of prostate cancer: a dilemma for predicting disease progression. BMC Med Genomics 2010; 3: 8.
9. Marko NF, Toms SA, Barnett GH, et al. Genomic expression patterns distinguish long-term from short-term glioblastoma survivors: a preliminary feasibility study. Genomics 2008; 91: 395-406.
10. Wei LJ. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med 1992; 11: 1871-1879.
11. Lamborn KR, Chang SM and Prados MD. Prognostic factors for survival of patients with glioblastoma: recursive partitioning analysis. Neuro Oncol 2004; 6: 227-235.
12. Shaw EG, Seiferheld W, Scott C, et al. Reexamining the radiation therapy oncology group (RTOG) recursive partitioning analysis (RPA) for glioblastoma multiforme (GBM) patients. Int J Radiat Oncol Biol Phys 2003; 57: S135-S136.
13. Barnholtz-Sloan JS, Maldonado JL, Williams VL, et al. Racial/ethnic differences in survival among elderly patients with a primary glioblastoma. J Neurooncol 2007; 85: 171-180.
14. Thumma SR, Fairbanks RK, Lamoreaux WT, et al. Effect of pretreatment clinical factors on overall survival in glioblastoma multiforme: a Surveillance Epidemiology and End Results (SEER) population analysis. World J Surg Oncol 2012; 10: 75.
15. Chen ST, Liu RS, Wu MF, et al. CLEC5A regulates Japanese encephalitis virus-induced neuroinflammation and lethality. PLoS Pathog 2012; 8: e1002655.
16. le Sage C, Nagel R, Egan DA, et al. Regulation of the p27(kip1) tumor suppressor by miR-221 and miR-222 promotes cancer cell proliferation. EMBO J 2007; 26: 3699-3708.
17. Gillies JK and Lorimer IAJ. Regulation of p27kip1 by miRNA 221/222 in glioblastoma. Cell Cycle 2007; 6: 2005-2009.