The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

Similar documents
SUPPLEMENTARY INFORMATION

User s Manual Version 1.0

Supplementary Figures

Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers

Supplementary. properties of. network types. randomly sampled. subsets (75%

Supplementary Figure 1: LUMP Leukocytes unmethylabon to infer tumor purity

Nature Methods: doi: /nmeth.3115

SUPPLEMENTARY FIGURES: Supplementary Figure 1

Expanded View Figures

Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt, Weizhou Zhang 1-4

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

Evaluation of public cancer datasets and signatures identifies TP53 mutant signatures with robust prognostic and predictive value

Computational Investigation of Homologous Recombination DNA Repair Deficiency in Sporadic Breast Cancer

Integrative analysis of survival-associated gene sets in breast cancer

DiffVar: a new method for detecting differential variability with application to methylation in cancer and aging

Research Article Breast Cancer Prognosis Risk Estimation Using Integrated Gene Expression and Clinical Data

LncRNA TUSC7 affects malignant tumor prognosis by regulating protein ubiquitination: a genome-wide analysis from 10,237 pancancer

Nature Getetics: doi: /ng.3471

Nature Medicine: doi: /nm.3967

Integrative analysis of DNA methylation and gene expression reveals hepatocellular carcinoma-specific diagnostic biomarkers

Integrative DNA methylome analysis of pan-cancer biomarkers in cancer discordant monozygotic twin-pairs

Phenotype prediction based on genome-wide DNA methylation data

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

Elevated RNA Editing Activity Is a Major Contributor to Transcriptomic Diversity in Tumors

The Cancer Genome Atlas Pan-cancer analysis Katherine A. Hoadley

NEpiC: a network-assisted algorithm for epigenetic studies using mean and variance combined signals

OverView Circulating Nucleic Acids (CFNA) in Cancer Patients. Dave S.B. Hoon John Wayne Cancer Institute Santa Monica, CA, USA

Gene expression profiling predicts clinical outcome of prostate cancer. Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M.

SSM signature genes are highly expressed in residual scar tissues after preoperative radiotherapy of rectal cancer.

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Expanded View Figures

Nature Genetics: doi: /ng Supplementary Figure 1. Workflow of CDR3 sequence assembly from RNA-seq data.

Contents. 1.5 GOPredict is robust to changes in study sets... 5

Significance Analysis of Prognostic Signatures

Solving Problems of Clustering and Classification of Cancer Diseases Based on DNA Methylation Data 1,2

SUPPLEMENTARY INFORMATION

LinkedOmics. A Web-based platform for analyzing cancer-associated multi-dimensional data. Manual. First edition 4 April 2017 Updated on July 3, 2017

Gene-microRNA network module analysis for ovarian cancer

Identification of Tissue Independent Cancer Driver Genes

CDH1 truncating alterations were detected in all six plasmacytoid-variant bladder tumors analyzed by whole-exome sequencing.

Patient characteristics of training and validation set. Patient selection and inclusion overview can be found in Supp Data 9. Training set (103)

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Supplemental Information. Molecular, Pathological, Radiological, and Immune. Profiling of Non-brainstem Pediatric High-Grade

Putative effectors for prognosis in lung adenocarcinoma are ethnic and gender specific

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

Distinct cellular functional profiles in pan-cancer expression analysis of cancers with alterations in oncogenes c-myc and n-myc

Impact of Prognostic Factors

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

DNA methylation profiling in the Carolina Breast Cancer Study defines cancer subclasses differing in clinicopathologic characteristics and survival

Session 4 Rebecca Poulos

Novel Biomarkers (Kallikreins) for Prognosis and Therapy Response in Ovarian cancer

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD

SUPPLEMENTARY INFORMATION

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies

SUPPLEMENTARY APPENDIX

RALYL Hypermethylation: A Potential Diagnostic Marker of Esophageal Squamous Cell Carcinoma (ESCC) Junwei Liu, MD

Session 4 Rebecca Poulos

Supplementary Information

Title: Human breast cancer associated fibroblasts exhibit subtype specific gene expression profiles

SubLasso:a feature selection and classification R package with a. fixed feature subset

Cancer Informatics Lecture

Title:Identification of a novel microrna signature associated with intrahepatic cholangiocarcinoma (ICC) patient prognosis

DNA methylation signatures for 2016 WHO classification subtypes of diffuse gliomas

Promoter methylation of DNA damage repair (DDR) genes in human tumor entities: RBBP8/CtIP is almost exclusively methylated in bladder cancer

Geisinger Clinic Annual Progress Report: 2011 Nonformula Grant

Integrated Cox s model for predicting survival time of glioblastoma multiforme

Applying Tissue Phenomics to Colorectal Clinical Questions

Proteomic Biomarker Discovery in Breast Cancer

Supplementary Material

Tissue-specific DNA methylation loss during ageing and carcinogenesis is linked to chromosome structure, replication timing and cell division rates

Big Image-Omics Data Analytics for Clinical Outcome Prediction

TCGA. The Cancer Genome Atlas

Layered-IHC (L-IHC): A novel and robust approach to multiplexed immunohistochemistry So many markers and so little tissue

File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables. File Name: Peer Review File Description:

MicroRNA expression profiling and functional analysis in prostate cancer. Marco Folini s.c. Ricerca Traslazionale DOSL

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

MethylMix An R package for identifying DNA methylation driven genes

Only Estrogen receptor positive is not enough to predict the prognosis of breast cancer

Predicting Kidney Cancer Survival from Genomic Data

The Cancer Genome Atlas

Transcriptional Profiles from Paired Normal Samples Offer Complementary Information on Cancer Patient Survival -- Evidence from TCGA Pan-Cancer Data

CoINcIDE: A framework for discovery of patient subtypes across multiple datasets

Supplementary Figure 1 Overall study design

Original Article CREPT expression correlates with esophageal squamous cell carcinoma histological grade and clinical outcome

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland

S1 Appendix: Figs A G and Table A. b Normal Generalized Fraction 0.075

An annotated list of bivalent chromatin regions in human ES cells: a new tool for cancer epigenetic research

LncMAP: Pan-cancer atlas of long noncoding RNA-mediated transcriptional network perturbations

Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs.

Androgen Receptor Expression in Renal Cell Carcinoma: A New Actionable Target?

microrna PCR System (Exiqon), following the manufacturer s instructions. In brief, 10ng of

Digitizing the Proteomes From Big Tissue Biobanks

SUPPLEMENTARY INFORMATION

Expression and methylation patterns partition luminal-a breast tumors into distinct prognostic subgroups

ncounter Assay Automated Process Immobilize and align reporter for image collecting and barcode counting ncounter Prep Station

Biomarker development in the era of precision medicine. Bei Li, Interdisciplinary Technical Journal Club

EXPression ANalyzer and DisplayER

Comprehensive analyses of tumor immunity: implications for cancer immunotherapy

An integrative pan-cancer-wide analysis of epigenetic enzymes reveals universal patterns of epigenomic deregulation in cancer

Transcription:

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis Tieliu Shi tlshi@bio.ecnu.edu.cn The Center for bioinformatics East China Normal University Aug. 30, 2018 Hayama Compus, Japan

Why Do We Focus on DNA Methylation? DNAm changes in the non-islands regions, such as shores and gene bodies also play important roles in gene expression regulation. Aberrant methylation could be used as biomarker for clinical decisions, such as cancer diagnosis and prognosis. DNA methylation markers can also be used to predict the origin of tumors with metastases. The Cancer Genome Atlas Project (TCGA) now provides unprecedented cancer genomic and methylation data resources for various cancer researches.

Progressive Change in DNA Methylation from Normal Tissue to Breast Cancer Tissue DNA methylation field defects in cancer tissue Progression of field defects in breast cancer. (c) Example of a DNAm profile of a hypervariable and hypermethylated DVMC, showing the progressive change in DNA methylation. N represents normal tissue from cancer-free women, NADJ for age-matched normal samples adjacent to breast cancers.

DNA Methylation Markers Distinguish Prostate Cancers from Benign Adjacent Tissue BMC Cancer, 17(1): 273, 2017

Methylation Markers can be Used for Diagnosis and Prognosis of Common Cancers Fig. Methylation signatures can differentiate different cancer types from corresponding normal tissues. (A) Unsupervised hierarchical clustering and heat map presentation associated with the methylation profile (according to the color scale shown) in different cancer types. (B) ROC curve showing the high sensitivity and specificity in predicting different cancer types. (C) Zoom-in view of the block diagram in B. PNAS, 114:7414-7419, 2017

ctdna Methylation Markers for Diagnosis and Prognosis of Hepatocellular Carcinoma ROC of the diagnostic prediction model with methylation markers in the training (c) and validation data sets (d). Nature materials, 16:1155, 2017 All of the studies confirmed that there are distinguished DNA methylation patterns between cancer tissues and their related normal tissues.

Data Resources and Methods Datasets (TCGA) Gene expression data: Level 3 expression data from TCGA, log2(x+1) transformed. DNA methylation data: (i) The probes mapped to sex chromosomes were removed; (ii) The samples with missing data (i.e. NAs) in more than 30% of the probes were excluded; (iii) The probes with missing data in more than 30% of the samples were discarded; (iv) The rest of the probes with NAs were imputed using the EMimpute algorithm; (v) BMIQ was employed to correct for the type II probe bias. Validation dataset GEO: GSE69914, GSE76938, GSE48684, GSE73549, GSE65820, GSE66836, GSE89852, GSE58999 and GSE38240

Feature Selection & Function Analysis Definition of differentially methylated probes between tumor and normal samples β-difference > 0.2 and a false discovery rate (FDR) corrected P-value (Benjamini/Hochberg) < 0.05 Definition of differentially methylated probes between different cancer types β-difference > 0.3 and FDR < 0.01 Feature selection & Tumor specific multiclass classifier Recursive feature elimination & logistic regression, OneVsRest Classifier Statistical analysis All statistical analyses and visualization were performed with Python3.5.2 on anaconda3-4.0.0. Gene ontology enrichment analysis and pathway enrichment analysis DAVID

Data Resources and Methods for Survival Analysis Cox regression analysis (1) Standard deviation (SD) across all tumor samples of 26 cancers should be > 0.2. (2) FDR (Benjamini/Hochberg method) for every probe was calculated via univariate cox regression in each cancer, the probes with FDR < 0.05 were retained for further filtration. (3) Log-rank test P-value for survival time among tumor samples should be < 0.05. (4) Multivariate cox regression was performed for the left probes, and stepwise regression was conducted, the probes of multivariate cox regression p-value < 0.05 were removed from the feature set in each iteration. (5) The remaining probes were used to fit the prognostic classifier. Python package lifelines and cox s proportional hazard model was implemented in cox regression analysis.

Methylation Profiles of Different Cancers Vary Tremendously Countplot or differential methylated probes in different cancer types. The number of hypermethylated and hypomethylated CpG sites vary greatly in 18 different cancer types. 1. Esophageal carcinoma (ESCA) has the largest count of hypermethylated CpG sites, whereas pheochromocytoma and paraganglioma (PCPG) has the least. 2. Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) has the least count of hypomethylated CpG sites, while liver hepatocellular carcinoma (LIHC) shows the highest number of hypomethylated CpG sites

Methylation Profiles of Different Cancers Vary Tremendously The distribution of differential methylated probes based on Relation to Island vary significantly in different groups. 1. Differential methylated CpG sites (DMCs) located at gene body are far more than that of other regions, and OpenSea holds a large proportion DMCs among all different Relation To Island (OpenSea, S_Shelf, S_Shore, Island, N_Shore and N_Shelf). 2. The number of differential methylated CpG sites located in the gene body regions is the highest among different genomic regions (TSS1500, TSS200, 5'UTR, 1stExon, Body and 3'UTR).

Methylation Profiles of Different Cancers Vary Tremendously Spearman s correlation analysis between the methylation level of CpG sites and the expression of their corresponding genes for each cancer. Indicating that aberrant DNAms in different tumors may have different functions. Boxplot of Spearman s correlation among different cancer types. The hypermethylated CpG sites tends to negatively correlate with the expression of their corresponding genes in almost all different tumor types. The methylation level of most hypomethylated CpG sites are positively correlated with the expression of their corresponding genes in cancers, but some of them are negatively correlated with the expression of their relevant genes in other cancers

Seven Probes Classifier has Good Performance in Distinguishing Tumor and Normal a, ROC curve of training (12 cancer types and 1216 samples) showing the high sensitivity and specificity in predicting different cancer types from corresponding normal tissues. b Learning curve of 5-fold cross validation in training (ROC = 0.979).

Seven Probes Classifier has Good Performance in Distinguishing Tumor and Normal BRCA PRAD COAD c, d, e ROC curve for the validation data set of GSE69914(Breast Cancer), GSE48684(Colorectal Cancer) and GSE76938 (Prostate Cancer). f ROC curve for the independent validation data set of the remaining 9 TCGA cancers not included in the training set. T and N indicate numbers of tumor and normal samples

Seven Probes Classify Those Samples of 12 Cancer Types into Two Distinguished Groups Unsupervised hierarchical clustering and heatmap associated with the methylation profile of the seven probes across all 1216 samples of 12 cancers. Those samples were classified into two distinct classes by the 7 CpG sites Right color bars mark the tissue type and cancer type.

GO Biological Process and KEGG Pathway Enrichment Analysis Results Enrichment analyses for those genes that their expression levels were significantly correlated with 4 probes, highly associated with tumor biogenesis.

Tumor-Specific Classifier with 12 Probes Effectively Distinguishes Different Cancers Unsupervised hierarchical clustering and heatmap showing the methylation profile of the selected 12 probes across 7605 tumor samples of 26 cancer types reals that those 12 probes complement with each other to distinguish different cancer types.

Tumor-Specific Classifier Effectively Distinguishes Different Cancers The micro-average AUC of OneVsRestClassifier based on those 12 selected CpG sites reaches to 0.98 for those 26 different cancers. The results indicate the high sensitivity and specificity of our multiclass tumor specific classifier in predicting different cancers.

Tumor-Specific Classifier Effectively Distinguishes Different Cancers Validation based on GEO datasets of breast cancer (BRCA), colorectal cancer (COAD), prostate cancer (PRAD), ovarian cancer (OV), lung adenocarcinoma (LUAD) and hepatocellular carcinomas (LIHC). micro-average AUC > 0.89 ROC curve in six independent validation dataset of different cancers. T indicates the numbers of tumor samples used in each dataset.

a Tumor-Specific Classifier Effectively Predict the Origin of Tumors with Metastases GSE58999: Breast cancer with metastases to lymph node. GSE73549 and GSE38240: Prostate cancer with metastases to bone or lymph node (26 metastatic samples in total). ROC curve of multiclass tumor specific classifier in metastatic breast cancer and metastatic prostate cancer. T indicates the numbers of tumor samples with metastasis used in each dataset.

These 12 Probes can Effectively Distinguish Metastatic Tumors From Normal Tissues Unsupervised clustering of the DNA methylation levels of the 12 CpGs between normal prostate samples, prostate tumor and prostate tumor with metastases from GSE73549 and GSE38240. Annotations of the column of the heatmap indicate disease states of patients.

Construction of the Probe-based Prognostic Classifier on 28 cancer types Methylation profile of the probes used by the prognostic classifier in patients Z-score distribution of the prognostic classifier and patient survival status

The Performance of Probe-based Prognostic Classifier Construction of the probe-based prognostic classifier. a The pipeline of feature selection for prognostic classifier. b-h The result of Prognostic classifier for 7 cancer types Upper panel: Kaplan-Meier survival analysis for the patients in each of the 7 cancers Middle panel: heatmap showing methylation of the CpGs used by the prognostic classifier in patients.. Lower panel: Z-score distribution of the prognostic classifier and patient survival status. The patients were divided into low-risk and high-risk groups using the median value of the partial hazard as cutoff. p-value were calculated by the log-rank test.

Conclusions 1. Methylation profiles of different cancers vary tremendously, which can be used to distinguish the cancer from normal as well as different cancer types. 2. Results with independent validation datasets of various cancers demonstrate the performance and robustness of our DNAm site-based tumor-normal classifier. 3. Tumor-specific classifier can effectively distinguish different cancers. The DNAm markers for tumor-specific classifier can also be used to predict the origin of tumor with metastases or with unknown primary origin. 4. Prognostic classifier with DNAm pattern successfully divides patients into high-risk or low-risk group in different cancer types. 5. We identified potential diagnostic and prognostic biomarkers based on the methylation changes of DNAm patterns in diverse TCGA cancers, which have the potential application in clinical practices.

Thank You for Your Attention Questions?