Journal: Nature Methods

Similar documents
Nature Medicine: doi: /nm.3967

ARTICLE RESEARCH. Macmillan Publishers Limited. All rights reserved

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

Patient characteristics of training and validation set. Patient selection and inclusion overview can be found in Supp Data 9. Training set (103)

Nature Methods: doi: /nmeth.3115

National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2009 Formula Grant

Supplementary Materials for

SUPPLEMENTARY INFORMATION

Cancer Informatics Lecture

Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

Supplementary Tables. Supplementary Figures

Evaluation of public cancer datasets and signatures identifies TP53 mutant signatures with robust prognostic and predictive value

Expanded View Figures

SUPPLEMENTARY APPENDIX

Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties

Identification of Tissue Independent Cancer Driver Genes

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

Nature Getetics: doi: /ng.3471

Integrative analysis of survival-associated gene sets in breast cancer

Supplementary. properties of. network types. randomly sampled. subsets (75%

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD

Supplementary Information

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.

From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Supplement for: CD4 cell dynamics in untreated HIV-1 infection: overall rates, and effects of age, viral load, gender and calendar time.

Nature Neuroscience: doi: /nn Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1. PCA for ancestry in SNV data.

Whole Genome and Transcriptome Analysis of Anaplastic Meningioma. Patrick Tarpey Cancer Genome Project Wellcome Trust Sanger Institute

Nature Genetics: doi: /ng Supplementary Figure 1. Rates of different mutation types in CRC.

An Example of Business Analytics in Healthcare

Reveal Relationships in Categorical Data

Nature Genetics: doi: /ng.2995

PBZ FT01_PBZ FT01_TZ FT01_NZ. interface zone (I) tumor zone (TZ) necrotic zone (NZ)

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Colon cancer subtypes from gene expression data

Surveillance report Published: 17 March 2016 nice.org.uk

Figure S2. Distribution of acgh probes on all ten chromosomes of the RIL M0022

Carcinosarcoma Trial rial in s a in rare malign rare mali ancy

Summary... 2 TRANSLATIONAL RESEARCH Tumour gene expression used to direct clinical decision-making for patients with advanced cancers...

underlying metastasis and recurrence in HNSCC, we analyzed two groups of patients. The

Table of content. -Supplementary methods. -Figure S1. -Figure S2. -Figure S3. -Table legend

Predicting Kidney Cancer Survival from Genomic Data

VL Network Analysis ( ) SS2016 Week 3

Nature Genetics: doi: /ng Supplementary Figure 1. Details of sequencing analysis.

The Cancer Genome Atlas & International Cancer Genome Consortium

A Case-Study of Pancreatic Cancer in Michigan: Challenges in Reconstructing Residential Histories

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

TEB. Id4 p63 DAPI Merge. Id4 CK8 DAPI Merge

SAPLING: A Tool for Gene Network Analysis focusing on Psychiatric Genetics

Supplementary Figure 1: Classification scheme for non-synonymous and nonsense germline MC1R variants. The common variants with previously established

Session 4 Rebecca Poulos

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

Transcriptional Profiles from Paired Normal Samples Offer Complementary Information on Cancer Patient Survival -- Evidence from TCGA Pan-Cancer Data

Gene expression profiling predicts clinical outcome of prostate cancer. Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M.

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first

Theta sequences are essential for internally generated hippocampal firing fields.

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

Expanded View Figures

Figure S1: Effects on haptotaxis are independent of effects on cell velocity A)

Unsupervised Identification of Isotope-Labeled Peptides

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

A 16-Gene Signature Distinguishes Anaplastic Astrocytoma from Glioblastoma

Evaluating Classifiers for Disease Gene Discovery

Statistics and Epidemiology Practice Questions

Agilent GeneSpring/MPP Metadata Analysis Framework

Reporting Checklist for Nature Neuroscience

Evidence Based Medicine

Discontinuation and restarting in patients on statin treatment: prospective open cohort study using a primary care database

Supplementary Online Content

Supplemental Figure legends

Mutational Impact on Diagnostic and Prognostic Evaluation of MDS

A DNA Repair Pathway Focused Score for Prediction of Outcomes in Ovarian Cancer Treated With Platinum-Based Chemotherapy

Quick start guide for using subscale reports produced by Integrity

Supplemental Information. Molecular, Pathological, Radiological, and Immune. Profiling of Non-brainstem Pediatric High-Grade

Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs.

DNA methylation signatures for 2016 WHO classification subtypes of diffuse gliomas

Session 4 Rebecca Poulos

Supplementary Figure 1. ALVAC-protein vaccines and macaque immunization. (A) Maximum likelihood

Name: BIOS 703 MIDTERM EXAMINATIONS (5 marks per question, total = 100 marks)

Linear Regression in SAS

Supplementary Figures

Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 2010

Application of Artificial Neural Network-Based Survival Analysis on Two Breast Cancer Datasets

Creating prognostic systems for cancer patients: A demonstration using breast cancer

Test Category: Prognostic and Predictive. Clinical Scenario

Classification of cancer profiles. ABDBM Ron Shamir

Introduction. Introduction

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Experimental design and workflow utilized to generate the WMG Protein Atlas.

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

COMPUTATIONAL OPTIMISATION OF TARGETED DNA SEQUENCING FOR CANCER DETECTION

Supplementary Methods

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland

National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2009 Formula Grant

Interactive analysis and quality assessment of single-cell copy-number variations

Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE

Sum of Neurally Distinct Stimulus- and Task-Related Components.

Supplementary Figure 1

Inhibition of fatty acid oxidation as a therapy for MYC-overexpressing triplenegative

Transcription:

Journal: Nature Methods Article Title: Network-based stratification of tumor mutations Corresponding Author: Trey Ideker Supplementary Item Supplementary Figure 1 Supplementary Figure 2 Supplementary Figure 3 Supplementary Figure 4 Supplementary Figure 5 Supplementary Figure 6 Supplementary Figure 7 Supplementary Figure 8 Supplementary Figure 9 Supplementary Figure 10 Supplementary Figure 11 Supplementary Figure 12 Supplementary Figure 13 Supplementary Figure 14 Supplementary Figure 15 Supplementary Figure 16 Title or Caption An overview of the somatic mutation landscape of TCGA ovarian cancer cohort. Simulating across different networks Ovarian cancer association with overall survival Lung cancer association with overall survival Uterine cancer association with histological type Standard predictors of survival are independent of ovarian subtype A network view of genes with high network smoothed mutation scores in ovarian cancer, HumanNet, subtype 2 relative to other subtypes A network view of genes with high network smoothed mutation scores in ovarian cancer, HumanNet, subtype 3 relative to other subtypes A network view of genes with high smoothed mutation scores in ovarian cancer, HumanNet, subtype 4 relative to other subtypes. A network view of genes with high smoothed mutation scores in uterine cancer, STRING, subtype 1 relative to other subtypes. A network view of genes with high smoothed mutation scores in uterine cancer, STRING, subtype 2 relative to other subtypes. A network view of genes with high smoothed mutation scores in uterine cancer, STRING, subtype 3 relative to other subtypes. A network view of genes with high smoothed mutation scores in lung cancer, HumanNet, subtype 1 relative to other subtypes. A network view of genes with high smoothed mutation scores in lung cancer, HumanNet, subtype 2 relative to other subtypes. A network view of genes with high smoothed mutation scores in lung cancer, HumanNet, subtype 3 relative to other subtypes. A network view of genes with high smoothed mutation scores in lung cancer, HumanNet, subtype 5 relative to other subtypes. 1

Supplementary Figure 17 Supplementary Figure 18 Supplementary Figure 19 Supplementary Table 1 From mutation-derived subtypes to expression signatures Standard consensus clustering NMF used to recover subtypes in the Tothill et al. expression cohort of ovarian tumors Effects of progressively permuting proportions of the lung cancer dataset Summary of gene interaction networks Source data figure Figure 5, Supp Fig 7-9 Supp Fig. 10-12 Supp Fig 13-16 File name OV_HM90_K4_SAM_diff.xlsx UCEC_ST90_K3_SAM_diff.xlsx LUAD_ST90_K3_SAM_diff.xlsx Supplementary Figure 1 Supplementary Figure 1. An overview of the somatic mutation landscape of TCGA ovarian cancer cohort. (a) Somatic mutations along the length of chromosome 17. (b) A histogram summing the frequency of mutations per gene for the entire exome. (c) A histogram summing the frequency of genes mutated per patient in the cohort. 2

Supplementary Figure 2 Supplementary Figure 2. Simulating across different networks. In this simulation network modules from the NCI- Nature cancer pathways network were used for the simulation and were recovered by NBS using the HumanNet network. Each subtype included between 2-6 driver modules totaling the specified size of genes and the driver gene frequency. Driver frequencies of 10%, 7.5%, 5% and driver modules comprising 100-120, 60-80, 20-40 were used in panels (a),(b) and (c) respectively. Furthermore, a subset (0-4) of the modules was assigned to overlap across multiple subtypes. 3

Supplementary Figure 3 Supplementary Figure 3. Ovarian cancer association with overall survival. (a) Co-clustering matrices for ovarian cancer patients, comparing NBS (HumanNet) to standard consensus clustering. (b-c) Cox proportional hazards model logrank statistic for STRING and PathwayCommons. (d) Hazard ratio of each of the HumanNet subtypes compared to subtype 2 with confidence intervals (0.95, 0.8, 0.6 denoted in blue, yellow and orange respectively). (e) Mean and S.E survival in months for each of the subtypes. (f) A Kaplan-Meier plot of the probability of developing platinum drug resistance for HumanNet with four clusters, Logrank P=0.046. (Subtype 4 is dropped due to missing annotations for PFI for the majority of patients). 4

Supplementary Figure 4 Supplementary Figure 4. Lung cancer association with overall survival. (a) Co-clustering matrices for lung cancer patients, comparing NBS (HumanNet) to standard consensus clustering. (b) Lung cancer patient survival cox proportional hazard model logrank statistic for PathwayCommons. (c) A Kaplan-Meier survival plot with six subtypes. 5

Supplementary Figure 5 Supplementary Figure 5. Uterine cancer association with histological type. (a-c) Association with histological subtype vs. the number of clusters (K). (d-f) Association with tumor grade vs. the number of clusters (K) (g) Summary of histological types for each subtype. (h) Summary of tumor grade vs each subtype. 6

Supplementary Figure 6 Supplementary Figure 6. Standard predictors of survival are independent of ovarian subtype. (a) Percentage of patients receiving an optimal surgical resection (defined as less than 10mm of residual tumor) does not vary significantly between subtypes ( 2 P-value = 0.77). (b) Federation of Gynaecological Oncologists (FIGO) tumor stage does not show evidence for dependence on tumor subtype ( 2 P-value = 0.48). (c) Age at diagnosis does not show dependence on tumor subtype (One-way ANOVA P-value = 0.89). 7

Supplementary Figure 7 Supplementary Figure 7. A network view of genes with high network smoothed mutation scores in ovarian, HumanNet, subtype 2 relative to other subtypes. Node size corresponds to smoothed mutation score. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. 8

Supplementary Figure 8 Supplementary Figure 8. A network view of genes with high network smoothed mutation scores in ovarian, HumanNet, subtype 3 relative to other subtypes. Node size corresponds to smoothed mutation score. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. 9

Supplementary Figure 9 Supplementary Figure 9. A network view of genes with high smoothed mutation scores in ovarian, HumanNet, subtype 4 relative to other subtypes. Node size corresponds to smoothed mutation scores. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. 10

Supplementary Figure 10 Supplementary Figure 10. A network view of genes with high smoothed mutation scores in uterine cancer, STRING, subtype 1 relative to other subtypes. Node size corresponds to smoothed mutation scores. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. Edge thickness corresponds to relative edge confidence in the network, underlined gene names indicate the gene is mutated in this subtype. 11

Supplementary Figure 11 Supplementary Figure 11. A network view of genes with high smoothed mutation scores in uterine cancer, STRING, subtype 2 relative to other subtypes. Node size corresponds to smoothed mutation scores. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. Edge thickness corresponds to relative edge confidence in the network, underlined gene names indicate the gene is mutated in this subtype. 12

Supplementary Figure 12 Supplementary Figure 12. A network view of genes with high smoothed mutation scores in uterine cancer, STRING, subtype 3 relative to other subtypes. Node size corresponds to smoothed mutation scores. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. Edge thickness corresponds to relative edge confidence in the network, underlined gene names indicate the gene is mutated in this subtype. 13

Supplementary Figure 13 Supplementary Figure 13. A network view of genes with high smoothed mutation scores in lung cancer, HumanNet, subtype 1 relative to other subtypes. Node size corresponds to smoothed mutation scores. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. Edge thickness corresponds to relative edge confidence in the network, underlined gene names indicate the gene is mutated in this subtype. 14

Supplementary Figure 14 Supplementary Figure 14. A network view of genes with high smoothed mutation scores in lung cancer, HumanNet, subtype 2 relative to other subtypes. Node size corresponds to smoothed mutation scores. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. Edge thickness corresponds to relative edge confidence in the network, underlined gene names indicate the gene is mutated in this subtype. 15

Supplementary Figure 15 Supplementary Figure 15. A network view of genes with high smoothed mutation scores in lung cancer, HumanNet, subtype 3 relative to other subtypes. Node size corresponds to smoothed mutation scores. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. Edge thickness corresponds to relative edge confidence in the network, underlined gene names indicate the gene is mutated in this subtype. 16

Supplementary Figure 16 Supplementary Figure 16. A network view of genes with high smoothed mutation scores in lung cancer, HumanNet, subtype 5 relative to other subtypes. Node size corresponds to smoothed mutation scores. Node color corresponds to a set of functional classes of interest recovered through manual examination of the resulting network with the aid of the GeneMania Cytoscape plugin. Thickened node outlines indicate genes which are known cancer genes from the Sanger list of cancer genes. Edge thickness corresponds to relative edge confidence in the network, underlined gene names indicate the gene is mutated in this subtype. 17

Supplementary Figure 17 Supplementary Figure 17. From mutation-derived subtypes to expression signatures. (a) A Kaplan-Meier analysis of the proportion of patients who acquire platinum resistance in the Tothill et al. expression cohort for subtypes defined in the TCGA dataset using somatic mutations and NBS. (b) Kaplan-Meier survival plots for the Bonome et al. ovarian cancer patients (c) Kaplan-Meier survival plots for a metastudy of ovarian cancer patients by Győrffy et al.. These subtypes were recovered using a shrunken centroid model trained on the TCGA expression data with somatic mutation NBS subtypes as labels. 18

Supplementary Figure 18 Supplementary Figure 18. Standard consensus clustering NMF used to recover subtypes in the Tothill et al. expression cohort of ovarian tumors. (a) Standard consensus clustering NMF was performed for 1000 rounds with random restarts on the top 4000 most variable genes in the cohort. Average linkage hierarchical clustering was performed on the co-occurrence matrix to recover the following subtypes. Kaplan-Meier plots are shown for three (b), four (c), and five subtypes (d). 19

Supplementary Figure 19 Supplementary Figure 19 Effects of progressively permuting proportions of the lung cancer dataset. Permuting a progressively larger number of mutation uniformly from the entire lung cohort. We report the median likelihood difference of a full model to a base model including just clinical covariates (age, grade, stage, mutation rate, residual tumor after surgery, as well as smoking). The colored regions represent the median absolute deviation (MAD). 20

Supplementary Table 1 Supplementary Table 1. Summary of gene interaction networks. The table shows the networks used as part of our analysis. The HumanNet and STRING networks where filtered to include the top 10% of interactions according to the interaction weights. After filtering all edges were treated as unweighted. Nodes Edges Links and description HumanNet v.1 23 16,243 (7,949) STRING v.9 29 16,560 (12,233) 476,399 (47,641) 1,638,830 (164,034) PathwayCommons 30 14,355 507,757 www.functionalnet.org/humannet A database of gene interactions, derived using a naïve bayes approach by combining multiple lines of experimental evidence. Comprised of both protein-protein interactions (PPIs) and genetic interactions www.string-db.org/ A database integrating a variety of evidence types including, experimental expression and literature mining approaches to derive a globally weighted network of gene interactions. Comprised of multiple types of gene interactions, including: PPIs, genetic and cocitation. www.pathwaycommons.org/pc/ An aggregated repository of gene interactions from several sources including BioGrid, HPRD, IntAct and the NCI set of cancer specific pathways. Comprised of mostly physical PPIs. 21