SUPPLEMENTARY INFORMATION

Similar documents
Targeted Agent and Profiling Utilization Registry (TAPUR ) Study. February 2018

The Center for PERSONALIZED DIAGNOSTICS

NeoTYPE Cancer Profiles

Jocelyn Chapman, MD Division of Gynecologic Oncology. Julie S. Mak, MS, MSc Genetic Counselor Hereditary Cancer Clinic

NeoTYPE Cancer Profiles

Nature Getetics: doi: /ng.3471

Plasma-Seq conducted with blood from male individuals without cancer.

Out-Patient Billing CPT Codes

Why Pathway/Network Analysis?

Nature Genetics: doi: /ng Supplementary Figure 1. Depths and coverages in whole-exome and targeted deep sequencing data.

Clinically Useful Next Generation Sequencing and Molecular Testing in Gliomas MacLean P. Nasrallah, MD PhD

Supplementary Figure 1. Copy Number Alterations TP53 Mutation Type. C-class TP53 WT. TP53 mut. Nature Genetics: doi: /ng.

Discovery and saturation analysis of cancer genes across 21 tumor types

SUPPLEMENTARY INFORMATION

Clinical Grade Genomic Profiling: The Time Has Come

Patricia Aoun MD, MPH Professor and Vice-Chair for Clinical Affairs Medical Director, Clinical Laboratories Department of Pathology City of Hope

Illumina Trusight Myeloid Panel validation A R FHAN R A FIQ

Precision Oncology: Experience at UW

Use case 3: Analysis of somatic mutations in a tumor of an individual patient

Assessing the Economics of Genomic Medicine. Kenneth Offit, MD MPH Chief, Clinical Genetics Service Memorial Sloan-Kettering Cancer Center

What is the status of the technologies of "precision medicine?

Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes

ARTICLE. Mutational landscape and significance across12majorcancertypes. OPEN doi: /nature12634

August 17, Dear Valued Client:

Paths to Precision Medicine

Next Generation Sequencing in Haematological Malignancy: A European Perspective. Wolfgang Kern, Munich Leukemia Laboratory

SureSelect Cancer All-In-One Custom and Catalog NGS Assays

Jennifer Hauenstein Oncology Cytogenetics Emory University Hospital Atlanta, GA

Sample Metrics. Allele Frequency (%) Read Depth Ploidy. Gene CDS Effect Protein Effect. LN Metastasis Tumor Purity Computational Pathology 80% 60%

Provide your cancer patients personalized treatment options with ClariFind

APPLICATIONS OF NEXT GENERATION SEQUENCING IN SOLID TUMORS - PATHOLOGIST PROSPECTIVE

Fluxion Biosciences and Swift Biosciences Somatic variant detection from liquid biopsy samples using targeted NGS

Cancer develops as a result of the accumulation of somatic

Table S1. Demographics of patients and tumor characteristics.

Changing the Culture of Cancer Care II. Eric Holland Fred Hutchinson Cancer Research Center University of Washington Seattle

ACTIVITY 2: EXAMINING CANCER PATIENT DATA

Illumina s Cancer Research Portfolio and Dedicated Workflows

Molecular. Oncology & Pathology. Diagnostic, Prognostic, Therapeutic, and Predisposition Tests in Precision Medicine. Liquid Biopsy.

Genomic Medicine: What every pathologist needs to know

Click to edit Master /tle style

Identification and clinical detection of genetic alterations of pre-neoplastic lesions Time for the PML ome? David Sidransky MD Johns Hopkins

SUPPLEMENTARY INFORMATION

The International Association for the Study of Lung Cancer (IASLC) Lung Cancer Staging Project, Data Elements

# 1 India s Best Selling Cancer Genomics Test Precision treatment for every Cancer Patient Patient Guide

The Cancer Genome Atlas Pan-cancer analysis Katherine A. Hoadley

Oncology Update: The 10 Most Talked About Breast Cancer Topics of The Top 10 10/24/2013. The Awareness Debate

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

Accel-Amplicon Panels

Ahrim Youn 1,2, Kyung In Kim 2, Raul Rabadan 3,4, Benjamin Tycko 5, Yufeng Shen 3,4,6 and Shuang Wang 1*

Next Generation Sequencing in Clinical Practice: Impact on Therapeutic Decision Making

Comprehensive Analyses of Circulating Cell- Free Tumor DNA

Secuenciación masiva: papel en la toma de decisiones

Frequency(%) KRAS G12 KRAS G13 KRAS A146 KRAS Q61 KRAS K117N PIK3CA H1047 PIK3CA E545 PIK3CA E542K PIK3CA Q546. EGFR exon19 NFS-indel EGFR L858R

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

The preclinical efficacy of a novel telomerase inhibitor, imetelstat, in AML: A randomized trial in patient-derived xenografts

Clinical Grade Biomarkers in the Genomic Era Observations & Challenges

IMPLEMENTING NEXT GENERATION SEQUENCING IN A PATHOLOGY LABORATORY

Diagnostica Molecolare!

A n a ly s i s. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity

Cumulative Haploinsufficiency and Triplosensitivity Drive Aneuploidy Patterns and Shape the Cancer Genome

IntelliGENSM. Integrated Oncology is making next generation sequencing faster and more accessible to the oncology community.

Identifying cancer type speci c oncogenes and tumor suppressors using limited size data

Expanded View Figures

SUPPLEMENTARY INFORMATION

Examining Genetics and Genomics of Acute Myeloid Leukemia in 2017

Supplementary Information

Exons from 137 genes were targeted for hybrid selection and massively parallel sequencing.

Hematology Fusion/Expression Profile

Clinical implication of MEN1 mutation in surgically resected thymic carcinoid patients

Next generation histopathological diagnosis for precision medicine in solid cancers

Supplementary Figure 1. Cytoscape bioinformatics toolset was used to create the network of protein-protein interactions between the product of each

Blastic Plasmacytoid Dendritic Cell Neoplasm with DNMT3A and TET2 mutations (SH )

Enabling Personalized

Clustered mutations of oncogenes and tumor suppressors.

Predictive biomarker profiling of > 1,900 sarcomas: Identification of potential novel treatment modalities

Dr David Guttery Senior PDRA Dept. of Cancer Studies and CRUK Leicester Centre University of Leicester

Enhancing Assessment of Myeloid Leukemia in the Era of Precision Medicine

SUPPLEMENTARY INFORMATION

SALSA MLPA Probemix P014-B1 Chromosome 8 Lot B and B

I have no conflicts of interest

Enhancing Assessment of Myeloid Leukemia in the Era of Precision Medicine

devseek (Sequence(Analysis(Panel(for(Neurodevelopmental(Disorders

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

Supplemental Figure legends

Personalised cancer care Information for Medical Specialists. A new way to unlock treatment options for your patients

Supplemental Information. Integrated Genomic Analysis of the Ubiquitin. Pathway across Cancer Types

patient guide CancerNext-Expanded genetic testing for hereditary cancer Because knowing your risk can mean early detection and prevention

Hepatocellular Carcinoma (HCC): What Treatment Modality for Which Tumor?

Whole Genome and Transcriptome Analysis of Anaplastic Meningioma. Patrick Tarpey Cancer Genome Project Wellcome Trust Sanger Institute

COSMIC - Catalogue of Somatic Mutations in Cancer

Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies

Targeted Molecular Diagnostics for Targeted Therapies in Hematological Disorders

Supplementary Tables. Supplementary Figures

QIAGEN (Suzhou) Translational Medicine Co., Ltd. Bringing Biomarkers and CDx to Precision Medicine

Insights from Sequencing the Melanoma Exome

TEST MENU TEST CPT CODES TAT. Chromosome Analysis Bone Marrow x 2, 88264, x 3, Days

SALSA MLPA probemix P175-A3 Tumour Gain Lot A3-0714: As compared to the previous version A2 (lot A2-0411), nine probes have a small change in length.

Supplemental Information. Therapy-Related Clonal Hematopoiesis in Patients. with Non-hematologic Cancers Is Common

Transcription:

SUPPLEMENTARY INFORMATION doi:1.138/nature12912 Figure S1. Distribution of mutation rates and spectra across 4,742 tumor-normal (TN) pairs from 21 tumor types, as in Figure 1 from Lawrence et al. Nature (213). Each dot corresponds to a tumor-normal pair, with vertical position indicating the total frequency of somatic mutations in the exome. Tumor types are ordered by their median somatic mutation frequency, with the lowest frequencies (left) found in hematological and pediatric tumors, and the highest (right) in tumors induced by carcinogens such as tobacco smoke and UV light. Mutation frequencies vary more than 1-fold between lowest and highest mutation rates across cancer and also within several tumor types. The lower panel shows the relative proportions of the six different possible base-pair substitutions, as indicated in the legend on the left. WWW.NATURE.COM/NATURE 1

RESEARCH SUPPLEMENTARY INFORMATION Figure S2. Radial plot representing mutational processes across 4,742 tumor-normal (TN) pairs from 21 tumor types, as in Figure 2 from Lawrence et al. Nature (213). The angular space is compartmentalized into the fifteen different factors discovered by NMF. The distance from the center represents the total mutation frequency. Different tumor types segregate into different compartments based on their mutation spectra. Notable examples are: lung adenocarcinoma and lung squamous carcinoma (red; ~1 o clock position), melanoma (black; ~4 o clock position), esophageal (light green), colorectal (dark green), and endometrial cancer (purple) (~8-9 o clock position), samples harboring mutations of the HPV or APOBEC signature (bladder, and some lung and head and neck samples, marked in yellow, red, and blue respectively; ~7 o clock position), and AML samples showing a Tp*A->T signature, ~11 o clock position. 2 WWW.NATURE.COM/NATURE

SUPPLEMENTARY INFORMATION RESEARCH a b MutSigCV Burden of protein-altering mutations (compared to background) nonsilent silent MutSigCL Clustering of mutations in hotspots MutSigFN Functional impact of the mutations low med high high evolutionary conservation Figure S3. a. Three statistical tests were used to detect cancer genes: mutation burden test (calculated by MutSigCV); mutation clustering (calculated by MutSigC); and enrichment in functional sites (calculated by MutSigFN). b. Q-Q plot of the p-values obtained after combining the three tests, when applied on the combined set of 4,742 TN pairs. WWW.NATURE.COM/NATURE 3

RESEARCH SUPPLEMENTARY INFORMATION Figure S4. Mutation patterns for six known and novel cancer genes. Similar diagrams for all genes are available at http://www.tumorportal.org. a. EGFR shows distinctive tumor-type-specific concentrations of mutations at different regions of the gene. b. APC shows mutation throughout the gene in many tumor types, but colorectal cancer shows a distinctive concentration of truncating mutations in the N-terminal half of the protein. c. RHEB, which encodes a small GTPase in the RAS superfamily, shows a mutational hotspot in the effector domain. d. SMARCB1, which is significant based on its mutations in rhabdoid tumor and esophageal adenocarcinoma, also has truncating mutations in several other tumor types. e. RHOA, analogously to RHEB, encodes a small GTPase and shows a mutational hotspot in the effector domain. f. EZH1, which encodes a histone methyltransferase, does not achieve significance in any one tumor type, but does so when analyzing the cohort as a whole. 4 WWW.NATURE.COM/NATURE

SUPPLEMENTARY INFORMATION RESEARCH Acute myeloid leukemia n=196 FLT3 DNMT3A NPM1 IDH2 IDH1 TET2 NRAS RUNX1 WT1 U2AF1 KRAS PTPN11 KIT SMC3 STAG2 PHF6 RAD21 CEBPA ASXL1 SFRS2 SMC1A PAPD5 EZH2 PDSS2 MXRA5 KDM6A GATA2 frequency >1% 5 1% 3 5% 2 3% 1 2% <1% Bladder n=99 Breast n=892 KDM6A GATA3 MAP3K1 PTEN AKT1 CTCF CBFB MLL3 MAP2K4 RUNX1 CDH1 RB1 SF3B1 MLL2 CDKN1A ERCC2 STAG2 PIK3R1 RXRA TBC1D12 NFE2L2 NCOR1 KRAS SPEN RB1 MLL C3orf7 ERBB3 ELF3 FBXW7 FGFR3 FOXQ1 ERBB2 CREBBP HRAS SNX25 TSC1 MGA EZR TBL1XR1 CDKN1B HIST1H3B FOXA1 CASP8 MED23 TBX3 CUL4B STAG2 MYB RAB4A ASXL2 EP3 FGFR2 SETDB1 Carcinoid n=54 Chronic lymphocytic leukemia n=159 Colorectal n=233 Diffuse large B cell lymphoma n=58 Endometrial n=248 Esophageal adenocarcinoma n=141 Glioblastoma multiforme n=291 SF3B1 MYD88 APC FBXW7 SMAD4 NRAS PTEN PIK3R1 KRAS FBXW7 CTCF EGFR PIK3R1 IDH1 RB1 PTEN NF1 SMAD2 MYD88 TNFRSF14 ARHGAP35 TCF7L2 CD79B BRAF MLL2 KRAS EZH2 CARD11 CREBBP XPO1 TNF PCBP1 CD7 CDKN1B HIST1H1E RPS15 RPS2 TGFBR2 RB1 BP1 ACVR1B ERBB3 CASP8 ELF3 TRIM23 CDC27 B2M NTN4 AXIN2 SIRT4 GOT1 POU2F2 ITPKB HLA A BRAF GNA13 POU2AF1 NRAS DDX3X SPEN ATM CHD2 MED12 POT1 IDH1 RBM1 BCLAF1 BCOR MSH6 MAP2K4 IDH2 FAM123B HIST1H3B KRAS B2M STAT6 BCL6 ZFHX3 FGFR2 CCND1 PPP2R1A ING1 ZNF471 ERBB3 CTNNB1 BCOR CUX1 SPOP NRAS FAT1 ARID5B CCDC6 ZNF62 AKT1 SLC1A3 EIF2S2 TCP11L2 SGK1 DIAPH1 KLHL8 SOX17 GNPTAB SLC44A3 ADNP POLE TTLL9 SELP CHD4 PPM1D RSBN1L MICALCL SACS ANK3 TBL1XR1 SOS1 DNER MORC4 MYCN TXNDC8 INPPL1 ZRANB3 TAP1 EP3 TPX2 CAP2 ALKBH6 CDKN2A STAG2 RPL5 FLG SLC26A3 BRAF SMARCA4 SMARCB1 IL7R COL5A1 MAP3K1 KEL CD1D CHD8 DDX5 MUC17 QKI AZGP1 SETD2 NUP21L CASP8 SMAD4 KRAS ERBB4 FAM123B next 2 genes Head and neck n=384 CDKN2A CASP8 FAT1 NFE2L2 NOTCH1 MLL2 NSD1 HRAS EPHA2 AJUBA RAC1 ZNF75 RHOA TGFBR2 PTEN HLA A EP3 B2M OTUD7A HLA B CTCF IPO7 RASA1 MAP4K3 RB1 LCP1 MAPK1 KDM6A SMAD4 IRF6 Kidney clear cell n=417 SETD2 BAP1 VHL PBRM1 PTEN KDM5C MTOR GUSB RHEB ATM TCEB1 MPO CCDC12 PTCH1 Lung adenocarcinoma n=45 Lung squamous cell carcinoma n=178 KEAP1 CDKN2A STK11 CDKN2A KRAS U2AF1 SMARCA4 EGFR MLL2 KEAP1 MET NF1 RIT1 BRAF NFE2L2 RBM1 RB1 ERBB2 ATM SLC4A5 NBPF1 STX2 MAP2K1 RB1 HRAS HLA A FBXW7 FAT1 APC MBD1 MLL3 NRAS PTEN EP3 FUBP1 EGFR ARHGAP35 Medulloblastoma n=92 DDX3X CTNNB1 WAS QKI MLL2 SUFU Melanoma n=118 BRAF NRAS CDKN2A BCLAF1 RAC1 PTEN PPP6C FRMD7 CDK4 STK19 ACO1 XIRP2 LCTL OR52N1 ALPK2 WASF3 OR4A16 MYOCD CTNNB1 Multiple myeloma n=27 KRAS NRAS DIS3 BRAF FAM46C INTS12 TRAF3 PRDM1 IRF4 IDH1 FGFR3 PTPN11 Neuroblastoma n=81 ALK PTPN11 FGFR3 Ovarian n=316 BRCA1 RB1 CDK12 NF1 SMARCB1 BRCA2 KRAS Prostate n=138 SPOP ATM MED12 FOXA1 Rhabdoid tumor n=35 SMARCB1 next 2 genes gene p value 1 15 1 14 1 13 1 12 1 11 1 1 1 9 1 8 1 7 1 6 1 5 1 4 1 3.1.1 1 Figure S5. Cancer genes in 21 tumor types. For each tumor type, genes are arrayed on the horizontal line according to p-value (combined value for three tests in MutSig). Yellow region contains genes that achieve FDR q.1. Orange interval contains p- WWW.NATURE.COM/NATURE 5

RESEARCH SUPPLEMENTARY INFORMATION values for the next 2 genes. The color of the gene s name indicates whether the gene is a known cancer gene (blue), a novel gene with clear connection to cancer (red; discussed in text), or an additional novel gene (black). The color of the circle associated with each gene name indicates the frequency (percent of patients carrying non-silent somatic mutations) in the corresponding tumor type. Total number of samples analyzed (n) is indicated beneath the name of each tumor type. 6 WWW.NATURE.COM/NATURE

SUPPLEMENTARY INFORMATION RESEARCH a b Figure S6. a. Heatmap showing the similarity between each pair of tumor types. Similarity scores were defined for each pair of tumor types as the number of genes significantly mutated in both tumor types, divided by the number of genes significantly mutated in either tumor type. Heatmap shows the color scale from blue (similarity=) to red (similarity=1). b. Dendrogram based on pairwise distances between tumor types, equal to 1 minus their similarity score. WWW.NATURE.COM/NATURE 7

RESEARCH SUPPLEMENTARY INFORMATION 3 2 1 BLCA 5 8 6 4 2 5 1 2 1 ESO LUAD 2 4 3 2 1 BRCA 5 2 1 1 2 15 1 5 GBM LUSC 1 8 6 4 2 CLL 1 2 1 HNSC 2 2 1 MEL 5 1 2 1 CRC 1 2 2 1 KIRC 2 4 1 5 MM 1 2 2 1 DLBCL 5 2 1 LAML 1 6 4 2 UCEC 1 2 Figure S7. Down-sampling analysis shows that candidate cancer gene discovery is continuing to accumulate within each tumor type. Each point represents a random subsampling of the patients. X-axis indicates the number of patients in the sample; Y- axis indicates the number of significant genes in the sample. Blue line is a smoothed fit. 8 WWW.NATURE.COM/NATURE

SUPPLEMENTARY INFORMATION RESEARCH 2 15 1 5 5 1 15 2 # tumor types 18 16 14 12 1 8 6 4 2 1 2 3 4 Figure S8. Down-sampling of combined cohort, with FDR correction applied by subtracting the expected number of false positives at each downsampling point. WWW.NATURE.COM/NATURE 9

RESEARCH SUPPLEMENTARY INFORMATION Number of TN pairs needed for power of 9% in 5% and 9% of genes 2, 1, 5 3 2 1 5 3 2 1 5 1% 2% 3% 5% 1%.1.2.3.5 1 2 3 5 1 2 Somatic mutation frequency (/Mb) Rhabdoid tumor Medulloblastoma Acute myeloid leukemia Carcinoid Neuroblastoma Chronic lymphocytic leukemia Prostate Breast Multiple myeloma Ovarian Kidney clear cell Glioblastoma multiforme Endometrial Colorectal Diffuse large B cell lymphoma Head and neck Esophageal adenocarcinoma Bladder Lung adenocarcinoma Lung squamous cell carcinoma Melanoma Figure S9. Number of tumor-normal pairs required to achieve 9% power for 9% of genes (solid) and 9% power for the median gene (dashed). Other details of figure are exactly as described in Figure 5 of the main text. 1 WWW.NATURE.COM/NATURE

SUPPLEMENTARY INFORMATION RESEARCH Supplementary Tables Table S1. List of source datasets analyzed in this work, and references to the corresponding publications. Table S2. The 26 significantly mutated cancer genes found by analysis with the MutSig suite. The first set of columns (A-B) list the gene symbol and its full name. The second set of columns (D-I) indicate the membership of the genes in various sets. Column D contains a 1 if the gene is listed in the Cancer Gene Census (v65) as being recurrently somatically mutated. Column E contains a 1 if the gene was not previously reported to be somatically mutated in cancer at a significant rate, either in the CGC or in a recent publication (Supplementary Table 3). Columns F and G report whether the gene is a member of the Cancer5 and Cancer5-S sets, respectively. Column H contains a "1" for those thirty genes that were identified as significantly mutated only when analyzing the combined cohort of 4742 samples. Column I contains a "1" for the 33 novel genes with clear and consistent connections to cancer hallmarks that were discussed in the text. The remaining sets of columns diverge in the four tabs of the spreadsheet. The first tab ("Individual q-values") reports the q-values from analyzing each tumor type separately and correcting for only the 18,388 genes this corresponds to the Cancer5 set. The second tab ("q-values when testing 4K hyp.") reports the q-values after the more stringent correction for 18,388 genes x 22 analyses this corresponds to the Cancer5-S set. The third tab ("q-value from RHT") reports the q- values obtained when performing the Restricted Hypothesis Testing (RHT) as described in the text. The fourth tab ("Frac. of non-silent mutations") lists the number (and percentage) of patients of each tumor type carrying a non-silent mutation in the gene. The fifth and last tab ( P-values ) lists the raw p-values obtained from each statistical test (MutSigCV, MutSigCL, MutSigFN), as well as each pairwise combination,, and the final overall (all three tests) combined p-value, for each gene. Table S3. List of the 21 tumor types studied, and the significantly mutated genes found by the MutSig suite in each tumor type. Column D lists the genes that were found to be significant when directly analyzing the full list of 18,388 genes. Column E lists the additional genes found to be significant when performing the Restricted Hypothesis Testing (RHT) as described in the text. In parentheses after each gene name are two numbers: the number and percentage of patients carrying a non-silent mutation in the gene. Table S4. List of references reporting the identification of candidate cancer genes. This table lists the genes that are not yet listed in the Cancer Gene Census (CGC) but have been reported in previous publications. WWW.NATURE.COM/NATURE 11

RESEARCH SUPPLEMENTARY INFORMATION Table S5. List of references to biological literature supporting the 33 novel candidate cancer genes with clear and compelling connections to cancer biology. Table S6. Summary of the analysis comparing the performance of each of the three MutSig metrics separately, in pairwise combinations, and all three combined as in the main analysis. The table lists the number of candidate cancer genes in the Cancer 5 list and the Cancer 5-S list, as well as the number of genes identified from the list of CGC genes that contain somatic mutations in tumor types we studied, and the number of genes identified from the list of 33 novel candidate cancer genes with compelling connections to cancer biology. 12 WWW.NATURE.COM/NATURE