Nature Genetics: doi: /ng.3471

Supplementary Figure 1. Summary of exome sequencing data. (a) Exome tumor–normal sample sizes for bladder cancer (BLCA), breast cancer (BRCA), carcinoid (CARC), chronic lymphocytic leukemia (CLLX), colorectal cancer (COLR), diffuse large B cell lymphoma (DLBC), esophageal adenocarcinoma (ESOP), glioblastoma multiforme (GLBM), head and neck cancer (HNSC), kidney clear cell carcinoma (KIRC), acute myeloid leukemia (LAML), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), medulloblastoma (MEDU), melanoma (MELA), multiple myeloma (MUMY), neuroblastoma (NEUB), ovarian cancer (OVAR), prostate cancer (PRAD), rhabdoid tumor (RHAB) and uterine corpus endometrial carcinoma (UCEC). (b) Reference coordinates for mutation impact annotation (SnpEff; ref. 29). CDS, coding sequence.

Supplementary Figure 2. Background mutation models capture variance in somatic mutation rates and are well correlated. (a) Genome-wide transition/transversion mutation probabilities per tumor type. (b) Absolute difference in the log probabilities of complementary mutations (C>T and G>A) per gene in melanoma for the Bayesian and 'Exonic' mutation probability models. The percentage of genes where complementary mutation probabilities are within one order of magnitude is indicated. (c) The median of Spearman correlations between the average Bayesian and 'Matched' mutation probabilities in distinct tumor types is shown for the sets of tumor types with minimum numbers of samples (x axis). (d) Correlation between observed WGS intronic mutation probability (pan-cancer) and those of the Bayesian (blue) or 'Matched' (gray) models.

Supplementary Figure 3. Density scores are highly correlated and enriched for known cancer driver genes. (a) Right, the pan-cancer relationship between gene-specific and global binomial probabilities is shown. Left, correlation (Spearman ρ) is plotted as a function of density score in the low- to mid-density range. (b) Somatically altered SNV-driven cancer gene (SCG) fold enrichment (red) and significance of enrichment (blue) of region-associated genes as a function of region density score. (c) Fraction of SCGs that are region-associated (blue) and fraction of region-associated genes that are SCGs (red) as a function of region density score.

Supplementary Figure 4. Most mutation cluster density scores fit the null distribution and lie on the diagonal in a quantile–quantile plot, indicating that simulations accurately capture the significance of mutation densities. Quantile–quantile plots of the observed (y axis) and simulated (x axis) density scores (log10, P Density). (a–d) Representative examples from bladder cancer (BLCA) (a), breast cancer (BRCA) (b), colorectal cancer (COLR) (c) and diffuse large B cell lymphoma (DLBC) (d) are shown. The solid line represents the threshold for density score (log10, P Density) that guarantees FDR ≤ 5% in each cancer type. The dashed line indicates the line corresponding to y = x. (e) Violin plots of density scores in an expanded set of 90 additional colorectal cancer simulations. (f) The distributions of density scores in the original (10; blue) and expanded (90; yellow) sets of simulations are highly concordant and yield tightly correlated FDR estimates for the observed density scores (inset, r² = 0.99985). Dashed lines indicate thresholds of FDR ≤ 5%. (g) 99.2% (128/129) of SMRs thresholded by FDR (≤5%) are shared by the FDR10- and FDR90-thresholded sets.
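The legend above refers to a density-score threshold derived from simulations that controls the FDR. The exact procedure is not given in this legend, so the following is only a minimal sketch under stated assumptions: the FDR at a threshold is estimated as the mean number of simulated scores at or above that threshold per simulation, divided by the number of observed scores at or above it. The function names `empirical_fdr` and `fdr_threshold` and the toy scores are hypothetical.

```python
def empirical_fdr(observed, simulations, threshold):
    """Estimate the FDR at `threshold` (assumed estimator, not the authors' code).

    observed: list of observed density scores.
    simulations: list of lists, one list of null scores per simulation.
    """
    n_observed = sum(score >= threshold for score in observed)
    if n_observed == 0:
        return 0.0
    # Expected number of false discoveries: average count per simulation.
    expected_false = sum(
        sum(score >= threshold for score in sim) for sim in simulations
    ) / len(simulations)
    return min(1.0, expected_false / n_observed)


def fdr_threshold(observed, simulations, alpha=0.05):
    """Return the smallest observed score whose estimated FDR is <= alpha."""
    for candidate in sorted(set(observed)):
        if empirical_fdr(observed, simulations, candidate) <= alpha:
            return candidate
    return None  # no threshold reaches the requested FDR


# Toy data (hypothetical): three high observed scores stand out from the null.
observed_scores = [1, 2, 3, 8, 9, 10]
null_simulations = [[1, 2, 3], [1, 2, 2.5]]
threshold = fdr_threshold(observed_scores, null_simulations)
```

With more simulations the estimate of the expected false discoveries becomes smoother, which is consistent with the legend's comparison of thresholds from the original and expanded simulation sets.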

Supplementary Figure 5. Robust SMRs capture ~95% of high-confidence SMRs from ten cancer types. Robust SMRs are 58.8-fold enriched for somatic, SNV-driven Cancer Gene Census (CGC) genes (P = 2.4 × 10^-34). (a) Overlap (blue) of robust SMRs (cyan) and high-confidence SMRs (gray). (b, c) Fraction of SMRs per cancer type classified as robust. Analyses in a and b are limited to high-confidence SMRs from the ten cancer types (green) with sufficient intronic mutation clusters for intron-based FDR estimation, as shown in b.

Supplementary Figure 6. Contribution of trinucleotide and APOBEC mutation heterogeneity in SMR identification. (a) The fraction (ƒ) of mutated sites in endometrial cancer (UCEC) is plotted for each trinucleotide. Trinucleotides are oriented by transcription strand. Trinucleotides associated with APOBEC mutation signatures at high and low rates are labeled orange and pink, respectively. Notably, ƒ_TCT > ƒ_TCA and ƒ_AGA > ƒ_AGT. As shown in the inset (i), SMR mutation sites show a generally reduced fraction of APOBEC-associated trinucleotides as compared to the global set of somatic mutation sites in endometrial cancer. (b) As shown for endometrial cancer (i), the deviation in the observed over the (single-nucleotide) expected trinucleotide representation was compared with the fold change in the trinucleotide representation in SMR mutation sites for cancers with ≥250 SMR mutation sites (positions). These cancer types encompass 79% of all SMRs. On average, trinucleotide mutation heterogeneity not captured by single-nucleotide transition/transversion probabilities contributes to only 7.9% of the change in trinucleotide representation in SMRs. (a, b) Analyses performed with high- and medium-confidence SMRs. (c) Histogram of the fraction of mutations that are APOBEC-associated per SMR. (d) Fraction of SMRs in which APOBEC-associated mutations are statistically increased (P < 0.05, Holm–Bonferroni) per cancer. As shown in the inset (i), 4.0% of identified SMRs (n = 872) are driven by APOBEC-associated mutations. Raw (uncorrected) P values would indicate that 12% of SMRs have higher than expected APOBEC mutation signatures.
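The gap between the corrected and raw (uncorrected) fractions reported in panel d comes from multiple-testing correction. As an illustration only, here is a minimal sketch of the standard Holm–Bonferroni step-down procedure; the function name `holm_bonferroni` and the example P values are hypothetical and are not taken from the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Standard Holm step-down procedure: sort the P values in ascending order,
    compare the k-th smallest (k = 0, 1, ...) with alpha / (m - k), and stop
    at the first P value that fails its comparison.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break  # every larger P value is also retained
    return reject


# Hypothetical P values for four tests.
example_p = [0.01, 0.04, 0.03, 0.005]
corrected = holm_bonferroni(example_p)
raw = [p < 0.05 for p in example_p]
```

In this toy case all four tests pass the raw 0.05 cutoff, but only two survive the correction, which mirrors the direction of the difference described in the legend.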

Supplementary Figure 7. Histogram of the fraction of somatic mutations within each coding region SMR that are predicted to alter protein sequence or RNA splicing.

Supplementary Figure 8. Histogram of Gini coefficients of dispersion for nonsynonymous mutations per gene. Gini coefficients were calculated on the basis of the number of nonsynonymous mutations contained per residue mutated in each cancer for CGC genes. For each CGC gene (n = 522), the maximum coefficient across cancers is plotted (refs. 31,32). A set of outliers with extreme Gini coefficients is labeled. 81% of CGC genes with unassociated SMRs have Gini coefficients <0.1.
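For readers unfamiliar with the Gini coefficient, the sketch below computes it from per-residue mutation counts using the common rank-based formula. This is an assumption: the legend does not state which estimator the authors used, and the function name `gini_coefficient` and the example counts are hypothetical.

```python
def gini_coefficient(counts):
    """Gini coefficient of non-negative counts (0 = even, near 1 = concentrated).

    Uses the rank-based formula on the ascending-sorted values x_1..x_n:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-residue nonsynonymous mutation counts for two genes.
evenly_spread = gini_coefficient([1, 1, 1, 1])    # mutations spread evenly
single_hotspot = gini_coefficient([0, 0, 0, 10])  # all mutations on one residue
```

A gene whose mutations are spread evenly across residues scores 0, whereas a gene dominated by a single hotspot approaches the maximum of (n − 1)/n for n residues.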

Supplementary Figure 9. Molecular dynamics analysis of wild-type and mutant PIK3CA in complex with PIK3R1. (a) Wild-type (WT) PIK3CA in complex with PIK3R1. (b) The K111E mutant of PIK3CA in complex with PIK3R1. (c) The G118D mutant of PIK3CA in complex with PIK3R1. The interaction enthalpy across the full PIK3CA–PIK3R1 binding interface follows a bimodal distribution (as shown in Fig. 3d). Binding Mode 1 (blue) is preferred by WT PIK3CA and corresponds to binding interactions that are on average 1.8 kcal/mol tighter than those in Binding Mode 2 (orange), which predominates in the K111E mutant of PIK3CA. The difference between the two binding modes becomes apparent in the salt-bridge pattern of R79. In Binding Mode 1, R79 is a key component of the binding interface (with E1215 and E1222 of PIK3R1; shown in gray helices). In Binding Mode 2, a salt bridge between R79 and E81 is in direct competition with this binding interaction (orange panel of a). In WT PIK3CA, this competition is attenuated by the interaction of K111 with E81 (shown in the blue panel of a) and to a similar degree by the interaction of R108 with E81 (data not shown). In the K111E mutant of PIK3CA, a similar attenuation can only occur through the simultaneous recruitment of R108 (blue panel of b). Taken together, the data suggest that K111E causes an inversion of the bimodal binding distribution and effectively weakens the interactions between PIK3CA and PIK3R1 as compared to WT PIK3CA. (c) Molecular dynamics simulations of the G118D mutant of PIK3CA show a similar weakening of the binding interactions with R79 at their core, albeit through the reshaping of a more extensive network of salt bridges that involves D118. Data are from 20 independent 0.1-μs molecular dynamics simulations. The individual distributions in Figure 3d correspond to distinct conformational states at the binding interface. Their cumulative populations were normalized and are reported as percentages.

Supplementary Figure 10. Enrichment of CGC genes among SMR-based protein-coding drivers and SMR-identified binding interfaces. (a) Fraction of SMR- and OncodriveCLUST-identified protein-coding genes in the Cancer Gene Census (CGC). OncodriveCLUST results were obtained from Tamborero et al. (ref. 11). Driver analyses in endometrial (UCEC), ovarian (OVAR) and lung squamous cell carcinoma (LUSC) were performed with the same exome data sets. Breast cancer (BRCA) results were obtained with distinct exome data sets and are therefore not directly comparable. (b) The fraction of SMR-identified and previously reported (ref. 51) protein and DNA interaction interfaces with recurrent cancer somatic mutations. For direct comparison, we consider only interactions with nucleic acids and proteins. All CGC genes with previously reported (ref. 51) somatically altered nucleic acid or protein interfaces are captured by SMRs (inset).

Supplementary Figure 11. Molecular structure and spatial mapping of an SMR on histone H2B. An SMR on histone H2B (HIST1H2BK.1; orange) is highlighted within the structure of the human nucleosome core particle (PDB, 2CV5). Histone H2B (blue), histone H2A (teal) and histone H4 (green) components are highlighted.

Supplementary Figure 12. NFE2L2 SMRs alter KEAP1 binding interfaces. The structures of SMR NFE2L2.1 (orange, shown here) and NFE2L2.2 (Fig. 4g) were mapped to NFE2L2 structures (PDB, 2FLU and 3WN7). A sector of recurrent lung adenoma alterations on KEAP1 (teal) with density score FDR ≤ 5% did not meet the 2% mutation frequency cutoff. The structure of NFE2L2.2 mapped to the mouse NFE2L2–KEAP1 co-crystal structure (PDB, 3WN7) is shown in Figure 4g.