SUPPLEMENTARY INFORMATION

Similar documents
Supplementary Table 3. 3 UTR primer sequences. Primer sequences used to amplify and clone the 3 UTR of each indicated gene are listed.

a) Primary cultures derived from the pancreas of an 11-week-old Pdx1-Cre; K-MADM-p53

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

c Tuj1(-) apoptotic live 1 DIV 2 DIV 1 DIV 2 DIV Tuj1(+) Tuj1/GFP/DAPI Tuj1 DAPI GFP

Cancer Genomics. Nic Waddell. Winter School in Mathematical and Computational Biology. July th

Supplementary Document

Supplementary Figure 1 MicroRNA expression in human synovial fibroblasts from different locations. MicroRNA, which were identified by RNAseq as most

ERCC2mutations as predictors of response to cisplatinin bladder cancer

ARTICLE RESEARCH. Macmillan Publishers Limited. All rights reserved

Supplementary Figure 1 a

Supplementary Appendix

Supplemental Data. Shin et al. Plant Cell. (2012) /tpc YFP N

Supplementary Figure 1. ROS induces rapid Sod1 nuclear localization in a dosagedependent manner. WT yeast cells (SZy1051) were treated with 4NQO at

Supplementary Materials

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first

Figure S1. Analysis of genomic and cdna sequences of the targeted regions in WT-KI and

Abbreviations: P- paraffin-embedded section; C, cryosection; Bio-SA, biotin-streptavidin-conjugated fluorescein amplification.

BIOLOGY 621 Identification of the Snorks

Citation for published version (APA): Oosterveer, M. H. (2009). Control of metabolic flux by nutrient sensors Groningen: s.n.

Supplementary Figures

CD31 5'-AGA GAC GGT CTT GTC GCA GT-3' 5 ' -TAC TGG GCT TCG AGA GCA GT-3'

Toluidin-Staining of mast cells Ear tissue was fixed with Carnoy (60% ethanol, 30% chloroform, 10% acetic acid) overnight at 4 C, afterwards

Table S1. Oligonucleotides used for the in-house RT-PCR assays targeting the M, H7 or N9. Assay (s) Target Name Sequence (5 3 ) Comments

Supplementary Table 2. Conserved regulatory elements in the promoters of CD36.

Culture Density (OD600) 0.1. Culture Density (OD600) Culture Density (OD600) Culture Density (OD600) Culture Density (OD600)

A comprehensive survey of the mutagenic impact of common cancer cytotoxics

Beta Thalassemia Case Study Introduction to Bioinformatics

Package ClusteredMutations

Supplementary Figure 1: Classification scheme for non-synonymous and nonsense germline MC1R variants. The common variants with previously established

Supplementary Figure 1

Description of Supplementary Files. File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables

Nature Immunology: doi: /ni.3836

Supplementary Materials for

Supplementary Figure 1a

SUPPLEMENTARY INFORMATION

Resistance to Tetracycline Antibiotics by Wangrong Yang, Ian F. Moore, Kalinka P. Koteva, Donald W. Hughes, David C. Bareich and Gerard D. Wright.

Journal of Cell Science Supplementary information. Arl8b +/- Arl8b -/- Inset B. electron density. genotype

Supplementary Figure 1

A smart acid nanosystem for ultrasensitive. live cell mrna imaging by the target-triggered intracellular self-assembly

Nucleotide Sequence of the Australian Bluetongue Virus Serotype 1 RNA Segment 10

SUPPLEMENTARY DATA. Supplementary Table 1. Primer sequences for qrt-pcr

Astaxanthin prevents and reverses diet-induced insulin resistance and. steatohepatitis in mice: A comparison with vitamin E

Relationship of the APOA5/A4/C3/A1 gene cluster and APOB gene polymorphisms with dyslipidemia

SUPPORTING INFORMATION

Mutation Screening and Association Studies of the Human UCP 3 Gene in Normoglycemic and NIDDM Morbidly Obese Patients

Finding protein sites where resistance has evolved

Nature Getetics: doi: /ng.3471

Lezione 10. Sommario. Bioinformatica. Lezione 10: Sintesi proteica Synthesis of proteins Central dogma: DNA makes RNA makes proteins Genetic code

A basic helix loop helix transcription factor controls cell growth

Supplemental Information. Cancer-Associated Fibroblasts Neutralize. the Anti-tumor Effect of CSF1 Receptor Blockade

What do you think of when you here the word genome?

Frequency of mosaicism points towards mutation prone early cleavage cell divisions.

Beta Thalassemia Sami Khuri Department of Computer Science San José State University Spring 2015

Cross-talk between mineralocorticoid and angiotensin II signaling for cardiac

Mapping by recurrence and modelling the mutation rate

Supplemental Information. Th17 Lymphocytes Induce Neuronal. Cell Death in a Human ipsc-based. Model of Parkinson's Disease

Supplementary Information. Bamboo shoot fiber prevents obesity in mice by. modulating the gut microbiota

Supporting Information

Single-Molecule Analysis of Gene Expression Using Two-Color RNA- Labeling in Live Yeast

SUPPLEMENTARY INFORMATION

Whole Exome Sequenced Characteristics

Nature Methods: doi: /nmeth.3115

SUPPLEMENTARY RESULTS

Baseline clinical characteristics for the 81 CMML patients Routine diagnostic testing and statistical analyses... 3

SUPPLEMENTARY INFORMATION

Supplementary Materials and Methods


SUPPLEMENTAL FIGURE 1

Reporting TP53 gene analysis results in CLL

SUPPLEMENTAL METHODS Cell culture RNA extraction and analysis Immunohistochemical analysis and laser capture microdissection (LCM)

NSAID use and somatic exomic mutations in Barrett s esophagus

Nucleotide diversity of the TNF gene region in an African village

Integration Solutions

Exons from 137 genes were targeted for hybrid selection and massively parallel sequencing.

Identification of heritable genetic risk factors for bladder cancer through genome-wide association studies (GWAS)

McAlpine PERK-GSK3 regulates foam cell formation. Supplemental Material. Supplementary Table I. Sequences of real time PCR primers.

Supplementary Figure 1

Deciphering the Clinical Significance of BRCA Variants. A Senior Honors Thesis

Supplemental Figure legends

SUPPLEMENTARY FIGURES: Supplementary Figure 1

Isolate Sexual Idiomorph Species

Supplementary Figure 1. Estimation of tumour content

Detection of 549 new HLA alleles in potential stem cell donors from the United States, Poland and Germany

Supplementary Figure 1

Supporting Information. Mutational analysis of a phenazine biosynthetic gene cluster in

Cancer Genetics 204 (2011) 45e52

TetR repressor-based bioreporters for the detection of doxycycline using Escherichia

SUPPLEMENTARY INFORMATION

PATIENTS AND METHODS. Subjects

Mechanistic and functional insights into fatty acid activation in Mycobacterium tuberculosis SUPPLEMENTARY INFORMATION

Nature Genetics: doi: /ng Supplementary Figure 1. Rates of different mutation types in CRC.

Supplementary Information

Supplementary information

Phylogenetic analysis of human and chicken importins. Only five of six importins were studied because

Journal: Nature Methods

SUPPLEMENTARY INFORMATION

Supplementary Figure 1: GPCR profiling and G q signaling in murine brown adipocytes (BA). a, Number of GPCRs with 2-fold lower expression in mature

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

CIRCRESAHA/2004/098145/R1 - ONLINE 1. Validation by Semi-quantitative Real-Time Reverse Transcription PCR

Nature Genetics: doi: /ng Supplementary Figure 1. Country distribution of GME samples and designation of geographical subregions.

Transcription:

Somatic ERCC2 Mutations Are Associated with a Distinct Genomic Signature in Urothelial Tumors Jaegil Kim, Kent W Mouw, Paz Polak, Lior Z Braunstein, Atanas Kamburov, Grace Tiao, David J Kwiatkowski, Jonathan E Rosenberg, Eliezer M Van Allen, Alan D Andrea, Gad Getz SUPPLEMENTARY INFORMATION Contents Supplementary Figures 1-18 Supplementary Tables 1-5 (see separate files) Supplementary References

Supplementary Figure 1a Signature 5* C>T CpG AA APOBEC2 APOBEC1

Supplementary Figure 1b Signature 5* C>T CpG AA APOBEC2 APOBEC1

Supplementary Figure 1 Unsupervised hierarchical clustering of signatures identified in three urothelial tumor cohorts (TCGA-130, DFCI/MSK-50, and BGI-99) and the 30 signatures described by COSMIC.[1, 2] COMB-279 is the combined cohort including all tumors from the TCGA-130, DFCI/MSK-50, and BGI-99 cohorts. COMB-MI-242 are all muscle-invasive tumors from the three cohorts. AA: Aristolochic acid. (a) Cosine similarity. (b) Pearson correlation.

Supplementary Figure 2

Supplementary Figure 2 Summary of mutation enrichment analyses for signature 5* across cohorts. All genes mutated in >5% of tumors in the cohort were included in the analysis. A permutation-based method was applied to account for the overall number of non-silent mutations per sample and per gene (Methods).[3] Q-Q plots show observed versus expected p- values for each of the analyses. Genes with Benjamini-Hochberg False Discovery Rate Q<0.1 are shown in red and labeled. ERCC2 was the only gene that was significant in each of the cohorts. COMB-279: all 279 tumors across the 3 cohorts. COMB-MI-242: all 242 muscle-invasive tumors across the 3 cohorts.

Supplementary Figure 3 a. b.

Supplementary Figure 3 Mutational signature analysis of the DFCI/MSK-50 cohort. (a) The spectrum of base changes identified in the DFCI/MSK-50 cohort displayed as the mutated pyrimidine and the adjacent 3 and 5 bases. ssnv: somatic single nucleotide variations. (b) A Bayesian non-negative matrix factorization algorithm was applied to identify signatures from the overall mutation spectrum. Four distinct mutational processes were identified that closely resemble the signatures identified in the TCGA-130 cohort in Figure 1b.

Supplementary Figure 4 DFCI/MSK-50 BGI-99 COMB-279

Supplementary Figure 4 Comparison of signature 5* activity in tumors with mutant versus WT ERCC2 in the DFCI/MSK-50, BGI-99, and combined (COMB-279 = TCGA-130 + DFCI/MSK-50 + BGI-99) cohorts. The median estimated number of mutations is shown in parentheses. Onesided p-values were calculated using a permutation-based method that maintains the overall number of non-silent mutations per sample and per gene (Methods).[3] Note that the BGI-99 signature includes a contribution from the C>T CpG signature.

Supplementary Figure 5 a. b.

Supplementary Figure 5 Mutational signature analysis of the BGI-99 cohort. (a) The spectrum of base changes identified in the BGI-99 cohort displayed as the mutated pyrimidine and the adjacent 3 and 5 bases. (b) A Bayesian non-negative matrix factorization algorithm was applied to identify signatures from the overall mutation spectrum. Four distinct mutational signatures were identified: two signatures resembling those attributed to APOBEC activity (also seen in the TCGA-130 and DFCI/MSK-50 cohorts), a signature attributed to aristolochic acid (AA) exposure, and a signature representing the superposition of the C>T CpG signature and signature 5*.

Supplementary Figure 6a Combined Cohort (COMB 279) 0 4 8 12 # of Signature 5* mutations TCGA: Clonal ERCC2 TCGA: Subclonal ERCC2 DFCI/MSKCC & BGI: ERCC2 Smoking Status Cohorts Clusters TCGA DFCI BGI MI Smoker Never Smoker Not Available C>T C>A C>G T>A T>C T>G TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT

Supplementary Figure 6b Muscle Invasive Combined Cohort (COMB MI 242) 0 4 8 12 # of Signature 5* mutations TCGA: Clonal ERCC2 TCGA: Subclonal ERCC2 DFCI/MSKCC & BGI: ERCC2 Smoking Status Cohorts Clusters TCGA DFCI BGI MI Smoker Never Smoker Not Available C>T C>A C>G T>A T>C T>G TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT

Supplementary Figure 6c Combined Cohort (COMB 279) 0 100 300 Numer of Total SNVs TCGA: Clonal ERCC2 TCGA: Subclonal ERCC2 DFCI/MSKCC & BGI: ERCC2 Smoking Status Cohorts Clusters TCGA DFCI BGI MI Smoker Never Smoker Not Available C>T C>A C>G T>A T>C T>G TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT

Supplementary Figure 6d Muscle Invasive Combined Cohort (COMB MI 242) 0 100 300 Numer of Total SNVs TCGA: Clonal ERCC2 TCGA: Subclonal ERCC2 DFCI/MSKCC & BGI: ERCC2 Smoking Status Cohorts Clusters TCGA DFCI BGI MI Smoker Never Smoker Not Available C>T C>A C>G T>A T>C T>G TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA TTT GTT CTT ATT TTG GTG CTG ATG TTC GTC CTC ATC TTA GTA CTA ATA ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT ACA ACC ACG ACT CCA CCC CCG CCT GCA GCC GCG GCT TCA TCC TCG TCT

Supplementary Figure 6 Unsupervised hierarchical clustering analyses. (a) Clustering of signature 5* activity (as 96 trinucleotide mutational contexts) in the combined (COMB-279) cohort. Separate tracks are included for clonal (defined as probability[cancer cell fraction 0.95] >0.5) and subclonal ERCC2 mutations for the TCGA-130 cohort. Tumors segregated into two clusters of 222 (shown in red) and 57 (shown in blue) tumors. Twenty-five of the 35 ERCC2 mutated tumors belonged to the second (blue) cluster (P=1.7x10-12, two-tailed Fisher s exact test). (b) Clustering of signature 5* activity was also performed for all 242 muscle-invasive tumors from the combined cohort (COMB-MI-242). Tumors segregated into two clusters of 162 (red) and 80 (blue) tumors. Twenty-nine of 32 ERCC2 mutated tumors belonged to the second (blue) cluster (P=4.4x10-14 ). (c) Clustering of all non-silent SNVs in the COMB-279 cohort segregated tumors into clusters of 172 (red) and 107 (blue) tumors. Eighteen of 35 ERCC2 mutated tumors belonged to the second (blue) cluster (P=0.1). (d) Clustering of all non-silent SNVs in the COMB-MI-242 cohort segregated tumors into clusters of 25 (red) and 217 (blue) tumors. Twenty-four of 32 ERCC2 mutated tumors belonged to the second (blue) cluster (P=0.008).

Supplementary Figure 7a Conversed helicase moif Cohort: DFCI/MSK-50 BGI-99 TCGA-130 1 760 Smoking status: Never-smoker Former Smoker Current Smoker Smoking status not available 1 760 0-50 No. esimated signature 5* mutaions: 51-100 101 200 >200 1 760

Supplementary Figure 7b No. Estimated Signature 5* Mutations p=0.037 400 300 200 100 0 Helicase Non-Helicase

Supplementary Figure 7 Location and properties of ERCC2 missense mutations. (a) The location of all missense mutations observed across cohorts are plotted along the length of the ERCC2 gene (760 amino acids) by cohort, smoking status, and signature 5* activity. Mutations cluster within, or adjacent to, conserved helicase motifs (shown in green). One splice site mutation was present in a tumor from the BGI-99 cohort and is not shown here; all other mutations were missense mutations. (b) ERCC2 missense mutations mapped to their predicted equivalent location on an archaeabacterial ERCC2 crystal structure (PDB ID: 3CRV) and colorcoded by estimated number of signature 5* mutations (Methods). For amino acids mutated in more than one tumor, the average number of signature 5* mutations is shown. Helicase domains are shaded in pink and green. Mutations located within or adjacent to ( 10 amino acids) the conserved helicase motifs were associated with a significantly higher number of signature 5* mutations compared to mutations located elsewhere in the protein (P=0.037). CLUMPS analysis revealed significant spatial clustering of mutations within the 3D structure (P=0.0026).[4]

Supplementary Figure 8 DFCI/MSK-50: BGI-99: COMB-MI-242: COMB-279:

Supplementary Figure 8 Activities of other mutational signatures in mutant versus WT ERCC2 tumors across cohorts. The median estimated number of mutations are shown in parentheses and one-sided p-values were computed using a permutation-based method that maintains the overall number of non-silent mutations per sample and per gene (Methods).[3] COMB-279: all cases across the three cohorts. COMB-MI-242: all muscle-invasive cases across the three cohorts. The C>T CpG signature is not shown for the BGI-99 cohort because this signature did not separate from signature 5* in the NMF analysis.

Supplementary Figure 9

Supplementary Figure 9 Enrichment analysis for the APOBEC2 signature across genes. Although the p-value for ERCC2 was <0.05 in each of the three cohorts, the Benjamini-Hochberg False Discovery Rate Q-value was <0.1 (shown in red) only in the combined cohorts (COMB-MI- 242 and COMB-279).

Supplementary Figure 10 Signature 5* APOBEC2 C>T_CpG APOBEC1 TCGA DFCI BGI Smoker Non Smoker Muscle Invasive Non muscle Invasive Not Available Observed mutations Estimated mutations 1000 800 600 400 200 0 500 400 300 200 100 0 Cohorts Tobacco Histology SNV INDEL ERCC2 (13%) ERCC6 (3%) DDB2 (3%) CUL4B (2%) CUL4A (1%) DDB1 (1%) XPC (1%) CDK7 (1%) ERCC3 (1%) ERCC5 (1%) ERCC4 (1%) GTF2H1 (1%) RAD23A (1%) RAD23B (1%) LIG1 (1%) MNAT1 (1%) CETN2 (1%) GTF2H3 (0%) ERCC1 (0%) XPA (0%) UVSSA (0%) Germline_NER (63%) 200 100 0 # events Silent In frame indel Other Nonsyn. Missense Splice site Frame shift Nonsense Germline Not Available

Supplementary Figure 10 Association of somatic and germline NER pathway events with signature 5* activity in the combined cohort (TCGA-130 + DFCI/MSK-50 +BGI-99). The figure is arranged similar to Figure 4 in the main text, but provides additional detail regarding events in NER pathway genes (see Methods for full list of NER pathway genes). Somatic mutations in non- ERCC2 NER genes are rare, and there is no significant enrichment of mutations in any individual non-ercc2 NER gene or of the pathway as a whole (when ERCC2 is excluded) among tumors with increased signature 5* activity. Rare germline NER variants (frequency <2% in the TCGA- 130 + DFCI/MSK-50 cohort; see Methods) are displayed in a single track at the bottom of the figure.

Supplementary Figure 11a Cumulative Distributions (CDF) and Enrichment Scores 1.2 0.9 0.6 0.3 0.0 Germline NER Somatic ERCC2 Enrichment of ERCC2 somatic and NER germline mutations in TCGA+DFCI Cohort 0 50 100 150 Samples in descending order of Signature 5* CDF of ERCC2 Mut CDF of ERCC2 WT Enrichment Score

Supplementary Figure 11b Enrichment in ERCC2 WT Samples ERCC4 p.i706t (3) Log(Rank Sum P value) 2.0 1.5 1.0 0.5 0.0 XPA p.l252v (2) ERCC4 p.p379s LIG1 p.v753m (2) (2) ERCC6.PGBD3 p.v851a LIG1 p.k845n (2) (2) RAD23A p.t131a (3) ERCC8 p.t280k (2) MMS19 p.g1029d (2) BIVM.ERCC5 p.r741w MMS19 p.a558v XPC p.k481n (2) ERCC4 p.s662p ERCC4 p.e875g (2) (2) (2) (3) ERCC5 p.n879s (2) ERCC6 p.i1021v (2) BIVM.ERCC5 p.a435t (3) ERCC4 p.r576t (2) LIG1 p.r409h (2) ERCC6 p.d425a (2) 100 0 100 200 Mean Differences in Signature 5* Mutations

Supplementary Figure 11 Enrichment analyses for somatic and germline NER events. (a) Cumulative distributions and enrichment scores for somatic ERCC2 mutations and germline NER pathway events. The dashed vertical line denotes the maximum enrichment score for somatic ERCC2 mutations. Among the 32 WT ERCC2 tumors with highest signature 5* activity (i.e., left of the dashed line), 19 (59%) had a germline NER variant, whereas among the 123 tumors with lower signature 5* activity (i.e., right of the dashed line), only 54 (44%) had a NER germline variant (p=0.086, Fisher s exact test). (b) Association of NER germline variants with signature 5* activity among WT ERCC2 tumors from the TCGA-130 and DFCI-MSK-50 cohorts (germline data not available for BGI-99 cohort). Variant alleles present in >1 case but <2% of the population are shown (number of cases shown in parentheses), and alleles associated with significant enrichment (FDR<0.1) in signature 5* mutations are highlighted in red. Summary of the four significantly enriched alleles (annotations taken from ExAC [5]): Variant Polyphen2 SIFT Population frequency ERCC4-p.I706T probably damaging deleterious 0.0014 ERCC4-p.R576T benign deleterious 0.00054 LIG1-p.R409H possibly damaging deleterious 0.014 BIVM-ERCC5-p.A435T no annotation

Supplementary Figure 12 a. b. c.

Supplementary Figure 12 Effect of current smoking status and smoking intensity on signature 5* activity in the combined TCGA-130 + DFCI/MSK-50 cohort. (a) There was no difference in the estimated number of signature 5* mutations in current versus former smokers. The estimated number of signature 5* mutations is shown in parentheses. P-values were calculated using the Wilcoxon rank-sum test. (b) There was also no difference in estimated number of signature 5* mutations in current versus former smokers when only WT ERCC2 cases were considered. (c) There was a correlation between smoking intensity (measured in pack-years exposure) and signature 5* activity for ERCC2 mutated cases but not WT ERCC2 cases.

Supplementary Figure 13

Supplementary Figure 13 Association of smoking with other mutational signatures in the combined TCGA-130 + DFCI/MSK-50 cohort. There was no association between smoking status and activity of any of the other mutational signatures identified in the cohorts.

Supplementary Figure 14 a. b.

Supplementary Figure 14 Relationship between signature 5* and COSMIC signatures 4 and 5. The contributions of COSMIC signatures 4 and 5 to signature 5* were determined by deconvoluting signature 5* mutations into COSMIC signature 4 and COSMIC signature 5 components (Methods). (a) Nearly all signature 5* activity is attributable to activity of COSMIC signature 5, with only a small contribution from COSMIC signature 4. The two samples with strongest contribution from COSMIC signature 4 were WT ERCC2 cases. (b) The difference in signature 5* activity between smokers and non-smokers is due to differences in activity of COSMIC signature 5 rather than COSMIC signature 4.

Supplementary Figure 15a b.

Supplementary Figure 15b

Supplementary Figure 15c

Supplementary Figure 15 Strand asymmetry analysis. For all muscle-invasive tumors across the three cohorts (COMB-MI-242), the Bayesian NMF analysis was repeated while considering mutations on the transcribed and non-transcribed strands separately (i.e., 192 rather than 96 trinucleotide mutational contexts). (a) Estimated number of mutations on transcribed (shaded colors) and non-transcribed strands (non-shaded colors) for each of the four signatures. (b) Summary of single base changes on the transcribed versus non-transcribed strand. P-values were computed using the pair-wise Wilcoxon rank-sum test. (c) The activity on the transcribed versus non-transcribed strand is displayed for each of the six possible base pair changes. Signature 5* exhibits a transcriptional strand bias that is strongest for T>C and C>A changes.

Supplementary Figure 16

Supplementary Figure 16 Association between patient age and signature 5* activity in tumors from the TCGA-130 and DFCI/MSK-50 cohorts (the two cohorts with available age data). As has been previously described for COSMIC signature 5 in urothelial cancer, there was no association between age at diagnosis and signature 5* activity in the urothelial tumors analyzed here (P=0.65).[6]

Supplementary Figure 17 a. b. c.

Supplementary Figure 17 Clonality of signature 5* mutations in WT versus mutant ERCC2 tumors. For each tumor with an ERCC2 mutation, the number of clonal (defined as probability [cancer cell fraction 0.95]>0.5) and subclonal signature 5* mutations are shown. Gray lines connect counts from the same tumor. (a) Tumors with a clonal ERCC2 mutation have significantly more clonal than subclonal signature 5* mutations. (b) There is no significant difference in the number of clonal versus subclonal signature 5* mutations in tumors with a subclonal ERCC2 mutation. (c) There is also no significant difference in the number of clonal versus subclonal signature 5* mutations in tumors with WT ERCC2. P-values were calculated using the pairwise Mann-Whitney test.

Supplementary Figure 18

Supplementary Figure 18 Association between signature 5* activity and cisplatin response in the DFCI/MSK-50 cohort (the only cohort with cisplatin response data available). Median estimated number of signature 5* mutations are shown in parentheses and p-values were calculated using the Wilcoxon rank-sum test. There were a significantly higher number of signature 5* mutations in cisplatin responders versus non-responders; however, this difference was not significant when only WT ERCC2 tumors were considered.

Supplementary Tables (see separate files) Supplementary Table 1 Summary of Urothelial Cancer Cohorts. Supplementary Table 2 Numerical Representation of Signature 5* Across Cohorts. Supplementary Table 3 Summary of Mutational Signature Contributions, ERCC2 Mutational Status, and Smoking Status for All Cases. Supplementary Table 4 Comparison of Mutational Signatures in Urothelial Tumor Cohorts to COSMIC Mutational Signatures. Supplementary Table 5 Comparison of Signature 5* Among Urothelial Tumor Cohorts.

Supplementary References 1. COSMIC: Catalogue of Somatic Mutations in Cancer. [cited 2015 October 25]; Available from: http://cancer.sanger.ac.uk/cosmic/signatures. 2. Alexandrov, LB, Nik-Zainal, S, Wedge, DC, Aparicio, SA, et al. Signatures of mutational processes in human cancer. Nature 2013;500:415-21. 3. Strona, G, Nappo, D, Boccacci, F, Fattorini, S, San-Miguel-Ayanz, J. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nat Commun 2014;5:4114. 4. Kamburov, A, Lawrence, MS, Polak, P, Leshchiner, I, et al. Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc Natl Acad Sci U S A 2015;112:E5486-95. 5. Consortium, EA. Analysis of protein-coding genetic variation in 60,706 humans. BioRx 2015; 6. Alexandrov, LB, Jones, PH, Wedge, DC, Sale, JE, et al. Clock-like mutational processes in human somatic cells. Nat Genet 2015;