Conditional Selection of Genomic Alterations Dictates Cancer Evolution and Oncogenic Dependencies

Article

Authors
Marco Mina, Franck Raynaud, Daniele Tavernari, ..., Nikolaus Schultz, Elisa Oricchio, Giovanni Ciriello

In Brief
Using an algorithmic approach that they design, Mina et al. construct a pan-cancer map of oncogenic dependencies and find several co-dependent alterations that modify drug response. These results provide a framework to improve cancer therapy by anticipating drug resistance and proposing alternative strategies.

Highlights
- SELECT identifies cancer evolutionary dependencies from alteration occurrences
- Pan-cancer dependencies reflect tissue-independent functional interactions
- Pan-cancer dependencies influence response to therapy
- Conditional selection is required for the emergence of evolutionary dependencies

Mina et al., 2017, Cancer Cell 32, 155-168, August 14, 2017 © 2017 Elsevier Inc.

Marco Mina,1,2 Franck Raynaud,1,2 Daniele Tavernari,1,2 Elena Battistello,1,2,3 Stephanie Sungalee,3 Sadegh Saghafinia,1,2,3 Titouan Laessle,1 Francisco Sanchez-Vega,4 Nikolaus Schultz,4,5 Elisa Oricchio,3 and Giovanni Ciriello1,2,6,*

1 Department of Computational Biology, University of Lausanne (UNIL), Lausanne, Vaud, Switzerland
2 Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
3 Swiss Institute for Experimental Cancer Research (ISREC), Ecole Polytechnique Federale Lausanne (EPFL), 1015 Lausanne, Vaud, Switzerland
4 Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
5 Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
6 Lead Contact
*Correspondence: giovanni.ciriello@unil.ch

SUMMARY

Cancer evolves through the emergence and selection of molecular alterations. Cancer genome profiling has revealed that specific events are more or less likely to be co-selected, suggesting that the selection of one event depends on the others. However, the nature of these evolutionary dependencies and their impact remain unclear. Here, we designed SELECT, an algorithmic approach to systematically identify evolutionary dependencies from alteration patterns. By analyzing 6,456 genomes from multiple tumor types, we constructed a map of oncogenic dependencies associated with cellular pathways, transcriptional readouts, and therapeutic response. Finally, modeling of cancer evolution shows that alteration dependencies emerge only under conditional selection. These results provide a framework for the design of strategies to predict cancer progression and therapeutic response.
Significance

Cancer evolution is driven by selected modifications of genes and proteins. As these modifications determine the phenotype and prognosis of tumors together rather than individually, they need to be studied as a system of interdependent events. We propose an algorithmic approach to identify an extensive map of dependencies among 505 selected events across more than 6,000 human tumors. These dependencies provide insights into gene functional interactions and therapeutic responses. Considering selected alterations as interdependent components of cancer cell machinery will facilitate precision medicine approaches by anticipating drug resistance and proposing unexpected alternative strategies. The conceptual approach and resultant pan-cancer map of oncogenic dependencies proposed here represent important steps toward this goal.

INTRODUCTION

Cancer evolution has been traditionally described according to the Darwinian principles of emergence and selection of genetic features improving cell fitness (Burrell et al., 2013; Greaves and Maley, 2012; Nowell, 1976). Alteration emergence is associated with endogenous and exogenous mutational processes (Alexandrov et al., 2013) that lead to distinct levels of genomic instability (de Bruin et al., 2014), whereas selection is specifically linked to the functional role of the affected genes (Vogelstein et al., 2013). However, the functional consequences of altering one gene often depend on both the cell type in which the alteration emerges (Wang et al., 2) and the constellation of molecular changes already present (Sansom et al., 2006). A representative example is the sequential emergence and selection of genetic alterations described by Fearon and Vogelstein (1990) to model colorectal carcinogenesis. Similarly, therapeutic inhibition of specific targets and drug efficacy can be influenced by cellular and molecular contexts (Kopetz et al., 2015; Whittaker et al., 2013). In this scenario, cancer evolution is progressively conditioned by its fingerprint of genetic abnormalities, leading to the context-dependent selection of pro-tumorigenic alterations.

While these evolutionary dependencies cannot be directly monitored during tumor development, they can be inferred from the set of selected molecular changes observed at diagnosis. Indeed, functionally redundant and antagonistic events are likely to be selected exclusively of each other, whereas synergistic modifications are frequently co-selected and observed together in the same tumor. Large-scale molecular profiling of human tumors has begun to provide evidence of non-random patterns of occurrence between specific oncogenic mutations. A prototypical example is the almost perfect mutual exclusivity between ERK-activating mutations affecting Ras genes and BRAF observed in multiple cancers (Rajagopalan et al., 2002). Mutually exclusive alterations have been explored within several tumor types (Boca et al., 2010; Ciriello et al., 2012, 2013; Kim et al., 2016; Park and Lehner, 2015; Vandin et al., 2012), and particular instances of co-occurrent alterations have been associated with disease progression and clinical outcome (Oricchio et al., 2014; Sansom et al., 2006; Whittaker et al., 2013). This evidence suggests the existence of a broad network of functional dependencies between cancer genomic alterations that has largely unexplored biological and clinical impacts.

A major limitation to the systematic discovery of this network has been the lack of sufficiently large and homogeneous datasets. Recently, pan-cancer sample collections generated by The Cancer Genome Atlas (TCGA) (The Cancer Genome Atlas Research Network et al., 2013) have provided the statistical power for these types of analyses. However, these collections are characterized by numerous confounding factors due to tissue and molecular heterogeneity and a large number of recurrent alterations. The latter not only increases the number of possible dependencies to test but also complicates the assessment of their significance, as transitive effects associated with correlation measures become strong (Dunn et al., 2008; Marks et al., 2012).
To address these challenges, computational approaches thus need to account for tumor type and subtype heterogeneity, preserve the prevalence of specific mutational processes, and disentangle transitive and direct dependencies among an ample collection of selected events. Along these guidelines, we propose here an information-theoretic approach to systematically analyze genomic data from thousands of tumors. This study aims to provide a map of functional dependencies to understand and anticipate cancer evolution and therapeutic response.

RESULTS

Selected Events in the TCGA Pan-Cancer Dataset

In this study, we analyzed molecular data from 6,456 human samples of 23 types of tumors profiled by TCGA (Figure 1A and Table S1). Samples were annotated based on their tissue of origin and their genetic and clinically relevant subtypes. We distilled the thousands of observed copy-number alterations (CNAs) and somatic mutations into 505 candidate selected functional events (SFEs) (Table S1). Specifically, we retained the CNAs affecting gene expression and mutations that were either predicted to truncate gene transcripts (truncating) or were found recurrently in specific residues (hotspots). Tumors in our dataset had six SFEs, on average, in accordance with previous estimations (Vogelstein et al., 2013), but highly unstable cases were associated with dozens of these events. A trend for tumors exhibiting either a large number of CNAs or a large number of mutations, but never both, was associated with extreme genomic instability (Ciriello et al., 2013) (Figure 1B). Interestingly, the overall distribution of SFEs per sample fit a log-normal distribution (Figure 1C), as did the alteration distributions within each tumor type (data not shown). This distribution predicts the emergence of tumors with a large number of candidate selected events, potentially reflecting distinct and exacerbated mutational processes favoring the emergence of specific alteration types (Baca et al., 2013; Forment et al., 2012; Shinbrot et al., 2014).

The occurrence frequency of each SFE was also highly variable, highlighting a considerable heterogeneity within and among the tumor types (Figure 1D). This high variance suggests that while the same events can be observed across multiple tumor types, they are rarely, if ever, present across all types. We used the concept of information entropy (Shannon and Weaver, 1964) to assess whether the SFEs were observed prevalently in one or few tumor types (SFEs with low entropy) or frequently in multiple cancers (SFEs with high entropy). Somatic mutations had, on average, lower entropy than CNAs (Figure 1E, p = ), suggesting that the latter could provide a selective advantage in a broad spectrum of contexts by affecting multiple targets at once. Nonetheless, the entropies of both CNAs and mutations were lower than expected by chance (Figure 1E, p = ). Overall, based on their entropy values, each SFE was observed, on average, in 2 tumor types (Figure 1F).

In summary, distinct mutational processes are associated with tumor genetic makeup, and alteration selection at least partially depends on the tissue of origin. However, tumor lineage alone cannot explain the great heterogeneity of the observed selected events or how these events act in concert to promote and sustain tumorigenesis.

The SELECT Algorithm

To systematically investigate the synergies and/or antagonisms between altered genes in cancer, we designed an algorithmic approach called SELECT (Selected Events Linked by Evolutionary Conditions across human Tumors, Figure S1A).
Based on the assumption that co-dependent alterations influence each other's probability of being selected, the SELECT algorithm tests patterns of occurrence between SFEs for significant mutual exclusivity and co-occurrence. Pairs of SFEs are tested without imposing any prior biological knowledge. Herein, significant patterns are referred to as motifs. In brief, for each pair of SFEs, SELECT first evaluates the extent to which knowing the status of one alteration informs on the status of another alteration, by means of a weighted version of mutual information (wMI). The observed wMI values are then compared with those expected in the absence of dependence between alterations. Expected patterns are estimated from random permutations of the original datasets that preserve the frequency of alteration of each SFE and the number of mutations and CNAs across each sample, tumor type, and subtype. Finally, a correction procedure is applied (average sum correction [ASC]; Dunn et al., 2008) to discriminate between direct and transitive associations: if event A correlates with both events B and C because of true and direct evolutionary dependencies, then B and C may also be correlated, even if they are evolutionarily independent. The output is a ranked list of scored motifs representing pairs of alterations that are significantly either co-occurrent or mutually exclusive across human tumors. In our analysis, candidate motifs were retained if the score was greater than the threshold S_sig =.3, corresponding to the top 75% of motifs with a false discovery rate (FDR) <. prior to ASC and the top 5% of those with an FDR >..

Figure 1. Selected Functional Events across 23 Human Tumor Types
(A) Our pan-cancer dataset includes 6,456 human samples from 23 tumor types profiled by TCGA. Distinctive molecular subtypes are highlighted.
(B) Distribution of selected somatic mutations and copy-number alterations (CNA).
(C) The number of alterations per sample is fit by a log-normal distribution with parameters μ =.5 and σ =.8, mean = 6.6, and SD = 6.2. y Axis is on log scale.
(D) The ten most frequent SFEs within each alteration type. Each bar plot corresponds to the average alteration frequency; error bars represent SD.
(E) Information entropy of each SFE versus pan-cancer frequency of alteration. Information entropy is defined based on the frequency of the SFE in each tumor type. Entropy values of SFEs in our dataset (colored dots) are compared with the expected entropy values (gray dots). y Axis is on log scale.
(F) Average number of tumor types where an alteration occurs for a given entropy value.
GBM, glioblastoma multiforme; LGG, low-grade glioma; HNSC, head and neck squamous cell carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; SKCM, skin cutaneous melanoma; AML, acute myeloid leukemia; BLCA, bladder carcinoma; KICH, kidney chromophobe carcinoma; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; BRCA, breast carcinoma; ACC, adrenocortical carcinoma; PRAD, prostate adenocarcinoma; THCA, thyroid papillary carcinoma; CRC, colon and rectum carcinoma; ESCA, esophageal carcinoma; LIHC, liver hepatocellular carcinoma; STAD, stomach adenocarcinoma; CESC, cervix squamous cell carcinoma; OV, ovarian carcinoma; UCS, uterine carcinosarcoma; UCEC, uterine corpus endometrial carcinoma. See also Table S1.

We evaluated the statistical power of our approach across multiple subsampled datasets ranging from 5 to 6,000 cases. Overall, the number of motifs reflecting co-occurrence (co-occurrence motifs) appeared to plateau after 4,000 samples (Figure S1B), suggesting that the TCGA pan-cancer dataset used in this study is sufficiently large to detect most of the co-occurrence patterns within the set. Conversely, the number of significant motifs reflecting mutual exclusivity (mutual exclusivity motifs) continued to increase with the number of cases (Figure S1C), gradually including low-frequency events. Indeed, when the frequency of alteration was below 5%, our dataset did not provide the statistical power required to detect significant motifs, even for perfectly mutually exclusive events (Figure S1D).
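As a rough illustration of the two information-theoretic quantities used above (the entropy of an event across tumor types, and the mutual information of a pair of events compared against a permutation null), the following Python sketch may help. It is not the SELECT implementation: the weighting scheme of wMI and the ASC step are omitted, the null model here only shuffles one event within tumor types (whereas the permutations described above also preserve per-sample alteration counts), and every function name and toy vector is hypothetical.

```python
import math
import random
from collections import Counter


def entropy(freqs):
    """Shannon entropy (nats) of an event's per-tumor-type frequencies,
    after normalizing them to sum to 1."""
    total = sum(freqs)
    probs = [f / total for f in freqs if f > 0]
    return -sum(p * math.log(p) for p in probs)


def mutual_information(a, b):
    """Plain (unweighted) mutual information, in nats, of two 0/1 vectors."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        # pxy / (px * py) written with raw counts
        mi += pxy * math.log(c * n / (pa[x] * pb[y]))
    return mi


def shuffle_within_strata(values, strata, rng):
    """Permute values only among samples of the same stratum (tumor type),
    so each event keeps its frequency within every stratum."""
    out = list(values)
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    for idx in groups.values():
        vals = [values[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            out[i] = v
    return out


def permutation_pvalue(a, b, strata, n_perm=1000, seed=0):
    """Empirical p value: how often a stratified shuffle of b yields
    at least as much mutual information with a as observed."""
    rng = random.Random(seed)
    observed = mutual_information(a, b)
    hits = sum(
        mutual_information(a, shuffle_within_strata(b, strata, rng)) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): two perfectly exclusive events in two tumor types.
event_a = [1] * 10 + [0] * 10 + [1] * 10 + [0] * 10
event_b = [1 - x for x in event_a]
tumor_type = ["X"] * 20 + ["Y"] * 20
mi_obs, p_emp = permutation_pvalue(event_a, event_b, tumor_type, n_perm=500)
```

Mutual information alone does not say whether a pair is co-occurrent or mutually exclusive; in a sketch like this, the direction would have to be read from the observed overlap relative to the permuted overlaps.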

Figure 2. Identification of Pairwise Evolutionary Dependencies: Associations with Pathways and Tumor Lineage
(A) Schematic of the SELECT algorithm.
(B) Ratio of the tail distributions (1 − cumulative distribution function) of the scores of motifs between alterations in the same pathway (within motifs) and between alterations in different pathways (between motifs). This ratio is shown for all motifs (black line), and separately for mutual exclusivity (purple line) and co-occurrence (green line) motifs.
(C) Number of tumor types (y axis) where motifs are testable (gray bars) or detected (colored). Top: co-occurrence motifs. Bottom: mutual exclusivity motifs.
(D) Top: deviations from expected overlap are reported for each motif and each tumor type. Only motifs with > SD from the expected overlap in at least two tumor types are shown. Bottom: sum of deviations of each motif toward either co-occurrence (CO, green bars) or mutual exclusivity (ME, purple bars).
See also Figures S1 and S2; Table S2.
Nonetheless, the alteration motifs identified in the full pan-cancer dataset comprise events within a broad range of frequencies, including numerous relatively rare events, which were detectable only on the full dataset (<%, Figures S1E and S1F). Notably, these results were robust to perturbations of both alteration occurrences (Figure S2A) and SFE selection (Figure S2B). By controlling for heterogeneity among tumor types and mutational processes, SELECT can exploit the statistical power offered by large-scale datasets to robustly and systematically identify evolutionary dependencies across human cancers.

Functional Relationships Determine Tissue-Independent Evolutionary Dependencies

Overall, SELECT identified 631 motifs reflecting co-occurrence (224) and mutual exclusivity (407) among 365 SFEs (Figure 2A and Table S2). To test whether these motifs reflected functional relationships between the affected genes, we annotated each event in broad functional categories, such as deregulated pathways in cancer and protein function, and more general processes, such as cell metabolism, tumor-microenvironment interaction, and immune response (Table S2). Motifs between events annotated in the same category (within motifs) were at least twice as likely to have a significant score as those between events with distinct annotations (between motifs), and this difference increased with increasing scores (Figure 2B). High-scoring mutual exclusivity was highly enriched in the within motifs, as up to a 2-fold increase was observed. This result is consistent with previous results and indicates that functionally redundant genes are rarely altered together (Ciriello et al., 2012; Zack et al., 2013). Nonetheless, significant co-occurring alterations were also found to affect genes in the same pathway more frequently than those in different pathways (Figure 2B).
Thus, evolutionary dependencies are consistently stronger between functionally related genes than between unrelated genes, indicating that they emerge because of functional interactions. Importantly, our approach captured these relationships without any prior biological information.

Next, we investigated whether alteration motifs are associated with tumor lineage. In brief, we tested the 631 motifs identified in our pan-cancer dataset for their tendency toward mutual exclusivity or co-occurrence within each individual tumor type. Overall, 545 out of the 631 motifs could be tested in at least one tumor type, and 70% of those (382) displayed a significant deviation from the expected overlap in more than one tumor type (Figure 2C and Table S2). Strikingly, with very few exceptions these motifs showed the same type of trend, toward either mutual exclusivity or co-occurrence, in each tissue in which their overlap deviated from the expected value (Figure 2D). These results indicate that a large percentage of the motifs we identified reflect the same type of functional dependency across multiple tissues.

Evolutionary Dependencies between Selected Oncogenic Events

Selected alterations appeared in a variable number of motifs, from to 8, with approximately % of the SFEs represented in at least seven motifs (Figure 3A). Frequent alterations occurring in multiple tumor types were associated with a greater number of motifs than relatively rare events predominantly occurring in one or few tumor types (data not shown). Nonetheless, lineage-specific alterations were often among the top-scoring motifs in our dataset. Examples include co-occurrent mutations targeting DNMT3A, NPM1, and FLT3 in AML (The Cancer Genome Atlas Research Network, 2013a) or co-occurrent VHL and PBRM1 mutations in clear cell renal carcinoma (The Cancer Genome Atlas Research Network, 2013b) (Figure 3A and Table S2).

ERK-activating KRAS mutations were associated with the greatest number of motifs (8). The top-scoring motifs reflected known dependencies with alterations in the same pathway, such as mutations in BRAF, NRAS, and NF1 (The Cancer Genome Atlas Research Network, 2015a) (Figure 3B). KRAS motifs with alterations in different pathways included both known and novel dependencies with comparable scores.
The top-scoring co-occurrence motif revealed a potential synergy between KRAS and RBM10 mutations in both lung and colorectal adenocarcinoma. Interestingly, this previously unreported co-occurrence was stronger than the known co-occurrence between KRAS and STK11 mutations (Figure 3B) (Skoulidis et al., 2015). Among mutual exclusivity KRAS motifs, SELECT confirmed a significant anti-correlation with EGFR mutations (Thomas et al., 2007) and showed that this exclusivity extends to another growth factor receptor gene, FGFR1. KRAS motifs with EGFR and FGFR1 were part of numerous mutual exclusivity motifs involving MAPK regulators and receptor tyrosine kinases (RTKs) (Figure S3A). While they were frequently observed in several tumor types, alterations of these pathways were almost never observed in the same sample. In light of recent reports of synthetic lethal interactions between EGFR and KRAS or BRAF mutations (Sun et al., 2014; Unni et al., 2015), our results suggest that this phenomenon could be associated with a broad group of MAPK regulators and RTKs that are altered in large and heterogeneous patient populations.

TP53 mutations were associated with the most ubiquitous and most frequent motifs detected by SELECT (Figures 3C and S3B). Among the most significant co-occurrences, we found recurrent amplification of the CCNE1 and SKP2 oncogenes. Although the activities of both of these proteins have been associated with p53 status (Dulic et al., 1994; Lin et al., 2010; Loeb et al., 2005; Minella et al., 2002), which supports a functional dependency (Figure S3C), their high frequency of co-occurrence in human samples does not seem to have been reported. Furthermore, the TP53 motifs were shown to discriminate between CDKN2A copy-number deletions (mutually exclusive) and mutations (co-occurrent), highlighting the different functional effects of CDKN2A alterations in its two isoforms, p16 and ARF (Figure S3D).
Indeed, deletion of the CDKN2A locus invariably affects p16 and ARF equally, with loss of the former impairing cell-cycle control and loss of the latter impairing p53 signaling independent of the TP53 status. Conversely, mutations prevalently affect p16 (Figure S3E) and leave p53 signaling intact, providing a greater selective advantage in a p53-deficient context.

Finally, among the less frequent and relatively uncharacterized SFEs, we detected multiple high-scoring motifs involving mutations of the gene encoding the E3 ubiquitin ligase RNF43 (Figure 3D). Mutations affecting RNF43 are predominantly loss-of-function mutations and have been reported in gastrointestinal tumors. These mutations activate Wnt signaling by interfering with the RNF43-mediated ubiquitination of Frizzled receptor components (Giannakis et al., 2014). RNF43 mutations were found to be significantly mutually exclusive with mutations affecting APC and highly concurrent with those targeting CD58, with the latter scoring among the top 3% of our motifs (score >). CD58 mediates the cytotoxic response of T cells and natural killer cells. Truncating mutations in this gene have been found in non-Hodgkin lymphoma, leading to evasion of immune responses (Challa-Malladi et al., 2011). Interestingly, in colorectal tumors (CRCs), CD58 expression was recently shown to promote Wnt signaling (Xu et al., 2015). Co-occurrent losses of RNF43 and CD58 suggest that while CRC cells may benefit from CD58 expression to sustain Wnt signaling, loss of CD58 could become beneficial to overcome cytotoxic immune responses when coupled with alternative Wnt-activating events, such as loss of RNF43.

RNF43 mutations were also significantly concurrent with loss-of-function mutations targeting ARID1A (Figures 3D and 3E), one of the most frequently mutated genes in our dataset (see Figure 1D). ARID1A is a component of the chromatin modifier complex SWI/SNF, and recent evidence has linked this protein to Wnt repression (Mathur et al., 2017).
Co-occurrence of ARID1A and RNF43 mutations was observed in both colorectal (CRC) and stomach (STAD) adenocarcinomas characterized by microsatellite instability (Figure 3F). In both subtypes, we detected alterations affecting key components of Wnt signaling in approximately two-thirds of the tumors, with RNF43 mutations being prevalent in STAD (Figure 3G). We assessed the impact of this mutational co-occurrence in STAD at the transcriptional level. Patients exhibiting mutations in both RNF43 and ARID1A showed higher expression of the transcriptional signatures associated with β-catenin activity and cell proliferation (Gatza et al., 2014) (Figure 3H), even after correcting for distinct tumor subtypes (see STAR Methods). These results support the hypothesis that losses of ARID1A and RNF43 act synergistically to activate Wnt signaling in these tumors. Overall, SELECT identified several cancer dependencies associated with cellular pathways and phenotypic readouts.
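The comparison of signature scores between double-mutant and other samples while accounting for tumor subtype can be illustrated with a simple stratified permutation test. This is only a sketch under stated assumptions: the correction actually used is described in the STAR Methods, which are not part of this section, so the test below is a stand-in rather than the authors' procedure, and the function name, scores, and subtype labels are all hypothetical.

```python
import random
from statistics import mean


def stratified_perm_test(scores, is_double, subtype, n_perm=2000, seed=0):
    """One-sided permutation test for a higher mean score in double mutants.
    Labels are shuffled only within each subtype, so subtype composition
    cannot by itself produce a difference."""
    rng = random.Random(seed)

    def diff(labels):
        hi = [s for s, flag in zip(scores, labels) if flag]
        lo = [s for s, flag in zip(scores, labels) if not flag]
        return mean(hi) - mean(lo)

    observed = diff(is_double)
    groups = {}
    for i, st in enumerate(subtype):
        groups.setdefault(st, []).append(i)

    hits = 0
    for _ in range(n_perm):
        perm = list(is_double)
        for idx in groups.values():
            labs = [is_double[i] for i in idx]
            rng.shuffle(labs)
            for i, flag in zip(idx, labs):
                perm[i] = flag
        if diff(perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical signature scores for two subtypes (not data from the study).
scores = [0.9, 0.8, 0.85, 0.95, 0.2, 0.3, 0.1, 0.25, 0.15, 0.35,
          0.7, 0.75, 0.1, 0.2, 0.3, 0.15]
is_double = [True] * 4 + [False] * 6 + [True] * 2 + [False] * 4
subtype = ["MSI"] * 10 + ["CIN"] * 6
delta, p_value = stratified_perm_test(scores, is_double, subtype)
```

Shuffling within subtypes is one of several ways to control for subtype; a regression with subtype as a covariate would be another reasonable choice.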

Figure 3. Pairwise Evolutionary Dependencies between Selected Events
(A) Overview of all the SFEs involved in at least one significant motif (left) and those involved in more than six motifs (right). SFEs are sorted first by number of motifs and then by the sum of their motif scores. Lineage-specific SFEs involved in few but high-scoring motifs are labeled.
(B-D) Motifs and scores for mutations affecting KRAS (B), TP53 (C), and RNF43 (D). Tested motifs are sorted by score and shown separately for mutual exclusivity or co-occurrence. Colored dots indicate motifs above our significance threshold (S_sig =.3); gray dots are below the threshold. Significant motifs include both previously reported (black font) and novel associations (blue font).
(E) Mutation occurrences for RNF43 and ARID1A across all tumor types in our dataset.
(F) Tumor subtype distribution of samples with both ARID1A and RNF43 mutations.
(G) Alteration frequency of ARID1A mutations and events affecting key regulators of Wnt signaling in STAD and CRC microsatellite instability (MSI) cases.
(H) Comparison of β-catenin signaling and cell proliferation signature scores between STAD samples harboring only ARID1A mutation, only RNF43 mutation, or both events. The thick central line of each box plot represents the median number of significant motifs, the bounding box corresponds to the 25th-75th percentiles, and the whiskers extend up to 1.5 times the interquartile range.
See also Figure S3 and Table S3. SFEs are identified as follows: mutations are identified by the gene symbol of the mutated gene, copy-number amplifications and deletions by the corresponding cytoband followed by A or D, respectively. Putative targets of copy-number alterations, if any, are reported in brackets.

8 A Drug response similarity double vs. single mutants Co-occurrence motifs vs. vs. Drug response similarity double mutant vs. tumor types -log(p-value) -log(p-value) 5 2 significant 4 associations 3 (p =.8) 2 Motifs significant associations (p =.3).5 Motifs.5 Overlap * *p = 6E-4 B Drug response similarity Intra tumor-type TP53-9p3.A (CCNE) KRAS-STK FGFR2-RPL22 EGFR-7p.A (EGFR) POLE-PPMD Different tumor types Same tumor type C D # significant associations Hit (n = 4) Differential Response [-log(p)] FDR = cell lines 25 drugs FDR = motifs tested 39 motifs modify drug response Increased Sensitivity Increased Resistance No significant association (n = 33) 2 (n = 8) 3 (n = 7) 4/5 (5) IL7R - NF (6) DNMT3A-TET2 (6) CCND - RPL22 (7) TP53-9q3.A (CCNE) (9) ARIDA - RNF43 (9) E F G AR-42 IC 5 ( μ M) (HDAC-inhibitor). Navitoclax IC 5 ( μ M) (Bcl-inhibitor) VX-68 IC 5 ( μ M) (AURK-inhibitor) RPL22 CCND * * * * * mutant p = 5E-5 TET2 DNMT3A * * * * * mutant p = 5E-4 RNF43 ARIDA * * * * mutant p = 9E-6 * Large Intestine Endometrioid Ovary Other Large Intestine Endometrioid Prostate Lymphoblastic leukemia Non-small cell lung Other Large Intestine AML Lymphoblastic leukemia Burkitt Lymphoma Other Figure 4. Systematic Analysis of Drug Sensitivity Associated to Co-occurrence Motifs in Human Cancer Cell Lines (A) Left: schematic of the comparison between double- and single-mutant cell lines and between double-mutant cell lines and cell lines derived from the same tumor type in terms of drug response similarity. Center: distribution of p values for the 72 tested motifs in the two comparisons. Right: overlap between the motifs significant in the two comparisons. (B) Distribution of drug response similarity among double-mutant cell lines for the top five significant motifs compared with the background distribution of all the intra-tumor-type concordance scores (gray box plot). 
Each dot represents a pair of double-mutant cell lines derived from either different tumor types (green dots) or the same tumor type (orange dots). (C) Differential response between double-mutant cell lines and single-mutant cell lines to single drug compounds. y Axis: p values from the ANOVA analysis (-log(p)). Results are shown separately for changes associated with increased sensitivity (upper quadrant) and increased resistance (lower quadrant). (D) Distribution of 72 co-occurrence motifs based on the number of drugs (hits) having significantly different response between double-mutant and single-mutant cell lines. (E) Response to HDAC-inhibitor AR-42 (measured by IC50 levels) in cell lines harboring either RPL22, CCND1, or both RPL22 and CCND1 mutations. (F) Response to Bcl-inhibitor Navitoclax in cell lines harboring either TET2, DNMT3A, or both TET2 and DNMT3A mutations. (G) Response to Aurora Kinases (AURK) inhibitor VX-680 in cell lines harboring either RNF43, ARID1A, or both RNF43 and ARID1A mutations.

To validate our results beyond the TCGA pan-cancer dataset, we analyzed genomic data for two additional cohorts. The first cohort included whole-exome sequencing data for 9,292 human samples from 6 tumor types collected by the International Cancer Genome Consortium (ICGC cohort). The second cohort included targeted sequencing and copy-number data for 9,966 patients profiled and treated in multiple cancer hospitals. Molecular data for this second cohort were recently made available by the American Association for Cancer Research (AACR) Project GENIE (GENIE cohort). For the GENIE cohort, we selected tumors that were profiled by either the Dana Farber Cancer Institute (DFCI; 6,74 patients and 56 tumor types) or the Memorial Sloan Kettering Cancer Center (MSKCC; 7,259 patients and 53 tumor types), as these were tested for the greatest number of events. Given the significantly different composition of these cohorts in terms of tumor types and molecular alterations (Figure S3F), we validated our results using both supervised and unsupervised approaches. First, we tested each motif detected in the TCGA pan-cancer dataset for significant mutual exclusivity or co-occurrence in the other datasets whenever data for the corresponding alterations were available. This analysis showed that 44% (p = ) and 45% (p = ) of the motifs were also significant in the ICGC and GENIE datasets, respectively (Table S3). To assess this concordance in an unsupervised manner, we ran SELECT independently on the ICGC and GENIE cohorts and then tested the resulting motifs for significant overlap with those found in the TCGA dataset. Again, despite the different dataset compositions, we found highly significant overlaps with both the ICGC (3%, p = ) and GENIE (35%, p = ) cohorts (Table S3).
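The overlap comparison described above can be illustrated with a hypergeometric tail probability. This is only a minimal sketch: it assumes that significance is assessed against a fixed universe of alteration pairs testable in both datasets, which may differ from the test actually used in the paper (see STAR Methods). The function name `overlap_p_value` and its parameters are hypothetical.

```python
from math import comb


def overlap_p_value(universe: int, set_a: int, set_b: int, shared: int) -> float:
    """Probability of observing at least `shared` common motifs by chance.

    universe: number of alteration pairs testable in both datasets (assumed)
    set_a:    number of motifs found in the reference dataset
    set_b:    number of motifs found in the validation dataset
    shared:   number of motifs found in both
    """
    max_shared = min(set_a, set_b)
    total = comb(universe, set_b)
    # Sum the hypergeometric probabilities for every overlap >= shared.
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(shared, max_shared + 1)
    )
    return tail / total
```

A small p value returned by this function would indicate that two motif sets share more pairs than expected from random draws of the same size.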
Beyond validating the robustness of our results and approach, this analysis points to the opportunity of combining SELECT with matched molecular and clinical data currently generated in the clinic, with the aim of distilling a comprehensive map of oncogenic dependencies and assessing its prognostic value.

Evolutionary Dependencies Modulate Drug Response

The success of molecular profiling as a clinical routine depends on the ability to match specific genetic abnormalities to drug response. However, if two genomic alterations are not independent of each other, tumors with both mutations might respond differently to therapy than tumors with only one of the mutations. Consequently, the impact of evolutionary dependencies could be dual. On the one hand, they could help identify patient groups that are likely to have similar responses to treatment. On the other hand, they could expose resistance mechanisms and/or vulnerabilities driven by the co-occurrence of specific alterations. We tested these hypotheses by integrating the available molecular and drug screening profiles for a large collection of cancer cell lines (Iorio et al., 2016). Of the 224 co-occurrence motifs identified, we tested the 72 that had a sufficient number of cell lines with single and double alterations (>3 samples). First, we asked whether cell lines grouped based on co-occurrence motifs (double mutants) had a more similar response to treatment than cell lines grouped based on either single alterations (single mutants) or tissue origin. To this end, we compared binary sensitivity profiles of all cell lines (resistant = 0, sensitive = 1) using the Matthews correlation coefficient (MCC) as similarity measure. Querying sensitivity profiles to multiple drug compounds, we found that the double-mutant cell lines showed higher similarity when compared with each other than when compared with the single mutants for 2 out of 72 motifs (p =.8; Figure 4A and Table S4).
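As an illustration of the similarity measure named above, the following sketch computes the Matthews correlation coefficient between two binary drug sensitivity profiles and averages it over all pairs of cell lines in a group. It assumes that resistant is coded as 0 and sensitive as 1, and that profiles list the same drugs in the same order; the function names are hypothetical and the handling of constant profiles (returning 0) is a convention chosen here, not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def matthews_cc(a: list[int], b: list[int]) -> float:
    """MCC between two binary profiles (assumed coding: 0 = resistant, 1 = sensitive)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same drugs")
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    denom = sqrt((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01))
    if denom == 0:
        return 0.0  # convention used here when one profile is constant
    return (n11 * n00 - n10 * n01) / denom


def mean_pairwise_mcc(profiles: list[list[int]]) -> float:
    """Average MCC over all pairs of cell lines in a group (needs >= 2 profiles)."""
    pairs = list(combinations(profiles, 2))
    return sum(matthews_cc(p, q) for p, q in pairs) / len(pairs)
```

Under these assumptions, a group of double-mutant lines with a higher mean pairwise MCC than a reference group would be the kind of signal the comparison above looks for.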
Furthermore, the double-mutant cell lines showed greater similarity in drug response than expected by tumor lineage alone in 8 motifs (p =.3; Figures 4A and S4A; Table S4). Importantly, these similarities were not driven by double-mutant models of the same lineage (see the top five in Figure 4B), indicating that these motifs provide additional and independent information to identify models with similar sensitivity profiles. Remarkably, these two analyses led to highly concordant results (Figures 4A and S4B), mutually corroborating their potential therapeutic relevance. Overall, we found that the similarities of drug responses determined by co-occurrent motifs often exceeded those expected for single selected alterations and tumor lineage. Next, we investigated whether co-occurrent motifs lead to actionable evidence of emerging resistance and/or sensitivity to specific pharmacological agents. To this end, we assessed the differential responses to each compound between single- and double-mutant lines. Differential drug response was tested as previously described (Iorio et al., 2016) by accounting for tumor type, genetic instability, growth profile, and medium, and results with FDR <.25 were retained (Table S4). Strikingly, we found significantly different responses to at least one drug for more than half of the motifs analyzed (39 out of 72, 54%) (Figures 4C and 4D). These results highlighted both candidate mechanisms of resistance and disease vulnerabilities. The greatest number of significant associations with drug resistance was linked to the co-occurrence of TP53 mutations and CCNE1 amplification (Figure 4D), even though the latter never occurred as a single event in cancer cell lines, preventing a direct comparison using this dataset.
Motifs associated with resistance to multiple compounds included co-occurrent CCND1 and RPL22 mutations, which were linked to resistance to five of the seven HDAC inhibitors tested (Figure 4E and Table S4), as well as mutations of DNMT3A and TET2 (Figure 4F and Table S4). Differential drug response was confirmed across multiple tissue types. Co-occurrent mutations can generate vulnerabilities, thus increasing tumor sensitivity to specific drug perturbations (Dey et al., 2017). Surprisingly, this was the case for co-occurrent ARID1A and RNF43 mutations. Specifically, cell lines exhibiting alterations in both genes displayed an exquisite sensitivity to the Aurora kinase inhibitor VX-680 (p = ), which was the second most significant hit overall in our analysis (Figure 4G). Consistent with our findings in human samples (Figure 3F), three out of seven cell lines with co-occurrent events were derived from large intestinal tumors. The remaining cases included three models derived from hematopoietic tumors and one sarcoma cell line. Regardless of tumor lineage, the double-mutant lines

The thick central line of each box plot in all panels represents the median number of significant motifs, the bounding box corresponds to the 25th to 75th percentiles, and the whiskers extend up to 1.5 times the interquartile range. See also Figure S4 and Table S4. SFE are identified as follows: mutations are identified by the gene symbol of the mutated gene, copy number amplifications and deletions by the corresponding cytoband followed by A or D, respectively. Putative targets of copy number alterations, if any, are reported in brackets.

[Figure 5. Panels: (A) matrix of co-occurrence and mutual exclusivity motifs for the MAPK/ERK and Wnt module; (B) pathway diagram (RTK, MAPK/ERK, Wnt); (C) proliferation scores in CRC, STAD, and SKCM; (D, E) differential response [-log(p)] by compound, increased sensitivity vs. increased resistance; (F) matrix of co-occurrence and mutual exclusivity motifs for the cell-cycle and apoptosis module; (G) pathway diagram (apoptosis, cell cycle); (H) proliferation score difference vs. wild-type across tumor types.]

Figure 5. Tissue-Independent Modules of Alteration Motifs in Human Cancers
(A) A significant module found by SELECT aggregates multiple pairwise interdependencies between SFEs. Pairwise interdependencies are shown by the upper triangular matrix reporting in each cell the type of motif observed between the alterations in the corresponding row and column. High-scoring: score >.3. (B) Alterations in the module affect genes in the MAPK/ERK and Wnt pathways and the receptor tyrosine kinase (RTK) EGFR.
(C) Proliferation scores of tumors with co-occurrent alterations in MAPK/ERK- and Wnt-related genes (green), tumors with alterations in genes related to only one of the two pathways (dark gray), and tumors with no alterations in the module genes (light gray). (D) Differential response to multiple drug compounds between cell lines with co-occurrent APC and KRAS mutations and cell lines with only APC or only KRAS mutation. y Axis: p values from the ANOVA analysis (-log(p)). Results are shown separately for changes associated with increased sensitivity (upper quadrant) and increased resistance (lower quadrant). (E) Differential response to multiple drug compounds between colorectal cancer cell lines with co-occurrent APC and KRAS mutations and cell lines with only APC or only KRAS mutation. RTK inhibitors (green) and bromodomain inhibitors (blue) are highlighted. (F) A second significant module found by SELECT aggregates multiple pairwise interdependencies between nine SFEs. (G) Alterations in the module affect genes in the cell-cycle and apoptosis pathways. (H) Mean increase of proliferation scores of tumors with co-occurrent alterations in cell-cycle- and apoptosis-related genes (green bars) and tumors with alterations in genes related to only one of these pathways (dark gray) compared with tumors with no alterations in the module genes (baseline). SDs are reported for each mean value as error bars. KIPAN, union of KICH, KIRC, and KIRP. See also Table S5. SFE are identified as follows: mutations are identified by the gene symbol of the mutated gene, copy number amplifications and deletions by the corresponding cytoband followed by A or D, respectively. Putative targets of copy number alterations, if any, are reported in brackets.

[Figure 6. Panels: (A-D) schematic comparing the breast cancer dataset (BRCA, 965 human samples) with simulations of independently selected events (ISE) and of sparse and dense conditionally selected events (CSE); (E-G) tail distribution ratio vs. score.]

Figure 6. Emergence of Motifs through Event Conditional Selection in Cancer Evolution
(A-D) Schematic of motif comparisons between, synthetic tumors generated using a numerical model of cancer evolution assuming independently selected events (ISE, disconnected black dots) (A) and the real breast cancer dataset (n = 965) (B),, synthetic tumors generated with sparse conditionally selected events (sCSE) (C), and, synthetic tumors generated with dense conditionally selected events (dCSE) (D). In CSE models, the selection of one alteration can either favor (green edges) or hinder (purple edges) the selection of another. (E-G) Comparisons by tail distribution ratio of the motif scores emerging from the ISE model versus the breast cancer cohort (E), sCSE cohort (F), and dCSE cohort (G).

responded to VX-680 in contrast to single mutants of the same type (Figure 4G). These results support the hypothesis that while concurrent losses of RNF43 and ARID1A increase expression of Wnt and proliferation markers (Figure 3H), these events come at the cost of exposing a therapeutically actionable vulnerability. Overall, our results indicate that the identification of evolutionary dependencies between cancer genomic alterations could help improve both the stratification of patients into therapeutically relevant groups and the prediction of drug response from genetic evidence.
Modules of Evolutionary Dependencies Reveal Functional Crosstalk among Pathways

Functional interactions within and between pathways indicate the presence of networks of evolutionary dependencies that pairwise motifs alone could fail to capture. To handle this complexity, we further developed SELECT to search for unexpected concentrations, or modules, of high-scoring motifs. In brief, (1) candidate modules are identified from the network of all-positive motifs (score > 0) based on a seed-and-extend procedure; (2) the modules are then clustered, and a consensus module is determined for each cluster; (3) the consensus module is finally tested for significance against a null model based on score permutation (see STAR Methods). Again, no prior knowledge about biological networks is used to reduce the search space. Rather, the modules are extracted exclusively based on their alteration motifs. In our dataset, we identified a total of 2 modules with FDR <. (Table S5). The highest scoring module included multiple dependencies between the key regulators of ERK and Wnt signaling (Figures 5A and 5B). Indeed, in our dataset, mutually exclusive oncogenic mutations in KRAS and NRAS co-occurred with loss-of-function mutations targeting APC, with both events being required for colorectal carcinogenesis. Similarly, mutually exclusive NF1 and BRAF mutations were often found together with alterations affecting multiple Wnt inhibitors (TCF7L2, AMER1, and RNF43). Crosstalk between ERK and Wnt signaling has been previously reported to affect tumor formation (Sansom et al., 2006) and response to therapy (Spranger et al., 2015). In our pan-cancer dataset, we found that samples with co-occurrent ERK and Wnt activation had significantly upregulated proliferation-associated genes (Gatza et al., 2014) in stomach and colorectal cancers and in skin melanoma, where mutations in these pathways are prevalent (Figure 5C).
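The first step of the module search outlined earlier in this section (seed and extend on the network of positive motif scores) can be illustrated with a simplified greedy sketch. This is not the SELECT implementation: the extension rule used here (a candidate must have a positive motif score with every current member, and the candidate with the highest mean score is added) is an assumption made for illustration, and the clustering and permutation-testing steps are omitted. The names `seed_and_extend` and `scores` are hypothetical.

```python
def seed_and_extend(scores: dict, seed: tuple) -> set:
    """Grow a candidate module from a seed pair of alterations.

    scores maps frozenset({a, b}) -> motif score; missing pairs count as 0.
    Assumed rule: add the alteration with the highest mean score to the
    module, provided its score with every current member is positive.
    """
    module = set(seed)
    nodes = set().union(*scores)  # every alteration that appears in a scored pair
    while True:
        best, best_gain = None, 0.0
        for cand in sorted(nodes - module):  # sorted for reproducible tie-breaking
            links = [scores.get(frozenset((cand, m)), 0.0) for m in module]
            if min(links) <= 0:
                continue  # candidate lacks a positive motif with some member
            gain = sum(links) / len(links)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:
            return module
        module.add(best)
```

For example, with scored pairs A-B, A-C, B-C, and C-D, a module seeded at (A, B) would absorb C but not D under this rule, because D has no scored motif with A or B.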
Interestingly, the associations between some of these events did not score as significantly in the pairwise analysis, but they

[Figure 7. Schematic labels: Tissue of origin; Mutation emergence; Functional interactions; Mutation selection; Evolutionary dependencies; Interdependent occurrence.]

Figure 7. Cancer Evolution through Event Conditional Selection
Cancer evolution proceeds through emergence and selection of functional alterations. Our results indicate that the selection process depends on tissue of origin on one side and on functional interactions between proteins on the other, which in turn establish evolutionary dependencies giving rise to alteration motifs across tumor cohorts.

were nicely highlighted via the module discovery. A striking example is the co-occurrence of APC and KRAS mutations, which is prototypical of microsatellite stable colorectal tumors (CRC-MSS). In our dataset, 37% of the CRC-MSS cases showed both events, while 53% presented only one of the two events. Co-occurrent mutations were also sporadically observed in stomach, lung, and uterine carcinomas. Within the cancer cell line dataset, double-mutant lines were associated with significantly increased resistance to multiple drug compounds compared with the single-mutant cases, both across all tumor types (Figure 5D) and specifically in CRC (Figure 5E). In particular, double mutants in the CRC models were found to be resistant to bromodomain inhibitors and drugs targeting multiple RTKs, with the notable exception of EGFR (Figure 5E and Table S5). The latter result was consistent with KRAS mutations alone being sufficient to drive resistance to this class of compounds (Massarelli et al., 2007). Another significant module recapitulated multiple dependencies between alterations of cell-cycle regulators, especially the Rb-regulated G1/S transition, and p53 signaling (Figures 5F and 5G). Most alterations affecting the Rb pathway are mutually exclusive with each other, suggesting that one is sufficient.
However, these invariably require p53 inactivation either by directly mutating TP53, deleting ARF, or overexpressing MDM2, which is proximal to and frequently co-amplified with CDK4. Co-occurrent alterations of cell-cycle and apoptosis regulators were consistently associated with significantly increased mRNA expression of proliferation-associated genes in multiple tumor types (Figure 5H). Beyond pairwise associations, these modules could reveal tissue-independent functional redundancies within pathways and dependencies between them.

Cancer Evolution under Conditional Selection

Thus far, we have shown that alteration motifs are associated with functionally and therapeutically relevant interactions between altered genes and pathways. We assumed that these non-random patterns emerge because functionally related genes are conditionally selected, i.e., the selection of one event influences the probability of selecting others. To test this hypothesis, we used a mathematical model of cancer evolution (Bozic et al., 2010) (see STAR Methods). Importantly, this model assumes that events are selected independently of each other, i.e., the emergence and selection of one mutation does not influence the probability of others being selected. First, we generated a synthetic dataset composed of, samples to which we applied the SELECT algorithm (Figure 6A). We then compared the obtained motif scores with those obtained by applying SELECT to the breast cancer cohort (Figure 6B, 965 cases) and to two additional synthetic cohorts generated under the hypothesis of conditionally selected events (CSEs) (Figures 6C and 6D). Specifically, we modified the model of cancer evolution to include a network of conditions in which the selected mutation of one gene positively or negatively influences the probability of selecting mutations of its neighbors in the network.
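The idea of a network of conditions can be made concrete with a toy simulation. The sketch below is not the evolutionary model used in the paper (which builds on Bozic et al. and is detailed in STAR Methods); it is a deliberately simplified stand-in in which each synthetic tumor draws a fixed number of selected alterations, and each selected alteration multiplies the selection weight of its network neighbors by a factor greater than 1 (favoring) or smaller than 1 (hindering). All names and parameters are hypothetical.

```python
import random


def simulate_tumor(genes, base_weight, conditions, n_events, rng):
    """Draw n_events selected alterations for one synthetic tumor.

    conditions maps (selected_gene, neighbor) -> multiplicative factor applied
    to the neighbor's selection weight (>1 favors, <1 hinders). An empty
    mapping corresponds to independently selected events.
    """
    weights = dict(base_weight)
    selected = []
    for _ in range(n_events):
        remaining = [g for g in genes if g not in selected]
        if not remaining:
            break
        # Sample the next selected alteration in proportion to current weights.
        pick = rng.choices(remaining, weights=[weights[g] for g in remaining])[0]
        selected.append(pick)
        # Propagate the condition: update the weights of the pick's neighbors.
        for (src, dst), factor in conditions.items():
            if src == pick:
                weights[dst] *= factor
    return selected
```

With a factor of 0 between two genes in both directions, the toy model never selects them together, producing a mutual exclusivity pattern; with an empty `conditions` mapping, any such pattern could arise only by chance.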
In our analyses, we generated two networks of conditions: a sparse network (Figure 6C) that randomly connects each gene to two others, and a dense network (Figure 6D) in which multiple gene modules coherently influence each other. Results on different cohorts were compared by the ratio of the tail distributions of the motif scores, i.e., the ratio between the number of motifs with score greater than a given value in one cohort and the number in the other. In the synthetic dataset, we observed a considerably lower number of high-scoring motifs than in the breast cancer cohort, with the latter exhibiting a 2-fold increase of high-scoring motifs (Figure 6E). This result indicates that this model based on independently selected events cannot recapitulate the levels of interdependence observed in a real cancer dataset. Strikingly, we observed the emergence of numerous high-scoring motifs in both CSE models (Figures 6F and 6G). Importantly, this increase was greater for the dense CSE model, which reached levels comparable with those observed for the breast cancer dataset. In contrast to the hypothesis of independently selected events, conditional selection is consistent with the level of interdependency observed in the human dataset, and provides a rational framework to model the emergence of interdependent events.

DISCUSSION

Our pan-cancer analysis indicates that the selection of molecular alterations during cancer evolution is a tightly constrained and context-dependent process. This genetic progression is influenced by both tissue of origin and functional gene interactions, which establish evolutionary dependencies that give rise to non-random patterns of alteration occurrence across human cancers (Figure 7). To reconstruct this process from only tumor molecular profiles, we designed and developed a systematic approach, SELECT, which can cope with the remarkable heterogeneity both among and within tumor types.
To validate our approach, we recapitulated known dependencies and interactions and identified multiple motifs associated with both functional and therapeutic readouts. Our results not only

demonstrate the functional impact of evolutionary dependencies on human disease but also stress their importance in the design of personalized treatments. In this context, systematically assessing the therapeutic response associated with significantly co-occurrent or mutually exclusive events will be important. Such studies will become particularly powerful as they leverage molecular data that are now routinely generated in the clinic. As therapeutic intervention exerts selective pressure on the viability of cancer cells, understanding their genomic dependencies will be crucial to detecting and possibly anticipating resistance. In this scenario, molecular profiles of the tumor at multiple time points during the course of treatment will provide invaluable data to include cancer therapies as variables in our model. Indeed, therapeutic protocols based on dynamic drug administration can modulate cancer evolution (Siravegna et al., 2015; Sun et al., 2014), and ultimately a well-characterized and constrained evolutionary process could not only be predicted but also be guided toward the selection of desirable and vulnerable phenotypes.
STAR METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• METHOD DETAILS
  ◦ Data Collection and Selected Functional Events (SFE) Identification
  ◦ The SELECT Algorithm (Pairwise Alteration Motifs)
  ◦ The SELECT Algorithm (Modules of Alteration Motifs)
  ◦ Mathematical Modelling of Cancer Evolution
• QUANTIFICATION AND STATISTICAL ANALYSIS
  ◦ Log-Normal Model Fit
  ◦ Hypothesis Testing
  ◦ CDKN2A Locus Mapping
  ◦ Pathway Analysis
  ◦ Tail Distribution Ratio
  ◦ Gene Expression Signature Enrichment Analysis
  ◦ SELECT Robustness and Power Analysis
  ◦ Drug Sensitivity in Cell Lines
• DATA AND SOFTWARE AVAILABILITY

SUPPLEMENTAL INFORMATION

Supplemental Information includes Supplemental text, seven figures, and five tables and can be found with this article online at ccell

AUTHOR CONTRIBUTIONS

Conceptualization, M.M. and G.C.; Methodology, M.M., F.R., and G.C.; Software, M.M. and F.R.; Formal Analysis, M.M., D.T., E.B., S. Sungalee, S. Saghafinia, and T.L.; Data Curation, M.M., F.S.-V., N.S., and G.C.; Writing, M.M., E.O., and G.C.; Supervision, E.O. and G.C.; Project Administration, G.C.

ACKNOWLEDGMENTS

We wish to thank Douglas Hanahan, Maria C. Donaldson, Carlo Rivolta, Sven Bergman, Nicolo Riggi, and the anonymous reviewers for insightful feedback on both the content and form of this manuscript. The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors. This work was supported by the Swiss National Science Foundation (SNSF, grant 33_6959). Additional support was provided by the Gabriella Giorgi-Cavaglieri Foundation (G.C.), the ISREC Foundation (E.O.), the Pierre Mercier Foundation (E.O.), the Marie-Josée and Henry R.
Kravis Center for Molecular Oncology (N.S.), National Cancer Institute Cancer Center Core Grant (P3- CA8748) (N.S.), the Robertson Foundation (N.S.), the Prostate Cancer Foundation (N.S.), and the Center for Metastasis Research of the Sloan Kettering Institute Foundation (F.S-V.). Received: February 28, 27 Revised: June 5, 27 Accepted: June 26, 27 Published: July 27, 27 REFERENCES Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Aparicio, S.A.J.R., Behjati, S., Biankin, A.V., Bignell, G.R., Bolli, N., Borg, A., Børresen-Dale, A.-L., et al. (23). Signatures of mutational processes in human cancer. Nature 5, Baca, S.C., Prandi, D., Lawrence, M.S., Mosquera, J.M., Romanel, A., Drier, Y., Park, K., Kitabayashi, N., MacDonald, T.Y., Ghandi, M., et al. (23). Punctuated evolution of prostate cancer genomes. Cell 53, Barbie, D.A., Tamayo, P., Boehm, J.S., Kim, S.Y., Moody, S.E., Dunn, I.F., Schinzel, A.C., Sandy, P., Meylan, E., Scholl, C., et al. (29). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK. Nature 462, 8 2. Beroukhim, R., Mermel, C.H., Porter, D., Wei, G., Raychaudhuri, S., Donovan, J., Barretina, J., Boehm, J.S., Dobson, J., Urashima, M., et al. (2). The landscape of somatic copy-number alteration across human cancers. Nature 463, Boca, S.M., Kinzler, K.W., Velculescu, V.E., Vogelstein, B., and Parmigiani, G. (2). Patient-oriented gene set analysis for cancer mutation data. Genome Biol., R2. Bozic, I., Antal, T., Ohtsuki, H., Carter, H., Kim, D., Chen, S., Karchin, R., Kinzler, K.W., Vogelstein, B., and Nowak, M.A. (2). Accumulation of driver and passenger mutations during tumor progression. Proc. Natl. Acad. Sci. USA 7, Brennan, C.W., Verhaak, R.G.W., McKenna, A., Campos, B., Noushmehr, H., Salama, S.R., Zheng, S., Chakravarty, D., Sanborn, J.Z., Berman, S.H., et al. (23). The somatic genomic landscape of glioblastoma. Cell 55, Burrell, R.A., McGranahan, N., Bartek, J., and Swanton, C. (23). 
The causes and consequences of genetic heterogeneity in cancer evolution. Nature 5, Cerami, E., Gao, J., Dogrusoz, U., Gross, B.E., Sumer, S.O., Aksoy, B.A., Jacobsen, A., Byrne, C.J., Heuer, M.L., Larsson, E., et al. (22). The cbio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, Challa-Malladi, M., Lieu, Y.K., Califano, O., Holmes, A.B., Bhagat, G., Murty, V.V., Dominguez-Sola, D., Pasqualucci, L., and Dalla-Favera, R. (2). Combined genetic inactivation of b2-microglobulin and CD58 reveals frequent escape from immune recognition in diffuse large B cell lymphoma. Cancer Cell 2, Chang, M.T., Asthana, S., Gao, S.P., Lee, B.H., Chapman, J.S., Kandoth, C., Gao, J., Socci, N.D., Solit, D.B., Olshen, A.B., et al. (26). Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat. Biotechnol. 34, Ciriello, G., Cerami, E., Sander, C., and Schultz, N. (22). Mutual exclusivity analysis identifies oncogenic network modules. Genome Res. 22, Cancer Cell 32, 55 68, August 4, 27

14 Ciriello, G., Miller, M.L., Aksoy, B.A., Senbabaoglu, Y., Schultz, N., and Sander, C. (23). Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 45, Ciriello, G., Gatza, M.L., Beck, A.H., Wilkerson, M.D., Rhie, S.K., Pastore, A., Zhang, H., McLellan, M., Yau, C., Kandoth, C., et al. (25). Comprehensive molecular portraits of invasive lobular breast cancer. Cell 63, Davis, C.F., Ricketts, C.J., Wang, M., Yang, L., Cherniack, A.D., Shen, H., Buhay, C., Kang, H., Kim, S.C., Fahey, C.C., et al. (24). The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26, de Bruin, E.C., McGranahan, N., Mitter, R., Salm, M., Wedge, D.C., Yates, L., Jamal-Hanjani, M., Shafi, S., Murugaesu, N., Rowan, A.J., et al. (24). Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science 346, Dey, P., Baddour, J., Muller, F., Wu, C.C., Wang, H., Liao, W.-T., Lan, Z., Chen, A., Gutschner, T., Kang, Y., et al. (27). Genomic deletion of malic enzyme 2 confers collateral lethality in pancreatic cancer. Nature 542, Dulic, V., Kaufmann, W.K., Wilson, S.J., Tlsty, T.D., Lees, E., Harper, J.W., Elledge, S.J., and Reed, S.I. (994). p53-dependent inhibition of cyclin-dependent kinase activities in human fibroblasts during radiation-induced G arrest. Cell 76, Dunn, S.D., Wahl, L.M., and Gloor, G.B. (28). Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics 24, Fearon, E.R., and Vogelstein, B. (99). A genetic model for colorectal tumorigenesis. Cell 6, Forment, J.V., Kaidi, A., and Jackson, S.P. (22). Chromothripsis and cancer: causes and consequences of chromosome shattering. Nat. Rev. Cancer 2, Gatza, M.L., Silva, G.O., Parker, J.S., Fan, C., and Perou, C.M. (24). An integrated genomics approach identifies drivers of proliferation in luminalsubtype human breast cancer. Nat. Genet. 
46, Giannakis, M., Hodis, E., Jasmine Mu, X., Yamauchi, M., Rosenbluh, J., Cibulskis, K., Saksena, G., Lawrence, M.S., Qian, Z.R., Nishihara, R., et al. (24). RNF43 is frequently mutated in colorectal and endometrial cancers. Nat. Genet. 46, Greaves, M., and Maley, C.C. (22). Clonal evolution in cancer. Nature 48, Iorio, F., Knijnenburg, T.A., Vis, D.J., Bignell, G.R., Menden, M.P., Schubert, M., Aben, N., Gonçalves, E., Barthorpe, S., Lightfoot, H., et al. (26). A landscape of pharmacogenomic interactions in cancer. Cell 66, Kim, J.W., Botvinnik, O.B., Abudayyeh, O., Birger, C., Rosenbluh, J., Shrestha, Y., Abazeed, M.E., Hammerman, P.S., DiCara, D., Konieczkowski, D.J., et al. (26). Characterizing genomic alterations in cancer by complementary functional associations. Nat. Biotechnol. 34, Knijnenburg, T.A., Klau, G.W., Iorio, F., Garnett, M.J., McDermott, U., Shmulevich, I., and Wessels, L.F.A. (26). Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy. Sci. Rep. 6, Kopetz, S., Desai, J., Chan, E., Hecht, J.R., O Dwyer, P.J., Maru, D., Morris, V., Janku, F., Dasari, A., Chung, W., et al. (25). Phase II pilot study of vemurafenib in patients with metastatic braf-mutated colorectal cancer. J. Clin. Oncol. 33, Lawrence, M.S., Stojanov, P., Mermel, C.H., Robinson, J.T., Garraway, L.A., Golub, T.R., Meyerson, M., Gabriel, S.B., Lander, E.S., and Getz, G. (24). Discovery and saturation analysis of cancer genes across 2 tumour types. Nature 55, Lin, H.-K., Chen, Z., Wang, G., Nardella, C., Lee, S.-W., Chan, C.-H., Yang, W.-L., Wang, J., Egia, A., Nakayama, K.I., et al. (2). Skp2 targeting suppresses tumorigenesis by Arf-p53-independent cellular senescence. Nature 464, Loeb, K.R., Kostner, H., Firpo, E., Norwood, T., D Tsuchiya, K., Clurman, B.E., and Roberts, J.M. (25). A mouse model for cyclin E-dependent genetic instability and tumorigenesis. Cancer Cell 8, Marks, D.S., Hopf, T.A., and Sander, C. (22). 
Protein structure prediction from sequence variation. Nat. Biotechnol. 3, Massarelli, E., Varella-Garcia, M., Tang, X., Xavier, A.C., Ozburn, N.C., Liu, D.D., Bekele, B.N., Herbst, R.S., and Wistuba, I.I. (27). KRAS mutation is an important predictor of resistance to therapy with epidermal growth factor receptor tyrosine kinase inhibitors in non-small-cell lung cancer. Clin. Cancer Res. 3, Mathur, R., Alver, B.H., San Roman, A.K., Wilson, B.G., Wang, X., Agoston, A.T., Park, P.J., Shivdasani, R.A., and Roberts, C.W.M. (27). ARIDA loss impairs enhancer-mediated gene regulation and drives colon cancer in mice. Nat. Genet. 49, Matthews, B.W. (975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 45, Mermel, C.H., Schumacher, S.E., Hill, B., Meyerson, M.L., Beroukhim, R., and Getz, G. (2). GISTIC2. facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2, R4. Minella, A.C., Swanger, J., Bryant, E., Welcker, M., Hwang, H., and Clurman, B.E. (22). p53 and p2 form an inducible barrier that protects cells against cyclin E-cdk2 deregulation. Curr. Biol. 2, Nowell, P.C. (976). The clonal evolution of tumor cell populations. Science 94, Oricchio, E., Ciriello, G., Jiang, M., Boice, M.H., Schatz, J.H., Heguy, A., Viale, A., de Stanchina, E., Teruya-Feldstein, J., Bouska, A., et al. (24). Frequent disruption of the RB pathway in indolent follicular lymphoma suggests a new combination therapy. J. Exp. Med. 2, Park, S., and Lehner, B. (25). Cancer type-dependent genetic interactions between cancer driver alterations indicate plasticity of epistasis across cell types. Mol. Syst. Biol., 824. Rajagopalan, H., Bardelli, A., Lengauer, C., Kinzler, K.W., Vogelstein, B., and Velculescu, V.E. (22). Tumorigenesis: RAF/RAS oncogenes and mismatchrepair status. Nature 48, 934. 
Ramos, A.H., Lichtenstein, L., Gupta, M., Lawrence, M.S., Pugh, T.J., Saksena, G., Meyerson, M., and Getz, G. (25). Oncotator: cancer variant annotation tool. Hum. Mutat. 36, E2423-E2429. Sansom, O.J., Meniel, V., Wilkins, J.A., Cole, A.M., Oien, K.A., Marsh, V., Jamieson, T.J., Guerra, C., Ashton, G.H., Barbacid, M., et al. (26). Loss of Apc allows phenotypic manifestation of the transforming properties of an endogenous K-ras oncogene in vivo. Proc. Natl. Acad. Sci. USA 3, Shannon, C.E., and Weaver, W. (964). The Mathematical Theory of Communication (The University of Illinois Press). Shinbrot, E., Henninger, E.E., Weinhold, N., Covington, K.R., Göksenin, A.Y., Schultz, N., Chao, H., Doddapaneni, H., Muzny, D.M., Gibbs, R.A., et al. (24). Exonuclease mutations in DNA polymerase epsilon reveal replication strand specific mutation patterns and human origins of replication. Genome Res. 24, Siravegna, G., Mussolin, B., Buscarino, M., Corti, G., Cassingena, A., Crisafulli, G., Ponzetti, A., Cremolini, C., Amatu, A., Lauricella, C., et al. (25). Clonal evolution and resistance to EGFR blockade in the blood of colorectal cancer patients. Nat. Med. 2, Skoulidis, F., Byers, L.A., Diao, L., Papadimitrakopoulou, V.A., Tong, P., Izzo, J., Behrens, C., Kadara, H., Parra, E.R., Canales, J.R., et al. (25). Co-occurring genomic alterations define major subsets of KRAS-mutant lung adenocarcinoma with distinct biology, immune profiles, and therapeutic vulnerabilities. Cancer Discov. 5, Spranger, S., Bao, R., and Gajewski, T.F. (25). Melanoma-intrinsic β-catenin signalling prevents anti-tumour immunity. Nature 523, Sun, C., Wang, L., Huang, S., Heynen, G.J.J.E., Prahallad, A., Robert, C., Haanen, J., Blank, C., Wesseling, J., Willems, S.M., et al. (24). Reversible and adaptive resistance to BRAF(V600E) inhibition in melanoma. Nature 58, The Genomes Project Consortium (25). A global reference for human genetic variation. Nature 526,

The Cancer Genome Atlas Research Network. (2). Integrated genomic analyses of ovarian carcinoma. Nature 474, The Cancer Genome Atlas Research Network. (22a). Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, The Cancer Genome Atlas Research Network. (22b). Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, The Cancer Genome Atlas Research Network. (23a). Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, The Cancer Genome Atlas Research Network. (23b). Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, The Cancer Genome Atlas Research Network. (23c). Integrated genomic characterization of endometrial carcinoma. Nature 497, The Cancer Genome Atlas Research Network. (24a). Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 57, The Cancer Genome Atlas Research Network. (24b). Comprehensive molecular profiling of lung adenocarcinoma. Nature 5, The Cancer Genome Atlas Research Network. (24c). Comprehensive molecular characterization of gastric adenocarcinoma. Nature 53, The Cancer Genome Atlas Research Network. (24d). Integrated genomic characterization of papillary thyroid carcinoma. Cell 59, The Cancer Genome Atlas Research Network. (25a). Genomic classification of cutaneous melanoma. Cell 6, The Cancer Genome Atlas Research Network. (25b). Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 57, The Cancer Genome Atlas Research Network. (25c). Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372, The Cancer Genome Atlas Research Network. (25d). The molecular taxonomy of primary prostate cancer. Cell 63, 25. The Cancer Genome Atlas Research Network. (26). Comprehensive molecular characterization of papillary renal-cell carcinoma. N. Engl. J. Med. 
374, The Cancer Genome Atlas Research Network, Weinstein, J.N., Collisson, E.A., Mills, G.B., Shaw, K.R.M., Ozenberger, B.A., Ellrott, K., Shmulevich, I., Sander, C., and Stuart, J.M. (23). The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45, 3 2. Thomas, R.K., Baker, A.C., DeBiasi, R.M., Winckler, W., LaFramboise, T., Lin, W.M., Wang, M., Feng, W., Zander, T., MacConaill, L.E., et al. (27). Highthroughput oncogene mutation profiling in human cancer. Nat. Genet. 39, Tusher, V.G., Tibshirani, R., and Chu, G. (2). Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98, Unni, A.M., Lockwood, W.W., Zejnullahu, K., Lee-Lin, S.-Q., and Varmus, H. (25). Evidence that synthetic lethality underlies the mutual exclusivity of oncogenic KRAS and EGFR mutations in lung adenocarcinoma. Elife 4, e697. Vandin, F., Upfal, E., and Raphael, B.J. (22). De novo discovery of mutated driver pathways in cancer. Genome Res. 22, Vogelstein, B., Papadopoulos, N., Velculescu, V.E., Zhou, S., Diaz, L.A., and Kinzler, K.W. (23). Cancer genome landscapes. Science 339, Wang, K., Li, M., and Hakonarson, H. (2). ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e64. Wang, N.J., Sanborn, Z., Arnett, K.L., Bayston, L.J., Liao, W., Proby, C.M., Leigh, I.M., Collisson, E.A., Gordon, P.B., Jakkula, L., et al. (2). Loss-offunction mutations in Notch receptors in cutaneous and lung squamous cell carcinoma. Proc. Natl. Acad. Sci. USA 8, Whittaker, S.R., Theurillat, J.-P., Van Allen, E., Wagle, N., Hsiao, J., Cowley, G.S., Schadendorf, D., Root, D.E., and Garraway, L.A. (23). A genome-scale RNA interference screen implicates NF loss in resistance to RAF inhibition. Cancer Discov. 3, Xu, S., Wen, Z., Jiang, Q., Zhu, L., Feng, S., Zhao, Y., Wu, J., Dong, Q., Mao, J., and Zhu, Y. (25). 
CD58, a novel surface marker, promotes self-renewal of tumor-initiating cells in colorectal cancer. Oncogene 34, Zack, T.I., Schumacher, S.E., Carter, S.L., Cherniack, A.D., Saksena, G., Tabak, B., Lawrence, M.S., Zhang, C.-Z., Wala, J., Mermel, C.H., et al. (23). Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, Zheng, S., Cherniack, A.D., Dewal, N., Moffitt, R.A., Danilova, L., Murray, B.A., Lerario, A.M., Else, T., Knijnenburg, T.A., Ciriello, G., et al. (26). Comprehensive pan-genomic characterization of adrenocortical carcinoma. Cancer Cell 29,

STAR+METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Deposited Data
TCGA cohort | TCGA Consortium | FireHose: cBioPortal:
ICGC cohort v. 23 | International Cancer Genome Consortium |
GENIE cohort | AACR Project GENIE Consortium | Synapse repository syn722266:
GDSC dataset | Genomics of Drug Sensitivity in Cancer |
Gene Ontology and GO Annotation | Gene Ontology Consortium |
Software and Algorithms
SELECT | This paper |
fastSemSim | N/A |
ssGSEA | Barbie et al., 29 | bioc/html/gsva.html
GISTIC 2.0 | Mermel et al., 2 |
MutSigCV | Lawrence et al., 24 | genepattern/modules/docs/mutsigcv

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and software should be directed to and will be fulfilled by the Lead Contact, Giovanni Ciriello (giovanni.ciriello@unil.ch).

METHOD DETAILS

Data Collection and Selected Functional Events (SFE) Identification

TCGA Cohort
Molecular data for the tumor types analyzed in this study were collected in July 25 from the FireHose ( broadinstitute.org/) and cBioPortal (Cerami et al., 22) data repositories for The Cancer Genome Atlas (TCGA). Only TCGA datasets publicly available at that time were used in our study (Brennan et al., 23; Ciriello et al., 25; Davis et al., 24; The Cancer Genome Atlas Research Network, 2, 22a, 22b, 23a, 23b, 23c, 24a, 24b, 24c, 24d, 25b, 25c, 25d, 26; Zheng et al., 26) (Table S).

Definition of SFE - Copy Number Alterations
Copy number segmentation files were aggregated and discrete copy number calls were derived for the whole dataset using GISTIC 2.0 (Beroukhim et al., 2). Discrete calls are defined as deep deletion (-2), shallow deletion (-1), diploid (0), low amplification (1), and high amplification (2). A gene was called altered in a sample if its status was deep deletion or high amplification. Recurrent copy number changes were identified using the GISTIC algorithm. GISTIC was run on the pan-cancer dataset to identify candidate regions of interest (ROIs).
Genes were associated with each region based on genomic location and were retained if they were: 1) differentially expressed when amplified or deleted, and 2) altered in at least 8% of the cases having an alteration in at least one gene of the region. If none of the genes in a region met condition 2, the region was split into at most 2 regions meeting the condition. If conditions 1 and 2 were never both met, the region was discarded. In our analysis none of the analyzed regions were discarded.

To define a final and comprehensive list of pan-cancer recurrent copy number changes, we aggregated ROIs obtained by running GISTIC on each single tumor type and on the pan-cancer dataset as a whole. Specifically, we evaluated the set of altered samples shared between each ROI found in a single cancer study (query) and ROIs detected in the pan-cancer dataset that were at most MB away from the query ROI. If the query ROI had less than 5% of altered samples in common with its closest match, it was included in the final set of ROIs; otherwise it was discarded as redundant. After all single studies had been considered, we iteratively re-ran a procedure matching each ROI in our list against all others to minimize redundancies. In total, we identified 65 and 64 non-redundant regions

of recurrent amplification and deletion, respectively (copy number SFE, Table S). A sample was called altered for a given region if at least one gene in the region was altered. Copy number events of the same type and within the same chromosome were not tested for co-occurrence in the analysis of alteration motifs, as genomic proximity invariably and strongly influences the frequency of concurrent events.

Definition of SFE - Single Point Variants
We aggregated Mutation Annotation Files (MAFs) from each cancer type to derive a single mutation file for the 6,456 cases where both mutation and copy number data were available. Gene symbols were harmonized to the latest HGNC nomenclature ( genenames.org/). Variants were re-annotated using Oncotator (Ramos et al., 25), and ANNOVAR (Wang et al., 2) was used to compute the global minor allele frequency (GMAF) in the normal population. Variants were retained if GMAF was below .5% (source: Phase III Genome data (The Genomes Project Consortium, 25)). This file included ,642,432 somatic variants. On this final MAF, we ran the MutSigCV (Lawrence et al., 24) algorithm to identify recurrently mutated genes and aggregated the results with those obtained by the same method on each single tumor type, as done for copy number alterations. In the end, we obtained a list of 76 recurrently mutated genes (mutation SFE, Table S). The MAF was further refined to include variants only for these 76 genes and only if they were classified as frame-shift deletions or insertions, in-frame deletions or insertions, missense, nonsense, or splice-site modifications. In total, we identified 32,52 somatic variants for these 76 genes, of which 22,485 were missense mutations and ,35 were putative truncating mutations. Missense mutations were analyzed for the presence of recurrently mutated amino acids, i.e. hotspots, using a recently proposed approach (Chang et al., 26).
Only 5,834 missense mutations were classified as hotspots based on our analysis or on what was reported in the original manuscript (Chang et al., 26), and those were retained in the final MAF. Finally, for a few tumor types we noticed a few hotspot in-frame mutations in specific genes that were reported neither in the corresponding TCGA manuscript nor in COSMIC, and that were associated with poly-Q sequences. These events were removed from our MAF as suspected artifacts.

Genomic Alteration Matrix (GAM)
The resulting set of 65 amplifications, 64 deletions, and 76 recurrent mutation events constitutes our final collection of putative selected functional events or SFEs (Table S). We adopted a binary alteration-call abstraction: given a sample and an SFE, the SFE either occurs (1) or does not occur (0) in that sample. Copy number alterations were called as occurring if GISTIC found a deep deletion (-2) or high amplification (2) for genes in the deleted or amplified region; mutations were called as occurring based on the variants reported in the MAF filtered as described above. Based on this concept, we built a binary genomic alteration matrix (GAM) with samples as columns and SFEs as rows, where each entry (i,j) reports the alteration call for SFE i in sample j. This matrix was used as input for all analyses described in this manuscript.

GENIE Cohort Data Collection and Processing
Tumor data from the AACR Genomics, Evidence, Neoplasia, Information, Exchange (GENIE) project were downloaded on April 27 from the Synapse repository syn, for a total of 9,966 patients. Alteration events defined in the TCGA pan-cancer cohort that were also tested in this dataset were used as SFEs, and single point mutation data and discretized copy number changes were reduced to a binary GAM as previously described.
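The alteration-call abstraction behind the GAM can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_gam`, the input layout, and the toy gene and sample names are all assumptions made for illustration. Only the calling rule (deep deletion = -2, high amplification = 2, or a retained variant in the filtered MAF) comes from the text.

```python
# Minimal sketch with hypothetical names: reduce discrete copy number calls
# and filtered mutations to a binary genomic alteration matrix (GAM).
def build_gam(sfes, samples, cn_calls, mutated):
    """Return {sfe_name: [0/1 call per sample]}.

    sfes     -- list of (name, kind, genes), kind in {"AMP", "DEL", "MUT"}
    cn_calls -- {sample: {gene: discrete copy number call in -2..2}}
    mutated  -- {sample: set of genes carrying a retained variant}
    """
    gam = {}
    for name, kind, genes in sfes:
        row = []
        for s in samples:
            calls = cn_calls.get(s, {})
            if kind == "AMP":    # high amplification of at least one gene in the region
                hit = any(calls.get(g, 0) == 2 for g in genes)
            elif kind == "DEL":  # deep deletion of at least one gene in the region
                hit = any(calls.get(g, 0) == -2 for g in genes)
            else:                # mutation SFE
                hit = any(g in mutated.get(s, set()) for g in genes)
            row.append(1 if hit else 0)
        gam[name] = row
    return gam


# Toy input (invented for illustration only).
samples = ["S1", "S2", "S3"]
sfes = [("AMP_regionA", "AMP", ["GENE1", "GENE2"]),
        ("DEL_regionB", "DEL", ["GENE3"]),
        ("MUT_GENE4", "MUT", ["GENE4"])]
cn_calls = {"S1": {"GENE1": 2, "GENE3": -1},
            "S2": {"GENE2": 1, "GENE3": -2},
            "S3": {}}
mutated = {"S1": {"GENE4"}, "S3": {"GENE4"}}
gam = build_gam(sfes, samples, cn_calls, mutated)
```

Shallow deletions (-1) and low amplifications (1) deliberately do not count as alterations here, mirroring the rule stated for copy number SFEs.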
ICGC Cohort Data Collection and Processing
Tumor data from the International Cancer Genome Consortium (ICGC) were downloaded on April 27 (data freeze version 23) from the ICGC Data Portal. Single point somatic mutation data were available for 6 of the 7 projects listed, accounting in total for 9,292 samples. Alteration events defined in the TCGA pan-cancer cohort that were also tested in this dataset were used as SFEs, and single point mutation data and discretized copy number changes were reduced to a binary GAM as previously described.

The SELECT Algorithm (Pairwise Alteration Motifs)
In this study, we designed an information-theoretic approach to identify pairs of selected functional events (SFEs) that occur in the same sample more or less frequently than expected by chance. We call these non-random patterns of occurrence motifs. Motifs can represent either mutually exclusive or co-occurring alterations, whereas modules are defined as combinations of such patterns for a group of alterations. The complete description of the methodology is available in Methods S.

Input Data
The only required input is a binary Genomic Alteration Matrix (GAM) representing the occurrence of each alteration in the samples. The method analyzes all alteration pairs in an unbiased way, without requiring any additional prior (e.g. functional networks or pathway annotations). Multiple tumor types/subtypes can be analyzed together, as the methodology natively preserves these tumor groups in the null model and in the scoring procedure. In this case an additional table indicating the tumor type and subtype of each sample is required.

Step 1. Quantify the Strength of Pairwise Patterns
First, we use a weighted version of Mutual Information (wmi) to quantify the interdependence between two alterations x and y (Figures S5A-S5C):

$$\mathrm{wmi}(x,y) = H(x,y) - H(x) - H(y)$$

where H(x) and H(x,y) are the weighted Entropy functions

$$H(x) = \sum_{i \in \{0,1\}} \frac{f_x(i)}{w_{x,y}} \log \frac{f_x(i)}{w_{x,y}}$$

$$H(x,y) = \sum_{i \in \{0,1\}} \sum_{j \in \{0,1\}} \frac{f_{x,y}(i,j)}{w_{x,y}} \log \frac{f_{x,y}(i,j)}{w_{x,y}}$$

with $f_x(1)$ being the frequency of occurrence of alteration x, $f_x(0) = 1 - f_x(1)$, and $f_{x,y}(i,j)$ the observed frequency of each possible combination of alterations x and y. The total number of samples in H(x) is corrected to give low weight to samples from tumor (sub)types where alterations x and y are rarely or never observed. The $w_{x,y}$ correction factor is defined, separately for each tumor type t, as the sigmoid function

$$w_{x,y} = \frac{1}{1 + e^{-l\left(\min\left(f_x(1),\, f_y(1)\right) - r\right)}}$$

with l = 5 and r = . (Figure S5D). This correction allows wmi to detect both mutual exclusivity (ME) and co-occurrence (CO) patterns between rare alterations (Figures S5E and S5F).

Step 2. Statistical Significance and Effect Size Estimation
Second, we determine whether observed pairwise alteration patterns are likely or not to occur by chance. We generate a null distribution by repeatedly randomizing the input GAM ( times in this study) and compare the observed wmi value of each pair of SFEs to the values the pair obtains within these randomized GAMs (Figure S5G). The core of our randomization procedure is a controlled permutation of values within the GAM. This approach interprets the GAM as the adjacency matrix of a bipartite graph where nodes are either samples or alterations, and alterations are connected to the samples where they occur. A random bipartite graph is then generated by edge switching (Ciriello et al., 22), such that it retains the same number of edges at each node (i.e. both samples and alterations retain their frequencies). Furthermore, in this study we divided the actual pan-cancer alteration matrix into different blocks, such that each block contained only samples from the same tumor type or subtype (if present) and only one type of alteration (i.e. either mutations, or copy number amplifications, or copy number deletions).
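The degree-preserving edge switching described above can be sketched in a few lines of Python. This is an illustrative toy and not the SELECT implementation: the function name `edge_switch`, the number of swaps, and the seed handling are assumptions, and the sketch randomizes a single block, whereas the procedure in the text is applied block by block.

```python
import random


def edge_switch(block, n_swaps, seed=0):
    """Randomize a binary matrix (rows = SFEs, columns = samples) by edge switching.

    Row sums (SFE frequencies) and column sums (alterations per sample)
    are left unchanged. `n_swaps` and `seed` are hypothetical parameters.
    """
    rng = random.Random(seed)
    edges = [(i, j) for i, row in enumerate(block) for j, v in enumerate(row) if v]
    present = set(edges)
    done, attempts = 0, 0
    while len(edges) >= 2 and done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        a, b = rng.sample(range(len(edges)), 2)
        (i1, j1), (i2, j2) = edges[a], edges[b]
        # Proposed switch: (i1,j1),(i2,j2) -> (i1,j2),(i2,j1).
        if i1 == i2 or j1 == j2:
            continue  # switching would change nothing
        if (i1, j2) in present or (i2, j1) in present:
            continue  # would create a duplicate edge
        present.difference_update([(i1, j1), (i2, j2)])
        present.update([(i1, j2), (i2, j1)])
        edges[a], edges[b] = (i1, j2), (i2, j1)
        done += 1
    out = [[0] * len(block[0]) for _ in block]
    for i, j in present:
        out[i][j] = 1
    return out


block = [[1, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 1]]
randomized = edge_switch(block, n_swaps=20, seed=1)
```

Rejecting switches that would duplicate an existing edge is what keeps the matrix binary while both margins stay fixed.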
Each block is separately permuted as described above and then re-assembled into one randomized pan-cancer matrix. The combination of edge switching and matrix partitioning thus preserves: 1) the frequency of each SFE, 2) the number of alterations observed in each sample, 3) the total number of alterations within each tumor type and subtype, and 4) the total number of alterations of a specific type in each tumor type and subtype. Finally, we note that copy number SFEs located on the same chromosome were not tested for co-occurrence, as copy number alterations are often large and can encompass several ROIs.

We quantify the effect size $\widetilde{\mathrm{wmi}}(x,y)$ of each pattern as the difference between the observed wmi and the expected, or random, wmi. An empirical p value is derived for each SFE pair as the percentage of random scores for the specific pair greater than or equal to the observed one. Finally, the procedure defined by Tusher and colleagues (Tusher et al., 2) is applied to control the false discovery rate (FDR).

Step 3. ASC Correction
To penalize indirect, transitive effects and promote direct associations, we correct the scores with the Average Sum Correction (ASC) procedure (Dunn et al., 28) as follows:

$$\mathrm{CorrectedScore}(x,y) = k\left[\widetilde{\mathrm{wmi}}(x,y) - s(x,\cdot) - s(\cdot,y) + s(\cdot,\cdot)\right]$$

where $s(x,\cdot)$ and $s(\cdot,y)$ are the mean scores obtained by x and y, respectively, $s(\cdot,\cdot)$ is the global mean, and k is a re-scaling constant (here k = ). Whenever the score of a motif is mentioned in the text, it always refers to the ASC-corrected score.

Application to the TCGA Dataset
We applied the pairwise pattern analysis to the GAM of 6,456 cases and 55 SFEs. As a pre-processing step, we discarded 7 SFEs that occurred in fewer than 6 samples (frequency <.%), as well as 42 samples with no SFE occurrences, reducing the GAM to 635 samples and 488 SFEs. Motifs were retained if their scores were greater than the threshold score S_sig =.3, corresponding to the top 75% of the scores obtained by motifs with FDR <. before ASC and to the top 5% of those with FDR >..

Application to the GENIE Cohort
SELECT was run on the GENIE cohort with the same parameters used for the TCGA dataset. The threshold S_sig was selected as for the TCGA dataset (the minimum score at which 95% of the interactions with FDR >. are discarded). We selected samples profiled by either the MSKCC or DFCI center, as these were tested for the greatest number of alterations. To avoid batch effects, the center of origin was taken into account as a covariate in the randomization process. SELECT was also run separately on the samples coming from the MSKCC and DFCI centers. The analyses of the different cohorts led to consistent and similar results (Table S3).

Application to the ICGC Cohort
SELECT was run on the ICGC cohort with the same parameters used for the TCGA dataset. The threshold S_sig was selected as for the TCGA dataset (the minimum score at which 95% of the interactions with FDR >. are discarded). To avoid any batch effect, the project of origin was taken into account as a covariate in the randomization process.
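The scoring steps of the pairwise analysis can be made concrete with a small Python sketch. It is a simplification under stated assumptions: `mutual_information` is the unweighted special case (the tumor-type weights are ignored) and takes each entropy term as a sum of f log f; `asc_correct` applies the rescaling constant to the whole corrected score with a placeholder default `k=1.0`, because the value used in the study is not legible in this transcription. All function names are hypothetical.

```python
import math


def mutual_information(x, y):
    """Unweighted mutual information of two binary vectors.

    Each entropy term is sum(f * log f), so the score is
    H(x,y) - H(x) - H(y) and is non-negative.
    """
    n = len(x)

    def h(counts):
        return sum((c / n) * math.log(c / n) for c in counts if c > 0)

    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    return h(joint.values()) - h([x.count(0), x.count(1)]) - h([y.count(0), y.count(1)])


def empirical_p(observed, null_scores):
    """Fraction of randomized scores greater than or equal to the observed one."""
    return sum(s >= observed for s in null_scores) / len(null_scores)


def asc_correct(scores, k=1.0):
    """Average Sum Correction on a symmetric score matrix (diagonal ignored).

    k is a placeholder rescaling constant, not the value used in the study.
    """
    n = len(scores)
    row_mean = [sum(scores[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    global_mean = sum(row_mean) / n
    return [[k * (scores[i][j] - row_mean[i] - row_mean[j] + global_mean) if i != j else 0.0
             for j in range(n)] for i in range(n)]


perfect = mutual_information([1, 1, 0, 0], [1, 1, 0, 0])      # identical vectors
independent = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])  # independent vectors
corrected = asc_correct([[0, 3, 1], [3, 0, 2], [1, 2, 0]])
```

Subtracting the row and column means penalizes pairs whose high score mostly reflects one alteration scoring highly against everything, which is the intent of the correction described in Step 3.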

Tumor Type-Specificity Analysis
Tumor type-specificity of the 63 motifs detected in the pan-cancer analysis (pan-cancer motifs) was studied by running SELECT on each single tumor type and assessing their tendency toward mutual exclusivity or co-occurrence within each individual tumor type. Given the reduced statistical power of single tumor-type datasets, we relied on the effect size rather than the statistical significance of the differences between the observed and expected patterns of alteration. A pan-cancer motif (x,y) was tested within a tumor type t if both alterations x and y occurred in at least 5 samples in t. We considered a motif as having a trend towards co-occurrence or mutual exclusivity in tumor type t if the observed overlap between occurrences of x and y was outside the 84th percentile of the expected overlap distribution (corresponding to one standard deviation in a normal distribution). Because of their significance at the pan-cancer level, pan-cancer mutual exclusivity motifs with an overlap of 0 in tumor type t were always considered as showing a trend towards mutual exclusivity in t.

The SELECT Algorithm (Modules of Alteration Motifs)
To identify significant modules of evolutionary dependencies from a network of pairwise motifs between SFEs, SELECT (1) first uses a seed-and-extend procedure to identify naive candidate modules, (2) then generates consensus modules by clustering the candidate modules identified at the first step, and (3) finally tests each consensus module for statistical significance using random modules extracted from randomized networks of pairwise motifs. The complete description of the methodology is available in Methods S.

Input Data
Our approach only requires a network of pairwise motifs as input data.

Step 1. Naive Modules Extraction
Naive candidate modules are built from an initial set of seed motifs.
Each seed is locally extended by adding the node connected to the seed by the set of motifs with the highest sum of scores (Figure S6A). Candidate modules are extended until n SFEs are included (here n = ).

Step 2. Hierarchical Consensus Clustering
Naive candidate modules are clustered based on the set of SFEs they contain, using complete-linkage agglomerative hierarchical clustering with the Manhattan distance (R function hclust). The internal nodes of the dendrogram obtained from the hierarchical clustering represent groups of modules sharing a high number of alterations and alteration motifs. An internal node is flagged as a candidate consensus module if (a) there is at least one alteration shared by more than 8% of the naive modules clustered under the same internal node; (b) the direct descendants of the internal node share the same core of frequent alterations (see Supplemental Information); and (c) the direct ancestor of the internal node is not a candidate consensus module (Figure S6B).

Step 3. Statistical Significance Analysis
A null distribution of random consensus modules is generated by applying the module discovery algorithm to randomized networks of pairwise motifs. First, random networks are generated by re-shuffling the edge weights of the input network while preserving the original topology. Then, random consensus modules are extracted from these random networks. Finally, the number of edges and the sum of edge weights of each observed consensus module are compared with those exhibited by random consensus modules with the same number of SFEs. From this comparison, empirical p values are derived for the number of edges and the sum of edge weights of each observed consensus module. The Benjamini-Hochberg FDR adjustment is applied to correct for multiple hypothesis testing. Observed consensus modules with an FDR ≤ . for either the number of edges or the sum of edge weights are retained as significant.
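The seed-and-extend idea of Step 1 can be sketched as a greedy loop in Python. This is one plausible reading of the rule, not the published algorithm: the function name `extend_seed`, the edge-weight layout, the alphabetical tie-breaking, and the toy network are assumptions. The full description is in the supplemental methods, which are not reproduced here.

```python
def extend_seed(seed, weights, n):
    """Greedily grow a candidate module from a seed motif.

    seed    -- tuple with the two SFEs of the seed motif
    weights -- {frozenset({a, b}): motif score}, positive scores assumed
    n       -- maximum number of SFEs in the module
    """
    module = list(seed)
    nodes = {v for edge in weights for v in edge}
    while len(module) < n:
        best, best_sum = None, 0.0
        # Sorted iteration makes ties resolve deterministically (an assumption).
        for cand in sorted(nodes - set(module)):
            total = sum(weights.get(frozenset((cand, m)), 0.0) for m in module)
            if total > best_sum:
                best, best_sum = cand, total
        if best is None:  # no remaining node is connected to the module
            break
        module.append(best)
    return module


# Toy motif network (invented for illustration only).
weights = {frozenset(("A", "B")): 1.0,
           frozenset(("A", "C")): 0.5,
           frozenset(("B", "C")): 0.6,
           frozenset(("B", "D")): 0.9,
           frozenset(("C", "D")): 0.2}
module = extend_seed(("A", "B"), weights, n=3)
```

Here C is preferred over D because its two motifs with the seed sum to 1.1, whereas D reaches only 0.9 through a single motif.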
Application to the TCGA Dataset
As input network to the module discovery step, we used the list of motifs with score > 0 identified by the pairwise analysis. The motif scores of these interactions were used as edge weights in the network. Motifs identified in the pan-cancer dataset with score ≥ S_sig were used as seeds. We applied the module discovery process (Steps 1-3) iteratively: after each iteration we removed the significant modules from the input network and repeated the method on the reduced network. In our analyses, we stopped after the third iteration. In the first iteration 9 modules were found significant, while the second and third rounds identified 2 and additional modules, respectively.

Mathematical Modelling of Cancer Evolution
Our main hypothesis is that mutual exclusivity and co-occurrence patterns emerge because of an underlying structure of dependencies that determines the selection process during cancer evolution. To test this hypothesis, we simulated the growth of cancer samples using a previously proposed mathematical model of cancer evolution (Bozic et al., 2). According to this model, at each iteration a cell either replicates or dies with complementary probabilities p and (1-p), and a replicating cell acquires either a passenger or a selected mutation with probabilities m_p and m_s, respectively. The probability of replicating, p, increases with the number of selected mutations in the cell. We modified this model to track which genes out of 2, are mutated, with 5 genes having m_s > 0 and with m_s varying between these 5 genes so as to obtain a distribution of selected alteration frequencies resembling those observed in our human dataset (Figures S7A and S7B). This model assumes that events are selected independently of each other (independently selected events or ISE), i.e. the emergence and selection of one mutation does not influence the probability of others being selected. We compare this model with one assuming conditionally selected events (CSE).
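A toy branching process in the spirit of the ISE model can be written in a few lines of Python. It is only a sketch under loud assumptions: the linear rule `p = min(0.95, p0 + gain * k)`, the population cap, and every parameter value are invented for illustration and are not taken from Bozic et al. or from this study.

```python
import random


def grow(generations, p0, gain, m_s, m_p, max_cells=2000, seed=0):
    """Simulate clonal growth with independently selected events.

    Each cell is (set of selected genes, passenger count). A cell replicates
    with probability p, which rises with its number of selected mutations,
    and dies otherwise. The functional form of p is a hypothetical choice.
    """
    rng = random.Random(seed)
    cells = [(frozenset(), 0)]
    n_genes = len(m_s)
    for _ in range(generations):
        nxt = []
        for selected, passengers in cells:
            p = min(0.95, p0 + gain * len(selected))
            if rng.random() >= p:
                continue  # the cell dies
            nxt.append((selected, passengers))  # one daughter is unchanged
            g = rng.randrange(n_genes)
            u = rng.random()
            if u < m_s[g]:                      # selected mutation in gene g
                nxt.append((selected | {g}, passengers))
            elif u < m_s[g] + m_p:              # passenger mutation
                nxt.append((selected, passengers + 1))
            else:
                nxt.append((selected, passengers))
        cells = nxt[:max_cells]  # crude cap to keep the toy example small
        if not cells:
            break
    return cells


population = grow(generations=20, p0=0.55, gain=0.05, m_s=[0.1] * 5, m_p=0.2, seed=1)
```

A conditional-selection variant would make the chance of retaining a selected mutation depend on which neighbors in a dependency network are already mutated; that step is intentionally left out of this sketch.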
In the CSE model, we bias the alteration selection process by imposing an underlying dependency network. A dependency network is defined between the 5 genes with m_s > 0, such that once one of these genes is mutated and its mutation selected, it influences the probability of selecting mutations affecting its

neighbors, either positively (increased probability of selection) or negatively (decreased probability of selection). The ISE model is equivalent to a CSE model with an empty dependency network. In this work we considered the following three dependency networks between n=5 alterations:
1. Null network: a graph with no edges, representing the absence of any conditional selection (ISE model).
2. Random d2 network: each node is connected, at random, to two other nodes. The sign of the interaction is randomly chosen (CSE-sparse model).
3. Dense modular network: within this network six communities of alterations are strongly positively connected within themselves, but negatively connected between each other (CSE-dense model).
A complete description of the methodology is available in Methods S.

Generation and Analysis of Synthetic Data
For each dependency network we generated a cohort of samples, using the model parameters indicated in Figure S7C. We applied SELECT to each synthetic cohort and compared the distribution of EIS scores between cohorts using the Tail Distribution Ratio approach.

QUANTIFICATION AND STATISTICAL ANALYSIS

Log-Normal Model Fit
We used the R library MASS to fit log-normal models to the distributions of the number of alterations per sample.

Hypothesis Testing
Unless otherwise specified, the Wilcoxon rank-sum test was used to assess significance when comparing two distributions. The Kruskal-Wallis test was used when comparing three or more distributions.

CDKN2A Locus Mapping
Somatic mutations affecting CDKN2A were re-annotated using the Variant Effect Predictor ( tools/vep/index.html) to specifically distinguish the predicted effects of mutations on p16 and ARF. RefSeq transcript IDs for p16 and ARF were NM_77.4 and NM_, respectively.

Pathway Analysis
We functionally annotated each SFE based on a manually curated controlled vocabulary of categories including oncogenic pathways, cellular processes and protein functions.
The annotation was automatically performed using Gene Ontology with the tool fastSemSim, followed by manual curation.

Tail Distribution Ratio
We defined the concept of tail distribution ratio (TDR) to compare the distributions of scores obtained by two separate sets of alteration motifs. The tail distribution of a random variable is the complement of its cumulative distribution, i.e. TD_i(X) expresses the probability of a variable i assuming values greater than X. In our analyses, for a given score X, we estimated that probability as the fraction of pairs with score > X. We separately estimated these values for motifs in each set (e.g. within-pairs and between-pairs) and took the ratio of the two TD(X) values, hence the term tail distribution ratio (TDR).

Gene Expression Signature Enrichment Analysis
We developed a two-step enrichment procedure to assess the association between co-occurrence motifs and gene expression signatures. First, single sample Gene Set Enrichment Analysis (ssGSEA) (Barbie et al., 29), implemented in the R package GSVA, was used to calculate an expression score for each gene expression signature and each sample. The default parameters of the GSVA package were used. Second, we tested for differential signature scores associated with the co-occurrence motifs. Specifically, we compared signature scores of samples with only one of the two alterations (or altered pathways) with those of samples having both alterations (or altered pathways). ANOVA was used to model the signature scores as a linear combination of the alteration status (double altered vs single altered) and tumor subtype. The enrichments were performed separately on each tumor type. Co-occurrence motifs with fewer than four double altered samples were not tested.
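The tail distribution ratio defined above reduces to two empirical fractions and their quotient. The following Python sketch shows the computation on invented scores; the function names and the handling of an empty denominator tail are assumptions, since the text does not say how that case is treated.

```python
def tail_distribution(scores, x):
    """Empirical tail distribution: fraction of scores strictly greater than x."""
    return sum(s > x for s in scores) / len(scores)


def tail_distribution_ratio(set_a, set_b, x):
    """Ratio TD_a(x) / TD_b(x) between two sets of motif scores.

    Returns infinity when only the denominator tail is empty and NaN when
    both tails are empty (a hypothetical convention).
    """
    td_a = tail_distribution(set_a, x)
    td_b = tail_distribution(set_b, x)
    if td_b == 0:
        return float("inf") if td_a > 0 else float("nan")
    return td_a / td_b


# Toy scores (invented for illustration only).
within_pairs = [0.1, 0.4, 0.6, 0.9]
between_pairs = [0.1, 0.2, 0.3, 0.8]
tdr = tail_distribution_ratio(within_pairs, between_pairs, x=0.35)
```

In this toy case three of four within-pair scores exceed 0.35 against one of four between-pair scores, so the ratio is 3. Evaluating the ratio over a grid of thresholds X gives the full TDR curve.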
SELECT Robustness and Power Analysis

Robustness to Noise in Alteration Occurrence Calls
To test the robustness of the results in the presence of noise in the alteration calls, we artificially added or removed alteration occurrences in the original GAM. For a given noise fraction k (k =.,.2,.5,.,.2), k*N alterations were added to or removed from the original GAM, where N is the total number of alteration occurrences in the original GAM. The number of occurrences added or deleted for each SFE was proportional to the SFE frequency. The process was repeated times for each noise fraction k, and

SELECT was then run on the resulting GAMs. The robustness of SELECT was evaluated in terms of recall (the fraction of the 63 interactions significant in the original matrix that were correctly identified in the noisy matrices).

Robustness to Noise in SFE Selection
We also verified to what extent results would change when adding new SFEs or removing any of the 55 original ones. We simulated the addition and removal of s SFEs to the original dataset (s =, 2, 3, 4, 5 and ), and ran SELECT on the resulting GAMs. The robustness of SELECT was evaluated in terms of recall and Matthews Correlation Coefficient (MCC), taking into account both false positives and false negatives.

Power Analysis
To estimate the power of detecting significant motifs as a function of the number of samples, SELECT was run on subsampled versions of the original GAM (subsampling levels: 5,, 2, 3, 4, 5, 55 and 6 samples). The subsampling process was repeated times for each subsampling level, and the original tumor type composition was preserved in the subsampled GAMs. The 63 interactions found by SELECT on the original GAM were binned according to the maximum and minimum frequencies of the two alterations and their overlap frequency. Within each bin, the power of SELECT at a given subsampling level corresponds to the average percentage of interactions found significant in the subsampled GAMs.

Drug Sensitivity in Cell Lines

Data Collection and Preprocessing
Molecular profiles and drug sensitivity data (expressed as IC50 concentrations) for a panel of 74 cancer cell lines screened for 265 drugs by the COSMIC Cell Line Project (Iorio et al., 26) were downloaded from the COSMIC Data Portal ( cell_lines/, data freeze v79) and the GDSC portal ( data freeze GDSC). Only the 927 cell lines with both genomic and drug sensitivity data available were considered in the downstream analyses. The continuous IC50 values were dichotomized into sensitive and resistant classes as described in (Knijnenburg et al., 26).
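The power analysis relies on subsampling that preserves tumor type composition, with recall as the readout. A minimal Python sketch of both pieces follows. The proportional-allocation rule, the function names, and the toy cohort are assumptions; the text does not specify how the composition was preserved in practice.

```python
import random


def stratified_subsample(tumor_type, n_target, seed=0):
    """Draw about n_target samples while keeping tumor type proportions.

    tumor_type -- {sample_id: tumor type}
    Per-type sizes are rounded, so the total can deviate slightly from n_target.
    """
    rng = random.Random(seed)
    by_type = {}
    for sample, t in tumor_type.items():
        by_type.setdefault(t, []).append(sample)
    total = len(tumor_type)
    chosen = []
    for t in sorted(by_type):
        k = round(n_target * len(by_type[t]) / total)
        chosen.extend(rng.sample(sorted(by_type[t]), min(k, len(by_type[t]))))
    return chosen


def recall(reference, found):
    """Fraction of reference interactions that are recovered."""
    return len(set(reference) & set(found)) / len(set(reference))


# Toy cohort (invented): 60 samples of type A and 40 of type B.
tumor_type = {f"A{i}": "A" for i in range(60)}
tumor_type.update({f"B{i}": "B" for i in range(40)})
subset = stratified_subsample(tumor_type, n_target=50, seed=3)
```

Running the pairwise analysis on many such subsets and averaging `recall` against the motifs found on the full matrix gives a power estimate per subsampling level.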
Subsequent analyses were performed in Python 2.7.2, using the NumPy, SciPy, Pandas and Matplotlib libraries, and in R.

Similarity Analysis of Double vs Single Mutant Drug Sensitivity Profiles
For a given pairwise motif between SFEs x and y, we compared the similarity of drug sensitivity in double-mutant cell lines (i.e., harboring both x and y alterations) and single-mutant cell lines (i.e., harboring either x or y alterations, but not both). For each pair of cell lines, sensitivity profiles were assembled from the drugs that were tested on both cell lines (shared drugs), using the binary sensitivity calls (resistant or sensitive) described above instead of IC50 values. The similarity between the sensitivity profiles of two cell lines was measured by the Matthews Correlation Coefficient (Matthews, 1975). Pairs of cell lines with too few shared drugs were discarded. For each co-occurrence motif two sets were derived: a first set containing the similarities between double-mutant cell lines, and a second set comprising the similarities between single- and double-mutant cell lines. The difference between the two sets was tested with a one-tailed Wilcoxon rank-sum test. The total number of significant motifs was then compared to the number of motifs expected to be significant after reshuffling the cell line labels within tumor types. The reshuffling was repeated multiple times, and the empirical p values in Figure 4A were derived as the fraction of random runs returning a number of significant motifs greater than or equal to the one observed.

Similarity Analysis of Double Mutant vs Intra-Tumor Type Drug Sensitivity Profiles
The similarity between double-mutant cell lines was compared to the background distribution of pairwise similarity among all the cell lines within the same tumor type.
To account for the differences in similarity distributions across tumor types (Figure S4A), intra- and inter-tumor type similarity scores were quantile-normalized to a uniform reference distribution ranging between 0 and 1. Cell lines without a reported TCGA tumor type annotation were discarded. For each motif, a one-tailed Wilcoxon rank-sum test was used to test whether the similarity scores of double-mutant cell lines were greater than those of the background distribution. The total number of significant motifs was then compared to the number of motifs expected to be significant after reshuffling the cell line labels within tumor types. The reshuffling was repeated multiple times, and the empirical p values in Figure 4A were derived as the fraction of random runs returning a number of significant motifs greater than or equal to the one observed.

Association between SFE Interactions and Specific Drug Sensitivity
Given a motif and a specific drug D, we compared the sensitivity to D of single- and double-mutant cell lines. For this purpose we employed the statistical framework based on ANOVA modeling proposed in (Iorio et al., 2016). Drug response (continuous IC50 values, log scaled) is modeled as a linear combination of the alteration status (dichotomized into double-altered and single-altered classes), the tissue of origin of the cell lines, the screening medium, the tumor instability class, and the growth properties of the cell lines (adherent/suspension). As previously done, we retained significant results with an FDR < 0.25.

DATA AND SOFTWARE AVAILABILITY
The R implementation of the SELECT algorithm and the TCGA GAM are publicly available at ciriellolab.org/select/select.html.
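The profile-similarity and permutation steps described in the drug sensitivity methods above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: profiles are represented as dictionaries from drug name to a 0/1 call, the function names `mcc_similarity` and `empirical_p` are hypothetical, and `min_shared` is a placeholder for the shared-drug threshold. Returning 0 when a profile is constant is a common convention rather than something stated in the text.

```python
import math

def mcc_similarity(profile_a, profile_b, min_shared=1):
    """Matthews Correlation Coefficient between two binary profiles over shared drugs.

    profile_a, profile_b: dict mapping drug name -> 0/1 call (hypothetical layout).
    Returns None when fewer than min_shared drugs were tested on both cell lines.
    """
    shared = sorted(set(profile_a) & set(profile_b))
    if len(shared) < min_shared:
        return None
    tp = sum(1 for d in shared if profile_a[d] == 1 and profile_b[d] == 1)
    tn = sum(1 for d in shared if profile_a[d] == 0 and profile_b[d] == 0)
    fp = sum(1 for d in shared if profile_a[d] == 0 and profile_b[d] == 1)
    fn = sum(1 for d in shared if profile_a[d] == 1 and profile_b[d] == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when one profile has a single class
    return (tp * tn - fp * fn) / denom

def empirical_p(observed, null_counts):
    """Fraction of permutation runs with at least as many significant motifs as observed."""
    return sum(1 for c in null_counts if c >= observed) / float(len(null_counts))
```

Here `null_counts` would hold the number of significant motifs obtained in each run after reshuffling cell line labels within tumor types, and `observed` the count obtained with the true labels.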

Cancer Cell, Volume 32

Supplemental Information

Conditional Selection of Genomic Alterations Dictates Cancer Evolution and Oncogenic Dependencies

Marco Mina, Franck Raynaud, Daniele Tavernari, Elena Battistello, Stephanie Sungalee, Sadegh Saghafinia, Titouan Laessle, Francisco Sanchez-Vega, Nikolaus Schultz, Elisa Oricchio, and Giovanni Ciriello

Table S. Related to Figure. Tumor type and subtype annotation for each sample, and list of SFEs.

[Figure. Panel A: schematic of the SELECT pipeline (patients' molecular profiles, genomic alteration matrix, observed vs. random patterns, ASC correction, weighted mutual information, pruning of transitive associations, scoring of pairwise interactions, mutual exclusivity and co-occurrence network). Panels B, C: number of significant motifs vs. number of samples, with minimum and maximum alteration frequency distributions. Panel D: probability of overlap vs. number of samples, by alteration frequency. Panels E, F: fraction and number of recalled motifs under sub-sampling, by maximum frequency, minimum frequency, and overlap.]

Figure S. Related to Figure 2: The SELECT algorithm: schematic and power analysis.
(A) Schematic of the pairwise interaction analysis pipeline.
(B, C) Top boxplots: number of significant motifs identified by SELECT (y-axis) on subsampled versions of the TCGA cohort (x-axis: number of samples considered) for co-occurrence (B) and mutual exclusivity (C) motifs. The thick central line of each boxplot represents the median number of significant motifs, the bounding box corresponds to the percentiles, and the whiskers extend up to 1.5 times the interquartile range. Bottom plots: composition of significant motifs in terms of the frequency of the events involved (minimum and maximum frequency for each pair). The higher the alteration frequency, the higher the probability of detection even with low sample sizes.
(D) The probability of observing an overlap of samples (y-axis) between two randomly distributed alterations (at different alteration frequencies, color-coded) decreases with the number of samples considered (x-axis).
(E) Fraction of the 63 motifs identified in the full TCGA cohort that are significant in the analysis of the subsampled cohorts. Motifs were binned according to the maximum frequency of the two events in the motif (major y-axis), the minimum frequency of the two events (minor x-axis), and the overlap ratio of the motif (minor y-axis). The color of each bin represents the average fraction of motifs in the bin that are significant. Grey bins indicate that none of the motifs in the bin were significant. The analysis was repeated at different levels of subsampling (number of samples ranging between 5 and 6, major x-axis). Left panel: mutual exclusivity motifs. Right panel: co-occurrence motifs.
(F) Average number of motifs (out of 63 motifs identified in the full TCGA cohort) that are found significant at different subsampling levels of the TCGA cohort. Left panel: mutual exclusivity motifs. Right panel: co-occurrence motifs.
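The subsampling behind panels B, C, E and F preserved the tumor type composition of the cohort, and power was reported as the average fraction of original interactions recovered. A minimal sketch of both steps is given below. It is an illustration only: the sample-to-tumor-type dictionary, the function names `stratified_subsample` and `power`, and the per-type rounding are assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

def stratified_subsample(sample_types, n_target, seed=0):
    """Draw about n_target samples while keeping each tumor type's share of the cohort.

    sample_types: dict mapping sample id -> tumor type label (hypothetical layout).
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for sample, ttype in sorted(sample_types.items()):
        by_type[ttype].append(sample)
    fraction = n_target / float(len(sample_types))
    chosen = []
    for ttype in sorted(by_type):
        members = by_type[ttype]
        n_keep = int(round(fraction * len(members)))  # per-type quota
        chosen.extend(rng.sample(members, min(n_keep, len(members))))
    return chosen

def power(original_hits, hits_per_run):
    """Average fraction of the original interactions recovered across subsampled runs."""
    original_hits = set(original_hits)
    fractions = [len(original_hits & set(h)) / float(len(original_hits))
                 for h in hits_per_run]
    return sum(fractions) / len(fractions)
```

Because quotas are rounded per tumor type, the returned sample count can differ slightly from `n_target` when the cohort contains many small tumor types.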

[Figure. Panel A: SELECT scores on the noisy dataset vs. the real dataset when adding or removing calls at random, at increasing noise fractions. Panel B: scores on the noisy dataset vs. the real dataset after replacing n SFEs; pairs are classified as false positive, true negative, true positive, or false negative. Panel C: MCC and recall vs. number of replaced SFEs.]

Supplementary Figure 1. Copy Number Alterations TP53 Mutation Type. C-class TP53 WT. TP53 mut. Nature Genetics: doi: /ng.

Supplementary Figure 1. Copy Number Alterations TP53 Mutation Type. C-class TP53 WT. TP53 mut. Nature Genetics: doi: /ng. Supplementary Figure a Copy Number Alterations in M-class b TP53 Mutation Type Recurrent Copy Number Alterations 8 6 4 2 TP53 WT TP53 mut TP53-mutated samples (%) 7 6 5 4 3 2 Missense Truncating M-class

More information

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser Melissa S. Cline 1*, Brian Craft 1, Teresa Swatloski 1, Mary Goldman 1, Singer Ma 1, David Haussler 1, Jingchun Zhu 1 1 Center for Biomolecular

More information

The Cancer Genome Atlas Pan-cancer analysis Katherine A. Hoadley

The Cancer Genome Atlas Pan-cancer analysis Katherine A. Hoadley The Cancer Genome Atlas Pan-cancer analysis Katherine A. Hoadley Department of Genetics Lineberger Comprehensive Cancer Center The University of North Carolina at Chapel Hill What is TCGA? The Cancer Genome

More information

Nature Getetics: doi: /ng.3471

Nature Getetics: doi: /ng.3471 Supplementary Figure 1 Summary of exome sequencing data. ( a ) Exome tumor normal sample sizes for bladder cancer (BLCA), breast cancer (BRCA), carcinoid (CARC), chronic lymphocytic leukemia (CLLX), colorectal

More information

Supplementary Figure 1: LUMP Leukocytes unmethylabon to infer tumor purity

Supplementary Figure 1: LUMP Leukocytes unmethylabon to infer tumor purity Supplementary Figure 1: LUMP Leukocytes unmethylabon to infer tumor purity A Consistently unmethylated sites (30%) in 21 cancer types 174,696

More information

Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers

Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers Sung-Hou Kim University of California Berkeley, CA Global Bio Conference 2017 MFDS, Seoul, Korea June 28, 2017 Cancer

More information

User s Manual Version 1.0

User s Manual Version 1.0 User s Manual Version 1.0 #639 Longmian Avenue, Jiangning District, Nanjing,211198,P.R.China. http://tcoa.cpu.edu.cn/ Contact us at xiaosheng.wang@cpu.edu.cn for technical issue and questions Catalogue

More information

Session 4 Rebecca Poulos

Session 4 Rebecca Poulos The Cancer Genome Atlas (TCGA) & International Cancer Genome Consortium (ICGC) Session 4 Rebecca Poulos Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 28

More information

Session 4 Rebecca Poulos

Session 4 Rebecca Poulos The Cancer Genome Atlas (TCGA) & International Cancer Genome Consortium (ICGC) Session 4 Rebecca Poulos Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 20

More information

Plasma-Seq conducted with blood from male individuals without cancer.

Plasma-Seq conducted with blood from male individuals without cancer. Supplementary Figures Supplementary Figure 1 Plasma-Seq conducted with blood from male individuals without cancer. Copy number patterns established from plasma samples of male individuals without cancer

More information

Expanded View Figures

Expanded View Figures Molecular Systems iology Tumor CNs reflect metabolic selection Nicholas Graham et al Expanded View Figures Human primary tumors CN CN characterization by unsupervised PC Human Signature Human Signature

More information

Distinct cellular functional profiles in pan-cancer expression analysis of cancers with alterations in oncogenes c-myc and n-myc

Distinct cellular functional profiles in pan-cancer expression analysis of cancers with alterations in oncogenes c-myc and n-myc Honors Theses Biology Spring 2018 Distinct cellular functional profiles in pan-cancer expression analysis of cancers with alterations in oncogenes c-myc and n-myc Anne B. Richardson Whitman College Penrose

More information

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from Supplementary Figure 1 SEER data for male and female cancer incidence from 1975 2013. (a,b) Incidence rates of oral cavity and pharynx cancer (a) and leukemia (b) are plotted, grouped by males (blue),

More information

Clinical Grade Genomic Profiling: The Time Has Come

Clinical Grade Genomic Profiling: The Time Has Come Clinical Grade Genomic Profiling: The Time Has Come Gary Palmer, MD, JD, MBA, MPH Senior Vice President, Medical Affairs Foundation Medicine, Inc. Oct. 22, 2013 1 Why We Are Here A Shared Vision At Foundation

More information

Supplemental Information. Integrated Genomic Analysis of the Ubiquitin. Pathway across Cancer Types

Supplemental Information. Integrated Genomic Analysis of the Ubiquitin. Pathway across Cancer Types Cell Reports, Volume 23 Supplemental Information Integrated Genomic Analysis of the Ubiquitin Pathway across Zhongqi Ge, Jake S. Leighton, Yumeng Wang, Xinxin Peng, Zhongyuan Chen, Hu Chen, Yutong Sun,

More information

Genomic Medicine: What every pathologist needs to know

Genomic Medicine: What every pathologist needs to know Genomic Medicine: What every pathologist needs to know Stephen P. Ethier, Ph.D. Professor, Department of Pathology and Laboratory Medicine, MUSC Director, MUSC Center for Genomic Medicine Genomics and

More information

Next Generation Sequencing in Clinical Practice: Impact on Therapeutic Decision Making

Next Generation Sequencing in Clinical Practice: Impact on Therapeutic Decision Making Next Generation Sequencing in Clinical Practice: Impact on Therapeutic Decision Making November 20, 2014 Capturing Value in Next Generation Sequencing Symposium Douglas Johnson MD, MSCI Vanderbilt-Ingram

More information

Cancer develops as a result of the accumulation of somatic

Cancer develops as a result of the accumulation of somatic Cancer-mutation network and the number and specificity of driver mutations Jaime Iranzo a,1, Iñigo Martincorena b, and Eugene V. Koonin a,1 a National Center for Biotechnology Information, National Library

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Pan-cancer analysis of global and local DNA methylation variation a) Variations in global DNA methylation are shown as measured by averaging the genome-wide

More information

Expanded View Figures

Expanded View Figures Solip Park & Ben Lehner Epistasis is cancer type specific Molecular Systems Biology Expanded View Figures A B G C D E F H Figure EV1. Epistatic interactions detected in a pan-cancer analysis and saturation

More information

Biology of cancer development in the GI tract

Biology of cancer development in the GI tract 1 Genesis and progression of GI cancer a genetic disease Colorectal cancer Fearon and Vogelstein proposed a genetic model to explain the stepwise formation of colorectal cancer (CRC) from normal colonic

More information

The Cancer Genome Atlas & International Cancer Genome Consortium

The Cancer Genome Atlas & International Cancer Genome Consortium The Cancer Genome Atlas & International Cancer Genome Consortium Session 3 Dr Jason Wong Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 31 st July 2014 1

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature10866 a b 1 2 3 4 5 6 7 Match No Match 1 2 3 4 5 6 7 Turcan et al. Supplementary Fig.1 Concepts mapping H3K27 targets in EF CBX8 targets in EF H3K27 targets in ES SUZ12 targets in ES

More information

Targeted Agent and Profiling Utilization Registry (TAPUR ) Study. February 2018

Targeted Agent and Profiling Utilization Registry (TAPUR ) Study. February 2018 Targeted Agent and Profiling Utilization Registry (TAPUR ) Study February 2018 Precision Medicine Therapies designed to target the molecular alteration that aids cancer development 30 TARGET gene alterations

More information

Supplemental Figure legends

Supplemental Figure legends Supplemental Figure legends Supplemental Figure S1 Frequently mutated genes. Frequently mutated genes (mutated in at least four patients) with information about mutation frequency, RNA-expression and copy-number.

More information

Ahrim Youn 1,2, Kyung In Kim 2, Raul Rabadan 3,4, Benjamin Tycko 5, Yufeng Shen 3,4,6 and Shuang Wang 1*

Ahrim Youn 1,2, Kyung In Kim 2, Raul Rabadan 3,4, Benjamin Tycko 5, Yufeng Shen 3,4,6 and Shuang Wang 1* Youn et al. BMC Medical Genomics (2018) 11:98 https://doi.org/10.1186/s12920-018-0425-z RESEARCH ARTICLE Open Access A pan-cancer analysis of driver gene mutations, DNA methylation and gene expressions

More information

Supplementary Materials for

Supplementary Materials for www.sciencetranslationalmedicine.org/cgi/content/full/7/283/283ra54/dc1 Supplementary Materials for Clonal status of actionable driver events and the timing of mutational processes in cancer evolution

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Genomic tests to personalize therapy of metastatic breast cancers. Fabrice ANDRE Gustave Roussy Villejuif, France

Genomic tests to personalize therapy of metastatic breast cancers. Fabrice ANDRE Gustave Roussy Villejuif, France Genomic tests to personalize therapy of metastatic breast cancers Fabrice ANDRE Gustave Roussy Villejuif, France Future application of genomics: Understand the biology at the individual scale Patients

More information

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies 2017 Contents Datasets... 2 Protein-protein interaction dataset... 2 Set of known PPIs... 3 Domain-domain interactions...

More information

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis Tieliu Shi tlshi@bio.ecnu.edu.cn The Center for bioinformatics

More information

Identification of Potential Therapeutic Targets by Molecular and Genomic Profiling of 628 Cases of Uterine Serous Carcinoma

Identification of Potential Therapeutic Targets by Molecular and Genomic Profiling of 628 Cases of Uterine Serous Carcinoma Identification of Potential Therapeutic Targets by Molecular and Genomic Profiling of 628 Cases of Uterine Serous Carcinoma Nathaniel L Jones 1, Joanne Xiu 2, Sandeep K. Reddy 2, Ana I. Tergas 1, William

More information

Development of Carcinoma Pathways

Development of Carcinoma Pathways The Construction of Genetic Pathway to Colorectal Cancer Moriah Wright, MD Clinical Fellow in Colorectal Surgery Creighton University School of Medicine Management of Colon and Diseases February 23, 2019

More information

Clustered mutations of oncogenes and tumor suppressors.

Clustered mutations of oncogenes and tumor suppressors. Supplementary Figure 1 Clustered mutations of oncogenes and tumor suppressors. For each oncogene (red dots) and tumor suppressor (blue dots), the number of mutations found in an intramolecular cluster

More information

The mutations that drive cancer. Paul Edwards. Department of Pathology and Cancer Research UK Cambridge Institute, University of Cambridge

The mutations that drive cancer. Paul Edwards. Department of Pathology and Cancer Research UK Cambridge Institute, University of Cambridge The mutations that drive cancer Paul Edwards Department of Pathology and Cancer Research UK Cambridge Institute, University of Cambridge Previously on Cancer... hereditary predisposition Normal Cell Slightly

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Workflow of CDR3 sequence assembly from RNA-seq data.

Nature Genetics: doi: /ng Supplementary Figure 1. Workflow of CDR3 sequence assembly from RNA-seq data. Supplementary Figure 1 Workflow of CDR3 sequence assembly from RNA-seq data. Paired-end short-read RNA-seq data were mapped to human reference genome hg19, and unmapped reads in the TCR regions were extracted

More information

File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables. File Name: Peer Review File Description:

File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables. File Name: Peer Review File Description: File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables File Name: Peer Review File Description: Primer Name Sequence (5'-3') AT ( C) RT-PCR USP21 F 5'-TTCCCATGGCTCCTTCCACATGAT-3'

More information

Supplementary Tables. Supplementary Figures

Supplementary Tables. Supplementary Figures Supplementary Files for Zehir, Benayed et al. Mutational Landscape of Metastatic Cancer Revealed from Prospective Clinical Sequencing of 10,000 Patients Supplementary Tables Supplementary Table 1: Sample

More information

About OMICS Group Conferences

About OMICS Group Conferences About OMICS Group OMICS Group International is an amalgamation of Open Access publications and worldwide international science conferences and events. Established in the year 2007 with the sole aim of

More information

Supplementary Figure 1. Estimation of tumour content

Supplementary Figure 1. Estimation of tumour content Supplementary Figure 1. Estimation of tumour content a, Approach used to estimate the tumour content in S13T1/T2, S6T1/T2, S3T1/T2 and S12T1/T2. Tissue and tumour areas were evaluated by two independent

More information

Next generation histopathological diagnosis for precision medicine in solid cancers

Next generation histopathological diagnosis for precision medicine in solid cancers Next generation histopathological diagnosis for precision medicine in solid cancers from genomics to clinical application Aldo Scarpa ARC-NET Applied Research on Cancer Department of Pathology and Diagnostics

More information

Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies

Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies Dr. Maricel G. Kann Assistant Professor Dept of Biological Sciences UMBC 2 The term protein domain

More information

Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs.

Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs. Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs. (a) CNA analysis of expression microarray data obtained from 15 tumors in the SV40Tag

More information

Biomarkers in Imunotherapy: RNA Signatures as predictive biomarker

Biomarkers in Imunotherapy: RNA Signatures as predictive biomarker Biomarkers in Imunotherapy: RNA Signatures as predictive biomarker Joan Carles, MD PhD Director GU, CNS and Sarcoma Program Department of Medical Oncology Vall d'hebron University Hospital Outline Introduction

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma. Supplementary Figure 1 Mutational signatures in BCC compared to melanoma. (a) The effect of transcription-coupled repair as a function of gene expression in BCC. Tumor type specific gene expression levels

More information

Jennifer Hauenstein Oncology Cytogenetics Emory University Hospital Atlanta, GA

Jennifer Hauenstein Oncology Cytogenetics Emory University Hospital Atlanta, GA Comparison of Genomic Coverage using Affymetrix OncoScan Array and Illumina TruSight Tumor 170 NGS Panel for Detection of Copy Number Abnormalities in Clinical GBM Specimens Jennifer Hauenstein Oncology

More information

Molecular Subtyping of Endometrial Cancer: A ProMisE ing Change

Molecular Subtyping of Endometrial Cancer: A ProMisE ing Change Molecular Subtyping of Endometrial Cancer: A ProMisE ing Change Charles Matthew Quick, M.D. Associate Professor of Pathology Director of Gynecologic Pathology University of Arkansas for Medical Sciences

More information

NeoTYPE Cancer Profiles

NeoTYPE Cancer Profiles NeoTYPE Cancer Profiles 30+ Multimethod Assays for Hematologic Diseases and Solid Tumors Molecular FISH Anatomic Pathology The next generation of diagnostic, prognostic, and therapeutic assessment What

More information

SureSelect Cancer All-In-One Custom and Catalog NGS Assays

SureSelect Cancer All-In-One Custom and Catalog NGS Assays SureSelect Cancer All-In-One Custom and Catalog NGS Assays Detect all cancer-relevant variants in a single SureSelect assay SNV Indel TL SNV Indel TL Single DNA input Single AIO assay Single data analysis

More information

Results and Discussion of Receptor Tyrosine Kinase. Activation

Results and Discussion of Receptor Tyrosine Kinase. Activation Results and Discussion of Receptor Tyrosine Kinase Activation To demonstrate the contribution which RCytoscape s molecular maps can make to biological understanding via exploratory data analysis, we here

More information

TCGA. The Cancer Genome Atlas

TCGA. The Cancer Genome Atlas TCGA The Cancer Genome Atlas TCGA: History and Goal History: Started in 2005 by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) with $110 Million to catalogue

More information

Insights from Sequencing the Melanoma Exome

Insights from Sequencing the Melanoma Exome Insights from Sequencing the Melanoma Exome Michael Krauthammer, MD PhD, December 2 2015 Yale University School Yof Medicine 1 2012 Exome Screens and Results Exome Sequencing of 108 sun-exposed melanomas

More information

Osamu Tetsu, MD, PhD Associate Professor Department of Otolaryngology-Head and Neck Surgery School of Medicine, University of California, San

Osamu Tetsu, MD, PhD Associate Professor Department of Otolaryngology-Head and Neck Surgery School of Medicine, University of California, San Osamu Tetsu, MD, PhD Associate Professor Department of Otolaryngology-Head and Neck Surgery School of Medicine, University of California, San Francisco Lung Cancer Classification Pathological Classification

More information

Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt, Weizhou Zhang 1-4

Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt, Weizhou Zhang 1-4 SOFTWARE TOOL ARTICLE TRGAted: A web tool for survival analysis using protein data in the Cancer Genome Atlas. [version 1; referees: 1 approved] Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt,

More information

Secuenciación masiva: papel en la toma de decisiones

Secuenciación masiva: papel en la toma de decisiones Secuenciación masiva: papel en la toma de decisiones Cancer is a Genetic Disease Development of cancer is driven by the acquisition of somatic genetic alterations: Nonsynonymous point mutations: missense.

More information

Fluxion Biosciences and Swift Biosciences Somatic variant detection from liquid biopsy samples using targeted NGS

Fluxion Biosciences and Swift Biosciences Somatic variant detection from liquid biopsy samples using targeted NGS APPLICATION NOTE Fluxion Biosciences and Swift Biosciences OVERVIEW This application note describes a robust method for detecting somatic mutations from liquid biopsy samples by combining circulating tumor

More information

Oncogenes and Tumor Suppressors MCB 5068 November 12, 2013 Jason Weber

Oncogenes and Tumor Suppressors MCB 5068 November 12, 2013 Jason Weber Oncogenes and Tumor Suppressors MCB 5068 November 12, 2013 Jason Weber jweber@dom.wustl.edu Oncogenes & Cancer DNA Tumor Viruses Simian Virus 40 p300 prb p53 Large T Antigen Human Adenovirus p300 E1A

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

Whole Genome and Transcriptome Analysis of Anaplastic Meningioma. Patrick Tarpey Cancer Genome Project Wellcome Trust Sanger Institute

Whole Genome and Transcriptome Analysis of Anaplastic Meningioma. Patrick Tarpey Cancer Genome Project Wellcome Trust Sanger Institute Whole Genome and Transcriptome Analysis of Anaplastic Meningioma Patrick Tarpey Cancer Genome Project Wellcome Trust Sanger Institute Outline Anaplastic meningioma compared to other cancers Whole genomes

More information

NeoTYPE Cancer Profiles

NeoTYPE Cancer Profiles NeoTYPE Cancer Profiles Multimethod Analysis of 25+ Hematologic Diseases and Solid Tumors Anatomic Pathology FISH Molecular The next generation of diagnostic, prognostic, and therapeutic assessment NeoTYPE

More information

Predictive biomarker profiling of > 1,900 sarcomas: Identification of potential novel treatment modalities

Predictive biomarker profiling of > 1,900 sarcomas: Identification of potential novel treatment modalities Predictive biomarker profiling of > 1,900 sarcomas: Identification of potential novel treatment modalities Sujana Movva 1, Wenhsiang Wen 2, Wangjuh Chen 2, Sherri Z. Millis 2, Margaret von Mehren 1, Zoran

More information

Clinical Grade Biomarkers in the Genomic Era Observations & Challenges

Clinical Grade Biomarkers in the Genomic Era Observations & Challenges Clinical Grade Biomarkers in the Genomic Era Observations & Challenges IOM Committee on Policy Issues in the Clinical Development & Use of Biomarkers for Molecularly Targeted Therapies March 31-April 1,

More information

Integration of Cancer Genome into GECCO- Genetics and Epidemiology of Colorectal Cancer Consortium

Integration of Cancer Genome into GECCO- Genetics and Epidemiology of Colorectal Cancer Consortium Integration of Cancer Genome into GECCO- Genetics and Epidemiology of Colorectal Cancer Consortium Ulrike Peters Fred Hutchinson Cancer Research Center University of Washington U01-CA137088-05, PI: Peters

More information

IntelliGENSM. Integrated Oncology is making next generation sequencing faster and more accessible to the oncology community.

IntelliGENSM. Integrated Oncology is making next generation sequencing faster and more accessible to the oncology community. IntelliGENSM Integrated Oncology is making next generation sequencing faster and more accessible to the oncology community. NGS TRANSFORMS GENOMIC TESTING Background Cancers may emerge as a result of somatically

More information

Cover Page. The handle holds various files of this Leiden University dissertation.

Cover Page. The handle   holds various files of this Leiden University dissertation. Cover Page The handle http://hdl.handle.net/1887/22278 holds various files of this Leiden University dissertation. Author: Cunha Carvalho de Miranda, Noel Filipe da Title: Mismatch repair and MUTYH deficient

More information

Genomic and Functional Approaches to Understanding Cancer Aneuploidy

Article. Graphical Abstract: Cancer-Type Specific Aneuploidy Patterns in TCGA Samples CRISPR Transfection and Selection Immortalized Cell

Out-Patient Billing CPT Codes

Updated Date: August 3, 08 Client Billed Molecular Tests HPV DNA Tissue Testing 8764 No Medicare Billed - Molecular Tests NeoARRAY NeoARRAY SNP/Cytogenetic No 89 NeoLAB NeoLAB

Inferring Biological Meaning from Cap Analysis Gene Expression Data

HRYSOULA PAPADAKIS 1. Introduction This project is inspired by the recent development of the Cap analysis gene expression (CAGE) method,

The Tandem Duplicator Phenotype Is a Prevalent Genome-Wide Cancer Configuration Driven by Distinct Gene Mutations

Article. Graphical Abstract. Authors: Francesca Menghi, Floris P. Barthel, Vinod Yadav,...,

Clinically Useful Next Generation Sequencing and Molecular Testing in Gliomas MacLean P. Nasrallah, MD PhD

Neuropathology Fellow, Division of Neuropathology, Center for Personalized Diagnosis (CPD) Glial

Transform genomic data into real-life results

CLINICAL SUMMARY Biomarker testing and targeted therapies can drive improved outcomes in clinical practice. New FDA-Approved Broad Companion Diagnostic for

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

Supplementary Figure 1. (a,b) Values of coefficients associated with genomic features, separately

LncMAP: Pan-cancer atlas of long noncoding RNA-mediated transcriptional network perturbations

Published online 9 January 2018. Nucleic Acids Research, 2018, Vol. 46, No. 3, 1113-1123, doi: 10.1093/nar/gkx1311

Discovery Dataset. PD Liver Luminal B/ Her-2+ Letrozole. PD Supraclavicular Lymph node. PD Supraclavicular Lymph node Luminal B.

Discovery Dataset 11T pt1cpn2am1(liver) 2009 2010 Liver / Her-2+ 2011 Death Letrozole CHT pt1cpn0(sn)m0 Supraclavicular Lymph node Death 12T 2003 2006 2006 2008 Anastrozole Local RT+Examestane Fulvestrant

S1 Appendix: Figs A G and Table A. b Normal Generalized Fraction 0.075

Aiello & Alter (216) PLoS One vol. 11 no. 1 e164546 S1 Appendix A-1. Panels: a Tumor Generalized Fraction, b Normal Generalized Fraction.

Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes

Nature Genetics 47, 106-114 (2015) doi:10.1038/ng.3168 Max Leiserson RECOMB 2015 April

RNA SEQUENCING AND DATA ANALYSIS

Download slides and package http://odin.mdacc.tmc.edu/~rverhaak/package.zip http://odin.mdacc.tmc.edu/~rverhaak/rna-seqlecture.zip Overview Introduction into the topic

Germ-line mutations in BRCA1 are associated

Precision Genetic Testing in Cancer Treatment and Prognosis

Deborah Cragun, PhD, MS, CGC, Genetic Counseling Graduate Program Director, University of South Florida. Case #1: Diana is a 47 year old cancer patient

Mutational Impact on Diagnostic and Prognostic Evaluation of MDS

Elsa Bernard, PhD, Papaemmanuil Lab, Computational Oncology, MSKCC. MDS Foundation ASH 2018 Symposium. Disclosure: Research funds provided by

PI3K Background. The SignalRx R & D pipeline is shown below followed by a brief description of each program:

The phosphatidylinositol 3-kinase (PI3K) pathway is a key cell signaling node whose dysregulation commonly results in the transformation of normal cells into cancer cells. The role of PI3K

RNA SEQUENCING AND DATA ANALYSIS

[Figure: Length of mRNA transcripts in the human genome; axis label: Length]

Cancer. The fundamental defect is. unregulated cell division. Properties of Cancerous Cells. Causes of Cancer. Altered growth and proliferation

Cancer: The fundamental defect is unregulated cell division. Properties of Cancerous Cells: Altered growth and proliferation; Loss of growth factor dependence; Loss of contact inhibition; Immortalization; Alterated

Cancer-type dependent genetic interactions between cancer driver alterations indicate plasticity of epistasis across cell types

Solip Park and Ben Lehner. Corresponding author: Ben Lehner, Centre for Genomic

Pan-cancer screen for mutations in non-coding elements with conservation and cancer specificity reveals correlations with expression and survival

www.nature.com/npjgenmed ARTICLE OPEN Henrik Hornshøj 1,

Oncogenes and tumour-suppressor genes

Special topics in tumor biochemistry (oncogenes and tumour-suppressor genes). Speaker: Prof. Jiunn-Jye Chuu. E-Mail: jjchuu@mail.stust.edu.tw. Genetic Basis of Cancer: Cancer-causing mutations; Disease of aging

Nature Medicine: doi: /nm.3967

Supplementary Figure 1. Network clustering. (a) Clustering performance as a function of inflation factor. The grey curve shows the median weighted Silhouette widths for varying inflation factors (f [1.6,

Frequency(%) KRAS G12 KRAS G13 KRAS A146 KRAS Q61 KRAS K117N PIK3CA H1047 PIK3CA E545 PIK3CA E542K PIK3CA Q546. EGFR exon19 NFS-indel EGFR L858R

Frequency(%) 1 a b ALK FS-indel ALK R1Q HRAS Q61R HRAS G13R IDH R17K IDH R14Q MET exon14 SS-indel KIT D8Y KIT L76P KIT exon11 NFS-indel SMAD4 R361 IDH1 R13 CTNNB1 S37 CTNNB1 S4 AKT1 E17K ERBB D769H ERBB

Pan-cancer patterns of DNA methylation

Witte et al. Genome Medicine 2014, 6:66. REVIEW. Tania Witte, Christoph Plass and Clarissa Gerhauser * Abstract: The comparison of DNA methylation patterns across cancer

A Robust Method for Identifying Mutated Driver Pathways in Cancer

pp.392-397 http://dx.doi.org/10.14257/astl.2016. Can-jun Hu and Shu-Lin Wang * College of computer science and electronic engineering,

Colorectal Cancer in 2017: From Biology to the Clinics. Rodrigo Dienstmann

MOLECULAR CLASSIFICATION Tumor cell Immune cell Tumor microenvironment Stromal cell MOLECULAR CLASSIFICATION Biomarker Tumor cell

A Comprehensive Pan-Cancer Molecular Study of Gynecologic and Breast Cancers

Article. Highlights: Integrated analysis finds molecular features characteristic of gynecologic tumors. Subtypes with high

MSI positive MSI negative

Pritchard et al. 2014 Supplementary Figure 1. MSI positive, MSI negative. Hypermutated: Median 673, Average 659.2. Non-Hypermutated: Median 37.5, Average 43.6. Supplementary Figure 1: Somatic Mutation Burden

ACTIVITY 2: EXAMINING CANCER PATIENT DATA

OVERVIEW Refer to the Overview of Cancer Discovery Activities for Key Concepts and Learning Objectives, Curriculum Connections, and Prior Knowledge, as well as background information, references, and additional

The Center for PERSONALIZED DIAGNOSTICS

Precision Diagnostics for Personalized Medicine. A joint initiative between The Department of Pathology and Laboratory Medicine & The Abramson Cancer Center. The (CPD)

Genomic Analyses across Six Cancer Types Identify Basal-like Breast Cancer as a Unique Molecular Entity

Aleix Prat, Barbara Adamo, Cheng Fan, Vicente Peg, Maria Vidal, Patricia Galván, Ana Vivancos, Paolo

OncoMir Library Cancer Type Target Gene

hsa-let-7a-1 Breast Cancer, Lung Cancer H RAS, HMGA2, CDK6, NRAS hsa-let-7a-2 Breast Cancer, Lung Cancer H RAS, HMGA2, CDK6, NRAS hsa-let-7a-3 Breast Cancer, Lung

Dr Yvonne Wallis Consultant Clinical Scientist West Midlands Regional Genetics Laboratory

Personalised Therapy/Precision Medicine: Selection of a therapeutic drug based on the presence or absence of a specific

Looking Beyond the Standard-of- Care : The Clinical Trial Option

Terry Mamounas, M.D., M.P.H., F.A.C.S., Medical Director, Comprehensive Breast Program, UF Health Cancer Center at Orlando Health. Professor