Nature Genetics: doi: /ng Supplementary Figure 1. Rates of different mutation types in CRC.

Similar documents
Supplementary Materials for

Identification of neutral tumor evolution across cancer types

Supplementary Figure 1. Estimation of tumour content

AD (Leave blank) TITLE: Genomic Characterization of Brain Metastasis in Non-Small Cell Lung Cancer Patients

OncoPhase: Quantification of somatic mutation cellular prevalence using phase information

Enterprise Interest Thermo Fisher Scientific / Employee

Nature Genetics: doi: /ng Supplementary Figure 1. Somatic coding mutations identified by WES/WGS for 83 ATL cases.

Clonal Evolution of saml. Johnnie J. Orozco Hematology Fellows Conference May 11, 2012

Neutral evolution in colorectal cancer, how can we distinguish functional from non-functional variation?

Research Strategy: 1. Background and Significance

Supplementary Tables. Supplementary Figures

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Frequency(%) KRAS G12 KRAS G13 KRAS A146 KRAS Q61 KRAS K117N PIK3CA H1047 PIK3CA E545 PIK3CA E542K PIK3CA Q546. EGFR exon19 NFS-indel EGFR L858R

DNA-seq Bioinformatics Analysis: Copy Number Variation

Rare Variant Burden Tests. Biostatistics 666

Golden Helix s End-to-End Solution for Clinical Labs

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

Supplementary Figure 1: Classification scheme for non-synonymous and nonsense germline MC1R variants. The common variants with previously established

Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer

underlying metastasis and recurrence in HNSCC, we analyzed two groups of patients. The

The feasibility of circulating tumour DNA as an alternative to biopsy for mutational characterization in Stage III melanoma patients

Nature Getetics: doi: /ng.3471

LESSON 3.2 WORKBOOK. How do normal cells become cancer cells? Workbook Lesson 3.2

Tumor mutational burden and its transition towards the clinic

Analysis with SureCall 2.1

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first

Myeloma Genetics what do we know and where are we going?

Nature Genetics: doi: /ng Supplementary Figure 1

New Enhancements: GWAS Workflows with SVS

NSAID use and somatic exomic mutations in Barrett s esophagus

Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies

Cancer develops after somatic mutations overcome the multiple

Nature Genetics: doi: /ng Supplementary Figure 1. PCA for ancestry in SNV data.

Nature Genetics: doi: /ng Supplementary Figure 1

p.r623c p.p976l p.d2847fs p.t2671 p.d2847fs p.r2922w p.r2370h p.c1201y p.a868v p.s952* RING_C BP PHD Cbp HAT_KAT11

Field wide development of analytic approaches for sequence data

The Importance of Coverage Uniformity Over On-Target Rate for Efficient Targeted NGS

Supplementary Information. Supplementary Figures

SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization

Global variation in copy number in the human genome

Whole Genome and Transcriptome Analysis of Anaplastic Meningioma. Patrick Tarpey Cancer Genome Project Wellcome Trust Sanger Institute

Importance of minor TP53 mutated clones in the clinic

NGS in tissue and liquid biopsy

CS2220 Introduction to Computational Biology

Clinical Utility of Actionable Genome Information in Precision Oncology Clinic

Performance Characteristics BRCA MASTR Plus Dx

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

Computational Systems Biology: Biology X

LTA Analysis of HapMap Genotype Data

Will now consider in detail the effects of relaxing the assumption of infinite-population size.

Journal: Nature Methods

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc.

TPMI Presents: Translational Genomics Research Update, Opportunities and Challenges

Generating Spontaneous Copy Number Variants (CNVs) Jennifer Freeman Assistant Professor of Toxicology School of Health Sciences Purdue University

Nature Biotechnology: doi: /nbt.1904

Mendelian Randomization

Integration of Cancer Genome into GECCO- Genetics and Epidemiology of Colorectal Cancer Consortium

Mathematics and Physics of Cancer: Questions. Robijn Bruinsma, UCLA KITP Colloquium May 6, ) Cancer statistics and the multi-stage model.

NGS ONCOPANELS: FDA S PERSPECTIVE

A general framework for analyzing tumor subclonality using SNP array and DNA sequencing data

Cancer Validation in the 100,000 genomes project. Dr Shirley Henderson ACGS spring meeting 06/07/16

Identification of Genetic Determinants of Metastasis and Clonal Relationships Between Primary and Metastatic Tumors

arxiv: v1 [cs.ce] 30 Dec 2014

Supplementary Figures for Roth et al., PyClone: Statistical inference of clonal population structure in cancer

SUPPLEMENTARY INFORMATION

Interactive analysis and quality assessment of single-cell copy-number variations

NGS Types of gene dossier applications UKGTN can evaluate

GENOME-WIDE ASSOCIATION STUDIES

Statistical Genetics

Supplementary Figures

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

Introduction to LOH and Allele Specific Copy Number User Forum

Supplemental Figure legends

SiFit: inferring tumor trees from single-cell sequencing data under finite-sites models

The Biology and Genetics of Cells and Organisms The Biology of Cancer

What can genetic studies tell us about ADHD? Dr Joanna Martin, Cardiff University

CDH1 truncating alterations were detected in all six plasmacytoid-variant bladder tumors analyzed by whole-exome sequencing.

CONTRACTING ORGANIZATION: The Chancellor, Masters and Scholars of the University of Cambridge, Clara East, The Old Schools, Cambridge CB2 1TN

Nature Medicine: doi: /nm.3967

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

Bayesian Prediction Tree Models

TP53 mutational profile in CLL : A retrospective study of the FILO group.

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Nature Medicine: doi: /nm.4439

Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies

- 1 - Reviewers' comments: Reviewer #1 (Remarks to the Author):

Introduction to the Genetics of Complex Disease

DETECTION OF LOW FREQUENCY CXCR4-USING HIV-1 WITH ULTRA-DEEP PYROSEQUENCING. John Archer. Faculty of Life Sciences University of Manchester

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics

Targeted qpcr. Debate on PGS Technology: Targeted vs. Whole genome approach. Discolsure Stake shareholder of GENETYX S.R.L

Plasma-Based Molecular Testing for Early Detection of Lung Cancer

Supplementary Methods

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Advance Your Genomic Research Using Targeted Resequencing with SeqCap EZ Library

The Role of Host: Genetic Variation

NGS IN ONCOLOGY: FDA S PERSPECTIVE

November 9, Johns Hopkins School of Medicine, Baltimore, MD,

Statistical Tests for X Chromosome Association Study. with Simulations. Jian Wang July 10, 2012

Genome. Institute. GenomeVIP: A Genomics Analysis Pipeline for Cloud Computing with Germline and Somatic Calling on Amazon s Cloud. R. Jay Mashl.

Transcription:

Supplementary Figure 1 Rates of different mutation types in CRC. (a) Stratification by mutation type indicates that C>T mutations occur at a significantly greater rate than other types. (b) As for the overall mutation rate (Fig. 1d), the rates of all mutation types were significantly higher in the MSI group.

Supplementary Figure 2 Analysis of neutral evolution in colon cancers is robust to copy number changes. We removed mutations that fell within regions of the genome with altered copy number, that were identified by using SNP arrays paired to the exome-sequenced samples to detect regions of copy number change. (a) The consistently high values of goodness of fit demonstrate that the neutral model is robust to confounding copy number changes. Red line indicates the R 2 =0.98 threshold for calling a tumor neutral. (b) Estimating the mutation rates using only the mutations in copy number devoid regions yields similar results as when all SNVs are included, confirming the robustness of our approach.

Supplementary Figure 3 Validation of the mutational signature across the frequency spectrum. The mutational signature that characterizes the underlying biology is maintained across the frequency spectrum, providing further evidence that the identified somatic variants are reliable and are not due to sequencing errors.

Supplementary Figure 4 Synonymous/nonsynonymous ratio in colon cancers. (a) A random base change within a codon is more likely to result in a nonsynonymous or stop-gain mutation than a synonymous mutation; hence, we expect the mutation rate per division of nonsynonymous mutations to be higher than for synonymous mutations. This is observed in the data. (b) The synonymous and nonsynonymous mutation rates become equivalent after normalizing by the total number of possible synonymous and nonsynonymous mutation sites in the exome, respectively. Therefore, synonymous and nonsynonymous mutations accrue at the same effective rate, consistent with our neutral model of cancer growth.

Supplementary Figure 5 Neutrality analysis in WGS gastric cancers data is robust to copy number changes. (a) The goodness of fit of the neutral model was robust to copy number changes, as results were equivalent when only diploid regions or all genomic regions were considered. We note that the total number of cases considered after CNV filtering is considerably smaller because some samples did not have enough variants in the remaining diploid regions to fit the model. Red line indicates the R 2 =0.98 threshold for calling a tumor neutral. (b) Mutation rates were also consistent when copy number altered regions were discarded in the analysis. We found only a single neutral MSI tumor, so a distribution could not be plotted.

Supplementary Figure 6 Validation of the mutational signature across the frequency spectrum in WGS gastric cancer data. The mutational signature is conserved through the allelic frequency spectrum also in this cohort of whole genome sequenced gastric cancers, thus supporting the authenticity of the called variants.

Supplementary Figure 7 Synonymous/nonsynonymous ratio in whole genome sequenced gastric cancers. Adjusted synonymous versus nonsynonymous mutation rates were consistent with neutral evolution in the gastric cancer cohort.

Supplementary Figure 8 Mutation rates by channel across cancer types. When measured separately between different mutational channels, the mutation rate estimated by the model was consistent with the underlying biology: rates of C>A mutations were higher in lung cancer (tobacco smoking) and C>T mutation rates were higher than other channels in all cancer types.

20 10 0 10 20 R 2 model fit 0.980 0.985 0.990 0.995 1.000 0 2 4 6 8 10 12 14 0 20 40 60 80 100 120 0 50 100 150 0 500 1000 1500 2000 2500 0 20 40 60 80 100 Cumulative number of mutations M(f) 0 631 A R 2 =0.99924 1/0.25 1/0.2 1/0.15 1/0.12 Inverse allelic frequency 1/f C D E F Low mutation rate Large clonal peak 30% normal contamination 1% detection limit G H Supplementary Figure 9 Stochastic simulations of neutral evolution recapitulate the observed NGS data. (a) We produced realistic synthetic NGS data using a stochastic simulation of tumor growth that accounts for the neutral accumulation of mutations in the tumor as well as the different sources of sequencing noise (sampling, sequencing depth and normal contamination). (b) The prediction of the analytical model on the cumulative distribution of subclonal allelic frequencies agrees with the stochastic simulation. We generated synthetic data to test the accuracy with which tumor parameters could be reliably recovered when faced with confounding factors. Illustrative synthetic data are shown for (c) low mutation rate, (d) a high number of clonal mutations, (e) significant normal contamination and (f) a low detection limit. (g) Over 10,000 simulations, the interquartile range of the percentage error in the estimates of the mutation rate is <5%, demonstrating the ability of the analytical model to accurately estimate tumor growth parameters from NGS data. (h) The R 2 values of the fits are consistently high over 10,000 simulations. Unless otherwise stated, the input parameters for the simulation and subsequent sampling were =100 mutations/cell division, =ln(2), detection limit=10%, normal contamination=0%, mean N i =100 and number of clonal mutations=200.

A 20 10 0 10 20 B 20 10 0 10 20 C 60 40 20 0 20 Uncorrected D 50 40 30 20 10 0 10 20 Corrected E I 100 50 0 50 100 20 10 0 10 20 0 200 400 # of Clonal Mutations 25 50 75 100 Read Depth ln(2.0) ln(1.8) ln(1.6) ln(1.4) ln(1.2) λ F J R 2 model fit 0.80 0.85 0.90 0.95 1.00 R 2 model fit 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1% 5 % 10 % Detection Limit 25 50 75 100 Read Depth ln(2.0) ln(1.8) ln(1.6) ln(1.4) ln(1.2) λ G 100 50 0 50 100 0 % 10 % 20 % 30 % 40 % 50 % Normal Contamination 3 10 100 Mutation Rate (\Divison) H R 2 model fit 0.6 0.7 0.8 0.9 1.0 0 % 10 % 20 % 30 % 40 % 50 % Normal Contamination 3 10 100 Mutation Rate (\Divison) Supplementary Figure 10 Robustness of the parameter estimation analysis. By varying the parameters of the simulation, we show that the analytical model can accurately identify neutrality of tumor growth and recover the mutation rate in the face of (a) different numbers of clonal mutations and (b) different detection limits, and that we can correct for normal contamination accurately (to within 5%) for contamination below 30% (c,d). (e,f) The effect of varying read depth. Less accurate mutation rate estimates were achieved at low read depth (<25) and poorer fits of the analytical model. (g,h) The effect of mutation rate: lower mutation rates lead to poorer model fits and a higher variance in the mutation estimate because fewer variants are available to fit the model. (i,j) The effect of growth rate : the variance in the mutation rate estimate increases as tumor growth slows ( decreases) and the fit of the model becomes worse. The offspring probability distributions for the different values of were P = ln(2) = (p 0 = 0, p 1 = 0, p 2 = 1), P = ln(1.8) = (p 0 = 0.05, p 1 = 0.1, p 2 = 0.85), P = ln(1.6) = (p 0 = 0.1, p 1 = 0.2, p 2 = 0.7), P = ln(1.4) = (p 0 = 0.2, p 1 = 0.2, p 2 = 0.6) and P = ln(2) = (p 0 = 0.2, p 1 = 0.4, p 2 = 0.4).

0 5 10 15 20 25 30 35 Cumulative number of mutations M(f) 0 178 0 10 20 30 40 50 60 Cumulative number of mutations M(f) 0 311 0 20 40 60 80 Cumulative number of mutations M(f) 0 265 0 10 20 30 40 Cumulative number of mutations M(f) 0 156 A B C D R 2 =0.95757 R 2 =0.95522 Early selection 2 sub-clones E F 1/0.24 1/0.15 1/0.12 Inverse allelic frequency 1/f G H 1/0.24 1/0.15 1/0.12 Inverse allelic frequency 1/f R 2 =0.95689 R 2 =0.9738 3 sub-clones Change in mutation rate 1/0.24 1/0.15 1/0.12 1/0.24 1/0.15 1/0.12 Inverse allelic frequency 1/f Inverse allelic frequency 1/f Supplementary Figure 11 Clonal expansions and microenvironmental niches produce a deviation from the power-law. By introducing a second population with a 65% fitness advantage (P = (p 0 = 0, p 1 = 0.8, p 2 = 0.2), Q = (p 0 = 0, p 1 = 0.02, p 2 = 0.98)) when the tumor comprises 80 cells, we see a second peak at VAF ~0.2 (a) and a bend in the cumulative distribution plot (b). A subclonal tumor architecture where mutations from the same subclone cluster around allelic frequencies would not show patterns consistent with neutrality, both when we consider two (c,d) and three (e,f) different subclones within a sample. A new phenotypically distinct clone introduced with a tenfold higher mutation rate (20 per division to 200 per division) also produces a deviation from the neutral power-law (g,h).