
SUPPLEMENTARY INFORMATION

Systematic investigation of cancer-associated somatic point mutations in SNP databases

HyunChul Jung (1, 2), Thomas Bleazard (3), Jongkeun Lee (1) and Dongwan Hong (1)

1. Cancer Genomics Branch, Division of Convergence Technology, National Cancer Center, Gyeonggi-do 410-769, Korea
2. Bioinformatics and Systems Biology Graduate Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
3. College of Natural Sciences, Seoul National University Graduate School, Seoul 110-799, Korea

To whom correspondence should be addressed; E-mail: dwhong@ncc.re.kr

Nature Biotechnology: doi:10.1038/nbt.2681

Table of Contents
Materials and Methods
Supplementary Notes
Supplementary Figures
Suppl. Figure 1. The number of overlapped positions supported by at least 1, 5 and 10 tumor samples
Suppl. Figure 2. Mutually exclusive alteration pattern between PIK3CA and TP53
Suppl. Figure 3. For TP53, Kaplan-Meier survival curves for tumor samples with cancer-associated somatic mutations represented in dbSNP or other variants versus wild-type by log-rank test
Suppl. Figure 4. Workflow of the proposed comprehensive SNP filtering approach
Supplementary Tables
Suppl. Table 1. List of the compiled cancer genomics articles
Suppl. Table 2. List of the cancer-associated somatic mutations represented in dbSNP
Suppl. Table 3. Functional consequence of the cancer-associated somatic mutations represented in dbSNP
Suppl. Table 4. High-confidence filtered cancer-associated somatic mutations represented in dbSNP shown in two example articles
Suppl. Table 5. Analysis of mutually exclusive alteration patterns
Suppl. Table 6. List of patients with cancer-associated somatic mutations represented in dbSNP and other variants in TP53
Suppl. Table 7. Cancer-associated somatic mutations represented in the 1000 Genomes Project
References

Materials and Methods

Eligible cancer genomics articles

We selected articles published in Nature, Nature Genetics, Genome Research, and PNAS between January 2010 and June 2012 that used next-generation sequencing technology to study human cancer. In this survey, we focused on cancer genomics articles using whole-genome sequencing (WGS) or whole-exome sequencing (WES). We selected articles in which the identification of point mutations was one of the main parts of the study and was mentioned in the abstract. We excluded articles that used sequencing data only to investigate structural variations, copy number variations, or pathogen infections. Articles were selected regardless of the next-generation sequencing platform, number of samples, cancer type, and point mutation calling algorithm used. Articles were first identified by a PubMed and Google Scholar search; we then searched further using each journal's own search engine. Several individuals independently read the methods and supplementary methods of each paper to find the SNP filtering approach used in the point mutation calling workflow. Following this inspection, we classified the articles into four categories: (A) articles with filtering; (B) articles with partial filtering; (C) articles with filtering unknown; and (D) others. The 'articles with filtering' were those that used the common SNP filtering approach, which filters identified point mutations against public SNP databases such as dbSNP [1] or the 1000 Genomes Project database [2]. According to their descriptions of SNP filtering, the 'articles with filtering' did not use a subset of dbSNP such as common SNPs (SNPs with minor allele frequency ≥ 1%) or flagged SNPs (clinically associated SNPs), because most of these articles used old versions of dbSNP (e.g. dbSNP 130) that do not provide these subsets.
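As a rough illustration of the distinction drawn above, the sketch below removes called point mutations that match dbSNP either entirely (the 'articles with filtering' approach) or only where the dbSNP entry is a common SNP (MAF ≥ 1%). It is a hypothetical sketch: the `(chrom, pos, ref, alt)` key, the `DbsnpEntry` record, the example coordinates, and matching on alleles rather than position alone are all assumptions; several of the surveyed articles matched by position only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical variant key: (chromosome, 1-based position, reference allele, alternate allele).
Key = Tuple[str, int, str, str]


@dataclass(frozen=True)
class DbsnpEntry:
    key: Key
    maf: Optional[float]  # minor allele frequency (0..1); None when not reported


def filter_against_dbsnp(calls: Iterable[Key], dbsnp: Iterable[DbsnpEntry],
                         common_only: bool = False, maf_cutoff: float = 0.01) -> List[Key]:
    """Drop calls that match a dbSNP entry.

    common_only=False removes every dbSNP match (whole-database filtering).
    common_only=True removes only matches whose MAF is >= maf_cutoff.
    """
    blocked = {
        entry.key
        for entry in dbsnp
        if not common_only or (entry.maf is not None and entry.maf >= maf_cutoff)
    }
    return [call for call in calls if call not in blocked]


calls = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 300, "G", "A")]
db = [DbsnpEntry(("chr1", 100, "A", "G"), 0.08), DbsnpEntry(("chr1", 200, "C", "T"), 0.002)]
print(filter_against_dbsnp(calls, db))                    # only chr2:300 survives
print(filter_against_dbsnp(calls, db, common_only=True))  # the rare chr1:200 call is kept
```

Whole-database filtering also discards the rare chr1:200 call, which is how a somatic mutation that happens to have a dbSNP entry can be lost.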
The 'articles with partial filtering' were those that filtered out point mutations using the public databases but retained for analysis those present in disease databases such as COSMIC [3] or OMIM [4]. The 'articles with filtering unknown' were those in which we could not find any description of the SNP filtering approach in any section of the article.

Preprocessing of COSMIC and dbSNP databases for extraction of overlapping SNPs

We downloaded all SNPs listed in dbSNP135 from the UCSC Genome Browser [5]. We extracted SNPs whose class was 'single' on all chromosomes (n = 47,762,49). Of these single SNPs, we selected those whose function column contained missense, nonsense, stop-loss, or splice (n = 514,7). We also downloaded COSMIC v60 data from the COSMIC website. We selected point mutations for which an hg19 coordinate was available and whose mutation description column was Nonstop extension, Substitution Nonsense, or Substitution Missense. To select only somatic point mutations, we kept point mutations whose somatic status was 'confirmed somatic variant' or 'reported in another cancer sample as somatic', and discarded those whose status was 'variant of unknown origin', 'reported in another sample as germline', 'not specified', or 'confirmed germline variant'. We removed duplicate mutation entries with the same sample ID, retaining one representative of each. In the main analyses, we focused on overlapping non-silent SNPs supported by at least five tumor samples.

Prediction of functional consequence of the cancer-associated somatic mutations represented in dbSNP

We used three in silico methods, SIFT [6], PolyPhen2 [7], and MutationAssessor [8], to assess the functional impact of the cancer-associated somatic mutations represented in the dbSNP database. Where positions had several reported variant alleles, we ran the three tools on all reported nucleotide changes. The prediction results can be found in Supplementary Figure 4 and Table 4. Next, we classified each mutation position into a functional or non-functional group. Positions predicted to be functional with relatively low confidence were also classified into the functional group. Positions with multiple prediction results, owing to several reported variant alleles, were classified into the functional group if any one of the nucleotide changes was predicted to be functional. We used PhyloP [9] to assess the degree of conservation; mutation positions with a PhyloP score greater than 1.3 (P < 0.05) were classified into the functional group.

Identification of the high-confidence filtered mutations

For the bladder cancer article [10], we first downloaded the publicly available raw sequencing data (SRA38181) from the NCBI Sequence Read Archive (SRA) [11].
We followed the same alignment and variant calling approach as the authors to replicate their variant calling results. We aligned reads against the NCBI reference genome (hg18) using the Burrows-Wheeler Alignment (BWA) tool [12] and performed local realignment of the BWA-aligned reads using the Genome Analysis Toolkit (GATK) [13]. After PCR duplicates were removed with Picard, somatic point mutations were called with VarScan [14]. We first processed the whole-exome sequencing data of the 9 bladder tumor samples used in the discovery step through this pipeline. To confirm our results, we contacted the authors to ask for their point mutation calling results (VarScan output file). The consistency between the two sets of results was very high; for example, there was little difference in the number of reads supporting variant alleles or in variant allele frequencies. Based on this high consistency for the 9 tumor samples, we decided to analyze the 88 tumor samples using the authors' variant calling results. To select high-confidence filtered cancer-associated somatic mutations represented in dbSNP, we used the list of validated point mutations in their supplementary materials together with the variant calling results. For each tumor sample, we selected the filtered mutations whose number of variant-supporting reads and variant allele frequency were greater than those of at least one confirmed point mutation from the same sample. Mutations from tumor samples with too few confirmed mutations to set these two cutoff values were not included.

For the prostate cancer article [15], we downloaded the sequencing data (SRA37395) from NCBI SRA and processed them with the pipeline described above to detect variants. To select high-confidence filtered cancer-associated somatic mutations represented in dbSNP, we focused only on homozygous mutations, for two reasons. First, we had no reliable cutoff values, such as the number of variant-supporting reads and the variant allele frequency, derived from validated mutations. Second, we did not use the same variant calling approach as the article. We therefore selected only homozygous mutations with a variant allele frequency higher than 95%. We also manually inspected the identified homozygous mutations in the Integrative Genomics Viewer (IGV) [16]. Finally, we contacted the authors, who confirmed the identified homozygous mutations.
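The per-sample cutoff rule used for the bladder data, and the homozygous-only rule used for the prostate data, can be sketched as follows. The sketch reads "greater than those of at least one confirmed point mutation" as greater than the smallest read support and the smallest VAF among the sample's confirmed mutations, which is an interpretation. The `Call` record, the `min_confirmed` threshold (the text does not say how many confirmed mutations count as "too few"), and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Call:
    sample: str
    mutation_id: str
    alt_reads: int  # reads supporting the variant allele
    vaf: float      # variant allele frequency (0..1)


def sample_cutoffs(confirmed: Iterable[Call], min_confirmed: int = 1) -> Dict[str, Tuple[int, float]]:
    """Per-sample (read, VAF) cutoffs taken from the weakest validated mutations."""
    by_sample: Dict[str, List[Call]] = {}
    for call in confirmed:
        by_sample.setdefault(call.sample, []).append(call)
    return {
        sample: (min(c.alt_reads for c in calls), min(c.vaf for c in calls))
        for sample, calls in by_sample.items()
        if len(calls) >= min_confirmed  # hypothetical threshold for "too few"
    }


def high_confidence(filtered: Iterable[Call], confirmed: Iterable[Call],
                    min_confirmed: int = 1) -> List[Call]:
    cutoffs = sample_cutoffs(confirmed, min_confirmed)
    kept = []
    for call in filtered:
        if call.sample not in cutoffs:
            continue  # no cutoffs for this sample, so its mutations are not included
        min_reads, min_vaf = cutoffs[call.sample]
        if call.alt_reads > min_reads and call.vaf > min_vaf:
            kept.append(call)
    return kept


def homozygous_only(filtered: Iterable[Call], vaf_cutoff: float = 0.95) -> List[Call]:
    """Prostate-style rule: keep only calls with VAF above 95%."""
    return [call for call in filtered if call.vaf > vaf_cutoff]


confirmed = [Call("T1", "c1", 12, 0.30), Call("T1", "c2", 20, 0.45)]
filtered = [Call("T1", "f1", 15, 0.35), Call("T1", "f2", 10, 0.50), Call("T2", "f3", 40, 0.60)]
print([c.mutation_id for c in high_confidence(filtered, confirmed)])  # ['f1']
```

Here f2 fails the read-support cutoff and f3 is dropped because sample T2 has no confirmed mutations from which to derive cutoffs.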
Evaluation of clinical significance of the cancer-associated somatic mutations represented in dbSNP in TP53

We obtained patient survival information from Supplementary Table 1 of the relevant article [17]. For 48 patients, it provided survival (in months) after diagnosis, after first hormone therapy, and after first chemotherapy. We first searched for high-confidence non-silent somatic mutations and high-level copy number alterations in TP53. According to TP53 mutation status, we classified the patients into three groups: those with cancer-associated somatic mutations represented in dbSNP; those with other variants, such as non-silent point mutations (excluding the cancer-associated somatic mutations represented in dbSNP), frameshift indels, and structural variations (high-level amplifications or deletions); and those without variants (wild-type). The patient (WA1) carrying both a cancer-associated somatic mutation represented in dbSNP and a high-level deletion was excluded from this analysis. Patients with cancer-associated somatic mutations represented in dbSNP or with other variants in TP53 did not show a significant prognostic difference in survival after diagnosis or after first chemotherapy.

Supplementary Notes

Investigation of cancer-associated somatic mutations represented in the 1000 Genomes Project database

dbSNP135, which we used in this study, includes the 1000 Genomes Project Pilot 1, 2 and 3 and Phase 1 data, which were the most recent releases. We therefore searched for mutations reported in the 1000 Genomes Project data among the cancer-associated somatic mutations represented in dbSNP (n = 257). Nine of the 257 mutations were reported in these data, and almost half (n = 4) of the nine were predicted to have a functional consequence by at least three of the four methods. Four of the nine mutations were common germline SNPs with MAF of at least 1%. Two of these, rs1801516 (MAF = 7.9%) in ATM and rs59912467 (MAF = 1.2%) in STK11, have been identified as a melanoma susceptibility locus by GWAS and related to a cancer-prone syndrome in the OMIM database, respectively. The other two common SNPs might be passenger mutations, or their association with cancer might not yet have been revealed. In addition, four of the remaining five SNPs, with MAF of less than 1%, were flagged as clinically associated (Supplementary Table 7). For example, rs2893457 generates one of the six well-known hot-spot codons in TP53. In addition, rs1801166 in APC and rs59912467 in STK11 are related to multiple colorectal adenomas and a cancer-prone disorder, respectively, in the OMIM database. Furthermore, we do not exclude the possibility that some of the overlapping SNPs reported in the 1000 Genomes Project data were erroneously entered into the COSMIC database.
Description of the SNP filtering pipeline

Our proposed comprehensive SNP filtering approach is implemented in a web-based tool called CSTAR (Cancer genome Sequencing Tool to Acquire Reliable somatic point mutations; http://cstarncc.org), which takes non-silent point mutations as input (VCF, MAF or tab-delimited text file). The pipeline compares the input list to a knowledgebase composed of the SNPs that overlap between the dbSNP and COSMIC databases. Point mutations not present in SNP databases are immediately forwarded as candidate mutations. For mutations present in dbSNP, the program references the functional consequences predicted by the in silico tools SIFT, PolyPhen, MutationAssessor and PhyloP, the clinical associations flagged by dbSNP, and disease susceptibility information from the GWAS catalog (http://www.genome.gov/gwastudies/). The first filtering module lets users create customized cancer-associated variant lists by selecting (1) the minimum number of tumor samples supporting each mutation in COSMIC, (2) the number of mutations occurring in the gene, and (3) the number of tools required to predict damage. The second parameter in particular is designed to help identify cancer driver genes by rescuing mutations that are either clustered in hot spots or scattered along the entire gene. The second module then rescues clinically associated or disease-susceptibility SNPs. Finally, the SNP list rescued by the two modules is provided as output. Supplementary Figure 4 shows the workflow of the proposed comprehensive SNP filtering approach.
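The two-module logic described above can be sketched as below. This is not CSTAR's code: the record fields, the default thresholds, combining the three module-1 criteria with AND, and counting parameter (2) as the number of knowledgebase mutations in the same gene are all assumptions made for illustration. PhyloP is counted as a damaging vote when its score exceeds 1.3, the threshold used in the Materials and Methods (−log10 0.05 ≈ 1.30).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Key = Tuple[str, int, str, str]  # hypothetical (chrom, pos, ref, alt)


@dataclass(frozen=True)
class KbEntry:
    """Hypothetical knowledgebase row for a SNP shared by dbSNP and COSMIC."""
    gene: str
    cosmic_samples: int                           # tumour samples supporting it in COSMIC
    damaging_tools: FrozenSet[str] = frozenset()  # tools predicting a damaging effect
    clinical_flag: bool = False                   # flagged as clinically associated in dbSNP
    gwas_hit: bool = False                        # disease-susceptibility SNP (GWAS catalog)


def tool_votes(sift: bool, polyphen: bool, mutation_assessor: bool,
               phylop_score: float) -> FrozenSet[str]:
    votes = {name for name, hit in (("SIFT", sift), ("PolyPhen", polyphen),
                                    ("MutationAssessor", mutation_assessor)) if hit}
    if phylop_score > 1.3:  # 1.3 is roughly -log10(0.05)
        votes.add("PhyloP")
    return frozenset(votes)


def snp_filter(mutations: Iterable[Key], dbsnp_keys: Set[Key], kb: Dict[Key, KbEntry],
               min_samples: int = 5, min_gene_mutations: int = 1,
               min_tools: int = 3) -> Tuple[List[Key], List[Key]]:
    """Return (candidates not in dbSNP, dbSNP mutations rescued by module 1 or 2)."""
    gene_counts = Counter(entry.gene for entry in kb.values())
    candidates: List[Key] = []
    rescued: List[Key] = []
    for key in mutations:
        if key not in dbsnp_keys:
            candidates.append(key)  # not in a SNP database: forwarded directly
            continue
        entry = kb.get(key)
        if entry is None:
            continue  # in dbSNP but not in the dbSNP/COSMIC knowledgebase: filtered out
        module1 = (entry.cosmic_samples >= min_samples
                   and gene_counts[entry.gene] >= min_gene_mutations
                   and len(entry.damaging_tools) >= min_tools)
        module2 = entry.clinical_flag or entry.gwas_hit
        if module1 or module2:
            rescued.append(key)
    return candidates, rescued


k1, k2, k3, k4, k5 = [("chr1", i, "C", "T") for i in range(1, 6)]
kb = {
    k1: KbEntry("GENE_A", 12, tool_votes(True, True, False, 2.0)),
    k2: KbEntry("GENE_B", 1, tool_votes(True, False, False, 0.2), gwas_hit=True),
    k3: KbEntry("GENE_C", 2),
}
print(snp_filter([k1, k2, k3, k4, k5], {k1, k2, k3, k4}, kb))
```

In this toy run k5 is forwarded because it is absent from dbSNP, k1 is rescued by module 1, k2 by module 2 (GWAS), and k3 and k4 are filtered out.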

Supplementary Figures

Supplementary Figure 1. The number of overlapped positions supported by at least 1, 5 and 10 tumor samples
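The counts summarized in Supplementary Figure 1 can be outlined in a few steps: keep COSMIC entries with one of the two somatic statuses named in the Materials and Methods, count each tumour sample once per position (the duplicate-removal step), and tally the dbSNP positions supported by at least k samples. The input tuples, the exact status strings, and keying positions as `(chrom, pos)` are assumptions; the real COSMIC export columns may be named and spelled differently.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple

# Status strings as worded in the text; the COSMIC export may spell them differently.
SOMATIC_STATUSES = {"confirmed somatic variant",
                    "reported in another cancer sample as somatic"}


def overlap_support(cosmic_rows: Iterable[Tuple[str, Hashable, str]],
                    dbsnp_positions: Set[Hashable],
                    thresholds: Sequence[int] = (1, 5, 10)) -> Dict[int, int]:
    """Count dbSNP positions supported by >= k distinct somatic COSMIC samples.

    cosmic_rows: (sample_id, position, somatic_status) tuples.
    dbsnp_positions: positions of single-class, non-silent dbSNP SNPs.
    """
    samples_at = defaultdict(set)
    for sample_id, position, status in cosmic_rows:
        if status.lower() in SOMATIC_STATUSES:
            samples_at[position].add(sample_id)  # a set keeps one entry per sample
    support = [len(s) for pos, s in samples_at.items() if pos in dbsnp_positions]
    return {k: sum(1 for n in support if n >= k) for k in thresholds}


rows = [
    ("S1", ("chr17", 1000), "Confirmed somatic variant"),
    ("S1", ("chr17", 1000), "Confirmed somatic variant"),  # duplicate entry, counted once
    ("S2", ("chr17", 1000), "Reported in another cancer sample as somatic"),
    ("S3", ("chr3", 2000), "Confirmed germline variant"),  # germline: discarded
    ("S4", ("chr7", 3000), "Confirmed somatic variant"),   # not in dbSNP
]
print(overlap_support(rows, {("chr17", 1000), ("chr3", 2000)}))  # {1: 1, 5: 0, 10: 0}
```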

Supplementary Figure 2. Mutually exclusive alteration pattern between PIK3CA and TP53 (P =.3). Tumor samples with or without mutations are labeled in red or blue, respectively. For PIK3CA and TP53, newly identified tumor samples with filtered mutations are marked with asterisks. P values were calculated by two-tailed Fisher's exact test.
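A two-tailed Fisher exact test on a 2 × 2 table (PIK3CA mutated or not versus TP53 mutated or not), as named in the caption above, can be computed directly from the hypergeometric distribution. This sketch uses the common two-sided definition, summing the probabilities of all tables with the same margins that are no more likely than the observed one (the definition R's `fisher.test` uses). The counts in the example are illustrative and are not the study's data.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: gene 1 mutated / wild-type; columns: gene 2 mutated / wild-type.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Illustrative counts only: [[both mutated, PIK3CA only], [TP53 only, neither]].
print(round(fisher_exact_two_tailed(1, 5, 8, 2), 4))  # 0.035
```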

Supplementary Figure 3. For TP53, Kaplan-Meier survival curves for tumor samples with cancer-associated somatic mutations represented in dbSNP or other variants versus wild-type, compared by log-rank test. The tumor sample (WA1) having both a cancer-associated somatic mutation represented in dbSNP and another variant was excluded from the survival analysis.
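The log-rank comparison in the caption above can be sketched for two groups (for example, TP53-altered versus wild-type). The source does not state which software was used; this is the standard Mantel-Haenszel form of the two-group test, with the chi-square (1 degree of freedom) P value obtained from `math.erfc`, and the survival times in the example are made up.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test. events: 1 = death observed, 0 = censored.

    Returns (chi-square statistic, P value) with 1 degree of freedom.
    """
    data: List[Tuple[float, int, int]] = (
        [(t, e, 0) for t, e in zip(times_a, events_a)]
        + [(t, e, 1) for t, e in zip(times_b, events_b)]
    )
    observed_a = expected_a = variance = 0.0
    for t in sorted({time for time, event, _ in data if event}):
        at_risk = [(time, event, group) for time, event, group in data if time >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, group in at_risk if group == 0)
        deaths = [(event, group) for time, event, group in at_risk if time == t and event]
        d = len(deaths)
        observed_a += sum(1 for _, group in deaths if group == 0)
        expected_a += d * n_a / n  # expected deaths in group A under no difference
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    return chi2, erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 df


chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 3), round(p, 3))
```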

Supplementary Figure 4. Workflow of the proposed comprehensive SNP filtering approach

Supplementary Tables

Supplementary Table 1. List of the compiled cancer genomics articles (journal, year, category, title, and evidence sentences)

Nature, 2010, Filtering. A comprehensive catalogue of somatic mutations from a human cancer genome.
Evidence: "In order to allow for any under-called positions in the germline, no observations of that allele were permitted in the germ line, although one call was permitted if the depth was 3. Substitutions corresponding to known SNP positions (dbSNP 129) were excluded. Substitutions were annotated using Ensembl version 52."

Nature, 2010, Filtering. A small-cell lung cancer genome with complex signatures of tobacco exposure.
Evidence: "We used the optimal thresholds defined in point 5 of the power calculations above (based on a mutation prevalence of 8 per Mb, as estimated from capillary sequence data in COSMIC) to determine whether there was sufficient evidence for calling a somatic substitution or not at each base in this preliminary list. Resulting tumour-specific substitutions were further filtered to remove (1) those residing in regions of loss of heterozygosity (LOH) in the normal cell line; (2) those potentially due to misalignment in segmental duplications and near sequence gaps; (3) those corresponding to polymorphic positions in dbSNP; (4) those potentially due to misalignment or miscalls as they are adjacent to SNPs or within 5 bp of insertions and deletions; and (5) those where all supporting reads contained the putative variant in the first or last 5 bp of the read (to reduce effects of misalignment across indels). Substitutions were annotated using Ensembl version 52."

Nature, 2010, Filtering. Genome remodelling in a basal-like breast cancer metastasis and xenograft.
Evidence: "We again followed the same procedure as described in Mardis et al(1). Predicted SNVs and Indels were compared to dbSNP 129. For SNVs, we require a position match for determining concordance between the variant and dbSNP 129. In addition, we compared (by position) predicted SNVs with SNPs found in the CEU and YRI trios as determined from the 1,000 Genomes project."

Nature, 2010, Filtering. The mutation spectrum revealed by paired genome sequences from a lung cancer patient.
Evidence: "This suggests that excluding SNVs that are only partially called in the normal would have increased the overall validation rate to 78% without a large impact on sensitivity. Further, excluding such loci that are only partially called in the normal would yield only 8,732 tumor-specific SNVs that are also described in dbSNP (i.e. likely false negative calls in the normal genome assembly)."

Nature Genetics, 2011, Filtering. Frequent somatic mutations in MAP3K5 and MAP3K9 in metastatic melanoma identified by exome sequencing.
Evidence: "The pileup file of all variations detected in each sample was first compared to all variations annotated in dbSNP132 along with data from the 1000 Genomes Project. After this analysis, all newly identified variations were fully annotated."

Nature Genetics, 2011, Filtering. Exome sequencing identifies GRIN2A as frequently mutated in melanoma.
Evidence: "To eliminate common germline mutations from consideration, alterations observed in dbSNP130 or in the 1000 Genomes Project 11_2010 data release project were removed."

Nature Genetics, 2011, Filtering. Exome sequencing identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute monocytic leukemia.
Evidence: "We used an in-house software system to identify somatic mutations by comparing variants identified in bone marrow exome data set against dbSNP and germline variants present in peripheral blood control samples."

Nature Genetics, 2011, Filtering. Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder.
Evidence: "To eliminate any previously described germline variants, we cross-referenced potential somatic mutations against the dbSNP130 and SNP datasets of Han Chinese in Beijing (CHB) and Japanese in Tokyo (JPT) from the three pilot studies in the 1000 Genomes Project."

Nature Genetics, 2011, Filtering. Analysis of the coding genome of diffuse large B-cell lymphoma.
Evidence: "For the tumor samples, only 'high confidence' variants (that is, variants supported by at least one read in one direction and two non-duplicate reads in the opposite direction) were retained, according to the GS Reference Mapper Software algorithm. For the normal samples, a less stringent criterion was applied in that all variants detected in at least one read were considered to be present in the sample. Candidate somatic (that is, tumor-specific) variants were then obtained by removing known population polymorphisms present in the NCBI dbSNP database (Build 132) as well as variants present in the corresponding paired normal DNA."

Nature Genetics, 2011, Filtering. Exome sequencing identifies frequent mutation of ARID1A in molecular subtypes of gastric cancer.
Evidence: "Candidate somatic (that is, tumor-specific) variants were then obtained by removing known population polymorphisms present in the NCBI dbSNP database (Build 132) as well as variants present in the corresponding paired normal DNA."

Nature Genetics, 2011, Filtering. Frequent mutations of genes encoding ubiquitin-mediated proteolysis pathway components in clear cell renal cell carcinoma.
Evidence: "In order to eliminate any previously described germline variants, the somatic mutations were cross-referenced against the dbSNP (version 130) and SNP data sets of Han Chinese in Beijing (CHB) and Japanese in Tokyo (JPT) from the three pilot studies in the 1000 genomes project (http://www.1000genomes.org). Any mutations present in above data sets were filtered out and the remaining mutations were subjected to subsequent analyses."

Nature, 2011, Filtering. Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma.
Evidence: "Any SNV near gapped alignments or exactly overlapping sites assessed as being polymorphisms (SNPs) were disregarded, including variants matching a position in dbSNP or the sequenced personal genomes of Venter (58), Watson (59) or the anonymous Asian (60) and Yoruban (61) individuals."

Nature, 2011, Filtering. Frequent pathway mutations of splicing machinery in myelodysplasia.
Evidence: "Synonymous variants, polymorphisms registered in the dbSNP131 and 1000 genome database, and variants on the intron region except splicing sites were filtered."

PNAS, 2011, Filtering. Whole-exome sequencing of neoplastic cysts of the pancreas reveals recurrent mutations in components of ubiquitin-dependent pathways.
Evidence: "Duplicate tags were removed, and a mismatched base was identified as a mutation only when (i) it was identified by more than five distinct tags, (ii) the number of distinct tags containing a particular mismatched base was at least 2% of the total distinct tags, (iii) it was not present in >.1% of the tags in the matched normal sample, and (iv) it was not present in SNP databases (dbSNP Build 134 Release, http://www.ncbi.nlm.nih.gov/projects/snp/ and http://browser.1000genomes.org/index.html)."

PNAS, 2011, Filtering. Exome sequencing identifies a spectrum of mutation frequencies in advanced and lethal prostate cancers.
Evidence: "A majority of the variants identified by exome sequencing were present within dbSNP. After removing from consideration all variants that were observed in the pilot dataset of the 1000 Genomes Project (11, 12) as well as any variants present in any of ~2,000 additional exomes sequenced at the University of Washington, the number of variants remaining in 2/23 samples was reduced to ~35."

Nature Genetics, 2012, Filtering. Integrated analysis of somatic mutations and focal copy-number changes identifies key genes and pathways in hepatocellular carcinoma.
Evidence: "Variants were filtered for their coding localization, annotation in dbSNP131 or 1000 genomes, somatic and functionally impairment."

Nature Genetics, 2012, Filtering. Whole-genome sequencing of liver cancers identifies etiological influences on mutation patterns and recurrent mutations in chromatin regulators.
Evidence: "If a base with consensus quality lower than 2 occurs within 3bp on either side of the target SNV, we discarded the SNVs. After SNV calling in the tumor samples, candidate SNVs were filtered based on the lymphocyte sequence of the same patient; (1) candidate SNV alleles with a frequency.3 after removing reads with base quality < 15, and mapping quality < 2, (2) depth of coverage in lymphocyte 5, (3) depth of coverage in lymphocyte 1 and candidate SNV allele was represented in the dbSNP database v131 (http://www.ncbi.nlm.nih.gov/projects/snp/)."

Nature, 2012, Filtering. Sequencing of neuroblastoma identifies chromothripsis and defects in neuritogenesis genes.
Evidence: "Variants, which are reported in dbSNP130, that were found in any of the normal blood samples or that were found within the public genomes from Complete Genomics were removed from the data set."

Nature, 2012, Filtering. Driver mutations in histone H3.3 and chromatin remodelling genes in paediatric glioblastoma.
Evidence: "A variant called in a tumour was considered to be a candidate somatic mutation if the matched normal sample had at least 1 reads covering this position and had zero variant reads, and the variant was not reported in dbSNP131 or the 1000 Genomes data set (October 2011)."

PNAS, 2011, Others. Loss-of-function mutations in Notch receptors in cutaneous and lung squamous cell carcinoma.
Evidence: "Tumor samples without matched normal samples: To eliminate common germline polymorphisms from consideration, variants that had the same position as variants present in pilot data from the 1,000 Genomes Project or in 2,000 exomes corresponding to normal (nontumor, nonxenografted) tissues sequenced at the University of Washington were removed from consideration. Tumor samples with matched normal samples: All mutations known in dbSNP were subtracted unless present in COSMIC."

Nature Genetics, 2011, Partial filtering. Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes.
Evidence: "To identify somatic mutations, we excluded from our analysis all germline variants found in the dbSNP131 or 1000 Genomes Project (4th August 2010 release) databases and then subtracted the sequence variants of the normal exomes from the tumor exomes. Any sequence variants found in COSMIC v47, a database of cancer somatic mutations, were retained."

Nature Genetics, 2012, Partial filtering. Exome sequencing of liver fluke-associated cholangiocarcinoma.
Evidence: "We compared our variants against the common polymorphisms present in dbSNP131 and in the 1000 Genomes Project databases, in order to discard any common SNPs. Several cancer somatic mutations are also present in dbSNP, and we retained any common variants also found to be present in COSMIC v47."

Nature, 2012, Partial filtering. The genetic basis of early T-cell precursor acute lymphoblastic leukaemia.
Evidence: "High-confidence germline variants that were not found in dbSNP were retained as novel variants. In addition, variants in dbSNP that were also present in OMIM or COSMIC were retained as these variants are likely to be of biologic importance."

Nature, 2012, Partial filtering. Novel mutations target distinct subgroups of medulloblastoma.
Evidence: "Since only tumor samples were sequenced, known germline variations in dbSNP (excluding validated mutations in COSMIC, OMIMSNP and ClinicalVar), NHLBI Exome Sequencing Project (http://evs.gs.washington.edu/evs/ downloaded on 11.21.2011) and germline variations identified by PCGP were removed."

Nature Genetics, 2011, Unknown. High-resolution characterization of a hepatocellular carcinoma genome.
Nature Genetics, 2011, Unknown. Inactivating mutations of the chromatin remodeling gene ARID2 in hepatocellular carcinoma.
Nature Genetics, 2011, Unknown. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion.
Nature Genetics, 2011, Unknown. Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes.
Nature Genetics, 2011, Unknown. Somatic histone H3 alterations in pediatric diffuse intrinsic pontine gliomas and non-brainstem glioblastomas.
Nature Genetics, 2011, Unknown. Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia.
Nature, 2011, Unknown. A novel recurrent mutation in MITF predisposes to familial and sporadic melanoma.
Nature, 2011, Unknown. Initial genome sequencing and analysis of multiple myeloma.
Nature, 2011, Unknown. The genomic complexity of primary human prostate cancer.
Nature, 2011, Unknown. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia.
Genome Research, 2011, Unknown. Whole-exome sequencing of human pancreatic cancers and characterization of genomic instability caused by MLH1 haploinsufficiency and complete deficiency.
Nature Genetics, 2012, Unknown. Exome sequencing identifies recurrent somatic MAP2K1 and MAP2K2 mutations in melanoma.
Nature Genetics, 2012, Unknown. Somatic histone H3 alterations in pediatric diffuse intrinsic pontine gliomas and non-brainstem glioblastomas.
Nature Genetics, 2012, Unknown. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer.
Nature, 2012, Unknown. A novel retinoblastoma therapy from genomic and epigenetic analyses.
Nature, 2012, Unknown. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing.
Nature, 2012, Unknown. Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma.
Nature, 2012, Unknown. Melanoma genome sequencing reveals frequent PREX2 mutations.
Nature, 2012, Unknown. Sequence analysis of mutations and translocations across breast cancer subtypes.
Nature, 2012, Unknown. Whole-genome analysis informs breast cancer response to aromatase inhibition.
Nature, 2012, Unknown. The landscape of cancer genes and mutational processes in breast cancer.

Nature 212 Unknown Clonal selection drives genetic divergence of metastatic medulloblastoma Genome Research 212 Unknown Whole genome sequencing of matched primary and metastatic acral melanomas PNAS 212 Unknown Discovery and prioritization of somatic mutations in diffuse large B-cell lymphoma (DLBCL) by whole-exome sequencing Nature 212 Unknwon The clonal and mutational evolution spectrum of primary triple-negative breast cancers Nature Biotechnology: doi:1.138/nbt.2681
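Two of the per-site rules quoted above can be written as small predicates. The first is the matched-normal criterion: the normal has enough coverage and zero variant reads, and the site is absent from dbSNP131/1000 Genomes. The second is the flanking consensus-quality filter. The Python sketch below is illustrative only. The `Site` record, the germline key set and all cutoffs (`MIN_NORMAL_DEPTH`, `MIN_CONSENSUS_QUAL`, `FLANK_BP`) are assumptions. They are not values or code from any of the listed studies.

```python
from dataclasses import dataclass

# Placeholder cutoffs (assumptions, not any study's published values).
MIN_NORMAL_DEPTH = 10    # minimum reads covering the site in the matched normal
MIN_CONSENSUS_QUAL = 20  # a flanking base below this consensus quality vetoes the SNV
FLANK_BP = 3             # window size on each side of the SNV


@dataclass(frozen=True)
class Site:
    """Hypothetical per-site summary of a tumour SNV call and its matched-normal pileup."""
    chrom: str
    pos: int           # 1-based position
    alt: str           # variant allele
    normal_depth: int  # reads covering the site in the matched normal
    normal_alt: int    # reads supporting `alt` in the matched normal


def is_candidate_somatic(site: Site, germline_keys: set) -> bool:
    """Matched-normal rule: adequate normal coverage, zero variant reads in the normal,
    and (chrom, pos, alt) absent from the germline databases (e.g. dbSNP131, 1000 Genomes)."""
    if site.normal_depth < MIN_NORMAL_DEPTH:
        return False
    if site.normal_alt != 0:
        return False
    return (site.chrom, site.pos, site.alt) not in germline_keys


def passes_flank_quality(pos: int, consensus_qual: dict) -> bool:
    """Reject the SNV at `pos` if any base within FLANK_BP on either side has a consensus
    quality below MIN_CONSENSUS_QUAL. Positions missing from `consensus_qual` are ignored."""
    for offset in range(1, FLANK_BP + 1):
        for neighbour in (pos - offset, pos + offset):
            q = consensus_qual.get(neighbour)
            if q is not None and q < MIN_CONSENSUS_QUAL:
                return False
    return True


if __name__ == "__main__":
    germline = {("1", 1000, "A")}  # hypothetical germline key
    call = Site("2", 5000, "G", normal_depth=35, normal_alt=0)
    print(is_candidate_somatic(call, germline))                  # True
    print(passes_flank_quality(100, {97: 30, 98: 40, 102: 12}))  # False: position 102 has quality 12
```

The sketch treats the two rules as independent checks, so a pipeline can apply them in either order. The centre position itself is deliberately excluded from the flanking-quality window.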

Supplementary Table 2. List of the cancer-associated somatic mutations represented in dbSNP

Gene, Chromosome & Position (Hg19), Number of supporting tumor samples, rs ID | Gene, Chromosome & Position (Hg19), Number of supporting tumor samples, rs ID
JAK2 9:57377-57377 29268 rs77375493 | TP53 17:757847-757847 25 rs138729528
KRAS 12:25398284-25398284 14685 rs121913529 | PTPN11 12:112888198-112888198 25 rs121918453
7:14453136-14453136 13572 rs11348822 | CDKN2A 9:2197136-2197136 24 rs121913381
KRAS 12:25398285-25398285 477 rs12191353 | TP53 17:7579358-7579358 24 rs1154654
KRAS 12:25398281-25398281 367 rs112445441 | PDGFRA 4:5514136-5514136 24 rs12198586
IDH1 2:29113112-29113112 237 rs1219135 | EGFR 7:552495-552495 23 rs121913465
PIK3CA 3:17895285-17895285 1459 rs121913279 | PTEN 1:897119-897119 23 rs121913294
EGFR 7:55259515-55259515 1423 rs121434568 | APC 5:11217527-11217527 23 rs121913462
FGFR3 4:183568-183568 12 rs121913483 | EGFR 7:55221822-55221822 23 rs14984192
NRAS 1:115256529-115256529 952 rs1155429 | 7:14453137-14453137 22 rs121913378
PIK3CA 3:17893691-17893691 793 rs148863 | TP53 17:7577139-7577139 22 rs55832599
TP53 17:757846-757846 765 rs28934578 | CDKN2A 9:2197117-2197117 21 rs121913386
KIT 4:55599321-55599321 759 rs12191357 | HRAS 11:534285-534285 21 rs14894226
IDH1 2:29113113-29113113 631 rs121913499 | TP53 17:757759-757759 21 rs121912652
TP53 17:7577538-7577538 597 rs1154652 | APC 5:11217539-11217539 2 rs121913328
TP53 17:757712-757712 592 rs28934576 | TP53 17:757784-757784 2 rs121912667
NRAS 1:11525653-11525653 555 rs121913254 | NF2 22:367836-367836 2 rs74315499
PIK3CA 3:17893682-17893682 492 rs121913273 | APC 5:112175426-112175426 19 rs121913326
3:41266124-41266124 489 rs121913412 | PTEN 1:8972852-8972852 19 rs12199231
TP53 17:7577539-7577539 461 rs121912651 | PIK3CA 3:17893693-17893693 19 rs121913275
NRAS 1:115258747-115258747 432 rs121913237 | TSHR 14:8161299-8161299 19 rs28937584
TP53 17:7577121-7577121 427 rs121913343 | HRAS 11:533873-533873 18 rs121913496
3:41266137-41266137 395 rs12191349 | 7:14453145-14453145 18 rs121913366
TP53 17:757794-757794 394 rs28934574 | VHL 3:1183725-1183725 18 rs53826
FGFR3 4:18699-18699 388 rs121913485 | PIK3CA 3:1789527-1789527 18 rs121913288
TP53 17:7577548-7577548 338 rs28934575 | FGFR2 1:123279677-123279677 18 rs79184941
3:41266113-41266113 39 rs12191343 | KIT 4:55594258-55594258 17 rs121913523
TP53 17:7577534-7577534 38 rs28934571 | VHL 3:1183797-1183797 17 rs5387
HRAS 11:534288-534288 3 rs1489423 | 7:14481411-14481411 17 rs121913351
KRAS 12:25398282-25398282 299 rs121913535 | CDKN2A 9:2197196-2197196 16 rs121913384
3:4126611-4126611 279 rs1219134 | CDKN2A 9:21971153-21971153 16 rs121913383
IDH2 15:9631934-9631934 277 rs12191352 | CDKN2A 9:2197118-2197118 16 rs11552822
RET 1:43617416-43617416 247 rs74799832 | TP53 17:7577511-7577511 16 rs28934577
NRAS 1:115258744-115258744 244 rs121434596 | KRAS 12:25398262-25398262 15 rs121913538
FGFR3 4:183564-183564 235 rs121913482 | PIK3CA 3:17893674-17893674 15 rs121913285
GNAS 2:5748442-5748442 23 rs11554273 | EGFR 7:5523343-5523343 15 rs13923663
TP53 17:757819-757819 224 rs121912666 | VHL 3:1183772-1183772 15 rs14893829
FLT3 13:28592642-28592642 223 rs121913488 | TP53 17:7578518-7578518 14 rs28934875
3:4126697-4126697 222 rs28931588 | CSF1R 5:149433645-149433645 13 rs181271
HRAS 11:533874-533874 214 rs121913233 | STK11 19:12721-12721 13 rs121913324
NRAS 1:115258748-115258748 214 rs12191325 | TP53 17:7578532-7578532 13 rs28934873
MPL 1:438159-438159 199 rs121913615 | APC 5:112151261-112151261 12 rs137854568
DNMT3A 2:25457242-25457242 184 rs1471633 | FGFR3 4:186119-186119 12 rs28931614
3:41266136-41266136 163 rs12191347 | NF2 22:35732-35732 12 rs74315496
PDGFRA 4:5515293-5515293 158 rs12198585 | 7:14481417-14481417 11 rs121913348
PIK3CA 3:17893692-17893692 154 rs121913274 | VHL 3:1191488-1191488 11 rs53818
TP53 17:7578461-7578461 151 rs121912654 | SRC 2:3631762-3631762 11 rs121913314
TP53 17:7577547-7577547 15 rs121912656 | KIT 4:55599333-55599333 11 rs121913682
KRAS 12:2538275-2538275 142 rs1785145 | TP53 17:757412-757412 11 rs17882252
GNAQ 9:849488-849488 142 rs121913492 | TSHR 14:8161289-8161289 1 rs12198877
APC 5:112175639-112175639 136 rs121913332 | STK11 19:1221319-1221319 1 rs121913322
3:4126614-4126614 134 rs28931589 | CDKN2A 9:21971177-21971177 1 rs121913382
PTEN 1:8969294-8969294 128 rs12199224 | VHL 3:119148-119148 1 rs121913346
IDH2 15:9631838-9631838 127 rs12191353 | GNAS 2:57484596-57484596 1 rs121913494
KRAS 12:2538276-2538276 118 rs12191324 | APC 5:112164616-112164616 1 rs137854574
TP53 17:7577556-7577556 118 rs121912655 | KRAS 12:25398279-25398279 1 rs14894365
TP53 17:757785-757785 113 rs112431538 | VHL 3:11882-11882 1 rs53811
TP53 17:757722-757722 11 rs121913344 | TSHR 14:8161258-8161258 1 rs12198859
EGFR 7:5524971-5524971 17 rs121434569 | MPL 1:43814979-43814979 1 rs121913614
FGFR3 4:18689-18689 17 rs121913479 | SMO 7:12885341-12885341 1 rs121918347
3:4126698-4126698 98 rs121913396 | ATM 11:18175462-18175462 1 rs181516
TP53 17:7578442-7578442 98 rs14892494 | PTPN11 12:11288822-11288822 1 rs121918462
TP53 17:7577124-7577124 92 rs121912657 | WT1 11:32413578-32413578 9 rs1219799
NRAS 1:115256528-115256528 92 rs121913255 | PTEN 1:89717615-89717615 9 rs12199227
HRAS 11:534289-534289 91 rs14894229 | 7:14453146-14453146 9 rs121913369
AKT1 14:15246551-15246551 91 rs121434592 | TSHR 14:8166172-8166172 9 rs12198878
NRAS 1:115258745-115258745 9 rs121434595 | TP53 17:757417-757417 9 rs121912664
KIT 4:5559932-5559932 88 rs12191356 | WT1 11:3241791-3241791 9 rs142937387
PTPN11 12:11288821-11288821 85 rs121918464 | KRAS 12:2538283-2538283 9 rs121913528
PIK3CA 3:17893694-17893694 84 rs121913286 | ERBB2 17:378822-378822 9 rs12191347
TP53 17:7578479-7578479 83 rs28934874 | ABL1 9:13374829-13374829 9 rs121913451
GNAS 2:57484421-57484421 79 rs121913495 | EGFR 7:55259485-55259485 9 rs14893435
KIT 4:5559361-5559361 79 rs121913517 | RB1 13:48941648-48941648 9 rs1219133
CDKN2A 9:2197112-2197112 77 rs121913388 | CDKN2A 9:21971116-21971116 9 rs11552823
3:41266112-41266112 77 rs121913228 | APC 5:112164586-112164586 8 rs137854573
TP53 17:7577559-7577559 77 rs28934573 | STK11 19:122415-122415 8 rs121913323
3:41266125-41266125 76 rs121913413 | RET 1:4369948-4369948 8 rs7576352
KRAS 12:25378562-25378562 71 rs121913527 | IDH1 2:2918317-2918317 8 rs34218846
TP53 17:757716-757716 69 rs17849781 | FGFR2 1:12325834-12325834 8 rs121913476
PTEN 1:89717672-89717672 67 rs12199219 | VHL 3:1183764-1183764 8 rs5384
3:4126613-4126613 67 rs121913399 | APC 5:112128143-112128143 8 rs62619935
PTEN 1:8969295-8969295 65 rs12199229 | TP53 17:7577526-7577526 8 rs121912653
CDKN2A 9:21971186-21971186 65 rs121913387 | 7:14453132-14453132 8 rs121913365
HRAS 11:534286-534286 63 rs14894228 | MET 7:116423414-116423414 8 rs121913246
KIT 4:55593613-55593613 58 rs121913521 | TP53 17:757841-757841 8 rs1472414
KIT 4:55593661-55593661 57 rs121913513 | WT1 11:3241356-3241356 8 rs28941778
TP53 17:757799-757799 55 rs12191266 | VHL 3:119156-119156 7 rs5382
EGFR 7:55259524-55259524 54 rs121913444 | DNMT3A 2:254668-254668 7 rs144689354
HRAS 11:533875-533875 54 rs2893346 | TSHR 14:816115-816115 7 rs149978216
PIK3CA 3:17895274-17895274 54 rs121913283 | VHL 3:1183794-1183794 7 rs11913277
FGFR3 4:187889-187889 51 rs78311289 | PTPN11 12:112926884-112926884 7 rs121918458
FGFR3 4:18692-18692 49 rs121913484 | APC 5:11217524-11217524 7 rs181166
TP53 17:757755-757755 48 rs28934572 | VHL 3:1188245-1188245 7 rs1489383
KIT 4:5559363-5559363 47 rs121913235 | KRAS 12:25398255-25398255 7 rs121913236
EGFR 7:5524177-5524177 46 rs28929495 | GNAS 2:57484597-57484597 7 rs137854533
FGFR3 4:188331-188331 44 rs12191348 | RB1 13:4895376-4895376 7 rs12191332
FLT3 13:28592641-28592641 43 rs12199646 | MET 7:11641199-11641199 7 rs563917
PTPN11 12:112888211-112888211 43 rs121918465 | CBL 11:119148991-119148991 6 rs192712314
ALK 2:29432664-29432664 41 rs11399487 | NF2 22:3788-3788 6 rs7431554
FGFR3 4:18789-18789 41 rs12191315 | VHL 3:1183739-1183739 6 rs5382
APC 5:112175423-112175423 38 rs121913329 | FKBP9 7:3314327-3314327 6 rs2953555
7:14453134-14453134 37 rs121913364 | KIT 4:55561764-55561764 6 rs12191355
PTPN11 12:112888199-112888199 36 rs121918454 | VHL 3:1183785-1183785 6 rs53828
FBXW7 4:153247289-153247289 36 rs14968468 | JAK3 19:179489-179489 6 rs12191354
KIT 4:5559934-5559934 34 rs121913514 | STK11 19:122487-122487 6 rs121913315
7:14453154-14453154 34 rs121913338 | WT1 11:32413566-32413566 6 rs121979
PIK3CA 3:17895284-17895284 33 rs121913281 | TP53 17:757838-757838 6 rs72661117
PIK3CA 3:178916876-178916876 33 rs121913287 | VHL 3:119153-119153 6 rs14893825
KRAS 12:2538277-2538277 33 rs121913238 | 7:14453193-14453193 6 rs12191337
7:1448142-1448142 33 rs121913355 | ERBB2 17:37881-37881 6 rs121913471
APC 5:112173917-112173917 32 rs121913333 | KRAS 12:2538282-2538282 5 rs1488629
APC 5:112175576-112175576 32 rs12191333 | PTEN 1:89692911-89692911 5 rs12199241
FGFR3 4:186153-186153 32 rs28931615 | RET 1:4369949-4369949 5 rs75996173
KIT 4:55593464-55593464 31 rs3822214 | WT1 11:32413565-32413565 5 rs1219793
PIK3CA 3:1789529-1789529 31 rs121913277 | CDC73 1:19394272-19394272 5 rs121434265
PTEN 1:89711899-89711899 29 rs121913293 | RB1 13:4895555-4895555 5 rs12191334
PIK3CA 3:178921553-178921553 29 rs121913284 | VHL 3:1191555-1191555 5 rs53823
APC 5:112174631-112174631 29 rs121913331 | 7:1445315-1445315 5 rs121913341
CDKN2A 9:21971111-21971111 28 rs121913385 | TRRAP 7:985982-985982 5 rs147459
CDKN2A 9:2197128-2197128 28 rs121913389 | 7:1448143-1448143 5 rs121913357
EGFR 7:5524178-5524178 28 rs121913428 | KIT 4:55595519-55595519 5 rs121913516
PIK3CA 3:17892798-17892798 28 rs121913272 | ZDHHC11 5:833915-833915 5 rs6233211
PTPN11 12:112888166-112888166 27 rs121918461 | RB1 13:48955538-48955538 5 rs12191333
KIT 4:55599348-55599348 27 rs121913524 | RB1 13:48942685-48942685 5 rs12191331
APC 5:11217533-11217533 27 rs121913327 | NF1 17:29576111-29576111 5 rs13785456
NF2 22:332794-332794 26 rs121434259 | 7:14453149-14453149 5 rs121913361
STK11 19:1223125-1223125 26 rs59912467 | CDKN2A 9:2197153-2197153 5 rs137854598
TSHR 14:816976-816976 26 rs12198864 | APC 5:112162891-112162891 5 rs13785458
KIT 4:55594221-55594221 25 rs121913512
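Supplementary Table 2 shows why subtracting every dbSNP entry is hazardous. Each listed hotspot carries an rs ID, so a full dbSNP filter would discard it. The toy Python sketch below contrasts that full subtraction with the "partial filtering" strategy quoted for several studies above, which drops dbSNP variants unless they also appear in COSMIC. It assumes in-memory sets stand in for real dbSNP and COSMIC lookups. The KRAS and EGFR rows use coordinates and rs IDs exactly as printed in the table. `GENE_X`, `GENE_Y`, `rsTEST1` and the COSMIC membership set are hypothetical.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    gene: str
    chrom: str
    pos: int
    rsid: Optional[str]  # rs ID if the site is catalogued in dbSNP, else None


def naive_dbsnp_filter(variants, dbsnp_ids):
    """Remove every call whose rs ID is in dbSNP (full subtraction)."""
    return [v for v in variants if v.rsid not in dbsnp_ids]


def rescued_dbsnp_filter(variants, dbsnp_ids, cosmic_sites):
    """Remove dbSNP calls unless the (chrom, pos) site is also listed in COSMIC."""
    return [
        v for v in variants
        if v.rsid not in dbsnp_ids or (v.chrom, v.pos) in cosmic_sites
    ]


# Two hotspots as printed in Supplementary Table 2, plus two hypothetical calls.
calls = [
    Variant("KRAS", "12", 25398284, "rs121913529"),
    Variant("EGFR", "7", 55259515, "rs121434568"),
    Variant("GENE_X", "1", 1000, "rsTEST1"),  # hypothetical common germline SNP
    Variant("GENE_Y", "2", 5000, None),       # hypothetical novel call
]
dbsnp = {"rs121913529", "rs121434568", "rsTEST1"}
cosmic = {("12", 25398284), ("7", 55259515)}  # assumed COSMIC membership for this toy

if __name__ == "__main__":
    print([v.gene for v in naive_dbsnp_filter(calls, dbsnp)])            # ['GENE_Y']
    print([v.gene for v in rescued_dbsnp_filter(calls, dbsnp, cosmic)])  # ['KRAS', 'EGFR', 'GENE_Y']
```

The COSMIC rescue is keyed on position only, which is a simplification made for the sketch. A real pipeline would normally also compare the reference and variant alleles.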

Supplementary Table 3. Functional consequence of the cancer-associated somatic mutations represented in dbSNP

Gene, rsID, Chromosome & Position (Hg19), Number of supporting tumor samples, Ref, Var, SIFT (prediction, score), PolyPhen2 (prediction, score), MutationAssessor (prediction, score), PhyloP score
ABL1 rs121913451 9:13374829-13374829 9 C G DAMAGING .5 .95 low .86 .85
AKT1 rs121434592 14:15246551-15246551 91 C T DAMAGING .1 1. high 3.85 2.36
ALK rs11399487 2:29432664-29432664 41 C T Not scored N/A 1. medium 3.32 2.56
APC rs121913326 5:112175426-112175426 19 G T Not scored N/A nonsense Nonsense 2.94
APC rs121913327 5:11217533-11217533 27 C T Not scored N/A nonsense Nonsense 2.86
APC rs121913328 5:11217539-11217539 2 C T Not scored N/A nonsense Nonsense 1.57
APC rs121913329 5:112175423-112175423 38 C T Not scored N/A nonsense Nonsense 2.94
APC rs12191333 5:112175576-112175576 32 C T Not scored N/A nonsense Nonsense 2.83
APC rs121913331 5:112174631-112174631 29 C T Nonsense N/A nonsense Nonsense 1.39
APC rs121913332 5:112175639-112175639 136 C T Not scored N/A nonsense Nonsense 1.51
APC rs121913333 5:112173917-112173917 32 C T Nonsense N/A nonsense Nonsense .78
APC rs121913462 5:11217527-11217527 2 G A Not scored N/A benign .1 low 1.4 2.86
APC rs121913462 5:11217527-11217527 21 G T Not scored N/A benign .1 Nonsense 2.86
APC rs137854568 5:112151261-112151261 12 C T Nonsense N/A nonsense Nonsense 2.52
APC rs137854573 5:112164586-112164586 8 C T Nonsense N/A nonsense Nonsense -.3
APC rs137854574 5:112164616-112164616 1 C T Nonsense N/A nonsense Nonsense -.3
APC rs13785458 5:112162891-112162891 5 C T Nonsense N/A nonsense Nonsense 1.46
APC rs181166 5:11217524-11217524 4 G C Not scored N/A benign. neutral .55 1.53
APC rs181166 5:11217524-11217524 3 G T Not scored N/A benign. Nonsense 1.53
APC rs62619935 5:112128143-112128143 8 C T Nonsense N/A nonsense Nonsense 1.28
ATM rs181516 11:18175462-18175462 1 G A TOLERATED .23 benign .4 medium 2.16 2.75
rs11348822 7:14453136-14453136 1 A C DAMAGING .82 high 3.96 2.16
rs11348822 7:14453136-14453136 1355 A T DAMAGING .82 medium 2.28 2.16
rs11348822 7:14453136-14453136 12 A G TOLERATED .32 .82 medium 2.58 2.16
rs121913338 7:14453154-14453154 31 T C DAMAGING 1. high 4.29 2.16
rs121913338 7:14453154-14453154 3 T A DAMAGING 1. high 4.64 2.16
rs121913341 7:1445315-1445315 5 A C DAMAGING 1. high 3.94 2.16
rs121913348 7:14481417-14481417 6 C T DAMAGING 1. high 4.49 2.62
rs121913348 7:14481417-14481417 5 C A DAMAGING 1. high 4.49 2.62
rs121913351 7:14481411-14481411 12 C A DAMAGING 1. high 4.63 2.62
rs121913351 7:14481411-14481411 4 C T DAMAGING 1. high 4.63 2.62
rs121913351 7:14481411-14481411 1 C G DAMAGING 1. high 4.63 2.62
rs121913355 7:1448142-1448142 5 C T DAMAGING 1. high 4.23 2.62
rs121913355 7:1448142-1448142 11 C A DAMAGING 1. high 4.58 2.62
rs121913355 7:1448142-1448142 17 C G DAMAGING 1. medium 3.16 2.62
rs121913357 7:1448143-1448143 1 C G DAMAGING 1. high 4.58 2.62
rs121913357 7:1448143-1448143 4 C T DAMAGING 1. high 4.58 2.62
rs121913361 7:14453149-14453149 5 C G DAMAGING 1. high 4.64 2.65
rs121913364 7:14453134-14453134 37 T C DAMAGING .78 medium 2.2 2.16
rs121913365 7:14453132-14453132 6 T A DAMAGING .92 medium 3.44 .94
rs121913365 7:14453132-14453132 2 T G DAMAGING .92 medium 3.44 .94
rs121913366 7:14453145-14453145 11 A C DAMAGING 1. high 4.42 2.16
rs121913366 7:14453145-14453145 7 A T DAMAGING 1. high 4.42 2.16
rs121913369 7:14453146-14453146 9 G C DAMAGING .93 medium 2.15 .3
rs12191337 7:14453193-14453193 6 T C DAMAGING .4 1. medium 3.35 2.16
rs121913378 7:14453137-14453137 2 C A DAMAGING .1 benign .4 low 1.3 2.65
rs121913378 7:14453137-14453137 2 C T DAMAGING benign .4 medium 2.33 2.65
CBL rs192712314 11:119148991-119148991 6 G A DAMAGING .5 1. medium 3.32 2.74
CDC73 rs121434265 1:19394272-19394272 2 C A Nonsense N/A nonsense Nonsense 1.37
CDC73 rs121434265 1:19394272-19394272 3 C G Nonsense N/A nonsense Nonsense 1.37
CDKN2A rs11552822 9:2197118-2197118 4 C T TOLERATED .31 1. medium 2.5 2.81
CDKN2A rs11552822 9:2197118-2197118 2 C G TOLERATED .54 1. medium 2.5 2.81
CDKN2A rs11552822 9:2197118-2197118 1 C A TOLERATED 1 1. medium 2.5 2.81
CDKN2A rs11552823 9:21971116-21971116 2 G T DAMAGING 1. synonymous in Uniprot 2.75
CDKN2A rs11552823 9:21971116-21971116 7 G A DAMAGING 1. synonymous in Uniprot 2.75
CDKN2A rs121913381 9:2197136-2197136 12 C A DAMAGING 1. medium 1.94 2.81
CDKN2A rs121913381 9:2197136-2197136 7 C G DAMAGING 1. medium 1.94 2.81
CDKN2A rs121913381 9:2197136-2197136 5 C T TOLERATED .19 1. medium 1.94 2.81
CDKN2A rs121913382 9:21971177-21971177 1 C A Nonsense N/A benign .9 low 1.39 -.39
CDKN2A rs121913383 9:21971153-21971153 1 C T DAMAGING .2 .97 low 1.4 .79
CDKN2A rs121913383 9:21971153-21971153 15 C A Nonsense N/A .97 low 1.4 .79
CDKN2A rs121913384 9:2197196-2197196 3 C T TOLERATED .36 .64 medium 2.5 2.81
CDKN2A rs121913384 9:2197196-2197196 13 C A Nonsense N/A .64 medium 2.5 2.81
CDKN2A rs121913385 9:21971111-21971111 1 G T DAMAGING .2 .8 low 1.85 2.75
CDKN2A rs121913385 9:21971111-21971111 27 G A DAMAGING .2 .8 low 1.85 2.75
CDKN2A rs121913386 9:2197117-2197117 21 G A DAMAGING 1. Nonsense 2.81
CDKN2A rs121913387 9:21971186-21971186 65 G A Nonsense N/A .82 low 1.7 .73
CDKN2A rs121913388 9:2197112-2197112 77 G A Nonsense N/A .99 medium 2.5 1.46
CDKN2A rs121913389 9:2197128-2197128 1 C G TOLERATED .15 .98 medium 2. 2.81
CDKN2A rs121913389 9:2197128-2197128 27 C T Nonsense N/A .98 medium 2. 2.81
CDKN2A rs137854598 9:2197153-2197153 2 G T DAMAGING 1. synonymous in Uniprot 2.75
CDKN2A rs137854598 9:2197153-2197153 3 G A DAMAGING 1. synonymous in Uniprot 2.75
CSF1R rs181271 5:149433645-149433645 13 T C DAMAGING 1. neutral .7 1.98
rs121913228 3:41266112-41266112 1 T A DAMAGING 1. medium 2.57 2.16
rs121913228 3:41266112-41266112 59 T G DAMAGING 1. medium 3.27 2.16
rs121913228 3:41266112-41266112 17 T C DAMAGING 1. medium 3.27 2.16
rs121913396 3:4126698-4126698 55 A G DAMAGING 1. medium 3.29 2.16
rs121913396 3:4126698-4126698 28 A T DAMAGING 1. medium 3.29 2.16
rs121913396 3:4126698-4126698 15 A C DAMAGING 1. medium 3.29 2.16
rs121913399 3:4126613-4126613 55 G A DAMAGING 1. medium 3.29 2.71
rs121913399 3:4126613-4126613 12 G C DAMAGING 1. medium 3.29 2.71
rs1219134 3:4126611-4126611 145 C G DAMAGING 1. medium 3.29 2.71
rs1219134 3:4126611-4126611 81 C T DAMAGING 1. medium 3.29 2.71
rs1219134 3:4126611-4126611 53 C A DAMAGING 1. medium 3.29 2.71
rs12191343 3:41266113-41266113 156 C T DAMAGING 1. medium 3.27 2.71
rs12191343 3:41266113-41266113 126 C G DAMAGING 1. medium 3.27 2.71
rs12191343 3:41266113-41266113 27 C A DAMAGING 1. medium 3.27 2.71
rs12191347 3:41266136-41266136 1 T A DAMAGING .3 benign .13 medium 2.41 2.25
rs12191347 3:41266136-41266136 1 T G DAMAGING benign .13 medium 3.22 2.25
rs12191347 3:41266136-41266136 152 T C DAMAGING benign .13 medium 3.22 2.25
rs12191349 3:41266137-41266137 15 C A DAMAGING 1. medium 3.22 2.8
rs12191349 3:41266137-41266137 16 C G DAMAGING 1. medium 3.22 2.8
rs12191349 3:41266137-41266137 364 C T DAMAGING 1. medium 3.22 2.8
rs121913412 3:41266124-41266124 3 A T TOLERATED .16 .49 medium 2.69 2.25
rs121913412 3:41266124-41266124 5 A C DAMAGING .49 medium 3.24 2.25
rs121913412 3:41266124-41266124 481 A G DAMAGING .49 medium 3.24 2.25
rs121913413 3:41266125-41266125 2 C G TOLERATED .16 .49 medium 2.69 2.8
rs121913413 3:41266125-41266125 69 C T DAMAGING .49 medium 3.24 2.8
rs121913413 3:41266125-41266125 5 C A DAMAGING .4 .49 medium 3.24 2.8
rs28931588 3:4126697-4126697 115 G T DAMAGING 1. medium 3.29 2.71
rs28931588 3:4126697-4126697 38 G C DAMAGING 1. medium 3.29 2.71
rs28931588 3:4126697-4126697 69 G A DAMAGING 1. medium 3.29 2.71
rs28931589 3:4126614-4126614 65 G A DAMAGING 1. medium 3.29 2.71
rs28931589 3:4126614-4126614 69 G T DAMAGING 1. medium 3.29 2.71
DNMT3A rs144689354 2:254668-254668 7 G A DAMAGING 1. high 3.61 1.26
DNMT3A rs1471633 2:25457242-25457242 184 C T DAMAGING .3 .65 medium 2.83 2.69
EGFR rs121434568 7:55259515-55259515 1422 T G DAMAGING 1. high 4.1 2.15
EGFR rs121434568 7:55259515-55259515 1 T A DAMAGING 1. high 4.1 2.15
EGFR rs121434569 7:5524971-5524971 17 C T DAMAGING 1. low 1.74 2.8
EGFR rs121913428 7:5524178-5524178 26 G C DAMAGING 1. high 4.6 2.54
EGFR rs121913428 7:5524178-5524178 2 G A DAMAGING 1. high 4.6 2.54
EGFR rs121913444 7:55259524-55259524 6 T G DAMAGING 1. high 3.54 2.22
EGFR rs121913444 7:55259524-55259524 48 T A DAMAGING 1. medium 2.85 2.22
EGFR rs121913465 7:552495-552495 1 G A DAMAGING 1. medium 2.14 2.73
EGFR rs121913465 7:552495-552495 22 G T DAMAGING .1 1. medium 3.24 2.73
EGFR rs13923663 7:5523343-5523343 15 G T DAMAGING .1 1. medium 3.15 2.77
EGFR rs14893435 7:55259485-55259485 9 C T DAMAGING 1. high 3.54 2.75
EGFR rs14984192 7:55221822-55221822 3 C A DAMAGING 1. medium 1.99 2.82
EGFR rs14984192 7:55221822-55221822 2 C T DAMAGING 1. medium 3.38 2.82
EGFR rs28929495 7:5524177-5524177 26 G A DAMAGING 1. high 4.6 2.75
EGFR rs28929495 7:5524177-5524177 2 G T DAMAGING 1. high 4.6 2.75
ERBB2 rs12191347 17:378822-378822 9 T C DAMAGING 1. high 3.94 1.92
ERBB2 rs121913471 17:37881-37881 5 G T TOLERATED .47 .81 low 1.12 2.42
ERBB2 rs121913471 17:37881-37881 1 G A TOLERATED .32 .81 low 1.86 2.42
FBXW7 rs14968468 4:153247289-153247289 2 G C DAMAGING 1. high 3.55 1.56
FBXW7 rs14968468 4:153247289-153247289 34 G A DAMAGING 1. high 3.55 1.56
FGFR2 rs121913476 1:12325834-12325834 2 A T 1. medium 2.89 .24
FGFR2 rs121913476 1:12325834-12325834 6 A C 1. medium 2.89 .24
FGFR2 rs79184941 1:123279677-123279677 18 G C .4 .99 high 3.89 2.74
FGFR3 rs12191315 4:18789-18789 5 A C 1. high 4.25 .55
FGFR3 rs12191315 4:18789-18789 36 A T 1. high 4.45 .55
FGFR3 rs121913479 4:18689-18689 17 G T .1 benign .1 medium 2.54 .38
FGFR3 rs12191348 4:188331-188331 44 G T 1. high 4.75 2.16
FGFR3 rs121913482 4:183564-183564 235 C T 1. high 3.81 1.89
FGFR3 rs121913483 4:183568-183568 12 C G .1 1. high 3.54 1.89
FGFR3 rs121913484 4:18692-18692 49 A T .96 medium 2.57 .2
FGFR3 rs121913485 4:18699-18699 388 A G .1 .99 medium 3.6 .67
FGFR3 rs28931614 4:186119-186119 12 G A .3 .96 medium 2.28 1.2
FGFR3 rs28931615 4:186153-186153 32 C A TOLERATED .6 .63 medium 2.26 .9
FGFR3 rs78311289 4:187889-187889 5 A C 1. high 3.9 1.69
FGFR3 rs78311289 4:187889-187889 46 A G 1. high 4.25 1.69
FKBP9 rs2953555 7:3314327-3314327 6 G A DAMAGING .1 1. high 3.81 2.45
FLT3 rs12199646 13:28592641-28592641 43 T A DAMAGING 1. high 3.93 2.25
FLT3 rs121913488 13:28592642-28592642 6 C T DAMAGING .96 medium 2.9 2.79
FLT3 rs121913488 13:28592642-28592642 188 C A DAMAGING .96 medium 3.24 2.79
FLT3 rs121913488 13:28592642-28592642 29 C G DAMAGING .96 medium 3.39 2.79
GNAQ rs121913492 9:849488-849488 1 T C 1. high 4.65 2.14
GNAQ rs121913492 9:849488-849488 78 T A 1. high 4.65 2.14
GNAQ rs121913492 9:849488-849488 63 T G 1. high 4.65 2.14
GNAS rs11554273 2:5748442-5748442 5 C A DAMAGING 1. high 4.37 2.59
GNAS rs11554273 2:5748442-5748442 225 C T DAMAGING 1. high 4.37 2.59
GNAS rs121913494 2:57484596-57484596 1 A T DAMAGING 1. high 4.34 .98
GNAS rs121913495 2:57484421-57484421 1 G T DAMAGING 1. high 4.37 2.59
GNAS rs121913495 2:57484421-57484421 78 G A DAMAGING 1. high 4.37 2.59
GNAS rs137854533 2:57484597-57484597 7 G T DAMAGING 1. high 4. .84
HRAS rs14894226 11:534285-534285 1 C T benign .29 high 4.17 1.98
HRAS rs14894226 11:534285-534285 11 C A benign .29 high 4.17 1.98
HRAS rs14894228 11:534286-534286 6 C A .5 high 4.17 1.98
HRAS rs14894228 11:534286-534286 57 C G .5 high 4.17 1.98
HRAS rs14894229 11:534289-534289 12 C G .2 .53 medium 3.16 1.98
HRAS rs14894229 11:534289-534289 23 C A .1 .53 medium 3.36 1.98
HRAS rs14894229 11:534289-534289 56 C T .1 .53 medium 3.36 1.98
HRAS rs1489423 11:534288-534288 8 C G .86 high 4.5 1.98
HRAS rs1489423 11:534288-534288 41 C T .86 high 4.5 1.98
HRAS rs1489423 11:534288-534288 251 C A .1 .86 medium 3.36 1.98
HRAS rs121913233 11:533874-533874 13 T A .1 benign .1 high 4.71 1.66
HRAS rs121913233 11:533874-533874 111 T C .2 benign .1 high 4.71 1.66
HRAS rs121913496 11:533873-533873 6 C G .1 benign .3 high 4.1 -.8
HRAS rs121913496 11:533873-533873 12 C A .1 benign .3 high 4.1 -.8