Nature Genetics: doi: /ng Supplementary Figure 1


Supplementary Figure 1 LD (r²) between the A3AB deletion and all markers in a 400-kb APOBEC3 region in 1000 Genomes Project populations. Populations: CEU, individuals of European ancestry from Utah (only samples that overlapped with the HapMap set are used here); CHB, Chinese from Beijing; JPT, Japanese from Tokyo. All samples were genotyped with a CNV assay; genotypes for all other markers were generated by the 1000 Genomes Project (October 2014 release). SNP rs12628403 is the only marker that tags the deletion in Europeans and Japanese (r² = 1.0) and Chinese (r² = 0.95). In the Yoruba (YRI) panel of the 1000 Genomes Project, the CNV is weakly polymorphic (4.2%) while rs12628403 is monomorphic, so LD metrics could not be calculated. SNP rs12628403 was also genotyped by a custom TaqMan assay (Supplementary Note), and all genotypes were 100% concordant with data from the 1000 Genomes Project.

Supplementary Figure 2 LD (r²) between the A3AB deletion, its proxy SNP rs12628403, and all GWAS-genotyped and imputed markers within a 400-kb APOBEC3 genomic region on chromosome 22q13.1. The plot is based on 848 genotyped or imputed markers in 1,837 samples from individuals of European ancestry from the PLCO study in which the deletion (gray box) was genotyped by a CNV assay and its proxy SNP, rs12628403, was genotyped by a TaqMan genotyping assay. In this set, the CNV and rs12628403 have r² = 0.92 and D′ = 0.97. Because of low LD with other markers in this region (best r² ~0.2), the deletion and its proxy SNP, rs12628403, cannot be imputed and have to be genotyped.
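The LD metrics quoted in the two legends above can be illustrated with a short calculation. This is a minimal sketch, assuming phased haplotype frequencies for two biallelic markers are already known; the function name `ld_metrics` and its arguments are illustrative and are not taken from the study's own pipeline.

```python
import math


def ld_metrics(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and B at marker 2
    p_a, p_b: frequencies of allele A and allele B
    (hypothetical inputs for illustration)
    """
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else math.nan
    # r^2 is the squared correlation between the two markers.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else math.nan
    return d, d_prime, r2


# A perfect proxy: the two alleles always travel on the same haplotype.
_, d_prime_tag, r2_tag = ld_metrics(p_ab=0.07, p_a=0.07, p_b=0.07)

# A monomorphic marker (allele frequency 0) leaves both metrics undefined.
_, d_prime_mono, r2_mono = ld_metrics(p_ab=0.0, p_a=0.042, p_b=0.0)
```

The second call mirrors the situation described for the YRI panel, where one marker is monomorphic and no LD metric can be computed.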

Supplementary Figure 3 Electrophoretic mobility shift assays for SNP rs1014971 with nuclear extracts from bladder cancer cell line HTB-9 and breast cancer cell lines MDA-MB-231 and T-47D.

Supplementary Figure 4 Expression of selected APOBEC3 genes (A3A, A3B, and A3G) in GTEx. Expression analysis in 8,555 samples (53 normal human tissues from 544 donors) based on data generated by the Genotype-Tissue Expression (GTEx) Project. Expression is measured by RNA-seq and presented as normalized log10(FPKM) values. Expression in bladder and breast tissue samples is marked by red boxes. Data for colon transverse tissue were available only for A3A and are labeled separately, while data for expression of A3A were not available for adipose visceral tissue.

Supplementary Figure 5 Expression of selected APOBEC3 genes (A3A, A3B, and A3G) in bladder cancer cell lines RT-4 and HTB-9 infected with Sendai virus (SeV) or treated with the DNA-damaging drug bleomycin (Bleo). (a,c) Increase in expression of a viral-specific RNA shows that cells were successfully infected with SeV. (b,d) In untreated cells, baseline expression of A3A is significantly lower than that of A3B and A3G. (e,g) A3A and A3G but not A3B are significantly induced after 12 h of SeV infection. (f,h) Expression of A3A, A3B, and A3G is significantly induced by 24 h of treatment with bleomycin as compared to untreated (UT) samples. Plots present expression values (ΔCt, log2 scale) for targets (A3A, A3B, and A3G) normalized by the geometric mean of expression for two endogenous controls (GAPDH and PPIA). Dotted lines indicate the lower level of detection for the targets; a Ct value of 40 was assigned to samples for which expression was not detected by 40 cycles of qRT-PCR, and individual plot points for these samples are defined by the levels of expression of endogenous controls. All experiments were performed in biological triplicate. P values are for two-sided t tests. Shown are values for individual replicates and means. Raw data are available in Supplementary Data 2.
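The normalization described in this legend can be sketched as follows. The code assumes perfect doubling per PCR cycle (relative quantity = 2^(-Ct)) and reports values so that higher means more expression; both are assumptions made for illustration, as are the function names. The legend does not state the sign convention used in the plots.

```python
import math


def relative_quantity(ct):
    """Relative template amount for a Ct value, assuming doubling each cycle."""
    return 2.0 ** (-ct)


def normalized_log2_expression(target_ct, control_cts, undetected_ct=40.0):
    """log2 expression of a target relative to the geometric mean of controls.

    target_ct: Ct of the target, or None if not detected within the run
    control_cts: Ct values of the endogenous controls (e.g. GAPDH and PPIA)
    undetected_ct: Ct assigned to undetected targets (40 in the legend)
    """
    if target_ct is None:
        target_ct = undetected_ct
    controls = [relative_quantity(ct) for ct in control_cts]
    geometric_mean = math.prod(controls) ** (1.0 / len(controls))
    return math.log2(relative_quantity(target_ct) / geometric_mean)


# Hypothetical Ct values: target at 30 cycles, controls at 20 and 22 cycles.
detected = normalized_log2_expression(30.0, [20.0, 22.0])

# An undetected target is placed at Ct 40, so its plotted value depends only
# on the control Ct values, as the legend notes.
undetected = normalized_log2_expression(None, [20.0, 22.0])
```

On the log2 scale, taking the geometric mean of the control quantities is equivalent to averaging the control Ct values, which is why the result reduces to a simple Ct difference.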

Supplementary Figure 6 Expression of selected APOBEC3 genes (A3A, A3B, and A3G) in breast cancer cell lines MDA-MB-231 and T-47D infected with Sendai virus or treated with the DNA-damaging drug bleomycin. (a,c) Increase in expression of a viral-specific RNA shows that cells were successfully infected with SeV. (b,d) In untreated cells, A3B expression is significantly higher than that of A3A and A3G. (e,g) Only A3A in MDA-MB-231 cells and all APOBEC genes in T-47D cells are significantly induced after 12 h of SeV infection. (f,h) Only A3B and A3G in MDA-MB-231 cells and all APOBEC genes in T-47D cells are significantly induced by 24 h of treatment with bleomycin as compared to untreated (UT) samples. Plots present expression values (ΔCt, log2 scale) for targets (A3A, A3B, and A3G) normalized by the geometric mean of expression for two endogenous controls (GAPDH and PPIA). Dotted lines indicate the lower level of detection for the targets; a Ct value of 40 was assigned to samples for which expression was not detected by 40 cycles of qRT-PCR, and individual plot points for these samples are defined by the level of expression of endogenous controls. All experiments were performed in biological quadruplicate. P values are for two-sided t tests. Shown are values for individual replicates and means. NE, not expressed in all samples. Raw data are available in Supplementary Data 2.

Supplementary Figure 7 APOBEC mutagenesis and SNP rs17000526 as predictors of overall survival for patients with breast cancer in TCGA. (a-h) Results are presented separately for patients with ER+ (a-d) and ER− (e-h) tumors. (a,e) Overall survival in relation to quartiles of APOBEC-signature mutation counts. (b,f) Overall survival in relation to APOBEC mutagenesis pattern classified as no or yes (at least one mutation present). (c,g) Overall survival in relation to APOBEC mutagenesis pattern classified as no, low mutation counts (1-48 mutations) or high mutation counts (≥49 mutations; based on the median in bladder tumors, presented in Fig. 5b). (d,h) Overall survival in relation to rs17000526. Hazard ratios and P values are for multivariate Cox regression models that also include age and tumor stage as core variables.

Supplementary Information Contents (Middlebrooks, Banday et al.)

Supplementary Tables:
Supplementary Table 1. Association with bladder cancer risk for top genotyped or imputed markers within 1 Mb of the chr22q13.1 region (separate Excel file)
Supplementary Table 2. Genotypes of A3AB deletion (CNV) and SNP rs12628403 in HapMap populations (separate Excel file)
Supplementary Table 3. Association of SNP rs1014971 and A3AB deletion with bladder cancer risk in 1,719 bladder cancer cases and 2,566 controls of European ancestry from NCI-GWAS1 and 1,116 bladder cancer cases and 945 controls from Japan
Supplementary Table 4. Exploratory analysis of association between SNP rs17000526 and expression of all gene isoforms within a 400 Kb APOBEC3 region in 357 bladder tumors in TCGA
Supplementary Table 5. Exploratory analysis of association between SNP rs17000526 and expression of all gene isoforms within a 400 Kb APOBEC3 region in 541 breast tumors in TCGA
Supplementary Table 6. Association of CpG site methylation with expression of major A3B isoform (uc003awo.1) in 357 bladder tumors in TCGA
Supplementary Table 7. Association of all bladder cancer GWAS signals with counts of APOBEC-signature mutations in 357 bladder tumors in TCGA
Supplementary Table 8. Effect of mRNA expression of all APOBEC3 isoforms on APOBEC mutagenesis in TCGA bladder tumors
Supplementary Table 9. Effect of mRNA expression of all APOBEC3 isoforms on APOBEC mutagenesis in TCGA breast tumors
Supplementary Table 10. Induction of known interferon-stimulated genes (ISGs) by infection with Sendai virus (SeV) or treatment with a DNA-damaging drug bleomycin (Bleo) in bladder cancer cell line HTB-9 and breast cancer cell line MCF7
Supplementary Table 11. APOBEC mutagenesis, SNP rs17000526 and expression of APOBEC3 isoforms as predictors of overall survival (OS) of bladder cancer patients in TCGA
Supplementary Table 12. Effect of somatic mutations in TP53 on overall survival (OS) of bladder cancer patients predicted by APOBEC mutagenesis and SNP rs17000526 in TCGA
Supplementary Table 13. Distribution of somatic mutations in PIK3CA in TCGA bladder tumors in rs17000526 genotype groups
Supplementary Table 14. Effect of somatic mutations in TP53 on overall survival (OS) of breast cancer patients predicted by APOBEC mutagenesis and SNP rs17000526 in TCGA
Supplementary Table 15. Distribution of A3AB deletion genotypes in controls of European ancestry
Supplementary Table 16. Distribution of A3AB deletion genotypes in controls from Japan
Supplementary Table 17. Demographic, clinical, and genetic data for bladder and breast cancer patients in TCGA
Supplementary Table 18. Correlation of A3AB deletion status with expression of main A3A and A3B isoforms in subsets of bladder and breast tumors in TCGA
Supplementary Table 19. Distribution of A3AB deletion genotypes in subsets of bladder and breast tumors in TCGA and in HapMap samples
Supplementary Table 20. Oligos for EMSA probes

Supplementary Note:
Association of SNP rs1014971 with breast cancer risk in Breast Cancer Association Consortium (BCAC)
SNP rs12628403 as A3AB deletion proxy
Detailed analysis of survival in TCGA breast cancer patients

URLs

References

Supplementary Tables

Supplementary Table 3. Association of SNP rs1014971 and A3AB deletion with bladder cancer risk in 1,719 bladder cancer cases and 2,566 controls of European ancestry from NCI-GWAS1 and 1,116 bladder cancer cases and 945 controls from Japan

| Population | Variant | Alleles (protective underlined) | Protective allele, % cases/controls | OR (95% CI), P-value | Adjusted for rs1014971 | Adjusted for A3AB deletion |
|---|---|---|---|---|---|---|
| European ancestry* (1,719 cases and 2,566 controls) | rs1014971 | T/C | 28.1/34.1 | 0.84 (0.76-0.92), P=3.13E-04 | - | 0.85 (0.77-0.93), P=1.02E-03 |
| European ancestry* (1,719 cases and 2,566 controls) | A3AB deletion | I/D | 5.8/6.9 | 0.82 (0.67-0.99), P=0.040 | 0.88 (0.72-1.07), P=0.210 | - |
| Japanese (1,116 cases and 945 controls) | rs1014971 | T/C | 55.2/56.7 | 0.84 (0.74-0.95), P=7.50E-03 | - | 0.95 (0.74-0.98), P=2.00E-02 |
| Japanese (1,116 cases and 945 controls) | A3AB deletion | I/D | 25.9/27.6 | 0.87 (0.73-1.04), P=0.128 | 0.96 (0.79-1.16), P=0.670 | - |

Results are for a dominant genetic model for A3AB deletion genotypes, deletion absence (I/I) vs. deletion presence (I/D or D/D), and for an additive genetic model for SNP rs1014971.
*ORs and P-values are adjusted for age, sex, study site (SPBC, Spain and PLCO, USA) and smoking status (ever/never).
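The two genetic models named in the footnote to Supplementary Table 3 differ only in how a genotype is converted into a regression covariate. A minimal sketch, assuming genotypes are supplied as pairs of allele labels; the function names and the tuple encoding are illustrative and not taken from the study:

```python
def additive_code(genotype, allele):
    """Additive model: number of copies (0, 1 or 2) of `allele`."""
    return sum(1 for a in genotype if a == allele)


def dominant_code(genotype, allele):
    """Dominant model: 1 if at least one copy of `allele` is present, else 0."""
    return 1 if allele in genotype else 0


# A3AB deletion: I = no deletion, D = deletion.
# Dominant coding contrasts I/I against I/D or D/D.
deletion_genotypes = [("I", "I"), ("I", "D"), ("D", "D")]
dominant = [dominant_code(g, "D") for g in deletion_genotypes]

# rs1014971: additive coding counts copies of one allele
# (T is chosen here arbitrarily for the example).
snp_genotypes = [("C", "C"), ("T", "C"), ("T", "T")]
additive = [additive_code(g, "T") for g in snp_genotypes]
```

Under the additive coding the reported odds ratio applies per allele copy, whereas under the dominant coding it compares carriers with non-carriers.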

Supplementary Table 4. Exploratory analysis of association between SNP rs17000526 and expression of all gene isoforms within a 400 Kb APOBEC3 region in 357 bladder tumors in TCGA

| Gene | Isoform ID, UCSC* | Isoform annotation | Beta-coefficient | P-value |
|---|---|---|---|---|
| APOBEC3A | uc011aob.1 | minor isoform | 0.07 | 0.292 |
| APOBEC3A | uc003awn.2 | major isoform | 0.11 | 0.141 |
| APOBEC3A | uc011aoc.1 | A3AB deletion isoform | 0.00 | 0.982 |
| APOBEC3B | uc003awo.1 | major isoform | 0.29 | 8.91E-05 |
| APOBEC3B | uc003awp.1 | minor isoform 1 | -0.10 | 0.009 |
| APOBEC3B | uc003awq.1 | minor isoform 2 | -0.03 | 0.600 |
| APOBEC3C | uc003awr.2 | - | -0.02 | 0.808 |
| APOBEC3D | uc011aoe.1 | - | 0.01 | 0.858 |
| APOBEC3D | uc011aod.1 | - | 0.04 | 0.632 |
| APOBEC3D | uc003awt.3 | - | -0.03 | 0.677 |
| APOBEC3F | uc003aww.2 | - | 0.09 | 0.224 |
| APOBEC3F | uc011aog.1 | - | 0.04 | 0.634 |
| APOBEC3F | uc003awv.2 | - | -0.01 | 0.860 |
| APOBEC3G | uc003awy.2 | - | -0.04 | 0.599 |
| APOBEC3G | uc003awx.2 | - | 0.03 | 0.710 |
| APOBEC3H | uc011aoh.1 | - | 0.08 | 0.311 |
| APOBEC3H | uc003axa.3 | - | 0.04 | 0.632 |
| APOBEC3H | uc011aoi.1 | - | -0.01 | 0.787 |
| CBX6 | uc003awm.1 | - | 0.13 | 0.085 |
| CBX6 | uc003awl.2 | - | 0.05 | 0.539 |
| CBX7 | uc003axc.2 | - | -0.14 | 0.076 |
| CBX7 | uc003axb.2 | - | -0.08 | 0.286 |
| DNAL4 | uc003awj.2 | - | -0.02 | 0.740 |
| NPTXR | uc003awk.2 | - | 0.05 | 0.499 |
| SUN2 | uc003awh.1 | - | 0.07 | 0.283 |
| SUN3 | uc003awi.1 | - | -0.03 | 0.674 |
| SUN4 | uc010gxq.1 | - | 0.01 | 0.920 |
| SUN6 | uc011aoa.1 | - | 0.03 | 0.658 |

*UCSC - University of California Santa Cruz genome browser. Beta-coefficients represent increase or decrease of quantile-normalized¹ log10 mRNA expression per risk allele of SNP rs17000526, adjusting for age, sex, and race. Bolded results are significant after Bonferroni multiple test correction (alpha = 1.79E-03).
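The Bonferroni threshold quoted in the table footnote is consistent with dividing a family-wise alpha of 0.05 by the 28 isoforms tested. The 0.05 level is an assumption here, since the footnote reports only the resulting threshold; the helper names are illustrative.

```python
def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return family_alpha / n_tests


def passes_bonferroni(p_values, family_alpha=0.05):
    """Flag each P-value that falls below the corrected threshold."""
    threshold = bonferroni_alpha(len(p_values), family_alpha)
    return [p < threshold for p in p_values]


# 28 isoforms tested in the 400 Kb APOBEC3 region.
alpha_isoforms = bonferroni_alpha(28)

# 20 GWAS signals tested in Supplementary Table 7.
alpha_gwas = bonferroni_alpha(20)

# A few P-values from the table above, checked against a 28-test threshold.
example_flags = [p < alpha_isoforms for p in (8.91e-05, 0.009, 0.076)]
```

Formatting `alpha_isoforms` and `alpha_gwas` to three significant figures gives 1.79E-03 and 2.50E-03, matching the thresholds stated in the footnotes to Supplementary Tables 4, 5 and 7.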

Supplementary Table 5. Exploratory analysis of association between SNP rs17000526 and expression of all gene isoforms within a 400 Kb APOBEC3 region in 541 breast tumors in TCGA

| Gene | Isoform ID, UCSC* | Isoform annotation | Beta-coefficient | P-value |
|---|---|---|---|---|
| APOBEC3A | uc011aob.1 | minor isoform | 0.12 | 0.024 |
| APOBEC3A | uc003awn.2 | major isoform | 0.11 | 0.075 |
| APOBEC3A | uc011aoc.1 | A3AB deletion isoform | -0.03 | 0.599 |
| APOBEC3B | uc003awo.1 | major isoform | 0.20 | 1.23E-03 |
| APOBEC3B | uc003awp.1 | minor isoform 1 | -0.05 | 0.191 |
| APOBEC3B | uc003awq.1 | minor isoform 2 | -0.02 | 0.612 |
| APOBEC3C | uc003awr.2 | - | 0.08 | 0.226 |
| APOBEC3D | uc011aoe.1 | - | -0.02 | 0.635 |
| APOBEC3D | uc011aod.1 | - | 0.11 | 0.070 |
| APOBEC3D | uc003awt.3 | - | 0.10 | 0.132 |
| APOBEC3F | uc003aww.2 | - | 0.07 | 0.278 |
| APOBEC3F | uc011aog.1 | - | 0.14 | 0.024 |
| APOBEC3F | uc003awv.2 | - | 0.08 | 0.233 |
| APOBEC3G | uc003awy.2 | - | 0.09 | 0.178 |
| APOBEC3G | uc003awx.2 | - | 0.06 | 0.393 |
| APOBEC3H | uc011aoh.1 | - | 0.04 | 0.507 |
| APOBEC3H | uc003axa.3 | - | 0.12 | 0.067 |
| APOBEC3H | uc011aoi.1 | - | 0.06 | 0.027 |
| CBX6 | uc003awm.1 | - | 0.10 | 0.113 |
| CBX6 | uc003awl.2 | - | 0.09 | 0.154 |
| CBX7 | uc003axc.2 | - | -0.03 | 0.674 |
| CBX7 | uc003axb.2 | - | 0.06 | 0.376 |
| DNAL4 | uc003awj.2 | - | 0.06 | 0.362 |
| NPTXR | uc003awk.2 | - | 0.01 | 0.838 |
| SUN2 | uc003awh.1 | - | 0.02 | 0.780 |
| SUN3 | uc003awi.1 | - | 0.18 | 0.005 |
| SUN4 | uc010gxq.1 | - | 0.03 | 0.596 |
| SUN6 | uc011aoa.1 | - | 0.07 | 0.239 |

*UCSC - University of California Santa Cruz genome browser. Beta-coefficients represent increase or decrease of quantile-normalized¹ log10 mRNA expression per risk allele of SNP rs17000526, adjusting for age and race. Bolded results are significant after Bonferroni multiple test correction (alpha = 1.79E-03).

Supplementary Table 6. Association of CpG site methylation with expression of major A3B isoform (uc003awo.1) in 357 bladder tumors in TCGA

| CpG site | Beta-coefficient | P-value |
|---|---|---|
| cg21707131* | -0.263 | 4.49E-08 |
| cg25787886 | -0.243 | 4.49E-07 |
| cg06837067 | -0.206 | 2.33E-05 |
| cg14387414 | -0.190 | 9.91E-05 |
| cg14194956 | -0.183 | 1.60E-04 |
| cg16045423 | -0.101 | 0.039 |
| cg26227661 | 0.099 | 0.044 |
| cg22268271 | 0.091 | 0.065 |
| cg02292872 | -0.080 | 0.105 |
| cg23124451 | 0.079 | 0.106 |
| cg11816043 | -0.065 | 0.183 |
| cg26000393 | -0.062 | 0.203 |
| cg02344701 | 0.060 | 0.217 |
| cg07311505 | 0.058 | 0.237 |
| cg18708252 | 0.056 | 0.260 |
| cg05903330 | 0.042 | 0.394 |
| cg14097849 | 0.040 | 0.423 |
| cg02523424 | -0.038 | 0.443 |
| cg03141856 | -0.033 | 0.498 |
| cg25790850 | 0.031 | 0.525 |
| cg26672614 | -0.025 | 0.610 |
| cg01089751 | 0.018 | 0.717 |
| cg07431064 | -0.015 | 0.765 |
| cg27062573 | 0.012 | 0.810 |
| cg01525244 | 0.010 | 0.834 |
| cg03804568 | 0.010 | 0.842 |
| cg00545295 | -0.009 | 0.861 |
| cg24075680 | 0.008 | 0.868 |
| cg01027808 | 0.0003 | 0.994 |

* Beta-coefficients represent increase or decrease of quantile-normalized¹ log10 mRNA expression per log10 DNA methylation levels of CpG sites tested, not adjusting for any covariates.

Supplementary Table 7. Association of all bladder cancer GWAS signals with counts of APOBEC-signature mutations in 357 bladder tumors in TCGA

| GWAS marker | Region/Gene | Reference | TCGA proxy SNP | r² with GWAS marker | Beta-coefficient | P-value |
|---|---|---|---|---|---|---|
| rs1014971 | 22q13.1/APOBEC3 | Rothman et al. 2010 (ref. 2) | rs17000526 | 1 | 0.19 | 8.24E-06 |
| rs10775480 | 18q12.3/SLC14A2 | Garcia-Closas et al. 2011 (ref. 3) | rs2298718 | 0.96 | 0.10 | 0.018 |
| rs10853535 | 18q12.3/SLC14A2 | Garcia-Closas et al. 2011 (ref. 3) | rs10853535 | 1 | -0.09 | 0.028 |
| rs6104690 | 20p12.2 | Figueroa et al. 2014 (ref. 4) | rs6040291 | 1 | 0.07 | 0.089 |
| rs4510656 | 6p22.3/CDKAL1 | Figueroa et al. 2014 (ref. 1) | rs10946406 | 1 | -0.08 | 0.059 |
| rs1495741 | 8p22/NAT2 | Rothman et al. 2010 (ref. 2) | rs4921914 | 1 | 0.08 | 0.094 |
| rs710521 | 3q28/TP63 | Kiemeney et al. 2008 (ref. 5) | rs11706540 | 1 | 0.07 | 0.169 |
| rs401681 | 5p15.33/TERT-CLPTML | Rafnar et al. 2009 (ref. 6) | rs401681 | 1 | 0.05 | 0.259 |
| rs907611 | 11p15.5/LSP1 | Figueroa et al. 2014 (ref. 1) | rs11041476 | 1 | 0.05 | 0.305 |
| rs8102137 | 19q12.3/CCNE1 | Rothman et al. 2010 (ref. 2) | rs17513752 | 1 | -0.05 | 0.327 |
| rs7257330 | 19q12.3/CCNE1 | Fu et al. 2014 (ref. 7) | rs3218036 | 0.76 | -0.07 | 0.467 |
| rs11892031 | 2q37.1/UGT1A | Rothman et al. 2010 (ref. 2) | rs11893247 | 1 | 0.03 | 0.690 |
| rs798766 | 4p16.3/FGFR, TMEM129-TACC | Kiemeney et al. 2010 (ref. 8) | rs798766 | 1 | -0.03 | 0.533 |
| rs9642880 | 8q24.1 | Kiemeney et al. 2008 (ref. 4) | rs9642880 | 1 | -0.03 | 0.479 |
| rs10936599 | 3q26.2/MYNN | Figueroa et al. 2014 (ref. 1) | rs10936600 | 1 | 0.02 | 0.684 |
| rs17863783 | 2q37.1/UGT1A6 | Tang et al. 2012 (ref. 9) | rs17863783 | 1 | 0.08 | 0.543 |
| rs62185668 | 20p12.2 | Rafnar et al. 2014 (ref. 10) | rs6074214 | 1 | -0.01 | 0.781 |
| rs6108803 | 20p12.2 | Figueroa et al. 2015 | rs6074214 | 1 | -0.01 | 0.781 |
| rs4907479 | 13q34/MCF2L | Figueroa et al. 2014 (ref. 1) | rs2993291 | 0.88 | -0.01 | 0.895 |
| rs2294008 | 8q24.3/PSCA | Wu et al. 2009 (ref. 11) | rs1045574 | 1 | 0.00 | 0.987 |

Beta-coefficients represent increase or decrease of log10 APOBEC-signature mutation counts per risk allele of candidate SNPs, adjusting for age, gender, and race. Bolded results are significant after Bonferroni multiple test correction (alpha = 2.50E-03).

Supplementary Table 8. Effect of mRNA expression of all APOBEC3 isoforms on APOBEC mutagenesis in TCGA bladder tumors

Dependent variable: APOBEC mutagenesis pattern (log10_apobec_mutload_minestimate)

| Predictor | Beta-coefficient | P-value* |
|---|---|---|
| Age | 0.003 | 0.351 |
| Gender | 0.109 | 0.197 |
| Race | -0.137 | 0.057 |
| Tumor stage | 0.003 | 0.947 |
| rs17000526_genotypes (0, 1, 2) | 0.197 | 2.3E-04 |
| log_uc003awn2_a3a_major | 0.125 | 0.016 |
| log_uc011aob1_a3a_minor | -0.042 | 0.419 |
| log_uc011aoc1_a3ab_del | 0.079 | 0.106 |
| log_uc003awo1_a3b_major | 0.183 | 2.2E-04 |
| log_uc003awp1_a3b_minor1 | -0.004 | 0.957 |
| log_uc003awq1_a3b_minor2 | -0.026 | 0.609 |
| log_uc003awr2_a3c | -0.112 | 0.029 |
| log_uc003awt3_a3d | -0.062 | 0.277 |
| log_uc011aod1_a3d | -0.014 | 0.802 |
| log_uc011aoe1_a3d | 0.014 | 0.778 |
| log_uc003awv2_a3f | -0.086 | 0.037 |
| log_uc003aww2_a3f | -0.027 | 0.664 |
| log_uc011aog1_a3f | 0.092 | 0.103 |
| log_uc003awx2_a3g | -0.062 | 0.487 |
| log_uc003awy2_a3g | 0.142 | 0.062 |
| log_uc003axa3_a3h | 0.025 | 0.694 |
| log_uc011aoh1_a3h | 0.107 | 0.107 |
| log_uc011aoi1_a3h | -0.069 | 0.311 |

*P-values are for a multivariate linear regression model that includes all the variables listed; the results may differ from those presented in Figure 2 M-N, which are based on a limited set of variables. Splicing forms are designated based on information from the UCSC genome browser.

Supplementary Table 9. Effect of mRNA expression of all APOBEC3 isoforms on APOBEC mutagenesis in TCGA breast tumors

Dependent variable: APOBEC mutagenesis pattern (log10_apobec_mutload_minestimate)

| Predictor | Beta-coefficient | P-value* |
|---|---|---|
| Age | 0.003 | 0.113 |
| Race | 0.116 | 0.006 |
| Tumor stage | -0.014 | 0.706 |
| rs17000526_genotypes (0, 1, 2) | 0.035 | 0.354 |
| log_uc003awn2_a3a_major | 0.035 | 0.182 |
| log_uc011aob1_a3a_minor | -0.003 | 0.915 |
| log_uc011aoc1_a3ab_del | 0.103 | 0.002 |
| log_uc003awo1_a3b_major | -0.029 | 0.254 |
| log_uc003awp1_a3b_minor1 | -0.051 | 0.225 |
| log_uc003awq1_a3b_minor2 | 0.048 | 0.189 |
| log_uc003awr2_a3c | -0.037 | 0.188 |
| log_uc003awt3_a3d | 0.017 | 0.525 |
| log_uc011aod1_a3d | 0.012 | 0.628 |
| log_uc011aoe1_a3d | -0.001 | 0.968 |
| log_uc003awv2_a3f | -0.035 | 0.129 |
| log_uc003aww2_a3f | 0.074 | 0.021 |
| log_uc011aog1_a3f | 0.013 | 0.706 |
| log_uc003awx2_a3g | 0.001 | 0.984 |
| log_uc003awy2_a3g | 0.010 | 0.733 |
| log_uc003axa3_a3h | 0.037 | 0.147 |
| log_uc011aoh1_a3h | 0.056 | 0.048 |
| log_uc011aoi1_a3h | -0.001 | 0.993 |

*P-values are for a multivariate linear regression model that includes all the variables listed; the results may differ from those presented in Figure 2 O-P, which are based on a limited set of variables. Splicing forms are designated based on information from the UCSC genome browser. Expression of all isoforms is presented as quantile-normalized¹ log10 expression values.

Supplementary Table 10. Induction of known interferon-stimulated genes (ISGs) by infection with Sendai virus (SeV) or treatment with a DNA-damaging drug bleomycin (Bleo) in bladder cancer cell line HTB-9 and breast cancer cell line MCF-7

| Gene | SeV vs. Control, 12 hrs: HTB-9 Log2 | HTB-9 p-val | MCF-7 Log2 | MCF-7 p-val | Bleo vs. Control, 24 hrs: HTB-9 Log2 | HTB-9 p-val | MCF-7 Log2 | MCF-7 p-val |
|---|---|---|---|---|---|---|---|---|
| MX1 | 7.17 | 9.12E-05 | 5.65 | 1.27E-03 | 0.89 | 4.65E-02 | 0.07 | 7.62E-01 |
| ISG15 | 4.90 | 1.89E-04 | 4.87 | 3.79E-03 | 0.60 | 3.37E-02 | 0.00 | 9.91E-01 |
| CXCL10 | 8.36 | 2.05E-04 | 13.16 | 5.31E-04 | -0.35 | 4.68E-01 | ND | ND |
| CCL5 | 11.93 | 2.21E-04 | 9.54 | 5.90E-04 | 3.68 | 9.33E-02 | 0.39 | 3.11E-01 |
| TLR3 | 3.23 | 1.33E-03 | 11.83 | 5.33E-04 | -0.97 | 5.25E-04 | ND | ND |
| IL8 | 2.98 | 1.51E-03 | 5.30 | 1.35E-02 | 4.44 | 2.47E-04 | ND | ND |
| STAT1 | 2.71 | 1.67E-03 | 2.65 | 5.58E-03 | 0.17 | 1.33E-01 | 0.28 | 6.86E-02 |
| IFNB1 | 8.07 | 1.88E-03 | 15.09 | 6.97E-02 | 3.92 | 3.38E-01 | -0.51 | 5.19E-01 |
| IL6 | 3.93 | 2.10E-03 | 5.13 | 1.86E-03 | 6.03 | 1.41E-04 | -0.42 | 7.92E-01 |
| MEFV | 6.98 | 2.24E-03 | 6.52 | 3.18E-03 | ND | ND | ND | ND |
| IFIH1 | 5.02 | 2.53E-03 | 5.81 | 7.58E-04 | -0.17 | 4.15E-01 | -0.24 | 5.68E-02 |
| DHX58 | 4.86 | 2.81E-03 | 8.22 | 3.01E-04 | 0.02 | 9.46E-01 | 0.23 | 5.41E-01 |
| OAS2 | 7.06 | 3.65E-03 | 9.72 | 1.35E-03 | -0.79 | 3.67E-02 | -0.16 | 8.75E-01 |
| AIM2 | 4.37 | 3.79E-03 | 6.87 | 1.34E-02 | -0.21 | 5.61E-01 | ND | ND |
| TNF | 3.62 | 3.84E-03 | 0.34 | 4.07E-01 | 4.58 | 1.76E-03 | ND | ND |
| CXCL11 | 6.09 | 3.91E-03 | 12.54 | 1.47E-03 | -1.38 | 2.38E-01 | ND | ND |
| NLRP3 | -0.46 | 4.56E-03 | 5.61 | 7.17E-02 | 2.00 | 7.84E-04 | ND | ND |
| CASP1 | 1.44 | 4.66E-03 | 4.02 | 3.31E-01 | -0.36 | 5.81E-02 | ND | ND |
| AZI2 | 0.87 | 6.05E-03 | 0.18 | 4.14E-01 | 0.56 | 1.45E-02 | -0.30 | 1.78E-01 |
| IL15 | 1.56 | 6.92E-03 | 5.38 | 4.15E-03 | -0.45 | 6.77E-02 | 0.31 | 7.01E-01 |
| APOBEC3G | 0.67 | 7.36E-03 | 3.90 | 5.04E-02 | -0.26 | 1.27E-01 | ND | ND |
| NFKBIA | 2.88 | 9.95E-03 | 2.87 | 1.48E-02 | 0.60 | 2.11E-02 | -0.16 | 2.51E-01 |
| IFNA1 | 1.57 | 1.13E-02 | 7.44 | 8.29E-02 | -0.67 | 2.03E-01 | ND | ND |
| CTSS | 1.40 | 1.13E-02 | 4.18 | 5.17E-03 | 0.74 | 3.74E-03 | 3.82 | 1.62E-01 |
| IL12A | 1.01 | 1.78E-02 | 9.11 | 1.14E-01 | 1.68 | 2.60E-02 | ND | ND |
| IL1B | 0.39 | 2.22E-02 | 4.81 | 2.47E-01 | 1.52 | 3.84E-03 | ND | ND |
| TICAM1 | 1.05 | 2.28E-02 | 0.70 | 3.92E-01 | 1.08 | 4.86E-03 | 0.28 | 6.86E-02 |
| MYD88 | 1.48 | 2.32E-02 | 1.12 | 4.46E-02 | 0.08 | 4.27E-01 | 0.00 | 9.73E-01 |
| CXCL9 | 7.20 | 3.93E-02 | 11.60 | 5.58E-04 | -2.14 | 3.13E-01 | 1.13 | 4.02E-01 |
| IRF7 | 2.98 | 4.27E-02 | 2.57 | 3.98E-03 | 2.33 | 1.60E-03 | -0.18 | 6.36E-01 |
| TRIM25 | 1.31 | 4.56E-02 | 1.05 | 6.15E-02 | 0.03 | 7.70E-01 | -0.35 | 9.52E-02 |
| CASP10 | 1.54 | 5.05E-02 | 2.16 | 3.45E-02 | 1.10 | 2.60E-04 | -0.84 | 1.91E-01 |
| CYLD | 0.48 | 5.69E-02 | 2.13 | 2.06E-03 | 1.71 | 8.69E-04 | -0.37 | 2.30E-01 |
| NFKB1 | 0.28 | 6.57E-02 | 0.49 | 1.15E-01 | 0.66 | 4.65E-03 | -0.43 | 2.09E-01 |
| MAP2K3 | 0.51 | 6.63E-02 | -0.53 | 2.58E-01 | 0.70 | 2.32E-02 | -0.08 | 2.03E-01 |
| IRF5 | -1.19 | 7.59E-02 | -0.49 | 1.17E-01 | 0.96 | 1.81E-03 | -0.38 | 2.98E-01 |
| CTSL1 | 0.36 | 1.02E-01 | -0.78 | 8.32E-02 | 0.63 | 8.29E-03 | -0.09 | 5.05E-01 |
| MAP2K1 | -0.55 | 1.36E-01 | -0.87 | 5.99E-02 | 0.64 | 3.65E-03 | -0.37 | 2.22E-01 |
| JUN | 0.94 | 1.40E-01 | 0.76 | 2.20E-01 | 2.34 | 2.23E-03 | -0.02 | 9.70E-01 |
| MAPK1 | -0.43 | 1.70E-01 | -0.73 | 1.33E-01 | 0.63 | 4.19E-02 | -0.40 | 3.28E-02 |
| MAPK8 | -0.21 | 1.73E-01 | -0.54 | 2.29E-01 | 0.59 | 1.03E-02 | -0.43 | 1.48E-01 |
| TRAF6 | -0.18 | 2.47E-01 | -0.37 | 2.10E-01 | 0.68 | 3.51E-04 | -0.35 | 1.24E-01 |
| CHUK | -0.20 | 2.74E-01 | -0.55 | 1.05E-01 | 0.60 | 1.76E-02 | -0.54 | 8.54E-02 |
| FOS | -0.14 | 3.17E-01 | -0.62 | 5.88E-02 | 5.01 | 3.74E-04 | 0.26 | 4.20E-01 |
| SPP1 | -0.10 | 8.54E-01 | ND | ND | 2.25 | 5.01E-02 | -0.19 | 1.77E-01 |
| PYDC1 | 0.10 | 9.56E-01 | -0.45 | 9.13E-01 | 2.77 | 5.10E-03 | 5.64 | 1.27E-02 |

Expression was quantified using antiviral qRT-PCR arrays (Qiagen). Analysis is based on 2 biological replicates for each condition; p-values are for two-sided t tests. ND, expression is not detected. For the SeV experiment, the control group represents non-infected samples; for Bleo treatment, the control group represents treatment with DMSO (vehicle). Log2 values represent increase or decrease of expression (positive or negative values, respectively) compared to control groups.
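Because the table reports changes on a log2 scale, converting an entry back to a linear fold change can make the size of the induction easier to read. A minimal sketch; the helper names are illustrative and are not part of the study's analysis:

```python
def fold_change(log2_value):
    """Linear fold change relative to control for a log2 difference."""
    return 2.0 ** log2_value


def describe(log2_value):
    """Readable summary: positive values are increases, negative are decreases."""
    fc = fold_change(log2_value)
    if fc >= 1.0:
        return f"{fc:.1f}-fold increase"
    return f"{1.0 / fc:.1f}-fold decrease"


# MX1 in HTB-9 after SeV infection is listed with a log2 value of 7.17.
mx1_fold = fold_change(7.17)

# A log2 value of -1.0 corresponds to a halving of expression.
halved = describe(-1.0)
```

A log2 value of 7.17 therefore corresponds to an increase of more than 100-fold, while values near 0 indicate essentially no change.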

Supplementary Table 11. APOBEC mutagenesis, SNP rs17000526 and expression of APOBEC3 isoforms as predictors of overall survival (OS) of bladder cancer patients in TCGA Treatment (yes/no) YES (N=81) NO (N=146) Adjustments* Predictor Betacoeff coeff Beta - P-value P-value Age, years 0.024 0.37 Variables included Gender -0.40 0.43 in all models Stage 1.02 <5.6E-03 rs17000526-0.21 0.50 0.08 0.80 Signature mutations -1.13 6.77E-03 1.18 0.012 Mutagenesis pattern -0.91 2.96E-03 0.92 5.27E-03 log_uc003awn2_a3a_major -0.41 0.094 log_uc011aob1_a3a_minor 0.17 0.506 log_uc011aoc1_a3ab_del 0.38 0.156 log_uc003awo1_a3b_major -0.26 0.363 log_uc003awp1_a3b_minor1-0.03 0.903 log_uc003awq1_a3b_minor2 0.29 0.257 log_uc003awr2_a3c -0.27 0.224 log_uc003awt3_a3d -0.57 0.021 log_uc011aod1_a3d -0.23 0.256 log_uc011aoe1_a3d -0.24 0.304 log_uc003awv2_a3f 0.37 0.167 log_uc003aww2_a3f -0.38 0.063 log_uc011aog1_a3f -0.20 0.388 log_uc003awx2_a3g -0.42 0.078 log_uc003awy2_a3g -0.48 0.052 log_uc003axa3_a3h -0.45 0.045 log_uc011aoh1_a3h -0.35 0.143 log_uc011aoi1_a3h -0.05 0.889 Age, years 0.051 0.010 Variables included Gender -0.16 0.64 in all models Stage 0.77 <4.2E-04 rs17000526-0.25 0.21-0.11 0.60 Signature mutations -0.72 8.12E-03-0.69 0.018 Mutagenesis pattern -0.50 0.020-0.45 0.046 log_uc003awn2_a3a_major 0.05 0.810 log_uc011aob1_a3a_minor -0.05 0.824 log_uc011aoc1_a3ab_del -0.12 0.555 log_uc003awo1_a3b_major -0.27 0.124 log_uc003awp1_a3b_minor1 0.10 0.741 log_uc003awq1_a3b_minor2-0.55 0.079 log_uc003awr2_a3c -0.32 0.048 log_uc003awt3_a3d -0.07 0.688 log_uc011aod1_a3d -0.47 0.018 log_uc011aoe1_a3d -0.26 0.267 13

ALL (N=356) With treatment info (N=227) log_uc003awv2_a3f -0.11 0.555 log_uc003aww2_a3f -0.24 0.163 log_uc011aog1_a3f -0.24 0.142 log_uc003awx2_a3g -0.22 0.166 log_uc003awy2_a3g -0.30 0.081 log_uc003axa3_a3h -0.17 0.395 log_uc011aoh1_a3h -0.30 0.117 log_uc011aoi1_a3h -0.36 0.341 Age, years 0.027 0.010 Gender -0.229 0.29 Stage 0.668 <1E-06 Variables included in all models rs17000526-0.26 0.069-0.17 0.24 Signature mutations -0.80 1.10E-05-0.74 2.82E-04 Mutagenesis pattern -0.51 8.90E-05-0.46 1.34E-03 log_uc003awo1_a3b_major -0.21 0.044 log_uc003awn2_a3a_major -0.09 0.41 log_uc011aoc1_a3ab_del -0.06 0.62 Age, years 0.038 0.013 Variables included Gender -0.20 0.47 in all models Stage 0.86 <2E-06 rs17000526-0.24 0.15-0.06 0.72 Signature mutations -0.82 2.52E-04-0.79 1.21E-03 Mutagenesis pattern -0.62 4.04E-04-0.57 2.23E-03 log_uc003awo1_a3b_major -0.25 0.094 log_uc003awn2_a3a_major -0.14 0.35 log_uc011aoc1_a3ab_del 0.15 0.77 Treatment (Yes/No) -0.19 0.52 Treatment: neoadjuvant, adjuvant, or radiation; ALL: all patients regardless of treatment information; With treatment info: only patients that have treatment information. All multivariate modes include age, gender and tumor stage as core variables. Adjustments include mutagenesis pattern for rs17000526 and rs17000526 for APOBEC mutagenesis variables. Negative beta-coefficients represent decreased hazard of death (longer survival) per bladder cancer risk allele rs17000526-a, with higher APOBEC mutagenesis metrics (log10 signature mutation counts or mutagenesis pattern), or higher expression of specific isoforms (quantile-normalized 1 log10 expression values), tested in separate models. Positive betacoefficients represent decreased survival with age, and increased tumor stage; treatment is not significantly associated with improved survival. Analyses are limited by samples with all the covariates available. 
An analysis of 73 patients with muscle-invasive bladder cancer treated with adjuvant platinum-based therapy showed significantly increased survival with increased tumor expression of A3A, A3D and A3H (ref. 12). This is consistent with the results presented above for TCGA patients with muscle-invasive bladder cancer who received treatment (predominantly adjuvant): in this group, survival improved with increased APOBEC mutagenesis and with increased expression of some isoforms of A3D and A3H, and of A3A (non-significant). The pattern is absent in TCGA bladder cancer patients who did not receive treatment.
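As a reading aid for the Cox regression beta-coefficients in Supplementary Table 11, the following minimal sketch converts a few of them to hazard ratios using the standard relation HR = exp(beta). The function name and the selection of rows are illustrative assumptions; the values are copied from the "ALL (N=356)" block of the table, and the sketch does not reproduce the model fitting itself.

```python
import math


def hazard_ratio(beta: float) -> float:
    """Convert a Cox regression beta-coefficient to a hazard ratio."""
    return math.exp(beta)


# Beta-coefficients copied from the "ALL (N=356)" rows of Supplementary Table 11.
# The dictionary keys are illustrative labels, not identifiers from the study.
betas = {
    "rs17000526 (per risk allele A)": -0.26,
    "Signature mutations (log10)": -0.80,
    "Mutagenesis pattern": -0.51,
    "Stage": 0.668,
}

hazard_ratios = {name: hazard_ratio(beta) for name, beta in betas.items()}

for name, hr in hazard_ratios.items():
    # HR < 1 means lower hazard of death (longer survival); HR > 1 means higher hazard.
    print(f"{name}: HR = {hr:.2f}")
```

A negative beta-coefficient therefore maps to a hazard ratio below 1, which matches the table's reading of negative coefficients as longer survival.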

Supplementary Table 12. Effect of somatic mutations in TP53 on overall survival (OS) of bladder cancer patients predicted by APOBEC mutagenesis and SNP rs17000526 in TCGA

SNP rs17000526    Combined      TP53 mutations: Yes   TP53 mutations: No   P-value*
(A - cancer risk allele)
                  N (%)         N (%)                 N (%)
AA                139 (38.3)    83 (46.4)             56 (30.4)            0.0066
AG                152 (41.9)    67 (37.4)             85 (46.2)
GG                72 (19.8)     29 (16.2)             43 (23.4)
Total             363           179                   184

*Chi-Square test (df=2) for distribution of somatic mutations (Yes/No, not limited by APOBEC-signature type) in TP53 in rs17000526 genotype groups.

Predictor               Beta-coefficient   P-value
rs17000526              -0.26              6.90E-02
rs17000526*             -0.26              7.00E-02
Signature mutations     -0.80              1.10E-05
Signature mutations*    -0.83              1.60E-05
Mutagenesis pattern     -0.51              8.90E-05
Mutagenesis pattern*    -0.51              1.29E-04

Cox regression models for overall survival (OS) are based on rs17000526, log10 APOBEC mutagenesis metrics, adjusting for age, gender, tumor stage, with or without adjustment for presence of TP53 mutations*. Analysis is based on 356 samples with data available for all the variables.

Supplementary Table 13. Distribution of somatic mutations in PIK3CA in TCGA bladder tumors in rs17000526 genotype groups

SNP rs17000526    Combined      PIK3CA mutations: Yes   PIK3CA mutations: No   P-value
(A - cancer risk allele)
                  N (%)         N (%)                   N (%)
AA                139 (38.3)    39 (47.6)               100 (35.6)             0.037*
AG                152 (41.9)    34 (41.5)               118 (42.0)
GG                72 (19.8)     9 (11.0)                63 (22.4)
Total             363           82                      281

*Chi-Square test (df=2) for distribution of somatic mutations (Yes/No, not limited by APOBEC-signature type) in PIK3CA in rs17000526 genotype groups.

Mutation data for TCGA samples were obtained from Firebrowse (Materials and Methods). We tested a panel of genes frequently mutated in bladder tumors (TP53, RB1, ELF3, TSC1, PIK3CA, RHOB, CDKN2A, ARID1A, ZFP36L1, CDKN1A, ATM and FGFR3) in relation to rs17000526 genotype groups. Significant enrichments of mutations in rs17000526 genotype groups were found for PIK3CA and TP53 (Supplementary Table 12).
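The chi-square tests (df=2) reported for TP53 and PIK3CA can be sketched from the genotype-by-mutation counts in Supplementary Tables 12 and 13. This is a minimal illustration, not the authors' analysis code: the function name is hypothetical, and it relies on the fact that for 2 degrees of freedom the chi-square upper-tail probability equals exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_df2(table):
    """Pearson chi-square test for a 3x2 contingency table (df = 2).

    Returns (statistic, p_value). For df = 2 the chi-square survival
    function is exactly exp(-x / 2).
    """
    if len(table) != 3 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 3x2 table (genotype rows x Yes/No columns)")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


# Rows: AA, AG, GG genotypes of rs17000526; columns: mutation Yes, No.
tp53_counts = [[83, 56], [67, 85], [29, 43]]     # Supplementary Table 12
pik3ca_counts = [[39, 100], [34, 118], [9, 63]]  # Supplementary Table 13

tp53_stat, tp53_p = chi_square_df2(tp53_counts)
pik3ca_stat, pik3ca_p = chi_square_df2(pik3ca_counts)
print(f"TP53:   chi2 = {tp53_stat:.2f}, P = {tp53_p:.4f}")
print(f"PIK3CA: chi2 = {pik3ca_stat:.2f}, P = {pik3ca_p:.4f}")
```

With these counts the computed P-values agree with the reported 0.0066 (TP53) and 0.037 (PIK3CA) to the precision shown in the tables.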

Supplementary Table 14. Effect of somatic mutations in TP53 on overall survival (OS) of breast cancer patients predicted by APOBEC mutagenesis and SNP rs17000526 in TCGA

ER+ breast tumors, n=387

SNP rs17000526    Combined      TP53 mutations: Yes   TP53 mutations: No   P-value
(A - cancer risk allele)
                  N (%)         N (%)                 N (%)
AA                123 (31.8)    19 (25.7)             104 (33.2)           0.158*
AG                197 (50.9)    37 (50.0)             160 (51.1)
GG                67 (17.3)     18 (24.3)             49 (15.7)
Total             387           74                    313

Predictor               Beta-coefficient   P-value
rs17000526              -0.33              0.20
rs17000526*             -0.23              0.39
Signature mutations      0.059             0.85
Signature mutations*     0.008             0.98
Mutagenesis pattern      0.09              0.76
Mutagenesis pattern*     0.10              0.74

ER- breast tumors, n=119

SNP rs17000526    Combined      TP53 mutations: Yes   TP53 mutations: No   P-value
(A - cancer risk allele)
                  N (%)         N (%)                 N (%)
AA                41 (34.5)     33 (41.3)             8 (20.5)             0.057*
AG                61 (51.3)     36 (43.8)             26 (66.7)
GG                17 (14.3)     12 (15.0)             5 (12.8)
Total             119           80                    39

Predictor               Beta-coefficient   P-value
rs17000526              -0.79              0.023
rs17000526*             -0.65              0.077
Signature mutations     -0.51              0.37
Signature mutations*    -0.38              0.50
Mutagenesis pattern      0.08              0.86
Mutagenesis pattern*    -0.02              0.96

*Chi-Square test (df=2) for distribution of somatic mutations (Yes/No, not limited by APOBEC-signature type) in TP53 in rs17000526 genotype groups. Cox regression models for overall survival (OS) prediction are based on rs17000526 and log10 APOBEC mutagenesis metrics, adjusting for age and tumor stage, with or without adjustment for presence of TP53 mutations*. Analyses are limited to samples with data available for all the variables, n=387 for ER+ and n=119 for ER- tumors.

Supplementary Table 15. Distribution of A3AB deletion genotypes in controls of European ancestry

A3AB deletion genotype   PLCO N (%)       SBCS N (%)     Combined N (%)   HapMap, Europeans* N (%)
I/I                      1,386 (85.87)    835 (87.71)    2,221 (86.55)    52 (86.67)
I/D                      220 (13.63)      116 (12.18)    336 (13.09)      8 (13.33)
D/D                      8 (0.50)         1 (0.11)       9 (0.35)         0 (0.00)
Total                    1,614            952            2,566            60

PLCO - Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, USA
SBCS - Spanish Bladder Cancer Study
A3AB deletion alleles: I - insertion, D - deletion; * individuals of European ancestry from Utah (CEU). HapMap and study samples were genotyped for the deletion using the same CNV assay; HapMap genotypes were 100% concordant with those reported earlier based on a PCR and gel electrophoresis method (ref. 13).

Supplementary Table 16. Distribution of A3AB deletion genotypes in controls from Japan

A3AB deletion genotype   Japanese N (%)   HapMap, Japanese* N (%)
I/I                      495 (52.38)      20 (44.44)
I/D                      379 (40.11)      18 (40.00)
D/D                      71 (7.51)        7 (15.56)
Total                    945              45

A3AB deletion alleles: I - insertion, D - deletion; * Japanese from Tokyo, Japan (JPT). HapMap and study samples were genotyped for the deletion using the same CNV assay; HapMap genotypes were 93% concordant with those reported earlier based on a PCR and gel electrophoresis method (ref. 13).

Supplementary Table 17. Demographic, clinical, and genetic data for bladder and breast cancer patients in TCGA

TCGA variables        357 bladder tumors N (%)   533 breast tumors N (%)
Age (mean ± SD)       68.54 ± 10.66              58.55 ± 13.11
Male                  266 (74.5)                 5 (0.94)
Female                91 (25.5)                  528 (99.1)
Caucasian             294 (82.3)                 445 (83.5)
Asian                 43 (12.0)                  32 (6.0)
African American      20 (5.6)                   56 (10.5)
Tumor stage
  T0                  N/A                        N/A
  T1                  2 (0.56)                   84 (15.8)
  T2                  116 (32.5)                 302 (56.8)
  T3                  120 (33.6)                 137 (25.8)
  T4                  114 (31.9)                 5 (0.94)
  Unknown             5 (1.4)                    4 (0.75)
rs17000526
  G/G                 68 (19.0)                  85 (15.9)
  A/G                 151 (42.3)                 269 (50.6)
  A/A                 138 (38.7)                 179 (33.5)
  G                   287 (40.2)                 439 (41.2)
  A                   427 (59.8)                 627 (58.8)

The set is limited to samples with available data for age, sex, race, tumor stage, SNP rs17000526, CpG site methylation and A3B CNA used for analyses presented in Figure 2 A-D.
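The allele rows (G and A) of Supplementary Table 17 follow from the genotype counts by simple allele counting: each homozygote contributes two copies of its allele and each heterozygote one copy of each. The sketch below, with a hypothetical helper name, recomputes those rows from the genotype counts given in the table.

```python
def allele_counts(n_gg: int, n_ag: int, n_aa: int) -> dict:
    """Return {allele: (count, frequency)} from rs17000526 genotype counts."""
    g = 2 * n_gg + n_ag  # G/G carries two G alleles, A/G carries one
    a = 2 * n_aa + n_ag  # A/A carries two A alleles, A/G carries one
    total = g + a
    return {"G": (g, g / total), "A": (a, a / total)}


# Genotype counts (G/G, A/G, A/A) copied from Supplementary Table 17.
bladder = allele_counts(68, 151, 138)
breast = allele_counts(85, 269, 179)

for label, result in (("bladder", bladder), ("breast", breast)):
    for allele, (count, freq) in result.items():
        print(f"{label} {allele}: {count} ({100 * freq:.1f}%)")
```

The recomputed counts and percentages match the G and A rows of the table for both tumor types.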

Supplementary Table 18. Correlation of A3AB deletion status with expression of main A3A and A3B isoforms in subsets of bladder and breast tumors in TCGA

Bladder, N=105
Gene   Isoform ID, UCSC*   Isoform annotation      Beta-coefficient   P-value
A3A    uc003awn.2          major isoform           -0.351             0.475
A3A    uc011aoc.1          A3AB deletion isoform    0.766             2.99E-04
A3B    uc003awo.1          major isoform           -0.379             0.131

Breast, N=388
Gene   Isoform ID, UCSC*   Isoform annotation      Beta-coefficient   P-value
A3A    uc003awn.2          major isoform           -0.394             3.63E-03
A3A    uc011aoc.1          A3AB deletion isoform    0.437             9.57E-05
A3B    uc003awo.1          major isoform           -0.902             3.17E-11

*UCSC - University of California Santa Cruz genome browser

Analysis was performed in subsets of TCGA samples with available A3AB deletion genotype data (ref. 14). A3AB expression levels were tested for correlation with specified isoforms, adjusting for age, sex, and race. Deletion status, which is scored on the DNA level as presence of 0, 1, or 2 deletion alleles, significantly correlates with presence of the A3AB deletion isoform annotated on the mRNA level; thus, these variables can be used as proxies for each other. Incomplete correlation could be due to misclassification of A3AB deletion status or isoform annotation in TCGA. Expression of all isoforms is presented as quantile-normalized (ref. 1) log10 expression values.

Supplementary Table 19. Distribution of A3AB deletion genotypes in subsets of bladder and breast tumors in TCGA and in HapMap samples

Bladder, N=105
Population          I/I   I/D and D/D   Total   MAF     MAF in HapMap
Caucasian           80    9             89      0.051   0.090 (CEU)
African American    8     0             8       0       0.042 (YRI)
Asian               6     2             8       0.125   0.312 (CHB, JPT)
Total               94    11            105     0.066

Breast, N=388
Population          I/I   I/D   D/D   Total   MAF     MAF in HapMap
Caucasian           305   31    1     337     0.049   0.090 (CEU)
African American    25    4     0     29      0.069   0.042 (YRI)
Asian               9     12    1     22      0.318   0.312 (CHB, JPT)
Total               339   47    2     388     0.066

A3AB deletion alleles: I - insertion, D - deletion; HapMap populations: CEU - individuals of European ancestry from Utah; YRI - Yoruba from Nigeria; CHB - Chinese from Beijing; JPT - Japanese from Tokyo. HapMap samples were genotyped for A3AB deletion using a CNV assay (Table S2); breast and bladder cancer TCGA samples (ref. 14) were scored by mapping short exome-sequencing DNA reads to the reference genome. The deletion is more common in Asians than in other ethnic groups.

Supplementary Table 20. Primers

SNP / target               Application   Oligo             Sequence
rs17000526                 EMSA          rs17000526_g-f    ATAAGGGCGTTGGGCAAGGAAA
                                         rs17000526_g-r    TTTCCTTGCCCAACGCCCTTAT
                                         rs17000526_a-f    ATAAGGGCGTTAGGCAAGGAAA
                                         rs17000526_a-r    TTTCCTTGCCTAACGCCCTTAT
rs1014971                  EMSA          rs1014971_a-f     AGGTACTCCCAACCCCTGCAGC
                                         rs1014971_a-r     GCTGCAGGGGTTGGGAGTACCT
                                         rs1014971_g-f     AGGTACTCCCGACCCCTGCAGC
                                         rs1014971_g-r     GCTGCAGGGGTCGGGAGTACCT
rs1004748                  EMSA          rs1004748_g-f     GAAGCGGCAGGGCCAGCCATGTG
                                         rs1004748_g-r     CACATGGCTGGCCCTGCCGCTTC
                                         rs1004748_a-f     GAAGCGGCAGGACCAGCCATGTG
                                         rs1004748_a-r     CACATGGCTGGTCCTGCCGCTTC
qRT-PCR for SeV-DI RNA                   SeV-DI RNA_F      GTCAAGATGTTCGGGGCCAG
                                         SeV-DI RNA_R      CGTTCTGCACGATAGGGACT
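Each EMSA probe in Supplementary Table 20 is listed as a forward (-f) and reverse (-r) oligo, which should be exact reverse complements of each other so that they anneal into a double-stranded probe. The sketch below checks this for the sequences in the table; the helper name and dictionary layout are illustrative assumptions, not part of the original methods.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# (forward, reverse) EMSA oligo pairs copied from Supplementary Table 20.
emsa_pairs = {
    "rs17000526_g": ("ATAAGGGCGTTGGGCAAGGAAA", "TTTCCTTGCCCAACGCCCTTAT"),
    "rs17000526_a": ("ATAAGGGCGTTAGGCAAGGAAA", "TTTCCTTGCCTAACGCCCTTAT"),
    "rs1014971_a": ("AGGTACTCCCAACCCCTGCAGC", "GCTGCAGGGGTTGGGAGTACCT"),
    "rs1014971_g": ("AGGTACTCCCGACCCCTGCAGC", "GCTGCAGGGGTCGGGAGTACCT"),
    "rs1004748_g": ("GAAGCGGCAGGGCCAGCCATGTG", "CACATGGCTGGCCCTGCCGCTTC"),
    "rs1004748_a": ("GAAGCGGCAGGACCAGCCATGTG", "CACATGGCTGGTCCTGCCGCTTC"),
}

# True for a pair when the reverse oligo is the reverse complement of the forward one.
pair_checks = {
    name: reverse_complement(fwd) == rev for name, (fwd, rev) in emsa_pairs.items()
}
all_complementary = all(pair_checks.values())
print(pair_checks)
```

The same helper can be used to confirm that the two allelic versions of each probe differ only at the SNP position.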

Supplementary Note: Association of SNP rs1014971 with breast cancer risk in the Breast Cancer Association Consortium (BCAC) (ref. 15). Estimates are based on 45,290 cases and 41,880 controls of European ancestry from 41 studies genotyped as part of the Collaborative Oncological Gene-environment Study (COGS) using a custom Illumina iSelect genotyping array.

SNP rs12628403 TaqMan assay information

This SNP is located in an intronic region upstream of exon 4 of A3A; the deletion starts after this exon. The region has high similarity to A3B and A3G; unique positions between these regions are marked in blue, and the position of the TaqMan probe is in underlined italics. Assay specificity is defined by primers that uniquely recognize A3A and not A3B or A3G regions.

rs12628403-a/c
A3A: gatgtgggaagtctgtcctgagagtcatgggccctaggtgccaccccgatcccacagcgggagcgtgactta
A3B: gatgtgggaagtctgtcctgagagtcatgggcccttggtgctgccccctccccacaacaggagcgtgactta
A3G: gatgtgggaagtctgtcttgagagtcatgggccttggtgccaccacgat cccacagcgggagtgtgactta

SNP rs12628403 TaqMan amplicon, 171 bp
ATTCCAATGGGAAGGAACTGCCTGATGAAGGAGCTAAGTCCCTAGGGGAGGGAGAGGGAAAGGAGGGACTGAAACCAGGATGTGGGAAGTCTGTCCTGAGAGTC[A/C]TGGGCCCTAGGTGCCACCCCGATCCCACAGCGGGAGCGTGACTTATCTCCCCTGTCCCTTTTCAGA

Forward primer: ATTCCAATGGGAAGGAACTGC
Reverse primer: TCTGAAAAGGGACAGGGGAGA
Probe_allele_A_VIC: GAGAGTCATGGGCCCTA
Probe_allele_C_FAM: GAGAGTCCTGGGCCCTA

Haplotypes for SNP rs12628403 and deletion (CNV) in PLCO set, n=1,837, D' = 0.97, r² = 0.92

Haplotype rs12628403-deletion (CNV)   Frequency
A_I                                   0.925
C_D                                   0.070
A_D                                   0.002
C_I                                   0.003

rs12628403 alleles: A and C; A3AB deletion alleles: I - insertion and D - deletion; concordance in duplicated samples: for rs12628403, 99.8% in 751 samples; for CNV, 99.7% in 397 samples. Since CNV and rs12628403 are in strong linkage disequilibrium (LD) in our set of individuals of European ancestry (n = 1,837, D' = 0.97, r² = 0.92), we used rs12628403 genotypes when CNV data could not be generated due to insufficient DNA quantity or quality (471 of 4,285 European samples, 11%).

Detailed analysis of survival in TCGA breast cancer patients

ER+ tumors
Predictors                                     Beta-coefficient   P-value
Age, years                                      0.056             3.0E-05
Tumor stage                                     0.861             3.5E-04
log_nc_uc003awn2_a3a_major                     -0.011             0.921
log_nc_uc011aob1_a3a_minor                      0.099             0.652
log_nc_uc011aoc1_a3ab_del                       0.628             0.007
log_nc_uc003awo1_a3b_major                     -0.124             0.430
log_nc_uc003awp1_a3b_minor1                    -0.042             0.888
log_nc_uc003awq1_a3b_minor2                    -0.495             0.147
APOBEC mutagenesis pattern                      0.011             0.971
rs17000526 genotypes (0, 1, 2 risk alleles)    -0.228             0.404

Survival in ER+ breast cancer is decreased with higher age, tumor stage and expression of the A3AB isoform (A3AB germline deletion). Expression of all isoforms is presented as quantile-normalized (ref. 1) log10 expression values.

ER- tumors
Predictors                                     Beta-coefficient   P-value
Age, years                                      0.006             0.756
Tumor stage                                     1.505             2.5E-05
log_nc_uc003awn2_a3a_major                     -0.850             0.041
log_nc_uc011aob1_a3a_minor                      0.084             0.770
log_nc_uc011aoc1_a3ab_del                      -0.519             0.175
log_nc_uc003awo1_a3b_major                      0.739             0.053
log_nc_uc003awp1_a3b_minor1                     1.057             0.008
log_nc_uc003awq1_a3b_minor2                     0.288             0.297
APOBEC mutagenesis pattern                      0.321             0.475
rs17000526 genotypes (0, 1, 2 risk alleles)    -0.543             0.209

Survival in ER- breast cancer is decreased with higher tumor stage and expression of a minor isoform of A3B. Expression of all isoforms is presented as quantile-normalized (ref. 1) log10 expression values.
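Returning to the rs12628403 TaqMan assay section: the linkage disequilibrium values reported there for rs12628403 and the A3AB deletion can be approximately recomputed from the four haplotype frequencies listed for the PLCO set. The sketch below uses the standard definitions of D, D' and r² for two biallelic loci; the function name is hypothetical, and small differences from the reported values are expected because the published frequencies are rounded to three decimals.

```python
def ld_from_haplotypes(f_ai: float, f_cd: float, f_ad: float, f_ci: float):
    """Return (D', r^2) for two biallelic loci from haplotype frequencies.

    f_ai, f_cd, f_ad, f_ci are the frequencies of haplotypes A_I, C_D, A_D
    and C_I (SNP allele A/C, deletion allele I/D); f_cd is the remainder
    1 - (f_ai + f_ad + f_ci) and is passed in to validate the input.
    """
    total = f_ai + f_cd + f_ad + f_ci
    if abs(total - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a = f_ai + f_ad  # frequency of SNP allele A
    p_i = f_ai + f_ci  # frequency of the insertion (non-deleted) allele I
    d = f_ai - p_a * p_i  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_i), (1 - p_a) * p_i)
    else:
        d_max = min(p_a * p_i, (1 - p_a) * (1 - p_i))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p_a * (1 - p_a) * p_i * (1 - p_i))
    return d_prime, r_squared


# Haplotype frequencies copied from the PLCO haplotype table (n=1,837).
d_prime, r_squared = ld_from_haplotypes(f_ai=0.925, f_cd=0.070, f_ad=0.002, f_ci=0.003)
print(f"D' = {d_prime:.2f}, r^2 = {r_squared:.2f}")
```

Values this close to 1 indicate that the C allele of rs12628403 almost always travels with the deletion, which is the rationale given for using the SNP as a proxy when the CNV assay failed.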

URLs:
GTEx: http://www.gtexportal.org/
Breast Cancer Association Consortium data: apps.ccge.medschl.cam.ac.uk/consortia/bcac/

REFERENCES:
1. Shabalin, A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353-8 (2012).
2. Rothman, N. et al. A multi-stage genome-wide association study of bladder cancer identifies multiple susceptibility loci. Nat Genet 42, 978-84 (2010).
3. Garcia-Closas, M. et al. A genome-wide association study of bladder cancer identifies a new susceptibility locus within SLC14A1, a urea transporter gene on chromosome 18q12.3. Hum Mol Genet 20, 4282-9 (2011).
4. Figueroa, J.D. et al. Genome-wide association study identifies multiple loci associated with bladder cancer risk. Hum Mol Genet 23, 1387-98 (2014).
5. Kiemeney, L.A. et al. Sequence variant on 8q24 confers susceptibility to urinary bladder cancer. Nat Genet 40, 1307-12 (2008).
6. Rafnar, T. et al. Sequence variants at the TERT-CLPTM1L locus associate with many cancer types. Nat Genet 41, 221-7 (2009).
7. Fu, Y.P. et al. The 19q12 bladder cancer GWAS signal: association with cyclin E function and aggressive disease. Cancer Res 74, 5808-18 (2014).
8. Kiemeney, L.A. et al. A sequence variant at 4p16.3 confers susceptibility to urinary bladder cancer. Nat Genet 42, 415-9 (2010).
9. Tang, W. et al. Mapping of the UGT1A locus identifies an uncommon coding variant that affects mRNA expression and protects from bladder cancer. Hum Mol Genet 21, 1918-30 (2012).
10. Rafnar, T. et al. Genome-wide association study yields variants at 20p12.2 that associate with urinary bladder cancer. Hum Mol Genet 23, 5545-57 (2014).
11. Wu, X. et al. Genetic variation in the prostate stem cell antigen gene PSCA confers susceptibility to urinary bladder cancer. Nat Genet 41, 991-5 (2009).
12. Mullane, S.A. et al. Correlation of APOBEC mRNA expression with overall survival and PD-L1 expression in urothelial carcinoma. Sci Rep 6, 27702 (2016).
13. Kidd, J.M., Newman, T.L., Tuzun, E., Kaul, R. & Eichler, E.E. Population stratification of a common APOBEC gene deletion polymorphism. PLoS Genet 3, e63 (2007).
14. Nik-Zainal, S. et al. Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nat Genet 46, 487-91 (2014).
15. Michailidou, K. et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet 45, 353-61, 361e1-2 (2013).