Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies Stanford Biostatistics Workshop Pierre Neuvial with Henrik Bengtsson and Terry Speed Department of Statistics, UC Berkeley September 30, 2010 Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 1 / 60
Outline
1. Genotyping microarrays in cancer research
   - DNA copy number changes in cancer cells
   - Genotyping microarray data
2. TumorBoost: improved power to detect CN changes
   - Method: taking advantage of SNP effects
   - Results: improved signal to noise ratio
   - ROC evaluation
3. Challenges for detecting and calling of copy number events
   - Detecting copy number changes from both C and DH
   - Calling: influence of tumor purity, ploidy, and signal saturation
1. Genotyping microarrays in cancer research
DNA copy number changes in cancer cells
Genomic changes at the DNA level are hallmarks of cancer
We inherit 23 paternal and 23 maternal chromosomes, mostly identical.
[Figure: normal karyotype and tumor karyotype.]
Our goal: identify CN changes to improve the characterization, classification, and treatment of cancers.
Genotypes in a diploid chromosome
Single nucleotide polymorphism (SNP): 10-20 million known SNPs.
[Figure: the two copies of a chromosome at four SNPs, first shown with nucleotides, then with alleles coded A and B, giving genotypes AB, BB, AB, AA.]
Genotypes and copy numbers in a tumor
[Figure: the same SNPs in a matched normal (diploid) sample with parental copy numbers (1,1), and in a tumor with a gain (1,2), a deletion (0,1), or copy-neutral LOH (0,2). Under a gain, normal genotypes such as AB and BB become, for example, ABB and BBB.]
Parental, minor and major copy numbers
Parental copy numbers at genomic locus $j$: $(m_j, p_j)$, the unobserved number of maternal and paternal chromosomes at $j$.
Copy number state at genomic locus $j$: $CN = (C_{1j}, C_{2j})$, where $C_{1j} = \min(m_j, p_j)$ and $C_{2j} = \max(m_j, p_j)$.
Minor ($C_1$) and major ($C_2$) copy numbers:
- characterize the above CN events in cancers
- can be estimated from SNP arrays
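The definition above can be illustrated with a minimal Python sketch. The function name `minor_major` and the example table are illustrative assumptions, not part of the talk; the parental copy numbers listed are one possible configuration for each event, since the slide only gives the resulting (minor, major) states.

```python
def minor_major(m, p):
    """Return (C1, C2) = (min, max) of the parental copy numbers (m, p)."""
    return (min(m, p), max(m, p))


# Hypothetical parental configurations (maternal, paternal) for the events
# discussed in the talk; swapping m and p gives the same CN state.
events = {
    "normal": (1, 1),
    "gain": (2, 1),
    "deletion": (0, 1),
    "copy-neutral LOH": (2, 0),
}

states = {name: minor_major(m, p) for name, (m, p) in events.items()}

for name, state in states.items():
    print(name, state)
```

Because only the minimum and maximum are kept, the state does not say which parent contributed the gained or lost copy.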
1. Genotyping microarrays in cancer research
Genotyping microarray data
Technology: copy number and genotyping microarrays
[Figure: chip design. Probes for each allele of a SNP (here T/C) are hybridized with sample DNA.]
$(C_1, C_2)$ can be estimated from SNP arrays
For SNP $j$ in sample $i$, observed signal intensities can be summarized as $(\theta, \beta)$, where $\theta_{ij} = \theta_{ijA} + \theta_{ijB}$ and $\beta_{ij} = \theta_{ijB} / \theta_{ij}$.
Total copy number: $C_{ij} = 2\,\theta_{ij} / \theta_{Rj} = C_{1ij} + C_{2ij}$
Decrease in heterozygosity: $DH_{ij} = 2\,|\beta_{ij} - 1/2| = (C_{2ij} - C_{1ij}) / (C_{2ij} + C_{1ij})$
Notes:
- DH is only defined for SNPs that were heterozygous in the germline.
- Both dimensions are needed to understand what is going on:
  - Copy-neutral LOH: CN = (0, 2), normal total copy number
  - Balanced duplication: CN = (2, 2), allelic balance
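As a worked illustration of these two formulas, here is a small Python sketch. It assumes noise-free, perfectly calibrated signals and interprets $\theta_{Rj}$ as a reference signal corresponding to two copies; the function names and the toy numbers are hypothetical.

```python
def observed_c_dh(theta_a, theta_b, theta_ref):
    """Total copy number C and decrease in heterozygosity DH from allele signals."""
    theta = theta_a + theta_b            # total signal
    beta = theta_b / theta               # allele B fraction
    c = 2.0 * theta / theta_ref          # total copy number relative to the reference
    dh = 2.0 * abs(beta - 0.5)           # decrease in heterozygosity
    return c, dh


def expected_c_dh(c1, c2):
    """C and DH implied by a (minor, major) copy number state."""
    c = c1 + c2
    dh = (c2 - c1) / c if c > 0 else float("nan")
    return c, dh


# Idealized heterozygous SNP in a gain (1,2): one copy of A, two copies of B,
# with a reference signal of 2 units for two copies.
print(observed_c_dh(theta_a=1.0, theta_b=2.0, theta_ref=2.0))
print(expected_c_dh(1, 2))

# The two states that a single dimension cannot resolve:
print(expected_c_dh(0, 2))   # copy-neutral LOH: normal C, maximal DH
print(expected_c_dh(2, 2))   # balanced duplication: increased C, DH = 0
```

The last two calls show why both dimensions are needed: copy-neutral LOH is invisible in C, and a balanced duplication is invisible in DH.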
The Cancer Genome Atlas (TCGA)
Accelerate our understanding of the molecular basis of cancer
- 20 tumor types: brain (glioblastoma multiforme), ovarian, breast, lung, leukemia (AML)...
- Large studies: 500 tumor-normal pairs for each tumor type
- Data levels: DNA copy number, gene expression, DNA methylation
- Platforms: microarray and sequencing
For SNP arrays: identify copy number changes, (C, DH) or $(C_1, C_2)$:
1. detection: finding regions
2. classification: labeling regions
Data shown in this presentation: high-grade serous ovarian adenocarcinoma (OvCa).
No copy number change: (1,1)
[Figure with genotype labels BB, AB, AA.] Homozygous SNPs in the normal sample are highlighted in red.

Gain: (1, 2)
[Figure with genotype labels BB, AB, AA.] Homozygous SNPs in the normal sample are highlighted in red.

Deletion: (0, 1)
[Figure with genotype labels BB, AB, AA.] Homozygous SNPs in the normal sample are highlighted in red.

Copy number neutral LOH: (0, 2)
[Figure with genotype labels BB, AB, AA.] Homozygous SNPs in the normal sample are highlighted in red.
Tumor purity/normal contamination
In practice, what we call tumor samples are actually a mixture of tumor and normal cells.
The samples just shown have the largest fraction of tumor cells in the data set.
In the presence of normal contamination, allele B fractions for heterozygous SNPs are shrunk toward 1/2.
Normal, gain, copy neutral LOH
[Figure with genotype labels BB, AB, AA.] Homozygous SNPs in the normal sample are highlighted in red.

Normal, deletion, copy neutral LOH
[Figure with genotype labels BB, AB, AA.] Homozygous SNPs in the normal sample are highlighted in red.
2. TumorBoost: improved power to detect CN changes
Method: taking advantage of SNP effects
Raw genomic signals: allelic ratios are noisy
After preprocessing using the CRMAv2 method.
[Figure: genomic tracks of $\beta_T$, $\beta_N$, and $C$.]
SNP effect in a region of no CN change in the tumor
Instead of three points at (0,0), (1/2,1/2) and (1,1), we have three clusters; the observed deviation is a SNP effect: $\delta_{ij} = \beta_{ij} - \mu_{ij}$.
$\delta$ is quite reproducible between the normal and the tumor.
SNP effect in a region where the tumor has a gain
- Homozygous clusters are similar to before
- The heterozygous cluster is split in two, and tilted

SNP effect in a region where the tumor is CNNLOH
- Homozygous clusters are similar to before
- The heterozygous cluster is even more tilted
Overview of the TumorBoost method
Idea:
1. the SNP effect is reproducible between tumor and normal
2. in the normal, the truth is easier to infer because we only expect three true allele B fractions, corresponding to genotypes AA, AB, BB.
For each SNP, we estimate the SNP effect in the normal hybridization, and subtract it from the tumor.
Features:
- No need to know copy number regions in advance
- Normalization is performed for each SNP separately
- Only one tumor/normal pair required
H. Bengtsson, P. Neuvial, T.P. Speed. TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics (2010) 11:245.
Proposed normalization strategy
Estimate the SNP effect in the normal sample as $\hat\delta_{Nj} = \beta_{Nj} - \hat\mu_{Nj}$, where $\hat\mu_{Nj} \in \{0, 1/2, 1\}$ is the normal genotype.
For homozygous SNPs ($\hat\mu_{Nj} \in \{0, 1\}$):
$\tilde\beta_{Tj} = \beta_{Tj} - \beta_{Nj} + \hat\mu_{Nj}$
For heterozygous SNPs ($\hat\mu_{Nj} = 1/2$):
$\tilde\beta_{Tj} = \frac{1}{2} \frac{\beta_{Tj}}{\beta_{Nj}}$ if $\beta_{Tj} < \beta_{Nj}$
$\tilde\beta_{Tj} = 1 - \frac{1}{2} \frac{1 - \beta_{Tj}}{1 - \beta_{Nj}}$ otherwise
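The per-SNP rule above is simple enough to sketch directly. The Python function below is an illustrative transcription of the formulas on this slide, not the aroma.cn implementation; the name `tumorboost` and the toy values are assumptions, and the genotype $\hat\mu_{Nj}$ is taken as given.

```python
def tumorboost(beta_t, beta_n, mu_n):
    """Normalize one tumor allele B fraction using its matched normal.

    beta_t, beta_n: raw allele B fractions in tumor and normal.
    mu_n: called normal genotype, one of 0.0 (AA), 0.5 (AB), 1.0 (BB).
    """
    if mu_n in (0.0, 1.0):
        # Homozygous SNP: subtract the SNP effect estimated in the normal.
        return beta_t - beta_n + mu_n
    # Heterozygous SNP: rescale so that beta_t == beta_n maps to 1/2.
    # Assumes 0 < beta_n < 1, which holds for a SNP called heterozygous.
    if beta_t < beta_n:
        return 0.5 * beta_t / beta_n
    return 1.0 - 0.5 * (1.0 - beta_t) / (1.0 - beta_n)


# Heterozygous SNP with a SNP effect of +0.05 and no change in the tumor:
print(tumorboost(beta_t=0.55, beta_n=0.55, mu_n=0.5))
# Heterozygous SNP where the tumor has lost part of its B signal:
print(tumorboost(beta_t=0.30, beta_n=0.60, mu_n=0.5))
# Homozygous AA SNP with a small positive SNP effect:
print(tumorboost(beta_t=0.08, beta_n=0.05, mu_n=0.0))
```

Both heterozygous branches give 1/2 when the tumor and normal fractions coincide, so the rule is continuous at $\beta_{Tj} = \beta_{Nj}$.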
2. TumorBoost: improved power to detect CN changes
Results: improved signal to noise ratio
Genomic signals before normalization
[Figure: tracks of $\beta_N$, $\beta_T$, and $C$ for two regions: normal, gain, copy neutral LOH on chromosome 2 (in Mb); normal, loss, copy neutral LOH on chromosome 10 (in Mb).]

Genomic signals after normalization
[Figure: the same tracks and regions after TumorBoost normalization.]

Allele B fractions before normalization
[Figure: chromosome 10 and chromosome 2.]

Allele B fractions after normalization
[Figure: chromosome 10 and chromosome 2.]

ASCNs before normalization
[Figure: chromosome 10 and chromosome 2.]

ASCNs after normalization
[Figure: chromosome 10 and chromosome 2.]
2. TumorBoost: improved power to detect CN changes
ROC evaluation
Detecting changes in allele B fractions
[Figure panels: allele B fractions $\beta$; allele B fractions for heterozygous SNPs; mirrored allele B fractions for heterozygous SNPs, $\rho = |\beta - 1/2| = DH/2$.]
For heterozygous SNPs, DH has only one mode, so it can be segmented.
We use ROC analysis to assess how well separated the two regions on each side of a known change point in DH are.
ROC evaluation
Available from aroma.cn.eval at: http://aroma-project.org
For a given sample:
- find a clear change point
- label the flanking regions, e.g. NORMAL (1,1) and DELETION (0,1)
- choose one reference state and one state to call
For each value of a threshold $\tau$:
- call SNPs below $\tau$ a DELETION
- count the number of true and false DELETIONs
The ROC curve is built by varying $\tau$.
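The thresholding procedure described on this slide can be sketched in a few lines of Python. This is an illustration of the idea only, not the aroma.cn.eval code; the function names and the toy total copy number values are made up for the example.

```python
import math


def roc_curve(normal, deletion):
    """ROC points for the rule 'call a locus DELETION if its signal is below tau'.

    normal: signals from the reference region (true state NORMAL).
    deletion: signals from the flanking region (true state DELETION).
    Returns a list of (false-positive rate, true-positive rate) pairs.
    """
    taus = sorted(set(normal) | set(deletion)) + [math.inf]
    curve = []
    for tau in taus:
        tpr = sum(x < tau for x in deletion) / len(deletion)  # true DELETIONs
        fpr = sum(x < tau for x in normal) / len(normal)      # false DELETIONs
        curve.append((fpr, tpr))
    return curve


def area_under_curve(curve):
    """Trapezoidal area under a ROC curve given as (fpr, tpr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy signals on each side of a change point (hypothetical values).
normal_region = [1.9, 2.1, 2.0, 2.2]
deleted_region = [1.0, 1.2, 0.9, 1.95]

curve = roc_curve(normal_region, deleted_region)
auc = area_under_curve(curve)
print(curve)
print(auc)
```

The same function applies unchanged to any one-dimensional signal, for example the mirrored allele B fraction $\rho$, provided the state to call is the one with the lower values; otherwise the signal can be negated first.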
Result: better detection of allelic imbalances
[Figure: ROC curves (true-positive rate against false-positive rate) for separating a NORMAL REGION from a GAIN, comparing allelic imbalance signals (original and TumorBoost-normalized) with total CNs.]
Complete preprocessing for a single tumor/normal pair
Available from aroma.cn and aroma.affymetrix at: http://aroma-project.org
1. Normalization and locus-level summarization using CRMAv2 (Bengtsson et al, 2009), for the normal and the tumor sample separately
2. Naive genotyping of the normal sample: threshold the density of $\beta$
3. TumorBoost normalization (Bengtsson et al, 2010)
Note: genotyping errors can be handled by smoothing or by using confidence scores.
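To make the naive genotyping step concrete, here is a deliberately simplified Python sketch. The talk describes thresholding the density of $\beta$; the version below replaces data-driven thresholds with fixed cut points at 1/3 and 2/3, which is an assumption made for illustration only, and the function name is hypothetical.

```python
def call_naive_genotype(beta_n, low=1.0 / 3.0, high=2.0 / 3.0):
    """Call a normal genotype from an allele B fraction using two thresholds.

    Returns the expected allele B fraction of the call:
    0.0 for AA, 0.5 for AB, 1.0 for BB.
    The fixed thresholds stand in for thresholds derived from the
    observed density of beta in the normal sample.
    """
    if beta_n < low:
        return 0.0
    if beta_n > high:
        return 1.0
    return 0.5


normal_betas = [0.05, 0.52, 0.97, 0.44, 0.01]
calls = [call_naive_genotype(b) for b in normal_betas]
print(calls)
```

SNPs whose allele B fraction falls close to a threshold are the ones most at risk of a wrong call, which is where the smoothing or confidence scores mentioned in the note would come in.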
Observed power to detect changes
[Figure: ROC curves for four change points: (1,1) vs (1,2); (1,1) vs (0,1); (1,2) vs (0,2); (0,1) vs (0,2). Legend: total copy number; raw allele B fractions; normalized β (naive); normalized β (Birdseed).]
TCN is consistent across change points; β is not!
Expected power to detect changes
C changes by one unit at every change point just shown.
For DH, and thus $(C_1, C_2)$, it is more complicated:
[Figure, left panel: $\rho$ (0 to 1) against tumor purity (0 to 1) for states (0,2), (0,1), (1,3), (1,2), (1,1), (0,0). Right panel: difference in $\rho$ (0 to 1) against tumor purity (0 to 1) for change points (1,1) to (0,1), (1,2) to (0,2), (1,1) to (1,2), (0,1) to (0,2).]
The expected improvement depends on the type of change point and on normal contamination.
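The dependence on purity can be reproduced with a short calculation. The Python sketch below assumes a simple mixture model that is not spelled out on the slide: a fraction `purity` of cells carries the tumor state $(C_1, C_2)$, the rest are normal cells in state (1,1), and signals are noise-free and unsaturated. Under that assumption the expected mirrored allele B fraction of a heterozygous SNP is $\rho = \frac{\kappa (C_2 - C_1)}{2\,[\kappa (C_1 + C_2) + 2 (1 - \kappa)]}$, with $\kappa$ the purity.

```python
def expected_rho(c1, c2, purity):
    """Expected rho = |beta - 1/2| for a heterozygous SNP in a tumor/normal mixture.

    Assumes a fraction `purity` of tumor cells in state (c1, c2) and a
    fraction 1 - purity of normal cells in state (1, 1).
    """
    total = purity * (c1 + c2) + 2.0 * (1.0 - purity)  # average total copy number
    if total == 0:
        return float("nan")  # pure homozygous deletion: no signal at all
    return purity * (c2 - c1) / (2.0 * total)


def rho_difference(state_a, state_b, purity):
    """Absolute difference in expected rho across a change point."""
    return abs(expected_rho(*state_a, purity) - expected_rho(*state_b, purity))


for purity in (0.25, 0.5, 0.75, 1.0):
    row = {
        "(1,1) to (0,1)": rho_difference((1, 1), (0, 1), purity),
        "(1,1) to (1,2)": rho_difference((1, 1), (1, 2), purity),
        "(1,2) to (0,2)": rho_difference((1, 2), (0, 2), purity),
        "(0,1) to (0,2)": rho_difference((0, 1), (0, 2), purity),
    }
    print(purity, row)
```

Under this model a pure tumor shows no difference in $\rho$ between (0,1) and (0,2), since both states are fully homozygous, whereas total copy number still differs by one unit.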
What if no matched normal is available? CalMaTe
For each SNP:
- estimate a calibration function (from observed signals to genotypes) using a set of reference samples
- back-transform test samples
[Figure: before CalMaTe normalization; after CalMaTe normalization.]
3. Challenges for detecting and calling of copy number events
Detecting copy number changes from both C and DH
Changes can be reflected in both dimensions
[Figure.]

DH has greater detection power than C at a single locus
[Figure.]

More informative probes for C than DH
Affymetrix GenomeWideSNP 6, unit types:
              All units    CN units   SNP units
Frequency     1,856,069     946,705     909,364
Proportion         100%         51%         49%
SNPs by genotype call for sample TCGA-23-1027:
              All units          AA          AB          BB
Frequency     1,856,069     326,500     251,446     331,418
Proportion         100%         18%         14%         18%

Similar detection power at a fixed resolution
[Figure.]
Need for a truly joint dimensional segmentation method
- Most methods segment only one of C and DH
- Some use two-way segmentation: Olshen et al, [PSCBS]
- A handful are truly two-dimensional:
  - Chen et al, [pscn]
  - Greenman et al, Biostat., 2010, [PICNIC]
  - Sun et al, NAR, 2009, [genocna]
Challenges for a truly joint segmentation method:
- A two-dimensional signal
- Only heterozygous SNPs can be used to detect CN changes from DH
- Bias in the estimation of DH
- DH is not Gaussian
3. Challenges for detecting and calling of copy number events
Calling: influence of tumor purity, ploidy, and signal saturation
Copy numbers are not calibrated
[Figure.]
Non-calibrated signals: signal saturation
$C_{obs} = f(C_{true}) < C_{true}$
$f$ is unknown
Non-calibrated signals: ploidy and purity
[Figure: copy number (0 to 3) and allele fractions (0 to 1) against position (110 to 150 Mb), in four panels: pure tumor, ploidy = 2; pure tumor, ploidy > 2; non-pure tumor, ploidy = 2; non-pure tumor, ploidy < 2.]
Purity, ploidy, and signal saturation
Why copy numbers are not calibrated:
- signal saturation
- non-purity: presence of normal cells in the tumor sample
- ploidy: the total amount of DNA is fixed by the assay
Remarks:
- ploidy is not identifiable
- purity and ploidy are biological properties of the sample
- signal saturation is an artifact of the assay
- under the rug: tumor heterogeneity
OverUnder: Attiyeh et al, Genome Research, 2009
[Figure.]

GAP: Popova et al, Genome Biology, 2009
[Figure: (a) B allele frequency (BAF) and (b) log R ratio (LRR) against SNP index across chromosomes 1 to 22 and X; (c) to (e) log R ratio against B allele frequency, showing genotype clusters (AA, AB, BB, AAB, ABB), germline and acquired homozygous regions, sub-clones, and an example with ~70% tumor cells (p_baf = 0.3, q_lrr = 0.3).]

ASCAT: Van Loo et al, PNAS, 2010
[Figure.]
Comments on existing approaches
- What about Affymetrix data?
- Choice between candidate solutions
- Perform ad hoc correction for saturation
- Tumor heterogeneity?
Thanks
Henrik Bengtsson, Adam Olshen, Maria Ortiz, Angel Rubio, Venkat E. Seshan, Terry Speed, Paul Spellman, Nancy R. Zhang
Funding: The Cancer Genome Atlas