Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies

Similar documents
Understanding DNA Copy Number Data

Genome-wide copy-number calling (CNAs not CNVs!) Dr Geoff Macintyre

Comparison of segmentation methods in cancer samples

DNA-seq Bioinformatics Analysis: Copy Number Variation

Introduction to LOH and Allele Specific Copy Number User Forum

ADVANCED PGT SERVICES

Global variation in copy number in the human genome

The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0

Biostatistical modelling in genomics for clinical cancer studies

Human Cancer Genome Project. Bioinformatics/Genomics of Cancer:

BACOM2: a Java tool for detecting normal cell contamination of copy number in heterogeneous tumor

Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer

Challenges of CGH array testing in children with developmental delay. Dr Sally Davies 17 th September 2014

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK

Clinical Genomics. Ina E. Amarillo, PhD FACMGG

UTILIZATION OF A SNP MICROARRAY FOR CHRONIC LYMPHOCYTIC LEUKEMIA: EFFICACY, INFORMATIVE FINDINGS AND PROGNOSTIC CAPABILITIES

LTA Analysis of HapMap Genotype Data

Supplementary Materials for

The role of cytogenomics in the diagnostic work-up of Chronic Lymphocytic Leukaemia

Absolute quantification of somatic DNA alterations in human cancer

Chapter 1 : Genetics 101

Identification of regions with common copy-number variations using SNP array

Introduction to Gene Sets Analysis

On Missing Data and Genotyping Errors in Association Studies

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering

OncoPhase: Quantification of somatic mutation cellular prevalence using phase information

Computational Analysis of Genome-Wide DNA Copy Number Changes

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

Assessment of Breast Cancer with Borderline HER2 Status Using MIP Microarray

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.

Clinical Interpretation of Cancer Genomes

CS2220 Introduction to Computational Biology

Structural Variation and Medical Genomics

Research Strategy: 1. Background and Significance

The laws of Heredity. Allele: is the copy (or a version) of the gene that control the same characteristics.

Application of Whole Genome Microarrays in Cancer: You should be doing this test!!

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies

SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization

CHROMOSOMAL MICROARRAY (CGH+SNP)

SNP array-based analyses of unbalanced embryos as a reference to distinguish between balanced translocation carrier and normal blastocysts

Detection of aneuploidy in a single cell using the Ion ReproSeq PGS View Kit

A general framework for analyzing tumor subclonality using SNP array and DNA sequencing data

Contents. 1.5 GOPredict is robust to changes in study sets... 5

Seven cases of intellectual disability analysed by genomewide SNP analysis. Rodney J. Scott

Multiple Copy Number Variations in a Patient with Developmental Delay ASCLS- March 31, 2016

Computational Systems Biology: Biology X

Generating Spontaneous Copy Number Variants (CNVs) Jennifer Freeman Assistant Professor of Toxicology School of Health Sciences Purdue University

Analysis of CGH and SNP arrays for the detection of chromosomal aberrations in single cells

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Using CART to Mine SELDI ProteinChip Data for Biomarkers and Disease Stratification

Copy Number Variations and Association Mapping Advanced Topics in Computa8onal Genomics

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc.

Advances In Measurement Modeling: Bringing Genetic Information Into Preventive Interventions And Getting The Phenotype Right

Copy number and somatic mutations drive tumors

Supplementary Material to. Genome-wide association study identifies new HLA Class II haplotypes strongly protective against narcolepsy

CNV Detection and Interpretation in Genomic Data

The Biology and Genetics of Cells and Organisms The Biology of Cancer

Genomic complexity and arrays in CLL. Gian Matteo Rigolin, MD, PhD St. Anna University Hospital Ferrara, Italy

CHAPTER 2. Cancer Res Nov 15; 65(22):

CURRENT GENETIC TESTING TOOLS IN NEONATAL MEDICINE. Dr. Bahar Naghavi

Ploidy and large-scale genomic instability consistently identify basal-like breast

Large-scale identity-by-descent mapping discovers rare haplotypes of large effect. Suyash Shringarpure 23andMe, Inc. ASHG 2017

SNP Array NOTE: THIS IS A SAMPLE REPORT AND MAY NOT REFLECT ACTUAL PATIENT DATA. FORMAT AND/OR CONTENT MAY BE UPDATED PERIODICALLY.

Genetics and Diversity Punnett Squares

What can we contribute to cancer research and treatment from Computer Science or Mathematics? How do we adapt our expertise for them

Molecular Markers. Marcie Riches, MD, MS Associate Professor University of North Carolina Scientific Director, Infection and Immune Reconstitution WC

UNIT 3 GENETICS LESSON #30: TRAITS, GENES, & ALLELES. Many things come in many forms. Give me an example of something that comes in many forms.

Tutorial on Genome-Wide Association Studies

Hematopathology Service Memorial Sloan Kettering Cancer Center, New York

Nature Medicine: doi: /nm.3967

Systematic Analysis for Identification of Genes Impacting Cancers

November 9, Johns Hopkins School of Medicine, Baltimore, MD,

Analysis of Somatic Alterations in Cancer Genome: From SNP Arrays to Next Generation Sequencing

Applications of Chromosomal Microarray Analysis (CMA) in pre- and postnatal Diagnostic: advantages, limitations and concerns

Genomic structural variation

Introduction to genetic variation. He Zhang Bioinformatics Core Facility 6/22/2016

Dan Koller, Ph.D. Medical and Molecular Genetics

Session 4 Rebecca Poulos

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

Nature Genetics: doi: /ng Supplementary Figure 1. Rates of different mutation types in CRC.

LESSON 3.2 WORKBOOK. How do normal cells become cancer cells? Workbook Lesson 3.2

Addressing the challenges of genomic characterization of hematologic malignancies using microarrays

Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics. Mike West Duke University

By Mir Mohammed Abbas II PCMB 'A' CHAPTER CONCEPT NOTES

Lab Activity 36. Principles of Heredity. Portland Community College BI 233

Unit 5: Genetics Notes

White Paper. Copy number variant detection. Sample to Insight. August 19, 2015

GENOME-WIDE ASSOCIATION STUDIES

Session 4 Rebecca Poulos

SNP Array NOTE: THIS IS A SAMPLE REPORT AND MAY NOT REFLECT ACTUAL PATIENT DATA. FORMAT AND/OR CONTENT MAY BE UPDATED PERIODICALLY.

BST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis

Distinguishing Second Primary Cancers From Metastases: Statistical Challenges in Testing Clonal Relatedness of Tumors

Compound heterozygosity Yurii S. Aulchenko yurii [dot] aulchenko [at] gmail [dot] com. Thursday, April 11, 13

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

Non-Mendelian inheritance

Cytogenetics Technologies, Companies & Markets

Antigen Presentation to T lymphocytes

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Quantification of Normal Cell Fraction and Copy Number Neutral LOH in Clinical Lung Cancer Samples Using SNP Array Data

Transcription:

Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies Stanford Biostatistics Workshop Pierre Neuvial with Henrik Bengtsson and Terry Speed Department of Statistics, UC Berkeley September 30, 2010 Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 1 / 60

Outline Genotyping microarrays in cancer research 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 2 / 60

Outline Genotyping microarrays in cancer research DNA copy number changes in cancer cells 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 3 / 60

Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genomic changes at the DNA level are hallmarks of cancer We inherited 23 paternal and 23 maternal chromosomes, mostly identical. Normal karyotype Tumor karyotype Our goal: identify CN changes to improve characterization, classification, and treatment of cancers Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 4 / 60

Genotyping microarrays in cancer research Genotypes in a diploid chromosome DNA copy number changes in cancer cells Genotypes in a diploid chromosome Single nucleotide polymorphism G C C T C A G G 10-20 million known SNPs Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 5 / 60

Genotyping microarrays in cancer research Genotypes in a diploid chromosome DNA copy number changes in cancer cells Genotypes in a diploid chromosome Single nucleotide polymorphism A B B B AB BB B A AB A A AA 10-20 million known SNPs Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 6 / 60

Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotypes and copy numbers in a tumor Genotypes and copy numbers in a tumor Tumor with deletion copy-neutral LOH Matched Normal (diploid) Tumor with gain (0,2) - - BB BB BB BB A B B B AB BB A B BB BB ABB BBB (1,2) (0,1) (1,1) - A A A A AA B A AB A A AA A A B A AB AA (1,1) Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 7 / 60

Genotyping microarrays in cancer research DNA copy number changes in cancer cells Parental, minor and major copy numbers Parental copy numbers at genomic locus j: (m j, p j ), the unobserved number of maternal and paternal chromosomes at j. Copy number state at genomic locus j CN = (C 1j, C 2j ), where C 1j = min(m j, p j ) and C 2j = max(m j, p j ). Minor (C 1 ) and major (C 2 ) copy numbers: characterize the above CN events in cancers can be estimated from SNP arrays Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 8 / 60

Outline Genotyping microarrays in cancer research Genotyping microarray data 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 9 / 60

Genotyping microarrays in cancer research Genotyping microarray data Technology: Technology: Copy number and genotyping microarrays Copy number and genotyping microarrays Chip Design Probes DNA T/C CGTGTAATTGAACC GCACATTAACTTGG GCACATCAACTTGG CGTGTAGTTGAACC CCCCGTAAAGTACT TATGCCGCCCTGCG ATACGGCGGGACGC Sample DNA + Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 10 / 60

Genotyping microarrays in cancer research Genotyping microarray data (C 1, C 2 ) can be estimated from SNP arrays For SNP j in sample i, observed signal intensities can be summarized as (θ, β), where θ ij = θ Aij + θ ijb and β ij = θ ijb /θ ij. Total copy numbers C ij = 2 θ ij θ Rj = C 1ij + C 2ij Decrease in heterozygosity DH ij = 2 β ij 1/2 = C 2ij C 1ij C 2ij + C 1ij Notes: DH only defined for SNPs that were heterozygous in the germline Both dimensions are needed to understand what is going on: Copy neutral LOH: CN = (0, 2), normal total copy number Balanced duplication: CN = (2, 2), allelic balance Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 11 / 60

Genotyping microarrays in cancer research Genotyping microarray data The Cancer Genome Atlas (TCGA) Accelerate our understanding of the molecular basis of cancer 20 tumor types: brain (glioblastoma multiforme), ovarian, breast, lung, leukemia (AML)... Large studies: 500 tumor-normal pairs for each tumor type Data levels: DNA copy number, gene expression, DNA methylation Platforms: microarray and sequencing For SNP arrays: identify copy number changes: (C, DH) or (C 1, C 2 ): 1 detection: finding regions 2 classification labeling regions Data shown in this presentation: high-grade serous ovarian adenocarcinoma (OvCa). Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 12 / 60

Outline Genotyping microarrays in cancer research Genotyping microarray data 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 13 / 60

Genotyping microarrays in cancer research No copy number change: (1,1) Genotyping microarray data BB AB AA Homozygous SNPs in the normal sample are highlighted in red. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 14 / 60

Gain: (1, 2) Genotyping microarrays in cancer research Genotyping microarray data BB AB AA Homozygous SNPs in the normal sample are highlighted in red. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 15 / 60

Deletion: (0, 1) Genotyping microarrays in cancer research Genotyping microarray data BB AB AA Homozygous SNPs in the normal sample are highlighted in red. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 16 / 60

Genotyping microarrays in cancer research Copy number neutral LOH: (0, 2) Genotyping microarray data BB AB AA Homozygous SNPs in the normal sample are highlighted in red. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 17 / 60

Genotyping microarrays in cancer research Tumor purity/normal contamination Genotyping microarray data In practice what we call tumor samples are actually a mixture of tumor and normal cells. The ones just shown have the largest fraction of tumor cells in the data set. In presence of normal contamination allele B fractions for heterozygous SNPs are shrunk toward 1/2. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 18 / 60

Genotyping microarrays in cancer research Normal, gain, copy neutral LOH Genotyping microarray data BB AB AA Homozygous SNPs in the normal sample are highlighted in red. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 19 / 60

Genotyping microarrays in cancer research Normal, deletion, copy neutral LOH Genotyping microarray data BB AB AA Homozygous SNPs in the normal sample are highlighted in red. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 20 / 60

Outline TumorBoost: improved power to detect CN changes 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 21 / 60

Outline TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 22 / 60

TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Raw genomic signals: allelic ratios are noisy After preprocessing using the CRMAv2 method βt βn C Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 23 / 60

TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects SNP effect in a region of no CN change in the tumor Instead of three points at (0,0), ( 1 2, 1 2 ) and (1,1), we have three clusters; the observed deviation is a SNP effect: δ ij = β ij µ ij δ is quite reproducible between the normal and the tumor Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 24 / 60

TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects SNP effect in a region where tumor has a gain Homozygous clusters are similar as before Heterozygous cluster is split in two, and tilted Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 25 / 60

TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects SNP effect in a region where tumor is CNNLOH Homozygous clusters are similar as before Heterozygous cluster is even more tilted Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 26 / 60

TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Overview of the TumorBoost method Idea 1 the SNP effect is reproducible between tumor and normal 2 in the normal the truth is easier to infer because we only expect three true allele B fractions, corresponding to genotypes AA, AB, BB. For each SNP, we estimate the SNP effect in the normal hybridization, and subtract it from the tumor. Features No need to know copy number regions in advance Normalization is performed for each SNP separately Only one tumor/normal pair required H. Bengtsson, P. Neuvial, T.P. Speed TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics (2010) 11:245. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 27 / 60

TumorBoost: improved power to detect CN changes Proposed normalization strategy Method: taking advantage of SNP effects Estimate the SNP effect in the normal sample as ˆδ Nj = β Nj ˆµ Nj, where ˆµ Nj {0, 1/2, 1} is the normal genotype For homozygous SNPs (ˆµ Nj {0, 1}): β Tj = β Tj β Nj + ˆµ Nj For heterozygous SNPs (ˆµ Nj 1/2): β Tj = { 1 2 βtj β Nj 1 1 2 1 β Tj 1 β Nj if β Tj < β Nj otherwise Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 28 / 60

Outline TumorBoost: improved power to detect CN changes Results: improved signal to noise ratio 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 29 / 60

TumorBoost: improved power to detect CN changes Genomic signals before normalization Results: improved signal to noise ratio Normal, gain, copy neutral LOH Normal, loss, copy neutral LOH βn βt C Chromosome 2 (in Mb) Chromosome 10 (in Mb) Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 30 / 60

TumorBoost: improved power to detect CN changes Genomic signals after normalization Results: improved signal to noise ratio Normal, gain, copy neutral LOH Normal, loss, copy neutral LOH βn βt C Chromosome 2 (in Mb) Chromosome 10 (in Mb) Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 31 / 60

TumorBoost: improved power to detect CN changes Allele B fractions before normalization Results: improved signal to noise ratio Chromosome 10 Chromosome 2 Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 32 / 60

TumorBoost: improved power to detect CN changes Allele B fractions after normalization Results: improved signal to noise ratio Chromosome 10 Chromosome 2 Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 33 / 60

TumorBoost: improved power to detect CN changes ASCNs before normalization Results: improved signal to noise ratio Chromosome 10 Chromosome 2 Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 34 / 60

TumorBoost: improved power to detect CN changes ASCNs after normalization Results: improved signal to noise ratio Chromosome 10 Chromosome 2 Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 35 / 60

Outline TumorBoost: improved power to detect CN changes ROC evaluation 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 36 / 60

TumorBoost: improved power to detect CN changes ROC evaluation Detecting changes in allele B fractions allele B fractions: β allele B fractions for heterozygous SNPs mirrored allele B fractions for heterozygous SNPs: ρ = β 1/2 = DH/2 For heterozygous SNPs DH only has one mode so it can be segmented. We use ROC analysis to assess how separated two regions on each side of a known change point in DH are. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 37 / 60

TumorBoost: improved power to detect CN changes ROC evaluation ROC evaluation Available from aroma.cn.eval at: http://aroma-project.org For a given sample: find a clear change point label flanking regions, e.g. NORMAL (1,1) and DELETION (0,1) choose one reference state and one state to call For each value of a threshold τ: Call SNPs below τ a DELETION Count number of true and false DELETIONs. ROC curve is built by adjusting τ. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 38 / 60

TumorBoost: improved power to detect CN changes ROC evaluation ROC evaluation Available from aroma.cn.eval at: http://aroma-project.org For a given sample: find a clear change point label flanking regions, e.g. NORMAL (1,1) and DELETION (0,1) choose one reference state and one state to call For each value of a threshold τ: Call SNPs below τ a DELETION Count number of true and false DELETIONs. ROC curve is built by adjusting τ. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 38 / 60

TumorBoost: improved power to detect CN changes ROC evaluation ROC evaluation Available from aroma.cn.eval at: http://aroma-project.org For a given sample: find a clear change point label flanking regions, e.g. NORMAL (1,1) and DELETION (0,1) choose one reference state and one state to call For each value of a threshold τ: Call SNPs below τ a DELETION Count number of true and false DELETIONs. ROC curve is built by adjusting τ. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 38 / 60

TumorBoost: improved power to detect CN changes ROC evaluation Result: Result: Better detection of allelic imbalances Better detection of allelic imbalances Allelic imbalance NORMAL REGION GAIN 100% TumorBoost Total CNs NORMAL REGION GAIN True-Positive Rate 50% Total CN original 0% False-Positive Rate 50% Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 39 / 60

TumorBoost: improved power to detect CN changes ROC evaluation Complete preprocessing for a single tumor/normal pair Available from aroma.cn and aroma.affymetrix at: http://aroma-project.org Normalization and locus-level summarization using CRMAv2 (Bengtsson et al, 2009) for the normal and the tumor sample separately Naive genotyping of the normal sample: threshold density of β TumorBoost normalization (Bengtsson et al, 2010) Note: genotyping errors can be taken care of by smoothing or using confidence scores. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 40 / 60

TumorBoost: improved power to detect CN changes ROC evaluation Observed power to detect changes (1,1) vs (1,2) (1,1) vs (0,1) Legend: Total copy number Raw allele B fractions Normalized β (naive) Normalized β (Birdseed) (1,2) vs (0,2) (0,1) vs (0,2) TCN is consistent across change points β is not! Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 41 / 60

TumorBoost: improved power to detect CN changes ROC evaluation Expected power to detect changes C varies from one unit in all change points just shown For DH and thus (C 1, C 2 )) it s more complicated: ρ 0.0 0.5 1.0 (0,2) (0,1) (1,3) (1,2) (1,1) (0,0) 0.0 0.5 1.0 Tumor purity Difference in ρ 0.0 0.5 1.0 (1,1) to (0,1) (1,2) to (0,2) (1,1) to (1,2) (0,1) to (0,2) 0.0 0.5 1.0 Tumor purity The expected improvement depends on the type of change point and on normal contamination. Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 42 / 60

TumorBoost: improved power to detect CN changes ROC evaluation What if no matched normal is available? CalMaTe For each SNP: Estimate a calibration function (from observed signals to genotypes) using a set of reference samples Back-transform test samples Before CalMate normalization After CalMate normalization Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 43 / 60

Challenges for detecting and calling of copy number events Outline 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 44 / 60

Challenges for detecting and calling of copy number events Outline Detecting copy number changes from both C and DH 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 45 / 60

Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Changes can be reflected in both dimensions Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 46 / 60

Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH DH has greater detection power than C at a single locus Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 47 / 60

Challenges for detecting and calling of copy number events More informative probes for C than DH Affymetrix GenomeWideSNP 6 Detecting copy number changes from both C and DH All units CN units SNP units Frequency 1,856,069 946,705 909,364 Proportion 100% 51% 49% Unit types All units AA AB BB Frequency 1,856,069 326,500 251,446 331,418 Proportion 100% 18% 14% 18% SNPs by genotype call for sample TCGA-23-1027 Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 48 / 60

Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Similar detection power at a fixed resolution Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 49 / 60

Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Need for a truly joint dimensional segmentation method Most methods segment only one of C and DH Some use two-way segmentation: Olshen et al, [PSCBS] A handful are truly two-dimensional : Chen et al, [pscn] Greenman et al, Biostat., 2010, [PICNIC] Sun et al, NAR, 2009, [genocna] Challenges for a truly joint segmentation method A two-dimensional signal Only heterozygous SNPs can be used to detect CN changes from DH Bias in the estimation of DH DH is not Gaussian Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 50 / 60

Challenges for detecting and calling of copy number events Outline Calling: influence of tumor purity, ploidy, and signal saturation 1 Genotyping microarrays in cancer research DNA copy number changes in cancer cells Genotyping microarray data 2 TumorBoost: improved power to detect CN changes Method: taking advantage of SNP effects Results: improved signal to noise ratio ROC evaluation 3 Challenges for detecting and calling of copy number events Detecting copy number changes from both C and DH Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 51 / 60

Challenges for detecting and calling of copy number events Copy numbers are not calibrated Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 52 / 60

ATION Challenges for detecting and calling of copy number events Non-calibrated signals: signal saturation Calling: influence of tumor purity, ploidy, and signal saturation CN obs = f(cn true ) < CN true C obs = f (C true ) < C obs f is unknown Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 53 / 60

Challenges for detecting and calling of copy number events Calling: influence of tumor purity, ploidy, and signal saturation Non-calibrated signals: ploidy and purity Copy number Pure tumor, ploidy = 2 Pure tumor, ploidy > 2 3 2 1 0 110 120 130 140 150 Position (Mb) Copy number 3 2 1 0 110 120 130 140 150 Position (Mb) Allele fractions 1.0 0.5 0.0 110 120 130 140 150 Position (Mb) Allele fractions 1.0 0.5 0.0 110 120 130 140 150 Position (Mb) Non pure tumor, ploidy=2 Non-pure tumor, ploidy < 2 Copy number 3 2 1 0 110 120 130 140 150 Position (Mb) Copy number 3 2 1 0 110 120 130 140 150 Position (Mb) Allele fractions 1.0 0.5 0.0 110 120 130 140 150 Position (Mb) Allele fractions 1.0 0.5 0.0 110 120 130 140 150 Position (Mb) Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 54 / 60

Challenges for detecting and calling of copy number events Calling: influence of tumor purity, ploidy, and signal saturation Purity, ploidy, and signal saturation Why copy numbers are not calibrated signal saturation non purity: presence of normal cells in the tumor sample ploidy: the total amount of DNA is fixed by the assay Remarks ploidy is not identifiable purity and ploidy are biological properties of the sample signal saturation is an artifact from the assay under the rug: tumor heterogeneity Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 55 / 60

Challenges for detecting and calling of copy number events Calling: influence of tumor purity, ploidy, and signal saturation OverUnder: Attiyeh et al, Genome Research, 2009 Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 56 / 60

Challenges for detecting and calling of copy number events Calling: influence of tumor purity, ploidy, and signal saturation GAP: Popova et al, Genome Biology, 2009 2 1 4 6 8 10 12 14 16 18 20 22 (a) BAF 0 1 (b) LRR 0-1 1 3 50000 0 5 100000 7 150000 9 11 200000 13 15 17 250000 19 21 X 300000 SNP Index B 0.2 0.4 0.6 0.8 1.0 B allele frequency Pierre Neuvial (UC Berkeley) ~70% tumor cells (d) 0.0 Log R ratio -0.2 0.0 0.2 Sub-clones -0.4 0.4 A 0.0 BB AB -0.4 AA Log R ratio -0.2 0.0 0.2 0.4 Log R ratio -0.2 0.0 0.2 ABB AAB -0.4 (c) 0.4 Homozygous regions Germline -- Acquired 0.2 0.4 0.6 0.8 1.0 B allele frequency SNP array data analyses in cancer studies p_baf = 0.3 q_lrr = 0.3 0.0 (e) 0.2 0.4 0.6 0.8 1.0 B allele frequency 2010-09-30 57 / 60

Challenges for detecting and calling of copy number events ASCAT: Van Loo et al, PNAS, 2010 Calling: influence of tumor purity, ploidy, and signal saturation Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 58 / 60

Challenges for detecting and calling of copy number events Calling: influence of tumor purity, ploidy, and signal saturation Comments on existing approaches What about Affymetrix data? Choice between candidate solutions Perform ad hoc correction for saturation Tumor heterogeneity? Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 59 / 60

Challenges for detecting and calling of copy number events Calling: influence of tumor purity, ploidy, and signal saturation Thanks Henrik Bengtsson Adam Olshen Maria Ortiz Angel Rubio Venkat E. Seshan Terry Speed Paul Spellman Nancy R. Zhang Funding: The Cancer Genome Atlas Pierre Neuvial (UC Berkeley) SNP array data analyses in cancer studies 2010-09-30 60 / 60