Understanding DNA Copy Number Data


Understanding DNA Copy Number Data Adam B. Olshen Department of Epidemiology and Biostatistics Helen Diller Family Comprehensive Cancer Center University of California, San Francisco http://cc.ucsf.edu/people/olshena_adam.php May 20, 2010

Background
- The DNA sequence copy number at any locus in a genome is the number of copies of genomic DNA at that locus.
- The normal copy number is two for human autosomes.
- Copy number alterations are gains or losses of DNA.
- They modify the function and/or expression of genes.
- They are common in cancer: copy number is
  - increased at sites of oncogenes;
  - decreased at sites of tumor suppressor genes.

Array CGH: Hybridization and Analysis
[Figure: forward and reverse hybridization. Reference DNA and tumor DNA are labeled with Cy3 and Cy5 (labels swapped between the forward and reverse designs), co-hybridized to genomic markers (BAC, cDNA, or oligo), and scanned; markers read as normal, gain, or loss.]
Array analysis: the resulting data are normalized log test-over-reference intensities for genomic markers.

DNA Copy Number Arrays-Array CGH

Platform     # Colors   Type of Markers     # of Markers    Commercial
BAC          2          BACs (100-200kb)    1000-3000       Yes          31,000
cDNA         2          cDNAs (1kb)         1000-20,000     No
ROMA         2          long oligo          85,000          No           400,000
Agilent      2          long oligo          244,000         Yes          1,000,000
Nimblegen    2          long oligo          400,000         Yes
Affymetrix   1          short oligo         500,000         Yes          1,800,000
MIPs         1          short oligo         50,000          Yes
Illumina                beads               1,200,000       Yes

Review: Pinkel & Albertson (2005, Nature Genetics)

Example
[Figure: breast ROMA array with 9820 probes. y-axis: log2(Test/Reference), from -2 to 2, with reference levels marked for 1 copy through 8 copies; x-axis: genomic position.]

Additional Complications 1. Tumor samples are a mixture of tumor and normal cells, and the amount of normal contamination is often unknown. 2. Tumor cells are not homogeneous. Gains and losses may occur in differing proportions of cells. 3. There is (usually) no gold standard of similar resolution.
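The effect of normal contamination on the measured log ratio can be illustrated with a small calculation. This is a minimal sketch under idealized assumptions that are not stated on the slide: signal is exactly proportional to average copy number, the reference is diploid, and the tumor is a single homogeneous population. The function name `expected_log2_ratio` is hypothetical.

```python
import math


def expected_log2_ratio(tumor_copies, tumor_fraction, normal_copies=2):
    """Idealized log2(test/reference) at a locus in a tumor/normal mixture.

    Assumes signal is proportional to the average copy number in the sample
    and that the reference sample carries `normal_copies` copies.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie between 0 and 1")
    # Average copy number over tumor cells and contaminating normal cells.
    mixed = tumor_fraction * tumor_copies + (1.0 - tumor_fraction) * normal_copies
    if mixed <= 0.0:
        return float("-inf")  # homozygous deletion in a pure tumor sample
    return math.log2(mixed / normal_copies)


# A single-copy loss looks progressively weaker as contamination increases.
for fraction in (1.0, 0.7, 0.4):
    print(fraction, round(expected_log2_ratio(1, fraction), 3))
```

Under these assumptions a one-copy loss in a pure tumor sits at log2(1/2), and the same loss moves toward zero as the tumor fraction falls, which is one reason fixed thresholds on the log ratio are hard to interpret when the contamination is unknown.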

Analysis Goals 1. Reconstruct the copy number state for the entire genome 2. Identify the regions of gain and loss 3. Divide the genome into regions of equal copy number

Analysis Approaches Smoothing (Hsu et al.; Tibshirani and Wang) Hidden Markov models (Fridlyand et al.; Guha et al.) Segmentation (Picard et al.; Venkatraman and Olshen) Methods compared by Lai et al. (2005, Bioinformatics)

Segmentation Approach
[Figure: log2(T/R), from -1.0 to 1.0, against position in chromosome (0 to 120), with segments of gains and losses marked.]

Circular Binary Segmentation
- View the data as if on a circle and segment into two arcs; hence the name circular binary segmentation (CBS).
- This results in two or three segments of the original data.
- Test statistic: $T = \max_{0 < i < j \le m} |T_{ij}|$, where
  $$T_{ij} = \frac{\bar{Y}_{ij} - \bar{Z}_{ij}}{s_{ij}\sqrt{(j-i)^{-1} + (m-j+i)^{-1}}},$$
  $$\bar{Y}_{ij} = (X_{i+1} + \dots + X_j)/(j-i),$$
  $$\bar{Z}_{ij} = (X_1 + \dots + X_i + X_{j+1} + \dots + X_m)/(m-j+i),$$
  and $s_{ij}^2$ is the corresponding mean square error.
- Split the data if $P(T > T_{\mathrm{obs}}) \le \alpha$ (probability under the null of no change-point), and recurse until there are no further splits.
- Estimate the probability by permutation.
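A single CBS split can be sketched in plain Python as follows. This is an illustrative sketch, not the DNAcopy implementation: it searches all arcs by brute force, takes the mean square error to be the pooled within-arc sum of squares on m - 2 degrees of freedom (an assumption, since the slide does not define it), and uses a basic permutation p-value without the faster hybrid method. The names `cbs_statistic` and `permutation_p_value` are hypothetical.

```python
import math
import random


def cbs_statistic(x):
    """Return (T, i, j) maximizing |T_ij| over arcs X_{i+1..j}, 0 < i < j <= m."""
    m = len(x)
    if m < 3:
        raise ValueError("need at least 3 observations")
    # Prefix sums give every arc sum and sum of squares in O(1).
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)
    best_t, best_i, best_j = 0.0, None, None
    for i in range(1, m):
        for j in range(i + 1, m + 1):
            n_in, n_out = j - i, m - j + i
            sum_in = s1[j] - s1[i]
            sq_in = s2[j] - s2[i]
            mean_in = sum_in / n_in
            mean_out = (s1[m] - sum_in) / n_out
            # Assumed MSE: pooled within-arc sum of squares on m - 2 df.
            ss = (sq_in - n_in * mean_in ** 2) + (s2[m] - sq_in - n_out * mean_out ** 2)
            mse = ss / (m - 2)
            if mse <= 0.0:
                continue  # degenerate noise-free split; skipped in this sketch
            t = abs(mean_in - mean_out) / math.sqrt(mse * (1.0 / n_in + 1.0 / n_out))
            if t > best_t:
                best_t, best_i, best_j = t, i, j
    return best_t, best_i, best_j


def permutation_p_value(x, n_perm=99, seed=0):
    """Estimate P(T >= T_obs) under the no-change-point null by shuffling x."""
    rng = random.Random(seed)
    t_obs = cbs_statistic(x)[0]
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cbs_statistic(shuffled)[0] >= t_obs:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

In use, a segment would be split at the returned (i, j) when the p-value is at most the chosen level, and the same procedure would then be applied to each resulting piece. The brute-force search is quadratic in the number of markers per call, so it is only suitable for short illustrative series.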

Permutations
[Figure: four panels of log2(T/R), from -1.0 to 1.0, against position in chromosome (0 to 120). Real: T=30.9; Permuted 1: T=3.1; Permuted 2: T=3.1; Permuted 3: T=2.9.]

Circular?

We Made it Faster
9820 probes, max probe count 824, time: 347s vs 13s.
[Figure: log2(T/R), from -2 to 2, against genomic position (0 to 3000). Cyan: permutation; black: hybrid with early stopping.]

Gains and Losses via Plateau Plot

Gains and Losses via Plateau Plot

Looking Across Samples

Clustering Samples

Defining Regions
[Figure: patients (0 to 60) against markers (0 to 70).]

Defining Regions cont.
[Figure: patients (0 to 60) against markers (0 to 70).]

Defining Regions cont.
Identifying regions of high-frequency gain or loss: minimal common regions (MCRs)
- Genomic Identification of Significant Targets in Cancer (GISTIC): integration of magnitude as well as frequencies (Beroukhim et al., PNAS, 2007)
- Significance Testing for Aberrant Copy number (STAC) (Diskin et al., Genome Res., 2006)
- Multiple Sample Analysis (MSA) (Guttman et al., PLoS Genetics, 2007)
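The basic idea of a high-frequency region can be sketched with simple thresholding across samples. This is only an illustration of frequency counting, not GISTIC, STAC, or MSA, none of which are reproduced here. The input coding (-1 for loss, 0 for normal, 1 for gain), the threshold, and the function name `frequent_regions` are assumptions made for the example.

```python
def frequent_regions(calls, state, min_fraction):
    """Find runs of markers where at least min_fraction of samples show `state`.

    calls: list of per-sample lists of marker calls (-1 loss, 0 normal, 1 gain).
    Returns (first_marker, last_marker) index pairs, inclusive.
    """
    n_samples = len(calls)
    n_markers = len(calls[0])
    # Fraction of samples carrying the requested state at each marker.
    freq = [sum(row[k] == state for row in calls) / n_samples
            for k in range(n_markers)]
    regions, start = [], None
    for k, f in enumerate(freq):
        if f >= min_fraction:
            if start is None:
                start = k  # open a new region
        elif start is not None:
            regions.append((start, k - 1))  # close the current region
            start = None
    if start is not None:
        regions.append((start, n_markers - 1))
    return regions


# Hypothetical calls for three patients at five markers.
calls = [
    [0, 1, 1, 0, -1],
    [0, 1, 1, 1, -1],
    [0, 0, 1, 1, 0],
]
print(frequent_regions(calls, state=1, min_fraction=0.6))
print(frequent_regions(calls, state=-1, min_fraction=0.6))
```

Frequency alone ignores the magnitude of each alteration and gives no significance level, which is the gap the cited methods address.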

Advanced Topics 1. SNP Arrays, Genotyping and Allele-Specific Copy Number 2. Copy Number Variation 3. Clonality

SNP Arrays and Genotyping
SNP arrays can be used for traditional copy number analysis. They can also be used for genotyping:
- genome-wide association studies
- integration of copy number and loss of heterozygosity (LOH)

Two Regions of Normal Copy Number?
[Figure: two panels of copy number (0 to 4) against position (0 to 1000).]

Two Regions of Normal Copy Number?
[Figure: the same two panels of copy number (0 to 4) against position (0 to 1000), each with a panel below marking homozygotes as TRUE or FALSE by position.]

Allele-Specific Copy Number
- Traditional methods measure the sum of the copy numbers from the two parental chromosomes. This is total copy number.
- SNP arrays can be used for allele-specific, or parent-specific, copy number (PSCN).
- A copy number of 2 for a SNP could mean 1+1 (normal) or 0+2 (both altered).

Raw Data
[Figure: three panels (Total Copy Number, A Copy Number, B Copy Number), each showing sqrt copy number (0 to 4) against genomic position (0 to 3000).]

PSCBS Segmentation

An Example
Sample data for PSCN of 1 (paternal chrom.) and 2 (maternal chrom.)

SNP        1  2  3  4  5  6  7  8
Maternal   A  A  A  B  B  B  A  A
Paternal   A  B  A  B  A  A  A  A

SNP   Genotype   A Copy Number   B Copy Number   Total Copy Number
1     AA         3.3             0               3.3
2     AB         1.9             1.1             3.0
3     AA         2.9             0               2.9
4     BB         0               3.0             3.0
5     AB         0.8             2.1             2.9
6     AB         1.0             2.2             3.2
7     AA         2.9             0               2.9
8     AA         3.0             0               3.0
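The noise-free values behind this example can be worked out directly from the parental alleles. The sketch below assumes, as in the example, two copies of the maternal chromosome and one copy of the paternal chromosome; the function name `expected_allele_copies` is hypothetical.

```python
def expected_allele_copies(maternal, paternal, maternal_cn, paternal_cn):
    """Noise-free (genotype, A copies, B copies, total) for each SNP.

    maternal, paternal: strings of 'A'/'B' alleles, one character per SNP.
    maternal_cn, paternal_cn: copy number of each parental chromosome.
    """
    rows = []
    for mat, pat in zip(maternal, paternal):
        # Each parental chromosome contributes all of its copies to one allele.
        a = maternal_cn * (mat == "A") + paternal_cn * (pat == "A")
        b = maternal_cn * (mat == "B") + paternal_cn * (pat == "B")
        genotype = "".join(sorted(mat + pat))
        rows.append((genotype, a, b, a + b))
    return rows


# Alleles for SNPs 1 to 8, taken from the example.
rows = expected_allele_copies("AAABBBAA", "ABABAAAA", maternal_cn=2, paternal_cn=1)
for snp, row in enumerate(rows, start=1):
    print(snp, *row)
```

The measured A and B copy numbers in the example scatter around these expected values: homozygous SNPs put all three copies on one allele, while heterozygous SNPs split them 2:1 or 1:2 depending on which parent carries which allele.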

PSCBS 1. Genotyping 2. Two Rounds of Segmentation 3. Calling Copy Number

Finding Heterozygotes
[Figure: distribution of the minimum of the A allele and B allele signals (0.0 to 3.0), with the heterozygote and homozygote groups marked.]
Minimize p/l, where p is the number of points in the window and l is the length of the window.

PSCBS 1. Genotyping 2. Two Rounds of Segmentation 3. Calling Copy Number

First Round of Segmentation If the total copy number has not changed, generally neither PSCN has changed. Start by running CBS on total copy number.

Second Round of Segmentation

Second Round of Segmentation
- The B-allele frequency (BAF) is B/(A+B) = B/Total CN.
- If PSCN changes, so should the BAF of the heterozygotes (based on paired normal).
- We use BAF from TumorBoost (Bengtsson et al., in press), which adjusts the tumor BAF based on the paired normal BAF.
- Segment on the decrease in heterozygosity $\rho = 2\,|\mathrm{BAF} - 0.5|$.
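The quantities on this slide can be computed per SNP as in the sketch below, which takes the decrease in heterozygosity to be twice the absolute distance of the BAF from 0.5. It uses raw A and B copy numbers and does not apply the TumorBoost adjustment; the heterozygous SNP values are the ones from the earlier PSCN example, and the function names are hypothetical.

```python
def b_allele_frequency(a_copies, b_copies):
    """BAF = B / (A + B), i.e. B over total copy number."""
    total = a_copies + b_copies
    if total <= 0:
        raise ValueError("total copy number must be positive")
    return b_copies / total


def decrease_in_heterozygosity(baf):
    """rho = 2 * |BAF - 0.5|: 0 for balanced alleles, 1 for complete LOH."""
    return 2.0 * abs(baf - 0.5)


# Heterozygous SNPs 2, 5 and 6 from the PSCN example: (A copies, B copies).
heterozygotes = {2: (1.9, 1.1), 5: (0.8, 2.1), 6: (1.0, 2.2)}
for snp, (a, b) in heterozygotes.items():
    baf = b_allele_frequency(a, b)
    print(snp, round(baf, 3), round(decrease_in_heterozygosity(baf), 3))
```

Folding the BAF around 0.5 is what makes segmentation possible: SNPs with the extra copy on the A allele and SNPs with it on the B allele fall on opposite sides of 0.5 but share the same value of rho.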

Second Round of Segmentation

PSCBS 1. Genotyping 2. Two Rounds of Segmentation 3. Calling Copy Number

Copy Number within Segments
Four distinct clusters if unequal parent-specific copy number, three if equal copy number, two if LOH.
[Figure: three scatter plots (Chromosome 8p, Chromosome 12q, Chromosome 1p) of B allele copy number (0 to 6) against A allele copy number (0 to 6).]

Densities of BAFs

Parent-Specific CBS Algorithm
1. Assign genotypes based on minimum allele.
2. Run CBS on total copy number to get an estimate of the sum of the parent-specific copy numbers.
3. Run CBS within segments on $2\,|\mathrm{BAF} - 0.5|$ of heterozygotes.
4. Test for regions of LOH.
5. Test whether the parent-specific copy numbers are equal, which is the case if the mode of the BAF = 0.5.
6. If not LOH and unequal, estimate the difference of the parent-specific copy numbers within every CBS segment using only the heterozygotes.
7. From 2) and 6) estimate the parent-specific copy numbers.
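The final step can be illustrated with a short calculation. For a heterozygous SNP in a segment with parent-specific copy numbers c1 <= c2, the BAF is c1/(c1+c2) or c2/(c1+c2), so 2|BAF - 0.5| equals (c2 - c1)/(c1 + c2); the difference is therefore rho times the total. The sketch below applies that identity to one segment. It assumes segment-level estimates of total copy number and rho are already available, and the function name and the input values (rough summaries of the earlier eight-SNP example) are hypothetical.

```python
def parent_specific_copy_numbers(total_cn, rho):
    """Return (minor, major) parent-specific copy numbers for one segment.

    total_cn: segment estimate of the sum of the parent-specific copy numbers.
    rho: segment estimate of 2 * |BAF - 0.5| among heterozygous SNPs.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie between 0 and 1")
    difference = rho * total_cn  # c2 - c1
    minor = (total_cn - difference) / 2.0
    major = (total_cn + difference) / 2.0
    return minor, major


# Hypothetical segment summaries: mean total copy number and a typical rho
# among the heterozygous SNPs of the eight-SNP example.
minor, major = parent_specific_copy_numbers(total_cn=3.025, rho=0.375)
print(round(minor, 2), round(major, 2))
```

With rho equal to 0 the two parental copy numbers are equal, and with rho equal to 1 the minor copy number is 0, which corresponds to LOH.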

Reconstruction

Advanced Topics 1. SNP Arrays, Genotyping and Allele-Specific Copy Number 2. Copy Number Variation 3. Clonality

Copy Number Variation
- Copy number variations (CNVs) are gains or losses in the germ line longer than 1 kb (Redon et al., 2006).
- They have been associated with familial cancer (Lucito et al., 2007) and other complex diseases (Sebat et al., 2007).
- When analyzing cancer samples it is important to distinguish between variations and cancer aberrations.
- Large regions of gain or loss are aberrations; small regions could be either.

The Cancer Genome Atlas (TCGA)
- TCGA is an NIH-funded project whose goal is a better understanding of cancer through large-scale sequencing.
- To decide which genes to sequence, Cancer Genome Characterization Centers (CGCCs) were set up.
- Each CGCC applies a different array technology to the same hundreds of glioblastoma, lung, and ovarian samples.
- TCGA samples are supposed to have matching normals.

ACGH Matched Sample
[Figure: two panels, Tumor and Normal, showing log2(Test/Reference), from -2.0 to 2.0, with reference levels marked for 1 copy through 8 copies.]

Current Methods for Handling CNVs
Compile data from the Toronto database (http://projects.tcag.ca/variation/). Then:
- Screen out CNV regions from analysis
- Screen out regions within samples that may be due to CNVs
- Screen out regions defined by multiple samples that may be due to CNVs
- Ignore the problem completely

Predicting Copy Number Variation Segment both the tumor and matching normal sample. An observation is a small region of gain or loss in the tumor. The class label is whether the region is a CNV in the normal sample. The predictors are whether the region was found in the literature, the length of the region, etc. We used 43 matched pairs for training and 36 as a test set. Once the classification model is fit, it can be used to predict in cases where there is no matching sample.

Univariate Results
Significant predictors
- Segment mean
- Gains and losses
- Other patients
- Literature (count)
- Literature (subjects)
- Length
- Segmental duplication region
Not significant predictors
- Near centromere
- Near telomere

Multivariate Results
[Figure: classification tree with splits on in.lit (at 2.606 and at 6.073), absmeansd (at 3.359), and segdup (at 0.5); leaves are labeled 0 or 1.]

Multivariate Results
- Best in literature models: 80% accurate
- CART models: 82% accurate
- Random Forests models: 84% accurate
- Matched normals: 94% accurate (no CNVs)
- Normals: 70% accurate (all CNVs)

Advanced Topics 1. SNP Arrays, Genotyping and Allele-Specific Copy Number 2. Copy Number Variation 3. Clonality

Second Cancers
When a second cancer appears, it could be a
1. Second primary - independent origin
2. Metastasis - clonal origin
- Distinguishing between the two has clinical importance: a metastasis is more serious.
- Pathological decision criteria include histological type, stage, and anatomic location.
- Molecular markers may improve decision making.
- Begg and others have used LOH data for this purpose.
- Can array DNA copy number data be used?

LOH and Clonality

         Clonal Case   Ambiguous Case   Independent Case
Locus    L     R       L     R          L     R
1p       -     -       -     -          -     -
1q       -     -       -     -          -     +
3p       +     +       -     +          +     -
5q       -     -       -     +          -     -
6q       +     +       +     +          +     +
8p       -     -       +     +          NA    NA
11p      NA    NA      NA    NA         NA    NA
11q      -     -       -     -          -     -
13q      -     -       +     +          +     -
16q      +     +       +     +          -     -
17p      -     -       -     -          -     +
17q      +     +       +     +          +     +
18q      -     -       +     -          -     +
22q      -     -       +     +          NA    NA

DNA Copy Number Example
[Figure: two panels of log ratios, from -2 to 2, plotted by chromosome arm in genomic order.]

Lung Data 20 patients with paired non-small cell lung cancers (Girard et al., 2009) Samples hybridized to Agilent 244K array. Clinical and mutation data included. Clinical decision based on second primaries or metastases. The former may be spared adjuvant chemotherapy.

A Clonal Example (LR2=7.9E+23)

An Indep. Example (LR2=7.3E-06)

An Equivocal Example (LR2=0.3)

Lung Summary Copy number classification contradicted clinical classification in 4 of 20 cases. Additional 4 cases called equivocal by copy number. Three clinical indep. called clonal and supported by matching somatic mutations; one clinical clonal called indep. and no mutation data. Copy number data may be useful in clinical decision making.

Software
- CBS can be found in an R library called DNAcopy that is part of Bioconductor (www.bioconductor.org).
- PSCBS can be found in the library PSCBS on R-Forge (r-forge.r-project.org).
- CNV and clonality methods can be found on Irina Ostrovnaya's web page.

References
1. Begg, C. et al. (2006). Statistical Tests for Clonality. Biometrics.
2. Beroukhim, R. et al. (2007). Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma. PNAS 104 20007-20012.
3. Diskin, S. et al. (2006). STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Res. 16 1149-1158.
4. Fridlyand, J. et al. (2004). Understanding Array CGH data. JMVA 90 132-153.
5. Girard, N. et al. (2009). Genomic and mutational profiling to assess clonal relationships between multiple non-small cell lung cancers. Clin Cancer Res. 15 5184-5190.
6. Guha, S. et al. (2006). Bayesian Hidden Markov Modeling of Array CGH Data. Harvard University Biostatistics Working Paper Series. Working paper 24.
7. Guttman, M. et al. (2007). Assessing the significance of conserved genomic aberrations using high resolution genomic microarrays. PLoS Genetics 3 e143.
8. Hardenbol, P. et al. (2003). Multiplexed genotyping with sequence-tagged molecular inversion probes. Nat. Biotechnol. 21 673-678.
9. Hsu, L. et al. (2005). Denoising array-based comparative genomic hybridization data using wavelets. Biostatistics 6 211-226.
10. Lai, W.R. et al. (2005). Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21 3763-3770.
11. Lucito, R. et al. (2007). Copy-number variants in patients with a strong family history of pancreatic cancer. Cancer Biol Ther. Epub ahead of print.

References cont.
12. Olshen, A. et al. (2010). Extension of circular binary segmentation to parent-specific copy number. Submitted.
13. Olshen, A., Venkatraman, E., Lucito, R. and Wigler, M. (2004). Circular Binary Segmentation for the analysis of array-based DNA copy number data. Biostatistics 5 557-572.
14. Ostrovnaya, I. et al. (2010). A metastasis or a second independent cancer? Evaluating the clonal origin of tumors using array copy number data. Stat Med. Epub ahead of print.
15. Ostrovnaya, I. et al. (2010). A classification model for distinguishing copy number variants from cancer-related alterations. BMC Bioinformatics. In press.
16. Picard, F. (2005). A statistical approach for array CGH data analysis. BMC Bioinformatics 6 27.
17. Pinkel, D. and Albertson, D.G. (2005). Array comparative genomic hybridization and its applications in cancer. Nat Genet. 37 S11-S17.
18. Redon, R. et al. (2006). Global variation in copy number in the human genome. Nature 444 444-454.
19. Sebat, J. et al. (2007). Strong association of de novo copy number mutations with autism. Science 316 445-449.
20. Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hotspot detection for CGH data using the fused lasso. Biostatistics 9 18-29.
21. Venkatraman, E.S. and Olshen, A.B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23 657-663.