Supplementary note: Comparison of deletion variants identified in this study and four earlier studies

Here we compare the results of this study to potentially overlapping results from four earlier studies of human structural variation. These studies used three different experimental approaches:

1. Representational oligonucleotide microarray analysis (ROMA), which involves hybridization of a size-selected set of genomic restriction fragments to an oligonucleotide microarray;
2. Comparative genomic hybridization to microarrays of bacterial artificial chromosomes (BAC array CGH);
3. Sequencing the ends of 589,275 fosmids from a single individual and searching for paired end reads that map more than 48 kb apart on the reference sequence (which identified deletions in that individual relative to the reference sequence, since virtually no fosmids have inserts that large).

These are compared with the current work, which searches for particular aberrant patterns of genotypes in SNP genotype data.

Method            Ref.       Requires       Individuals assayed  Potential deletion variants identified
ROMA              1          Microarray     19                   76 CNPs (of which an unknown subset are deletions)
BAC array CGH     2          Microarray     55                   255 LCVs (of which an unknown subset are deletions)
BAC array CGH     3          Microarray     47                   119 CNPs (of which an unknown subset are deletions)
Fosmid end reads  4          Resequencing   1                    101 deletions
SNP genotypes     this work  SNP genotypes  269                  540 deletions

Deletions vs. multi-copy duplications

Copy number variation can result from either a deletion variant (a haplotype containing no copies of the sequence) or a multi-copy duplication (different haplotypes carrying different positive numbers of copies of the sequence). ROMA and BAC array CGH identify sites of copy number variation, that is, gains and losses of copy number in a proband relative to a reference sample. Because absolute copy number in the reference sample is not known, a copy number loss identified by ROMA or BAC array CGH could represent a deletion variant in the proband, or a multi-copy duplication that is present in more copies in the reference sample than in the proband.
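The ambiguity described above can be made concrete with a small sketch. It assumes a simplified model in which an array reports only whether the proband has fewer or more copies than the reference; the function name and the copy numbers are illustrative assumptions, not values from any of the studies.

```python
def relative_call(proband_copies: int, reference_copies: int) -> str:
    """Classify the signal reported for a proband relative to a reference.

    Only the comparison is visible to the assay in this simplified model;
    the absolute copy numbers themselves are not.
    """
    if proband_copies < reference_copies:
        return "loss"
    if proband_copies > reference_copies:
        return "gain"
    return "no change"


# Scenario 1 (hypothetical): a deletion variant in the proband,
# leaving one copy where the reference has two.
deletion_in_proband = relative_call(proband_copies=1, reference_copies=2)

# Scenario 2 (hypothetical): a multi-copy duplication that is present
# in more copies in the reference than in the proband.
duplication_in_reference = relative_call(proband_copies=2, reference_copies=4)

# Both scenarios produce the same relative signal.
print(deletion_in_proband, duplication_in_reference)  # loss loss
```

In the same model, a deletion carried by the reference sample rather than the proband would be reported as a gain.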

Sensitivity as a function of variant size

The sensitivity of all four approaches to detecting variants is strongly related to the size of the variant. This is evident from the technical requirements of all four approaches.

Method            Factor determining size sensitivity
ROMA              Requirement of differential hybridization to at least three consecutive probes (which are on average 32 kb apart)
BAC array CGH     Requirement of detectable differential hybridization to a BAC probe (about 150 kb in size)
Fosmid end reads  The deletion must be larger than natural variation in fosmid insert sizes (+/- 8 kb)
SNP genotypes     SNP density (at least two distinct SNPs must yield the same pattern of aberrant genotypes)

Most CNPs discovered by ROMA are larger than 100 kb (only 7 are smaller than 20 kb).

The sizes of deletion variants identified by Tuzun et al. (2005) from fosmid end pair sequencing are estimated from the apparent discrepancy of fosmid insert sizes (relative to the expected 40 kb). A discrepancy of at least 8 kb is required for discovery, due to natural variation in the sizes of fosmid inserts.
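As a rough illustration of this size estimate, the sketch below subtracts the expected 40 kb insert size from the distance between a fosmid's two mapped end reads and reports a deletion only when the discrepancy reaches 8 kb. The function name and the example coordinates are hypothetical and are not taken from Tuzun et al. (2005); only the two thresholds come from the text.

```python
from typing import Optional

EXPECTED_INSERT_BP = 40_000  # expected fosmid insert size (from the text)
MIN_DISCREPANCY_BP = 8_000   # natural variation in insert size (from the text)


def apparent_deletion_size(left_read_pos: int, right_read_pos: int) -> Optional[int]:
    """Estimate deletion size from where a fosmid's end reads map on the reference.

    Returns the excess of the mapped span over the expected insert size,
    or None when the excess is within natural insert-size variation.
    """
    mapped_span = right_read_pos - left_read_pos
    discrepancy = mapped_span - EXPECTED_INSERT_BP
    if discrepancy >= MIN_DISCREPANCY_BP:
        return discrepancy
    return None


# Hypothetical end-read positions on the reference sequence.
print(apparent_deletion_size(1_000_000, 1_055_000))  # 15000
print(apparent_deletion_size(1_000_000, 1_045_000))  # None
```

Under these assumptions, end reads mapping exactly 48 kb apart give the smallest reportable deletion (8 kb), which matches the 48 kb discovery threshold described earlier.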

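The size distribution of the deletion variants identified in this work is estimated from the distance spanned by the aberrant SNP genotypes. Because the breakpoints of a deletion lie outside these SNPs, this distance is a lower bound on the true size.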
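A minimal sketch of this estimate follows. The function name is hypothetical. The leftmost and rightmost positions are those of the first chr1 deletion listed in the comparison with Tuzun et al. later in this note; the middle position is an invented placeholder standing in for any additional aberrant SNP.

```python
from typing import Sequence


def spanned_distance(aberrant_snp_positions: Sequence[int]) -> int:
    """Distance (bp) between the outermost SNPs showing aberrant genotypes.

    At least two distinct SNPs are required, mirroring the discovery
    requirement stated in this note.
    """
    distinct = set(aberrant_snp_positions)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct aberrant SNPs")
    return max(distinct) - min(distinct)


# Outer positions from the chr1 entry in this note; 34_608_900 is hypothetical.
positions = [34_606_761, 34_608_900, 34_610_715]
print(spanned_distance(positions))  # 3954
```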

How different methods locate variants

To assess whether different approaches have identified the same variant, it is important to understand how each approach identifies the location of a variant. No current approach identifies the exact breakpoints of a variant (though the resequencing of fosmids that contain variants will ultimately accomplish this).

The current work identifies the SNPs that are covered by a deletion variant. These are "inner boundaries": the breakpoints of the deletion should lie outside of these SNPs. The resolution of these boundaries depends on SNP density, which is about one SNP per 3 kb in the current version of HapMap but will be one SNP per 1 kb in future versions.

The fosmid-end-sequencing approach identifies paired end reads that flank a deletion variant. These are "outer boundaries": the breakpoints of the deletion should lie inside of these sequence reads. These boundaries are initially at least 48 kb apart, but the discovery of additional, overlapping fosmids that also cover the deletion variant can refine these boundaries.

ROMA identifies a series of restriction fragments that show differential hybridization between a proband and a reference sample. These are thought to lie inside the variant and therefore to be inner boundaries. The principal limit on precision is that the restriction fragments are separated by an average of 32 kb genomic distance.

BAC array CGH identifies a BAC-sized region (about 150 kb) in which a significant amount of sequence is present in greater or fewer copies in a proband relative to a reference sample. It identifies a neighborhood that contains, overlaps with, or is contained by a large variant.

Mutual discoveries

Mutual discoveries between this work and the fosmid-end-sequencing method

To assess the mutual discoveries between this work and the fosmid-end-sequencing approach, we therefore looked for deletion variants from our set (540 variants) that fell completely inside deletion variants identified from the fosmid approach (102 variants). We found 28 such mutual discoveries (vs. less than one expected by chance). 25 of these 28 variants were identified in more than one individual in our study, suggesting that they are common variants (making them more likely to have been sampled in the single individual from whom the fosmid library was constructed). As the fosmid-end-pair-sequencing approach will ultimately be applied to additional individuals (including some of the same individuals sampled for HapMap), we expect these approaches to converge toward agreement on a set of common deletion variants. The end-pair-sequencing approach also detects insertions relative to the reference sequence, which our approach does not; we found no overlaps between the insertions identified in that work and the deletions identified in the present work.

        This work                       Tuzun et al., 2005
Chr     Leftmost SNP    Rightmost SNP   Left end-read boundary   Right end-read boundary
chr1    34,606,761      34,610,715      34,591,979               34,617,030
chr1    72,137,668      72,176,870      72,104,907               72,193,110
chr1    109,527,309     109,534,259     109,522,724              109,551,608
chr1    149,771,758     149,800,260     149,766,538              149,825,054
chr1    149,977,953     149,986,389     149,963,017              149,990,180
chr2    89,039,268      89,049,267      89,008,235               89,065,342
chr2    147,075,728     147,086,685     147,071,619              147,096,571
chr3    163,833,596     163,943,569     163,829,860              163,953,604
chr3    194,196,286     194,205,086     194,187,307              194,212,927
chr3    194,457,389     194,459,618     194,444,572              194,483,617
chr4    9,969,524       9,980,122       9,949,506                9,989,442
chr6    103,784,319     103,807,031     103,757,028              103,813,103
chr7    97,008,440      97,012,729      96,997,304               97,035,029
chr7    109,002,325     109,011,761     108,987,475              109,022,463
chr7    115,492,184     115,494,416     115,472,521              115,507,536
chr7    141,456,537     141,472,512     141,455,775              141,511,462
chr7    141,921,685     141,931,471     141,902,964              141,956,537
chr8    6,810,705       6,811,452       6,802,213                6,847,036
chr8    51,082,185      51,083,978      51,077,741               51,094,841
chr11   4,940,386       4,941,077       4,923,545                4,949,682
chr11   55,147,167      55,149,063      55,134,385               55,245,719
chr14   68,010,231      68,011,603      67,992,406               68,020,456
chr14   104,215,047     104,275,522     104,202,520              104,369,924
chr15   18,840,317      18,844,987      18,831,471               18,864,056
chr15   32,437,866      32,525,037      32,401,286               32,556,820
chr20   1,564,704       1,567,374       1,546,392                1,594,647
chr20   14,789,361      14,818,472      14,747,001               14,944,555
chr22   37,615,466      37,624,865      37,593,346               37,639,623

Mutual discoveries between this work and ROMA

We looked for all places in which we found a deletion that overlapped with a ROMA CNP and covered at least 20% of the region assigned to the CNP. There were four mutual discoveries: one on chr6, one on chr15, one on chr14 (the immunoglobulin heavy chain locus), and one on chr22 (the immunoglobulin lambda locus). All four mutual discoveries involved common variants (that had been observed multiple times in one or both of the two studies).

        This work                       Sebat et al., 2004
Chr     Leftmost SNP    Rightmost SNP   Leftmost probe   Rightmost probe
chr6    78,995,494      79,027,965      78,997,800       79,090,884
chr15   32,437,866      32,525,037      32,410,643       32,581,135
chr14   104,485,754     104,965,621     104,230,277      104,993,730
chr22   21,026,944      21,558,650      21,127,641       21,512,863

Mutual discoveries between this work and BAC array CGH

We looked for all places in which we found a deletion variant that covered at least 20% of a BAC probe that had identified an LCV/CNP in the earlier studies. There were three mutual discoveries with Iafrate et al. (2004): one on chr4, one on chr14 (the immunoglobulin heavy chain locus), and one on chrX. All three mutual discoveries involved common variants (that had been observed multiple times in one or both of the two studies).

        This work                       Iafrate et al., 2004
Chr     Leftmost SNP    Rightmost SNP   BAC left end     BAC right end
chr4    34,677,422      34,724,191      34,674,501       34,823,905
chr14   104,485,754     104,965,621     104,767,866      105,076,137
chrX    91,086,005      91,109,766      90,900,000       91,100,000
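The two matching criteria used in this section, full containment within the fosmid end-read boundaries and coverage of at least 20% of a CNP or BAC region, can be sketched as below. This is an illustrative reading of the criteria rather than the code used in the study: the function names are hypothetical, intervals are treated as simple (start, end) coordinate pairs on the same chromosome, and lengths are plain coordinate differences.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome, start <= end


def is_contained(inner: Interval, outer: Interval) -> bool:
    """True if the SNP-defined inner boundaries fall entirely within the outer boundaries."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def covered_fraction(deletion: Interval, region: Interval) -> float:
    """Fraction of a CNP or BAC region that is covered by the deletion."""
    overlap = min(deletion[1], region[1]) - max(deletion[0], region[0])
    return max(overlap, 0) / (region[1] - region[0])


# First chr1 row of the fosmid comparison: inner SNPs versus end-read boundaries.
print(is_contained((34_606_761, 34_610_715), (34_591_979, 34_617_030)))  # True

# chr6 row of the ROMA comparison: deletion versus the region assigned to the CNP.
fraction = covered_fraction((78_995_494, 79_027_965), (78_997_800, 79_090_884))
print(fraction >= 0.20)  # True
```

Containment is the natural test for the fosmid comparison because inner boundaries should sit inside outer boundaries, whereas a coverage fraction suits ROMA and BAC array CGH, whose reported regions only approximate the location of the variant.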

There were six mutual discoveries with Sharp et al. (2005), including the immunoglobulin lambda and heavy chain loci. Where two BACs overlapped the same deletion, both are listed.

        This work                       Sharp et al., 2005
Chr     Leftmost marker Rightmost marker BAC left end     BAC right end
chr4    70,447,409      70,542,965       70,432,219       70,591,332
chrX    46,929,298      47,028,433       46,881,874       47,078,955
                                         46,939,097       47,119,352
chr14   104,215,047     104,275,522      104,194,660      104,377,772
chr14   104,485,754     104,965,621      104,413,088      104,573,219
                                         104,580,604      104,731,664
chr15   32,437,866      32,525,037       32,447,228       32,598,686
chr22   21,026,944      21,558,650       21,389,432       21,565,251

Summary of mutual discoveries

We shared 28 mutual discoveries with the fosmid-end-sequencing method, 4 with ROMA, and 3 and 6 with the two studies that used BAC array CGH. These 41 matches correspond to 35 distinct shared discoveries, since four loci were discovered in two earlier studies and one locus was discovered in three earlier studies (41 - 4 - 2 = 35). The larger number of mutual discoveries with the fosmid-end-sequencing method almost certainly reflects the sensitivity of that method for detecting variants in an intermediate size range (8+ kb) that overlaps significantly with the size range of the variants identified here. Fewer than 10% (35/540, about 6.5%) of the deletion variants identified in the present work are shared with earlier studies.

Are most large CNPs and LCVs duplications or deletions?

ROMA and BAC array CGH identify sites of copy number variation, that is, gains and losses of copy number in a proband relative to a reference sample. Because absolute copy number in the reference sample is not known, a copy number loss identified by ROMA or BAC array CGH could represent a deletion variant in the proband, or a multi-copy duplication that is present in more copies in the reference sample than in the proband. For example, of the 5 overlaps between our discoveries and the Sharp et al. study, two were reported as copy number gains in the earlier study, perhaps reflecting the presence of the deletion variant in the reference sample.

Because most of these CNPs and LCVs are quite large (72% of the ROMA CNPs are larger than 100 kb, and the loci underlying the array CGH discoveries are assumed to be sufficiently large to result in a reproducible differential hybridization to a 150 kb BAC probe), more than 85% of them cover an ample number of HapMap SNPs for detecting common deletion variants if they exist at these sites. Yet deletion variants discovered in the present work appeared to explain only 10 of the 300 variants previously discovered by ROMA and BAC array CGH. We found no SNP support for potential deletion variants underneath 95% of the large (100+ kb) copy number polymorphisms identified by ROMA, despite the fact that 90% of these copy number polymorphisms have many SNPs (at least 20) available for detecting such deletion variants.

We suggest that these CNPs are therefore likely to represent multi-copy duplications. This possibility was suggested in the earlier studies, and is consistent with the observation that selection may be more tolerant of polysomy than of deletion at scales of hundreds of kilobases (Brewer et al., Am. J. Hum. Genet. 64, 1702-1708, 1999; Lindsley et al., Genetics 71, 157-184, 1972).

References

1. Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science 305, 525-8 (2004).
2. Iafrate, A. J. et al. Detection of large-scale variation in the human genome. Nat Genet 36, 949-51 (2004).
3. Sharp, A. J. et al. Segmental duplications and copy-number variation in the human genome. Am J Hum Genet 77, 78-88 (2005).
4. Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat Genet 37, 727-32 (2005).