CNV Detection and Interpretation in Genomic Data

CNV Detection and Interpretation in Genomic Data Benjamin W. Darbro, M.D., Ph.D. Assistant Professor of Pediatrics Director of the Shivanand R. Patil Cytogenetics and Molecular Laboratory

Overview What are CNVs and why are they important? How can you find CNVs in your genomic datasets? How can you interpret the significance of CNVs discovered in your genomic datasets?

DNA Variations Single nucleotide variations (SNVs) and polymorphisms (SNPs) Insertions/Deletions (Indels) Structural Variations Short and Variable Repeats Copy-Number Variations (CNVs) Duplications (insertions) Deletions Levy et al. Nature. 2008. Vol. 456: 49.

Helpful Clarifications Indel: Gain or loss of a segment of DNA, typically of a small scale (<1kb) CNV: Segment of DNA with different copy numbers across individuals (>1kb classically but now >500bp) CNP: Copy number polymorphism is a CNV shared by >1% of a population CNA: Copy number aberration is what CNVs in cancer studies are often called, including SCNA (somatic copy number aberration) Segmental Duplication (SegDup): Segment of DNA >1kb that occurs in two or more copies in a haploid genome (copies share >90% sequence identity)

Segmental Duplications (SDs) Low copy repeat (LCR) regions (2-50 copies) Approximately 5-6% of the human genome Common in pericentromeric/subtelomeric regions (majority are interstitial) Majority are intrachromosomal (many interchromosomal) Can mediate the creation of additional CNVs

How CNVs Are Formed Nonallelic homologous recombination (NAHR) Unequal crossing over (LCR) Reciprocal deletion and duplication Deletion only Translocations and inversions Nonhomologous end joining (NHEJ) Double-stranded DNA break and subsequent repair pathways Malhotra et al. Cell. 2012. Vol. 148: 1223. Diverse repair products possible

How CNVs Are Formed Fork stalling and template switching (FoSTeS) Nick causes fork collapse 3' overhang invades leading strand guided by microhomology LINE1 Retrotransposition (not shown): Only active class of retrotransposons in humans and responsible for ~90% of small insertions in the human genome Malhotra et al. Cell. 2012. Vol. 148: 1223.

How Prevalent are CNVs? HapMap project suggests that ~12% of the human genome is copy number variable CNVs cover ~360Mb of the human genome (far more nucleotide content per genome than SNPs) Disease causing CNVs are typically rare, highly penetrant lesions (often de novo) The Copy Number Variation Project (http://www.sanger.ac.uk/humgen/cnv/gfx/chrplots_redyelgreenlabel.jpg)

Well Known Disease CNVs Deletion of 22q in DiGeorge/Velocardiofacial Syndrome Trisomy and monosomy (numerous examples) HER-2 amplification in breast cancer N-myc amplification in neuroblastoma Deletion of 13q14.3 in favorable prognosis CLL

CNVs and Disease First Line Clinical Diagnostic Test Non-syndromic intellectual disability Multiple congenital anomalies Autism spectrum disorders Other Conditions in which CNVs are Implicated Congenital cardiac disease Epilepsy Schizophrenia Spina bifida, BOR, CLP, obesity, and a variety of different cancers Fakhro et al. PNAS. 2011. Vol. 108: 2915.

What Do They Do? Alter chromosomal structure and gene expression through many different mechanisms Changing gene dosage Influence gene expression through positional effects (loss or gain of regulatory sequence) Create gene fusion events Predispose to deleterious genetic changes (structural variations)

Haploinsufficiency and Disease Virtually any disease caused by haploinsufficiency of a gene could theoretically be caused by a deletion CNV (duplication and over-expression is less clear cut)

Overview What are CNVs and why are they important? How can you find CNVs in your genomic datasets? How can you interpret the significance of CNVs discovered in your genomic datasets?

How do (did?) we currently detect these aberrations?

Karyotype and FISH Karyotype: Full genome but low resolution FISH: Improved resolution but locus specific Photos courtesy of the Shivanand R. Patil Cytogenetics and Molecular Laboratory

CNV Detection in Genomic Data Chromosomal Microarrays Comparative genomic hybridization (CGH) microarrays Single nucleotide polymorphism (SNP) microarrays Combination of CGH+SNP arrays High Throughput Sequencing Custom targeted capture Whole exome sequencing Whole genome sequencing Photos courtesy of the Shivanand R. Patil Cytogenetics and Molecular Laboratory

What is the difference between these techniques when it comes to CNV detection? Chromosomal Microarray Chromosome Analysis (Karyotype) Massively Parallel Sequencing Fluorescence In Situ Hybridization

What is the difference between these techniques when it comes to CNV detection? Throughput Resolution Precision Target Space

Comparing Genomes Like MegaTouch's Photo Hunt

Comparing Genomes Patient Genome Reference Genome Comparing patient to reference genome and finding gains/losses

Comparing Genomes Patient Genome Reference Genome Several deletions and duplications present in patient sample

Method Comparisons Patient Genome Reference Genome Near Perfect Resolution: Entire genome, every base pair seen

Method Comparisons Patient Genome Reference Genome Resolution: ~4-10Mb Conventional Cytogenetic Karyotyping

Method Comparisons Patient Genome Reference Genome Resolution: ~50-100kb Fluorescence In Situ Hybridization (FISH)

Method Comparisons Patient Genome Reference Genome Resolution: ~10kb Chromosomal Microarray Resolution

Method Comparisons Patient Genome Reference Genome Resolution: ~1bp (but incomplete) Whole Genome Sequencing
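The resolution figures on the preceding slides can be turned into a quick lookup. The sketch below is only an illustration: the function name is hypothetical, the coarser end of each quoted range is used as the cutoff (an assumption), and it ignores that FISH is locus specific and that sequencing coverage is incomplete.

```python
# Approximate lower size limits in base pairs, taken from the slides above.
# Using the coarser end of each range is an assumption made for illustration.
RESOLUTION_BP = {
    "karyotype": 10_000_000,            # ~4-10Mb
    "FISH": 100_000,                    # ~50-100kb (locus specific)
    "chromosomal microarray": 10_000,   # ~10kb
    "whole genome sequencing": 1,       # ~1bp (but incomplete)
}


def methods_able_to_resolve(cnv_size_bp):
    """Return the methods whose nominal resolution is at or below the CNV size."""
    return [name for name, limit in RESOLUTION_BP.items() if cnv_size_bp >= limit]


if __name__ == "__main__":
    for size in (5_000, 50_000, 20_000_000):
        print(size, methods_able_to_resolve(size))
```

A 50kb deletion, for example, falls below the nominal limits used here for karyotype and FISH but above those for microarray and sequencing.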

CNV Detection Algorithms Chromosomal Microarray (See the last several slides of this presentation) VS. Massively Parallel Sequencing (See the next several slides of this presentation)

Detecting CNVs in MPS Data Paired-End Split-Read Read-Depth Sequence Assembly Combination http://bioinfo.mc.vanderbilt.edu/cnvannotator/links.cgi

Detecting CNVs in MPS Data Paired-End Split-Read Read-Depth Sequence Assembly Combination No method alone is capable of detecting all types of CNVs The 1000 Genomes Project used 19 algorithms to independently identify CNVs in 185 human genomes http://bioinfo.mc.vanderbilt.edu/cnvannotator/links.cgi

Advantages & Disadvantages
Paired-End Pros: Precise; Balanced (Inversions); Low Coverage Cons: Bad for Targeted Projects; Resolution ~ Insert Dependent
Split-Read, Read-Depth, Sequence Assembly Pros: Good Breakpoint Resolution; Most Like Array Analysis; Any Unique Sequence; Good for Custom Targets; Not Insert Dependent; Find Novel Insertions Cons: Lose Precision in Targeted Projects; Bias (Normalize or Control Sample(s)); Only Imbalanced; Need High Coverage; Intensive for Large Genomes (Humans); Bad with Repeats
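To make the read-depth idea concrete, here is a minimal sketch, not any published tool. It assumes per-window read counts for a sample and a control are already available, normalizes both to total depth (the "Normalize or Control Sample(s)" point above), and calls windows from the log2 ratio. The cutoffs, the pseudocount, and all function names are assumptions chosen for illustration.

```python
import math


def log2_ratios(sample_counts, control_counts, pseudocount=0.5):
    """Per-window log2(sample/control) after scaling each to its total depth."""
    sample_total = sum(sample_counts)
    control_total = sum(control_counts)
    ratios = []
    for s, c in zip(sample_counts, control_counts):
        s_norm = (s + pseudocount) / sample_total
        c_norm = (c + pseudocount) / control_total
        ratios.append(math.log2(s_norm / c_norm))
    return ratios


def call_windows(ratios, loss_cutoff=-0.4, gain_cutoff=0.3):
    """Label each window; the cutoffs are illustrative assumptions."""
    calls = []
    for r in ratios:
        if r <= loss_cutoff:
            calls.append("loss")
        elif r >= gain_cutoff:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls


def merge_calls(calls):
    """Merge adjacent windows with the same non-neutral call into (first, last, state)."""
    segments = []
    start = None
    for i, state in enumerate(calls + ["neutral"]):  # sentinel closes a trailing run
        if start is not None and state != calls[start]:
            segments.append((start, i - 1, calls[start]))
            start = None
        if start is None and state != "neutral":
            start = i
    return segments


if __name__ == "__main__":
    sample = [100, 100, 50, 50, 100, 150, 100]
    control = [100] * 7
    print(merge_calls(call_windows(log2_ratios(sample, control))))
```

Because only depth is examined, a sketch like this can see imbalanced events only, which matches the "Only Imbalanced" limitation listed on the slide.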

Which Algorithm Do I Choose?

Which Algorithm Do I Choose? Depends on Your Data!! and a few other things.

Which Algorithm Do I Choose? Depends on Your Data!! and a few other things. Here is the List

PEM, SR, AS Based Tools http://bioinfo.mc.vanderbilt.edu/cnvannotator/links.cgi

RD Based Tools for WGS http://bioinfo.mc.vanderbilt.edu/cnvannotator/links.cgi

Tools for Exome Sequencing http://bioinfo.mc.vanderbilt.edu/cnvannotator/links.cgi

Combination Tools Which One Should I Use!? http://bioinfo.mc.vanderbilt.edu/cnvannotator/links.cgi

Ask Yourself These Questions What did I sequence? Whole genome, exome, custom Human, mouse, fly, plant, other (genome size) How much data do I have? Just a few individuals or many Several tools require several samples for effective normalization What type of data do I have? Cases only? Cases and Controls? Tumor Normal Pairs?

Next Consider You will want to select more than one method or a combination algorithm You will want to test the algorithms yourself to ensure they are working correctly Is it easy to implement? Sample data? Does it run effectively on your computer/cluster? Trio inheritance (median transmission rate from parent to child converges to 50%) How much of your data does it throw out? Does it calculate QC metrics and/or provide filtering recommendations? Biologic criteria (previous publications)
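The trio inheritance check can be sketched as follows. This is an illustration under simplifying assumptions: each CNV is reduced to a hashable key, a parental CNV counts as transmitted only if the child has an identical key (real pipelines would match by overlap), and the function names are hypothetical.

```python
from statistics import median


def transmission_rate(parent_cnvs, child_cnvs):
    """Fraction of a parent's CNV calls that also appear in the child."""
    parent_cnvs = set(parent_cnvs)
    if not parent_cnvs:
        return None  # nothing to transmit, rate is undefined
    return len(parent_cnvs & set(child_cnvs)) / len(parent_cnvs)


def median_transmission(pairs):
    """Median rate over (parent_cnvs, child_cnvs) pairs, skipping undefined rates."""
    rates = [transmission_rate(p, c) for p, c in pairs]
    return median(r for r in rates if r is not None)


if __name__ == "__main__":
    pairs = [
        ({"a", "b", "c", "d"}, {"a", "b"}),       # 2 of 4 transmitted
        ({"e", "f", "g", "h"}, {"e"}),            # 1 of 4 transmitted
        ({"i", "j", "k", "l"}, {"i", "j", "k"}),  # 3 of 4 transmitted
    ]
    print(median_transmission(pairs))
```

A median far below 50% would suggest many false positive calls in the parents or missed calls in the children; a median far above it would suggest the reverse or systematic artifacts shared across samples.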

Calling CNVs Tailor to the Array FASST, RANK, SegMNT segmentation algorithms used to create CNV sets Chose those CNVRs that overlapped across all three calling algorithms Deletions > Duplications Brophy et. al. Hum Genet. 2013. July 13. [Epub ahead of print] Even distribution of unique vs. poly CNVRs Appropriate gene/exon content of CNVRs
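Keeping only regions supported by all three callers can be sketched as an interval intersection. The code below is an assumption about how such a consensus could be computed, not the procedure used in the cited study: calls are `(chrom, start, end, kind)` tuples with half-open coordinates, and only the shared portion of same-direction calls is kept.

```python
from functools import reduce


def intersect(calls_a, calls_b):
    """Return the shared portions of same-chromosome, same-direction calls."""
    shared = []
    for chrom_a, start_a, end_a, kind_a in calls_a:
        for chrom_b, start_b, end_b, kind_b in calls_b:
            if chrom_a != chrom_b or kind_a != kind_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:
                shared.append((chrom_a, start, end, kind_a))
    return shared


def consensus(*call_sets):
    """Regions supported by every call set passed in."""
    return reduce(intersect, call_sets)


if __name__ == "__main__":
    # Toy call sets standing in for the output of three segmentation algorithms.
    set_one = [("chr1", 100, 500, "loss"), ("chr2", 1000, 2000, "gain")]
    set_two = [("chr1", 150, 450, "loss"), ("chr2", 1500, 2500, "loss")]
    set_three = [("chr1", 200, 600, "loss")]
    print(consensus(set_one, set_two, set_three))
```

The nested loop is quadratic in the number of calls, which is fine for a sketch; sorted sweeps or an interval tree would be the usual choice for genome-scale call sets.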

Overview What are CNVs and why are they important? How can you find CNVs in your genomic datasets? How can you interpret the significance of CNVs discovered in your genomic datasets?

Interpretation of CNVs Who are these gentlemen?? http://dailyheadlines.uark.edu/ http://content.usatoday.com/topics/photo

Interpretation of CNVs James Watson Craig Venter http://dailyheadlines.uark.edu/ http://content.usatoday.com/topics/photo

Interpretation of CNVs James Watson Craig Venter First sequenced diploid genomes

Interpretation of CNVs James Watson: 23 CNVs (26kb-1.6Mb), 9 duplications, 14 deletions Craig Venter: 62 CNVs, 30 duplications, 32 deletions

How Prevalent are CNVs? HapMap project suggests that ~12% of the human genome is copy number variable CNVs cover ~360Mb of the human genome (far more nucleotide content per genome than SNPs) Disease causing CNVs are typically rare, highly penetrant lesions (often de novo) The Copy Number Variation Project (http://www.sanger.ac.uk/humgen/cnv/gfx/chrplots_redyelgreenlabel.jpg)

What is a Benign CNV? To determine which CNVs may be contributing to disease it is important to ascertain which CNVs are unlikely to cause disease Determine which CNVs are present in normal or more accurately control populations Presumably do not have significant congenital disease Very common variation is probably safe to rule out but what about rare variation?

What is a Benign CNV? To determine which CNVs may be contributing to disease it is important to ascertain which CNVs are unlikely to cause disease Conrad et. al. 2010 Cooper et. al. 2011 Abnormal UIHC Cytogenetics Local Database Cooper et. al. Nat Genet. 2011. Vol. 43: 838. Conrad et. al. Nature. 2010. Vol. 464: 704.

Conrad et. al. 2010 Rules Any VALIDATED CNV in the same direction (gain or loss) as well as ALL autosomal SegDups Cooper et. al. 2011 Abnormal >= 1% + 3 independent events OR >=0.5% and 5 independent events in the same direction (>30 controls, array studies only) >= 12 control samples in non-significant genes in the same direction >= 8 Benign or VUS, LB Interpretations in the same direction >= 2% in the same direction UIHC Cytogenetics Local Database

Conrad et. al. 2010 CNV Prioritization Benign (~98%) Cooper et. al. 2011 Abnormal Involve > 0 RefSeq Genes Diagnostic Known Pathogenic or Susceptibility Locus Cytogenetics Variant of Unclear Significance >1Mb Abnormal Brophy et. al. Hum Genet. 2013. July 13. [Epub ahead of print]

Classification as Benign Typical overlap with a benign CNV necessary to consider a detected CNV as benign varies >50% overlap commonly used >90% is stringent (clinical use) Best Rules to Use? Still an area of active investigation and development Collaborative project between us and the Bioinformatics Core Coming soon to the Galaxy Tool Shed
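The overlap thresholds can be illustrated with a short sketch. The slide does not say how overlap is measured, so this assumes it means the fraction of the detected CNV covered by a benign CNV on the same chromosome and in the same direction; a reciprocal overlap requirement would be a stricter alternative. Function names and coordinates are hypothetical.

```python
def overlap_fraction(detected, benign):
    """Fraction of the detected CNV (start, end) covered by a benign CNV."""
    d_start, d_end = detected
    b_start, b_end = benign
    shared = max(0, min(d_end, b_end) - max(d_start, b_start))
    return shared / (d_end - d_start)


def looks_benign(detected, benign_cnvs, min_fraction=0.5):
    """True if any benign CNV covers more than min_fraction of the detected CNV."""
    return any(overlap_fraction(detected, b) > min_fraction for b in benign_cnvs)


if __name__ == "__main__":
    detected = (1000, 2000)
    benign_cnvs = [(1400, 3000)]                      # covers 60% of the detected CNV
    print(looks_benign(detected, benign_cnvs, 0.5))   # commonly used threshold
    print(looks_benign(detected, benign_cnvs, 0.9))   # stringent threshold
```

The same detected CNV passes at the >50% threshold and fails at >90%, which is why the choice of rule changes how many variants survive filtering.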

It's Not Benign, Now What? Is it a known pathogenic or disease susceptibility CNV? Are there others like it in clinical databases? If so, are the phenotypes similar? What is the gene content of the CNV? Known disease genes (OMIM Morbid Map) or genomic loci? Inheritance and CNV copy number? Candidate genes? Interaction with known disease gene product? Predicted haploinsufficient genes? Parent testing to determine inheritance (inherited or de novo)?

Haploinsufficiency and Disease Virtually any disease caused by haploinsufficiency of a gene could theoretically be caused by a deletion CNV (duplication and over-expression is less clear cut) What if you find a small deletion CNV involving a potential disease gene? Predict haploinsufficiency Huang et. al. PLOS Genetics. 2010. 6(10): e1001154 http://www.plosgenetics.org/article/fetchsinglerepresentation.action?uri=info:doi/10.1371/journal.pgen.1001154.s001

Overview What are CNVs and why are they important? How can you find CNVs in your genomic datasets? How can you interpret the significance of CNVs discovered in your genomic datasets?

Acknowledgements John Manak, Ph.D. Shivanand R. Patil Cytogenetics and Molecular Laboratory Pat Brophy, M.D. Richard Smith, M.D. Members of the Manak, Brophy and Smith Laboratories Agshin Taghiyev, Ph.D.

The End

CMA CNV Detection Algorithms Table modified from Valsesia, et. al. Front Genet. 2013. May 30; 4:92. Multiple algorithms for detecting CNVs in microarray data Many based off Hidden Markov Models CBS one of the earliest and most heavily used ASCAT useful for determining tumor purity also

CMA CNV Detection Algorithms Additional algorithms that can be used with microarray data to detect CNVs Reference below provides a comparison of the different algorithms and array types Table modified from Pinto, et. al. Nat Biotechnol. 2011. May 8; 29:512.

Hidden Markov Model Goal of the HMM is to find the best state path using the symbols (ACGT), states (E, 5, I), emission probabilities (top), and transition probabilities (arrows) defined for a specific problem Eddy, et. al. Nat Biotechnol. 2004. Vol. 22:1315.
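Finding the best state path is usually done with the Viterbi algorithm. The sketch below runs it on a toy model with the slide's symbols (ACGT) and states (E, 5, I). The probabilities are illustrative values assumed here, not numbers read from the slide, and the end state is omitted so that I simply loops on itself. In CNV callers the same machinery is typically applied with copy-number states and probe intensities as observations.

```python
import math

STATES = ["E", "5", "I"]  # exon, 5' splice site, intron

# All probabilities below are assumed values for illustration.
START = {"E": 1.0, "5": 0.0, "I": 0.0}
TRANS = {
    "E": {"E": 0.9, "5": 0.1, "I": 0.0},
    "5": {"E": 0.0, "5": 0.0, "I": 1.0},
    "I": {"E": 0.0, "5": 0.0, "I": 1.0},
}
EMIT = {
    "E": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "5": {"A": 0.05, "C": 0.0, "G": 0.95, "T": 0.0},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(sequence):
    """Return the most probable state path for a non-empty ACGT string."""
    scores = [{s: log(START[s]) + log(EMIT[s][sequence[0]]) for s in STATES}]
    back = []
    for symbol in sequence[1:]:
        prev = scores[-1]
        column, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this position.
            best = max(STATES, key=lambda p: prev[p] + log(TRANS[p][s]))
            column[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][symbol])
            pointers[s] = best
        scores.append(column)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("CCGTTT"))
```

Working in log space keeps long sequences from underflowing, and impossible transitions or emissions simply score as negative infinity.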