DNA-seq Bioinformatics Analysis: Copy Number Variation


Elodie Girard (elodie.girard@curie.fr)
U900, Institut Curie, INSERM, Mines ParisTech, PSL Research University, Paris, France

NGS Applications
[Figure: NGS applications from the ENCODE project: 5C, Hi-C, DNA-seq, ChIP-seq, RNA-seq]

Whole Genome or Target DNA sequencing
Whole genome vs. target sequencing:
- Capture sequencing: DNA fragmentation, hybridization, separation, elution. Illumina technology (HiSeq, MiSeq, NextSeq).
- Amplicon sequencing: sequencing of a dedicated panel of genes/hotspots after PCR amplification. Mostly IonTorrent technology (PGM/Proton).

DNA-seq: some applications
Detection of:
- Single Nucleotide Variations (SNVs) & Insertions/Deletions (Indels)
  - in germline samples
  - in tumor samples (somatic analysis)
- Copy Number Variations (CNVs): large duplication or deletion events
  - Tumor/germline exomes
  - Germline events among a cohort
  - Distinction between capture & amplicon sequencing
[Figure: examples of a 3 bp deletion and an SNV; normalized copy number against genomic position (3-kb windows), chr5]

Variant Calling: germline vs. somatic
Expectation:
- Homozygous genotype: 0/0 (0% alternative allele) or 1/1 (100% alternative allele)
- Heterozygous genotype: 0/1 (50% reference / 50% alternative)
Reality in normal samples (sequencing data are noisy), by allelic ratio:
- 0/0: [0%, 25%)
- 0/1: [25%, 75%]
- 1/1: (75%, 100%]
Reality in tumor samples (a tumor is a mixture of tumor clones and normal cells):
- 0/1: (0%, 50%]
- 1/1: (0%, 100%]
[Figure: allelic ratio (%) for 0/0, 0/1 and 1/1 genotypes; tumor composition showing normal cells and the major clone]
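The thresholds above can be turned into a small classifier. The following is a minimal sketch, not a variant caller: `call_genotype` applies the germline intervals quoted on the slide, and `expected_tumor_ratio` illustrates why tumor ratios spread below 50% or 100%. The function names are hypothetical, and the tumor formula assumes a clonal variant in a copy-neutral diploid region with no variant reads coming from normal cells.

```python
def call_genotype(alt_reads: int, total_reads: int) -> str:
    """Classify a germline position from its alternative-allele ratio.

    Intervals follow the slide: 0/0 for [0%, 25%), 0/1 for [25%, 75%],
    1/1 for (75%, 100%].
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    ratio = alt_reads / total_reads
    if ratio < 0.25:
        return "0/0"
    if ratio <= 0.75:
        return "0/1"
    return "1/1"


def expected_tumor_ratio(purity: float, homozygous: bool = False) -> float:
    """Expected alternative-allele ratio of a clonal somatic variant.

    Assumption (not from the slides): diploid, copy-neutral region, so a
    heterozygous variant gives purity / 2 and a homozygous one gives purity.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    return purity if homozygous else purity / 2.0


if __name__ == "__main__":
    print(call_genotype(12, 100))      # ratio 0.12
    print(call_genotype(48, 100))      # ratio 0.48
    print(expected_tumor_ratio(0.6))   # heterozygous variant, 60% tumor cells
```

With 60% tumor cells, a heterozygous somatic variant is expected near 30%, which a germline-style threshold of 25% would barely accept. This is why somatic callers do not reuse the germline intervals.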


CNV detection methods
Different methods:
- FISH: fluorescence in situ hybridization
- aCGH: array comparative genomic hybridization
- SNP array: genome-wide SNP array
- HTS: high-throughput sequencing

Previous Workflow
Inputs: reference genome (FASTA), reads (FASTQ), target regions (BED)
1. Quality control (QC1/QC2)
2. Mapping
3. PCR duplicates marking (MarkDup) /!\ Not for small targets / amplicon design /!\
4. QC3
5. Target intersection (Intersect BAM)
6. [Optional] Preprocess part 1: local realignment around indels
7. [Optional] Preprocess part 2: Base Quality Score Recalibration
Output: aligned and preprocessed reads (BAM)
- Marked PCR duplicates
- Intersected on target regions
- Realigned around indels
- Recalibrated

Copy Number Variation
- Germlines: aligned and preprocessed reads (BAM) give Copy Number Variation
- Tumor + germline: aligned and preprocessed reads (BAM) for each give Copy Number Alteration

Copy Number Variation in Exome-seq
Detection of large-scale variation events: amplification, gain, loss, deletion
Criteria:
- Germline/somatic analysis
- %GC / mappability normalization
- Ploidy/cellularity estimation
- LOH detection (allele-specific)
- Sub-clonal event detection
- Absence of a control sample
No tool meets all of these criteria yet:
- Sequenza, TITAN, FACETS: %GC normalization, allele-specific (LOH), cellularity estimation, sub-clonal events (FACETS, TITAN), but they require a normal sample
- CopywriteR: uses off-target reads to estimate CNV without normal samples
- Contra: germline events, using reference germlines to normalize read depth

Metrics
[Figure from Zhao et al., BMC Bioinformatics, 2013]

Normalization
Normalization by:
- GC content
- Mappability (uniqueness of the region)
- Matched normal sample
B-Allele Frequency (BAF)
- Homozygous SNPs: BAF at 0 (AA) or 1 (BB)
- Heterozygous SNPs: BAF at 0.5 (AB)
- Allelic imbalance: intermediate values (AAB: 66%/33%)
- Helps assess copy number
- Allows loss of heterozygosity (LOH) to be determined for somatic samples
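GC-content normalization can be illustrated with a median-based correction. This is a minimal sketch of one common approach, not the method of any tool named in these slides. The function name `gc_normalize` and the 5% GC bin width are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(depths, gc_fractions, bin_pct=5):
    """Divide each window depth by the median depth of windows with similar GC.

    depths:       read depth per window
    gc_fractions: GC fraction (0..1) of each window, same order as depths
    bin_pct:      width of a GC bin in percentage points (assumed value)
    """
    if len(depths) != len(gc_fractions):
        raise ValueError("depths and gc_fractions must have the same length")

    # Assign every window to a GC bin, e.g. 30-34% GC -> bin 6 when bin_pct=5.
    keys = [round(gc * 100) // bin_pct for gc in gc_fractions]
    by_bin = defaultdict(list)
    for depth, key in zip(depths, keys):
        by_bin[key].append(depth)

    medians = {key: median(values) for key, values in by_bin.items()}

    # A window at the typical depth for its GC content ends up at 1.0.
    return [
        depth / medians[key] if medians[key] > 0 else float("nan")
        for depth, key in zip(depths, keys)
    ]


if __name__ == "__main__":
    depths = [80, 100, 120, 40, 50, 60]
    gc = [0.31, 0.32, 0.33, 0.61, 0.62, 0.63]
    print(gc_normalize(depths, gc))
```

In the usage example, the GC-rich windows have half the raw depth of the others, yet both groups normalize to the same values. The GC-driven coverage drop is therefore not mistaken for a loss.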

Copy Number Variation & Amplicon-seq [not in Galaxy]
- CNVPanelizer: germline CNV events, using reference samples (other germlines)
- IonCopy: CNA detection without a matched normal sample

Recurrent events [not in Galaxy]
FrAGL plot: representation of the frequency of each event among the cohort (red: gain/amplification, green: loss/deletion)
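The quantity shown in such a plot is a per-region event frequency across the cohort. The sketch below computes it from per-sample calls. The input layout (a dictionary of samples, each mapping a region name to "gain", "loss" or "neutral") and the function name are assumptions made for illustration, not the format of any tool mentioned here.

```python
from collections import Counter


def event_frequencies(calls_per_sample):
    """Return {region: (gain_frequency, loss_frequency)} over the cohort.

    calls_per_sample: {sample_name: {region_name: "gain" | "loss" | "neutral"}}
    A region missing from a sample counts as neutral for that sample.
    """
    n_samples = len(calls_per_sample)
    if n_samples == 0:
        raise ValueError("at least one sample is required")

    gains, losses, regions = Counter(), Counter(), set()
    for calls in calls_per_sample.values():
        for region, status in calls.items():
            regions.add(region)
            if status == "gain":
                gains[region] += 1
            elif status == "loss":
                losses[region] += 1

    return {
        region: (gains[region] / n_samples, losses[region] / n_samples)
        for region in sorted(regions)
    }


if __name__ == "__main__":
    cohort = {
        "S1": {"regionA": "gain", "regionB": "loss"},
        "S2": {"regionA": "gain", "regionB": "neutral"},
        "S3": {"regionA": "gain", "regionB": "loss"},
        "S4": {"regionA": "loss", "regionB": "neutral"},
    }
    for region, (gain, loss) in event_frequencies(cohort).items():
        print(region, gain, loss)
```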

Workflow
Germline analysis:
- Inputs: Germline 1 ... Germline N aligned and preprocessed reads (BAM)
- Germline Copy Number Detection [not in Galaxy]
  - Cohort comparison
  - Literature comparison
Somatic analysis:
- Inputs: tumor and germline aligned and preprocessed reads (BAM)
- Somatic Copy Number Detection [not in Galaxy]
  - Cohort comparison
  - Literature comparison

Dataset for somatic CNV
Public data: pair of lung adenocarcinoma tumor & matched germline
- Paired-end reads of 100 bp, Illumina HiSeq2000
- Available in EBI-SRA: ERP001071
- Corresponding RNA-seq data available (ERP001058, Ju et al., Genome Res, 2012)
Use of Sequenza: uses paired tumor-normal DNA sequencing data to estimate tumor cellularity and ploidy, and to calculate allele-specific copy number profiles
[Workflow: tumor and germline aligned and preprocessed reads (BAM) feed Somatic Copy Number Detection]

Import Data
1. Go to http://sigenae-workbench.toulouse.inra.fr/galaxy
2. Go to «Shared Data» in the top menu, then «Published Histories»
3. Click on «TP_CNV_SEQUENZA_FILES», then on «Import History»

Create Mpileup files
1. In the search bar, find «Mpileup»: it creates a per-base description of the alignments. Run it on the tumor BAM file.
2. Repeat for the normal BAM file.
3. Rename the outputs to tell Tumor from Germline.
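To make the "per-base description" concrete, the sketch below parses one line of samtools pileup text and counts the bases observed at that position. It assumes the standard six-column layout (chromosome, position, reference base, depth, read bases, base qualities). The function names are hypothetical and the example line is made up.

```python
import re


def count_bases(ref_base: str, bases: str) -> dict:
    """Count A/C/G/T observations in the read-bases column of a pileup line."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    ref = ref_base.upper()
    i = 0
    while i < len(bases):
        char = bases[i]
        if char == "^":
            # Start of a read: the next character is a mapping quality.
            i += 2
            continue
        if char in "+-":
            # Indel: sign, length in digits, then that many inserted/deleted bases.
            match = re.match(r"\d+", bases[i + 1:])
            if match is None:
                raise ValueError("malformed indel in pileup string")
            i += 1 + len(match.group()) + int(match.group())
            continue
        if char in ".,":
            # '.' and ',' mean "matches the reference" (forward/reverse strand).
            if ref in counts:
                counts[ref] += 1
        elif char.upper() in counts:
            counts[char.upper()] += 1
        # '$' (end of read), '*' (deleted base), 'N' and others are skipped.
        i += 1
    return counts


def parse_pileup_line(line: str):
    """Return (chromosome, position, reference base, base counts)."""
    chrom, pos, ref, _depth, bases = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref.upper(), count_bases(ref, bases)


if __name__ == "__main__":
    example = "chr5\t1000\tG\t10\t..,,A^]a$.+2AG,-1c*T\tIIIIIIIIII"
    print(parse_pileup_line(example))
```

Counts like these, taken at the same position in the tumor and germline pileups, are the raw material for both the depth ratio and the B-allele frequency.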

Call CNV using Sequenza
In the search bar, find «Sequenza». It uses the read depth ratio (tumor/normal) and the BAF extracted from the mpileup files.
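The two signals named above can be computed from simple counts. The sketch below is an illustration of the definitions only and does not reproduce Sequenza's implementation. The function names, and the choice to scale each sample by its own mean depth, are assumptions.

```python
def depth_ratio(tumor_depth, normal_depth, tumor_mean_depth, normal_mean_depth):
    """Tumor/normal depth ratio, each depth scaled by its sample-wide mean.

    Scaling by the mean depth (an assumption of this sketch) keeps the ratio
    near 1.0 in unaltered regions even when the two samples were sequenced
    to different depths.
    """
    if normal_depth <= 0 or tumor_mean_depth <= 0 or normal_mean_depth <= 0:
        raise ValueError("normal depth and mean depths must be positive")
    return (tumor_depth / tumor_mean_depth) / (normal_depth / normal_mean_depth)


def b_allele_frequency(a_reads: int, b_reads: int) -> float:
    """Fraction of reads carrying the B allele at a SNP position."""
    total = a_reads + b_reads
    if total <= 0:
        raise ValueError("no reads at this position")
    return b_reads / total


if __name__ == "__main__":
    # Region with 1.5x the expected tumor coverage: suggests a gain.
    print(depth_ratio(150, 100, 100, 100))
    # Germline-heterozygous SNP seen as 20 A reads and 10 B reads in the tumor:
    # an intermediate BAF, as in the AAB example (66%/33%).
    print(b_allele_frequency(20, 10))
```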

Exome Somatic CNV: Sequenza
[Figure: Sequenza output panels: B-allele frequency, depth ratio, copy number, allele-specific copy number]

Datasets for Germline events
Public data: pair of tumor & matched germline + duplicate germline
- Available in the Contra SourceForge repository
Use of Contra: comparison of base-level log-ratios calculated from read depth between case and control samples
[Workflow: Germline 1 ... Germline N aligned and preprocessed reads (BAM) feed Germline Copy Number Detection]
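A base-level log-ratio between a case and a control can be sketched as follows. This illustrates the idea only and is not Contra's code. The library-size scaling, the pseudocount and the function name are assumptions made for the example.

```python
import math


def base_log_ratios(case_depths, control_depths, pseudocount=1.0):
    """log2(case / control) at each base, after library-size scaling.

    The case depths are rescaled so that both samples have the same total
    depth, and a pseudocount (assumed value) avoids taking the log of zero.
    """
    if len(case_depths) != len(control_depths):
        raise ValueError("case and control must cover the same positions")
    case_total = sum(case_depths)
    control_total = sum(control_depths)
    if case_total <= 0 or control_total <= 0:
        raise ValueError("both samples need a positive total depth")

    scale = control_total / case_total
    return [
        math.log2((case * scale + pseudocount) / (control + pseudocount))
        for case, control in zip(case_depths, control_depths)
    ]


if __name__ == "__main__":
    control = [100, 100, 100, 100]
    # Same profile at half the sequencing depth: ratios stay at 0.
    print(base_log_ratios([50, 50, 50, 50], control))
    # Last base has twice the relative depth of the others.
    print(base_log_ratios([100, 100, 100, 200], control, pseudocount=0.0))
```

With `pseudocount=0.0`, positions without coverage in either sample raise an error, so the default is safer on real targets.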

Import Data
1. Go to http://sigenae-workbench.toulouse.inra.fr/galaxy
2. Go to «Shared Data» in the top menu, then «Published Histories»
3. Click on «TP_CNV_CONTRA_FILES», then on «Import History»

Create baseline from germlines
In the search bar, find «Baseline : Control files for Contra»
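The idea of a baseline built from several germlines can be sketched as a per-region median of depth profiles. This is a hypothetical illustration and not the algorithm of the Galaxy tool named above. The function name and the choice of the median are assumptions.

```python
from statistics import median


def build_baseline(sample_depths):
    """Combine several control samples into one reference depth profile.

    sample_depths: one list of per-region depths per germline sample,
                   all covering the same regions in the same order.
    Each sample is first scaled to fractions of its own total depth, then
    the median across samples is taken region by region (assumed design).
    """
    if not sample_depths:
        raise ValueError("at least one control sample is required")
    n_regions = len(sample_depths[0])
    scaled = []
    for depths in sample_depths:
        total = sum(depths)
        if len(depths) != n_regions or total <= 0:
            raise ValueError("samples must share regions and have coverage")
        scaled.append([depth / total for depth in depths])
    return [median(column) for column in zip(*scaled)]


if __name__ == "__main__":
    germlines = [
        [10, 20, 70],
        [20, 40, 140],   # same profile, sequenced twice as deep
        [30, 30, 40],    # an outlier sample
    ]
    print(build_baseline(germlines))
```

Using the median means a single germline carrying its own CNV in a region does not distort the reference for that region.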

Call CNV using Contra
1. In the search bar, find «Contra Copy Number analysis»
2. Set «bed» & «large deletion» to true in the optional parameters