Comprehensive Genome and Transcriptome Structural Analysis of a Breast Cancer Cell Line using PacBio Long Read Sequencing

Comprehensive Genome and Transcriptome Structural Analysis of a Breast Cancer Cell Line using PacBio Long Read Sequencing
Maria Nattestad
Schatz + McCombie + Hicks at Cold Spring Harbor Laboratory
McPherson + Beck at the Ontario Institute for Cancer Research
Pacific Biosciences
DNAnexus

SK-BR-3
Most commonly used Her2-amplified breast cancer cell line
80 chromosomes instead of 46
Often used for pre-clinical research on Her2-targeting therapeutics such as Herceptin (Trastuzumab) and resistance to these therapies. (Davidson et al, 2000)

PacBio long-read DNA sequencing
Mean read length: 9 kb
Max read length: 71 kb
72X coverage
Genome-wide coverage averages around 54X
Coverage per chromosome varies greatly, as expected from previous karyotyping results

Genome structural analysis

Assembly-based:
Assembly with Falcon on DNAnexus
Alignment with MUMmer
Call variants between consecutive alignments with ABVC
Call variants within alignments with ABVC
~11,000 local variants (50 bp < size < 10 kbp)

Alignment-based:
Alignment with BWA-MEM
Copy number analysis
SV-calling from split reads with Sniffles
661 long-range variants (>10 kb distance)
SplitThreader
Detailed analysis of Her2 amplifications

Validations

Assembly using PacBio yields far better contiguity

PacBio + Falcon
Number of sequences: 13,532
Total sequence length: 2.97 Gb
Mean: 266 kb
Max: 19.9 Mb
N50: 2.46 Mb

Illumina + Allpaths-LG
Number of sequences: 748,955
Total sequence length: 2.07 Gb
Mean: 2.8 kb
Max: 61 kb
N50: 3.3 kb

Relative to a genome size of 3 Gb
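As a rough illustration of how an N50 can be computed against a fixed genome size rather than the assembly's own total length (one possible reading of the "relative to a genome size of 3 Gb" note; the slides do not say how the figures were produced), here is a minimal sketch. The function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths, genome_size=None):
    """Return the N50 of a list of sequence lengths.

    If genome_size is given, the halfway point is taken relative to that
    fixed size (often called NG50) instead of the sum of the lengths.
    Returns 0 if the sequences never cover half of the target size.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the target is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy contig lengths, not the SK-BR-3 assembly.
contigs = [2, 3, 4, 5, 6]
print(n50(contigs))                  # relative to the assembly's own total
print(n50(contigs, genome_size=40))  # relative to a fixed genome size
```

Using a fixed genome size makes two assemblies of different total length comparable, since both are measured against the same halfway point.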

ABVC: Assembly-Based Variant-Caller
[Figure: each panel shows reference vs. contig]

Defined point:
Insertion
Deletion

Overlapping alignments suggest tandem repeat:
Tandem Expansions
Tandem Contractions

Gap where sequences do not align uniquely suggests a repeat:
Repeat Expansions
Repeat Contractions
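The six categories above can be sketched as a rule on the gap or overlap between two consecutive alignments of the same contig. This is an illustrative guess at the logic, not ABVC's actual implementation: the `Alignment` type, the function `classify_between`, the half-open coordinates, and the assumption that both alignments are on the same chromosome and strand are all hypothetical. The size limits follow the 50 bp < size < 10 kbp range quoted in the slides.

```python
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    """One contig-to-reference alignment, half-open coordinates (hypothetical type)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


def classify_between(prev: Alignment, nxt: Alignment,
                     min_size: int = 50, max_size: int = 10_000) -> Optional[str]:
    """Classify the variant implied between two consecutive alignments."""
    ref_gap = nxt.ref_start - prev.ref_end  # negative means overlap on the reference
    qry_gap = nxt.qry_start - prev.qry_end  # negative means overlap on the contig
    size = qry_gap - ref_gap                # positive: contig carries extra sequence
    if not (min_size < abs(size) < max_size):
        return None
    direction = "expansion" if size > 0 else "contraction"
    if ref_gap < 0 or qry_gap < 0:
        # Overlapping alignments suggest a tandem repeat.
        return f"tandem_{direction}"
    if ref_gap > 0 and qry_gap > 0:
        # Unaligned sequence on both sides suggests a repeat.
        return f"repeat_{direction}"
    # One side abuts exactly: a defined-point insertion or deletion.
    return "insertion" if size > 0 else "deletion"


first = Alignment(0, 1000, 0, 1000)
print(classify_between(first, Alignment(1000, 2000, 1300, 2300)))
print(classify_between(first, Alignment(900, 1900, 1100, 2100)))
```

Variants that lie inside a single alignment (the "within alignments" step) would need the alignment's own gap records and are not covered by this sketch.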

~ 11,000 local variants 50 bp < size < 10 kbp

Repeat Contractions
Repeat Expansions
[Figure: each panel shows reference vs. contig]
BLASTed 515 insertions: 427 (83%) of them matched Alu elements

Split-read variant calling with Sniffles to capture the long-range variants
[Figure: a read split between chromosome 1 and chromosome 2]
See Fritz at Poster 183

Long-range structural variants found by Sniffles
661 long-range variants (>10 kb distance)
[Figure label: Her2 oncogene]

[Figure: Her2. Chr 17: 83 Mb; 8 Mb]

[Figure: 8 Mb, Chromosome 17. Labeled genes: Her2, GSDMB, RARA, PKIA, TATDN1]

SplitThreader: Graphical threading to retrace complex history of rearrangements in cancer genomes
[Figure: CHR 1 (ATCGCCTA) and CHR 2 (GTCCATAG) drawn as segments ATCG, CCGA, GTCC, ATAG, with numeric labels 20, 80, 100]
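To make the graph idea concrete, here is a minimal sketch, assuming chromosomes are cut into segments at variant breakpoints and long-range variants add edges between segments. It is not SplitThreader's implementation: the functions `build_segment_graph` and `find_path`, the tuple format for variants, and the toy chromosomes are hypothetical, and strand orientation and copy number are ignored for brevity.

```python
from collections import defaultdict, deque


def build_segment_graph(chrom_lengths, variants):
    """Split chromosomes at breakpoints and link segments.

    chrom_lengths: dict of chromosome name -> length
    variants: list of (chrom1, pos1, chrom2, pos2) long-range connections
    Returns an adjacency dict over segments (chrom, start, end).
    """
    cuts = {c: {0, n} for c, n in chrom_lengths.items()}
    for c1, p1, c2, p2 in variants:
        cuts[c1].add(p1)
        cuts[c2].add(p2)
    graph = defaultdict(set)
    touching = defaultdict(list)  # (chrom, breakpoint) -> segments bordering it
    for chrom, points in cuts.items():
        pts = sorted(points)
        segs = [(chrom, a, b) for a, b in zip(pts, pts[1:])]
        # Neighbouring segments on the same chromosome stay connected.
        for left, right in zip(segs, segs[1:]):
            graph[left].add(right)
            graph[right].add(left)
        for seg in segs:
            graph[seg]  # make sure isolated segments still appear as nodes
            touching[(chrom, seg[1])].append(seg)
            touching[(chrom, seg[2])].append(seg)
    # Simplification: a variant links every segment bordering either breakpoint.
    for c1, p1, c2, p2 in variants:
        for a in touching[(c1, p1)]:
            for b in touching[(c2, p2)]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for a shortest chain of segments, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Toy example: one variant joining two made-up chromosomes.
g = build_segment_graph({"chrA": 100, "chrB": 100}, [("chrA", 40, "chrB", 60)])
print(find_path(g, ("chrA", 0, 40), ("chrB", 60, 100)))
```

A fuller version would track which side of each breakpoint a variant attaches to, so that a path cannot enter and leave a segment through the same end.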

[Figure: Chr 17 and Chr 8. Labeled genes: Her2, GSDMB, RARA, PKIA, TATDN1]
1. Healthy chromosome 17
2. Translocation into chromosome 8
3. Translocation within chromosome 8
4. Complex variant and inverted duplication within chromosome 8
5. Translocation within chromosome 8

Transcriptome analysis with IsoSeq: Long-read RNA sequencing
Full-length transcripts
Found 17 gene fusions with both DNA and RNA evidence:
13 seen in previous RNA-seq literature
4 novel fusions
2 previously observed fusions had RNA evidence but no direct link in the DNA:
Confirmed using SplitThreader

CYTH1-EIF3H gene fusion in the SplitThreader graph
[Figure labels: CYTH1, EIF3H]

Frankensteining the CYTH1-EIF3H gene fusion
[Figure labels: CYTH1, EIF3H]

CYTH1-EIF3H gene fusion
[Figure: CYTH1 on Chr 17 and EIF3H on Chr 8. Labels: 36, 27, 30 IsoSeq reads; 3.7 Mb; 8 kb]

The genome informs the transcriptome
Explain amplifications
Trace gene fusions
More genomes coming soon!
Data and additional results: http://schatzlab.cshl.edu/data/skbr3/

Acknowledgments
Sara Goodwin, Timour Baslan, Fritz Sedlazeck, Tyler Garvin, Han Fang, James Gurtowski, Philipp Rescheneder, Elizabeth Hutton, Marley Alford, Melissa Kramer, Eric Antoniou, James Hicks, Michael Schatz, W. Richard McCombie, Karen Ng, Timothy Beck, Yogi Sundaravadanam, John McPherson, Elizabeth Tseng, Jason Chin