Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Gordon Blackshields
Senior Bioinformatician, Source BioScience

To Cancer Genetics Studies

Introduction

Next Generation Sequencing (NGS) on the Illumina platform is suitable for clinical applications that require large amounts of information, accurate quantification and high-sensitivity detection:
- Mutation detection in tumours (from biopsies / circulating tumour cells (CTC))
- Pathogen detection, e.g. organism identification for epidemiological investigations
- Gut microbial flora genomics
- Detection of the presence of antibiotic resistance genes
- Comparison of novel sequences / genes to those in public databases

Applications of NGS to Cancer Genetics

Some Commonly Applied Techniques

Sequencing the Genome
- Reference alignment, targeted resequencing for polymorphism and mutation discovery
- De novo assembly for characterisation of novel genes and genomes
- Paired-end sequencing highlights larger structural variants (inherited/acquired)

Sequencing the Transcriptome
- RNA-Seq allows absolute quantification of gene expression across the transcriptome
- No prior knowledge of content needed: expression of unknown genes can be quantified
- Profiling of mRNA, ncRNA, miRNA

Sequencing the Cistrome
- ChIP-Seq allows profiling of cis-acting targets (DNA binding sites) of a trans-acting factor (transcription factor, restriction enzyme, etc.) on a genome scale
- Determine how proteins interact with DNA to regulate gene expression
- Determine how TFs and other proteins influence phenotype-affecting mechanisms
- A similar approach can be used to characterise genomic methylation patterns (the methylome)

Applications of NGS to Cancer Genetics

Levels of information extraction, data integration

[Diagram: four analysis tracks shown as columns, with the level of information extraction increasing from bottom to top through the stages Generate, Process, Identify, Analyse, Integrate.]

Tracks: Variant Detection | RNA-Seq Quantification | RNA-Seq Discovery | ChIP-Seq

Diagram contents, top to bottom:
- Integrate: associate observed variants with regulation/transcriptional changes; link to external databases
- Overlapping Genes | Differential Expression | Novel Isoforms | Associated Genes
- Variant Detection | Expression levels | Novel gene models | Motif finding
- Targeted Resequencing | Density on known exons | Novel Transcripts | Binding Sources
- Consensus Sequence | Identify splice-crossing reads | Enriched regions
- De novo assembly / reads mapped to (un)annotated reference sequence
- 10^8-10^9 short DNA fragments

Human Resequencing and Variant Detection

Reference Assembly, Targeted Resequencing and Variant Detection

Search for alterations at the nucleotide level to explain changes in regulation/transcription.

Single-End (SE) sequencing
- ~85% of complex genome accessible
- Suitable for SNPs, small indels (DIPs)

Paired-End (PE) sequencing
- ~99% of complex genome accessible
- Find longer DIPs
- Find larger structural variations
- Span repeat regions
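One way paired-end data can point to structural variation is through read pairs that map unusually far apart or unusually close together relative to the library's typical insert size. The sketch below is illustrative only: the function name, the use of median and median absolute deviation (MAD), and the threshold of 5 MADs are assumptions, not something taken from the slides or from any particular tool.

```python
from statistics import median


def flag_discordant_pairs(insert_sizes, n_mad=5.0):
    """Flag read pairs whose mapped insert size is far from the library median.

    insert_sizes: mapped distance (bp) between the two mates of each pair.
    n_mad: hypothetical cut-off, in median absolute deviations.
    Returns a list of (index, insert_size, interpretation) tuples.
    """
    med = median(insert_sizes)
    mad = median([abs(size - med) for size in insert_sizes])
    spread = mad if mad > 0 else 1.0  # avoid a zero-width threshold

    flagged = []
    for i, size in enumerate(insert_sizes):
        if abs(size - med) > n_mad * spread:
            # Mates mapping too far apart suggest sequence missing from the
            # sample (deletion); too close suggests extra sequence (insertion).
            kind = "possible deletion" if size > med else "possible insertion"
            flagged.append((i, size, kind))
    return flagged


if __name__ == "__main__":
    sizes = [300, 310, 295, 305, 290, 1500, 302, 80]
    for hit in flag_discordant_pairs(sizes):
        print(hit)
```

Median and MAD are used here rather than mean and standard deviation so that the outlying pairs being searched for do not inflate the threshold themselves.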

Human Resequencing and Variant Detection

2009 Nature Paper
- Cytogenetically normal AML genome sequenced (32x)
- Comparison with matched normal tissue (14x)
- 98 full runs on Illumina GA to achieve required depth
- Alignment, variant discovery performed by MAQ
- 97.7% of variants in AML genome also in normal
- Further restricted to annotated gene-coding regions
- Across all tumour cells: found 10 genes with acquired mutations (8 novel), present in all cells at presentation and relapse

"Our study establishes whole genome sequencing as an unbiased method for discovering initiating mutations in cancer genomes, and for identifying novel genes that may respond to targeted therapies"

Polymorphism detection within P53

P53 variant detection study
- "Guardian of the genome" (Lane, 1992)
- Protects fidelity of DNA replication
- Directs cell arrest/apoptosis when stressed
- Mutated in more than half of human cancers

http://p53.free.fr/

Polymorphism detection within P53 (continued)

- Human TP53 gene located on 17p13.1
- Region sometimes deleted in human cancer

Polymorphism detection within P53 (continued)

Study
- Search for variants in the P53 gene in matched tumour samples
- Use gene-specific PCR to amplify exons only, to maximise depth of coverage

Polymorphism detection within P53 (continued)

- Use MAQ for alignment, variant discovery against P53 reference gene
- Comparison with results from 454, Sanger

[Figure: Coverage of p53 gene. x-axis: gene position (12000 to 19000); y-axis: coverage per base position (axis up to 35000).]
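A coverage-per-base profile like the one plotted for the p53 amplicons can be derived from the aligned read intervals. The following is a minimal sketch, assuming reads are given as half-open (start, end) intervals on the reference gene; the function name and the input format are hypothetical and do not reflect MAQ's actual output.

```python
def per_base_coverage(read_intervals, region_start, region_end):
    """Count how many reads cover each base in [region_start, region_end).

    read_intervals: iterable of (start, end) half-open alignment intervals.
    Returns a list with one coverage value per base of the region.
    """
    length = region_end - region_start
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in read_intervals:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:  # read overlaps the region
            diff[s] += 1
            diff[e] -= 1

    coverage = []
    running = 0
    for delta in diff[:length]:
        running += delta
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    reads = [(0, 5), (3, 8), (4, 6)]
    print(per_base_coverage(reads, 0, 8))
```

The difference-array approach touches each read once and each base once, which matters when amplicon coverage reaches tens of thousands of reads per position.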

Polymorphism detection within BRCA1

BRCA1 variant detection study
- Human tumour suppressor gene
- Primarily expressed in breast tissue
- Helps repair damaged DNA (if possible)
- Mutations to BRCA1 allow uncontrolled replication of damaged cells

Polymorphism detection within BRCA1 (continued)

Pilot Study
- Search for variants in the BRCA1 gene
- Use gene-specific PCR to amplify exons only, to maximise depth of coverage
- Multiplexed: 11 samples loaded into one lane
- Use CASAVA for de-multiplexing, alignment
- Use SAMtools for consensus/indel calling, filtering
- Validation of results against known variants

Workflow
1. CASAVA: demultiplex (11 samples); map reads to reference (BRCA1)
2. SAMtools: conversion to SAM format; conversion to pileup format; consensus/indel calling; filter for variants
3. Comparison with known variants
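The last two workflow steps (filtering candidate variants, then comparing them with known variants) can be illustrated with a toy example. This is a sketch under stated assumptions, not SAMtools' consensus model: the per-position base counts stand in for a parsed pileup, and the function names and the depth and allele-fraction thresholds are all hypothetical.

```python
def call_variants(pileup, min_depth=20, min_alt_fraction=0.2):
    """Return (position, ref_base, alt_base) for well-supported mismatches.

    pileup: iterable of (position, ref_base, {base: read_count}).
    min_depth, min_alt_fraction: illustrative filter thresholds.
    """
    calls = []
    for pos, ref, counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust this position
        for base, n in sorted(counts.items()):
            if base != ref and n / depth >= min_alt_fraction:
                calls.append((pos, ref, base))
    return calls


def split_known_novel(calls, known_variants):
    """Partition calls into those present in a known-variant set and the rest."""
    known = [c for c in calls if c in known_variants]
    novel = [c for c in calls if c not in known_variants]
    return known, novel


if __name__ == "__main__":
    pileup = [
        (100, "A", {"A": 50, "G": 48}),  # heterozygous-looking site
        (101, "C", {"C": 99, "T": 1}),   # likely sequencing error
        (102, "G", {"G": 3, "T": 5}),    # below depth threshold
        (103, "T", {"T": 10, "C": 30}),
    ]
    calls = call_variants(pileup)
    print(split_known_novel(calls, {(100, "A", "G")}))
```

With deep amplicon coverage, a fixed allele-fraction cut-off is a crude filter; it is used here only to make the filter-then-validate order of the workflow concrete.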

RNA-Seq: Transcriptome Analysis

RNA-Seq
- Sequence RNA (reverse-transcribed to cDNA)
- Mapped to annotated reference genome (annotated genes, known variants)
- Expression levels deduced from total number of reads that map to exons of a gene

RNA-Seq versus Microarray
- More sensitive to low-abundance transcripts
  - absolute gene expression levels detectable
  - can detect single molecules
  - no prior knowledge of content required
- Greater ability to distinguish isoforms
- Ability to determine allelic expression
- Less biased
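Deducing expression from the number of reads mapping to a gene's exons can be sketched as a simple counting step followed by a length and depth normalisation. The slides do not name a normalisation; RPKM (reads per kilobase of exon per million mapped reads) is used below as one common choice, and the function names and the use of read start positions as a stand-in for full alignments are assumptions.

```python
def count_reads_per_gene(read_positions, gene_exons):
    """Count reads whose start position falls inside any exon of each gene.

    read_positions: list of mapped read start coordinates.
    gene_exons: {gene_name: [(exon_start, exon_end), ...]}, half-open.
    """
    counts = {gene: 0 for gene in gene_exons}
    for pos in read_positions:
        for gene, exons in gene_exons.items():
            if any(start <= pos < end for start, end in exons):
                counts[gene] += 1
    return counts


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


if __name__ == "__main__":
    exons = {"geneA": [(100, 200), (300, 400)], "geneB": [(1000, 1500)]}
    reads = [110, 150, 350, 250, 1200, 1499, 1500]
    counts = count_reads_per_gene(reads, exons)
    print(counts)
    print(rpkm(counts["geneA"], 200, len(reads)))
```

Dividing by exon length keeps long genes from appearing more highly expressed simply because they collect more reads, and dividing by total mapped reads makes samples of different sequencing depth comparable.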

RNA-Seq: Transcriptome Analysis

RNA-Seq study of ovarian cancer cell lines
- Identification of changes in gene expression in strains with acquired drug resistance
- Special interest in ncRNA expression data
- Use Bowtie and TopHat to map reads, identify splice sites
- Use Cufflinks to assemble transcripts, calculate abundances
- ~87% of reads mapped to genome
- Use DESeq to perform differential expression tests
- Use DAVID (Database for Annotation, Visualisation and Integrated Discovery, http://david.abcc.ncifcrf.gov/) for pathway analysis
- Found significant representation of cancer pathways and focal adhesion genes

Workflow
1. Bowtie: maps reads to reference genome (hg19)
2. TopHat: identifies splice sites (known/novel)
3. Cufflinks: transcript assembly, quantification
4. DESeq: differential gene expression of RNA-Seq data
5. DAVID: pathway analysis of deregulated genes
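Before counts from different samples can be compared in the differential expression step, they need to be put on a common scale. The sketch below shows a median-of-ratios size-factor normalisation in the spirit of what DESeq does, followed by a plain log2 fold change. It is a simplified illustration, not the DESeq package: the statistical test itself is omitted, and the function names, the pseudocount and the toy data are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factor for each sample.

    counts: {gene: [raw count in sample 0, sample 1, ...]}.
    Genes with a zero count in any sample are skipped.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if any(v == 0 for v in values):
            continue  # geometric mean would be zero
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def log2_fold_change(counts, factors, control_idx, treated_idx, pseudocount=1.0):
    """log2(treated / control) of mean normalised counts, per gene."""
    result = {}
    for gene, values in counts.items():
        control = sum(values[j] / factors[j] for j in control_idx) / len(control_idx)
        treated = sum(values[j] / factors[j] for j in treated_idx) / len(treated_idx)
        result[gene] = math.log2((treated + pseudocount) / (control + pseudocount))
    return result


if __name__ == "__main__":
    # Toy data: sample 1 was sequenced twice as deeply as sample 0.
    counts = {"g1": [10, 20], "g2": [100, 200], "g3": [50, 100]}
    factors = size_factors(counts)
    print(factors)
    print(log2_fold_change(counts, factors, [0], [1]))
```

In the toy data every gene doubles only because of sequencing depth, so after normalisation the fold changes come out at approximately zero, which is the behaviour the normalisation is meant to provide.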

ChIP-Seq: Genome-wide protein-DNA interactions

ChIP-Seq
- Chromatin immunoprecipitation (ChIP) isolates protein-bound DNA
- Followed by deep sequencing of DNA fragments (Seq)
- Facilitates genome-wide mapping of DNA-protein interactions
- Shows how TFs and other chromatin-associated factors can affect phenotype
- Regulation/structural analysis

ChIP-Seq vs. ChIP-chip
- No prior knowledge of content required

A similar approach can be used to map genomic methylation.

ChIP-Seq: Genome-wide protein-DNA interactions

ChIP-Seq study of Haematopoietic Stem Cells
- Interest in haematopoiesis and the genetic circuitry of blood cell development
- Tal1 (T-cell acute lymphocytic leukaemia protein 1): TF that controls development and differentiation of Haematopoietic Stem Cells (HSCs)
- Very few target genes had been validated
- ChIP-Seq approach taken to generate a genome-wide catalogue of Tal1 binding events in a stem cell line
- Use Illumina BeadStudio ChIP-Seq module to identify peaks (potential chromatin binding sites)
- Followed by in vivo validation (foetal liver, transgenic mice)
- Allows construction of an in vivo validated network of 17 factors and respective regulatory elements
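Peak identification amounts to finding genomic regions where ChIP reads pile up more than in a control sample. The sketch below uses fixed-width windows and a fold-enrichment cut-off purely for illustration; it is not the algorithm of the BeadStudio ChIP-Seq module, and the function name, window size and thresholds are hypothetical. It also assumes the ChIP and control libraries have comparable sequencing depth.

```python
from collections import Counter


def call_enriched_windows(chip_positions, control_positions,
                          window=200, min_reads=10, min_fold=4.0):
    """Return windows where ChIP read density exceeds the control.

    chip_positions, control_positions: mapped read start coordinates
    on one chromosome.
    Returns a list of (window_start, window_end, chip_reads, control_reads).
    """
    chip = Counter(pos // window for pos in chip_positions)
    control = Counter(pos // window for pos in control_positions)

    peaks = []
    for w in sorted(chip):
        n_chip = chip[w]
        n_ctrl = control.get(w, 0)
        fold = n_chip / (n_ctrl + 1)  # +1 avoids division by zero
        if n_chip >= min_reads and fold >= min_fold:
            peaks.append((w * window, (w + 1) * window, n_chip, n_ctrl))
    return peaks


if __name__ == "__main__":
    chip_reads = [1000 + i for i in range(20)] + [5000, 5010]
    control_reads = [1005, 5001]
    print(call_enriched_windows(chip_reads, control_reads))
```

Windows flagged this way are only candidate binding sites, which is consistent with the slide's point that the peaks were followed up by in vivo validation.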
