Multiplex target enrichment using DNA indexing for ultra-high throughput variant detection


Dr Elaine Kenny
Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin

Schizophrenia - background
- Common, complex brain disorder
- Presents with a significant heterogeneity of symptoms
  - Psychotic symptoms: delusions and hallucinations
  - Negative symptoms: affective flattening, alogia and avolition
  - Cognitive deficits: working memory, attention
- Age of onset: males between the ages of 16 and 30; females largely presenting after the age of 30
- Variable course of illness; complete remission is probably uncommon

Schizophrenia - background
- Chronic, debilitating
- Devastating for individual, family and society
- Life expectancy reduced by 10 years
- ~20% of affected in full-time employment
- Substantial societal burden from disease
- Biology poorly understood and treatments partially effective
- Higher incidence in men

Heritability of Schizophrenia

International Schizophrenia Consortium (ISC) GWAS study

Chromosomal Deletions
- Large chromosomal deletions (100 kb to 3 Mb) occur more frequently in schizophrenia cases than in healthy controls
- Most of these deletions are spread across the genome and do not yet implicate specific genes in schizophrenia
- However, some schizophrenia deletions do cluster at certain regions, implicating specific genes in the illness

Where are the rest of the risk genes? Few replicated common SNPs for schizophrenia

Challenges for common disorders

How do we find the disease-causing rare variants?
- Sequence whole genomes: expensive!
  - Sample number required for statistical power means huge costs
  - A large portion of the genome will be non-functional and unlikely to harbour risk mutations
- From Mendelian disease, we know that
  (i) mutations causing amino acid changes account for ~60% of disease mutations
  (ii) small indels in genes account for ~25% of disease mutations
  (iii) <1% of disease mutations have been found in regulatory regions

Target Enrichment
- Best compromise
- Sequence portions of the genome
- Genomic regions of interest, e.g. genes, promoters etc.
- Exome

The logical extension of sample pooling is to perform multiplexed target enrichments in which many samples are barcoded before capture

Target enrichment workflow
- Library preparation (<3 µg; Illumina SE/PE or SOLiD)
- Hybridization / capture (0.5 µg, 24 hours)
- Baits:
  - cRNA probes
  - Long (120 bp)
  - Biotin labeled
  - User-defined (eArray)
  - SurePrint synthesis
- Bead separation
- Wash / elution / amplification

- Whole genome: 1 sample per flowcell; 25K per genome (direct consumables cost)
- Whole exome: 8 samples per flowcell; 2.5K per exome (direct consumables cost)
- N=200 genes: 100+ samples per flowcell; 250 per sample (direct consumables cost)
x3 Illumina flowcell: 3 Gb sequence post QC and alignment

How to barcode/index samples?

Sequencing library prep
- Shear: sonicate
- End repair: T4 and Klenow DNA polymerases, T4 PNK
- Add A base: Klenow exo-

Ligation of adapters (diagram: adapters with a T overhang joined to A-tailed fragments)

Ligation of indexed adapters (diagram: indexed adapters with a T overhang joined to A-tailed fragments)

Indexing / barcoding of DNA samples
- Each sample is tagged with its own index: AACCAT, CAACCT, GCATGT, TCAGTT
- Target enrichment (single Agilent SureSelect reaction)
- Sequencing (single lane of an Illumina GA II)

Sequencing Output: indexed 40bp reads AACCATTCCGTGTACTGACTGCTCGATATA CAACCTTCCGTGTACTGACTGCTCGATCTA AACCATTCCGTGTACTGACTGCTCGATATA GCATGTTCCGTGTACTGACTGCTCGATATA TCAGTTTCCGTCGATATA CAACCTTCCGTGTACTGACTGCTCGATATA AACCATTCCGTGTACTGACTGCTCGATATA TCAGTTTCCGTGTACTGACTGCTCGATATA GCATGTTCCGTGTACTGACTGCTCGATATA CAACCTTCCGTGTACTGACTGCTCGATATA GCATGTTCCGTGTACTGACTGCTCGATATA TCAGTTTCCGTGTACTGACTGCTCGATATA AACCATTCCGTGTACTGACTGCTCGATATA TCAGTTTCCGTCGATATA CAACCTTCCGTGTACTGACTGCTCGATCTA GCATGTTCCGTGTACTGACTGCTCGATATA Separate reads and analyze on a per-individual basis AACCATTCCGTGTACTGACTGCTCGATATA AACCATTCCGTGTACTGACTGCTCGATATA AACCATTCCGTGTACTGACTGCTCGATATA AACCATTCCGTGTACTGACTGCTCGATATA CAACCTTCCGTGTACTGACTGCTCGATATA CAACCTTCCGTGTACTGACTGCTCGATCTA CAACCTTCCGTGTACTGACTGCTCGATATA CAACCTTCCGTGTACTGACTGCTCGATCTA SNP Detection GCATGTTCCGTGTACTGACTGCTCGATATA GCATGTTCCGTGTACTGACTGCTCGATATA GCATGTTCCGTGTACTGACTGCTCGATATA GCATGTTCCGTGTACTGACTGCTCGATATA TCAGTTTCCGTGTACTGACTGCTCGATATA TCAGTTTCCGTGTACTGACTGCTCGATATA TCAGTTTCCGTGTACTGACTGCTCGATATA TCAGTTTCCGTGTACTGACTGCTCGATATA CNV Detection
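The read-separation step can be illustrated with a minimal Python sketch. It assumes the 6-base index sits at the start of each read and is matched exactly; the sample labels and the `demultiplex` function are hypothetical names for illustration, not part of the pipeline used in the talk.

```python
from collections import defaultdict

# Index sequences as shown on the slide; the sample labels are hypothetical.
INDEXES = {
    "AACCAT": "sample_1",
    "CAACCT": "sample_2",
    "GCATGT": "sample_3",
    "TCAGTT": "sample_4",
}
INDEX_LENGTH = 6


def demultiplex(reads, indexes=INDEXES, index_length=INDEX_LENGTH):
    """Group reads by their leading index and strip the index from each read.

    Reads whose prefix matches no known index go under "unassigned".
    Matching is exact: this sketch does not tolerate index mismatches.
    """
    by_sample = defaultdict(list)
    for read in reads:
        prefix, insert = read[:index_length], read[index_length:]
        sample = indexes.get(prefix, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# A few reads copied from the pooled output on the slide.
pooled_reads = [
    "AACCATTCCGTGTACTGACTGCTCGATATA",
    "CAACCTTCCGTGTACTGACTGCTCGATCTA",
    "GCATGTTCCGTGTACTGACTGCTCGATATA",
    "TCAGTTTCCGTCGATATA",
    "CAACCTTCCGTGTACTGACTGCTCGATATA",
]

per_sample = demultiplex(pooled_reads)
for sample, inserts in sorted(per_sample.items()):
    print(sample, len(inserts), inserts)
```

Once reads are grouped this way, each sample can be aligned and genotyped on its own, which is what makes per-individual SNP and CNV calling possible after a pooled capture.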

SureSelect pilot study: combine indexed samples and target enrich using 1 capture library
- Targeted resequencing of 9 HapMap samples
- Single non-indexed sample library
- Single indexed sample library
- 3-index sample library
- 9-index sample library

Sequence coverage across PTBP2 for indexed vs non-indexed sample

Results: number of sequence reads per indexed sample in sequenced libraries (pre-alignment to reference genome)
- 3 samples: median coverage of 41x
- 9 samples: median coverage of 11x
- Overall SNP concordance >99%

Target enrichment achieved

                                                    non-index   1-index   3-index     9-index
                                                    sample      sample    sample [1]  sample [1]
Percentage reads in targeted regions +/- 50bp [2]   20%         22%       23%         19%
Fold enrichment in targeted regions [3]             1708        1885      1912        1608
Percentage target bases covered [4]                 98%         98%       98%         93%
Median coverage of target [5]                       169x [6]    93x [6]   41x         11x

[1] Average values given for multi-sample libraries.
[2] Number of reads uniquely mapping to the target region (+/-50bp) as a % of the number of reads uniquely mapping to hg18.
[3] (Sequence reads uniquely mapping to the target regions / sequence reads mapping to hg18) x Maximum Enrichment, where Maximum Enrichment is the ratio of genome length (3,080,419,510bp) to target length (377,388bp).
[4] Percentage of target bases covered by at least one sequence read.
[5] (Number of 34bp reads matching target x 34) / target length.
[6] The difference in median read coverage between the non-indexed and indexed sample is reflective of the higher number of clusters on the flowcell and also the higher number of clusters passing QC filters in the non-indexed sample (83.48% vs 57.65%,
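The fold-enrichment and coverage formulas in footnotes 3 and 5 can be written out as a short Python sketch. The genome length, target length and read length come from the footnotes; the function names and the read counts in the usage lines are hypothetical and are not the study's data.

```python
GENOME_LENGTH_BP = 3_080_419_510  # genome length quoted in footnote 3
TARGET_LENGTH_BP = 377_388        # target length quoted in footnote 3
READ_LENGTH_BP = 34               # read length quoted in footnote 5


def fold_enrichment(on_target_reads, mapped_reads,
                    genome_length=GENOME_LENGTH_BP,
                    target_length=TARGET_LENGTH_BP):
    """Footnote 3: on-target fraction scaled by the maximum possible enrichment."""
    max_enrichment = genome_length / target_length
    return (on_target_reads / mapped_reads) * max_enrichment


def coverage_of_target(on_target_reads,
                       read_length=READ_LENGTH_BP,
                       target_length=TARGET_LENGTH_BP):
    """Footnote 5: on-target bases divided by target length."""
    return on_target_reads * read_length / target_length


# Hypothetical counts for illustration only.
example_on_target = 1_000_000
example_mapped = 5_000_000

print(round(GENOME_LENGTH_BP / TARGET_LENGTH_BP))            # maximum enrichment
print(fold_enrichment(example_on_target, example_mapped))    # fold enrichment
print(coverage_of_target(example_on_target))                 # coverage of target
```

With a target this small, the maximum enrichment is roughly 8,000-fold, so an on-target fraction of about one fifth already corresponds to well over a thousand-fold enrichment.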

CNV detection
- Targeted 17 CNVs known to be polymorphic in HapMap samples

CNV detection in raw data: 9-index sample
- Known CNV region targeted on chr 22
- Strong correlation between sequence coverage and CNV genotype in the indexed samples (rho=1, p<0.0005)
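The slide reports rho without naming the statistic. The sketch below assumes it is Spearman's rank correlation and shows how such a value could be computed in plain Python. The copy-number and coverage values are hypothetical, chosen only to show a perfectly monotonic relationship.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: known copy number and observed coverage per sample.
copy_number = [1, 2, 3, 4]
coverage = [12.0, 25.5, 33.0, 51.2]
print(spearman_rho(copy_number, coverage))
```

A rank-based measure suits this comparison because copy number is an ordered category rather than a continuous quantity.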

Development of CNV detection algorithm
Where raw read numbers are not even across indexed samples, how can CNVs be identified?

Development of CNV detection algorithm Step 1: Normalise read counts

Development of CNV detection algorithm Step 2: Identify regions where normalised read counts differ significantly between samples
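The two steps can be sketched in Python under stated assumptions. The slides do not give the normalisation method or the significance test, so this sketch normalises each sample by its total read count and substitutes a simple ratio threshold against the cross-sample median for the statistical comparison. The thresholds, sample names, region names and counts are all hypothetical.

```python
from statistics import median


def normalise_counts(counts):
    """Step 1: convert raw counts {sample: {region: n}} to fractions of each sample's total."""
    normalised = {}
    for sample, regions in counts.items():
        total = sum(regions.values())
        normalised[sample] = {region: n / total for region, n in regions.items()}
    return normalised


def flag_candidate_cnvs(normalised, low=0.75, high=1.25):
    """Step 2 (stand-in): flag sample/region pairs whose ratio to the
    cross-sample median falls outside [low, high]. A real analysis would
    use a statistical test here; the thresholds are assumptions."""
    samples = sorted(normalised)
    regions = sorted(normalised[samples[0]])
    calls = []
    for region in regions:
        reference = median(normalised[s][region] for s in samples)
        for s in samples:
            ratio = normalised[s][region] / reference
            if ratio < low or ratio > high:
                calls.append((s, region, round(ratio, 2)))
    return calls


# Hypothetical raw counts: three samples sequenced to different depths,
# with sample_C carrying a candidate deletion in region_3.
raw_counts = {
    "sample_A": {"region_1": 100, "region_2": 100, "region_3": 100, "region_4": 100},
    "sample_B": {"region_1": 200, "region_2": 200, "region_3": 200, "region_4": 200},
    "sample_C": {"region_1": 300, "region_2": 300, "region_3": 150, "region_4": 300},
}

calls = flag_candidate_cnvs(normalise_counts(raw_counts))
print(calls)
```

Normalising first matters because sample_B has twice the raw reads of sample_A everywhere, yet neither should be called as a CNV. Only the region where one sample departs from the others after normalisation is flagged.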

Application to schizophrenia and autism
Agilent eArray design
- Coding exons (210 genes): 687,784 bp
- Expanded exons: 742,017 bp (exons <121 bp in size were expanded to 121 bp so that all exonic sequence would be targeted by 2 cRNA baits)
- Baited sequence: 1,033,568 bp
- Coverage of exons by cRNA baits: 98%
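The exon expansion rule can be illustrated with a small Python sketch. The 121 bp minimum is from the slide. The symmetric padding, the half-open coordinate convention and the function name are assumptions, since the slide does not say how the extra bases were distributed.

```python
MIN_TARGET_BP = 121  # minimum target size stated on the slide


def expand_exon(start, end, min_length=MIN_TARGET_BP):
    """Pad a half-open interval [start, end) up to min_length.

    Padding is split as evenly as possible between the two sides
    (an assumption). Intervals already long enough are returned unchanged.
    """
    length = end - start
    if length >= min_length:
        return start, end
    shortfall = min_length - length
    left = shortfall // 2
    right = shortfall - left
    new_start = max(0, start - left)
    # Any padding that could not be applied on the left (because the
    # coordinate would go below 0) is added on the right instead.
    unused_left = left - (start - new_start)
    new_end = end + right + unused_left
    return new_start, new_end


# Hypothetical exon coordinates.
print(expand_exon(1000, 1050))  # short exon, padded on both sides
print(expand_exon(500, 800))    # long exon, unchanged
```

Padding short exons is what raises the design from the coding-exon total to the larger expanded-exon total quoted above.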

How many DNA samples can be indexed in a single enrichment and sequencing reaction?
- Specificity and sensitivity of enrichment reaction (target = ~1 Mb)
- Quantity of sequence reads post QC and alignment (80 bp PE sequencing)
- 16-24 samples per lane
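How these factors combine can be sketched as a back-of-envelope calculation in Python. The ~1 Mb target and 80 bp read length are from the slide. The reads per lane, on-target fraction and desired coverage are hypothetical inputs, and the calculation assumes every index is equally represented in the pool.

```python
def samples_per_lane(reads_per_lane, read_length_bp, on_target_fraction,
                     target_size_bp, desired_mean_coverage):
    """Estimate how many indexed samples fit in one lane.

    Usable on-target bases in the lane are divided by the bases each
    sample needs to reach the desired mean coverage of the target.
    """
    on_target_bases = reads_per_lane * read_length_bp * on_target_fraction
    bases_per_sample = target_size_bp * desired_mean_coverage
    return int(on_target_bases // bases_per_sample)


# Target size and read length follow the slide; the rest are assumptions.
estimate = samples_per_lane(
    reads_per_lane=20_000_000,    # hypothetical reads passing QC and aligning
    read_length_bp=80,
    on_target_fraction=0.5,       # hypothetical enrichment specificity
    target_size_bp=1_000_000,
    desired_mean_coverage=40,     # hypothetical coverage goal
)
print(estimate)
```

The estimate scales directly with enrichment specificity and read yield, which is why both appear on the slide as the factors that set the number of samples per lane.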

Acknowledgements
Dr Derek Morris, Dr Paul Cormican, Dr Eleisa Heron, William Gilks, Sarah Furlong, Dr Colm O'Dushlaine, Dr Carlos Pinto, Dr Ric Anney, Dr Aiden Corvin, Dr Louise Gallagher, Prof Michael Gill
www.medicine.tcd.ie/sequencing