Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Similar documents
Analysis with SureCall 2.1

DNA-seq Bioinformatics Analysis: Copy Number Variation

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Using the Bravo Liquid-Handling System for Next Generation Sequencing Sample Prep

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

Advance Your Genomic Research Using Targeted Resequencing with SeqCap EZ Library

Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA

Identification of genomic alterations in cervical cancer biopsies by exome sequencing

Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS

Performance Characteristics BRCA MASTR Plus Dx

Multiplex target enrichment using DNA indexing for ultra-high throughput variant detection

Supplementary Methods

Accel-Amplicon Panels

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics

Hands-On Ten The BRCA1 Gene and Protein

Data mining with Ensembl Biomart. Stéphanie Le Gras

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

ncounter Data Analysis Guidelines for Copy Number Variation (CNV) Molecules That Count NanoString Technologies, Inc.

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

Introduction to LOH and Allele Specific Copy Number User Forum

A complete next-generation sequencing workfl ow for circulating cell-free DNA isolation and analysis

Frequency(%) KRAS G12 KRAS G13 KRAS A146 KRAS Q61 KRAS K117N PIK3CA H1047 PIK3CA E545 PIK3CA E542K PIK3CA Q546. EGFR exon19 NFS-indel EGFR L858R

Golden Helix s End-to-End Solution for Clinical Labs

Below, we included the point-to-point response to the comments of both reviewers.

Fluxion Biosciences and Swift Biosciences Somatic variant detection from liquid biopsy samples using targeted NGS

SUPPLEMENTARY INFORMATION

MEDICAL GENOMICS LABORATORY. Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG)

NGS in tissue and liquid biopsy

Investigating rare diseases with Agilent NGS solutions

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University.

CITATION FILE CONTENT/FORMAT

Identifying Mutations Responsible for Rare Disorders Using New Technologies

Next Generation Sequencing as a tool for breakpoint analysis in rearrangements of the globin-gene clusters

Reporting TP53 gene analysis results in CLL

Supplementary Figure 1. Schematic diagram of o2n-seq. Double-stranded DNA was sheared, end-repaired, and underwent A-tailing by standard protocols.

Illumina Trusight Myeloid Panel validation A R FHAN R A FIQ

6/12/2018. Disclosures. Clinical Genomics The CLIA Lab Perspective. Outline. COH HopeSeq Heme Panels

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc.

Molecular Characterization of Tumors Using Next-Generation Sequencing

Implementation of BRCA Oncomine panel for germline and somatic variant analysis

Supplementary Figure 1. Estimation of tumour content

Assessing Laboratory Performance for Next Generation Sequencing Based Detection of Germline Variants through Proficiency Testing

Detection of low-frequent mitochondrial DNA variants using SMRT sequencing

Nature Biotechnology: doi: /nbt.1904

Evaluation of MIA FORA NGS HLA test and software. Lisa Creary, PhD Department of Pathology Stanford Blood Center Research & Development Group

2/10/2016. Evaluation of MIA FORA NGS HLA test and software. Disclosure. NGS-HLA typing requirements for the Stanford Blood Center

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

STREAMLINED MUTATION ANALYSIS FOR CLINICAL NEXT GENERATION SEQUENCING DATA

No mutations were identified.

Chip Seq Peak Calling in Galaxy

Whole Genome and Transcriptome Analysis of Anaplastic Meningioma. Patrick Tarpey Cancer Genome Project Wellcome Trust Sanger Institute

Detection of aneuploidy in a single cell using the Ion ReproSeq PGS View Kit

Characterisation of structural variation in breast. cancer genomes using paired-end sequencing on. the Illumina Genome Analyser

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

Colorspace & Matching

Genomic structural variation

Supplementary Appendix

Supplemental Data: Detailed Characteristics of Patients with MKRN3. Patient 1 was born after an uneventful pregnancy. She presented in our

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

TOWARDS ACCURATE GERMLINE AND SOMATIC INDEL DISCOVERY WITH MICRO-ASSEMBLY. Giuseppe Narzisi, PhD Bioinformatics Scientist

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

Introduction of an NGS gene panel into the Haemato-Oncology MPN service

Supplemental Information For: The genetics of splicing in neuroblastoma

Cytogenetics 101: Clinical Research and Molecular Genetic Technologies

AD (Leave blank) TITLE: Genomic Characterization of Brain Metastasis in Non-Small Cell Lung Cancer Patients

Nature Genetics: doi: /ng Supplementary Figure 1. PCA for ancestry in SNV data.

MEDICAL GENOMICS LABORATORY. Peripheral Nerve Sheath Tumor Panel by Next-Gen Sequencing (PNT-NG)

CRISPR/Cas9 Enrichment and Long-read WGS for Structural Variant Discovery

NGS in Cancer Pathology After the Microscope: From Nucleic Acid to Interpretation

Supplementary information to:

Cancer Gene Panels. Dr. Andreas Scherer. Dr. Andreas Scherer President and CEO Golden Helix, Inc. Twitter: andreasscherer

Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer

ChIP-seq data analysis

Cancer Informatics Lecture

Somatic cancer applications of NGS in in vitro Diagnostics.

Please Silence Your Cell Phones. Thank You

Nature Neuroscience: doi: /nn Supplementary Figure 1. Missense damaging predictions as a function of allele frequency

MSI positive MSI negative

Simple, rapid, and reliable RNA sequencing

Small RNA Sequencing. Project Workflow. Service Description. Sequencing Service Specification BGISEQ-500 SERVICE OVERVIEW SAMPLE PREPARATION

Andrew Parrish, Richard Caswell, Garan Jones, Christopher M. Watson, Laura A. Crinnion 3,4, Sian Ellard 1,2

COMPARISON OF HIV DRUG-RESISTANT MUTANT DETECTION BY NGS WITH AND WITHOUT UNIQUE MOLECULAR IDENTIFIERS (UMI)

5 th July 2016 ACGS Dr Michelle Wood Laboratory Genetics, Cardiff

FONS Nové sekvenační technologie vklinickédiagnostice?

CHR POS REF OBS ALLELE BUILD CLINICAL_SIGNIFICANCE

MEDICAL GENOMICS LABORATORY. Non-NF1 RASopathy panel by Next-Gen Sequencing and Deletion/Duplication Analysis of SPRED1 (NNP-NG)

SALSA MLPA KIT P050-B2 CAH

Title:Exome sequencing helped the fine diagnosis of two siblings afflicted with atypical Timothy syndrome (TS2)

Selective depletion of abundant RNAs to enable transcriptome analysis of lowinput and highly-degraded RNA from FFPE breast cancer samples

Bioinformatics Laboratory Exercise

SALSA MLPA probemix P315-B1 EGFR

High Throughput TruSeq Stranded mrna Library Construction on the Biomek FX P

MRC-Holland MLPA. Description version 08; 30 March 2015

Global variation in copy number in the human genome

The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

LTA Analysis of HapMap Genotype Data

SALSA MLPA probemix P241-D2 MODY mix 1 Lot D As compared to version D1 (lot D1-0911), one reference probe has been replaced.

Illuminating the genetics of complex human diseases

Detection of copy number variations in PCR-enriched targeted sequencing data

Transcription:

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Application Note Authors John McGuigan, Megan Manion, Kevin LeVan, CS Jonathan Liu Introduction The HaloPlex Target Enrichment System allows for rapid targeted sequencing and library preparation in a single-tube reaction with no need for library preparation and support for up to 96 barcoded samples in one reaction. It works in 4 steps (figure 1). 1. Digest sample DNA A DNA sample is fragmented using restriction enzymes, and denatured. 2. Hybridize probes The probe library is added and hybridized to the targeted fragments. Each probe is an oligonucleotide designed to hybridize to both ends of a targeted DNA restriction fragment, thereby guiding the targeted fragments to form circular DNA molecules. The probe also contains a method-specific sequencing motif that is incorporated during circularization. In addition, a sample barcode sequence is incorporated in this step. 3. Purify and ligate targets The HaloPlex probes are biotinylated and the targeted fragments can therefore be retrieved with magnetic streptavidin beads. The circular molecules are then closed by ligation, a very precise reaction that ensures that only perfectly hybridized fragments are circularized. 4. Amplify targeted fragments with PCR Only circular DNA targets are amplified, providing an enriched and barcoded amplification product that is ready for sequencing. Figure 1. HaloPlex Workflow This application note covers analysis of this data when sequencing is performed on Illumina HiSeq or MiSeq sequencing systems.

NextGENe allows for simple pointand-click project setup in an easyto-use Windows environment running on relatively low-cost hardware. It includes several features that make the analysis of this data fast and simple. After alignment the results can be viewed in the NextGENe Viewer with access to dbsnp, dbnsfp, and COSMIC databases. Figure 2 shows a screenshot from HaloPlex Cardiac Disease Panel targeted sequencing. The called mutation was validated using Sanger sequencing, in this sample. Procedure A given dataset can be processed in minutes on a desktop computer running a 64-bit Windows operating system. Several paired-end datasets were processed in this analysis, all sequenced on an Illumina MiSeq Personal Sequencer: Figure 2. c.2763delc in KCNH2 detected at approximately 23% frequency 2x Cardiomyopathy 3x Noonan Spectrum Disorder 3x Collagen Tissue Disease Format Conversion (figure 3) The FASTQ files are first converted to FASTA format Basecall quality scores are used to filter and trim the raw data using default settings for Illumina data. The Paired Reads Data option is selected to keep the paired reads on the same line in each file. Adapter sequence is trimmed at the same time according to a text file. Each sequence is on a separate line with 4 columns separated by tabs (NextGENe v2.3.1): Adapter name 5 Trim Sequence ( - is used here to skip 5 trimming) Figure 3. Format Conversion Settings 3 Trim Sequence two Illumina adapters are listed on separate lines AGATCGGAAGAGCACACGTCTGAA CTCCAGTCAC AGATCGGAAGAGCGTCGTGTAGGG AAAGAGTGTAGATCTCGGTGGTCG CCGTATCATT Trim mode P (Partial Match) is used here because we expect to see the beginning portion of the adapter sequence at the end of the read 2

Alignment to the human genome Alignment is performed against a pre-index human genome reference in order to avoid false positives caused by nonspecific amplification of untargeted regions Mismatches are called as variants in the mutation report if they pass the mutation filter. Settings used for the MiSeq projects in this analysis required more than 30x coverage, 5% frequency, and 15 total reads with the mutation (Figure 4). The Load Paired Reads option is selected in order to keep track of paired read associations. Project review The mutation report, expression report, and coverage curve reports are filtered by loading a BED file specifying amplicon or targeted loci regions. The latter two reports provide information about coverage in the regions of interest. The mutation report has many additional filtering options. Figure 4. Alignment and Mutation filter settings 3

Results Figure 5 shows the fraction of converted reads that were aligned for each sample, while figure 6 lists the fraction of converted reads aligned in the targeted regions. Results were very consistent for the samples within each panel. Cardiac Disease Panel results are in blue, Noonan Spectrum Disorder Panel results are in green, and Collagen Tissue Disease Panel results are in red. Figure 7 shows the average coverage in the targeted regions for each sample and figure 8 shows the fraction of targeted bases that had at least 30x coverage. The information for figures 5, 6, 7, and 8 can easily be obtained from NextGENe s Coverage Curve report. The report (when filtered with a BED file describing targeted regions) will list any regions below a set threshold of coverage and also provide a summary view (figure 9). Aligned Reads (% of Converted) 100.00% 80.00% 60.00% 40.00% 20.00% 0.00% Figure 5. Alignment results for the entire reference genome Aligned in Targeted Regions (% of Converted) 100.00% 80.00% 60.00% 40.00% 20.00% 0.00% Figure 6. Alignment results for the targeted regions 4

Average Coverage in Targeted Regions 10,000 1,000 100 10 1 Figure 7. Average coverage in the targeted regions 100.00% 30x Coverage (% of Targeted Regions) 80.00% 60.00% 40.00% 20.00% 0.00% Figure 8. Fraction of the targeted regions with 30 or more coverage Figure 9 Coverage Curve Summary report for a Cardiac Disease Panel sample. 5

All eight samples were successful in finding previously validated variants. One sample had a CNV variant that was detected using a new analysis that will be available in NextGENe v2.3.1. Mutation calling results are summarized in table 1. Figure 10 shows the deletion found in the first Noonan Spectrum Disorder sample. Figure 10. A 4 bp deletion in the NF1 gene found in a Noonan Spectrum Disorder sample Table 1. Mutation Results Panel Gene Mutation Frequency Chr Position Note Cardiac Disease Noonan Spectrum Disorder Collagen Tissue Disease KCNH2 Del Exon 11 N/A 7 ~150,645,530 150,645,631 CNV KCNH2 delc 22.58 7 150,644,896 NF1 delttac 8.87 17 29,665,754 SOS1 G>A 45.83 2 39,250,272 RAF1 C>T 45.43 3 12,645,699 rs80338796 COL5A2 G>A 44.78 2 189,906,320 TGFBR2 T>C 48.05 3 30,732,945 FBN1 deltg 40.68 15 48,752,496 6

The new CNV analysis module is based on the median coverage in each targeted region after global normalization. A log-2 ratio is calculated in a case-control comparison. A negative ratio around -1 indicates a deletion in the sample relative to the control, while a positive ratio indicates duplication. In this analysis the two cardiac disease panel samples were compared, with the first treated as the sample and the second treated as the control. Three possible CNV changes with a ratio value greater than 0.6 were found, including the deletion of exon 11 in the KCNH2 gene. Results are shown in figure 11. Figure 11. CNV analysis results Discussion When choosing alignment settings, it is important to consider the expected results. Increasing the minimum depth of coverage will reduce the number of false positives (even at lower mutation frequencies), but it may also decrease sensitivity. Setting a minimum number of mutant allele reads will allow for detection of low frequency variants in high coverage regions without allowing low frequency false positives to be called in low coverage regions. The coverage curve summary report is very useful for measuring potential loss of sensitivity because it reports the regions that did not meet a minimum level of coverage (figure 12). Figure 12. The Coverage Curve Report for a cardiomyopathy panel. Portions of targeted regions with less than 50x coverage are highlighted in pink and reported in a table. 7

NextGENe s whole genome alignment algorithm has three steps- Match perfect reads, match reads with some number of mismatches, and finally a seeded alignment. Shorter amplicons will benefit from the second step because there may not be enough bases on either side of a mutation to align perfectly matching seeds. Longer seed sizes will improve alignment specificity, and fewer seeds will improve speed. After alignment several projects can be compared and filtered against one another. Compound heterozygous, shared/different, low coverage, and Mendelian inheritance filtering are all possible (figure 13). After calling mutations, some functional prediction information is available from the dbnsfp v1.x database (1). This includes PolyPhen-2, SIFT, MutationTaster, LRT, and PhyloP in addition to 1000 genomes frequencies. The Sanger COSMIC (Catalog of Somatic Mutations in Cancer) database can also be imported (2). Acknowledgements We would like to thank Agilent Technologies and Berivan Baskin (Clinical Genetics, Uppsala University Hospital; The Centre for Applied Genetics; The Hospital for Sick Children) for supplying the data used in this analysis. References 1. Liu, X., Jian, X. & Boerwinkle, E. dbnsfp: A lightweight database of human nonsynonymous SNPs and their functional predictions. Human Mutation 32, 894-899 (2011). 2. Forbes, S.A. et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research 39, D945-D950 (2010). Learn more about HaloPlex: www.agilent.com/genomics Figure 13. Three Collagen Tissue Disease projects. A two bp deletion found in one of the projects is highlighted. Learn more about NextGENe: www.softgenetics.com For research use only. Not approved for use in diagnostics procedures. Information is subject to change without notice. HiSeq and MiSeq are trademarks or registered trademarks of Illumina, Inc. Agilent Technologies, Inc., 2012 Published in USA, September 24, 2012 Publication Number 5991-1203EN