Nature Biotechnology: doi: /nbt.1904


Supplementary Information

Comparison between assembly-based SV calls and array CGH results

Genome-wide array assessment of copy number changes, such as array comparative genomic hybridization (aCGH), is widely accepted for detecting copy number variations (CNVs), since CNVs in principle result from structural variation events. Array CGH typically has a resolution on the 10-kbp scale and does not define exact breakpoints, so it matches poorly with the length spectrum and breakpoint resolution of our assembly-based method. Nevertheless, a comparison between the two technologies is still informative. We performed aCGH between each pair of three genomes (the YH genome, the anonymous reference genome, and a Promega female sample (www.promega.com)) to identify putative aberrant copy number changes specific to the YH genome. In total, 144 CNVs were called, comprising 42 multi-probe and 102 single-probe signals. Using a reciprocal overlap threshold of 50%, we found that 20 (47.6%) multi-probe CNVs and 11 (10.7%) single-probe CNVs called by aCGH overlapped SVs from our assembly-based approach (Supplementary Dataset). Of note, 19 (61%) of the 31 overlapping CNVs had multiple SV events within their genomic ranges. For example, a copy number loss on chromosome X called by aCGH involved 60 different deletion and insertion events called from the assembly (Figure S8). This indicates that aCGH not only has ambiguous breakpoints but also aggregates multiple signals into a misleading average, losing the internal detail.

Comparison between SV call sets with other studies

The HuRef assembly (ref. 6), resolved by conventional Sanger sequencing, has better contiguity than both assemblies in our study. It is therefore interesting to see in which SV categories we still have room for improvement. A comparison between call sets (Table S4a) showed that our assembly-based SVs have a larger reciprocal overlap with HuRef SV calls than those of any other method, which indicates that our methods have better power to discover SVs. The overlap rate for small indels is much higher than that for large SVs, which could be because 1) large SVs are more individual-specific than small indels, as they have a greater deleterious impact and are under stronger negative selection; and/or 2) current assembly contiguity needs to be improved to include large SVs within a contig.

We also compared our call sets with the call sets of Pang et al. (ref. 7) from a range of technologies (Table S4b). Among the Pang et al. call sets, those called from SR mapping had the highest overlap with our call sets, rather than those called from PEM or arrays. Because indels called by the SR method are smaller than those from other methods, this again suggests that large indels are less likely to overlap between individuals than small ones.

Supplementary figures

Figure S1. Gapped alignment plot indicating a deletion, from the alignment between the NCBI36 chromosome 1 segment at coordinates 246,118,102 to 246,124,262 and scaffold6804 of the YH whole-genome de novo assembly, with the breakpoint at 8,953.
[Plot axes: position on scaffold 6804; position on chromosome 1.]

Figure S2. Illustration of how read-pair alignment and read depth change at sites of insertion or deletion in the reference and in the assembly. A well-assembled sequence should always show good paired-end alignment and read depth, whereas around indels in the reference a PE read would be aligned as two single-end reads, and the read depth of deleted regions in the reference should be very low.

Figure S3. A case of complex structural variation in the YH genome. The figure illustrates a ~22-kb inversion (pink line, shown as a cross between assembly and reference) on chromosome 10 with repetitive sequences (gray block) and several other events, including insertions ranging from 1 bp to ~18 kbp (green line and block) and deletions (violet line and block) within a hyper-mutation region (with over 10 insertion and deletion events spaced less than 200 bp apart). Read depth (uppermost line chart) and PE read alignment (middle curve chart) show why this SV event is difficult to detect with previous approaches, including RD, PEM and SR.

Figure S4. SV distribution across the whole genome and regions with significantly different numbers of SVs between the YH and NA18507 genomes. Histograms show the number of SVs per 1-Mb bin along each chromosome. Regions with a significant difference between the two genomes are marked in purple (YH higher than NA18507) and green (NA18507 higher than YH) to the right of the chromosomes.

Figure S5. Stacked histogram showing the proportion of SVs in different length ranges that overlap unique and repeat-annotated regions in the NCBI human reference genome build 36.

Figure S6. Venn diagram showing the gene features affected among genes overlapping SVs. CDS (green); 3'-UTR (red); 5'-UTR (blue); intron (yellow). Each number gives the count of genes with one or several gene features affected in the YH genome, followed by that in the NA18507 genome.

Figure S7. The frequency of structural variations detected in coding sequences (x-axis) showed a negative correlation with their mean length in bp (y-axis).

Figure S8. Comparison between array CGH signals and assembly-based SV calls, showing that aCGH signals are averaged from multiple smaller-scale SV events.
[Plot labels: Insertion; Deletion; 91.45 Mb; 11,438 bp; 38,282 bp; 92.18 Mb.]

Supplementary tables

Table S1. Primers, sequences of randomly selected structural variations and Sanger capillary sequencing results for PCR validation.
Table S1_PCR validation.xls

Table S2. (a) Summary of Fosmid sequence validation results. (b) Details, including chromosome and coordinates, of Fosmid sequence validation results.
Table S2_Fosmid validation.xls

Table S3. Structural variations predicted in the YH and NA18507 genomes were each compared with sets of variants discovered by alternative approaches. Numbers before the slash (/) are overlapping variants for the NA18507 genome; numbers after it are overlapping variants for the YH genome. A hyphen (-) means not applicable. The criterion FxOy extends the identified variants by x bp of flanking sequence on both sides of their breakpoints for comparison, and requires the intersection between the validated and the identified variants to cover at least y bp of the length of the union of the intervals. DIP (ref. 1), small indels found as gaps in the paired-end alignment between the Fosmid end sequences and the reference; ESP (ref. 2), large structural variants found by analyzing discordant Fosmid clone-end alignments; three separate sets of structural variants (maximum parsimony structural variation (MPSV) weighted, MPSV unweighted and probabilistic) predicted by VariationHunter (ref. 3); MoDIL (ref. 4), the set of variants predicted by the MoDIL utility; BreakDancer (ref. 5), a merged set of variants predicted by BreakDancerMini and BreakDancerMax. The dbSNP version 130 (v130) set refers to homozygous indels of 30 bp or shorter in dbSNP version 130. The BreakSeq-YRI set refers to variants predicted in NA18507 by BreakSeq using a breakpoint library.
Table S3_Computational validation.xls

Table S4. Comparison between SVs detected in the YH genome and those of Levy et al. (ref. 6) and Pang et al. (ref. 7).
Table S4_Compare to Levy and Pang.xlsx

Table S5. Classification of strongly conserved (dN/dS 0.1) genes containing SVs.
Table S5_gene function.xls

Supplementary Dataset
Supplementary_aCGH.txt
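The FxOy criterion in the Table S3 legend can be sketched roughly as follows. This Python sketch is only one possible reading, not the authors' implementation: it takes y literally as a number of base pairs of intersection, assumes half-open coordinates on the same chromosome, and uses the hypothetical function names `flanked_overlap` and `passes_fxoy`. If y were instead a fraction of the union, the final comparison would be `intersection / union >= y`.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, end), half-open, same chromosome (assumed)


def flanked_overlap(identified: Span, validated: Span, x: int) -> Tuple[int, int]:
    """Extend the identified variant by x bp on each side, then return
    (intersection_bp, union_bp) with the validated variant."""
    ext_start = max(0, identified[0] - x)
    ext_end = identified[1] + x
    intersection = min(ext_end, validated[1]) - max(ext_start, validated[0])
    if intersection <= 0:
        # Disjoint intervals: the union is the sum of the two lengths.
        union = (ext_end - ext_start) + (validated[1] - validated[0])
        return 0, union
    union = max(ext_end, validated[1]) - min(ext_start, validated[0])
    return intersection, union


def passes_fxoy(identified: Span, validated: Span, x: int, y: int) -> bool:
    """Literal-bp reading (an assumption): require at least y bp of intersection."""
    intersection, _union = flanked_overlap(identified, validated, x)
    return intersection >= y


# Illustrative coordinates only, not data from the study.
print(flanked_overlap((1000, 1050), (1040, 1100), x=20))
print(passes_fxoy((1000, 1050), (1040, 1100), x=20, y=20))
```

Returning both the intersection and the union keeps the sketch usable under either interpretation of y.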

References

1. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-9 (2008).
2. Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56-64 (2008).
3. Hormozdiari, F., Alkan, C., Eichler, E.E. & Sahinalp, S.C. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res 19, 1270-8 (2009).
4. Lee, S., Hormozdiari, F., Alkan, C. & Brudno, M. MoDIL: detecting small indels from clone-end sequencing with mixtures of distributions. Nat Methods 6, 473-4 (2009).
5. Chen, K. et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods 6, 677-81 (2009).
6. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol 5, e254 (2007).
7. Pang, A.W. et al. Towards a comprehensive structural variation map of an individual human genome. Genome Biol 11, R52 (2010).