Supplementary Information. Supplementary Figures

[Figure: chromosome-scale tracks for the three chromosomes showing essential gene density, LTR insert frequency, diversity, and the minor allele frequencies of DEL, DUP, INV and TRA variants.]

Supplementary Figure 1. Locations and minor allele frequencies of all structural variants in the curated data set. Each of the three chromosomes is indicated by a black bar, with scale (in megabases) at bottom. From top (same data as Fig 1): density of essential genes (blue), locations of Tf-type retrotransposons (green), and diversity (π, average pairwise diversity from SNPs, purple). Bar heights for deletions and duplications are proportional to minor allele frequency; the scale for retrotransposons is the frequency of the insertion in the 57 non-clonal strains. Diversity and retrotransposon frequencies were calculated from 57 non-clonal strains as described in Jeffares et al. (ref. 1). Below, we show different types of SVs: deletions (black), duplications (red), inversions (green) and translocations (blue). The vertical lines terminating with open circles above dotted lines extend from the midpoint of each SV and indicate the minor allele frequencies in the population of 161 strains.

[Figure: box plots. Top left: distance from chromosome end for CNVs and rearrangements (REA) against matched random sites (rand). Other panels: proportion of transcript coverage (all genes, essential genes) for DUP, DEL and INV/TRA against random regions (rand).]

Supplementary Figure 2. Structural variations are biased towards chromosome ends and to low gene density regions. Top left panel: both CNVs and rearrangements are biased towards the ends of chromosomes (CNVs: median distance to chromosome ends 236 kb compared to chromosome- and size-matched random sites 944 kb, Wilcoxon rank sum test P = 1.3 × 10^-11; rearrangements: median distance 569 kb vs matched random 863 kb, Wilcoxon test P = .3). All other panels: we calculated the proportion of each duplication and deletion that was covered by all protein-coding genes or by essential genes. Box plots show the distributions of these proportions for all genes (grey) and for essential genes (red), compared to the null distribution (rand). All comparisons were significantly less than the null distributions (Wilcoxon rank sum test, P-values <1.6 × 10^-4). The same analysis was performed with the junctions of inversions and translocations, by calculating the transcript coverage in the region 5 bp up- and down-stream of the predicted start and end junctions. These rearrangements are slightly biased away from genes (P = 1.9 × 10^-3), but not significantly biased away from essential genes (P > .5). The null distributions were determined by selecting 1 regions for each actual variant/junction that were the same size and were placed at random positions on the same chromosome, and calculating the gene coverage of these regions. Essential genes were those with the Fission Yeast Phenotype Ontology term defined as FYPO:261 ('inviable') in PomBase.
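The null distribution described in this caption (size-matched regions placed at random on the same chromosome, each scored for gene coverage) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the half-open interval convention, the toy gene list and the number of random regions are all assumptions made for the example.

```python
import random


def coverage_fraction(start, end, genes):
    """Fraction of the half-open region [start, end) covered by the union of gene intervals."""
    covered = 0
    cursor = start  # right edge of what has already been counted
    for g_start, g_end in sorted(genes):
        lo = max(g_start, cursor)
        hi = min(g_end, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return covered / (end - start)


def null_coverages(variant, chrom_length, genes, n_random, rng):
    """Gene coverage of n_random regions with the variant's size, placed uniformly on the chromosome."""
    start, end = variant
    size = end - start
    values = []
    for _ in range(n_random):
        s = rng.randrange(0, chrom_length - size + 1)
        values.append(coverage_fraction(s, s + size, genes))
    return values


# Hypothetical toy data: gene intervals on a 1,000 bp chromosome.
genes = [(10, 30), (20, 50), (200, 260)]
observed = coverage_fraction(0, 100, genes)
null = null_coverages((0, 100), 1000, genes, n_random=100, rng=random.Random(0))
```

The observed value would then be compared with the values in `null`, for example with a rank sum test as in the caption; that step is left out here to keep the sketch free of external dependencies.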

[Figure: eight read-coverage panels, one per duplication. Panel titles: DUP.I:12161..13 (length: 84); DUP.II:5681..698 (length: 13); DUP.II:1671..1716 (length: 46); DUP.II:3241..326 (length: 2); DUP.II:21161..2134 (length: 18); DUP.III:18381..1934 (length: 96); DUP.III:2121..258 (length: 46); DUP.III:2741..286 (length: 12).]

Supplementary Figure 3. Duplications that segregate within closely related strains. Plots show the average coverage in 1 kb non-overlapping windows for strains with a duplication (red) and all closely related strains without the duplication (green); all these strains differ by <15 SNPs. The coverage of the two standard strains (h+ and h-) is shown in black. Top row, from left: variant DUP.I:12161..13 (cluster 12, from Japan in 57), DUP.II:5681..698 (cluster 12), DUP.II:1671..1716 (cluster 2, unknown origin). Second row: DUP.II:3241..326 (cluster 2), DUP.II:21161..2134 (cluster 1, includes reference strain from French grapes in 1947), DUP.III:18381..1934 (cluster 2, various locations 1921-22). Bottom row: DUP.III:2121..258 (cluster 6, Jamaica/USA), and DUP.III:2741..286 (cluster 5, Sicily 1966). Genes are shown on top of plots with exons as red rectangles and retrotransposon LTRs as blue rectangles.
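Averaging coverage in non-overlapping windows, as plotted in this figure, might look like the sketch below. It assumes per-base depth is already available as a list of numbers; the function names, the choice to drop a trailing partial window, and the toy depth values are illustrative assumptions rather than details from the source.

```python
def window_means(depth, window):
    """Mean depth in consecutive non-overlapping windows; a trailing partial window is dropped."""
    n_full = len(depth) // window
    return [sum(depth[i * window:(i + 1) * window]) / window for i in range(n_full)]


def coverage_ratio(sample_means, control_means):
    """Per-window ratio of sample to control coverage; None where the control window is empty."""
    return [s / c if c else None for s, c in zip(sample_means, control_means)]


# Hypothetical per-base depths for a strain with a duplication and a related strain without it.
dup_strain = window_means([4, 4, 8, 8], window=2)
related_strain = window_means([4, 4, 4, 4], window=2)
ratio = coverage_ratio(dup_strain, related_strain)
```

A ratio near 2 in a run of adjacent windows is what a duplication would look like under this toy setup.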

[Figure: (a) normalised standard deviation of copy number against normalised branch length of SNP-based tree (R = .15, p = .18); (b) normalised standard deviation of copy number against normalised branch length of CNV NJ tree (R = .9, p < .1); (c) heat map with one row per CNV (DUP and DEL variants on chromosomes I, II, III and MT) and one column per cluster, with a colour scale labelled "value".]

Supplementary Figure 4. Relative standard deviation of copy number variation within clusters. (a) Standard deviation of copy number for a CNV across the dataset is only weakly correlated with the total branch length of the SNP-based phylogeny of the 2 kb regions up- and down-stream. (b) Standard deviation is highly correlated with the branch length of a CNV-based neighbour-joining tree. (c) The standard deviation of each CNV within each identified cluster of strains (<15 SNPs apart), relative to that in the rest of the dataset. For clarity, all relative standard deviations <1 are shown as white.
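One way to read the quantity in panel (c) is as the standard deviation of a CNV's copy number inside a cluster divided by its standard deviation across all other strains. The sketch below implements that reading only; the exact normalisation used for the figure is not stated in the caption, so the formula, the function name and the toy copy numbers are assumptions.

```python
import statistics


def relative_sd(copy_number, cluster):
    """SD of copy number within `cluster` divided by SD in the remaining strains.

    copy_number maps strain name -> copy number estimate for one CNV.
    Returns None when the rest of the dataset shows no variation.
    """
    inside = [v for s, v in copy_number.items() if s in cluster]
    outside = [v for s, v in copy_number.items() if s not in cluster]
    sd_out = statistics.pstdev(outside)
    if sd_out == 0:
        return None
    return statistics.pstdev(inside) / sd_out


# Hypothetical copy numbers for one CNV across eight strains.
cn = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 0, "f": 2, "g": 0, "h": 2}
value = relative_sd(cn, cluster={"a", "b", "c", "d"})
```

Under this reading, a value above 1 would mean the CNV varies more within the cluster than it does in the rest of the dataset.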

[Figure: histograms of within-cluster allele frequency. Panels: all clusters; clusters with 1 to 15 strains; DUP all clusters; DEL all clusters; INV all clusters (non REF); TRA all clusters (non REF); INV all clusters (MAF); TRA all clusters (MAF). x-axes: cluster non-ref allele freq (top three rows) and cluster minor allele freq (bottom row).]

Supplementary Figure 5. Copy number variants are usually rare alleles within clonal populations. Strains within a clonal cluster (clonal population) all differ by <15 SNPs. In rows, from top left: we show the within-cluster frequency of the non-reference allele for all SVs, which is skewed to rare alleles. Limiting this analysis to clusters with 1 to 15 strains highlights the low frequency of non-reference alleles. Second row: CNVs (duplications and deletions) are skewed to rare alleles, because the non-reference allele is usually the derived allele. Third row: inversions and translocations are not skewed to the non-reference allele, but here non-reference alleles are not necessarily the derived allele. Bottom row: the minor allele of inversions and translocations, however, is skewed to rare alleles.

Supplementary Figure 6. Chromosome-scale view of gene expression changes. The relative gene expression levels (strain 1/strain 2) for arrays 2 and 3, and arrays 7 and 8, are shown with their positions on the three chromosomes. Filled circles indicate genes that we consider to be up-regulated (red) or repressed (green). Those highlighted with open red circles are consistently altered in both arrays (either 2+3, or 7+8). The blue lines show where the segregating duplications are. Box plots at right show the spread of data.

[Figure: one expression panel per array and duplication, with P-values as printed on the panels. array 1, DUP.II.5681.698: within DUP, P = 9.5e-23; near DUP, P = .42. array 1, DUP.I.12161.13: within DUP, P = 8.6e-21; near DUP, P = .77. array 2, DUP.II.1671.1716: within DUP, P = 7.2e-16; near DUP, P = .16. array 2, DUP.II.3241.326: within DUP, P = 3.3e-6; near DUP, P = .93. array 3, DUP.II.1671.1716: within DUP, P = 1.4e-13; near DUP, P = .33. array 3, DUP.II.3241.326: within DUP, P = 3.6e-6; near DUP, P = .83. array 4, DUP.II.21161.2134: within DUP, P = .99; near DUP, P = .5. array 5, DUP.III.18381.1934: within DUP, P = 4.1e-17; near DUP, P = .71. array 6, DUP.III.2121.258: within DUP, P = 6.2e-5; near DUP, P = .64. array 7, DUP.III.2741.286: within DUP, P = .13; near DUP, P = .86. array 8, DUP.III.2741.286: within DUP, P = 7.4e-5; near DUP, P = .99. Bottom right panel, all duplications: expression ratio against copy number ratio, Spearman rank P = .14.]

Supplementary Figure 7. No significant increase in gene expression immediately adjacent to duplications. For each duplication examined with DNA arrays, we show the relative expression (strain 1 vs strain 2) near the duplication. P-values show the support for the genes within the duplication (red vertical lines), or the 5 kb adjacent to the duplication (green vertical lines), being more highly expressed than all other genes in the chromosome (one-sided Wilcoxon rank sum tests). The grey horizontal lines show the 5th, 50th and 95th percentiles for gene expression data on the chromosome. The bottom right panel shows that the median increase in expression level within a duplication correlates with the increase in genomic copy number. The solid black line shows the expected increase for a 1:1 correspondence between genomic copy number and relative expression (the line y = x), and the dashed line shows the linear model for the data. Copy number and relative expression change are correlated (Spearman rank correlation ρ = .71 and P = .14).

[Figure: four stacked bar panels of heritability per trait (y-axis from 0 to 1), split into SNP, REA and CNV contributions. Panels: real data; simulated: SNPs only determine traits; simulated: CNV only determine traits; simulated: REA only determine traits.]

Supplementary Figure 8. Contributions of SNPs, CNVs and rearrangements to traits. Top panel: for 227 traits, we show the total heritability estimated by the combination of 243,289 SNPs (green), 87 CNVs (red), and 26 rearrangements (grey). We then simulated data that was entirely due to the effects of SNPs (second panel), entirely due to the effects of CNVs (third panel) or entirely due to the effects of rearrangements (bottom panel). In the second panel (entirely due to the effects of SNPs), the contributions of CNVs and rearrangements are artefacts, but these are relatively minor. This analysis indicates that the estimates are not strongly affected by linkage.

[Figure: six scatter plots, with statistics as printed on the panels. Viable offspring (%) against SNP distance (% unshared SNPs): τ = .35, P = .16. Viable offspring (%) against SV distance (unshared variants): τ = .26, P = .56. Viable offspring (%) against INV/TRA distance (unshared variants): τ = .36, P = 2e-4. Viable offspring (%) against CNV distance (unshared variants): τ = .15, P = .11. Viable offspring (%) against merged CNV distance (unshared variants): τ = .15, P = .11. SV distance against SNP distance: τ = .6, P = 2.1e-11. Legend: Jeffares 2015, Avelar 2013.]

Supplementary Figure 9. Correlations between spore viability, parental SNP-genetic distance and parental SV-genetic distance. Spore viability was measured for 58 crosses in total, including data from both Jeffares et al. (ref. 1, black) and Avelar et al. (ref. 2, red), with each circle representing one cross. Clockwise from top left, plots show the correlation of spore viability with various measures of genetic distance between parents: SNP distance, all SV distances, all rearrangement differences (inversions/translocations), unmerged CNV distances, merged CNV distances, and finally the relationship between SNP distance and SV distance. Unmerged CNV differences count any CNV as being different between parents when either start or end coordinates are more than 1 kb apart. Because this definition can cause us to count largely overlapping events as different, we also counted merged differences where two CNVs were considered different only if their overlap was >5% of the total of both variants. This approach will exclude nested CNVs. CNV-genetic distance is not significantly correlated with viability in either case.
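The unmerged CNV distance described in this caption (a CNV counts as shared only when both its start and end coordinates agree within a tolerance) can be sketched as a count of unshared variants between two parents. The tuple layout, the function names and the tolerance values used in the example are assumptions for illustration; the merged, overlap-based variant is not implemented here.

```python
def same_cnv(a, b, tol):
    """True when two CNVs (chrom, start, end) lie on the same chromosome
    and both their start and end coordinates differ by at most tol."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= tol and abs(a[2] - b[2]) <= tol


def unshared_cnv_count(parent_a, parent_b, tol):
    """Number of CNVs present in one parent with no matching CNV in the other."""
    only_a = sum(1 for a in parent_a if not any(same_cnv(a, b, tol) for b in parent_b))
    only_b = sum(1 for b in parent_b if not any(same_cnv(a, b, tol) for a in parent_a))
    return only_a + only_b


# Hypothetical CNV calls for two parental strains.
parent_a = [("I", 1000, 5000), ("II", 200, 900)]
parent_b = [("I", 1200, 5100), ("III", 10, 50)]
distance_loose = unshared_cnv_count(parent_a, parent_b, tol=1000)
distance_strict = unshared_cnv_count(parent_a, parent_b, tol=100)
```

With the looser tolerance the two chromosome I calls are treated as the same event, which is the behaviour the caption's tolerance is meant to give for near-identical breakpoints.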

[Figure: left, density of distance to the nearest LTR (nt) for all SVs; right, box plots for CL:real, CL:random, ST:real, EN:real and POS:random. Wilcoxon test P = 1.3e-25.]

Supplementary Figure 10. SVs are enriched close to retrotransposon LTRs. For all SVs, we computed the closest distance of start or end coordinates to any LTR discovered previously (ref. 1). As a control, we computed the closest distance of 1 random coordinates on the same chromosome. Left: the distributions of distances for real SVs (grey), those that are within 2nt (black) or random coordinates (red). Right: using the same analysis, we show the closest distance of real SVs (CL:real) and random coordinates (CL:random). We also show that both start and end coordinates of SVs (ST:real, EN:real) are closer than random positions (POS:random).
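The closest-distance calculation in this caption can be sketched with a binary search over sorted LTR coordinates. As a simplifying assumption, each LTR is reduced to a single coordinate here, and the function names and toy positions are hypothetical; the list of LTR coordinates must be sorted and non-empty.

```python
import bisect


def nearest_distance(pos, sorted_sites):
    """Distance from pos to the nearest coordinate in a sorted, non-empty list."""
    i = bisect.bisect_left(sorted_sites, pos)
    candidates = []
    if i < len(sorted_sites):
        candidates.append(sorted_sites[i] - pos)  # nearest site at or to the right
    if i > 0:
        candidates.append(pos - sorted_sites[i - 1])  # nearest site to the left
    return min(candidates)


def closest_sv_distance(start, end, sorted_sites):
    """Smaller of the start and end distances to the nearest site (CL in the figure labels)."""
    return min(nearest_distance(start, sorted_sites), nearest_distance(end, sorted_sites))


# Hypothetical LTR coordinates on one chromosome and one SV.
ltr_sites = [100, 500, 900]
start_dist = nearest_distance(450, ltr_sites)
end_dist = nearest_distance(700, ltr_sites)
closest = closest_sv_distance(450, 700, ltr_sites)
```

The same functions applied to random coordinates on the same chromosome would give the control distribution the caption compares against.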

Supplementary Figure 11. Chromosome-scale read coverage plots for three chromosomes of strain JB127. Coverage is calculated relative to the reference strain (JB22 in our collection). Two large duplications that did not satisfy the criteria used to detect CNVs with cn.mops are indicated with blue arrows (DUP.I:2951..319, 24kb and DUP.I:551..556, 51kb).

Supplementary References

1 Jeffares, D. C. et al. The genomic and phenotypic diversity of Schizosaccharomyces pombe. Nat Genet 47, 235-241, doi:10.1038/ng.3215 (2015).

2 Avelar, A. T., Perfeito, L., Gordo, I. & Ferreira, M. G. Genome architecture is a selectable trait that can be maintained by antagonistic pleiotropy. Nat Commun 4, 2235, doi:10.1038/ncomms3235 (2013).