Supplementary Information

Supplementary Figures

[Figure: tracks, from top, for essential gene density, LTR insert frequency, diversity, DEL, DUP, INV and TRA; x-axis, position along each of the three chromosomes (Mb).]

Supplementary Figure 1. Locations and minor allele frequencies of all structural variants in the curated data set. Each of the three chromosomes is indicated by a black bar, with the scale (in megabases) at the bottom. From top (same data as Fig. 1): density of essential genes (blue), locations of Tf-type retrotransposons (green), and diversity (π, average pairwise diversity from SNPs, purple). Bar heights for deletions and duplications are proportional to minor allele frequency; the scale for retrotransposons is the frequency of the insertion in the 57 non-clonal strains. Diversity and retrotransposon frequencies were calculated from 57 non-clonal strains as described in Jeffares et al. (ref. 1). Below, we show the different types of SVs: deletions (black), duplications (red), inversions (green) and translocations (blue). The vertical lines terminating in open circles above the dotted lines rise from the mid-point of each SV and indicate its minor allele frequency in the population of 161 strains.
[Figure: top left, distance from chromosome end for CNVs and rearrangements (REA), each beside matched random sites (rand). Other panels, gene overlap for DUP, DEL and INV/TRA: proportion of transcript coverage for all genes and for essential genes, each beside random regions (rand).]

Supplementary Figure 2. Structural variations are biased towards chromosome ends and towards regions of low gene density. Top left panel: both CNVs and rearrangements are biased towards the ends of chromosomes (CNVs: median distance to chromosome ends 236 kb, compared with 944 kb for chromosome- and size-matched random sites, Wilcoxon rank sum test P = 1.3 × 10⁻¹¹; rearrangements: median distance 569 kb vs. 863 kb for matched random sites, Wilcoxon test P = .3). All other panels: for each duplication and deletion, we calculated the proportion of the variant covered by all protein-coding genes or by essential genes. Box plots show the distributions of these proportions for all genes (grey) and for essential genes (red), compared with the null distribution (rand). In all comparisons, coverage was significantly lower than in the null distributions (Wilcoxon rank sum test, P-values < 1.6 × 10⁻⁴). The same analysis was performed for the junctions of inversions and translocations, by calculating the transcript coverage in the region 5 bp up- and down-stream of the predicted start and end junctions. These rearrangements are slightly biased away from genes (P = 1.9 × 10⁻³), but not significantly biased away from essential genes (P > .5). The null distributions were determined by selecting, for each actual variant or junction, 1 regions of the same size placed at random positions on the same chromosome, and calculating the gene coverage of these regions. Essential genes were those annotated in PomBase with the Fission Yeast Phenotype Ontology term FYPO:261 (inviable).
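The null-distribution procedure described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the half-open interval convention, and the `n_regions` parameter (the number of random regions drawn per variant) are assumptions introduced here. The observed coverage values would then be compared with the pooled null values using a Wilcoxon rank sum test, which is not shown.

```python
import random

def covered_fraction(start, end, genes):
    """Fraction of the half-open region [start, end) covered by any gene.

    `genes` is a list of (gene_start, gene_end) half-open intervals on the
    same chromosome; overlapping genes are counted only once.
    """
    covered = 0
    cursor = start  # bases to the left of the cursor are already accounted for
    for g_start, g_end in sorted(genes):
        lo = max(g_start, cursor)
        hi = min(g_end, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return covered / (end - start)

def null_coverage(length, chrom_length, genes, n_regions, rng):
    """Gene coverage of `n_regions` same-size regions at random positions.

    `n_regions` is a hypothetical parameter; choose it to match the study design.
    """
    fractions = []
    for _ in range(n_regions):
        start = rng.randrange(0, chrom_length - length + 1)
        fractions.append(covered_fraction(start, start + length, genes))
    return fractions

# Hypothetical toy data: three genes on a 10 kb chromosome.
toy_genes = [(1000, 3000), (2500, 4000), (9000, 9500)]
observed = covered_fraction(2000, 5000, toy_genes)
null = null_coverage(3000, 10_000, toy_genes, n_regions=50, rng=random.Random(0))
```

Drawing the random regions from the same chromosome and with the same length as the variant keeps chromosome-specific gene density and variant size from confounding the comparison.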
[Figure: eight panels of read coverage against chromosome position, one per duplication. Panel titles: DUP.I:12161..13 (length: 84), DUP.II:5681..698 (length: 13), DUP.II:1671..1716 (length: 46), DUP.II:3241..326 (length: 2), DUP.II:21161..2134 (length: 18), DUP.III:18381..1934 (length: 96), DUP.III:2121..258 (length: 46), DUP.III:2741..286 (length: 12).]

Supplementary Figure 3. Duplications that segregate within closely related strains. Plots show the average coverage in 1 kb non-overlapping windows for strains with a duplication (red) and for all closely related strains without the duplication (green); all these strains differ by <15 SNPs. The coverage of the two standard strains (h+ and h-) is shown in black. Top row, from left: variant DUP.I:12161..13 (cluster 12, from Japan in 57), DUP.II:5681..698 (cluster 12), DUP.II:1671..1716 (cluster 2, unknown origin). Second row: DUP.II:3241..326 (cluster 2), DUP.II:21161..2134 (cluster 1, includes reference strain from French grapes in 1947), DUP.III:18381..1934 (cluster 2, various locations 1921-22). Bottom row: DUP.III:2121..258 (cluster 6, Jamaica/USA) and DUP.III:2741..286 (cluster 5, Sicily 1966). Genes are shown at the top of each plot, with exons as red rectangles and retrotransposon LTRs as blue rectangles.
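As a rough illustration of the windowed coverage plotted in this figure, the sketch below averages per-base read depth in consecutive non-overlapping windows. The input format (a list of per-base depths) and the function name are assumptions made for the example; the final window is averaged over however many bases remain.

```python
def window_means(depth, window=1000):
    """Mean read depth in consecutive non-overlapping windows.

    `depth` is a list of per-base depths along one chromosome (an assumed
    input format). The last window may be shorter than `window`.
    """
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means

# Hypothetical toy example with a 2-base window instead of 1 kb.
toy_means = window_means([1, 1, 3, 3, 5], window=2)
```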
[Figure: (a) normalised standard deviation of copy number against normalised branch length of the SNP-based tree; (b) normalised standard deviation of copy number against normalised branch length of the CNV NJ tree; (c) heat map of individual duplications and deletions (rows) against clusters (columns), coloured by value.]

Supplementary Figure 4. Relative standard deviation of copy number variation within clusters.
(a) The standard deviation of copy number for a CNV across the dataset is only weakly correlated with the total branch length of the SNP-based phylogeny built from the 2 kb up- and down-stream of the CNV. (b) The standard deviation is highly correlated with the branch length of a CNV-based neighbour-joining tree. (c) The standard deviation of each CNV within each identified cluster of strains (<15 SNPs apart), relative to its standard deviation in the rest of the dataset. For clarity, all relative standard deviations <1 are shown in white.
[Figure: histograms of within-cluster allele frequency. Top row: all SVs in all clusters, and in clusters with 1 to 15 strains (non-reference allele frequency). Second row: DUP and DEL, all clusters (non-reference allele frequency). Third row: INV and TRA, all clusters (non-reference allele frequency). Bottom row: INV and TRA, all clusters (minor allele frequency).]

Supplementary Figure 5. Copy number variants are usually rare alleles within clonal populations. The strains within a clonal cluster (clonal population) all differ by <15 SNPs. Top row, from left: the within-cluster frequency of the non-reference allele for all SVs, which is skewed to rare alleles; limiting this analysis to clusters with 1 to 15 strains highlights the low frequency of non-reference alleles. Second row: CNVs (duplications and deletions) are skewed to rare alleles, because the non-reference allele is usually the derived allele. Third row: the non-reference alleles of inversions and translocations are not skewed to low frequencies, but here non-reference alleles are not necessarily the derived allele. Bottom row: the minor allele of inversions and translocations, however, is skewed to rare alleles.
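The two frequencies plotted in this figure differ only in which allele is counted. A minimal sketch, assuming haploid genotype calls coded 0 (reference) and 1 (non-reference) for the strains of one cluster; the coding and the function name are illustrative, not taken from the study:

```python
def allele_frequencies(genotypes):
    """Return (non-reference allele frequency, minor allele frequency).

    `genotypes` holds one call per strain in a cluster:
    0 = reference allele, 1 = non-reference allele (assumed coding).
    """
    non_ref = sum(genotypes) / len(genotypes)
    minor = min(non_ref, 1 - non_ref)
    return non_ref, minor

# Hypothetical cluster of four strains, three carrying the non-reference allele.
non_ref_freq, minor_freq = allele_frequencies([1, 1, 1, 0])
```

When the reference strain itself carries the derived allele, the non-reference frequency is high while the minor allele frequency stays low, which is why the two measures can give different pictures for inversions and translocations.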
Supplementary Figure 6. Chromosome-scale view of gene expression changes. The relative gene expression levels (strain 1/strain 2) for arrays 2 and 3, and for arrays 7 and 8, are shown with their positions on the three chromosomes. Filled circles indicate genes that we consider to be up-regulated (red) or repressed (green). Genes highlighted with open red circles are consistently altered in both arrays (either 2 + 3 or 7 + 8). The blue lines show the locations of the segregating duplications. Box plots at right show the spread of the data.
[Figure: one panel per array and duplication, showing expression ratio against chromosome position, each annotated with P-values for genes within the duplication and near the duplication. Panels: array 1, DUP.II.5681.698 and DUP.I.12161.13; array 2, DUP.II.1671.1716 and DUP.II.3241.326; array 3, DUP.II.1671.1716 and DUP.II.3241.326; array 4, DUP.II.21161.2134; array 5, DUP.III.18381.1934; array 6, DUP.III.2121.258; array 7, DUP.III.2741.286; array 8, DUP.III.2741.286. Bottom right panel: all duplications, expression ratio against copy number ratio.]

Supplementary Figure 7. No significant increase in gene expression immediately adjacent to duplications. For each duplication examined with DNA arrays, we show the relative expression (strain 1 vs strain 2) near the duplication. P-values show the support for the genes within the duplication (red vertical lines), or within the 5 kb adjacent to the duplication (green vertical lines), being more highly expressed than all other genes on the chromosome (one-sided Wilcoxon rank sum tests). The grey horizontal lines show the 5th, 50th and 95th percentiles of the gene expression data on the chromosome. The bottom right panel shows that the median increase in expression level within a duplication correlates with the increase in genomic copy number. The solid black line shows the expected increase for a 1:1 correspondence between genomic copy number and relative expression (the line y = x), and the dashed line shows the linear model fitted to the data.
Copy number and relative expression change are correlated (Spearman rank correlation ρ =.71 and P =.14).
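For readers unfamiliar with the statistic, a Spearman rank correlation such as the one reported here is the Pearson correlation of the ranks of the two variables. The sketch below is a plain-Python illustration with tied values given their average rank; the data are invented and the P-value calculation is omitted.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical copy number ratios and median expression ratios.
copy_ratio = [1.0, 2.0, 3.0, 4.0, 5.0]
expr_ratio = [1.2, 1.1, 2.5, 2.0, 3.1]
rho = spearman_rho(copy_ratio, expr_ratio)
```

Because only ranks enter the calculation, the statistic measures whether expression rises monotonically with copy number, without assuming the 1:1 linear relationship.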
[Figure: four panels of heritability per trait, showing the contributions of SNPs, rearrangements (REA) and CNVs. Panels, from top: real data; simulated, SNPs only determine traits; simulated, CNVs only determine traits; simulated, REA only determine traits.]

Supplementary Figure 8. Contributions of SNPs, CNVs and rearrangements to traits. Top panel: for 227 traits, we show the total heritability estimated by the combination of 243,289 SNPs (green), 87 CNVs (red), and 26 rearrangements (grey). We then simulated data in which traits were entirely due to the effects of SNPs (second panel), entirely due to the effects of CNVs (third panel) or entirely due to the effects of rearrangements (bottom panel). In the second panel, the apparent contributions of CNVs and rearrangements are artefacts, but these are relatively minor. This analysis indicates that the estimates are not strongly affected by linkage.
[Figure: six scatter plots, each annotated with a correlation coefficient τ and P-value. Viable offspring (%) is plotted against SNP distance (% unshared SNPs), SV distance (unshared variants), INV/TRA distance (unshared variants), CNV distance (unshared variants) and merged CNV distance (unshared variants); the final panel plots SV distance against SNP distance.]

Supplementary Figure 9. Correlations between spore viability, parental SNP-genetic distance and parental SV-genetic distance. Spore viability was measured for 58 crosses in total, including data from both Jeffares et al. (ref. 1; black) and Avelar et al. (ref. 2; red), with each circle representing one cross. Clockwise from top left, the plots show the correlation of spore viability with various measures of genetic distance between parents: SNP distance, all SV distances, all rearrangement differences (inversions/translocations), unmerged CNV distances, merged CNV distances, and finally the relationship between SNP distance and SV distance. Unmerged CNV differences count any CNV as being different between parents when either the start or the end coordinates are more than 1 kb apart. Because this definition can cause us to count largely overlapping events as different, we also counted merged differences, where two CNVs were considered different only if their overlap was >5% of the total of both variants. This approach will exclude nested CNVs. CNV-genetic distance is not significantly correlated with viability in either case.
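The unmerged CNV distance defined in this caption can be sketched as a count of CNVs in one parent that have no counterpart in the other. This is an illustration under stated assumptions: CNVs are represented as hypothetical `(chromosome, start, end)` tuples, the tolerance is a parameter (defaulting to the 1 kb printed in the caption), and the merged, overlap-based variant is not implemented.

```python
def same_cnv(a, b, tolerance=1000):
    """True if two CNVs lie on the same chromosome and both their start and
    end coordinates are within `tolerance` bases of each other."""
    return (
        a[0] == b[0]
        and abs(a[1] - b[1]) <= tolerance
        and abs(a[2] - b[2]) <= tolerance
    )

def unmerged_distance(parent1, parent2, tolerance=1000):
    """Number of CNVs present in one parent with no match in the other."""
    only1 = sum(
        1 for a in parent1 if not any(same_cnv(a, b, tolerance) for b in parent2)
    )
    only2 = sum(
        1 for b in parent2 if not any(same_cnv(a, b, tolerance) for a in parent1)
    )
    return only1 + only2

# Hypothetical CNV calls, as (chromosome, start, end), for two parental strains.
p1 = [("I", 1000, 5000), ("II", 200, 900)]
p2 = [("I", 1500, 5200), ("III", 10, 20)]
distance = unmerged_distance(p1, p2)
```

In this toy example the two chromosome I calls are treated as the same event, so the distance is 2; with a stricter tolerance they would count as two further differences, which is the over-counting that motivates the merged measure.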
[Figure: left, density of the distance to the nearest LTR (nt) for all SVs, annotated with Wilcoxon test P = 1.3e-25; right, box plots of distance for CL:real, CL:random, ST:real, EN:real and POS:random.]

Supplementary Figure 10. SVs are enriched close to retrotransposon LTRs. For all SVs, we computed the closest distance of the start or end coordinates to any LTR discovered previously (ref. 1). As a control, we computed the closest distance of 1 random coordinates on the same chromosome. Left: the distributions of distances for real SVs (grey), for those that are within 2 nt (black) and for random coordinates (red). Right: using the same analysis, we show the closest distance for real SVs (CL:real) and for random coordinates (CL:random). We also show that both the start and end coordinates of SVs (ST:real, EN:real) are closer to LTRs than random positions (POS:random).
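The nearest-LTR distance used in this comparison can be computed efficiently with a binary search over sorted LTR positions. The sketch below is illustrative only: it treats each LTR as a single coordinate (an assumption, since LTRs span an interval), requires a non-empty sorted list for one chromosome, and uses hypothetical function names.

```python
from bisect import bisect_left

def nearest_distance(pos, sorted_ltrs):
    """Distance from `pos` to the closest coordinate in `sorted_ltrs`.

    `sorted_ltrs` must be a non-empty, ascending list of LTR positions on
    the same chromosome as `pos`.
    """
    i = bisect_left(sorted_ltrs, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_ltrs[max(i - 1, 0):i + 1]
    return min(abs(pos - p) for p in candidates)

def sv_nearest_ltr(start, end, sorted_ltrs):
    """Closest distance of either SV breakpoint to any LTR."""
    return min(
        nearest_distance(start, sorted_ltrs),
        nearest_distance(end, sorted_ltrs),
    )

# Hypothetical LTR positions and one SV on the same chromosome.
ltrs = [100, 500, 900]
closest = sv_nearest_ltr(300, 880, ltrs)
```

The same `nearest_distance` function can be applied to randomly drawn coordinates on the chromosome to build the control distribution.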
Supplementary Figure 11. Chromosome-scale read coverage plots for three chromosomes of strain JB127. Coverage is calculated relative to the reference strain (JB22 in our collection). Two large duplications that did not satisfy the criteria used to detect CNVs with cn.mops are indicated with blue arrows (DUP.I:2951..319, 24kb and DUP.I:551..556, 51kb).
Supplementary References

1. Jeffares, D. C. et al. The genomic and phenotypic diversity of Schizosaccharomyces pombe. Nat Genet 47, 235-241, doi:10.1038/ng.3215 (2015).
2. Avelar, A. T., Perfeito, L., Gordo, I. & Ferreira, M. G. Genome architecture is a selectable trait that can be maintained by antagonistic pleiotropy. Nat Commun 4, 2235, doi:10.1038/ncomms3235 (2013).