Supplementary Figure 1 Somatic coding mutations identified by WES/WGS for 83 ATL cases. (a) The percentage of targeted bases covered by at least 2, 10, 20 and 30 sequencing reads (top) and average read depth (bottom) are shown for 81 paired tumor (T) and normal (N) WES samples. (b) Mutational signature identified in 81 WES cases, showing predominant age-related C>T transitions. (c) Correlation of the number of coding mutations (including synonymous SNVs) with patient age in 83 WES/WGS cases. (d–g) All panels are aligned, with the vertical tracks representing 83 WES/WGS cases. The data are sorted by number of coding mutations (d) and disease subtype (f). The relative frequency of nucleotide substitutions (e), a heat map showing the distribution of mutations in significantly mutated genes (q < 0.1) identified by the MutSigCV algorithm (f) and the variant allele frequency (VAF) of mutations located in diploid regions (g) are depicted.
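The coverage metric in panel (a) can be illustrated with a short sketch. It assumes that per-base read depths over the targeted regions have already been computed, for example from a pileup. The function name `coverage_summary` and the toy depths are hypothetical, and the sketch is not the authors' pipeline.

```python
from typing import Dict, Sequence, Tuple

def coverage_summary(depths: Sequence[int],
                     thresholds: Tuple[int, ...] = (2, 10, 20, 30)) -> Dict[str, float]:
    """Percentage of targeted bases with depth >= each threshold, plus mean depth.

    `depths` is assumed to hold one read-depth value per targeted base.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no targeted bases supplied")
    summary = {f">={t}x": 100.0 * sum(d >= t for d in depths) / n for t in thresholds}
    summary["mean_depth"] = sum(depths) / n
    return summary

# Toy per-base depths over a hypothetical 8-base target
print(coverage_summary([0, 3, 12, 25, 31, 40, 9, 22]))
```

The same per-sample summary, computed separately for tumor (T) and normal (N) libraries, gives the top (breadth) and bottom (mean depth) tracks.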
Supplementary Figure 2 Results of WGS for 48 ATL cases. (a) The percentage of targeted bases covered by at least 2, 10, 20 and 30 sequencing reads (top) and average read depth (bottom) are shown for 11 full-pass WGS and 37 low-pass WGS samples. (b) Comparison of coding mutations detected by WES and full-pass WGS (n = 9): Venn diagram showing the number of coding mutations identified by each platform (left), correlation of VAFs for coding mutations between WES and full-pass WGS (middle) and comparison of VAFs for coding mutations detected by both WES and full-pass WGS, only by WES and only by full-pass WGS (right). (c) Number and type of somatic SNVs and indels detected by full-pass WGS (n = 11). (d) Rainfall plots showing intermutational distances on the indicated chromosomes, revealing several genomic regions with localized hypermutation. (e) APOBEC3B and APOBEC3G expression in CD4+ T cells from healthy controls (n = 3), mononuclear cells from HTLV-1 carriers (n = 3) and tumor cells from ATL cases (n = 57). (f) Mutational signature identified in 11 full-pass WGS cases, showing predominant age-related C>T transitions. (g) Number and type of SVs detected by full-pass WGS (n = 11) and low-pass WGS (n = 37).
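A rainfall plot such as panel (d) puts each mutation's genomic position on the x axis and the distance to the preceding mutation on the y axis. Localized hypermutation then shows up as a run of points with small distances. The sketch below computes those distances for one chromosome. It also flags clusters using a simplified rule (every consecutive gap at most `max_gap`, at least `min_size` mutations). That rule and both thresholds are illustrative assumptions, not the criteria used in the study.

```python
from typing import List, Tuple

def intermutational_distances(positions: List[int]) -> List[Tuple[int, int]]:
    """(position, distance to previous mutation) for one chromosome, sorted by position.

    The first mutation has no predecessor, so it contributes no point.
    """
    ordered = sorted(positions)
    return [(cur, cur - prev) for prev, cur in zip(ordered, ordered[1:])]

def clustered_runs(positions: List[int], max_gap: int = 1000, min_size: int = 6) -> List[List[int]]:
    """Runs of mutations whose consecutive gaps are all <= max_gap (hypothetical rule)."""
    ordered = sorted(positions)
    runs: List[List[int]] = []
    current = ordered[:1]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= max_gap:
            current.append(cur)
        else:
            if len(current) >= min_size:
                runs.append(current)
            current = [cur]
    if len(current) >= min_size:  # close the final run
        runs.append(current)
    return runs

# Toy positions on a single chromosome
toy = [100, 50_000, 50_200, 50_500, 50_900, 51_300, 51_600, 52_000, 900_000]
print(intermutational_distances(toy))
print(clustered_runs(toy))  # one run of 7 closely spaced mutations
```

Plotting the distances on a log scale, as is usual for rainfall plots, keeps both dense clusters and sparse regions visible.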
Supplementary Figure 3 Frequent deletions within known fragile sites in ATL. (a) Genome-wide distribution of deletion breakpoints in cases analyzed by full-pass WGS (n = 11; top) and low-pass WGS (n = 37; bottom), showing very frequent deletions in known fragile sites. The y axis represents the frequency of cases with deletion breakpoints. (b) Frequent deletions in known fragile sites detected by SNP array analysis of 426 ATL cases. Segmented copy number data are shown. Each row represents a patient; deleted regions are shown in blue. (c) Frequency of cases with copy number breakpoints within the indicated fragile sites or genes, as detected by SNP array karyotyping, compared across hematological malignancies. T-ALL, T cell acute lymphoblastic leukemia; B-ALL, B cell acute lymphoblastic leukemia; CLL, chronic lymphocytic leukemia; DLBCL, diffuse large B cell lymphoma; MCL, mantle cell lymphoma; FL, follicular lymphoma; MALT, mucosa-associated lymphoid tissue lymphoma; AML, acute myeloid leukemia; CML, chronic myeloid leukemia; MDS, myelodysplastic syndrome; CMML, chronic myelomonocytic leukemia; MPN, myeloproliferative neoplasm. SNP array data were obtained from GEO (GSE15187, GSE47682 and GSE12906) or our in-house database. (d) Multiple deletions within known fragile sites detected by full-pass and low-pass WGS (n = 48). Different colors represent different cases.
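The per-region frequencies in panels (a) and (c) count cases, not breakpoints. A case with several breakpoints in the same fragile site is therefore counted once. The sketch below makes that counting rule explicit. The data layout, the function name and the region coordinates are hypothetical placeholders, not real fragile-site boundaries.

```python
from typing import Dict, List, Tuple

Breakpoint = Tuple[str, int]   # (chromosome, position)
Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive

def breakpoint_case_frequency(cases: Dict[str, List[Breakpoint]], region: Region) -> float:
    """Fraction of cases with at least one breakpoint inside `region`."""
    if not cases:
        raise ValueError("no cases supplied")
    chrom, start, end = region
    hits = sum(
        any(c == chrom and start <= pos <= end for c, pos in breakpoints)
        for breakpoints in cases.values()
    )
    return hits / len(cases)

# Hypothetical breakpoints and placeholder coordinates (not real fragile-site bounds)
toy_cases = {
    "case1": [("chr3", 60_100_000), ("chr3", 60_300_000)],  # two hits, counted once
    "case2": [("chr16", 78_000_000)],
    "case3": [("chr3", 59_500_000)],
}
print(breakpoint_case_frequency(toy_cases, ("chr3", 59_000_000, 61_000_000)))  # 2 of 3 cases
```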
Supplementary Figure 4 HTLV-1 integration and expression. (a) HTLV-1 integration sites (blue boxes; n = 62) detected by full-pass (n = 11) and low-pass (n = 37) WGS. (b) Clonal structure of ATL in five representative cases. The allele frequencies of driver mutations and HTLV-1 integrations are shown as filled circles and red bars, respectively. Only mutations residing in copy number neutral segments are presented. (c) RNA-seq data visualized with IGV, with cumulative numbers of sequencing reads displayed along genomic positions. Additional representative cases (ATL009, ATL011, ATL014 and ATL019) are presented, showing the antisense-predominant HTLV-1 transcription found in most ATL cases. Abnormal tax transcripts with deletions are indicated by asterisks. Boxes show ORFs. (d) Ratio of antisense to sense transcripts in 57 cases analyzed by RNA-seq, evaluated in the two regions of pX showing unidirectional transcription. (e) Box plots (median and interquartile values) of HTLV-1 gene expression levels across 57 ATL cases. FPKM was calculated for each region in which HTLV-1 genes are located. (f) Expression levels of read-through transcripts at 53 integration sites in 41 cases with both WGS and RNA-seq data. (g) Frequency of integrations showing aberrantly spliced transcripts, according to the orientation of transcription of the cellular gene relative to the HTLV-1 genome. (h) Summary of expression levels for 23 genes located adjacent to HTLV-1 integrations. Log2(FPKM + 1) values were converted to z scores and plotted according to the orientation of transcription of the cellular gene relative to the HTLV-1 genome. (i) Expression of 12 sense-oriented (left) and 11 antisense-oriented (right) genes expressed in close proximity to a viral integration site, measured in 57 ATL cases analyzed by RNA-seq. For each gene, expression in the tumor sample harboring the relevant integration site is highlighted in red. The presence (+) or absence (−) of aberrantly spliced fusions between the cellular and viral genomes is shown (bottom). (j) Transcripts around the viral integration site in the C14orf159 locus in a representative case (ATL016), compared with those in a control (ATL017). Read-through antisense transcripts from the 5′ LTR into the juxtaposed cellular genome, a fusion transcript between R in the 3′ LTR and exon 10 of C14orf159, and a fusion transcript between HBZ and exon 9 of C14orf159 are shown with their fused sequences.
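Panel (h) converts log2(FPKM + 1) values to z scores. The sketch below applies that transform to one gene's expression across samples. The legend does not state whether the sample or population standard deviation was used. The sample SD chosen here is an assumption, and the function name is hypothetical.

```python
import math
import statistics
from typing import List, Sequence

def log_fpkm_zscores(fpkm: Sequence[float]) -> List[float]:
    """z scores of log2(FPKM + 1) for one gene across samples.

    Uses the sample standard deviation (an assumption; the legend does not say).
    """
    logged = [math.log2(x + 1.0) for x in fpkm]
    mu = statistics.mean(logged)
    sd = statistics.stdev(logged)
    if sd == 0:
        raise ValueError("expression is constant across samples; z score undefined")
    return [(v - mu) / sd for v in logged]

# Toy FPKM values for one gene in four samples
print(log_fpkm_zscores([0, 1, 3, 7]))
```

Adding 1 before taking the log keeps genes with FPKM = 0 finite. The z score then places the sample carrying the integration on a common scale across genes.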
Supplementary Figure 5 Recurring somatic mutations detected by targeted capture sequencing for 370 ATL cases. (a) Comparison of WES/WGS (n = 83) and targeted capture sequencing (n = 370), showing comparable mutation frequencies for the 50 significantly mutated genes. (b) Box plots (median and interquartile values) of expression levels (log2(FPKM + 1)) for 50 significantly mutated genes across 57 ATL cases. Genes with FPKM < 1 were excluded from the list of significantly mutated genes. (c) Number of cases with copy number gain, copy number loss or UPD coexisting with gain-of-function (left) and loss-of-function (right) mutations of the indicated genes in 370 ATL cases. (d) VAFs of CARD11, IRF4, FAS, TP53, TBL1XR1, HLA-B, CD58 and GPR183 mutations in ATL cases with or without copy number gain, loss or UPD, indicating preferential amplification of mutant alleles. (e) Locations and types of somatic mutations in significantly mutated genes. NCBI protein reference sequences are shown in Supplementary Table 24. Nonsense, frameshift and splice-site mutations were distributed across the entire gene for TBL1XR1, CD58, POT1, IRF2BP2, EP300, CSNK2B and CSNK2A1, suggesting that these mutations cause loss of function. Site-specific mutations were observed in NOTCH1, suggesting that these mutations cause gain of function. Site-specific mutations were also found in CSNK1A1, including a known dominant-negative mutation encoding p.Asp136Asn in one case.
Supplementary Figure 6 CNVs detected by SNP array karyotyping for 426 ATL cases. (a) Comparison of Affymetrix 250K (n = 282) and Illumina 610K (n = 144) SNP arrays, showing a comparable frequency of significantly altered regions. (b) Frequency of arm-level gains and losses in 426 ATL cases. Chromosome arms with an estimated copy number ≥2.5 were considered to have a copy number gain, whereas those with an estimated copy number <1.5 were considered to have a copy number loss. (c) Heat map of somatic CNVs in each tumor (horizontal axis) plotted by chromosomal location (vertical axis). Unsupervised hierarchical clustering was performed using Manhattan distance and Ward's linkage method. (d) Significant focal amplifications and deletions detected by GISTIC 2.0 analysis. Segmented copy number data from SNP arrays are shown. Each row represents a patient; amplified and deleted regions are shown in red and blue, respectively.
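The arm-level calls in panel (b) reduce to two cutoffs on each arm's estimated copy number: at least 2.5 for a gain, below 1.5 for a loss, and anything in between treated as neutral. The sketch below applies these cutoffs and tallies per-arm frequencies. The input layout and function names are hypothetical. It also assumes every case has an estimate for every arm, so the denominator is simply the number of cases.

```python
from collections import Counter
from typing import Dict

# Cutoffs from the legend; arms with 1.5 <= CN < 2.5 are treated as neutral.
GAIN_CUTOFF = 2.5
LOSS_CUTOFF = 1.5

def classify_arm(copy_number: float) -> str:
    """Label one chromosome arm as gain, loss or neutral from its estimated copy number."""
    if copy_number >= GAIN_CUTOFF:
        return "gain"
    if copy_number < LOSS_CUTOFF:
        return "loss"
    return "neutral"

def arm_frequencies(arm_cn_by_case: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Fraction of cases with a gain or loss of each arm.

    Assumes every case has a copy number estimate for every arm.
    """
    n_cases = len(arm_cn_by_case)
    counts: Dict[str, Counter] = {}
    for arms in arm_cn_by_case.values():
        for arm, cn in arms.items():
            counts.setdefault(arm, Counter())[classify_arm(cn)] += 1
    return {
        arm: {"gain": c["gain"] / n_cases, "loss": c["loss"] / n_cases}
        for arm, c in counts.items()
    }

# Toy arm-level copy number estimates for three hypothetical cases
toy = {"A": {"1q": 3.0, "6q": 1.2}, "B": {"1q": 2.1, "6q": 1.4}, "C": {"1q": 2.6, "6q": 2.0}}
print(arm_frequencies(toy))
```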
Supplementary Figure 7 The landscape of somatic mutations and CNVs in ATL. Experimental platforms, ploidy, disease subtype and CD28 fusion status (top), somatic mutations in significantly mutated genes (middle), and significant focal and broad CNVs (bottom) are shown across samples (n = 370).
Supplementary Figure 8 Deregulated functional pathways in ATL. (a) Major driver alterations, including mutations, CNVs and SVs (asterisks), summarized by functional category. Alteration frequencies are expressed as the percentage of examined cases with the alteration. Mutations were analyzed in 370 cases, except for genes examined only by WES/WGS (83 cases; crosses), and CNVs were analyzed in 426 cases. Components of the Tax interactome are highlighted by red boxes. (b) Significant enrichment of the antigen presentation pathway in GSEA of expression data comparing ATL cases (n = 57) with healthy controls (n = 3) and HTLV-1 carriers (n = 3).
Supplementary Figure 9 Biological significance of PRKCB and CARD11 mutations. (a) Amino acid sequence alignment of Homo sapiens PKCβ with PKCβ proteins from various other organisms and with other Homo sapiens PKCs, generated using the ClustalW algorithm. The mutated site and evolutionarily conserved sites are shown in red and blue, respectively. (b) Immunoblot of PKCβ and/or phosphorylated PKCβ in HEK293T (left) and Jurkat (right) cells expressing WT or Asp427Asn PKCβ. Blots representative of at least three independent experiments are shown. (c) Immunoblot analysis of PKCβ in HEK293T cells transduced with the indicated PKCβ mutants after exposure to PMA and ionomycin for 15 min. Cytosolic PKCβ and β-tubulin protein levels were quantified using Photoshop software (n = 3). Data represent means ± s.d. *P < 0.05, Student's t test. (d) Box plots (median and interquartile values) of the relative expression levels of CARD11 exon 15 and CARD11 exon 1 in cases with or without CARD11 intragenic deletion. Expression (FPKM) was compared using Student's t test. (e) RNA-seq data visualized with IGV for representative cases with (ATL026) and without (ATL027) an intragenic deletion of CARD11 that results in skipping of exons 15–17. Cumulative numbers of sequencing reads (depths) and junctional reads are shown along genomic positions. (f) Immunoblot of CARD11 expression in HEK293T cells expressing WT or Glu626Lys CARD11. Blots representative of three independent experiments are shown. (g,h) In luciferase assays (n = 3), HEK293T (g) and Jurkat (h) cells expressing Glu626Lys CARD11 showed augmented NF-κB transcriptional activity compared with cells expressing WT CARD11. Data represent means ± s.d. *P < 0.05, **P < 0.005, ***P < 0.0005, Student's t test.
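Several panels compare two groups of n = 3 measurements with Student's t test. The sketch below computes the classic pooled-variance (equal-variance) two-sample t statistic and its degrees of freedom using only the standard library. The legends do not state whether the tests were one- or two-sided or which software was used. The toy values and the function name are hypothetical. For a P value, `scipy.stats.ttest_ind(a, b)` (with its default `equal_var=True`) computes the same statistic together with a two-sided P value.

```python
import math
import statistics
from typing import Sequence, Tuple

def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df

# Toy relative luciferase activities from three hypothetical experiments
wt = [1.0, 1.2, 0.8]
mutant = [3.0, 3.4, 2.6]
print(student_t(wt, mutant))
```

The equal-variance form matches the name "Student's t test". With unequal group variances, Welch's variant is the usual alternative.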
Supplementary Figure 10 Biological significance of CCR4 and CCR7 mutations. (a,b) Chemotaxis induced by CCL22 (a) or CCL19 (b) in Transwell assays (n = 3) of B300-19 cells (a mouse pre-B cell line) expressing WT or Tyr331* CCR4 (a) or WT or Trp355* CCR7 (b). The number of viable cells was assessed using CellTiter-Glo assays and normalized to the value for cells expressing WT receptor without ligand. Data represent means ± s.d. *P < 0.05, **P < 0.005, ***P < 0.0005, Student's t test.
Supplementary Figure 11 Effect of CIMP status on overall survival in acute ATL. Kaplan-Meier survival curves for 44 acute ATL cases stratified by CIMP status (log-rank test).
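A Kaplan-Meier comparison with a log-rank test, as in this figure, can be sketched in plain Python. The code computes the product-limit survival estimate and the two-group log-rank chi-square statistic (1 degree of freedom), whose P value equals `erfc(sqrt(chi2 / 2))`. The toy survival times and the function names are hypothetical and do not reproduce the 44-case cohort. Established packages such as lifelines (`KaplanMeierFitter`, `lifelines.statistics.logrank_test`) provide fuller implementations.

```python
import math
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """(time, survival) steps at each distinct event time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve

def logrank(t1: Sequence[float], e1: Sequence[int],
            t2: Sequence[float], e2: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank chi-square (1 df) and its P value."""
    pooled = [(t, e, 0) for t, e in zip(t1, e1)] + [(t, e, 1) for t, e in zip(t2, e2)]
    o_minus_e = var = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)               # at risk, both groups
        n1 = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)   # at risk, group 1
        d = sum(1 for tt, e, _ in pooled if tt == t and e)         # deaths, both groups
        d1 = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Toy data: group 1 and group 2 survival times (months) with event indicators
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
print(logrank([1, 2], [1, 1], [3, 4], [1, 1]))
```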