Figure S1. Overview of the variant calling and verification process. This figure expands on Fig. 1c with details of how verified variants were identified in 547 additional validation samples. Somatic variants (SNVs, indels, focal and chromosome-arm-level CNVs, and fusion products) were first called in 197 diagnostic samples with remission DNA (for germline) using a Complete Genomics custom whole genome sequencing (WGS) variant calling pipeline. Complete Genomics calls were optimized at the start of the TARGET project using 100 independently verified variants in WGS samples. Matched tumor and remission samples in 153 cases were used for somatic variant calling by both WGS and targeted capture sequencing (TCS) of genes recurrently impacted in the WGS samples. TCS confirmed 72% of WGS SNVs and 76% of WGS indels (red and green text in the figure). For focal copy number (CN) alterations spanning fewer than 7 genes, 75% of recurrent WGS deletion/loss calls and 85% of gain/amplification calls matched recurrent alterations discovered by SNP6 arrays in 96 matching samples. For chromosomal junctions, we integrated WGS, clinical and RNA-seq data by majority vote, and confirmed 89% of WGS calls. An additional 29 samples from the WGS discovery cohort were verified by TCS of diagnostic cases only, as part of 146 tumors without matched remission (see top portion of the figure). The remainder of these 146 cases were not used for discovery or validation; rather, we simply identified recurrence of variants that were observed and verified in other samples.
Figure S2. Cellular processes and pathways commonly impacted in pediatric AML. The height of each bar indicates the percentage of samples with verified fusions (green), SNVs/indels (grey), or focal CNVs (gold) in recurrently impacted genes within 684 pediatric AML samples. See Table S2b for a list of the impacted genes.
Figure S3. Data type overlap for TARGET and TCGA diagnostic samples. UpSet plots (http://www.caleydo.org/tools/upset/) showing the set overlaps for whole genome sequencing (WGS), whole exome sequencing (WXS), mRNA sequencing, DNA methylation arrays (CpGmeth), miRNA sequencing and targeted capture sequencing (TCS) in the TARGET and TCGA cohorts. The number of assays analyzed for each type is indicated by the horizontal bar graphs, and the number in each set intersection is illustrated in the vertical bar graphs. The Clinical category includes samples comprising the entire TARGET AML dataset, including those in TARGET AML subprojects (e.g. previously reported WXS analysis 6). Data from these samples are included in the chromosomal arm level and karyotype based assessments of copy loss and fusions. (a) TARGET AML assay overlap for all available project samples (n=1023); clinical annotations include ISCN karyotype. (b) TCGA assay overlap for all samples used for comparisons to TARGET (n=177). (c) Combined TARGET and TCGA assay data overlap (n=1200).
Figure S4. Clonality estimates are consistent by age across cohorts. Both TCGA and TARGET AML cohorts contain affected individuals between the ages of 15 and 39 (adolescent and young adult, or AYA). Mutational and karyotypic clonality were assessed in AYA patients with whole-genome or whole-exome sequencing from either cohort, resulting in estimates from 40 TARGET AML subjects and 22 TCGA AML subjects in this age group. No significant association is observed between cohort and mutational clonality estimate (p = 0.79613, Fisher's exact test) or karyotypic clonality (p = 0.180302, Fisher's exact test); TCGA AYA cases are older and more likely to have normal karyotype, though not significantly so. A multivariate Poisson model similarly shows little evidence for a significant cohort-wise effect. The strongest predictor of (decreasing) mutational clonality in AYAs is age at diagnosis (p=0.28).
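The cohort-versus-clonality comparison in Figure S4 relies on Fisher's exact test. The following is a minimal sketch of a two-sided test on a 2x2 table, written with the standard library only. The counts in the example are hypothetical placeholders (the legend reports only the group sizes, 40 and 22, and the p-values), and the split into "2 or more clones" versus "fewer" is an assumption made purely for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical AYA counts (NOT taken from the paper):
# rows = cohort (TARGET, TCGA), columns = (>= 2 clones, fewer clones).
hypothetical_table = (30, 10, 15, 7)
p = fisher_exact_two_sided(*hypothetical_table)
print(f"two-sided p = {p:.4f}")
```

A large p-value from such a table would be read, as in the legend, as no evidence that clonality differs by cohort within the AYA age range.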
Figure S5. The context of genome-wide mutation burden in pediatric AML. The mutational burden of SNVs and indels is low in pediatric AML (blue), with a median of 10 mutations/case across the 197-sample WGS cohort. This places pediatric AML, along with other pediatric malignancies (rhabdoid tumor, Ewing sarcoma, medulloblastoma) and adult AML (red), among the least mutated of human cancers. Figure reproduced from the raw data reported by Lawrence and colleagues 51, updated to reflect TARGET AML results and plotted using the ggplot2 package in the R statistical environment.
Figure S6. A simplified visualization of common genomic variants in TARGET and TCGA AML data. Selected small variants are grouped into those that appear distinct from core binding factor (CBF; t(8;21) and inv(16)) and KMT2A (aka MLL) fusions (grp1: mutations of WT1, NPM1, PTPN11, GATA2, CEBPA) and those that frequently co-occur with CBF alterations (grp2: mutations of KIT or ASXL2, loss of chr X). C, chromosomal alteration; J, junction/translocation; M, mutation; I, ITD. Row labels, pediatric panel: CBF, grp1.var, KMT2A, FLT3, grp2.var, NRAS, KRAS, ZEB2, MBNL1. Row labels, adult panel: grp1.var, DNMT3A, IDH2, IDH1, CBF, TET2, TP53, NRAS, grp2.var, KMT2A, KRAS.
Figure S7. Adult-Pediatric mutational contrasts in AML. Lollipop plots generated with ProteinPaint (https://pecan.stjude.org/#/proteinpaint) highlight differences in frequency, type, and location of sequence variants in pediatric and adult AML. The plotted data reflect all somatic coding variants identified at presentation in 177 TCGA cases and 815 TARGET AML cases (WGS + TCS). Mutations are coded by functional class: blue, missense; brown, insertion; gray, deletion; red, frameshifting; orange, stop-gain; green, tandem duplication. Panels: (a) MYC (TARGET); (b) GATA2 (TARGET); (c) KRAS (TARGET, TCGA); (d) FLT3 (TARGET, TCGA); (e) NRAS (TARGET, TCGA); (f) KIT (TARGET, TCGA).
Figure S8. The impact of pediatric gene fusions on clinical outcome. (a) 199 patients evaluated for CBFA2T3-GLIS2 fusion had clinical outcome data available for analysis. Those with the fusion (n=9) had significantly worse overall survival than patients without the fusion (n=190) (p=0.0101). (b) 824 patients were evaluated for fusions involving ETS family transcription factors (ETV6, FUS, or ERG) through karyotype and/or transcriptome sequencing and had clinical outcome data available for analysis. Those with fusions (n=20) had significantly worse event-free survival than patients without a fusion (n=804) (p=0.0060). (c) 824 patients were evaluated for fusions involving KAT6A through karyotype and/or transcriptome sequencing and had clinical outcome data available for analysis. Those with fusions (n=8) had significantly worse overall survival than patients without a fusion (n=816) (p=0.0195). Differences in outcome were assessed by log-rank test. EFS, event-free survival; OS, overall survival.
Figure S9. Pediatric CBL Exonic Deletions Detected by cDNA Fragment Length Analysis. Representative examples of CBL wild-type and deletion transcripts detected by capillary electrophoresis of cDNA. The horizontal axis depicts the size of the PCR fragment (bp), while the vertical axis indicates signal strength. The WT (full-length transcript) fragment is 685 bp, the fragment with deletion of exon 8 only is 563 bp, and the fragment with deletion of exons 8 and 9 is 354 bp.
Figure S10. Mutational frequency differences in key myeloid genes. (a) TARGET vs. ECOG comparison 4. (b) TARGET vs. TCGA comparison, balanced by cytogenetic subtypes (see online Methods). Error bars indicate the empirical SD from the resampling procedure.
Figure S11. Mutational co-occurrence in KMT2A-rearranged childhood AML. We identified single copy segmental deletions of ZEB2 and/or MBNL1 in 14 patients, 6 of whom had concurrent KMT2A fusions (p=0.035, Fisher's exact test). The row entitled KMT2A (clinical) shows the manually curated classification of the tumor's primary cytogenetic type, obtained by combining results from clinical, genomic and RNA-seq assays. By this measure, all samples are classified as belonging to the KMT2A fusion cytogenetic group. The row entitled KMT2A (WGS) shows KMT2A variants found by WGS alone. Note that 2 samples have copy number alterations as well as fusions impacting KMT2A. C, copy number alteration; J, junction/translocation; M, mutation; I, ITD. Row labels: KMT2A (clinical), KMT2A (WGS), MLLT3, NRAS, FLT3, KRAS, MLLT10, MBNL1, TMEM14E, ZEB2.
Figure S12. Clonality at presentation in pediatric AML. (a) Mutation-based inference of clonality in 197 TARGET AML cases with WGS and 177 TCGA AML cases identifies 2 or more detectable clones in the majority of patients across age ranges. (b) A similar pattern with overall fewer detectable clones was observed by karyotypic inference of clonal relationships at presentation. Age groups: infants (age <3), children (age 3-15), AYA (age 15-40), adults (age >40). Panel titles: (a) mutational clones detected at diagnosis; (b) karyotypic clones detected at diagnosis.
Figure S13. Gene variants alone and in combination impact pediatric AML outcomes. (a) 963 patients from the TARGET dataset with clinical results for FLT3 internal tandem duplication (ITD), NPM1, WT1, and NUP98-NSD1 fusion had clinical outcome data for analysis. Patients with a combination of FLT3 ITD and WT1 or NUP98-NSD1 exhibit significantly decreased overall survival compared with patients with FLT3 ITD alone or in combination with NPM1 mutation (p<0.001). Similar results were found for COG trial AAML0531 (b), COG trial CCG-2961 (c), and the Dutch Childhood Oncology Group (DCOG) (d). In each trial, those with FLT3 ITD plus WT1 and/or NUP98-NSD1 fusion exhibit significantly worse overall survival. The exact number of patients in each subgroup and the total number of evaluable patients in each cohort are indicated in the table below. ITD, FLT3-ITD.

Subgroup                          TARGET  AAML0531  CCG-2961  DCOG
ITD -                                687       651       435   225
ITD - NPM1 +                          37        41        41    14
ITD - WT1 +                           56        43        27    14
ITD - NPM1 + WT1 +                     7         5         2     0
ITD - WT1 + NUP98-NSD1 +               4         3         0     1
ITD - NUP98-NSD1 +                     0         0         0     1
ITD +                                 72        67        17    28
ITD + NPM1 +                          27        28         8     9
ITD + WT1 +                           27        21        11     9
ITD + NPM1 + WT1 +                     7         3         2     0
ITD + WT1 + NUP98-NSD1 +              17        12         4     4
ITD + NUP98-NSD1 +                    21        13         9     9
ITD - NPM1 + NUP98-NSD1 +              1         1         0     0
Total                                963       888       556   314
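As a quick consistency check on the Figure S13 table, the subgroup counts for each cohort can be summed and compared with the reported totals. The sketch below transcribes the counts in the order the subgroups are listed in the table. The variable names are illustrative only.

```python
# Subgroup counts per cohort, in the column order of the Figure S13 table
# (13 genotype subgroups, from "ITD -" through "ITD - NPM1 + NUP98-NSD1 +").
counts = {
    "TARGET":   [687, 37, 56, 7, 4, 0, 72, 27, 27, 7, 17, 21, 1],
    "AAML0531": [651, 41, 43, 5, 3, 0, 67, 28, 21, 3, 12, 13, 1],
    "CCG-2961": [435, 41, 27, 2, 0, 0, 17, 8, 11, 2, 4, 9, 0],
    "DCOG":     [225, 14, 14, 0, 1, 1, 28, 9, 9, 0, 4, 9, 0],
}

# Totals as printed in the last column of the table.
reported_totals = {"TARGET": 963, "AAML0531": 888, "CCG-2961": 556, "DCOG": 314}

# Recompute each cohort total from its subgroup counts.
row_totals = {cohort: sum(values) for cohort, values in counts.items()}

for cohort, total in row_totals.items():
    status = "matches" if total == reported_totals[cohort] else "DIFFERS FROM"
    print(f"{cohort}: computed {total} {status} reported {reported_totals[cohort]}")
```

Each recomputed sum agrees with the printed total, which supports reading the flattened numbers as 13 subgroup columns followed by a Total column.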
Figure S14. Remission rates vary for pediatric AML with FLT3-ITD according to cooperating mutations. The CCG-2961, AAML0531 and DCOG cohorts were combined to compare complete remission (CR) rates after one cycle of induction therapy for groups with FLT3-ITD cooperating mutations, as shown. CR rates are consistent with the survival outcomes (Figs. 3c and S13) among these studies: the poorest outcome group, containing FLT3-ITD and a cooperating WT1 mutation and/or NUP98-NSD1 fusion, had the lowest CR rate, at 54.8%. The most favorable group, FLT3-ITD positive and NPM1 positive, had a CR rate of 93.0% (groupwise p<0.0001, Kruskal-Wallis).
Figure S15. Novel ZEB2 and MBNL1 Deletions. (a, b) Short (<500 Kbp) deletion segments along chromosome 2 (panel a, ZEB2) and chromosome 3 (panel b, MBNL1) in TARGET discovery cohort samples (n=197). (c) With the exception of one ZEB2-deleted sample (red point at top right of panel c), samples with ZEB2 and MBNL1 deletions are not impacted by large numbers of other CNVs.
Figure S16. Novel ELF1 focal deletions in the TARGET discovery cohort. (a) Genome browser view of segmental deletions covering the ELF1 locus. Patients (n=197) are in rows; blue bars indicate the length of the deletion in that genomic region. (b) Genomic deletions were confirmed in a secondary assay, the nCounter CNV assay (NanoString Technologies), which verified all ELF1 deletions initially identified by WGS (boxed specimens with low probe signals, shown as green signals in the heatmap below). (c) Expression values (RPKM) of ELF1 differ between those with the deletion and those with wild-type copy number (p=0.0077). (d) Unsupervised clustering of 63 differentially expressed genes (p<0.01) between patients with and without ELF1 deletion shows that many genes are upregulated in the samples with ELF1 deletions. Orange labels on the y axis indicate patients with an ELF1 deletion. Panel c axes: expression of ELF1 (0 to 1200) for ELF1 deletion (del) vs. ELF1 WT.
Figure S17. Summary view of the key fusion classes in pediatric AML. Each colored region represents a fusion family. Descriptive labels are written adjacent to each family. The fusion partner genes for each family are indicated by their HGNC symbols. The lines connecting gene symbols indicate fusion partners. The thickness of each line reflects the frequency of the observed fusion.
Figure S18. Varying the age cutoff for infants (< 3 years in Figure 4b) vs. children, to < 2 or even < 1 year, does not substantially alter conclusions about fusion prevalence. Panel c is the same as Fig. 4b (reproduced here for comparison). Panels a and b show how samples shift between age groups if the infant-child threshold is reduced to <2 years (b), or <1 year (a). Fusions are listed in the same order as in Fig. 4b and use the same color scheme. Age groups: (a) infants <1, children 1-15; (b) infants <2, children 2-15; (c) infants <3, children 3-15; AYA and adults appear in all panels.
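The regrouping described in Figure S18 amounts to re-binning ages with a different infant cutoff. The sketch below illustrates this under stated assumptions: bins are treated as half-open intervals (for example, children run from the infant cutoff up to but not including 15, and AYA from 15 up to but not including 40), which the legends do not specify exactly, and the ages in the example are hypothetical.

```python
from collections import Counter


def age_group(age_years, infant_cutoff=3.0):
    """Assign an age group; the boundary handling here is an assumption."""
    if age_years < infant_cutoff:
        return "Infants"
    if age_years < 15:
        return "Children"
    if age_years < 40:
        return "AYA"
    return "Adults"


# Hypothetical ages at diagnosis in years (NOT taken from the paper).
ages = [0.5, 1.5, 2.5, 7.0, 16.0, 55.0]

# Compare the three infant cutoffs shown in panels a, b and c.
for cutoff in (1.0, 2.0, 3.0):
    tally = Counter(age_group(age, infant_cutoff=cutoff) for age in ages)
    print(f"infant cutoff <{cutoff:g}: {dict(tally)}")
```

Only samples younger than 3 can move between the infant and child bins as the cutoff changes; the AYA and adult bins are unaffected.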
Figure S19. Co-occurring mutations with CEBPA. (a) Oncoprint (http://www.cbioportal.org) showing all TARGET samples with functionally-validated CSF3R mutations 20. Green indicates samples with mutations. (b) CEBPA and GATA2 mutations combinatorially impact event-free survival (EFS). Panel b (titled "CEBPA and/or GATA2 in Normal Cyto EFS") plots percent survival against EFS in days for four groups: GATA2 +, CEBPA - (N=7); CEBPA +, GATA2 + (N=16); CEBPA +, GATA2 - (N=13); wild-type (N=143); P=0.0177.
Figure S20. Patterns of mutual co-occurrence and mutual exclusion among somatic pediatric AML variants. (a) Patterns of co-occurrence and (b) mutual exclusion among variants in the TARGET cohort were evaluated using CoMEt (see online Methods). Line thickness represents log(p-value) for the observed co-occurrence rates. Orange boxes indicate cytogenetic groups. Except for copy number alterations at the top-right, which were only evaluated within the 197 samples with WGS, all other relations are among 684 samples with TCS. (c) An alternative derivation of conditional gene-gene relationships using a penalized Ising model yields similar conditional dependencies.
Figure S21. Anti-correlated DNA methylation and reduced transcription potential. By scanning 2000 bp upstream and 200 bp downstream of the transcription start site (TSS) for all known ENSEMBL isoforms of ~8000 expressed genes in AML, we fit segmented regression models of DNA methylation (X axis) against asinh (transcripts per million, TPM, Y axis) of each transcript or gene. Hyperbolic arcsine (asinh) is similar to log transformation but is defined at all points along the real number line. Since large batch effects confound the biological differences between TARGET pediatric AML and TCGA adult AML mRNA data, we took the within-cohort median expression for samples with 10% or less methylation at a CpG locus. The silencing threshold at the locus corresponding to the gene of interest was then defined as the methylation fraction beyond which no sample in a cohort exceeded this median unmethylated expression level (from samples with <= 10% methylation) within its cohort. Any locus where healthy progenitors or myeloid cells showed >= 10% methylation was omitted from consideration. After these filtering steps, the most significantly associated locus (ideally correlated with r > 0.8 against its neighboring loci) was selected as a tag CpG for the downstream transcript(s). A tag CpG for HumanMethylation450 arrays and either the same locus or (if not present) the best surrogate locus for HumanMethylation27 arrays passing the filters was retained for silencing calls. If no suitable HumanMethylation27 locus could be found, only samples with HumanMethylation450 data were assayed for silencing of a given gene. This method identified 119 genes with recurrent silencing by promoter hypermethylation within the TARGET and TCGA datasets. Examples below include THRB and WDR35 (components of NMF signatures 2 and 13, respectively), CDKN2B, and ULBP1, ULBP2 and ULBP3 (NK ligands). The red line marks the empirically determined silencing threshold (% methylation).
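The silencing threshold described in Figure S21 can be illustrated with a small sketch for a single CpG locus within one cohort. This is a simplified reading of the legend, not the authors' implementation: it skips the segmented regression, the healthy-tissue filter and the tag CpG selection, and the function names, the handling of edge cases and the example values are all assumptions.

```python
from math import asinh
from statistics import median


def silencing_threshold(methylation, tpm, unmeth_cutoff=0.10):
    """Methylation fraction beyond which no sample exceeds the median
    expression of unmethylated (<= unmeth_cutoff) samples in the cohort.

    Returns None when the threshold is undefined in this sketch.
    """
    # asinh behaves like a log transform but is defined at TPM = 0.
    expression = [asinh(value) for value in tpm]
    unmethylated = [e for m, e in zip(methylation, expression) if m <= unmeth_cutoff]
    if not unmethylated:
        return None  # no unmethylated samples to define a baseline
    baseline = median(unmethylated)
    # Methylation levels of samples still expressing above the baseline.
    above = [m for m, e in zip(methylation, expression) if e > baseline]
    if not above:
        return None  # assumption: leave undefined if nothing exceeds baseline
    return max(above)


def silenced_calls(methylation, threshold):
    """Call a sample silenced when its methylation exceeds the threshold."""
    return [m > threshold for m in methylation]


# Hypothetical methylation fractions and TPM values (NOT from the paper).
methylation = [0.02, 0.05, 0.08, 0.30, 0.45, 0.60, 0.85, 0.90]
tpm = [40, 55, 70, 80, 20, 5, 0, 1]

threshold = silencing_threshold(methylation, tpm)
print("threshold:", threshold)
print("silenced:", silenced_calls(methylation, threshold))
```

Computing the baseline within each cohort separately, as the legend describes, means that batch differences in absolute expression between TARGET and TCGA do not enter the call.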
Figure S22. Integrative analysis of gene mutations, deletion, and transcriptional silencing by promoter methylation. Silencing (gold) or mutation/deletion events (gray) for each gene (rows) are displayed for all assayed patients (columns), with the marginal total of events per patient illustrated in the upper histogram. The plotted data reflect 172 TCGA cases and 284 TARGET cases at 119 genes and are outlined in Tables S8-S9. These data represent a complete illustration of the subset shown in Fig. 5a, with differences in row/column ordering based on differing clustering solutions for greater numbers of samples and genes. Legend: status (silenced, mutated) and cohort.
Figure S23. NMF Deconvolution of genome-wide methylation patterns. DNA methylation signatures derived by non-negative matrix factorization (NMF) and in silico purification. Samples are ordered by hierarchical clustering of signatures (labeled at right) and demonstrate the relative similarity of methylation features from samples within cytogenetic categories (top ribbon). The plotted data are outlined in Table S10 and represent a complete illustration of those shown in Fig. 5b. Legend: associations and cohort.
Figure S24. Two DNA methylation signatures mark poor prognosis. Kaplan-Meier plots for signatures 2 and 13. After stratifying by cohort and adjusting for both TP53 mutation status and white blood cell count, these two signatures predict significantly (p < 0.05) poorer event-free survival in both pediatric and adult patients with above-median scores. Panel titles: DNA methylation signature #2; DNA methylation signature #13.
Figure S25. Unsupervised Nonnegative Matrix Factorization (NMF) Clustering of miRNA Expression. This figure is a fully annotated version of Fig. 6a in the main text. Unsupervised NMF clustering of miRNA expression patterns in pediatric AML samples revealed 4 discrete pediatric subgroups (marked by the numbered colored rectangles at the top) that were correlated with specific genomic alterations (indicated by blue bars in the gray annotation rows below the race and FAB category annotations, near the top). Panel label: consensus matrix (subgroups 1 to 4).
Figure S26. Kaplan-Meier plots for samples expressing low and high levels of miRs let-7a-3p, let-7b-5p and 30a-3p. The expression (RPM) cut point between high and low expression groups for each miRNA was defined using the X-tile method 77, where all separation points between patients are considered and the selected cut point is the one that provides the optimal (lowest) EFS log-rank p-value. OS, overall survival. P-values shown in the panels: P<0.0001, P=0.0001, P<0.0001.
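The cut-point selection described in Figure S26 (consider every separation point between patients and keep the one with the lowest log-rank p-value) can be sketched as follows. This is an illustrative standard-library implementation of a two-group log-rank test plus an exhaustive scan, not the X-tile software itself. The function names, the `min_group` parameter and the example data are assumptions.

```python
from math import erfc, sqrt


def logrank_p(times, events, in_high_group):
    """Two-group log-rank p-value (chi-square with 1 degree of freedom)."""
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [(g, bool(e) and ti == t)
                   for ti, e, g in zip(times, events, in_high_group) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for g, _ in at_risk if g)
        deaths = sum(1 for _, died in at_risk if died)
        deaths_high = sum(1 for g, died in at_risk if g and died)
        observed_minus_expected += deaths_high - deaths * n_high / n
        if n > 1:
            frac = n_high / n
            variance += deaths * frac * (1 - frac) * (n - deaths) / (n - 1)
    if variance == 0:
        return 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square(1) variable.
    return erfc(sqrt(chi_square / 2))


def best_cut_point(expression, times, events, min_group=2):
    """Scan midpoints between adjacent expression values; return (cut, p)."""
    best = None
    values = sorted(set(expression))
    for low, high in zip(values, values[1:]):
        cut = (low + high) / 2
        groups = [x > cut for x in expression]
        n_high = sum(groups)
        if min(n_high, len(groups) - n_high) < min_group:
            continue  # skip splits that leave a very small group
        p = logrank_p(times, events, groups)
        if best is None or p < best[1]:
            best = (cut, p)
    return best


# Hypothetical expression (RPM), follow-up (days) and event flags.
expression = [1, 2, 3, 4, 5, 6, 7, 8]
times = [50, 48, 45, 40, 10, 8, 6, 4]
events = [1, 1, 1, 1, 1, 1, 1, 1]

cut, p = best_cut_point(expression, times, events)
print(f"selected cut point: {cut}, log-rank p = {p:.4f}")
```

Because the cut point is chosen to minimize the p-value, the resulting p-value is best read as descriptive of the selected split.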
Figure S27. Kaplan-Meier plots for samples expressing low and high levels of miRs 155-5p, 3614-5p, 4662-5p and 26a-2-3p. The expression (RPM) cut point between high and low expression groups for each miRNA was defined using the X-tile method 77. OS, overall survival.
Figure S28. High expression levels of miRs 133a-3p, 212-3p, and 29c-5p have deleterious effects on event-free survival (EFS). The expression (RPM) cut point between high and low expression groups for each miRNA was defined using the X-tile method 77. EFS, event-free survival.