Large conserved domains of low DNA methylation maintained by Dnmt3a

Similar documents
Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

SUPPLEMENTARY FIGURES: Supplementary Figure 1

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Yue Wei 1, Rui Chen 2, Carlos E. Bueso-Ramos 3, Hui Yang 1, and Guillermo Garcia-Manero 1

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

SUPPLEMENTAL INFORMATION

Supplementary Figures

Nature Structural & Molecular Biology: doi: /nsmb.2419

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Assignment 5: Integrative epigenomics analysis

Nature Methods: doi: /nmeth.3115

SUPPLEMENTARY INFORMATION

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

Supplemental Information. A Highly Sensitive and Robust Method. for Genome-wide 5hmC Profiling. of Rare Cell Populations

EPIGENOMICS PROFILING SERVICES

Epigenomic Profiling of Young and Aged HSCs Reveals Concerted Changes during Aging that Reinforce Self-Renewal

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

Nature Immunology: doi: /ni Supplementary Figure 1. DNA-methylation machinery is essential for silencing of Cd4 in cytotoxic T cells.

Cluster Dendrogram. dist(cor(na.omit(tss.exprs.chip[, c(1:10, 24, 27, 30, 48:50, dist(cor(na.omit(tss.exprs.chip[, c(1:99, 103, 104, 109, 110,

Integrated analysis of sequencing data

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Dynamic reprogramming of DNA methylation in SETD2- deregulated renal cell carcinoma

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Supplemental Information. Genomic Characterization of Murine. Monocytes Reveals C/EBPb Transcription. Factor Dependence of Ly6C Cells

Supplementary Materials for

Nature Genetics: doi: /ng Supplementary Figure 1. HOX fusions enhance self-renewal capacity.

High Throughput Sequence (HTS) data analysis. Lei Zhou

An annotated list of bivalent chromatin regions in human ES cells: a new tool for cancer epigenetic research

Comparative DNA methylome analysis of endometrial carcinoma reveals complex and distinct deregulation of cancer promoters and enhancers

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Supplementary Figures

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Lung Met 1 Lung Met 2 Lung Met Lung Met H3K4me1. Lung Met H3K27ac Primary H3K4me1

Supplemental Figure 1. Genes showing ectopic H3K9 dimethylation in this study are DNA hypermethylated in Lister et al. study.

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

The role of DNMT3A and HOXA9 hypomethylation in acute myeloid leukemia (AML)

Epigenetics DNA methylation. Biosciences 741: Genomics Fall, 2013 Week 13. DNA Methylation

Innate Immunity Dysregulation in Myelodysplastic Syndromes. CONTRACTING ORGANIZATION: University of Texas MD Anderson Cancer Center Houston, TX 77030

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

Supplementary Fig.S1. MeDIP RPKM distribution of 5kb windows. Relative expression of DNMT. Percentage of genome. (fold change)

Allelic reprogramming of the histone modification H3K4me3 in early mammalian development

Dissecting gene regulation network in human early embryos. at single-cell and single-base resolution

Copyright 2014 The Authors.

Tissue of origin determines cancer-associated CpG island promoter hypermethylation patterns

Supplementary Information

Nature Genetics: doi: /ng Supplementary Figure 1

The epigenetic landscape of T cell subsets in SLE identifies known and potential novel drivers of the autoimmune response

Nature Immunology: doi: /ni Supplementary Figure 1. RNA-Seq analysis of CD8 + TILs and N-TILs.

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

Nature Immunology: doi: /ni Supplementary Figure 1 33,312. Aire rep 1. Aire rep 2 # 44,325 # 44,055. Aire rep 1. Aire rep 2.

Epigenetic programming in chronic lymphocytic leukemia

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

Stem Cell Epigenetics

Nature Medicine: doi: /nm.4439

Supplementary Materials for

Chip Seq Peak Calling in Galaxy

Cancer Informatics Lecture

Dnmt3a and Dnmt3b Have Overlapping and Distinct Functions in Hematopoietic Stem Cells

Myogenic Differential Methylation: Diverse Associations with Chromatin Structure

SUPPLEMENTARY INFORMATION

Single-cell RNA-Seq profiling of human pre-implantation embryos and embryonic stem cells

Session 6: Integration of epigenetic data. Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016

SUPPLEMENTARY APPENDIX

Histones modifications and variants

Supplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22.

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

Genomic Methods in Cancer Epigenetic Dysregulation

Nature Genetics: doi: /ng Supplementary Figure 1. Somatic coding mutations identified by WES/WGS for 83 ATL cases.

SUPPLEMENTAL FILE. mir-22 and mir-29a are members of the androgen receptor cistrome modulating. LAMC1 and Mcl-1 in prostate cancer

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

An epigenetic approach to understanding (and predicting?) environmental effects on gene expression

RUNX1 and FPD/AML Translational Research. The Leukemia and Lymphoma Society / Babich Family Foundation Partnership. September 2016

A fully Bayesian approach for the analysis of Whole-Genome Bisulfite Sequencing Data

SSM signature genes are highly expressed in residual scar tissues after preoperative radiotherapy of rectal cancer.

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

DNA methylation and differentiation: HOX genes in muscle cells

Nature Genetics: doi: /ng Supplementary Figure 1. Immunofluorescence (IF) confirms absence of H3K9me in met-2 set-25 worms.

Supplementary Information

SUPPLEMENTARY FIGURE 1: f-i

Chromatin marks identify critical cell-types for fine-mapping complex trait variants

MIR retrotransposon sequences provide insulators to the human genome

Host Double Strand Break Repair Generates HIV-1 Strains Resistant to CRISPR/Cas9

Nature Immunology: doi: /ni.3412

Epigenetics & Chromatin. Open Access RESEARCH

Nature Genetics: doi: /ng Supplementary Figure 1. Phenotypic characterization of MES- and ADRN-type cells.

ChARM: Discovery of combinatorial chromatin modification patterns in hepatitis B virus X-transformed mouse liver cancer using association rule mining

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

ESCs were lysed with Trizol reagent (Life technologies) and RNA was extracted according to

Table S1. Total and mapped reads produced for each ChIP-seq sample

DNA methylation & demethylation

Analysis of shotgun bisulfite sequencing of cancer samples

Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

Transcription:

Supplementary information Large conserved domains of low DNA methylation maintained by Dnmt3a Mira Jeong# 1, Deqiang Sun # 2, Min Luo# 1, Yun Huang 3, Grant A. Challen %1, Benjamin Rodriguez 2, Xiaotian Zhang 1, Lukas Chavez 3, Hui Wang 4, Rebecca Hannah 5, Sang-Bae Kim 6, Liubin Yang 1, Myunggon Ko 3, Rui Chen 4, Berthold Göttgens 5, Ju-Seog Lee 6, Preethi Gunaratne 7, Lucy A. Godley 8, Gretchen J. Darlington 9, Anjana Rao 3, Wei Li^2, and Margaret A. Goodell^1 # These authors contributed equally to this work ^These authors jointly directed this work Affiliations: 1 Stem Cells and Regenerative Medicine Center, Department of Pediatrics and Molecular & Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA. 2 Division of Biostatistics, Dan L. Duncan Cancer Center and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA. 3 Division of Signaling and Gene Expression, La Jolla Institute for Allergy and Immunology, La Jolla, California 92037, USA 4 Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA. 5 Department of Hematology, Cambridge Institute for Medical Research and Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge University, Hills Road, Cambridge, UK. 6 Department of Systems Biology, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77054, USA. 7 Department of Pathology, Baylor College of Medicine, and Department of Biology & Biochemistry, University of Houston, Houston, TX 77204, USA 8 Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA. 9 Huffington Center for Aging, Baylor College of Medicine, Houston, Texas 77030, USA. % Now at the Department of Internal Medicine, Washington University in St. Louis, St. Louis, Missouri 63110, USA. Correspondence to: Goodell@bcm.edu or WL1@bcm.edu 1

Supplementary Tables Supplementary Table 1: Whole genome sequencing statistics Supplementary Table 2: UMRs in WT HSCs Supplementary Table 3: UMRs in HOX genes Supplementary Table 4: UMRs in ESCs Supplementary Table 5: Canyon vs UMRs Supplementary Table 6: WT vs Dnmt3a KO HSC UMRs Supplementary Table 7: CMS 5hmC sites in WT and Dnmt3a KO Supplementary Table 8: oxbs sequencing Supplementary Table 9: Summary of genes overlapping leukemia Oncomine signatures 2

Supplementary Figure 1. Genomic features of the WT HSC methylome. Violin plots indicate the methylation ratios in regions with different genomic features. 3

1 2 3 4 5 6 7 8 9 10 X 19 18 17 16 15 14 13 12 11 Y Supplementary Figure 2. Canyon distribution in mouse chromosomes. 1,104 Canyons are plotted along different chromosomes. 4

Supplementary Figure 3. Large unmethylated Canyons around Hox genes. (a) Bar graph indicating the percentage of genes overlapping Hox genes for Canyons, other UMRs and control genes. Control genes are the merged version of three control gene sets: other transcription factors (TFs), other CGI and Bystander genes 1. The Y-axis indicates percent of overlapping genes. (b) Length distribution and Homeobox gene enrichment of the largest 20 canyons. Green star indicates Canyon genes expressed in HSCs. (c) UCSC genome browser track depicts methylation profile across the Barhl2 gene, (d) Uncx gene and (e) Pou3f2 gene in murine HSCs. Methylation ratios from 0% to 100%, for individual CpG sites are shown in red. The identified Undermethylated regions (UMRs) ( 10% methylation) are indicated by blue bars, while the CpG islands are indicated in green, repeats are marked in black, and mammalian conservation is indicated in dark blue. RNA-seq expression is shown at bottom in green. 5

number of repeats 0 50 100 15 0 2 00 2 5 0 3 00 number of repeats 0 10 2 0 3 0 40 50 60 number of Repeats 0 10 2 0 3 0 40 50 a b LINE mean pct covered by Repeats vertebrate phast conservation c 0.35 0.3 0.25 0.2 0.15 0.1 60% 40% 20% 0% Control sumr cumr Canyon 3 2 1 0 1 2 3 Repeat position on Canyon LTR 3 2 1 0 1 2 3 Repeat position on Canyon SINE Control sumr cumr Canyon 3 2 1 0 1 2 3 Repeat position on Canyon Supplementary Figure 4. Mammalian conservation scores and depletion of transposable elements in Canyons (a) Bar graph indicating the average phast conservation scores for Canyons, other UMRs and control genes. Control genes are the merge of three control gene sets: other TFs, other CGI and Bystander genes 1. (b) Position of repeats, including LINE, LTR and SINE on WT HSC Canyons. The X-axis of the plots indicates location of repeats relative to Canyons, where all Canyons are normalized to cover positions -1 to +1. Thus, a position weight of 0 represents the middle of the Canyons, while position weight ±1 represents the edges of the Canyons. (c) Bar graph indicating the mean percentage of the indicated regions that are covered by repeats. 6

Supplementary Figure 5. HSC-specific TF binding peak enrichment in Canyons. (a) UCSC genome browser track depicts overlap of TF binding peaks and Canyons, using over 150 ChIP-seq data sets including 10 hematopoietic stem cell TFs (Scl/Tal1, Lyl1, Lmo2, Gata2, Runx1, Meis1, Pu.1, Erg, Fli1, and Gfi1b). Purple boxes indicate TF binding peaks, the green bar indicates a Super enhancer in T helper (Th) cells (Super enhancer data from ref 2 ). (b) Position of binding peaks for the 10 hematopoietic stem cell TFs across all UMRs. ±1 represents the boundary of cumrs. (c) Bar graph indicating the mean percentage covered by TF binding peaks for Canyons and other UMRs and control genes. Control genes are the merge of three control gene sets: other TFs, other CGI and Bystanders 1 for negative controls. 7

Supplementary Figure 6. Canyon dynamics between ESC and HSC. (a) Dot plot to compare Canyon methylation ratios between HSCs and ESCs. Conserved Canyons are boxed in green, and cell-type-specific Canyons are boxed in red and blue with examples from each group depicted in b-e. (b and c) The UCSC browser image shows DNA methylation profiles for cell type-specific Canyons Pou5f1 (Oct4) (ESC-specific) and Erg (HSC-specific). (d and e) Depiction of the common Canyons Brd2 and Lhx5 genes. Lhx5 is not expressed in either cell type, while Brd2 is only expressed in HSCs (RNA seq data not shown). (f) Stable Pax7 Canyon in human hematopoietic progenitors and differentiated progeny (data from ref 3 ). (g) Stable Pax7 Canyon in mouse HSC and ESC (ES data from ref 4 ). (h) Bar graph to compare common canyon numbers between different cell types and different species. (Mouse ES data from ref 4 ), Human HPC, B cell and Neutrophil data from ref 3 and other human cell data from NIH Roadmap Epigenomics Mapping Consortium (www.roadmapepigenomics.org). HumanCD34: Mobilized CD34+ primary cells, Human Liver: Adult Liver, Human Brain: Brain Germinal Matrix, Human neurospere: Neurosphere cultured cells Ganglionic Eminence Derived. Y-axis indicates common Canyon numbers and percentages. 8

number of DMRs 0 50 100 150 200 250 300 350 0 20 40 60 80 a Normalized position to cumrs Normalized position to background Normalized position to Canyons Normalized position to background 3 2 1 0 1 2 3 3 2 1 0 1 2 3 UMR Canyon DMRs on cumrs (-1 to 1) DMRs on Canyons (-1 to 1) b Scale chr11: 50 kb mm9 Hoxb7 Hoxb9 Hoxb6 Hoxb8 Hoxb4 96,250,000 Hoxb3 Hoxb2 Hoxb1 Hoxb5 WT: WGBS WT: Canyon 3aKO: WGBS 3aKO: Canyon CpG Islands RepeatMasker Mammal Cons Supplementary Figure 7. Dnmt3a is involved in maintaining Canyons in HSCs. (a) Position of differentially methylated regions (DMRs) comparing WT and Dnmt3a KO HSCs on cumrs and Canyons. The grey bars represent controls, generated by assigning the UMRs and Canyons to random locations in the genome. The X-axis indicates the genomic position centered on the midpoint of UMRs normalized to -1 to +1. Thus, ±1 represents the boundaries of UMRs and Canyons. (b) UCSC genome browser track depicts methylation profiles across HoxB cluster. Several smaller Canyons merge and expand in the Dnmt3a KO HSCs to generate exceptionally large under-methylated regions. 9

Supplementary Figure 8. 5hmC peak enrichment on Canyon boundaries. (a) UCSC genome browser track depicts FAM32a gene which is one of the genes covered by at least 100 reads in BS-seq and oxbs-seq. Y-axis show % of 5mC or % of 5hmC. +/- % range indicates plus strand or minus strand. (b) Ccdc124 gene. (c) Position of CMS peaks in WT HSC on normalized cumrs and Canyons. The grey bars represent the control generated by assigning the UMRs and Canyons to random locations in the genome. The X-axis of the plots indicates genomic positions centered on the midpoint of UMRs, with ±1 representing the boundaries of the UMRs. 10

Supplementary Figure 9. Alterations of 5hmc peaks in the hypomethylated DMRs after Dnmt3a KO. (a) UCSC genome browser track depicts methylation profiles and 5hmC, detected by the CMS method, across the Srrm2 gene in WT and Dnmt3a KO HSCs. Blue box indicates methylation depleted region with decreased 5hmC signal and pink box indicates slightly methylation decreased region with increased 5hmC signal. (b) UCSC genome browser track depicts methylation profiles and CMS peaks across the Hoxb4 gene in WT and Dnmt3a KO HSCs. Blue box indicates methylation depleted region with decreased 5hmC signal and pink box indicates slightly methylation decreased region with increased 5hmC signal. (c) When the methylation is depleted in Dnmt3a KO (MethylationRatio<20%), the majority of 5hmc peaks showed decreased density. (d) When the methylation decreases but still retains a certain level in Dnmt3a KO (40%<MethylationRatio<80%), 3-fold more 5hmc peaks showed increased density. Each dot represents a CMS peak, which overlaps with hypomethylated DMR. X-axis and Y-axis values are normalized CMS read counts in the peak region for WT and Dnmt3a KO HSC. Grey dots represent those peaks that are not significantly changed, while red and blue dots represent those peaks that are significantly increased and decreased. 11

Supplementary Figure 10. Unsupervised clustering analysis of Canyon genes separates WT and DNMT3A mutant in 173 TCGA AML. Gene expression data from 173 TCGA AML patients by RNA-seq were used for hierarchical clustering of Canyon genes. Expression data were centralized by subtracting median expression level across samples. The data are presented in matrix format in which rows represent individual genes and columns represent each patient. Relatively increased expression is indicated by red color; relatively decreased expression by green color; intermediate expression by black color as indicated in the scale bar (log2 transformed scale). UC: Unexpressed Canyon genes (FPKM < 1) EC: Expressed Canyon genes (FPKM 1). 12

Supplementary Figure 11. Heatmap of unsupervised clustering analysis of Canyon gene expression from 947 Cancer Cell Line Encyclopedia (CCLE). Gene expression data from 947 cancer cells (GSE36139) by microarray were used for hierarchical clustering of Canyon genes. Expression data were centralized by subtracting median expression level across samples. The data are presented in matrix format in which rows represent individual genes and columns represent each cancer cell line. Relatively increased expression is indicated by red color; relatively decreased expression by green color; intermediate expression by black color as indicated in the scale bar (log2 transformed scale). UC: unexpressed Canyon genes (FPKM < 1) EC: Expressed Canyon genes (FPKM 1). 13

Supplementary Figure 12. Model to show the proposed mechanism of Canyon size dynamics. Dnmt3a and Tet proteins act at the edges of Canyons. As long as both proteins are present, Canyon size is maintained in successive rounds of cell division. Canyon expansion is more frequently associated with high levels of H3K4me3 and the presence of expressed genes. In the absence of Dnmt3a, the presence of 5hmC (deposited by Tet proteins) results in edge erosion. In the absence of Tet proteins, the absence of deposition of 5-hydroxymethylation is predicted to result in Canyon contraction. 14

References for Supplementary Data 1 Akalin, A. et al. Transcriptional features of genomic regulatory blocks. Genome Biol 10, R38 (2009). 2 Whyte, W. A. et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307-319. (2013). 3 Hodges, E. et al. Directional DNA methylation changes and complex intermediate states accompany lineage specificity in the adult hematopoietic compartment. Molecular cell 44, 17-28 (2011). 4 Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490-495. (2011). 15