Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

Similar documents
Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

The Epigenome Tools 2: ChIP-Seq and Data Analysis

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

MIR retrotransposon sequences provide insulators to the human genome

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

Nature Structural & Molecular Biology: doi: /nsmb.2419

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

An epigenetic approach to understanding (and predicting?) environmental effects on gene expression

The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

EPIGENOMICS PROFILING SERVICES

Session 6: Integration of epigenetic data. Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016

Yingying Wei George Wu Hongkai Ji

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Histones modifications and variants

Genome-wide Association Studies (GWAS) Pasieka, Science Photo Library

High Throughput Sequence (HTS) data analysis. Lei Zhou

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

Peak-calling for ChIP-seq and ATAC-seq

Assignment 5: Integrative epigenomics analysis

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Transcript-indexed ATAC-seq for immune profiling

cis-regulatory enrichment analysis in human, mouse and fly

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

Chromatin Structure & Gene activity part 2

Chromosome-Wide Analysis of Parental Allele-Specific Chromatin and DNA Methylation

Genome-wide relationship between histone H3 lysine 4 mono- and tri-methylation and transcription factor binding

Eukaryotic transcription (III)

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Stem Cell Epigenetics

Comprehensive nucleosome mapping of the human genome in cancer progression

Allelic reprogramming of the histone modification H3K4me3 in early mammalian development

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016

CTCF-Mediated Functional Chromatin Interactome in Pluripotent Cells

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

Sudin Bhattacharya Institute for Integrative Toxicology

The epigenetic landscape of T cell subsets in SLE identifies known and potential novel drivers of the autoimmune response

H3K4 demethylase KDM5B regulates global dynamics of transcription elongation and alternative splicing in embryonic stem cells

Open Access RESEARCH. Background Malaria is a major public health problem in many developing countries, with the parasite Plasmodium

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand

Nature Immunology: doi: /ni Supplementary Figure 1. DNA-methylation machinery is essential for silencing of Cd4 in cytotoxic T cells.

Distinct Epigenetic Signatures Delineate Transcriptional Programs during Virus-Specific CD8 + T Cell Differentiation

SUPPLEMENTARY INFORMATION

Yue Wei 1, Rui Chen 2, Carlos E. Bueso-Ramos 3, Hui Yang 1, and Guillermo Garcia-Manero 1

Chromatin marks identify critical cell-types for fine-mapping complex trait variants

Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut c

Nature Genetics: doi: /ng Supplementary Figure 1

Statistical Assessment of the Global Regulatory Role of Histone. Acetylation in Saccharomyces cerevisiae. (Support Information)

Supplementary Figure 1 IL-27 IL

Global Epigenetic and Transcriptional Trends among Two Rice Subspecies and Their Reciprocal Hybrids W

An annotated list of bivalent chromatin regions in human ES cells: a new tool for cancer epigenetic research

A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding

Sex specific chromatin landscapes in an ultra compact chordate genome

Plasticity in patterns of histone modifications and chromosomal proteins in Drosophila heterochromatin

Supplementary Figure S1: Defective heterochromatin repair in HGPS progeroid cells

Dominic J Smiraglia, PhD Department of Cancer Genetics. DNA methylation in prostate cancer

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

Myogenic Differential Methylation: Diverse Associations with Chromatin Structure

Supplemental Figure 1. Genes showing ectopic H3K9 dimethylation in this study are DNA hypermethylated in Lister et al. study.

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Control of nuclear receptor function by local chromatin structure

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Epigenetic Analysis of KSHV Latent and Lytic Genomes

FOLLICULAR LYMPHOMA- ILLUMINA METHYLATION. Jude Fitzgibbon

ChARM: Discovery of combinatorial chromatin modification patterns in hepatitis B virus X-transformed mouse liver cancer using association rule mining

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

TITLE: - Whole Genome Sequencing of High-Risk Families to Identify New Mutational Mechanisms of Breast Cancer Predisposition

Epigenetics. Lyle Armstrong. UJ Taylor & Francis Group. f'ci Garland Science NEW YORK AND LONDON

Sequence and chromatin determinants of cell-type specific transcription factor binding

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Research Article Identifying Liver Cancer-Related Enhancer SNPs by Integrating GWAS and Histone Modification ChIP-seq Data

Genetics and Genomics in Medicine Chapter 6 Questions

Transcription factor p63 bookmarks and regulates dynamic enhancers during epidermal differentiation

Targeted Recruitment of Histone Modifications in Humans Predicted by Genomic Sequences GUO-CHENG YUAN ABSTRACT

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

ChIP-seq data analysis

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

Dynamic reprogramming of DNA methylation in SETD2- deregulated renal cell carcinoma

Supervised Learner for the Prediction of Hi-C Interaction Counts and Determination of Influential Features. Tyler Yue Lab

Imprinting. Joyce Ohm Cancer Genetics and Genomics CGP-L2-319 x8821

Computational Modeling of mirna Biogenesis

EXPression ANalyzer and DisplayER

Chromatin-Based Regulation of Gene Expression

Tutorial. ChIP Sequencing. Sample to Insight. September 15, 2016

Distinct patterns of epigenetic marks and transcription factor binding sites across promoters of sense-intronic long noncoding RNAs

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc

TITLE: Identification of Estrogen Receptor Beta Binding Sites in the Human Genomes

Data mining with Ensembl Biomart. Stéphanie Le Gras

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

Supplementary Figures

Transcription:

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD, Crawford, GE, Ren, B (2007) Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet, 39:311 318.

The problem Different cells have different patterns of gene expression. Cells respond to their environment by changing the expression of genes. Enhancers are the DNA sequences used to drive the specific patterns of expression. People want to use a genomics approach to identify regulatory circuitry that controls patterns of gene expression. Once, we thought that if you knew the DNA sequence of a single enhancer that specified a pattern of expression that you could use this to identify other examples. E gen e E

The problem The DNA sequence of an enhancer is a poor predictor of function. It only identifies a tiny subset of enhancers. An early demonstration of this fact: Impey, S, McCorkle, SR, Cha-Molstad, H, Dwyer, JM, Yochum, GS, Boss, JM, McWeeney, S, Dunn, JJ, Mandel, G, Goodman, RH (2004) Defining the CREB regulon: a genome-wide analysis of transcription factor regulatory regions. Cell, 119:1041 1054. Canonical palindromic binding site TGACGTCA Approx freq of this seq is 1/65kb (one in 4^8), genome is 3X10^9, so in theory 46,000 sites, actual count is just over 32,000 ChIP:Saco = A PCR/sequencing technique Genome wide 6302 sites, many are NOT canonical Seq analysis id s too many and many real ones don t match the motif. Novel = new, not predicted. For chromosome 10

The problems Enhancers may not be near the promoters they regulate and so how do you identify their targets? Some experiments indicate that only ~20% of enhancers are near the promoter (~1kb). In addition, the presence of the DNA sequence for an enhancer does not mean that the enhancer is being used.

The problem The situation is a bit better for promoters but not much.

A possible solution? Epigenetic modifications may provide patterns that identify enhancers and promoters They may also provide patterns that say NOT enhancer or NOT promoter.

Now they do the entire genome but they began with data from a consortium of scientists. 30 megabases (1% of the genome) 44 genomic regions Known genes that are ON. Known genes that are OFF. Known NOT genes.

Nat Genet, 39:311 318. (2007) The over-arching hypothesis is that the transcription machinery uses histone modifications as a first identifier of where promoters and enhancers lie AND that cells modify these histone marks as a way to regulate gene expression. The goal of this paper is to see if histone marks can be used by the investigator to identify active genes. To figure out if a new technique can work it is nice to know what the answer is.

Nat Genet, 39:311 318. (2007) First we start with a circular process: Take KNOWN promoters and KNOWN enhancers and see if they are covered by recognizable patterns of histone modifications. Then they ask if these patterns alone can be used to identify these promoters and enhancers in the genome. Wait! - we must have had a way to identify KNOWN promoters and enhancers because they start with them. So why bother? Because these KNOWN promoters were identified because of substantial labor and effort by a large number of people over a long period of time. We need an easier way.

Nat Genet, 39:311 318. (2007) Known promoters that are on versus Known promoters that are off versus non promoter 1. What histone marks do they have? Are these marks a reliable way to identify them? Known promoters that are on vs Known promoters that are off vs non promoter Eventually ask if the new method can discover NEW promoters!

Nat Genet, 39:311 318. (2007) Known enhancers that are on versus Known enhancers that are off versus non enhancer 1. What histone marks do they have? Are these marks a reliable way to identify them? Known enhancers that are on vs Known enhancers that are off vs non enhancer Eventually ask if the new method can discover NEW enhancers!

Performed chromatin immunoprecipitation Used antibodies that recognized five different different histone modifications. H4ac H3ac H3K4me1 H3K4me2 H3K4me2 H3K4me3 H3 H4ac H3ac 4 logr -2 H3K4me1 H3K4me2 H3K4me3 RFX5 Analyze by ChIP:Chip using an ENCODE array.

Now they do the entire genome but they began with data from a consortium of scientists. 30 megabases (1% of the genome) 44 genomic regions Known genes that are ON. Known genes that are OFF. Known Enhancers. Known NOT genes and Known Not Enhancers.

What material was used? DNA chip ENCODE fragments: 30 Mb database of transcriptionally active sequences in humans. 38 bp resolution Chromatin will be prepared from HeLa cells. 10 kb blocks around each landmark. Landmarks are known promoters and known enhancers Start with 104 TSSs (core promoters) defined as active promoters in HeLa cells 104 TSSs defined as inactive in HeLa cells

TSS (transcription start site) is synonymous with the core promoters

Conceptual explanation of simple clustering and heat maps

Align so that transcription start sites are centered gene 1 gene 2 gene 3 gene 4 gene 5 gene 6 gene 7 gene 8-5 kb +5 kb

Use genomic tiling array to determine position of each mark. gene 1 gene 2 gene 3 gene 4 gene 5 gene 6 gene 7 gene 8-5 kb +5 kb X X X X X XXXXXX X X X XX X X X

gene 1-5 kb +5 kb X X Sort the genes into self-similar groups gene 2 gene 3 gene 4 gene 5 X X X XXXXXX X X similar to one another gene 6 X X gene 7 gene 8 X XX X

Sort the genes into self-similar groups gene 4 gene 7 gene 5 gene 6 gene 1 gene 2 gene 8 gene 3-5 kb +5 kb X X X X X X X X XX X X X XXXXXX

Compact so that many genes can be viewed in a small space. X X X X X X X X XX X X X XXXXXX

Compact so that many genes can be viewed in a small space.

Compact so that many genes can be viewed in a small space. Group 1 Group II Group III Group IV Called a heat map black = average red = greater than average green = less than average Color scheme can vary. In papers you should double check what the colors mean.

Compact so that many genes can be viewed in a small space. Group 1 Group II Group III Group IV Called a heat map black = average red = greater than average green = less than average Each line is very thin. About 200 lines here. Each line is like a track in Human Epigenome Browser. Color scheme can vary. In papers you should double check what the colors mean.

a 208 promoters (RefSeq TSSs) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 % active Centered on known TSSs. P1 P2 18/102 24/41 P3 21/23 2007 Nature Publishing Group http://www.nature.com/naturegenetics Centered on p300 binding sites. b c d logr logr P4 2.5 0 0.5 5 kb TSS +5 kb TSS TSS TSS TSS TSS TSS TSS TSS Class P1 Class P2 Class P3 Class P4 logr 3 0 3 E1 E2 E3 2.5 0 0.5 74 enhancers (distal p300 binding sites) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 5 kb Peak +5 kb Peak Peak Peak Peak Peak Peak Peak Peak Class E1 Class E2 Class E3 logr 3 0 3 Figure 1 Features of human transcriptional promoters and enhancers. ChIP-chip was performed against six histone markers and three general transcription factors in the ENCODE regions, and the data were clustered to reveal patterns at annotated promoters (a) and distal p300 binding sites (c). Promoter clustering was performed with 10-kb windows centered on RefSeq TSSs; enhancer windows were centered on promoter-distal p300 binding peaks. Average profiles for each marker within each class are shown below the clusters (b,d); each class is represented by a different color. The proportion of expressed genes ( % active ) in each promoter class is presented to the right of the cluster, illustrating the relationship between histone modification patterns and gene expression. Comparison of the clusters shows that active promoters and enhancers are similarly marked by nucleosome depletion (column H3) but distinctly marked by mono- and trimethylation of histone H3K4 (columns H3K4me1 and H3K4me3). Note the depletion of H3K4me1 and the peak of H3K4me3 at the TSS in promoters, compared with the enrichment of H3K4me1 and lack of H3K4me3 at enhancers. The presence of histone methylation and acetylation upstream and downstream of the TSS at promoters is distinct from the primarily downstream localization of these markers observed at yeast promoters. The same procedure was performed on data from treated HeLa cells, yielding similar results (Supplementary Fig. 2). H4ac, acetylated H4; H3ac, acetylated H3. 41/42

organized into classes a TAF II p250 208 promoters (RefSeq TSSs) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 % active Centered on known TSSs. P1 P2 18/102 24/41 Increasing activity wrt transcription 2007 Nature Publishing Group http://www.nature.com/naturegenetics Centered on p300 binding sites. b c d logr logr P3 P4 2.5 0 0.5 5 kb TSS +5 kb TSS TSS TSS TSS TSS TSS TSS TSS Class P1 Class P2 Class P3 Class P4 logr 3 0 3 E1 E2 E3 2.5 0 0.5 74 enhancers (distal p300 binding sites) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 How do we align enhancers? 5 kb Peak +5 kb Peak Peak Peak Peak Peak Peak Peak Peak Class E1 Class E2 Class E3 logr 3 0 3 Figure 1 Features of human transcriptional promoters and enhancers. ChIP-chip was performed against six histone markers and three general transcription factors in the ENCODE regions, and the data were clustered to reveal patterns at annotated promoters (a) and distal p300 binding sites (c). Promoter clustering was performed with 10-kb windows centered on RefSeq TSSs; enhancer windows were centered on promoter-distal p300 binding peaks. Average profiles for each marker within each class are shown below the clusters (b,d); each class is represented by a different color. The proportion of expressed genes ( % active ) in each promoter class is presented to the right of the cluster, illustrating the relationship between histone modification patterns and gene expression. Comparison of the clusters shows that active promoters and enhancers are similarly marked by nucleosome depletion (column H3) but distinctly marked by mono- and trimethylation of histone H3K4 (columns H3K4me1 and H3K4me3). Note the depletion of H3K4me1 and the peak of H3K4me3 at the TSS in promoters, compared with the enrichment of H3K4me1 and lack of H3K4me3 at enhancers. The presence of histone methylation and acetylation upstream and downstream of the TSS at promoters is distinct from the primarily downstream localization of these markers observed at yeast promoters. The same procedure was performed on data from treated HeLa cells, yielding similar results (Supplementary Fig. 2). H4ac, acetylated H4; H3ac, acetylated H3. 21/23 41/42

208 promoters (RefSeq TSSs) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 % active P1 18/102 P2 P3 P4 2.5 24/41 21/23 41/42 E1 E2 E3 2.5 74 enhancers (distal p300 binding sites) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 0 0.5 5 kb TSS +5 kb TSS TSS TSS TSS TSS TSS TSS TSS Class P1 Class P2 Class P3 Class P4 0 0.5 5 kb Peak +5 kb Peak Peak Peak Peak Peak Peak Peak Peak Class E1 Class E2 Class E3 logr 3 0 3 1. Don t use P1 promoters because there is no positive signal. These are thought to be promoters that are off. Other non-promoter regions might look like this. Probably can eventually use an off promoter mark. 2. Shape of the peak and intensity are going to be the what we are looking for.

208 promoters (RefSeq TSSs) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 % active P1 18/102 P2 P3 P4 2.5 24/41 21/23 41/42 E1 E2 E3 2.5 74 enhancers (distal p300 binding sites) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 0 0.5 5 kb TSS +5 kb TSS TSS TSS TSS TSS TSS TSS TSS Class P1 Class P2 Class P3 Class P4 0 0.5 5 kb Peak +5 kb Peak Peak Peak Peak Peak Peak Peak Peak Class E1 Class E2 Class E3 logr 3 0 3 Weak H3K4me1 and Strong H3K4me3 is most strongly correlated with an active core promoter. The shape is also important. Notice the nucleosome-free region at +1.

208 promoters (RefSeq TSSs) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 % active P1 18/102 P2 P3 P4 2.5 24/41 21/23 41/42 E1 E2 E3 2.5 74 enhancers (distal p300 binding sites) H4ac H3ac H3K4me1 H3K4me2 H3K4me3 H3 RNAPII TAF1 p300 0 0.5 5 kb TSS +5 kb TSS TSS TSS TSS TSS TSS TSS TSS Class P1 Class P2 Class P3 Class P4 0 0.5 5 kb Peak +5 kb Peak Peak Peak Peak Peak Peak Peak Peak Class E1 Class E2 Class E3 logr 3 0 3 The opposite is found to be most tightly correlated with enhancers. Strong H3K4me1 and Weak H3K4me3 is most strongly correlated with an active enhancer. The shape is also important. Notice that the shape is different than the promoter mark shape.

44 genomic regions - 208 well described genes ENCODE fragments are discrete regions. This was used to generate the previous heat maps. Representing them as a line. 30 mbases of genomic DNA ENCODE fragments Want to use this information to directly ID promoters and enhancers. Really trying to catagorize sequence as 1. promoter 2. enhancer 3. not 1 or 2.

How do we figure out what to do? TRAINING SET Calculate the average profile for a mark. e.g. P4 H3ac could be one training set. 30 Megabases of genomic DNA

How do we figure out what to do? TRAINING SET Calculate the average profile for a mark. 30 Megabases of genomic DNA TEST SET is all overlapping 10 kb windows from the same 30 Megabase region Could we ID promoters with this average profile? Need something to examine but the answer must be known.

How do we figure out what to do? TRAINING SET Calculate the average profile for a mark. 30 Megabases of genomic DNA TEST SET is all overlapping 10 kb windows from the same 30 Megabase region. There are more windows here that in the training set. For each member of the TEST SET (-) determine how closely the histone mod profile matches the shape of one of the six TRAINING SET histone mods. Do this by determining the Euclidian distance.

Euclidian distance Test 1 Plot the peak on a coordinate system. Determine how many transformations are required to convert a test plot into the average. Test 2 Average Done for all six histone marks.

How do we figure out what to do? TRAINING SET Calculate the average profile for a mark. 30 Megabases of genomic DNA TEST SET is all overlapping 10 kb windows from the same 30 Megabase region. There are more windows here that in the training set. For each member of the TEST SET (-) determine how closely the histone mod profile matches the shape of one of the six TRAINING SET histone mods. Do this by determining the Euclidian distance.

How do we figure out what to do? TRAINING SET Calculate the average profile for a mark. 30 Megabases of genomic DNA TRAINING SET TEST SET is all overlapping 10 kb windows from the same 30 Megabase reagion For each member of the TEST SET (-) determine how closely the histone mod profile matches the shape of one of the six TRAINING SET histone mods. Do this by determining the Euclidian distance. Discrimination Filter The identified member must have at least a 0.4 correlation statistical correlation with one training set. It must also more closely match enhancer or promoter - not both. If either test fails then it is discarded.

So what!!! This shows that a training set can identify its own members!!! Big deal. How can we determine if it can determine members that ARE NOT in the training set!!! This is what they want. TRAINING SET Calculate the average profile for a mark. TRAINING SET TEST SET is all overlapping 10 kb windows from the same 30 Megabase reagion For each member of the TEST SET (-) determine how closely the histone mod profile matches the shape of one of the six TRAINING SET histone mods. Do this by determining the Euclidian distance. Discrimination Filter The identified member must have at least a 0.4 correlation statistical correlation with one training set. It must also more closely match enhancer or promoter - not both. If either test fails then it is discarded.

Demonstrate this by making training sets that are missing some members. TRAINING SET MINUS RANDOM 10% that is missing a random 10% of the promoters TRAINING SET TEST SET is all overlapping 10 kb windows from the same 30 Megabase reagion Search the test set and see how well it does in retrieving the missing 10%. Repeat this process with a different randonmly selected 10%. H3K4me3 is good for promoters H3K4me1 is good for enhancers but together they work better than either one separately. Decide that you need to use the output from two histone modifications for best predictability.

How well does it work to identify the 10% omitted from the TRAINING SET? Number of correct & incorrect predictions Each group differs by 10% of its members. Height is different because the outliers are different. This is just for the P3 Promoter class. Number of predictions needed to achieve these results.

But how can we tell if it really works? Functionally test a predicted enhancer that was located 6 kb upstream of SLC22A5 - little is known about its transcriptional regulation.

Carnitine transporter a Cloned regions of the SLC22A5 locus (chr5) 0 1 Distance, kb SLC22A5 Predicted enhancer (E) Promoter (P S ) Entire locus (L) Locus - E (L E) Test for enhancer properties in transfected HeLa cells. Showed that this approach successfully identified KNOWN promoters and enhancers AND identified NEW enhancers. These worked as enhancers when physically tested. b Relative intensity (arbitrary units) luciferase L L E 20 15 10 5 0 pgl3-b L L E Relative intensity (arbitrary units) P S Luciferase reporter activity in untreated HeLa cells Entire region 2.0 1.5 1.0 0.5 0 pgl3-b plasmid by itself plasmid by itself P S P S E P S E Entire region with deleted promoter. Promoter + pred enh Promoter w/o predicted enh..

Heintzman, et al (2009) Histone modifications at human enhancers reflect global celltype-specific gene expression. Nature, 459:108 112. Take home: Chromatin modifications at enhancers are globally related to cell-type-specific gene expression 42

H3K4me1, H3K4me3, H3K27ac Surprise! Chromatin marks at promoters are similar in all cell types. CTCF occupancy is same across all cell types. CTCF is an insulator bindng protein. Action is at the enhancers H3K4me1 and H3K27 Variability is greatest between cell types 43

5 human immortalized cell lines cervical carcinoma HeLa cells immortalized lymphoblast GM06690 (GM) leukaemia K562 embryonic stem cells (ES) BMP4-induced ES cells (des) * 55,454 enhancers examined

At core promoters the marks are the same in most tissue types. This was a big suprise. We are not shown it but expression of these genes differ. Example TSS a HeLa GM K562 ES des H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac 5 kb +5 kb logr 3 0 +3 Enhancers b HeLa GM K562 ES des H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac 5 kb +5 kb logr 3 0 +3 45

At core promoters the marks are the same in most tissue types. This was a big suprise. We are not shown it but expression of these genes differ. But enhancer marks are diagnostic for the tissue type. Example TSS a HeLa GM K562 ES des H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac 5 kb +5 kb logr 3 0 +3 Enhancers b HeLa GM K562 ES des H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac H3K4me1 H3K4me3 H3K27ac 5 kb +5 kb logr 3 0 +3 46

Chromatin modifications at enhancers are globally related to cell-type-specific gene expression Example 55,454 enhancers examined 5 cell line types d HeLa K562 H3K4me1 H3K4me3 H3K4me1 H3K4me3 0 c 5 kb +5 kb logr 3 0 +3 47

An enhancer mark can indicated increased responsivity Interferon activates Stat1 binding. Cell type TF % at HeLa enhancer MCF7 ER 32.6% (P < 1 10 300 ) HCT116 p53 21.4% (P = 2.3 10 30 ) ME180 p63 25.8% (P < 1 10 300 ) HeLa-IFN-γ STAT1 25.3% (P < 4.5 10 142 ) c Genes upregulated (%) 40 30 20 10 P = 5.8 10 8 STAT1 group I Genes in domains with STAT1 Random genes P = 5.4 10 1 STAT1 group II + IFN-γ IFN-γ STAT1 H3K4me1 H3K4me3 When you add interferon the ones with the enhancer mark (H3K4me1) turn on even both groups show STAT 1 binding. 0 STAT1 group I STAT1 group II 5 kb +5 kb logr 3 0 +3

Of course the work continues Table 3. Distinctive Chromatin Features of Genomic Elements Functional Annotation Histone Marks References Promoters H3K4me3 Bernstein et al., 2005; Kim et al., 2005; Pokholok et al., 2005 Bivalent/Poised Promoter H3K4me3/H3K27me3 Bernstein et al., 2006 Transcribed Gene Body H3K36me3 Barski et al., 2007 Enhancer (both active and poised) H3K4me1 Heintzman et al., 2007 Poised Developmental Enhancer H3K4me1/H3K27me3 Creyghton et al., 2010; Rada-Iglesias et al., 2011 Active Enhancer H3K4me1/H3K27ac Creyghton et al., 2010; Heintzman et al., 2009; Rada-Iglesias et al., 2011 Polycomb Repressed Regions H3K27me3 Bernstein et al., 2006; Lee et al., 2006 Heterochromatin H3K9me3 Mikkelsen et al., 2007 Rivera, C.M. & Ren, B. (2013) Mapping human epigenomes. Cell, 155, 39-55. Zhu, Y. et al. (2013) Predicting enhancer transcription and activity from chromatin modifications. Nucleic Acids Res, 41, 10032-10043. Zentner,G.E., Tesar,P.J. and Scacheri,P.C. (2011) Epigenetic signatures distinguish multiple classes of enhancers with distinct cellular functions. Genome Res., 21, 1273 1283. Rada-Iglesias,A., Bajpai,R., Swigut,T., Brugmann,S.A.,Flynn,R.A. and Wysocka,J. (2011) A unique chromatin signature uncovers early developmental enhancers in humans. Nature, 470, 279 283. Bonn,S., Zinzen,R.P., Girardot,C., Gustafson,E.H., Perez- Gonzalez,A., Delhomme,N., Ghavi-Helm,Y., Wilczynski,B., Riddell,A. and Furlong,E.E. (2012) Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development. Nat. Genet., 44, 148 156. 49