ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

Similar documents
Chromatin marks identify critical cell-types for fine-mapping complex trait variants

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

The Epigenome Tools 2: ChIP-Seq and Data Analysis

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Assignment 5: Integrative epigenomics analysis

Session 6: Integration of epigenetic data. Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Supplementary Figures

High Throughput Sequence (HTS) data analysis. Lei Zhou

Statistical Genetics. Matthew Stephens. Statistics Retreat, October 26th 2012

Supervised Learner for the Prediction of Hi-C Interaction Counts and Determination of Influential Features. Tyler Yue Lab

Annotation of Functional Regulatory Elements in Livestock Species

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

Myogenic Differential Methylation: Diverse Associations with Chromatin Structure

MIR retrotransposon sequences provide insulators to the human genome

EPIGENOMICS PROFILING SERVICES

Supplementary Figure 1. Quantile-quantile (Q-Q) plot of the log 10 p-value association results from logistic regression models for prostate cancer

Nature Structural & Molecular Biology: doi: /nsmb.2419

RNA-seq Introduction

Tutorial. ChIP Sequencing. Sample to Insight. September 15, 2016

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Exploring chromatin regulation by ChIP-Sequencing

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Modeling gene expression using five histone modifications

BST227: Introduction to Statistical Genetics

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

Reconstruction of enhancer-target networks in 935 samples of human primary cells, tissues and cell lines

Predicting chromatin organization using histone marks

Distinct patterns of epigenetic marks and transcription factor binding sites across promoters of sense-intronic long noncoding RNAs

Supplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22.

Peak-calling for ChIP-seq and ATAC-seq

DNA methylation and differentiation: HOX genes in muscle cells

cis-regulatory enrichment analysis in human, mouse and fly

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

Genome-wide Association Studies (GWAS) Pasieka, Science Photo Library

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

An epigenetic approach to understanding (and predicting?) environmental effects on gene expression

Metadata of the chapter that will be visualized online

Nature Genetics: doi: /ng Supplementary Figure 1

Genomic Methods in Cancer Epigenetic Dysregulation

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

Genome-Wide Localization of Protein-DNA Binding and Histone Modification by a Bayesian Change-Point Method with ChIP-seq Data

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Research Article Identifying Liver Cancer-Related Enhancer SNPs by Integrating GWAS and Histone Modification ChIP-seq Data

Chip Seq Peak Calling in Galaxy

Global Epigenetic and Transcriptional Trends among Two Rice Subspecies and Their Reciprocal Hybrids W

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand

Yingying Wei George Wu Hongkai Ji

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome

Data mining with Ensembl Biomart. Stéphanie Le Gras

Introduction to Systems Biology of Cancer Lecture 2

Histones modifications and variants

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Epigenetics. Lyle Armstrong. UJ Taylor & Francis Group. f'ci Garland Science NEW YORK AND LONDON

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

levels of genes were separated by their expression levels; 2,000 high, medium, and low

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

GeneOverlap: An R package to test and visualize

Transcription factor p63 bookmarks and regulates dynamic enhancers during epidermal differentiation

Expression and epigenomic landscape of the sex chromosomes in mouse post meiotic male germ cells

Allelic reprogramming of the histone modification H3K4me3 in early mammalian development

A Quick-Start Guide for rseqdiff

Supplementary Information

Lung Met 1 Lung Met 2 Lung Met Lung Met H3K4me1. Lung Met H3K27ac Primary H3K4me1

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

PSSV User Manual (V1.0)

User Guide. Association analysis. Input

The Biology and Genetics of Cells and Organisms The Biology of Cancer

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Cluster Dendrogram. dist(cor(na.omit(tss.exprs.chip[, c(1:10, 24, 27, 30, 48:50, dist(cor(na.omit(tss.exprs.chip[, c(1:99, 103, 104, 109, 110,

Open Access RESEARCH. Background Malaria is a major public health problem in many developing countries, with the parasite Plasmodium

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Experimental design and workflow utilized to generate the WMG Protein Atlas.

Epigenetic programming in chronic lymphocytic leukemia

CTCF-Mediated Functional Chromatin Interactome in Pluripotent Cells

Gene Expression DNA RNA. Protein. Metabolites, stress, environment

Epigenetics and Toxicology

Supplementary Figure 1. Nature Genetics: doi: /ng.3736

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Regulation of Gene Expression in Eukaryotes

Sudin Bhattacharya Institute for Integrative Toxicology

TITLE OF MODULE: Epigenetics in Development and Disease

ChIP-seq data analysis

Are you the way you are because of the

Epigenomics. Ivana de la Serna Block Health Science

Overview: Conducting the Genetic Orchestra Prokaryotes and eukaryotes alter gene expression in response to their changing environment

Nature Methods: doi: /nmeth.3115

Doing more with genetics: Gene-environment interactions

ESCs were lysed with Trizol reagent (Life technologies) and RNA was extracted according to

Genetics, epigenetics, and Mendelian randomization: cuttingedge tools for causal inference in epidemiology

TrainMALTA Summer School on Epigenomics

Transcription:

ChromHMM Tutorial Jason Ernst Assistant Professor University of California, Los Angeles

Talk Outline Chromatin states analysis and ChromHMM Accessing chromatin state annotations for ENCODE2 and Roadmap Epigenomics Running the ChromHMM software

Talk Outline Chromatin states analysis and ChromHMM Accessing chromatin state annotations for ENCODE2 and Roadmap Epigenomics Running the ChromHMM software

Chromatin Marks for Genome Annotation 100+ histone modifications Specificity in: Histone protein Amino acid residue Chemical modification (e.g. methyl, acetylation) Number of occurrence of the modifications Examples H3K4me1 Enhancers H3K4me3 Promoters H3K27me3 Repressive H3K9me3 Repressive H3K36me3 Transcribed Image source: http://nihroadmap.nih.gov/epigenomics/ Histone Modifications can be Mapped Genome-wide with ChIP-seq

From chromatin marks to chromatin states Ernst and Kellis, Nat Biotech 2010 Promoter states Transcribed states Active Intergenic Learn de novo significant combinational and spatial patterns of chromatin marks Repressed Use for genome annotation

Approach: Multivariate Hidden Markov Model (ChromHMM) Enhancer Gene Starts Gene - Transcribed Region DNA Unobserved Binarized chromatin marks. Called based on a poisson distribution H3K4me1 H3K4me3 H3K4me3 H3K4me1 H3K36me3 H3K36me3 H3K36me3 H3K36me3 H3K27ac H3K4me1 Most likely Hidden State 1 2 3 4 5 5 5 5 5 6 6 6 200 base pair interval Emission distribution is a product of independent Bernoulli random variables Parameters are learned from the data 1: 2: 3: 0.8 H3K4me1 0.9 H3K4me3 0.9 H3K4me3 K27ac 0.8 K4me1 0.8 0.7 4: 5: 6: H3K4me1 0.9 H3K36me3 Emission probabilities - High Probability Chromatin Marks in State Ernst and Kellis, Nat Biotech 2010 ; Ernst and Kellis, Nature Methods 2012 1 2 6 3 5 4 Transition probabilities Red high probability Black medium probability 6

ENCODE: Study nine marks in nine human cell lines 9 marks 9 human cell types H3K4me1 H3K4me2 H3K4me3 H3K27ac H3K9ac H3K27me3 H4K20me1 H3K36me3 CTCF +WCE +RNA HUVEC Umbilical vein endothelial NHEK Keratinocytes GM12878 Lymphoblastoid x K562 Myelogenous leukemia HepG2 Liver carcinoma NHLF Normal human lung fibroblast HMEC Mammary epithelial cell HSMM Skeletal muscle myoblasts H1 Embryonic Brad Bernstein ENCODE Group HUVEC NHEK GM12878 K562 HepG2 NHLF HMEC HSMM H1 81 Chromatin Mark Tracks Learned jointly across cell types (virtual concatenation) State definitions are common State locations are dynamic Ernst et al, Nature 2011

Chromatin states dynamics across nine ENCODE cell types Single annotation track for each cell type Summarize cell-type activity at a glance Can study 9-cell activity pattern across Ernst et al, Nature 2011

Chromatin states to interpret disease variants Specific chromatin states enriched in GWAS catalog Enhancers from different cell types enriched in different traits Ernst and Kellis, Nature Biotech 2010 Ernst et al, Nature 2011 Imputation based chromatin state used in dissection FTO loci Claussnitzer et al, NEJM 2015 Many other examples in the literature De Jager et al, Nature Neuroscience 2014 Interpreting epigenetic disease associated variation in Alzheimer s disease

Talk Outline Chromatin states analysis and ChromHMM Accessing chromatin state annotations for ENCODE2 and Roadmap Epigenomics Running the ChromHMM software

Chromatin States Defined Across 127 Cell/Tissues Types 5 marks observed data Roadmap Epigenomics Consortium et al, Nature 2015 16 cell types from ENCODE2

Extended Chromatin State Model with H3K27ac 6 marks observed data Defined on 98 cell/tissue types only 16 cell types from ENCODE2 Roadmap Epigenomics Consortium et al, Nature 2015

Chromatin States Defined on Imputed Data Marks 12 marks imputed data Defined on all cell/tissue types ChromImpute method Ernst and Kellis, Nature Biotech 2015

ChromHMM Models across Many Roadmap/ENCODE2 Cell and Tissue Types 127 Cell/Tissue Types H3K4me1 H3K4me3 H3K27me3 H3K9me3 H3K36me3 98 Cell/Tissue Types H3K4me1 H3K4me3 H3K27me3 H3K9me3 H3K36me3 H3K27ac Images from UCSC genome browser Note: additional ChromHMM annotations based on older ENCODE models available from UCSC genome browser 127 Cell/Tissue Types H3K4me1 H3K4me3 H3K27me3 H3K9me3 H3K36me3 H3K27ac H3K9ac H4K20me1 H3K79me2 H3K4me2 H2A.Z DNase

Accessing Roadmap/ENCODE2 ChromHMM through the UCSC Genome Browser Track Hubs http://genome.ucsc.edu

Accessing Roadmap/ENCODE2 ChromHMM through the UCSC Genome Browser Track Hubs http://genome.ucsc.edu

Accessing Roadmap/ENCODE2 ChromHMM through the UCSC Genome Browser Track Hubs Note: Different than track hub Roadmap Epigenomics Data Complete Collection at Wash U VizHub

Accessing Roadmap/ENCODE2 ChromHMM through the UCSC Genome Browser Track Hubs

Accessing Roadmap/ENCODE2 ChromHMM through the UCSC Genome Browser Track Hubs

Accessing Roadmap/ENCODE2 ChromHMM through the UCSC Genome Browser Track Hubs

Accessing Roadmap/ENCODE2 ChromHMM through the UCSC Genome Browser Track Hubs

Accessing Roadmap/ENCODE2 ChromHMM through the UCSC Genome Browser Track Hubs

Human Epigenome Browser at Washington University http://epigenomegateway.wustl.edu/

Roadmap Epigenomics Integrative Analysis Portal http://compbio.mit.edu/roadmap

Roadmap Epigenomics Integrative Analysis Portal http://compbio.mit.edu/roadmap

Talk Outline Chromatin states analysis and ChromHMM Accessing chromatin state annotations for ENCODE2 and Roadmap Epigenomics Running the ChromHMM software

ChromHMM Website http://compbio.mit.edu/chromhmm Software download

ChromHMM Website http://compbio.mit.edu/chromhmm Software manual

Try to Run ChromHMM on Sample Data on Your Computer (Java should already be installed on your computer) 1. Download http://compbio.mit.edu/chromhmm/chromhmm.zip 2. Unzip ChromHMM.zip 3. Open a command line 4. Change into the ChromHMM directory 5. Enter the command: java -mx1600m -jar ChromHMM.jar LearnModel -p 0 SAMPLEDATA_HG18 OUTPUTSAMPLE 10 hg18

Input to ChromHMM ChromHMM models are learned from binarized data using its LearnModel command Binarized data is typically obtained starting from aligned reads. Apply BinarizeBed if reads are in BED format Apply BinarizeBam if reads are in BAM format

BinarizeBed Java command -mx1600m specifies memory to Java

BinarizeBed ChromHMM command

BinarizeBed File with the chromosome lengths for the assembly

BinarizeBed DIRECTORY of BED files

BinarizeBed cell mark cell-mark Cell-mark-file table Control data is optional and can also be treated as a mark

BinarizeBed Output directory

LearnModel -p 0 Use as many processors as available -p N Use up to N processors (default N=1)

LearnModel Directory with the Binarized Input

LearnModel Directory where the output goes

LearnModel Number of states

LearnModel Genome assembly

ChromHMM Report

ChromHMM Report

Emission Parameters

Transition Parameters

Model Parameter File

Segmentation File

Browser Files Can load into browser UCSC Genome, IGV https://www.broadinstitute.org/igv/

Enrichments

Positional Plots

Enrichments for Additional Cell Types

Additional Commands CompareModels the command allows the comparison of the emission parameters of a selected model to a set of models in terms of correlation.

Additional Commands MakeBrowserFiles (re)generates browser files from segmentation files and allows specifying the coloring

Additional Commands OverlapEnrichment (re)computes enrichments of a segmentation for a set of annotations

Additional Commands NeighborhoodEnrichment (re)computes enrichments of a segmentation around a set of anchor positions

Additional Commands Reorder reorders the states of the model

Summary Pre-computed ChromHMM chromatin state annotations available across over 100 cell/tissue types ChromHMM software available to run on your own data http://compbio.mit.edu/chromhmm

Collaborators and Acknowledgements Manolis Kellis ENCODE consortium Brad Bernstein production group Roadmap Epigenomics consortium Funding NHGRI, NIH, NSF, HHMI, Sloan Foundation