Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

Bioinformatics methods, models and applications to disease
Alex Essebier

ChIP-seq experiment
- To determine protein binding sites in the genome
- Snapshot of in vivo sites occupied by a protein
- Improve understanding of regulation in the genome
- Improve understanding of epigenetics
- Transcription factors (TFs)
- Histone modifications (HMs): modifications to the tails of the histone proteins forming nucleosomes

ChIP-seq data processing

ChIP-seq principles
- Wet lab: extract DNA bound by the protein of interest

Was ChIP-seq successful? Sequence depth
- Depends on the size of the genome and the type of protein
- Mammalian TF: 20 million reads

Was ChIP-seq successful? Sequence quality control
- Reads should be of high quality
- Use FastQC to analyse read quality

Was ChIP-seq successful? Alignment quality control
- Uniquely aligned reads

Was ChIP-seq successful? Strand cross-correlation
- ChIP-seq creates a bimodal pattern of reads at a peak
- Strand cross-correlation analysis (SCCA)
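The bimodal pattern means forward-strand reads pile up on one side of a binding site and reverse-strand reads on the other, so the two coverage profiles correlate best when one is shifted by roughly the fragment length. The following is a minimal sketch of that idea in plain Python, not the implementation used by any particular tool; the coverage arrays and the 150 bp offset are toy values assumed for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def strand_cross_correlation(fwd, rev, max_shift):
    """Correlate forward coverage at position i with reverse coverage at i + d, for d = 0..max_shift."""
    return [pearson(fwd[: len(fwd) - d], rev[d:]) for d in range(max_shift + 1)]

# Toy data (assumed): forward reads start around position 100,
# reverse reads around position 250, i.e. fragments of about 150 bp.
n = 500
fwd = [0.0] * n
rev = [0.0] * n
for i in range(95, 105):
    fwd[i] = 5.0
for i in range(245, 255):
    rev[i] = 5.0

cc = strand_cross_correlation(fwd, rev, max_shift=200)
best_shift = max(range(len(cc)), key=cc.__getitem__)
print(best_shift)
```

The shift with the highest correlation is an estimate of the fragment length; a flat profile with no clear maximum suggests weak enrichment.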

Basic principles of peak calling
- Sample: exposed to antibody
- Input: no antibody exposure
- Peak: region where the sample is enriched compared to the input, with statistical significance
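As an illustration of "enriched with statistical significance", the sketch below scores one candidate region with a Poisson tail probability, using the input read count as the expected background. This is a simplification assumed for teaching purposes: real peak callers use more elaborate background models, and the counts and the `scale` factor here are hypothetical.

```python
from math import exp

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)

def region_p_value(sample_count, input_count, scale=1.0):
    """P-value for seeing at least `sample_count` reads if the region behaved like the input.

    `scale` (assumed) corrects for different sequencing depths of sample and input.
    """
    expected = max(input_count * scale, 1e-9)  # avoid a zero background rate
    return poisson_sf(sample_count, expected)

# Hypothetical region: 30 sample reads against 10 input reads at equal depth.
p_enriched = region_p_value(30, 10)
# Hypothetical region with no enrichment: 10 sample reads against 10 input reads.
p_background = region_p_value(10, 10)
print(p_enriched, p_background)
```

A region is then called a peak when its p-value (after multiple-testing correction across the genome) falls below a chosen threshold.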

The problem with peak calling
- Choice of peak caller depends on the problem
- Based on statistical or probabilistic models
- OMICtools reports 51 ChIP-seq tools
- In-house tools, e.g. stalled or transient

Comparing peak callers - TFs
- HOMER and SPP: fixed-size peaks, 262 bp and 470 bp respectively
- MACS2: variable-size peaks, average 328 bp, mode 140-180 bp

Peak caller   Total    % Unique
MACS2         42,536   12%
HOMER         45,044   19%
SPP           19,474   0.7%

Peak quality control
- Number of peaks: how active is the protein?
- Read coverage: are peak locations enriched for reads?
- Fraction of reads in peaks (FRiP) should be > 1%
- Generally observe > 10%
- E.g. 6/50 reads in peaks -> 12%
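The FRiP calculation is a simple ratio, sketched below for a single chromosome. The read positions and the peak interval are toy values chosen to reproduce the 6/50 example; intervals are assumed to be half-open (start inclusive, end exclusive).

```python
def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak interval [start, end)."""
    in_peaks = sum(
        any(start <= pos < end for start, end in peaks)
        for pos in read_positions
    )
    return in_peaks / len(read_positions)

# Toy data (assumed): one peak, 6 reads inside it and 44 reads elsewhere.
peaks = [(100, 200)]
reads = list(range(100, 106)) + list(range(1000, 1044))

score = frip(reads, peaks)
print(f"{len(reads)} reads, FRiP = {score:.0%}")
```

For genome-scale data the same ratio is normally computed with an interval index or a dedicated tool rather than this quadratic scan.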

Replicate datasets
- Biological replicates can vary significantly
- Call peaks for replicates individually
- Compare/overlap to achieve a gold standard
- Comparisons are dominated by the poorer replicate

PEAK ANALYSIS
Exploring the peaks generated from ChIP-seq

Transcription Factors
- Confirm in vitro and in silico results
  - Overlapping peaks with motifs
- Identify consensus motif
  - For TFs which do not have an existing/known motif
  - To identify variations in motif
- Differential peak binding
  - To identify differences in binding patterns
  - Compare cell types or time points

Histone Modifications
- Epigenetic analysis
  - Generate epigenetic profiles
- Identify chromatin states genome wide
  - E.g. ChromHMM
- Identify regulatory modules
  - E.g. promoters or enhancers
- Differential peak binding
  - Identify differences in epigenetic patterns

INTEGRATING DATA
Combining data sets to improve outcomes

Data integration
- Experiments capture dependent regulatory events
  - ChIP-seq: regulatory elements
  - DNase I hypersensitivity (DHS): chromatin accessibility
  - RNA-seq: expression patterns
- Consider multiple datasets to:
  - Improve confidence and understanding
  - Support hypotheses

Supporting HMs
- Explore chromatin environment
- Layered HMs
- DHS: chromatin accessibility

ChIP-seq complications
- Possible to observe multiple states at one location
- False negatives
  - Can't detect small sub-populations
- False positives
  - General non-specific chromatin being pulled down
  - Bias not removed by input

Supporting TFs
- Assumption: TFs bind open/active chromatin
- Preferentially bind regulatory regions
  - E.g. promoters or enhancers

ChIP-seq complications
- ChIP-seq generates peaks for all of these events

TF target genes using RNA-seq
- RNA-seq on a knock-out of the TF
- Identify genes with changes in expression
- E.g. Gene 1 is bound by the TF and is down-regulated in the knock-out: direct target of the TF
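The logic of calling a direct target can be sketched as the intersection of two gene sets: genes with a TF peak assigned to them and genes whose expression changes in the knock-out. The gene names, the fold-change and adjusted p-value cutoffs, and the data layout below are all hypothetical and only illustrate the reasoning.

```python
def candidate_direct_targets(bound_genes, expression_changes,
                             min_abs_log2fc=1.0, max_padj=0.05):
    """Return bound genes that are also differentially expressed in the knock-out.

    expression_changes maps gene -> (log2 fold change KO vs wild type, adjusted p-value).
    The default thresholds are assumptions, not values from the slides.
    """
    targets = {}
    for gene in sorted(bound_genes):
        if gene not in expression_changes:
            continue  # no expression measurement for this gene
        log2fc, padj = expression_changes[gene]
        if padj <= max_padj and abs(log2fc) >= min_abs_log2fc:
            targets[gene] = "down-regulated" if log2fc < 0 else "up-regulated"
    return targets

# Toy input (assumed): Gene1 is bound and changes; Gene2 is bound but unchanged;
# Gene3 changes but is not bound, so any effect on it would be indirect.
bound = {"Gene1", "Gene2", "Gene4"}
changes = {
    "Gene1": (-2.3, 0.001),
    "Gene2": (0.1, 0.80),
    "Gene3": (-1.8, 0.01),
}

targets = candidate_direct_targets(bound, changes)
print(targets)
```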

PRACTICAL EXAMPLE
The role of Math1 in differentiation of cerebellum

Role of Math1 in differentiation
- Aim: to identify genes targeted by Math1
- Approach: integrate available data

Dataset    Data type                  Called peaks
Math1      ChIP-seq                   8,804
H3K4me1    ChIP-seq                   11,270
H3K4me3    ChIP-seq                   15,894
DHS        DNase I hypersensitivity   73,682
Math1_KO   RNA-seq                    NA

Combining replicates
- Two replicates for H3K4me1
- Two peak callers: MACS2, HOMER

Data set     Peaks    Overlap
MACS2_rep1   8,183
MACS2_rep2   9,789    5,269
HOMER_rep1   71,534
HOMER_rep2   70,469   48,661

[Figure tracks: H3K4me1 rep1, H3K4me1 rep2, IgG Control, MACS2_rep1, HOMER_rep1, MACS2_rep2, HOMER_rep2]

Combining replicates
- Two replicates
- Two peak callers: MACS2, HOMER
- Generate high quality merged output
  - Requires a called peak in 3 of 4 data sets
  - 11,270 peaks in total

[Figure tracks: H3K4me1 rep1, H3K4me1 rep2, IgG Control, MACS2_rep1, HOMER_rep1, MACS2_rep2, HOMER_rep2, Merged_out]
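One way to implement a "3 of 4 data sets" rule is to pool the peaks from all four call sets, merge overlapping intervals, and keep merged regions supported by at least three distinct sets. The slides do not state how the merge was actually performed, so the sketch below is only one plausible reading; the coordinates are toy values and intervals are assumed to be half-open.

```python
def merge_with_support(peak_sets, min_support=3):
    """Merge overlapping peaks across call sets and keep well-supported regions.

    peak_sets maps a set name to a list of (chrom, start, end) intervals.
    Returns merged (chrom, start, end) regions seen in >= min_support sets.
    """
    tagged = sorted(
        (chrom, start, end, name)
        for name, peaks in peak_sets.items()
        for chrom, start, end in peaks
    )
    kept = []
    current = None  # [chrom, start, end, names of supporting sets]
    for chrom, start, end, name in tagged:
        if current and chrom == current[0] and start < current[2]:
            # Overlaps the region being built: extend it and record the supporter.
            current[2] = max(current[2], end)
            current[3].add(name)
        else:
            if current and len(current[3]) >= min_support:
                kept.append((current[0], current[1], current[2]))
            current = [chrom, start, end, {name}]
    if current and len(current[3]) >= min_support:
        kept.append((current[0], current[1], current[2]))
    return kept

# Toy peaks (assumed coordinates) for the four call sets named on the slide.
peak_sets = {
    "MACS2_rep1": [("chr1", 100, 200), ("chr1", 1000, 1100)],
    "MACS2_rep2": [("chr1", 150, 250)],
    "HOMER_rep1": [("chr1", 180, 300), ("chr1", 1050, 1150)],
    "HOMER_rep2": [("chr1", 5000, 5100)],
}

merged = merge_with_support(peak_sets)
print(merged)
```

Only the first region is kept: it is supported by three call sets, while the other two regions are supported by two and one respectively.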

Identify regulatory regions
Three outputs from epigenetic data:
- H3K4me1_DHS sites: putative enhancers
- H3K4me3_DHS sites: putative promoters
- H3K4me1_H3K4me3_DHS sites: other

Comparison            Sites    Overlap
H3K4me1_DHS           9,011    80% of H3K4me1
H3K4me3_DHS           15,098   95% of H3K4me3
H3K4me1_H3K4me3_DHS   919
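The labelling rule and the reported percentages can be checked with a few lines of Python. The peak totals come from the dataset table (11,270 H3K4me1 peaks and 15,894 H3K4me3 peaks); treating the three categories as mutually exclusive is an assumption made for this sketch.

```python
def classify_site(has_h3k4me1, has_h3k4me3, has_dhs):
    """Label a site from the marks overlapping it (categories assumed mutually exclusive)."""
    if not has_dhs:
        return "not classified"
    if has_h3k4me1 and has_h3k4me3:
        return "other"
    if has_h3k4me1:
        return "putative enhancer"
    if has_h3k4me3:
        return "putative promoter"
    return "not classified"

# Counts taken from the slides.
called_peaks = {"H3K4me1": 11270, "H3K4me3": 15894}
sites_with_dhs = {"H3K4me1": 9011, "H3K4me3": 15098}

percent_overlap = {
    mark: round(100 * sites_with_dhs[mark] / called_peaks[mark])
    for mark in called_peaks
}
print(percent_overlap)
print(classify_site(True, False, True))
```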

Bound Math1
- Identify regulatory regions bound by Math1
- Math1 binds preferentially to putative enhancers
- >50% of Math1 binding sites do not overlap a defined regulatory region

[Figure legend: Putative Enhancer, Putative Promoter, No Overlap]

Distance profiles
- Binding by Math1 selects for distal regulatory regions (> 2,000 bp from TSS)

[Figure: two bar charts of % of total for proximal vs distal sites. Left: H3K4me1 DHS compared with H3K4me1 DHS Math1. Right: H3K4me3 DHS compared with H3K4me3 DHS Math1.]
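The proximal/distal split can be sketched as a nearest-TSS lookup followed by the 2,000 bp cutoff from the slide. The TSS coordinates below are toy values for a single chromosome, and strand is ignored for simplicity.

```python
from bisect import bisect_left

def distance_to_nearest_tss(pos, sorted_tss):
    """Absolute distance from pos to the closest TSS in a sorted, non-empty list."""
    i = bisect_left(sorted_tss, pos)
    # Only the TSS just before and just after pos can be the nearest one.
    candidates = sorted_tss[max(0, i - 1): i + 1]
    return min(abs(pos - tss) for tss in candidates)

def classify_by_distance(pos, sorted_tss, cutoff=2000):
    """'distal' if the site is more than `cutoff` bp from every TSS, else 'proximal'."""
    return "distal" if distance_to_nearest_tss(pos, sorted_tss) > cutoff else "proximal"

# Toy TSS positions (assumed) on one chromosome.
tss_positions = [10000, 50000]

labels = {pos: classify_by_distance(pos, tss_positions) for pos in (10500, 12000, 12001, 30000)}
print(labels)
```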

Long distance regulation
- How to identify genes regulated by an enhancer?

RNA-seq
- Proximal putative promoter bound by Math1
- Distal putative enhancer bound by Math1
  - CisMapper for long distance interactions

[Figure: chart for proximal putative promoters, values 81, 8, 18; chart for distal putative enhancers, values 176, 312, 1562; legend: Up regulated, Down regulated, No significant change]

System complexity
- Small number of differentially expressed genes are bound by Math1
  - System redundancy
  - Indirect changes in expression

[Figure: Venn diagrams for up regulated genes and down regulated genes, each with sets Full RNA, Math1 H3K4me1, Math1 H3K4me3; values 326, 182, 63, 2170, 172, 2693]

Take home messages
- Understand your data and how best to use it
- Quality control
- Peak calling
  - Use multiple peak callers where possible
  - Keep up to date with advances
- Data integration
  - Use all available data to gain a more complete picture

Data Resources
- Klisch, T. J., Xi, Y., Flora, A., Wang, L., Li, W., & Zoghbi, H. Y. (2011). In vivo Atoh1 targetome reveals how a proneural transcription factor regulates cerebellar development. Proceedings of the National Academy of Sciences, 108(8), 3288-3293.
- Frank, C. L., Liu, F., Wijayatunge, R., Song, L., Biegler, M. T., Yang, M. G., ... & West, A. E. (2015). Regulation of chromatin accessibility and Zic binding at enhancers in the developing cerebellum. Nature Neuroscience, 18(5), 647-656.

Useful papers
- Bailey, T., Krajewski, P., Ladunga, I., Lefebvre, C., Li, Q., Liu, T., ... & Zhang, J. (2013). Practical guidelines for the comprehensive analysis of ChIP-seq data. PLoS Comput Biol, 9(11), e1003326.
- Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., ... & Chen, Y. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research, 22(9), 1813-1831.
- Farnham, P. J. (2009). Insights from genomic profiling of transcription factors. Nature Reviews Genetics, 10(9), 605-616.
- Zhang, Y., Liu, T., Meyer, C. A., Eeckhoute, J., Johnson, D. S., Bernstein, B. E., ... & Liu, X. S. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biology, 9(9), R137.
- Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y. C., Laslo, P., ... & Glass, C. K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Molecular Cell, 38(4), 576-589.