The Epigenome Tools 2: ChIP-Seq and Data Analysis

Similar documents
Peak-calling for ChIP-seq and ATAC-seq

ChIP-seq data analysis

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

EPIGENOMICS PROFILING SERVICES

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

Session 6: Integration of epigenetic data. Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016

MIR retrotransposon sequences provide insulators to the human genome

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

Nature Structural & Molecular Biology: doi: /nsmb.2419

Histones modifications and variants

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand

High Throughput Sequence (HTS) data analysis. Lei Zhou

Chip Seq Peak Calling in Galaxy

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome

Eukaryotic transcription (III)

Part-II: Statistical analysis of ChIP-seq data

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Epigenetic Mechanisms

Gene Expression DNA RNA. Protein. Metabolites, stress, environment

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Exploring chromatin regulation by ChIP-Sequencing

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Supplementary Information

Open Access RESEARCH. Background Malaria is a major public health problem in many developing countries, with the parasite Plasmodium

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

REVIEWERS' COMMENTS: Reviewer #1 (Remarks to the Author):

Sirt1 Hmg20b Gm (0.17) 24 (17.3) 877 (857)

Supplementary Figures

Genome-wide relationship between histone H3 lysine 4 mono- and tri-methylation and transcription factor binding

Supervised Learner for the Prediction of Hi-C Interaction Counts and Determination of Influential Features. Tyler Yue Lab

An annotated list of bivalent chromatin regions in human ES cells: a new tool for cancer epigenetic research

An epigenetic approach to understanding (and predicting?) environmental effects on gene expression

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Discovery of two identities of neuroblastoma cells via the analysis of super-enhancer landscapes

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

cis-regulatory enrichment analysis in human, mouse and fly

Epigenetics. Lyle Armstrong. UJ Taylor & Francis Group. f'ci Garland Science NEW YORK AND LONDON

Tutorial. ChIP Sequencing. Sample to Insight. September 15, 2016

Yue Wei 1, Rui Chen 2, Carlos E. Bueso-Ramos 3, Hui Yang 1, and Guillermo Garcia-Manero 1

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

ESCs were lysed with Trizol reagent (Life technologies) and RNA was extracted according to

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

DNA-seq Bioinformatics Analysis: Copy Number Variation

ChipSeq. Technique and science. The genome wide dynamics of the binding of ldb1 complexes during erythroid differentiation

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Introduction to Systems Biology of Cancer Lecture 2

Sudin Bhattacharya Institute for Integrative Toxicology

H3K4 demethylase KDM5B regulates global dynamics of transcription elongation and alternative splicing in embryonic stem cells

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

Supplementary Figure 1 IL-27 IL

Genome-Wide Localization of Protein-DNA Binding and Histone Modification by a Bayesian Change-Point Method with ChIP-seq Data

Epigenetics: The Future of Psychology & Neuroscience. Richard E. Brown Psychology Department Dalhousie University Halifax, NS, B3H 4J1

Expression and epigenomic landscape of the sex chromosomes in mouse post meiotic male germ cells

SUPPLEMENTAL INFORMATION

Epigene.cs: What is it and how it effects our health? Overview. Dr. Bill Stanford, PhD OFawa Hospital Research Ins.tute University of OFawa

Global Epigenetic and Transcriptional Trends among Two Rice Subspecies and Their Reciprocal Hybrids W

Epigenetics and Toxicology

Chromatin Structure & Gene activity part 2

Data mining with Ensembl Biomart. Stéphanie Le Gras

Transcript-indexed ATAC-seq for immune profiling

Comprehensive nucleosome mapping of the human genome in cancer progression

Statistical analysis of ChIP-seq data

Lecture 8. Eukaryotic gene regulation: post translational modifications of histones

Assignment 5: Integrative epigenomics analysis

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc

The epigenetic landscape of T cell subsets in SLE identifies known and potential novel drivers of the autoimmune response

ChIPSeq. Technique and science. The genome wide dynamics of the binding of ldb1 complexes during erythroid differentiation

Integrated analysis of sequencing data

Nature Genetics: doi: /ng Supplementary Figure 1

Transcription and chromatin. General Transcription Factors + Promoter-specific factors + Co-activators

Testi del Syllabus. Testi in italiano. Resp. Did. SCHOEFTNER STEFAN Matricola: Docente SCHOEFTNER STEFAN, 6 CFU

Functional annotation of farm animal genomes: ChIP-seq.

Distinct patterns of epigenetic marks and transcription factor binding sites across promoters of sense-intronic long noncoding RNAs

Recent advances in ChIP-seq analysis: from quality management to whole-genome annotation

Lecture 10. Eukaryotic gene regulation: chromatin remodelling

Mechanisms of alternative splicing regulation

FOLLICULAR LYMPHOMA- ILLUMINA METHYLATION. Jude Fitzgibbon

Transcript reconstruction

Package NarrowPeaks. August 3, Version Date Type Package

Back to the Basics: Methyl-Seq 101

Epigenetics Armstrong_Prelims.indd 1 04/11/2013 3:28 pm

A high-throughput ChIP-Seq for large-scale chromatin studies

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Dynamic reprogramming of DNA methylation in SETD2- deregulated renal cell carcinoma

Expert Intelligence for Better Decisions Epigenetics:

Targeted Recruitment of Histone Modifications in Humans Predicted by Genomic Sequences GUO-CHENG YUAN ABSTRACT

Supplementary Figure S1: Defective heterochromatin repair in HGPS progeroid cells

Nature Immunology: doi: /ni Supplementary Figure 1. DNA-methylation machinery is essential for silencing of Cd4 in cytotoxic T cells.

Transcription:

The Epigenome Tools 2: ChIP-Seq and Data Analysis Chongzhi Zang zang@virginia.edu http://zanglab.com PHS5705: Public Health Genomics March 20, 2017 1

Outline Epigenome: basics review ChIP-seq overview ChIP-seq data analysis 2

Epigenome histone nucleosome The epigenome is a multitude of chemical compounds that can tell the genome what to do. The epigenome is made up of chemical compounds and proteins that can attach to DNA and direct such actions as turning genes on or off, controlling the production of proteins in particular cells. -- from genome.gov Original figure from ENCODE, Darryl Leja (NHGRI), Ian Dunham (EBI) 3

Epigenomic marks DNA methylation Histone marks Covalent modifications Histone variants Chromatin regulators Histone modifying enzymes Chromatin remodeling complexes * Transcription factors 4

Histone modifications Nucleosome Core Particles Core Histones: H2A, H2B, H3, H4 Covalent modifications on histone tails include: methylation (me), acetylation (ac), phosphorylation Histone variants Histone modifications are implicated in influencing gene expression. Notation: H3K4me3 Allis C. et al. Epigenetics. 2006 5

Histone modifications associate with regulation of gene expression Differential expression log2 (fold-change) 0.35 Fractions of enhancers 0.30 0.25 0.20 0.15 0.10 0.05 0 H2AK5ac H2AK9ac H2A.Z H2BK5ac H2BK5me1 H2BK12ac H2BK20ac H2BK120ac H3K4ac H3K4me1 H3K4me2 H3K4me3 H3K9ac H3K9me1 H3K9me2 H3K9me3 H3K14ac H3K18ac H3K23ac H3K27ac H3K27me1 e H3K27me2 H3K27me3 H3K36ac H3K36me1 H3K36me3 H3K79me1 H3K79me2 H3K79me3 H3R2me1 H3R2me2 H4K5ac H4K8ac H4K12ac H4K16ac H4K20me1 H4K20me3 H4K91ac Wang, Zang et al. Nat Genet 2008 6

Functions of histone marks Table 3. Distinctive Chromatin Features of Genomic Elements Functional Annotation Histone Marks Promoters H3K4me3 Bivalent/Poised Promoter Transcribed Gene Body Enhancer (both active and poised) Poised Developmental Enhancer Active Enhancer H3K4me3/H3K27me3 H3K36me3 H3K4me1 H3K4me1/H3K27me3 H3K4me1/H3K27ac Polycomb Repressed Regions Heterochromatin H3K27me3 H3K9me3 Rivera & Ren Cell 2013 7

H3K4me3/H3K27me3 Bivalent Domain H3K4me3 Repressed H3K27me3 Remained Poised Induced From: https://pubs.niaaa.nih.gov/publications/arcr351/77-85.htm 8

ChIP-seq: Profiling epigenomes with sequencing histone nucleosome ATAC-seq Original figure from ENCODE, Darryl Leja (NHGRI), Ian Dunham (EBI) 9

Published ChIP-seq datasets are skyrocketing We are entering the Big Data era Number of ChIP-seq datasets on GEO 3000 2500 2000 1 1500 1000 500 0! Mei et al. Nucleic Acids Research 2016 10

Chromatin ImmunoPrecipitation (ChIP) 11

Protein-DNA crosslinking in vivo (for TF) 12

Chop the chromatin using sonication (TF) or micrococal nuclease (MNase) digestion (histone) 13

Specific factor-targeting antibody 14

Immunoprecipitation 15

DNA purification 16

PCR amplification and sequencing 17

ChIP-seq data analysis overview Scale chr19: 500 bases hg19 15,308,000 15,308,100 15,308,200 15,308,300 15,308,400 15,308,500 15,308,600 15,308,700 15,308,800 15,308,900 15,309,000 15,309,100 15,309,200 User Supplied Track @ILLUMINA-8879DC:231:KK:3:1:1070:945 1:Y:0: NNNAATACAGTCAGAAACATATCATATTGGAGAATA #################################### @ILLUMINA-8879DC:231:KK:3:1:1153:945 1:Y:0: NNNAAGCACACAGAAGATAACTAAACAATCAAGTAG #################################### @ILLUMINA-8879DC:231:KK:3:1:1222:945 1:Y:0: NNNAAGGGTCTTGAGAAGAAATCATTCTGGATGGCA #################################### @ILLUMINA-8879DC:231:KK:3:1:1304:939 1:Y:0: NNNCCAGGCTCCCGCGATTCTCCTGCCTCAGCTTCT #################################### @ILLUMINA-8879DC:231:KK:3:1:1354:945 1:Y:0: NNNCTCTTCCTTAGCTAAACTTTCAACTAAGCCAAA #################################### @ILLUMINA-8879DC:231:KK:3:1:1411:932 1:Y:0: NNNGTAGGACCATTGGCGTTGCGACACAAAAAATTT #################################### @ILLUMINA-8879DC:231:KK:3:1:1496:937 1:Y:0: NNNTTCATCGGGTTGAGAGTCCCCTTGTTGCATGCA #################################### @ILLUMINA-8879DC:231:KK:3:1:1533:939 1:Y:0: NNNATTTTCCCGTTCCAGGTCGCAATTTCCGCCGTT #################################### @ILLUMINA-8879DC:231:KK:3:1:1573:940 1:Y:0: NNNGGGGTGCGCCTTTAGTCCCAGCTACTCAGGAAC #################################### 18

ChIP-seq data analysis overview Where in the genome do these sequence reads come from? - Sequence alignment and quality control What does the enrichment of sequences mean? - Peak calling What can we learn from these data? Downstream analysis and integration 19

ChIP-seq data analysis: basic processing alignment of each sequence read: bowtie or BWA cannot map to the reference genome can map to multiple loci in the genome can map to a unique location in the genome redundancy control: Langmead et al. 2009, Zang et al. 2009 20

ChIP-seq data analysis: Peak calling DNA fragment size estimation pile-up profiling peak model cross-correlation d Percentage 0.05 0.10 0.15 0.20 0.25 0.30 0.35 forward tags reverse tags 600 400 200 0 200 400 600 s 0.055 0.05 0.045 0.04 0.035 0.03 0.025 0.02 0.015 0.01 0.005 0 50 100 150 200 250 300 350 400 Peak/signal detection Distance to the middle 21

ChIP-seq data analysis: Peak calling Sharp peaks transcription factor binding, DNase, ATAC-seq MACS (Zhang, 2008) dynamic background Poisson model Broad peaks Histone modifications, super-enhancers Diffuse SICER (Zang, 2009) Spatial clustering of localized weak signal and integrative Poisson model NOTCH1 H3K27ac 22 Wang, Zang et al. 2014

MACS Model-based Analysis for ChIP-Seq Tag distribution along the genome ~ Poisson distribution (λ BG = total tag / genome size) ChIP-seq show local biases in the genome Chromatin and sequencing bias 200-300bp control windows have to few tags ChIP But can look further Dynamic λ local = max(λ BG, [λ ctrl, λ 1k,] λ 5k, λ 10k ) Control 300bp 1kb 5kb 10kb http://liulab.dfci.harvard.edu/macs/ Zhang et al, Genome Bio, 2008

SICER Spatial-clustering Identification of ChIP-Enriched Regions 5kb 10kb omictools.com Zang et al. Bioinformatics 2009 24

ChIP-seq peak calling: Parameters Parameter Genome Effective genome rate DNA fragment size Window size Gap size P-value cut-off False discovery rate (FDR) cut-off Remarks Species and reference genome version, e.g. hg38, hg18, mm10, mm9 Fraction of the mappable genome, vary in species, read length, etc. Estimated by default; can specify otherwise Data resolution, usually nucleosome periodicity length, i.e. 200bp (for SICER only) Allowable gaps between eligible windows, usually 2 or 3 windows Threshold for peak calling, from model Threshold for peak calling, BH correction from p-value. 25

ChIP-seq data analysis: Review 1. Read mapping (sequence alignment) 2. Peak calling: MACS or SICER 1. QC 2. DNA fragment size estimation (for Single-end) 3. Pile-up profile generation 4. Peak/signal detection 3. Downstream analysis/integration 26

Data formats fastq: raw sequences BED: chr11 10344210 10344260 255 0 - chr4 76649430 76649480 255 0 + chr3 77858754 77858804 255 0 + chr16 62688333 62688383 255 0 + chr22 33031123 33031173 255 0 - SAM/BAM: aligned sequencing reads bedgraph, Wig, bigwig: pile-up profiles for browser visualization 27

Data flow Raw sequence reads fastq Bowtie/BWA Reference genome Aligned reads MACS/SICER BAM/BED Profile; Peaks bedgraph/wig/bigwig BED 28

Galaxy: web-interface analysis platform https://usegalaxy.org/ 29

Run MACS on Cistrome, a Galaxy-based platform http://cistrome.org/ap/ 30

Run SICER on Galaxy-based platforms http://services.cbib.u-bordeaux.fr/galaxy/ 31

ChIP-seq: Downstream analysis Data visualization UCSC genome browser: http://genome.ucsc.edu/ WashU epigenome browser: http://epigenomegateway.wustl.edu/ IGV: http://software.broadinstitute.org/software/igv/ Meta analysis CEAS: http://liulab.dfci.harvard.edu/ceas/ Integration with gene expression BETA: http://cistrome.org/beta/ MARGE: http://cistrome.org/marge/ Integration with other epigenomic data GREAT: http://great.stanford.edu ENCODE SCREEN: http://screen.umassmed.edu/ MANCIE: https://cran.r-project.org/package=mancie Cistrome DB: http://cistrome.org/db/ 32

BETA: Binding Expression Target Analysis Regulatory Potential TSS P (g i )= j S(i) exp ( ) ij λ j i 33

MARGE: A big data driven, integrative regression and semisupervised approach for predicting functional enhancers samples sample samples enhancer samples selection prediction Wang, Zang et al. Genome Res 2016 34

https://www.encodeproject.org/ ENCODE 35

Cistrome Data Browser http://cistrome.org/db/ 36

ChIP-seq data analysis: Review 1. Read mapping (sequence alignment) 2. Peak calling: MACS or SICER 1. QC 2. DNA fragment size estimation (for Single-end) 3. Pile-up profile generation 4. Peak/signal detection 3. Downstream analysis/integration 37

Summary ChIP-seq is used to profile epigenomes ChIP-seq data analysis MACS for narrow peaks SICER for broad peaks Online tools and resources 38

Further Reading The cancer epigenome: Concepts, challenges, and therapeutic opportunities Science 17 Mar 2017: Vol. 355, Issue 6330, pp.1147-1152 http://science.sciencemag.org/content/355/6330/1147 Heterochromatin Euchromatin Heterochromatin Writer Reader Eraser Nucleosome DNA Histones Nucleus DNMT EZH2 MMSET DOT1L SETD7 MLL1 PRMT1 PRMT3 PRMT5 SMYD2 G9a/GLP CREBBP EP300 MOZ BRD2/3/4 SMARCA2 SMARCA4 BAZ2A PB1 BRPF1 ATAD2 L3MBTL3 WDR5 BRD9 BRD7 CBX7 HDAC JMJD3/UTX LSD1 IDH1* IDH2* Therapies targeting: DNA modifications Histone modifications DNA and histone modifications 39

Thank you very much! zang@virginia.edu http://zanglab.com 40