Genome-Wide Localization of Protein-DNA Binding and Histone Modification by a Bayesian Change-Point Method with ChIP-seq Data


Haipeng Xing, Yifan Mo, Will Liao, Michael Q. Zhang
Clayton Davis and Geoffrey House

Using ChIP-seq data to identify islands of histone modification
Histone modification: biochemical modification of histone proteins.
Modified histones bound to DNA can influence which regions of the DNA are transcribed, through the binding of other proteins.
Main goal: better understand patterns of genome transcription by using ChIP-seq to identify regions of DNA that are associated with modified histones.

Histone modifications
Names for histone modifications: H3K27me3 means (reading right to left) tri-methylation of the 27th lysine residue in the H3 family of histone proteins.

Histone modification    Effect on transcription
H3K27me3                Decreases
H3K36me3                Increases
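The naming rule above can be sketched as a small parser. This is only an illustration: the function name `parse_mark`, the regular expression, and the short residue table are assumptions made for this example, not part of any standard library or of the paper.

```python
import re

# Assumed pattern for names such as "H3K27me3": histone family, one-letter
# residue code, residue position, modification type, optional multiplicity.
_MARK = re.compile(r"^(H\d[A-Z]?)([A-Z])(\d+)(me|ac|ph|ub)(\d?)$")

# One-letter amino-acid codes for a few commonly modified residues.
_RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}


def parse_mark(name):
    """Split a histone-mark name into its components (hypothetical helper)."""
    m = _MARK.match(name)
    if m is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, mod, count = m.groups()
    return {
        "histone": histone,
        "residue": _RESIDUES.get(residue, residue),
        "position": int(position),
        "modification": mod,
        # A missing digit (e.g. "H3K27ac") is treated as a single group.
        "count": int(count) if count else 1,
    }


mark = parse_mark("H3K27me3")
```

Here `mark` describes the H3 family, lysine 27, and three methyl groups, matching the right-to-left reading on the slide.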

A Crash Course in Bayesian Statistics
Bayes' Rule: P(A | B) = P(B | A) P(A) / P(B)

Coin Flipping
Suppose I flip some coins. Out of 14 flips, I get 7 heads. What is the likelihood of this outcome?
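For a coin whose probability of heads is θ, the likelihood of 7 heads in 14 flips is the binomial probability C(14, 7) θ^7 (1 - θ)^7. The sketch below evaluates it for θ = 0.5; the slide does not state a value of θ, so the fair coin is an assumption made for illustration, and `binomial_likelihood` is a name chosen here.

```python
from math import comb


def binomial_likelihood(heads, flips, theta):
    """P(heads | flips, theta) for independent flips of one coin."""
    return comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)


# Assumed fair coin: theta = 0.5.
likelihood = binomial_likelihood(7, 14, 0.5)
print(round(likelihood, 4))  # 3432 / 16384, roughly 0.2095
```

Even for a fair coin, the single most probable outcome occurs only about one time in five.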

Coin Flipping
We know the likelihood of the outcome, P(D | θ), but we want to estimate θ from the data, i.e. P(θ | D).

Back to Bayes' Rule Bayes' Rule: Also Bayes' Rule:
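Applied to the coin, Bayes' rule reads P(θ | D) = P(D | θ) P(θ) / P(D). With a Beta prior on θ the posterior has a closed form, which the following sketch uses. The uniform Beta(1, 1) prior is an assumption made for this example (the slides do not say which prior was used), and the function names are hypothetical.

```python
def beta_posterior(heads, flips, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters after observing `heads` in `flips`.

    A Beta(a, b) prior with a binomial likelihood gives a
    Beta(a + heads, b + tails) posterior.
    """
    return prior_a + heads, prior_b + (flips - heads)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Assumed uniform prior; data from the slide: 7 heads in 14 flips.
post_a, post_b = beta_posterior(7, 14)
posterior_mean = beta_mean(post_a, post_b)
```

With these inputs the posterior is Beta(8, 8), centered on θ = 0.5.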

Coin Flipping

Coin Flipping

Hierarchical Models
But what about the other coins? Perhaps we know that a mint produces similar coins, with θ values drawn from some distribution.

Bayesian Change Point Detection
A simplified explanation: some process is switching between two states, and we want to detect when it is in state A vs. state B.
Specified as a hierarchical model; involves simultaneously estimating parameters and hyperparameters.

Bayesian Change Point Detection

The Model
The unit of analysis is a 200 bp block.
Read counts are Poisson random variables with a parameter θ_i estimated for each block.
Change points occur when consecutive θ are not equal, i.e. θ_i ≠ θ_{i+1}.
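The two ingredients of the model, per-block read counts and change points between unequal consecutive rates, can be sketched as follows. The read positions and θ values are made up for illustration, positions are assumed to be 0-based, and `block_counts` and `change_points` are hypothetical helper names.

```python
from collections import Counter

BLOCK_SIZE = 200  # block width in bp, as stated on the slide


def block_counts(read_starts, chrom_length, block_size=BLOCK_SIZE):
    """Count reads per fixed-width block (0-based read start positions)."""
    n_blocks = -(-chrom_length // block_size)  # ceiling division
    tally = Counter(pos // block_size for pos in read_starts)
    return [tally.get(i, 0) for i in range(n_blocks)]


def change_points(thetas):
    """Indices i where theta[i] differs from theta[i + 1]."""
    return [i for i in range(len(thetas) - 1) if thetas[i] != thetas[i + 1]]


# Toy data: six reads on a 1,000 bp region, and a made-up rate sequence.
counts = block_counts([5, 150, 199, 200, 610, 615], chrom_length=1000)
cps = change_points([2.0, 2.0, 9.0, 9.0, 2.0])
```

In the toy rate sequence, the rate rises after block 1 and falls after block 3, so those two indices are reported as change points.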

(Hyper)Parameters
The θ_i are the estimated parameters.
Change points K_i are derived from the θ_i.
The θ_i are generated from a distribution with hyperparameters:
α, β: the parameters of a Gamma distribution
p: the probability of θ changing between two blocks
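One way to read this hierarchy is as a generative process, sketched below using only the Python standard library. Several choices here are assumptions for illustration: β is treated as a rate (so the Gamma scale is 1/β), a changed θ is redrawn from the same Gamma distribution, and the hyperparameter values are arbitrary. This is a toy simulation, not the estimation procedure from the paper.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson sample via Knuth's multiplication method (small rates only)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n_blocks, alpha, beta, p, seed=0):
    """Simulate per-block rates and read counts from the change-point model."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); beta is assumed to be a rate.
    theta = rng.gammavariate(alpha, 1.0 / beta)
    thetas, counts = [], []
    for i in range(n_blocks):
        # With probability p the rate changes between consecutive blocks.
        if i > 0 and rng.random() < p:
            theta = rng.gammavariate(alpha, 1.0 / beta)
        thetas.append(theta)
        counts.append(poisson_draw(rng, theta))
    return thetas, counts


thetas, counts = simulate(n_blocks=50, alpha=2.0, beta=0.5, p=0.1)
```

Setting p to 0 yields a single constant rate, while larger p values produce shorter runs of blocks that share the same θ.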

A Hidden Markov Model
Similar to other HMM methods, with key improvements:
The θ values are estimated from a continuous distribution, hence the infinite-state HMM.
The posterior distribution can be derived analytically and approximated quickly: O(n) vs. O(n^3).
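The analytic posterior rests on Gamma-Poisson conjugacy: if θ has a Gamma prior with shape α and rate β, then after observing the counts of m blocks that share the same θ, the posterior is Gamma with shape α + (sum of counts) and rate β + m. The sketch below shows only this building block for a single segment, under the assumption that β is a rate; it is not the full filtering algorithm, and `segment_posterior` is a name chosen for the example.

```python
def segment_posterior(counts, alpha, beta):
    """Gamma (shape, rate) posterior for the Poisson rate of one segment.

    Assumes a Gamma(shape=alpha, rate=beta) prior and that every block in
    `counts` shares the same underlying rate.
    """
    return alpha + sum(counts), beta + len(counts)


# Toy segment of four blocks with made-up read counts.
shape, rate = segment_posterior([3, 5, 4, 6], alpha=2.0, beta=0.5)
posterior_mean = shape / rate
```

Because the update only needs a running sum and a block count, it can be maintained in a single pass over the blocks.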

BCP calls large islands of histone modification
[Figure: panels labeled "Decreased transcription" and "Increased transcription"]

BCP outperforms SICER in island coverage
BCP parameter: probability of a change point in read depth occurring at each site.
SICER parameter: window width and allowed gap between windows that are assigned to the same island.

Table 1. H3K36me3 islands and common associations.

Method     Parameter     Avg. size (1)  Gene coverage (2)  Intergenic (3)  H3K27me3 (4)  Rep. 1 by 2 (5)  Rep. 2 by 1 (6)
BCP (7)    pv 1e-5       25.8           0.497              0.089           0.019         0.851            0.805
BCP (7)    pv 1e-4       25.3           0.496              0.089           0.019         0.852            0.804
BCP (7)    pv 1e-3       24.7           0.494              0.09            0.02          0.852            0.803
BCP (7)    pv 1e-2       23.9           0.492              0.09            0.021         0.853            0.802
SICER (8)  W200-G200     2.7            0.323              0.085           0.021         0.689            0.805
SICER (8)  W200-G400     4.5            0.37               0.088           0.025         0.736            0.814
SICER (8)  W200-G800     8.7            0.437              0.094           0.032         0.8              0.818
SICER (8)  W400-G800     6.8            0.276              0.095           0.032         0.796            0.818
SICER (8)  W400-G1200    10.7           0.295              0.098           0.036         0.835            0.816

1. The average island size in kb.
2. The fraction of genes overlapped by an island.
3. The fraction of islands covered by intergenic sequence.
4. The fraction of islands overlapping H3K27me3 islands.
5. The fraction of replicate 1 overlapped by replicate 2.
6. The fraction of replicate 2 overlapped by replicate 1.
7. Island coverage: 0.66 to 0.67.
8. Island coverage: 0.62 to 0.68.
doi:10.1371/journal.pcbi.1002613.t001
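The replicate columns (5 and 6) are asymmetric overlap fractions. A minimal sketch of such a measure is shown below, under assumptions: islands are half-open (start, end) intervals on one chromosome, and the fraction is counted per island rather than per base pair, which the table footnotes do not specify. The intervals are made up and `fraction_overlapped` is a hypothetical name.

```python
def fraction_overlapped(islands_a, islands_b):
    """Fraction of islands in A that overlap at least one island in B.

    Islands are half-open (start, end) intervals on the same chromosome.
    """
    if not islands_a:
        return 0.0
    hits = sum(
        1
        for a_start, a_end in islands_a
        if any(a_start < b_end and b_start < a_end for b_start, b_end in islands_b)
    )
    return hits / len(islands_a)


# Toy island calls for two replicates (coordinates in bp).
rep1 = [(0, 1000), (5000, 9000), (20000, 21000)]
rep2 = [(800, 1200), (6000, 7000), (30000, 31000), (40000, 41000)]

rep1_by_2 = fraction_overlapped(rep1, rep2)
rep2_by_1 = fraction_overlapped(rep2, rep1)
```

The two directions generally differ, which is why the table reports "Rep. 1 by 2" and "Rep. 2 by 1" separately.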

BCP calls H3K36me3 islands closer to known gene boundaries

BCP islands more consistent with different read depths

BCP makes consistent island calls between different histone modifications
[Figure: panels labeled "Increased transcription" and "Decreased transcription"]

The following 2 slides are the rest of the graphs, just to have them in case someone asks; I am not planning to cover them during the presentation