Genome-Wide Localization of Protein-DNA Binding and Histone Modification by a Bayesian Change-Point Method with ChIP-seq Data
Haipeng Xing, Yifan Mo, Will Liao, Michael Q. Zhang
Clayton Davis and Geoffrey House
Using ChIP-seq data to identify islands of histone modification
Histone modification: biochemical modification of histone proteins.
Modified histones bound to DNA can influence which regions of DNA are transcribed, through the binding of other proteins.
Main goal: better understand patterns in genome transcription by using ChIP-seq to identify regions of DNA that are associated with modified histones.
Histone modifications
Names for histone modifications: H3K27me3 means (reading right to left) tri-methylation of the 27th lysine residue in the H3 family of histone proteins.

Histone modification    Effect on transcription
H3K27me3                Decreases
H3K36me3                Increases
A Crash Course in Bayesian Statistics
Bayes' Rule: P(A | B) = P(B | A) P(A) / P(B)
Coin Flipping
Suppose I flip some coins. Out of 14 flips, I get 7 heads. What is the likelihood of this outcome?
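As a worked version of the question above, here is a minimal sketch of the binomial likelihood P(D | θ) for 7 heads in 14 flips. It assumes the flips are independent with a common heads probability θ; the function name `binomial_likelihood` is illustrative only.

```python
from math import comb


def binomial_likelihood(theta, heads=7, flips=14):
    """P(D | theta): probability of `heads` heads in `flips` independent flips."""
    return comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)


# Evaluate the likelihood of the same data under a few candidate values of theta.
for theta in (0.3, 0.5, 0.7):
    print(f"theta = {theta}: P(D | theta) = {binomial_likelihood(theta):.4f}")
```

For a fair coin (θ = 0.5) this is 3432 / 16384, or about 0.209. The likelihood is a function of θ for fixed data, which is why it does not by itself say how probable each θ is.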
Coin Flipping
We know the likelihood of the outcome, P(D | θ), but we want to estimate θ from the data, i.e. P(θ | D).
Back to Bayes' Rule
Bayes' Rule: P(θ | D) = P(D | θ) P(θ) / P(D)
Also Bayes' Rule: P(θ | D) ∝ P(D | θ) P(θ)
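To make the step from likelihood to posterior concrete, the sketch below applies Bayes' rule to the coin example on a grid of θ values. It assumes a uniform prior on θ, which is a choice made here for illustration and is not stated on the slides; `grid_posterior` and `uniform_prior` are hypothetical names.

```python
from math import comb


def grid_posterior(heads, flips, prior, n_grid=1001):
    """Bayes' rule on a grid: posterior weight = likelihood * prior, then normalize."""
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    unnormalized = [
        comb(flips, heads) * t**heads * (1 - t) ** (flips - heads) * prior(t)
        for t in thetas
    ]
    total = sum(unnormalized)  # plays the role of P(D) on the grid
    return thetas, [u / total for u in unnormalized]


def uniform_prior(theta):
    """Assumed prior: every value of theta in [0, 1] is equally plausible."""
    return 1.0


thetas, posterior = grid_posterior(7, 14, uniform_prior)
posterior_mean = sum(t * w for t, w in zip(thetas, posterior))
print(f"posterior mean of theta: {posterior_mean:.3f}")
```

With a Beta(a, b) prior the same posterior is available in closed form as Beta(a + heads, b + tails), so the grid is only needed when the prior is not conjugate.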
Coin Flipping
Coin Flipping
Hierarchical Models
But what about the other coins? Perhaps we know that a mint produces similar coins, with θ values drawn from some distribution.
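A small sketch of why a shared distribution helps: if every coin's θ is drawn from a common Beta(a, b), each coin's estimate is pulled toward the mint-wide mean. The hyperparameter values and coin data below are made up for illustration, and a and b are held fixed here, whereas a full hierarchical model would estimate them from the data as well.

```python
def posterior_mean(heads, flips, a, b):
    """Posterior mean of theta for one coin under a shared Beta(a, b) prior."""
    return (a + heads) / (a + b + flips)


# Hypothetical mint-level hyperparameters: coins cluster around theta = 0.5.
A, B = 10.0, 10.0

# Hypothetical (heads, flips) records for three coins from the same mint.
coins = {"coin_1": (7, 14), "coin_2": (9, 10), "coin_3": (1, 10)}

for name, (heads, flips) in coins.items():
    raw = heads / flips
    shrunk = posterior_mean(heads, flips, A, B)
    print(f"{name}: raw estimate {raw:.2f}, shrunk estimate {shrunk:.2f}")
```

Coins with few flips and extreme raw frequencies move the most, which is the behavior the hierarchical model is meant to provide.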
Bayesian Change Point Detection
A simplified explanation:
Some process is switching between two states.
Detect when it is in state A vs. state B.
Specified as a hierarchical model; involves simultaneously estimating parameters and hyperparameters.
Bayesian Change Point Detection
The Model
The quanta of analysis are 200 bp blocks.
Read counts are Poisson random variables with a parameter θ_i estimated for each block.
Change points occur when consecutive θ are not equal, i.e. θ_i ≠ θ_{i+1}.
(Hyper)Parameters
The θ_i are the estimated parameters.
Change points K_i are derived from the θ_i.
These θ_i are generated from a distribution with hyperparameters:
α, β: the parameters of a Gamma distribution
p: the probability of θ changing between two blocks
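The generative story above can be sketched as a short simulation. This is an illustration under stated assumptions, not the authors' code: β is treated as a rate parameter (the slides do not say whether it is a rate or a scale), the first block always draws a fresh θ, and the names `simulate_blocks` and `sample_poisson` are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_blocks(n_blocks, alpha, beta, p, seed=0):
    """Simulate per-block read counts from the change-point model."""
    rng = random.Random(seed)
    thetas, counts, change_points = [], [], []
    # random.gammavariate takes (shape, scale); an assumed rate beta means scale 1/beta.
    theta = rng.gammavariate(alpha, 1.0 / beta)
    for i in range(n_blocks):
        if i > 0 and rng.random() < p:
            theta = rng.gammavariate(alpha, 1.0 / beta)  # new segment, new theta
            change_points.append(i)
        thetas.append(theta)
        counts.append(sample_poisson(rng, theta))
    return thetas, counts, change_points


thetas, counts, change_points = simulate_blocks(200, alpha=2.0, beta=0.5, p=0.05)
print(f"{len(change_points)} change points in {len(counts)} blocks")
```

Inference runs this story in reverse: given only `counts`, recover where θ changed and what its value was in each segment.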
A Hidden Markov Model
Similar to other HMM methods, with key improvements:
The θ values are estimated from a continuous distribution, whence the infinite-state HMM.
The posterior distribution can be derived analytically and approximated quickly (O(n) vs. O(n^3)).
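To illustrate how an analytic posterior with a bounded approximation can run in linear time, here is a simplified forward filter for the Poisson-Gamma change-point model. It is a sketch, not the method from the paper: it tracks a mixture of Gamma posteriors indexed by the most recent change point, keeps only the `max_components` heaviest components at each step (a plain top-weight pruning rule chosen here for brevity), treats β as a rate, and performs only the forward pass without any backward smoothing. All function and parameter names are hypothetical.

```python
from math import exp, lgamma, log


def log_predictive(y, a, b):
    """log P(y) for a Poisson count whose mean has a Gamma(a, rate b) distribution."""
    return (lgamma(a + y) - lgamma(a) - lgamma(y + 1)
            + a * log(b / (b + 1.0)) - y * log(b + 1.0))


def forward_filter(counts, alpha, beta, p, max_components=20):
    """Return per-block filtered change probability and posterior mean of theta."""
    components = []  # (weight, a, b): one Gamma posterior per candidate last change
    change_prob, theta_mean = [], []
    for t, y in enumerate(counts):
        # A segment starts here: certainly at the first block, else with probability p.
        fresh_mass = 1.0 if t == 0 else p
        updated = [(fresh_mass * exp(log_predictive(y, alpha, beta)),
                    alpha + y, beta + 1.0)]
        # Otherwise the current segment continues and absorbs the new count.
        for w, a, b in components:
            updated.append((w * (1.0 - p) * exp(log_predictive(y, a, b)),
                            a + y, b + 1.0))
        total = sum(w for w, _, _ in updated)
        updated = [(w / total, a, b) for w, a, b in updated]
        change_prob.append(updated[0][0])
        theta_mean.append(sum(w * a / b for w, a, b in updated))
        # Bound the mixture size so each step costs O(max_components).
        updated.sort(key=lambda c: c[0], reverse=True)
        kept = updated[:max_components]
        kept_mass = sum(w for w, _, _ in kept)
        components = [(w / kept_mass, a, b) for w, a, b in kept]
    return change_prob, theta_mean


# Toy data: 30 low-count blocks followed by 30 high-count blocks.
toy_counts = [2] * 30 + [20] * 30
toy_change_prob, toy_theta_mean = forward_filter(toy_counts, alpha=1.0, beta=0.1, p=0.05)
best = max(range(1, len(toy_counts)), key=lambda i: toy_change_prob[i])
print(f"most likely change point: block {best}")
```

Because the mixture never grows beyond `max_components`, the cost is linear in the number of blocks. The plain `exp` calls are fine for moderate counts; a log-sum-exp formulation would be safer for very large read depths.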
BCP calls large islands of histone modification
[Figure panels: Decreased transcription; Increased transcription]
BCP outperforms SICER in island coverage
Parameter annotations: probability of a change point in read depth occurring at each site (BCP); window width and allowed gap between windows that are assigned to the same island (SICER).

Table 1. H3K36me3 islands and common associations.

Method     Parameter    Avg. size (1)  Gene coverage (2)  Intergenic (3)  H3K27me3 (4)  Rep. 1 by 2 (5)  Rep. 2 by 1 (6)
BCP (7)    pv 1e-5      25.8           0.497              0.089           0.019         0.851            0.805
BCP (7)    pv 1e-4      25.3           0.496              0.089           0.019         0.852            0.804
BCP (7)    pv 1e-3      24.7           0.494              0.09            0.02          0.852            0.803
BCP (7)    pv 1e-2      23.9           0.492              0.09            0.021         0.853            0.802
SICER (8)  W200-G200    2.7            0.323              0.085           0.021         0.689            0.805
SICER (8)  W200-G400    4.5            0.37               0.088           0.025         0.736            0.814
SICER (8)  W200-G800    8.7            0.437              0.094           0.032         0.8              0.818
SICER (8)  W400-G800    6.8            0.276              0.095           0.032         0.796            0.818
SICER (8)  W400-G1200   10.7           0.295              0.098           0.036         0.835            0.816

(1) The average island size in kb.
(2) The fraction of genes overlapped by an island.
(3) The fraction of islands covered by intergenic sequence.
(4) The fraction of islands overlapping H3K27me3 islands.
(5) The fraction of replicate 1 overlapped by replicate 2.
(6) The fraction of replicate 2 overlapped by replicate 1.
(7) Island coverage: 0.66-0.67.
(8) Island coverage: 0.62-0.68.
doi:10.1371/journal.pcbi.1002613.t001
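The replicate-overlap columns can be illustrated with a short interval computation. This sketch assumes the fraction is measured in base pairs and that islands are half-open (start, end) intervals on a single chromosome; the table does not state either detail, and the island coordinates below are invented.

```python
def merged(intervals):
    """Sort intervals and merge any that overlap or touch."""
    out = []
    for start, end in sorted(intervals):
        if out and start <= out[-1][1]:
            out[-1][1] = max(out[-1][1], end)
        else:
            out.append([start, end])
    return out


def fraction_overlapped(islands_a, islands_b):
    """Fraction of base pairs in islands_a that are also covered by islands_b."""
    a, b = merged(islands_a), merged(islands_b)
    total = sum(end - start for start, end in a)
    shared, j = 0, 0
    for start, end in a:
        # Skip islands in b that end before this island in a begins.
        while j < len(b) and b[j][1] <= start:
            j += 1
        k = j
        while k < len(b) and b[k][0] < end:
            shared += min(end, b[k][1]) - max(start, b[k][0])
            k += 1
    return shared / total if total else 0.0


# Hypothetical island calls from two replicates.
rep1 = [(0, 1000), (2000, 3000)]
rep2 = [(500, 2500), (5000, 6000)]
print(f"rep. 1 by rep. 2: {fraction_overlapped(rep1, rep2):.3f}")
print(f"rep. 2 by rep. 1: {fraction_overlapped(rep2, rep1):.3f}")
```

The measure is asymmetric, which is why the table reports both directions.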
BCP calls H3K36me3 islands closer to known gene boundaries
BCP islands more consistent with different read depths
BCP makes consistent island calls between different histone modifications
[Figure panels: Increased transcription; Decreased transcription]
The following 2 slides are the rest of the graphs, just to have them in case someone asks; I am not planning to cover them during the presentation