Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Similar documents
Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Nature Structural & Molecular Biology: doi: /nsmb.2419

High Throughput Sequence (HTS) data analysis. Lei Zhou

An epigenetic approach to understanding (and predicting?) environmental effects on gene expression

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Peak-calling for ChIP-seq and ATAC-seq

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

EPIGENOMICS PROFILING SERVICES

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

Assignment 5: Integrative epigenomics analysis

Session 6: Integration of epigenetic data. Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

MIR retrotransposon sequences provide insulators to the human genome

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

The Epigenome Tools 2: ChIP-Seq and Data Analysis

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6

The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Stem Cell Epigenetics

RNA-seq Introduction

V24 Regular vs. alternative splicing

Allelic reprogramming of the histone modification H3K4me3 in early mammalian development

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Histones modifications and variants

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand

Mechanisms of alternative splicing regulation

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

ChIP-seq data analysis

Genome-wide Association Studies (GWAS) Pasieka, Science Photo Library

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

MODULE 3: TRANSCRIPTION PART II

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

the reaction was stopped by adding glycine to final concentration 0.2M for 10 minutes at

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

V23 Regular vs. alternative splicing

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Introduction to Systems Biology of Cancer Lecture 2

EXPression ANalyzer and DisplayER

cis-regulatory enrichment analysis in human, mouse and fly

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

Eukaryotic transcription (III)

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc

Supplementary Figures

Genetics and Genomics in Medicine Chapter 6 Questions

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Where Splicing Joins Chromatin And Transcription. 9/11/2012 Dario Balestra

Nature Genetics: doi: /ng Supplementary Figure 1

Data mining with Ensembl Biomart. Stéphanie Le Gras

Yue Wei 1, Rui Chen 2, Carlos E. Bueso-Ramos 3, Hui Yang 1, and Guillermo Garcia-Manero 1

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing

Integrated analysis of sequencing data

CTCF-Mediated Functional Chromatin Interactome in Pluripotent Cells

Nature Immunology: doi: /ni Supplementary Figure 1. DNA-methylation machinery is essential for silencing of Cd4 in cytotoxic T cells.

Table S1. Total and mapped reads produced for each ChIP-seq sample

Effects of UBL5 knockdown on cell cycle distribution and sister chromatid cohesion

SUPPLEMENTARY FIGURES: Supplementary Figure 1

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

Tutorial: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures

Epigenetics DNA methylation. Biosciences 741: Genomics Fall, 2013 Week 13. DNA Methylation

Myogenic Differential Methylation: Diverse Associations with Chromatin Structure

User Guide. Association analysis. Input

H3K4 demethylase KDM5B regulates global dynamics of transcription elongation and alternative splicing in embryonic stem cells

Chromosome-Wide Analysis of Parental Allele-Specific Chromatin and DNA Methylation

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Supplemental Figure 1. Genes showing ectopic H3K9 dimethylation in this study are DNA hypermethylated in Lister et al. study.

Dissecting gene regulation network in human early embryos. at single-cell and single-base resolution

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

Supplementary Figures

Open Access RESEARCH. Background Malaria is a major public health problem in many developing countries, with the parasite Plasmodium

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Cluster Dendrogram. dist(cor(na.omit(tss.exprs.chip[, c(1:10, 24, 27, 30, 48:50, dist(cor(na.omit(tss.exprs.chip[, c(1:99, 103, 104, 109, 110,

Dynamic reprogramming of DNA methylation in SETD2- deregulated renal cell carcinoma

Part-II: Statistical analysis of ChIP-seq data

Inference of Isoforms from Short Sequence Reads

Epigenetic programming in chronic lymphocytic leukemia

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Yingying Wei George Wu Hongkai Ji

An Unexpected Function of the Prader-Willi Syndrome Imprinting Center in Maternal Imprinting in Mice

PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick, Maryland

ESCs were lysed with Trizol reagent (Life technologies) and RNA was extracted according to

Fragile X Syndrome. Genetics, Epigenetics & the Role of Unprogrammed Events in the expression of a Phenotype

Regulation of Gene Expression in Eukaryotes

RNA- seq Introduc1on. Promises and pi7alls

SUPPLEMENTAL INFORMATION

ChIPSeq. Technique and science. The genome wide dynamics of the binding of ldb1 complexes during erythroid differentiation

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University.

Transcription:

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Philipp Bucher Wednesday January 21, 2009 SIB graduate school course EPFL, Lausanne

ChIP-seq against histone variants: Biological questions ChIP-seq data for histone variants: Which regions of the genome are enriched in a particular variant Position of individual nucleosomes, Epigenetic genome organization, chromatin motifs Combined analysis Position of TF binding sites relative to nucleosomes. Position of transcription start sites relative to nucleosomes Advanced analysis: Identification of motifs (chromatin pattern recognition)

ChIP-seq data analysis: Biological questions Histone modifications: Hypothetical association with functional domains H3K4me3 promoters H2AZ promoters H3K4me1 enhancers H3K36me3 transcribed regions H3K9me3 gene silencing

Histone modifications: Autocorrelation analysis Top: Auto-correlation plot of H3K36me3 in mouse ES cells Bottom: Auto-correlation plot of H3K4me3 in mouse ES cells Observations: H3K36me3 long range correlation H3K4me3 short range correlation Count densities in enriched regions: H3K36me3: 0.005 (1 per nuclesome) H3k4me3: 0.04 (6 per nucleosome)

Fragment size distribution: ChIP against H3K4me3: Top: Nucleosome-treated Barski et al. (2007): human CD4+ cell lines Bottom: sonicated material Mikkelsen et al. (2007): mouse ES cells Conclusion: Nuclease treatment produces higher resolution data

Looking at histone modifications at a particular genomic region Based on data: from Barski et al. 2007, Cell 129, 823-837. ChIP-Seq tags from both strands centered by 70 bp. WIG file resolution: H3K4me1 50 bp, H3K4me3 10 bp, H2A.Z 25 bp.

Analysis of ChIP-Seq signal around transcription start sites (TSS) Input to ChIP-Cor analysis: Reference feature: ~40 000 annotated TSS from ENSEMBL Target features ChIP-Seq (Barski et al. 2007): H3K4me3 (centered) H2AZ (centered) POL II (centered) CTCF (centered) Target features sequences motifs (Bucher 1990) GC-box (predicted with weight matrix) CCAAT-box (weight matrix) Questions: Arrangement and histone composition of promoter nucleosomes Positional preference of promoter motifs relative to nucleosomes

Scaling: global over-representation ChIP-Seq signal around annotated TSS

Promoter elements and chromatin signatures around TSS Scaling: global over-representation, each curved divided by mean before superposition

Epigenetic promoter signatures in human: Summary of observations: Strong nucleosome phasing downstream of the TSS Strong enrichment in H3K4me3 and H2AZ (known before) H3K4me3 prefers downstream nucleosomes. first downstream nucleosome centered at pos. +120 (invariable) first upstream nucleosome centered at pos. < 180 (variable) minimal nucleosome-free region of 150 bp nucleosome-free region coincides with location of promoter elements Interesting difference to yeast promoters: first downstream nucleosome centered at +60 (TSS within nucleosome) nucleosome-free region about 120 bp

Nucleosome phasing around in vivo TFBS Nucleosome phasing: nucleosomes occur at same positions in different chromosomes. ChIP-Seq data: Phasing leads to periodic signal distributions of about 180 bp Are nucleosomes phased in human cells? Tentative answer: in certain regions only What determines nucleosome phasing? Tentative answer: TF binding sites and space competition Nucleosome-intrinsic binding preferences to specific sequences Expected ChIP-Seq profiles:

Example: CTCF sites (Chip-Seq data from Barski et al. 2007) CTCF weight matrix (Sequence logo): Average # of counts per predicted site Matrix derived from 3888 highly enriched peak regions Maximal score of weigh matrix: 90 Threshold score: 50 Human genome contains 77924 sites with score 50

H2AZ nucleosome distribution around predicted CTCF sites Reference feature matrix-predicted CTCF sites (score 50) Target: centered (±65 bp) H2AZ tags Related paper presentation: The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. Fu et al. - PLoS Genetics - 2008

Partitioning algorithms Purpose: To partition the genome into signal-rich and signal-poor regions Applications: Variable peaks size Differential chromatin state analysis Algorithm type 1: Maximal scoring segment approach A segment is optimally delineated if It does not include a higher scoring segment It is not included in a higher scoring segment Algorithm type 2: Mixture model/optimal parsing approach Example: Chip-part (see following slides)

Overview of ChIP-partition Purpose: Segmentation: splitting of genome in signal-rich and signal-poor regions Input: Centered (or un-centered) tag counts Output: List of signal-enriched regions (beginning, end) Principle: Optimization of a partition scoring function by a fast dynamic programming algorithm Two parameters: count density threshold, transition penalty Scoring function for global results: sum of scores of Transitions (penalty) Scores of signal-rich region: length (local count-density threshold) Score of signal-poor region: length (threshold local count-density)

Viewing the results of the partitioning program in the genome browser Custom tracks: Mikkelsen07: results of ChIP-partition program (BED file) ESHyb.K36: from: http://www.isrec.isb-sib.ch/wig/hsm07_eshyb.k36_m_chr12.wig

Chromatin domains are differentially occupied in different cell lines Mouse Nanog region, H3K4me3 profiles. ES, ESHyb: embryonic stem cell lines, MEF: embryonic fibroblasts, NP neural progenitors Data from Mikkelsen et al. (2007). Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-560.

Using partitioning for differential histone modifications A) ES ES.Hyb MEF NP B) CD1 CD2 CD3 CD4 CD5 CD6 CD7 CD8 C) CD1 CD2 CD3 CD4 CD5 CD6 CD7 CD7 ES 9 21 46 15 11 19 9 4 ES.Hyb 13 38 37 26 10 13 10 6 MEF 3 0 12 7 6 17 14 4 NP 1 0 1 4 0 14 8 3

CAGE - Cap Analysis of Gene expression Method based on chemical modification of Cap-structure. Figure shows old version for SAGEinspired concatemersequencing Source: Kodzius, et al. (2006). CAGE: cap analysis of gene expression. Nat Methods 3, 211-222.

Cage Data Analysis Goals: Identification of transcription start sites for promoter analysis Quantitative measurement of alternative promoter usage Detection of new transcripts Analysis methods Mapping of tags to genome (as with ChIP-Seq) Segmentation and Peak identification Classification of promoters according to initiation site patterns. Related paper presentation: A code for transcription initiation in mammalian genomes. Frith et al. - Genome Research - 2008

Example of CAGE tag distribution around a promoter site Detailed view below Larger regions (H3K4me1 in blue) Text:

RNA-Seq Source: Mortazavi et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5, 621-628.

RNA-Seq: Some technical issues Mapping reads to the genome: 1. First try to map co-linear reads by Eland or similar software 2. Try to map remaining reads to synthetic splice-junction database Synthetic splice junction database may be based on Combinations of donors and acceptors from transcriptome database Combination of predicted donors and acceptors within genome interval Normalized expression measure: RPKM: reads per kilobase of exon model per million mapped reads Related paper presentation: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Marioni et al. - Genome Research 2008 A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Sultan et al. - Science - 2008 Further Reading: Wang et al. (2008). Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-476.

Methylome analysis Goal: Identification of the degree of CpG methylation at regional scale or bp resolution MeDIP-Seq: IP against methylated DNA UHT sequencing Computational issues: mapping of reads, segmentation Bisulfate sequencing: Bisulfate converts non-methylated C into U (sequenced as T) Computational issues: Mapping of reads to genome not entirely trivial! Related paper presentation: A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Down et al. - Nature Biotechnology - 2008