Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Elucidating Transcriptional Regulation at Multiple Scales Using High-Throughput Sequencing, Data Integration, and Computational Methods
Raymond Auerbach, PhD Candidate, Yale University
Gerstein and Snyder Labs
August 30, 2012

Outline
- Background: Transcriptional Regulation, ENCODE, ChIP-Seq
- Selected projects from my PhD work
  - Understanding the technical aspects of ChIP-Seq scoring and how choice of reference sample matters
  - Using high-throughput sequencing to gain a genome-wide view of chromatin remodeling (SWI/SNF complex)
  - Understanding the effects of long-range interactions and genome folding on transcription
  - CAPE: a tool to classify features by RNAPII binding and gene expression

Transcriptional Regulation: A Cartoon View
[Figure: cartoon of transcriptional regulation; labels: TF combinations, DNA folding, site-specific binding, histone modifications, chromatin remodeling. Holstege and Young, PNAS, 1999. Credit: Adam Steinberg]

ENCODE Data Description
The ENCODE Project Consortium, PLoS Biology, 2011

ChIP-Seq

Early ChIP-Seq Questions
- How should peaks be identified?
- Which peaks are significant?
- The ChIP peaks seem obvious, so why not score against randomized background?
- Are controls or references needed? What biases are present?
- Does the peak calling need to be tuned for different factors and/or organisms?

Highlights from Key Papers
Nature Biotechnology, 2009; PNAS, 2009

First surprise: Input DNA has structure
- The input DNA profile itself shows peaks and is not flat
- The Pol2 antibody is exceptional. For ChIP with a typical antibody, the input DNA peaks could affect the ability to call significant peaks
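The concern on this slide can be made concrete with a small sketch. It scores the ChIP read count in one window against the control read count in the same window, using a one-sided binomial test. The function name, the read counts, and the `scale` parameter (ratio of ChIP to control library depth) are illustrative assumptions; this is not presented as the scoring procedure used in the cited papers.

```python
from math import comb


def binomial_enrichment_pvalue(chip_reads, control_reads, scale=1.0):
    """One-sided P(X >= chip_reads) for X ~ Binomial(n, p).

    n is the total number of reads in the window (ChIP + control).
    p = scale / (1 + scale) is the chance that a read comes from the
    ChIP library when there is no enrichment (hypothetical null model).
    """
    n = chip_reads + control_reads
    if n == 0:
        return 1.0
    p = scale / (1.0 + scale)
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(chip_reads, n + 1)
    )


# Same ChIP signal, two different controls (made-up counts).
p_flat_control = binomial_enrichment_pvalue(30, 5)     # control is near background
p_peaked_control = binomial_enrichment_pvalue(30, 25)  # control has its own peak
print(p_flat_control, p_peaked_control)
```

With a flat control the window looks strongly enriched; when the control carries its own peak at the same location, the same ChIP count is no longer significant, which is why the choice of reference matters.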

Origin of Input DNA from Nuclear Lysate
1. Reverse cross-links
2. Phenol-chloroform extract
3. Purify DNA
4. Size select DNA
5. Ligate Illumina adapters
What happens if we change some of these variables?

Hypothesis and Strategy
Initial hypothesis:
- Input DNA peaks will be highest in regions of open chromatin
- Input DNA peaks also seen in other genomes
Strategy:
- Using ChIP-Seq experiments in HeLa S3 and yeast, score all tracks and aggregate signal over interesting features

Reference Types We Examined
- Input DNA: ChIP DNA that is not IP'ed with an antibody
- MNase-digested DNA: use MNase to cleave DNA instead of sonication
- IgG (non-specific antibody): ChIP DNA IP'ed with a non-specific antibody
- Naked DNA: sonicated DNA. Not crosslinked or IP'ed. Proteins removed.

Both Size Selection and Crosslinking are Necessary
Auerbach and Euskirchen et al., PNAS, 2009

Aggregation Plot: Expressed Genes (TSS)
[Figure: aggregated signal around the TSSs of expressed genes; tracks: Pol II, Input DNA 100-350 bp, Input DNA 350-500 bp, Naked DNA, IgG, MNase, Mappability]
Input DNA enriched 4x over background!
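An aggregation plot of this kind averages a signal track over windows centered on a set of anchor points such as TSSs. The sketch below is a minimal version under simplifying assumptions: one chromosome, a per-base signal held in a plain list, and hypothetical names (`aggregate_signal`, `flank`). It is not the pipeline used to produce the figure.

```python
def aggregate_signal(signal, anchors, flank):
    """Average per-base signal in a window of +/- flank around each anchor.

    signal  : list of per-base values for one chromosome
    anchors : list of (position, strand) tuples, strand is "+" or "-"
    flank   : number of bases on each side of the anchor

    Windows on the minus strand are reversed so that every window reads
    upstream-to-downstream. Windows running off either end are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for pos, strand in anchors:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(signal):
            continue
        window = signal[start:end]
        if strand == "-":
            window = window[::-1]
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    if used == 0:
        return totals
    return [t / used for t in totals]


# Toy track: a spike at one "+" TSS and a spike just downstream of a "-" TSS.
track = [0.0] * 20
track[5] = 4.0
track[14] = 2.0
profile = aggregate_signal(track, [(5, "+"), (15, "-")], flank=2)
print(profile)
```

Dividing the value at the center of such a profile by the mean signal far from any anchor gives a fold-enrichment figure comparable in spirit to the "4x over background" noted on the slide.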

Regions Associated with Active Transcription
[Figure panels: Input DNA 100-350 bp; Input DNA 350-500 bp]
Auerbach and Euskirchen, et al. PNAS, 2009.

Regions Associated with Transcriptional Inactivity
[Figure panels: Input DNA 100-350 bp; Input DNA 350-500 bp]
Auerbach and Euskirchen, et al. PNAS, 2009.

What are the peaks?

Summary and Bioinformatics Contributions
- First comprehensive analysis of the effect of ChIP-Seq reference DNAs on peak scoring
- Led to the choice of a preferred reference by our lab for ENCODE Consortium work (IgG)
- Integration of various data sets with reference DNA types to gain a greater understanding of scoring biases
- Useful for detecting accessible chromatin regions, particularly as a first pass

Generalized Peak Caller

Considerations with Early Peak Callers
- Usually designed around ChIP with an ideal antibody
- Also usually targeted toward one organism
- Default parameters typically arise from choices of the experimental collaborator
- How do peak callers work with more typical antibodies?
- How about with members of a protein complex?

ChIP-Seq of a Large Chromatin Remodeling Complex (SWI/SNF)
Paper: Euskirchen and Auerbach, et al., PLoS Genetics, 2011

Chromatin Remodeling: Why You Should Care
- Can change whether a region is accessible to TFs and other proteins
- Quick way to regulate regions that are actively transcribed
Zofall et al., Nature Structural & Molecular Biology, 2006

Chromatin Remodelers and Epigenetics
de la Serna et al., Nature Reviews Genetics, 2006

Role in Cancer
SWI/SNF subunit | Cancer | Mutation Type | Reference
Ini1 | malignant rhabdoid tumors | truncating mutations | (1998) Nature 394: 203; (2006) Mod. Pathol. 19: 717
BAF250A/ARID1A | ovarian clear cell carcinomas | somatically acquired, inactivating mutations | (2010) Science 330: 228; (2010) N. Engl. J. Med. 363: 1532
BAF250A/ARID1A | transitional cell carcinoma of the bladder | somatic, non-silent mutations | (2011) Nat. Genet. 43: 875
BAF200 | hepatitis C virus-associated hepatocellular carcinomas | somatic, inactivating mutations | (2011) Nat. Genet. 43: 828
BAF180 | clear cell renal carcinomas | somatic, inactivating mutations | (2011) Nature 469: 539
Brg1 & Brm | non-small cell lung carcinomas | unknown; based on negative staining of tissue | (2003) Cancer Res. 63: 560
Brg1 | lung cancer cell lines, esp. non-small cell lung cancers | inactivating mutations | (2008) Hum. Mutat. 29: 617
BAF250A/ARID1, Brg1 & BAF180 | pancreatic cancers | various (nonsense, missense, indel, frameshift, rearrangement, splice site) | (2012) PNAS 109: E252
Brd7 | breast cancer | multi-gene deletion | (2010) Nature Cell Biol. 12: 380-389

SWI/SNF Has 288 Subunit Combinations!
[Figure: combinatorial diagram of alternative subunits, starting with ARID (1a or 1b or 2)]

Project Overview
Analysis Questions:
- Where does SWI/SNF bind and in what configurations?
- What other elements are associated with SWI/SNF binding sites?
- Functional implications (pathway analysis, etc.)
Experimental Procedure:
- ChIP-Seq against Brg1, BAF155, BAF170, and Ini1 in HeLa S3 cells
- Mass spectrometry to inventory co-immunoprecipitating proteins

Features We Integrated
Feature | Platform | Source
Ini1 | Sequencing | Euskirchen and Auerbach et al., 2011
Brg1 | Sequencing | Euskirchen and Auerbach et al., 2011
BAF155 | Sequencing | Euskirchen and Auerbach et al., 2011
BAF170 | Sequencing | Euskirchen and Auerbach et al., 2011
RNA Polymerase II | Sequencing | Rozowsky et al., 2009
IgG Control | Sequencing | Auerbach and Euskirchen et al., 2009
Lamin A/C | Array | Euskirchen and Auerbach et al., 2011
Lamin B | Array | Euskirchen and Auerbach et al., 2011
H3K27me3 | Sequencing | Cuddapah et al., 2009
CTCF | Sequencing | Cuddapah et al., 2009
Predicted enhancers | Array | Heintzman et al., 2009
RNA Polymerase III | Sequencing | Oler et al., 2010; Barski et al., 2010
RNA-Seq | Sequencing | Morin et al., 2008
Non-canonical small RNAs | Sequencing | Affymetrix and CSHL ENCODE Transcription Project, 2009
DNA replication origins | Array | Cadoret et al., 2008

How to Combine Data?

Subunit Breakdown from ChIP-Seq
Subunit | Number in 49,555 union regions
Ini1 | 24,478 (49%)
BAF155 | 37,921 (77%)
BAF170 | 25,433 (51%)
Brg1 | 12,317 (25%)

SWI/SNF Subunit Combinations | Total Observed
SWI/SNF high-confidence union set | 49,555
Two or more subunits | 30,310
Three or more subunits | 15,535
Core set: Ini1, BAF155, and BAF170 (may include Brg1) | 9,760
Ini1, BAF155, BAF170, and Brg1 | 4,750
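The percentages on this slide follow directly from the counts, and the combination table is a tally of which subunits are present in each union region. The sketch below reproduces the percentages from the slide's numbers and shows one way such a tally could be computed. The helper names and the toy region list are assumptions for illustration, not the analysis code behind the paper.

```python
from collections import Counter

UNION_TOTAL = 49_555  # size of the high-confidence union set (from the slide)

# Regions containing each subunit (from the slide).
subunit_counts = {
    "Ini1": 24_478,
    "BAF155": 37_921,
    "BAF170": 25_433,
    "Brg1": 12_317,
}


def percent_of_union(count, total=UNION_TOTAL):
    """Share of union regions, rounded to a whole percent."""
    return round(100 * count / total)


percentages = {name: percent_of_union(n) for name, n in subunit_counts.items()}
print(percentages)


def tally_combinations(region_subunits):
    """Count regions per exact subunit combination.

    region_subunits is an iterable with one set of subunit names per
    union region (hypothetical input format).
    """
    return Counter(frozenset(subunits) for subunits in region_subunits)


# Toy input: three regions, two of them bound by the same pair of subunits.
toy_regions = [{"Ini1", "BAF155"}, {"BAF155"}, {"BAF155", "Ini1"}]
toy_tally = tally_combinations(toy_regions)
```

Using `frozenset` keys makes the tally independent of the order in which subunits are listed for a region.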

SWI/SNF Co-occurrences
Feature | SWI/SNF Union Set (49,555 regions) | SWI/SNF Core Set (9,760 regions)
CTCF, Pol II, Enhancers, 5' ends (any combination) | 44,755 (90%) | 8,968 (92%)
Unclassified | 4,800 (10%) | 792 (8%)
RNA Pol II Sites | 19,669 (40%) | 6,562 (67%)
Putative Enhancers | 21,228 (43%) | 3,431 (35%)
CTCF Sites | 8,542 (17%) | 1,692 (17%)
5' ends of Ensembl protein-coding genes (within 2.5 kb) | 14,291 (29%) | 4,089 (42%)
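Rows such as "within 2.5 kb" of a 5' end come down to a proximity test between each binding region and a set of annotated positions. A minimal sketch of that test follows, assuming a single chromosome, a pre-sorted list of site coordinates, and hypothetical names (`near_any_site`, `max_dist`). The actual overlap criteria used in the paper may differ.

```python
from bisect import bisect_left


def near_any_site(region_start, region_end, sorted_sites, max_dist=2500):
    """Return True if any site lies within max_dist bp of the region.

    sorted_sites must be sorted in ascending order. The first site at or
    beyond (region_start - max_dist) is the only candidate that needs
    checking against the right-hand limit.
    """
    i = bisect_left(sorted_sites, region_start - max_dist)
    return i < len(sorted_sites) and sorted_sites[i] <= region_end + max_dist


# Toy annotation with two 5' ends on one chromosome.
five_prime_ends = [1_000, 10_000]
regions = [(3_000, 3_400), (4_000, 4_500), (12_000, 12_400), (20_000, 20_100)]
flags = [near_any_site(s, e, five_prime_ends) for s, e in regions]
print(flags)
```

The binary search keeps each lookup logarithmic in the number of annotated sites, which matters when tens of thousands of regions are classified against several feature sets.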

Association of Subunit Combinations with Transcription Levels
Euskirchen and Auerbach, et al. PLoS Genetics, 2011.

Pathway Analysis
Euskirchen and Auerbach, et al. PLoS Genetics, 2011.

Overrepresented GO Categories (Mass Spectrometry)
Euskirchen and Auerbach, et al. PLoS Genetics, 2011.

Summary and Bioinformatics Contributions
- Different peak scoring criteria for ubiquitous factors
- Inferring information about a complex given ChIP-Seq from subunits
- Overall, SWI/SNF binds very generally, but is enriched at 5' ends and at genes associated with cell cycle, DNA repair, and cancer.

SWI/SNF and DNA Looping
CIITA locus (~150 kb)
Euskirchen and Auerbach, et al. PLoS Genetics, 2011.

Exploring Transcription, DNA Folding, and Nuclear Organization in a Multidimensional Context
Paper: Li, Ruan, Auerbach, and Sandhu, et al., Cell, 2012

ChIA-PET
- Chromatin Interaction Analysis by Paired End ditag Sequencing
- Collaboration with Stanford and Genome Institute of Singapore
- In addition to the ChIA-PET method, FISH, qPCR, enhancer assays, and other methods were used for validation
Question: How does transcriptional regulation work in 3-D space on an intrachromosomal level?

So How Does ChIA-PET Work? (Cliffs Notes Version)
[Figure: schematic comparing ChIP-Seq and ChIA-PET; labels: DNA 1, Linker, DNA 2]

The Textbook Version

The Textbook Version (continued)

First Goal - Transcription Factories
Sutherland and Bickmore. Nature Reviews Genetics, 2009.

Second Goal - Formation of Protein Complexes
PJ Farnham, Nature Reviews Genetics. 2009.

Models of Transcription
Li, Ruan, Auerbach, and Sandhu, et al. Cell, 2012.

Gene Expression Characteristics
Li, Ruan, Auerbach, and Sandhu, et al. Cell, 2012.

Binding of Different TFs Across Models
Li, Ruan, Auerbach, and Sandhu, et al. Cell, 2012.

IRS1 and T2D: Long Range Interactions and Disease
Li, Ruan, Auerbach, and Sandhu, et al. Cell, 2012.

ChIA-PET Conclusions
- Active regions are connected to other active regions
- Some factors are present at promoters while others are brought in by LRI
- Most interactions follow the basal promoter model, but most genes are involved in multigene complexes
- Long-range interactions and their role in disease

Bioinformatics Contributions
- Integration of LRI data with ChIP-Seq, RNA-Seq, etc., to look at transcription as a system
- Basis for future studies in how various protein complexes are formed in vivo

CAPE - Coupled Analysis of Polymerase and Expression

Combining RNAPII ChIP-Seq with RNA-Seq
- A natural experiment for transcription analysis
- Simple to generate, gain a lot of information
- Many paired datasets available in public repositories
- Can identify transcripts with unexpected relationships between binding and expression
- Compare to other organisms/samples/conditions

CAPE Summary
- Publicly available tool designed to categorize features based on expression & RNAPII binding
- Open-source and multiplatform (Java)
- Designed to work on diverse sets of genomes out of the box, but also allows for parameter customization
- Useful for comparative genomics (e.g. modENCODE)
- Two modules: CAPE-analyze and CAPE-compare
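The general idea of categorizing features by expression and RNAPII binding can be illustrated with a two-threshold sketch. Everything here is an assumption made for illustration: the function name, the category labels, the cutoffs, and the input values. The slides do not describe CAPE's actual algorithm or parameters, and CAPE itself is written in Java rather than Python.

```python
def classify_feature(binding, expression, binding_cutoff, expression_cutoff):
    """Place one feature into one of four binding/expression categories.

    binding and expression are per-feature scores (for example, an RNAPII
    ChIP-Seq signal and an RNA-Seq abundance); both cutoffs are
    hypothetical, user-chosen parameters.
    """
    bound = binding >= binding_cutoff
    expressed = expression >= expression_cutoff
    if bound and expressed:
        return "bound, expressed"
    if bound:
        return "bound, not expressed"
    if expressed:
        return "not bound, expressed"
    return "not bound, not expressed"


# Toy features: (name, RNAPII binding score, expression score).
features = [("geneA", 12.0, 8.0), ("geneB", 12.0, 0.2), ("geneC", 0.5, 8.0)]
labels = {
    name: classify_feature(b, e, binding_cutoff=5.0, expression_cutoff=1.0)
    for name, b, e in features
}
print(labels)
```

Features landing in the two mixed categories are the ones with an unexpected relationship between binding and expression, and comparing the label of the same feature across samples is the kind of question a compare step would address.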

CAPE: Coupled Analysis of Polymerase Binding and Expression
Auerbach et al. In revision.

Sample CAPE-analyze Output

Sample CAPE-compare Output (raw)

Sample CAPE-compare Output (HTML)

Sample CAPE-compare Output (Venn)

Overall Summary
- Technical implications of scoring ChIP-Seq data (PNAS)
- Considerations when analyzing data from ChIP-Seq experiments targeted to non-standard transcription factors and protein complexes (PLoS Genetics)
- How DNA folding affects how we view transcription and ChIP-Seq data (Cell)
- New, robust tool to quickly classify transcripts/genes based on mRNA abundance and RNAPII binding levels

Other Work While at Yale
- Co-author of 14 peer-reviewed papers while at Yale (4 as primary or starred): 12 published, 2 in press
- One manuscript being revised for resubmission

Acknowledgements

Questions?