A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

Similar documents
The Epigenome Tools 2: ChIP-Seq and Data Analysis

Chip Seq Peak Calling in Galaxy

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

User Guide. Association analysis. Input

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

EXPression ANalyzer and DisplayER

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Data mining with Ensembl Biomart. Stéphanie Le Gras

Cancer Informatics Lecture

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

DNA-seq Bioinformatics Analysis: Copy Number Variation

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

Metabolomic and Proteomics Solutions for Integrated Biology. Christine Miller Omics Market Manager ASMS 2015

Peak-calling for ChIP-seq and ATAC-seq

Multi-omics data integration colon cancer using proteogenomics approach

Transcript reconstruction

Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS

High Throughput Sequence (HTS) data analysis. Lei Zhou

cis-regulatory enrichment analysis in human, mouse and fly

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

RNA-seq Introduction

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

Introduction. Introduction

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

Transcriptome Analysis

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Micro-RNA web tools. Introduction. UBio Training Courses. mirnas, target prediction, biology. Gonzalo

Introduction to Systems Biology of Cancer Lecture 2

Nature Structural & Molecular Biology: doi: /nsmb.2419

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Session 4 Rebecca Poulos

EPIGENOMICS PROFILING SERVICES

Functional annotation of farm animal genomes: ChIP-seq.

COST ACTION CA IDENTIFYING BIOMARKERS THROUGH TRANSLATIONAL RESEARCH FOR PREVENTION AND STRATIFICATION OF COLORECTAL CANCER (TRANSCOLOCAN)

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

NGS in Cancer Pathology After the Microscope: From Nucleic Acid to Interpretation

Systems Biology in Metabolomics

Advance Your Genomic Research Using Targeted Resequencing with SeqCap EZ Library

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

Table S1. Total and mapped reads produced for each ChIP-seq sample

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University.

Ambient temperature regulated flowering time

Supplementary Figures

TCGA. The Cancer Genome Atlas

Supplementary Figures

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

microrna analysis Merete Molton Worren Ståle Nygård

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

ChIP-seq data analysis

COST ACTION CA IDENTIFYING BIOMARKERS THROUGH TRANSLATIONAL RESEARCH FOR PREVENTION AND STRATIFICATION OF COLORECTAL CANCER (TRANSCOLOCAN)

Nature Neuroscience: doi: /nn Supplementary Figure 1

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

A Quick-Start Guide for rseqdiff

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

RNA Secondary Structures: A Case Study on Viruses Bioinformatics Senior Project John Acampado Under the guidance of Dr. Jason Wang

RNA SEQUENCING AND DATA ANALYSIS

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

Hands-On Ten The BRCA1 Gene and Protein

Part-II: Statistical analysis of ChIP-seq data

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

The Cancer Genome Atlas & International Cancer Genome Consortium

Assignment 5: Integrative epigenomics analysis

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand

David Tamborero, PhD

Supplementary Figure 1 IL-27 IL

Analysis of the peroxisome proliferator-activated receptor-β/δ (PPARβ/δ) cistrome reveals novel co-regulatory role of ATF4

Small RNA Sequencing. Project Workflow. Service Description. Sequencing Service Specification BGISEQ-500 SERVICE OVERVIEW SAMPLE PREPARATION

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

SUPPLEMENTARY INFORMATION

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick, Maryland

SUPPLEMENTAL FILE. mir-22 and mir-29a are members of the androgen receptor cistrome modulating. LAMC1 and Mcl-1 in prostate cancer

TPMI Presents: Translational Genomics Research Update, Opportunities and Challenges

Selective depletion of abundant RNAs to enable transcriptome analysis of lowinput and highly-degraded RNA from FFPE breast cancer samples

SUPPLEMENTAL INFORMATION

CRISPR/Cas9 Enrichment and Long-read WGS for Structural Variant Discovery

Tutorial. ChIP Sequencing. Sample to Insight. September 15, 2016

Session 4 Rebecca Poulos

Inference of Isoforms from Short Sequence Reads

MethylMix An R package for identifying DNA methylation driven genes

Golden Helix s End-to-End Solution for Clinical Labs

Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009

Multiplex target enrichment using DNA indexing for ultra-high throughput variant detection

Nature Methods: doi: /nmeth.3115

MODULE 3: TRANSCRIPTION PART II

R2 Training Courses. Release The R2 support team

TrainMALTA Summer School on Epigenomics

Session 6: Integration of epigenetic data. Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016

Hao D. H., Ma W. G., Sheng Y. L., Zhang J. B., Jin Y. F., Yang H. Q., Li Z. G., Wang S. S., GONG Ming*

IPA Advanced Training Course

Transcription:

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis Jian Xu, Ph.D. Children s Research Institute, UTSW

Introduction Outline Overview of genomic and next-gen sequencing technologies Basic concepts of genomic data analysis A practical guide for integrative genomic analysis RNA-seq ChIP-seq Integrated analysis 2

CRI Sequencing Core Facility Illumina NextSeq500 Equipped with illumina NextSeq500 and BaseSpace Onsite server, mounted to BioHPC storage Sequencing types: 1x75bp, 2x75bp, 2x150bp Application: RNA-seq, ChIP-seq, exome-seq, targeted resequencing, amplicon sequencing, WGS, etc. Managed by Xin Liu & Jian Xu More information: http://cri.utsw.edu/sequencing-facility-home/ 3

BIG data Introduction to Omics Genomics, Proteomics, Metabolomics, Omics studies have been greatly advanced by the high-throughput technologies Aim: To characterize and quantify pools of biological molecules that translate into intracellular structure, function and dynamics 4

Next-Gen Sequencing (NGS) Rd1 SP Rd2 SP 5

Illumina Sequence by Synthesis (SBS) Technology Reversible terminator Two or four-color removable fluorescent dyes Ultra resolution, highspeed scanning optics 6

Subjects of Omics Study DNA Genome RNA Transcriptome Protein Proteome Metabolites Metabolome 7

RNA-seq I have the data, I now what? Metabolomics ChIP-seq Animal data Proteomics Clinical data Epigenomics Imaging data 8

Bioinformatics 101 for Bench Scientists 1. Data Generation and QC How to process and QC the data? How to download, organize and store your data? 2. Data Processing What questions do you want to answer? What file formats do you need? 3. Data Analysis and Integration Web-based (e.g. BioHPC, Galaxy/Cistrome) Command lines 4. Data Presentation How to visualize the data? (e.g. Graphing tools) 9

Overview of Genomics Data Analysis Data Generation Data Analysis Sample Collection Sample Preparation Integrated Analysis Data filtering Annotation Pathway reconstruction Modeling Microarray NGS MS Identification/Quantification of Interactions Data Collection Data Visualization Biological Validation Data Processing Functional Candidates Database Testable Hypothesis 10

RNA-seq Why RNA-seq and ChIP-seq? 1. Genes differentially expressed between condition A and B 2. Relative expression levels of genes in condition A and B 3. Functional groups or pathways ChIP-seq 1. Binding sites and densities of factorx across the whole genome 2. Cooperating factors of factorx Integrative analysis of RNA-seq and ChIP-seq 1. Target genes of factorx 2. How factorx cooperates with co-factors (TFs or epi-factors) and epigenetic landscapes to regulate gene expression 3. Mechanism of action and testable hypothesis 11

Overview of RNA-seq and ChIP-seq Analysis RNA-seq ChIP-seq BioHPC NGS Pipeline Cufflinks BWA, MACS CRI ChIP-seq Pipeline RNA-seq Output Files ChIP-seq Output Files Genes_count.txt Genes_fpkm.txt Gene_exp.diff Isoforms_count.txt Isoforms_fpkm.txt Isoform_exp.diff Read count for genes *FPKM of genes Differential gene list Read count for transcripts *FPKM of transcripts Differential transcript list *FPKM: fragments per kilo base of exon per million reads mapped Peaks.bed Peaks.xls Summits.bed Control.wig Treat.wig Control.tdf Treat.tdf Chr locations for called peaks Having more information of calling parameters and statistical data for each peak Can be used to further filter the peaks Chr locations for the centers of called peaks Having the peak intensity information Used in further downstream analysis Used to visualize peak shape and intensity in IGV 12

RNA-seq Analysis using BioHPC NGS Pipeline 13

Data Analysis: RNA-seq RNA-seq BioHPC NGS Pipeline Cufflinks RNA-seq Output Files Genes_count.txt Read count for genes Genes_fpkm.txt *FPKM of genes A gene list GSEA MSigDB DAVID Oncomine cbioportal Gene_exp.diff Differential gene list Isoforms_count.txt Isoforms_fpkm.txt Read count for transcripts *FPKM of transcripts Quantitative gene expression profiles javagsea Isoform_exp.diff Differential transcript list *FPKM: fragments per kilo base of exon per million reads mapped Note: Commonly used gene identities: 1) gene symbol; 2) Entrez gene ID; 3) refseq_mrna (transcripts) 14

What information do we get from RNA-seq? Differential expressed genes between condition A and B Functional Annotation: To look for enriched pathways, biological processes or features, or correlation with other defined gene signatures in different cellular or pathological contexts A list of genes (no quantitative data) GSEA/Molecular.Signature.Da tabas/investigate.gene.sets - > select gene signatures to compare http://www.broadinstitute.org/g sea/msigdb/annotate.jsp Oncomine cbioportal DAVID Bioinformatics Resources /Start.Analysis -> paste your gene list -> select correct identifier (official gene symbol or Refseq_mRNA) -> gene list as list type -> submit -> Functional annotation tools https://david.ncifcrf.gov/tools.jsp 15

ChIP-seq Analysis using CRI ChIP-seq Pipeline 16

17

18

Data Analysis: ChIP-seq Peaks.bed Peaks.xls Summits.bed ChIP-seq ChIP-seq Output Files Chr locations for called peaks Having more information of calling parameters and statistical data for each peak Can be used to further filter the peaks Chr locations for the centers of called peaks Control.wig Having the peak intensity information Treat.wig Used in further downstream analysis Control.tdf Treat.tdf BWA, MACS CRI-ChIP-seq Pipeline Used to visualize peak shape and intensity in IGV 1. Load to UCSC genome browser to view peak sites and to obtain the sequence of a specific site 2. Used in analyses of: Motif enrichment Venn diagram Profiling of common and unique peaks Used in statistical analyses, e.g. Profiling or comparison of binding intensity Correlation of the genomic binding of multiple factors Load to IGV to view peaks 19

Integrative Analysis by Galaxy / Cistrome http://cistrome.org/ap/ 20

Integrative Analysis by Galaxy / Cistrome http://cistrome.org/ap/ 21

Integrative Analysis by Galaxy / Cistrome Tools http://cistrome.org/ap/ View Window Input/Output 22

Integrative Analysis by Galaxy / Cistrome http://cistrome.org/ap/ Correlation Association Binding/Expression Motif 23

Integrative Analysis of ChIP-seq and RNA-seq Data Step 1: Identify and characterize TF binding sites across the genome SeqPose: find motifs Tips: 1) Less than 5000 lines allowed, so need to filter the peaks in peak.xls file using p- value or FDR as a cutoff; 2) Providing clues for cofactors..wig.bed CEAS: Enrichment on chrs and annotation Output:.pdf or.png BETA-minus: find genes* (among all genes) in a given range (e.g. +/- 50kb of TSS) of the binding sites *w/o consideration of differentially regulated genes Conservation plot 24

Integrative Analysis of ChIP-seq and RNA-seq Data Step 2: Identify candidate target genes of the TF ChIP-seq TF_Peaks.bed RNA-seq Diff_genes.txt Direct target genes: 1) the genes differentially regulated between condition A and B; AND 2) with TF binding sites near the given region (e.g. +/- 50kb) from TSS of the genes. Tip: Input file Diff_genes_TSS_ 50kb.txt Galaxy Toolbox -> Operate on Genomic Intervals Compare the peak bed file (peaks.bed) and diff_genes_tss_50kb.bed (genes.bed) file 25

Integrative Analysis of ChIP-seq and RNA-seq Data Step 3: Compare TF binding between conditions or the binding of multiple TFs A.wig A.bed B.bed B.wig Venn diagram A B Search for direct target genes in each condition Galaxy tool -> Operate on Genomic intervals Differential peaks: 1. Common_peaks.bed 2. A_unique_peaks.bed 3. B_unique_peaks.bed SeqPos motif scan Tip: Different motif enrichment would suggest the use of different co-factors. Sitepro: Draw the score of binding signals near given peak regions Heatmap: Draw heatmap near given peak regions and also cluster peaks 26

What to do with the candidate genes? Screen the gene list to identify ones with potential link to your biology story 1. Pathway enrichment (DAVID, GSEA) 2. Gene function and pathological correlations (Genecards, BioGPS, NCBI/Gene, Ensembl) 3. Interaction partner proteins (NCBI/Gene, Ensembl) 4. Expression levels in normal tissues and diseased tissues (Genecards, BioGPS, NCBI/GEO, Ensembl) 5. Mutations, frequency and functional impacts (cbioportal, COSMIC, Ensembl) 6. Testable hypothesis 7. Experimental validation 27

Acknowledgment: Min Ni, Ph.D., GMDP @ CRI Yuannyu Zhang, Ph.D. Yi Du, David Trudgian, Liqiang Wang @ BioHPC Members of Jian Xu Lab @ CRI Questions? 28