Package NarrowPeaks. August 3, Version Date Type Package

Similar documents
Peak-calling for ChIP-seq and ATAC-seq

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

High Throughput Sequence (HTS) data analysis. Lei Zhou

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

ChIP-seq data analysis

Package msgbsr. January 17, 2019

Exercises: Differential Methylation

Chip Seq Peak Calling in Galaxy

The Epigenome Tools 2: ChIP-Seq and Data Analysis

Package pepdat. R topics documented: March 7, 2015

Package splicer. R topics documented: August 19, Version Date 2014/04/29

Package leukemiaseset

Package TFEA.ChIP. R topics documented: January 28, 2018

cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

Assignment 5: Integrative epigenomics analysis

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz

Data mining with Ensembl Biomart. Stéphanie Le Gras

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand

User Guide. Association analysis. Input

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Package AbsFilterGSEA

Package bgafun. R topics documented: August 12, Type Package

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

Part-II: Statistical analysis of ChIP-seq data

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

EXPression ANalyzer and DisplayER

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Tutorial. ChIP Sequencing. Sample to Insight. September 15, 2016

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

MODULE 3: TRANSCRIPTION PART II

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Supplementary Figure 1 IL-27 IL

Package prognosticroc

Package GANPAdata. February 19, 2015

TCGA. The Cancer Genome Atlas

SUPPLEMENTAL INFORMATION

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

Package AIMS. June 29, 2018

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016

Package pathifier. August 17, 2018

Package MSstatsTMT. February 26, Title Protein Significance Analysis in shotgun mass spectrometry-based

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

Analysis of ChIP-Seq Data with R/Bioconductor

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

Statistical analysis of ChIP-seq data

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Package citccmst. February 19, 2015

Supplementary Figures

EpiGRAPH regression: A toolkit for (epi-)genomic correlation analysis and prediction of quantitative attributes

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

The corrected Figure S1J is shown below. The text changes are as follows, with additions in bold and deletions in bracketed italics:

Package propoverlap. R topics documented: February 20, Type Package

Package ORIClust. February 19, 2015

Package IMAS. March 10, 2019

Package speff2trial. February 20, 2015

sirna count per 50 kb small RNAs matching the direct strand Repeat length (bp) per 50 kb repeats in the chromosome

the reaction was stopped by adding glycine to final concentration 0.2M for 10 minutes at

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mrna-seq data

Lecture 21. RNA-seq: Advanced analysis

EPIGENOMICS PROFILING SERVICES

Package diggitdata. April 11, 2019

Bangor University Laboratory Exercise 1, June 2008

Package CompareTests

Applied Medical. Statistics Using SAS. Geoff Der. Brian S. Everitt. CRC Press. Taylor Si Francis Croup. Taylor & Francis Croup, an informa business

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

PSSV User Manual (V2.1)

Migraine Dataset. Exercise 1

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland

cis-regulatory enrichment analysis in human, mouse and fly

Reveal Relationships in Categorical Data

DNA-seq Bioinformatics Analysis: Copy Number Variation

ncounter Data Analysis Guidelines for Copy Number Variation (CNV) Molecules That Count NanoString Technologies, Inc.

Supplemental Figure 1. Small RNA size distribution from different soybean tissues.

A Quick-Start Guide for rseqdiff

Excel Solver. Table of Contents. Introduction to Excel Solver slides 3-4. Example 1: Diet Problem, Set-Up slides 5-11

Package ClusteredMutations

Package golubesets. August 16, 2014

Package p3state.msm. R topics documented: February 20, 2015

splicer: An R package for classification of alternative splicing and prediction of coding potential from RNA-seq data

Package ordibreadth. R topics documented: December 4, Type Package Title Ordinated Diet Breadth Version 1.0 Date Author James Fordyce

Review Questions in Introductory Knowledge... 37

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

ChIP-Seq Analysis with R and Bioconductor

Nature Methods: doi: /nmeth.3115

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Identification of mirnas in Eucalyptus globulus Plant by Computational Methods

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

SNPrints: Defining SNP signatures for prediction of onset in complex diseases

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Supplementary Figures

Package tumgr. February 4, 2016

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Transcription:

Package NarrowPeaks August 3, 2013 Version 1.5.0 Date 2013-02-13 Type Package Title Analysis of Variation in ChIP-seq using Functional PCA Statistics Author Pedro Madrigal <pm@engineering.com>, with contributions from Pawel Krajewski <pkra@igr.poznan.pl> Description The double aim of the package is to apply a functional version of principal component analysis (FPCA) to: (1) Process data in wiggle track format (WIG) commonly produced by ChIP-seq peak finders by applying FPCA over a set of selected candidate enriched regions. This is done in order to shorten the genomic locations accounting for a given proportion of variation among the enrichment-score profiles. The function narrowpeaks allows the user to discriminate between binding regions in close proximity to each other and to narrow down the length of the putative transcription factor binding sites while preserving the information present in the variability of the dataset and capturing major sources of variation. (2) Analyze differential variation when multiple ChIP-seq samples need to compared. The function narrowpeaksdiff quantifies differences between the tag-enrichment, and uses nonparametric tests on the FPC scores for testing differences between conditions. Depends R (>= 2.10.0), splines Maintainer Pedro Madrigal <pm@engineering.com> Imports GenomicRanges, IRanges, fda, CSAR Suggests rtracklayer, GenomicRanges, CSAR License Artistic-2.0 biocviews Visualization, ChIPseq, Transcription, Genetics LazyLoad yes LazyData yes 1

2 NarrowPeaks-package R topics documented: NarrowPeaks-package................................... 2 narrowpeaks......................................... 3 wig2csarscore...................................... 6 wigfile_test......................................... 8 Index 9 NarrowPeaks-package Analysis of Variation in ChIP-seq using Functional PCA Description Details The package processes data in wiggle track format (WIG) commonly produced by several ChIP-seq data analysis tools by applying functional version of principal component analysis (FPCA) over a set of selected candidate enriched regions. This is done in order to shorten the genomic locations accounting for a given proportion of variation among the enrichment-score profiles. It allows the user to discriminate between binding regions in close proximity to each other and to narrow down the length of the putative transcription factor binding sites while preserving the information present in the variability of the dataset and capturing major sources of variation. Package: NarrowPeaks Type: Package Version: 1.1 Date: 2013-01-09 License: Artistic-2.0 LazyLoad: yes Author(s) Pedro Madrigal, with contributions from Pawel Krajewski <pkra@igr.poznan.pl> Maintainer: Pedro Madrigal <pm@engineering.com> References Madrigal, P. et al. (submitted) NarrowPeaks: an R/Bioconductor package for quantitative analysis of variation in ChIP-seq datasets. Examples owd <- setwd(tempdir())

narrowpeaks 3 ##For this example we will use a subset of the AP1 ChIP-seq data (Kaufmann et ##al., 2010) ##The data is obtained after analysis using the CSAR package available in ##Bioconductor data("narrowpeaks-dataset") writelines(wigfile_test, con="wigfile.wig") ##Write binary files with the WIG signal values for each chromosome ##independently and obtain regions of read-enrichment with score values greater ##than t, allowing a gap of g. Data correspond to enriched regions found up ##to 105Kb in the Arabidopsis thaliana genome wigscores <- wig2csarscore(wigfilename="wigfile.wig", nbchr = 1, chrle=c(30427671)) gc(reset=true) library(csar) candidates <- sigwin(experiment=wigscores$infoscores, t=1.0, g=30) ##Narrow down ChIPSeq enriched regions by functional PCA shortpeaks <- narrowpeaks(inputreg=candidates, scoresinfo=wigscores$infoscores, lmin=0, nbf=150, rpenalty=0, nderiv=0, npcomp=2, pv=80, pmaxscor=3.0, ms=0) ###Export GRanges object with the peaks to annotation tracks in various ##formats. E.g.: library(genomicranges) names(elementmetadata(shortpeaks$broadpeaks))[3] <- "score" names(elementmetadata(shortpeaks$narrowpeaks))[2] <- "score" library(rtracklayer) export.bedgraph(object=candidates, con="csar.bed") export.bedgraph(object=shortpeaks$broadpeaks, con="broadpeaks.bed") export.bedgraph(object=shortpeaks$narrowpeaks, con="narrowpeaks.bed") setwd(owd) narrowpeaks Calculate Narrow Peaks from Enrichment-Score Profiles forming Broad Peaks Description Calculate narrow peaks from enrichment-score profiles forming broad peaks. Usage narrowpeaks(inputreg, scoresinfo, lmin = 0, nbf = 50, rpenalty= 0, nderiv= 0, npcomp = 5, pv = 80, pmaxscor = 0.0, ms = 0)

4 narrowpeaks Arguments inputreg scoresinfo lmin nbf rpenalty nderiv npcomp pv Output of the function sigwin in package CSAR. Output infoscores in the function wig2csarscore, or the function ChIPseqScore after data analysis with package CSAR. Minimum length of an enriched region from the WIG file to be processed. Integer value. Number of order 4 B-spline basis functions that will represent the shape of each candidate site. Integer value. Smoothing parameter for derivative penalization. Positive numeric value. Order of derivative penalization, if rpenalty>0. Integer value. Number of functional principal components. Integer value greater than or equal to nbf. Minimum percentage of variation to take into account during the analysis. Numeric value in the range 0-100. pmaxscor Cutoff for trimming of scoring function. Numeric value in the range 0-100. ms Details Peaks closer to each other than ms nucleotides are to be merged in the final list. Integer value. This function produces shortened sites from a list of candidate transcription factor binding sites of arbitrary extension and shape. First, the enrichment signal from each candidate site is represented by a smoothed function constructed using a linear combination of order 4 B-spline basis functions. The data values are fitted using either least squares (if rpenalty = 0), or penalized residuals sum of squares (spline smoothing if rpenalty > 0). Then, a functional principal component analysis for npcomp eigenfunctions is performed (Ramsay and Silverman, 2005), giving as a result a set of probe scores (principal component scores) which sum of squares is reported in elementmetadata(broadpeaks)[,"fpcascore"].the higher the value of fpcascore, the higher the variance that candidate peak accounts for within the original data. Details on the usage of semi-metrics in functional PCA is described in Ferraty and Vieu, 2006. After that, we impose the condition that total scoring function for each reported narrow peak must be at least pmaxscor per cent of the maximum value. Max value is calculated from a set of scoring functions using only the eigenfunctions required to achieve pv percent of variance. A new set of scores is computed using trimmed versions of the eigenfunctions (Madrigal et al., submitted), and the root square is stored in elementmetadata(narrowpeaks)[,"trimmedscore"]. Value A list containing the following elements: fdaprofiles broadpeaks A functional data object encapsulating the enrichment profiles (see fda package. To plot the data use plot.fd(fdaprofiles)). Description of the peaks prior to trimming. A GRanges object (see GenomicRanges package) with the information: seqnames (chromosome), ranges

narrowpeaks 5 narrowpeaks reqcomp pvar (start and end of the candidate site), strand (not used), max (maximum signal value for candidate site), average (mean signal value for candidate site), fpcascore (sum of squares of the first reqcomp principal component scores for candidate site). Description of the peaks after trimming. A GRanges object (see GenomicRanges package) with the information: seqnames (chromosome), ranges (start and end after trimming), strand (not used), broadpeak.subpeak, trimmedscore (see details), narroweddownto (length reduction relative to the candidate), merged (logical value). Number of functional principal components used. Integer value. Total proportion of variance accounted for by the reqcomp components used. Numeric value in the range 0-100 (always greater than or equal to argument pv). Author(s) Pedro Madrigal, <pm@engineering.com> References Madrigal, P. et al. (submitted) NarrowPeaks: an R/Bioconductor package for quantitative analysis of variation in ChIP-seq datasets. Ramsay, J.O. and Silverman, B.W. (2005) Functional Data Analysis. New York: Springer. Ferraty, F. and Vieu, P. (2006) Nonparametric Functional Data Analysis. New York: Springer. See Also wig2csarscore, NarrowPeaks-package Examples owd <- setwd(tempdir()) ##For this example we will use a subset of the AP1 ChIP-seq data (Kaufmann et ##al., 2010) ##The data is obtained after analysis using the CSAR package available in ##Bioconductor data("narrowpeaks-dataset") writelines(wigfile_test, con="wigfile.wig") ##Write binary files with the WIG signal values for each chromosome ##independently and obtain regions of read-enrichment with score values greater ##than t, allowing a gap of g. Data correspond to enriched regions found up ##to 105Kb in the Arabidopsis thaliana genome wigscores <- wig2csarscore(wigfilename="wigfile.wig", nbchr = 1, chrle=c(30427671)) gc(reset=true) library(csar) candidates <- sigwin(experiment=wigscores$infoscores, t=1.0, g=30) ##Narrow down ChIPSeq enriched regions by functional PCA

6 wig2csarscore shortpeaks <- narrowpeaks(inputreg=candidates, scoresinfo=wigscores$infoscores, lmin=0, nbf=150, rpenalty=0, nderiv=0, npcomp=2, pv=80, pmaxscor=3.0, ms=0) ###Export GRanges object with the peaks to annotation tracks in various ##formats. E.g.: library(genomicranges) names(elementmetadata(shortpeaks$broadpeaks))[3] <- "score" names(elementmetadata(shortpeaks$narrowpeaks))[2] <- "score" library(rtracklayer) export.bedgraph(object=candidates, con="csar.bed") export.bedgraph(object=shortpeaks$broadpeaks, con="broadpeaks.bed") export.bedgraph(object=shortpeaks$narrowpeaks, con="narrowpeaks.bed") setwd(owd) wig2csarscore Convert Data from a Wiggle Track (WIG) File to CSAR Binary Format Description Convert data from a wiggle track (WIG) file to CSAR binary format and extract enriched regions. Usage wig2csarscore(wigfilename, nbchr, chrle) Arguments wigfilename nbchr chrle WIG file containing the enrichment-score signal of a transcription factor binding experiment. Number of chromosomes. Vector of lengths of the chromosomes (in base pairs). Details The Wiggle format (WIG) is described on the UCSC Genome Bioinformatics web site: http: //genome.ucsc.edu/faq/faqformat. It allows the display of continuous value data in the genome browser. Although specifically designed for post-processing of WIG files, resulting from the analysis of ChIP-seq experiments (with Bioconductor packages BayesPeak, CSAR, PICS, or other tools such as MACS, F-seq, etc.), NarrowPeaks can process other type of sequencing data encoded in WIG format in order to locate regions of high variability in the data.

wig2csarscore 7 Value A list of two elements: infoscores A list with the same elements as reported by the function ChIPseqScore in the CSAR Bionductor package: chr (Chromosome names), chrl (Chromosome length (bp).), filenames (Name of the files where the score values are stored.), digits (Score values stored on the files need to be divided by 10^digits). Author(s) Pedro Madrigal, <pm@engineering.com> References Madrigal, P. et al. (submitted) NarrowPeaks: an R/Bioconductor package for quantitative analysis of variation in ChIP-seq datasets. Muino, J. et al. (2011) ChIP-seq analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions. Plant Methods 7:11. See Also narrowpeaks, NarrowPeaks-package Examples owd <- setwd(tempdir()) ##For this example we will use a subset of the AP1 ChIP-seq data (Kaufmann et ##al., 2010) ##The data is obtained after analysis using the CSAR package available in ##Bioconductor data("narrowpeaks-dataset") writelines(wigfile_test, con="wigfile.wig") ##Write binary files with the WIG signal values for each chromosome ##independently and obtain regions of read-enrichment with score values greater ##than t, allowing a gap of g. Data correspond to enriched regions found up ##to 105Kb in the Arabidopsis thaliana genome wigscores <- wig2csarscore(wigfilename="wigfile.wig", nbchr = 1, chrle=c(30427671)) setwd(owd)

8 wigfile_test wigfile_test Example Wiggle Track Produced After ChIP-seq Data Analysis Description Format Source Sample wiggle track produced after ChIP-seq data analysis. The data represents a small subset of a WIG file storing continuous value scores based on a Poisson test for the chromosome 1 of Arabidopsis thaliana (Kaufmann et al., 2010). It contains first 49515 lines of the WIG file for the full experiment. Wiggle track format (WIG) data in a character vector. Gene Expression Omnibus GSE20176 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=gse20176). Record from chromatin immunoprecipitation experiments wit AP1-specific antibodies followed by deep-sequencing in order to determine AP1 binding sites on a genome-wide scale in Arabidopsis thaliana. References See Also Kaufmann et al. (2010) Orchestration of Floral Initiation by APETALA1. Science 328:85-89. NarrowPeaks-package Examples data(narrowpeaks-dataset)

Index Topic datasets wigfile_test, 8 NarrowPeaks (NarrowPeaks-package), 2 narrowpeaks, 3, 7 NarrowPeaks-dataset (wigfile_test), 8 NarrowPeaks-package, 2 wig2csarscore, 5, 6 wigfile_test, 8 9