Package DeconRNASeq. November 19, 2017

Similar documents
DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mrna-seq data

Package ORIClust. February 19, 2015

Package TargetScoreData

Package AbsFilterGSEA

Package cssam. February 19, 2015

Package nopaco. R topics documented: April 13, Type Package Title Non-Parametric Concordance Coefficient Version 1.0.

Supplementary Figures

Package xseq. R topics documented: September 11, 2015

Package MSstatsTMT. February 26, Title Protein Significance Analysis in shotgun mass spectrometry-based

Package MethPed. September 1, 2018

User Guide. Association analysis. Input

Package diggitdata. April 11, 2019

Package CLL. April 19, 2018

Package TFEA.ChIP. R topics documented: January 28, 2018

Package msgbsr. January 17, 2019

Package IMAS. March 10, 2019

Package ordibreadth. R topics documented: December 4, Type Package Title Ordinated Diet Breadth Version 1.0 Date Author James Fordyce

Package inote. June 8, 2017

Package ega. March 21, 2017

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Vega: Variational Segmentation for Copy Number Detection

Package prognosticroc

A Quick-Start Guide for rseqdiff

Package pathifier. August 17, 2018

Package splicer. R topics documented: August 19, Version Date 2014/04/29

Package p3state.msm. R topics documented: February 20, 2015

Package wally. May 25, Type Package

BIMM 143. RNA sequencing overview. Genome Informatics II. Barry Grant. Lecture In vivo. In vitro.

EXPression ANalyzer and DisplayER

Reliability of Ordination Analyses

Package inctools. January 3, 2018

Metabolomic Data Analysis with MetaboAnalyst

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Package citccmst. February 19, 2015

Package longroc. November 20, 2017

Package ClusteredMutations

Module 3: Pathway and Drug Development

Package HAP.ROR. R topics documented: February 19, 2015

To open a CMA file > Download and Save file Start CMA Open file from within CMA

Package flowtype. R topics documented: July 18, Type Package. Title Phenotyping Flow Cytometry Assays. Version

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

Package propoverlap. R topics documented: February 20, Type Package

RNA-Seq Preparation Comparision Summary: Lexogen, Standard, NEB

Package metaseq. R topics documented: November 1, Type Package. Title Meta-analysis of RNA-Seq count data in multiple studies. Version 1.1.

Package rvtdt. February 20, rvtdt... 1 rvtdt.example... 3 rvtdts... 3 TrendWeights Index 6. population control weighted rare-varaints TDT

Tutorial: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures

Assignment 5: Integrative epigenomics analysis

Package CompareTests

Package clust.bin.pair

Package speff2trial. February 20, 2015

Package QualInt. February 19, 2015

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

Exercises: Differential Methylation

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Figure S2. Distribution of acgh probes on all ten chromosomes of the RIL M0022

Package cancer. July 10, 2018

Package StepReg. November 3, 2017

Package noia. February 20, 2015

Package TEQR. R topics documented: February 5, Version Date Title Target Equivalence Range Design Author M.

PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science. Homework 5

R documentation. of GSCA/man/GSCA-package.Rd etc. June 8, GSCA-package. LungCancer metadi... 3 plotmnw... 5 plotnw... 6 singledc...

High-Throughput Sequencing Course

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

CSDplotter user guide Klas H. Pettersen

CNV PCA Search Tutorial

Package BUScorrect. September 16, 2018

Package NarrowPeaks. August 3, Version Date Type Package

ivlewbel: Uses heteroscedasticity to estimate mismeasured and endogenous regressor models

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process

The FunCluster Package

TADA: Analyzing De Novo, Transmission and Case-Control Sequencing Data

Package appnn. R topics documented: July 12, 2015

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

Score Tests of Normality in Bivariate Probit Models

Package Actigraphy. R topics documented: January 15, Type Package Title Actigraphy Data Analysis Version 1.3.

CS 6824: Tissue-Based Map of the Human Proteome

Backcalculating HIV incidence and predicting AIDS in Australia, Cambodia and Vietnam. Australia

CoINcIDE: A framework for discovery of patient subtypes across multiple datasets

Package GANPAdata. February 19, 2015

Package sbpiper. April 20, 2018

SUPPLEMENTARY APPENDIX

Package mdsdt. March 12, 2016

Data mining with Ensembl Biomart. Stéphanie Le Gras

RNA-seq. Differential analysis

Package pepdat. R topics documented: March 7, 2015

Next Generation Cancer Diagnostics For First Time Right Therapy Choice. Anja van de Stolpe

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Package RNAsmc. December 11, 2018

COPD: Genomic Biomarker Status and Challenge Scoring

Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA

Math 215, Lab 7: 5/23/2007

Systematic exploration of autonomous modules in noisy microrna-target networks for testing the generality of the cerna hypothesis

Package tumgr. February 4, 2016

Package metaseq. R topics documented: August 17, Type Package

Simple, rapid, and reliable RNA sequencing

An Analysis of MDM4 Alternative Splicing and Effects Across Cancer Cell Lines

Transcription:

Type Package Package DeconRNASeq November 19, 2017 Title Deconvolution of Heterogeneous Tissue Samples for mrna-seq data Version 1.20.0 Date 2013-01-22 Author Ting Gong <tinggong@gmail.com> Joseph D. Szustakowski <joseph.szustakowski@novartis.com> Depends R (>= 2.14.0), limsolve, pcamethods, ggplot2, grid Maintainer Ting Gong <tinggong@gmail.com> DeconSeq is an R package for deconvolution of heterogeneous tissues based on mrna-seq data. It modeled expression levels from heterogeneous cell populations in mrna-seq as the weighted average of expression from different constituting cell types and predicted cell type proportions of single expression profiles. License GPL-2 biocviews DifferentialExpression NeedsCompilation no R topics documented: DeconRNASeq-package.................................. 2 all.datasets.......................................... 2 array.proportions...................................... 3 array.signatures....................................... 3 condplot........................................... 4 datasets........................................... 5 decon.bootstrap....................................... 5 DeconRNASeq....................................... 6 fraction........................................... 7 liver_kidney......................................... 8 multiplot........................................... 9 multi_tissue......................................... 10 proportions......................................... 11 rmse............................................. 11 signatures.......................................... 12 x.data............................................ 12 1

2 all.datasets x.signature.......................................... 13 x.signature.filtered..................................... 13 x.signature.filtered.optimal................................. 14 Index 15 DeconRNASeq-package package DeconRNASeq contains function "DeconRNASeq", implementing the decomposition of RNA-Seq expression profilings of heterogeneous tissues into cell/tissue type specific expression and cell type concentration based on cell-type-specific reference measurements. Details Main function "DeconRNASeq" implements an nonnegative decomposition by quadratic programming as datasets = signature*a, where "datasets" are the originally measured data matrix (e.g. genes by samples), "signature" is the signature matrix (genes by cell types) and "A" the cell type concentration matrix (cell types by samples) Package: DeconRNASeq Type: Package Version: 1.0 Date: 2012-05-25 License: GPL version 2 or later DeconRNASeq(datasets, signature) References Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156. all.datasets data objects for rat liver_brain samples A data frame providing the expression profilings of GSE19830 microarray samples. all.datasets

array.proportions 3 A matrix with expression studies in the GSE19830 microarray samples: the first three columns are corresponding to the liver, while the last three samples are corresponding to the brain. data(rat_liver_brain) array.proportions proportions for rat liver and brain mixing samples array.proportions: a data frame providing the fractions for liver and brain from the microarray GSE 19830 study array.proportions a martix whose rows are mixing samples name and columns are fractions from pure live and brain tissues data(rat_liver_brain) array.signatures data objects for rat liver and brain pure samples array.signatures: a data frame providing the expression values from rat pure liver and brain samples, each has threee replicates array.signatures

4 condplot a data matrix with 30 expressions from rat pure liver and brain tissues data(rat_liver_brain) condplot Draw the plot of the condition numbers vs. the number of genes in the signature matrix A function is used to draw the plot of the condition number of signature matrices of all sizes, from a handful of genes in one extreme to the whole signature in the other. condplot(step, cond) Arguments step cond an array with the number of genes used to calculate the condition numbers of signature matrices, default stepwise = 20 an array with the condition numbers of signature matrices Value a plot for the condition numbers of signature matrices References Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156.

datasets 5 library(deconrnaseq) #################################################################### ## toy data example: step <- seq(20,1000, by=20) #every 20 genes ## cell type-specific gene expression matrix: x.signature <- matrix(rexp(2000),ncol=2) sig.cond <- sapply(step, function(x) kappa(scale(x.signature[1:x,]))) function (step, cond) datasets data objects for liver and kidney mixing samples A data frame providing the RPKM of seven mixing samples. datasets A data frame with 31979 genes expression on the 7 mixing samples: reads.1, reads.2, reads.3, reads.4, reads.5, reads.6, reads.7 data(liver_kidney) decon.bootstrap Estimate the confidence interval for the proportions predicted by deconvolution A function is used to estimate the the confidence interval for the proportions predicted by deconvolution through bootstrapping. decon.bootstrap(data.set, possible.signatures, n.sig, n.iter)

6 DeconRNASeq Arguments Value data.set the data object for mixing samples possible.signatures a data frame providing the expression values from pure tissue samples n.sig n.iter the number of genes/transcripts used for estimation of proportions from our deconvolution the number of bootstraps for our deconvolution A three dimentional array to store means and 95% confidence interval References Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156. DeconRNASeq Function for Deconvolution of Complex Samples from RNA-Seq. This function predicts proportions of constituting cell types from gene expression data generated from RNA-Seq data. Perform nonnegative quadratic programming to get per-sample based globally optimized solutions for constituting cell types. DeconRNASeq(datasets, signatures, proportions = NULL, checksig = FALSE, known.prop = FALSE, use.sc Arguments datasets signatures proportions measured mixture data matrix, genes (transcripts) e.g. gene counts by samples,. The user can choose the appropriate counts, RPKM, FPKM etc.. signature matrix from different tissue/cell types, genes (transcripts) by cell types. For gene counts, the user can choose the appropriate counts, RPKM, FPKM etc.. proportion matrix from different tissue/cell types. checksig whether the condition number of signature matrix should be checked, efault = FALSE known.prop use.scale fig whether the proportions of cell types have been known in advanced for proof of concept, default = FALSE whether the data should be centered or scaled, default = TRUE whether to generate the scatter plots of the estimated cell fractions vs. the true proportions of cell types, default = TRUE

fraction 7 Details Value Data in the originally measured mixuture sample matrix: datasets and reference matrix: signatures, need to be non-negative. We recommend to deconvolute without log-scale. Function DeconRNA-Seq returns a list of results out.all out.pca out.rmse estimated cell type fraction matrix for all the mixture samples svd calculated PCA on the mixture samples to estimate the number of pure sources according to the cumulative R2 averaged root mean square error (RMSE)) measuring the differences between fractions predicted by our model and the truth fraction matrix for all the tissue types References Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156. ## Please refer our demo ##source("deconrnaseq.r") ### multi_tissue: expression profiles for 10 mixing samples from multiple tissues #data(multi_tissue.rda) #datasets <- x.data[,2:11] #signatures <- x.signature.filtered.optimal[,2:6] #proportions <- fraction #DeconRNASeq(datasets, signatures, proportions, checksig=false, known.prop = TRUE, use.scale = TRUE) # fraction mixing fractions for multi-tissues mixing samples A data frame providing the fractions from multiple tissues in the mixing samples fraction A martix whose rows are mixing samples name and columns are fractions from pure tissues including brain, muscle, lung, liver and heart

8 liver_kidney data(multi_tissue) liver_kidney data objects for liver and kidney mixing samples a list containing: 1) datasets:a data drame providing the RPKM of seven mixing samples. 2) proportions: a data frame providing the fractions for liver and kidney in the mixing samples 3) signatures: a data frame providing the expression values from pure liver and kidney samples liver_kidney A list 1) a data frame with 31979 genes expression on the 7 mixing samples: reads.1, reads.2, reads.3, reads.4, reads.5, reads.6, reads.7 2) a martix whose rows are mixing samples name and columns are fractions from pure live and kidney tissues 3) a data matrix with 630 expressions from pure liver and kidney tissues data(liver_kidney)

multiplot 9 multiplot Draw the plots of proportions of cells determined from deconvolution vs. proportions of the cells actually mixed (when available) with RMSE. A function is used to draw the multiple plots of proportions of cells determined from deconvolution vs. proportions of the cells actually mixed. Each plot corresponds to one tissue/cell. multiplot(..., plotlist = NULL, cols) Arguments Value... any number of the plot objects that store the scatter plots for all the cells/tissue types plotlist any other plot objects cols columns of the plots, default = 1 A pdf file with the plots of proportions of cells determined from deconvolution vs. proportions of the cells actually mixed with RMSE References Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156. ## The function is currently defined as function (..., plotlist = NULL, cols) { pdf("scatterplots.pdf") require(grid) plots <- c(list(...), plotlist) numplots = length(plots) plotcols = cols plotrows = ceiling(numplots/plotcols) grid.newpage() pushviewport(viewport(layout = grid.layout(plotrows, plotcols))) vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y) for (i in 1:numPlots) { currow = ceiling(i/plotcols) curcol = (i - 1)%%plotCols + 1

10 multi_tissue print(plots[[i]], vp = vplayout(currow, curcol)) } dev.off() } multi_tissue data objects for multi-tissues mixing samples a list containing: 1) x.data:a data frame providing the RPKM of nine mixing samples. 2) x.signatures: a data frame providing the expression values from pure brain, muscle, lung, liver and heart samples. 3) x.signatures.filtered: a data frame providing the expression values from pure brain, muscle, lung, liver and heart samples after filtering. 4) x.signatures.filtered.optimal: a data frame providing the expression values from pure brain, muscle, lung, liver and heart samples used for the example in DeconRNA-Seq. 5)fraction: a data frame providing the fractions from 5 tissues in the mixing samples multi_tissue A list 1) a matrix with all the genes expression in the mixing samples: the first two columns are corresponding to the RefSeq accession numbers and gene symbols 2) a martix whose rows are gene symbols and columns are RPKM expressions from pure tissues. 3) a martix whose rows are gene symbols and columns are RPKM expressions from pure tissues: the genes with RPKM less than 200 within any of the five tissues have been filtered. 4) a martix whose rows are gene symbols and columns are RPKM expressions from pure tissues: based on the filtered signature matrix, the optimal number of genes have been selected for the deconvolution according to the condition numbers 5) a martix whose rows are mixing samples name and columns are fractions from pure tissues including brain, muscle, lung, liver and heart data(multi_tissue)

proportions 11 proportions proportions for liver and kidney mixing samples proportions: a data frame providing the fractions for liver and kidney in the mixing samples proportions a martix whose rows are mixing samples name and columns are fractions from pure live and kidney tissues data(liver_kidney) rmse Calculate the differences between proportions predicted by deconvolution and the values actually measured A function is used to calculate the root-mean-square error (RMSE) for the accurracy of estimated proportions. rmse(x, y) Arguments x y proportions from the actual measurement estimated proportions from our deconvolution Value A number for RMSE

12 x.data References Gong, T., et al. (2011) Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples, PLoS One, 6, e27156. signatures data objects for liver and kidney pure samples signatures: a data frame providing the expression values from pure liver and kidney samples signatures a data matrix with 630 expressions from pure liver and kidney tissues data(liver_kidney) x.data data objects for multi-tissues mixing samples A data frame providing the RPKM of nine mixing samples. x.data A matrix with all the genes expression in the mixing samples: the first two columns are corresponding to the RefSeq accession numbers and gene symbols data(multi_tissue)

x.signature 13 x.signature data objects for multi-tissues pure samples A data frame providing the expression values from pure brain, muscle, lung, liver and heart samples. x.signature A martix whose rows are gene symbols and columns are RPKM expressions from pure tissues. data(multi_tissue) x.signature.filtered filtered signatures for multi-tissues pure samples A data frame providing the expression values from pure brain, muscle, lung, liver and heart samples after filtering. x.signature.filtered A martix whose rows are gene symbols and columns are RPKM expressions from pure tissues: the genes with RPKM less than 200 within any of the five tissues have been filtered. data(multi_tissue)

14 x.signature.filtered.optimal x.signature.filtered.optimal selected signatures from multi-tissues pure samples A data frame providing the expression values from pure brain, muscle, lung, liver and heart samples used for the example in DeconRNA-Seq. x.signature.filtered.optimal A martix whose rows are gene symbols and columns are RPKM expressions from pure tissues: based on the filtered signature matrix, the optimal number of genes have been selected for the deconvolution according to the condition numbers data(multi_tissue)

Index Topic DeconRNASeq DeconRNASeq-package, 2 Topic DeconSeq DeconRNASeq, 6 Topic datasets all.datasets, 2 array.proportions, 3 array.signatures, 3 datasets, 5 fraction, 7 liver_kidney, 8 multi_tissue, 10 proportions, 11 signatures, 12 x.data, 12 x.signature, 13 x.signature.filtered, 13 x.signature.filtered.optimal, 14 Topic methods DeconRNASeq, 6 DeconRNASeq-package, 2 rmse, 11 signatures, 12 x.data, 12 x.signature, 13 x.signature.filtered, 13 x.signature.filtered.optimal, 14 all.datasets, 2 array.proportions, 3 array.signatures, 3 condplot, 4 datasets, 5 decon.bootstrap, 5 DeconRNASeq, 6 DeconRNASeq-package, 2 DeconRNASeq.package (DeconRNASeq-package), 2 DeconRNASeq_package (DeconRNASeq-package), 2 fraction, 7 liver_kidney, 8 multi_tissue, 10 multiplot, 9 proportions, 11 15