R documentation. of GSCA/man/GSCA-package.Rd etc. June 8, GSCA-package. LungCancer metadi... 3 plotmnw... 5 plotnw... 6 singledc...

Similar documents
Package ordibreadth. R topics documented: December 4, Type Package Title Ordinated Diet Breadth Version 1.0 Date Author James Fordyce

Package AbsFilterGSEA

Package cssam. February 19, 2015

User Guide. Association analysis. Input

Package ORIClust. February 19, 2015

The Tail Rank Test. Kevin R. Coombes. July 20, Performing the Tail Rank Test Which genes are significant?... 3

Package AIMS. June 29, 2018

Package Actigraphy. R topics documented: January 15, Type Package Title Actigraphy Data Analysis Version 1.3.

Package inote. June 8, 2017

Package p3state.msm. R topics documented: February 20, 2015

How To Use MiRSEA. Junwei Han. July 1, Overview 1. 2 Get the pathway-mirna correlation profile(pmset) and a weighting matrix 2

The LiquidAssociation Package

Package singlecase. April 19, 2009

Supplementary Information

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

Package nopaco. R topics documented: April 13, Type Package Title Non-Parametric Concordance Coefficient Version 1.0.

Package ClusteredMutations

Doing Thousands of Hypothesis Tests at the Same Time. Bradley Efron Stanford University

Package MSstatsTMT. February 26, Title Protein Significance Analysis in shotgun mass spectrometry-based

Figure S2. Distribution of acgh probes on all ten chromosomes of the RIL M0022

Package prognosticroc

Reliability of Ordination Analyses

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

Package fabiadata. July 19, 2018

A Cross-Study Comparison of Gene Expression Studies for the Molecular Classification of Lung Cancer

Clustering mass spectrometry data using order statistics

Metabolomic Data Analysis with MetaboAnalyst

Expanded View Figures

PSYCHOLOGY 300B (A01)

Package CLL. April 19, 2018

Migraine Dataset. Exercise 1

Package rvtdt. February 20, rvtdt... 1 rvtdt.example... 3 rvtdts... 3 TrendWeights Index 6. population control weighted rare-varaints TDT

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Package citccmst. February 19, 2015

Package QualInt. February 19, 2015

Package msgbsr. January 17, 2019

Package cancer. July 10, 2018

Using dualks. Eric J. Kort and Yarong Yang. October 30, 2018

SubLasso:a feature selection and classification R package with a. fixed feature subset

Chapter 3 Software Packages to Install How to Set Up Python Eclipse How to Set Up Eclipse... 42

A Quick-Start Guide for rseqdiff

Psy201 Module 3 Study and Assignment Guide. Using Excel to Calculate Descriptive and Inferential Statistics

Bioimaging and Functional Genomics

Hands-On Ten The BRCA1 Gene and Protein

The FunCluster Package

Introduction. Lecture 1. What is Statistics?

Unit 1 Exploring and Understanding Data

T-Cell Network Example for GeneNet (August 2015) or later

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Cancer Informatics Lecture

EXPression ANalyzer and DisplayER

Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Package propoverlap. R topics documented: February 20, Type Package

Rapid Characterization of Allosteric Networks. with Ensemble Normal Mode Analysis

Package CompareTests

CSE 255 Assignment 9

Package xseq. R topics documented: September 11, 2015

Package frailtyhl. February 19, 2015

Instructions for doing two-sample t-test in Excel

Nature Methods: doi: /nmeth.3115

Package BUScorrect. September 16, 2018

Vega: Variational Segmentation for Copy Number Detection

Package ega. March 21, 2017

The rmeta Package February 17, 2001

cape: A package for the combined analysis of epistasis and pleiotropy

Rachael E. Jack, Caroline Blais, Christoph Scheepers, Philippe G. Schyns, and Roberto Caldara

Package DeconRNASeq. November 19, 2017

Package speff2trial. February 20, 2015

Package HAP.ROR. R topics documented: February 19, 2015

Cancer outlier differential gene expression detection

RchyOptimyx: Gating Hierarchy Optimization for Flow Cytometry

Package EpiEstim. R topics documented: February 19, Version Date 2013/03/08

Package TFEA.ChIP. R topics documented: January 28, 2018

Package noia. February 20, 2015

Supplementary Figures

Package preference. May 8, 2018

Cayley Graphs. Ryan Jensen. March 26, University of Tennessee. Cayley Graphs. Ryan Jensen. Groups. Free Groups. Graphs.

Aesthetic Response to Color Combinations: Preference, Harmony, and Similarity. Supplementary Material. Karen B. Schloss and Stephen E.

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD

EPS 625 INTERMEDIATE STATISTICS TWO-WAY ANOVA IN-CLASS EXAMPLE (FLEXIBILITY)

Ulrich Mansmann IBE, Medical School, University of Munich

Package CorMut. January 21, 2019

Package seqdesign. April 27, Version 1.1 Date

GlobalAncova with Special Sum of Squares Decompositions

Knowledge discovery tools 381

Exercises: Differential Methylation

Reveal Relationships in Categorical Data

Stat Wk 8: Continuous Data

Bootstrapped Integrative Hypothesis Test, COPD-Lung Cancer Differentiation, and Joint mirnas Biomarkers

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

Package bionetdata. R topics documented: February 19, Type Package Title Biological and chemical data networks Version 1.0.

GridMAT-MD: A Grid-based Membrane Analysis Tool for use with Molecular Dynamics

SUPPLEMENTARY FIGURES

Hypertension encoded in GLIF

CSDplotter user guide Klas H. Pettersen

Package mdsdt. March 12, 2016

Transcription:

R topics documented: R documentation of GSCA/man/GSCA-package.Rd etc. June 8, 2009 GSCA-package....................................... 1 LungCancer3........................................ 2 metadi........................................... 3 plotmnw.......................................... 5 plotnw........................................... 6 singledc.......................................... 8 Index 10 GSCA-package Gene Set Co-expression Analysis Details Package: GSCA Type: Package Version: 1.0 Date: 2009-04-26 License: LGPL >= 2.0 LazyLoad: yes Author(s) YounJeong Choi Maintainer: YounJeong Choi <ychoi@biostat.wisc.edu> 1

2 LungCancer3 Choi and Kendziorski (2009) LungCancer3 Three lung cancer microarray data sets and matched annotation Description The three human lung cancer microarray data sets, from Stanford (Garber et al., 2001), Harvard (Bhattacharjee et al., 2001), and Michigan (Beer et al., 2002). The 3,924 common Entrez Gene IDs are represented, matched across the three studies. The 3,649 common gene sets (GO categories and KEGG pathways) are represented, defined in Entrez Gene ID. Usage Format A list of two sub-lists named data and info, respectively. Details LungCancer3$data is a list of 3 R matrices named Harvard, Stanford, and Michigan, respectively. Each matrix contains 3,924 rows as the genes have been matched across three studies. Rows are named by Entrez Gene IDs. The number of columns corresponds to the number of arrays used in each study. Columns are named Tumor1, Tumor2,..., and Normal1, Normal2,..., similarly for three matrices. LungCancer3$data$Harvard (Harvard study) has 156 columns of 139 tumor vs. 17 normal samples; LungCancer3$data$Stanford (Stanford study) has 46 columns of 41 tumor vs. 5 normal samples; and LungCancer3$data$Michigan (Michigan study) has 96 columns of 86 tumor vs. 10 normal samples. For tumor samples, only adenocarcinoma samples have been included for consistency. LungCancer3$info is again a list of two objects, named GSdef and Name, respectively. LungCancer3$info$GSdef is a list of 3,649 gene set definitions for the 3,924 genes by GO categories and KEGG pathways. This list is named by the gene set IDs such as "GO:0007169" for GO categories and "00920" for KEGG pathways. Each entry of this list is a character vector of Entrez Gene IDs, which the gene set consists of. For example, LungCancer3$info$GSdef[[147]] returns c("348", "25", "27"). LungCancer3$info$Name is a character vector of length 3,649 corresponding to the gene sets defined in LungCancer3$info$GSdef, named by the gene set IDs. GSdef and Name have the gene sets in the same order. For example, LungCancer3$info$GSdef[[123]] and LungCancer3$info$Name[123] are both "GO:0032774".

metadi 3 Source Harvard data (Bhattacharjee et al.) http://www.broad.mit.edu/mpr/lung/ Stanford data (Garber et al.) http://smd.stanford.edu/cgi-bin/data/viewdetails.pl?exptid=12827&viewset=1 Michigan data (Beer et al.) http://dot.ped.med.umich.edu:2000/ourimage/pub/lung/index.html Beer, D. G. et al. (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine, 8, 816-824. Bhattacharjee, A. et al. (2001) Classification of human lung carcinomas by mrna expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci., 98, 13790-13795. Garber, M. E. et al. (2001) Diversity of gene expression in adenocarcinoma of the lung. Proc. Natl Acad. Sci., 98, 13784-13789. Examples str(lungcancer3$data) str(lungcancer3$info$gsdef[1:3]) LungCancer3$info$Name[1:3] metadi The function to run a meta-gsca. Description Usage This function can be used to run a meta-gsca, described in Choi and Kendziorski (2009). Unlike singledc, the study-specific values (e.g., gene-gene pairwise correlations, difference in condition-specific correlation signs) need to be pre-calculated and provided as input. For each gene set defined in GSdefList, the dispersion index is calculated between two given studies and returned. Gene pairs are randomly permuted across gene sets for nperm times. Permutation-based p-values are calculated, based on the rank of observed DI among permuted index values. metadi(corr1, corr2, GSdefList, nperm, permdi = FALSE) Arguments corr1, corr2 Two symmetric matrices from two studies of interest. For meta-gsca, a difference matrix is used, of two condition-specific correlation sign matrices within each study. The lower triangle is used. rownames(data) is used to subset a sub-matrix from data for each gene set. (Rows must be named by gene IDs used in GSdefList. For example, if GSdefList defines gene sets in Entrez Gene IDs, rownames(data) should be Entrez Gene IDs.

4 metadi GSdefList nperm permdi A list of character vectors that define gene sets. Each entry of this list is a gene set. The desired number of permutations. TRUE/FALSE. If set TRUE, dispersion index values from permutation are saved and returned; if FALSE, permutation-based dispersion index values are not returned. Default is FALSE. Details Value Gene pairs are permuted across gene sets, as described in Choi and Kendziorski (2009). For each permutation, annotated gene pairs (gene pairs which belong to at least one gene set) are randomly re-assigned to gene sets, and dispersion indices (DIs) are calculated based on those random gene sets. As focus is on preservation, the p-value for each gene set is calculated as: p = sum(permutation DIs <= observed DI) / nperm. DI pvalue permv The dispersion index vector for each gene set. The permutation-based p-value for each gene set. The permutation-based DI matrix, of nperm columns. The first column is identical to what is returned by DI. Note Currently, metadi implements meta-analysis of gene-gene pairwise correlations from two studies. In addition to meta-gsca described in Choi and Kendziorski (2009), which uses the sign difference, raw correlations can be input to investigate preservation of them. Author(s) YounJeong Choi Choi and Kendziorski, submitted Examples GS <- LungCancer3$info$GSdef GSdesc <- LungCancer3$info$Name data.grouped <- list( Tumor = list(harvard = LungCancer3$data$Harvard[, 1:139], Michigan = LungCancer3$data$Michigan[, 1:86]), Normal = list(harvard = LungCancer3$data$Harvard[, 140:156], Michigan = LungCancer3$data$Michigan[, 87:96])) corr.t <- lapply(lapply(data.grouped$tumor, t), cor, use = "pairwise.complete.obs")

plotmnw 5 corr.n <- lapply(lapply(data.grouped$normal, t), cor, use = "pairwise.complete.obs") cor.diff <- list(harvard = corr.t$harvard - corr.n$harvard, Michigan = corr.t$michigan - corr.n$michigan) cor.diff.sign <- list( Harvard = apply((cor.diff$harvard > 0), 2, as.numeric) - apply((cor.diff$harvard < 0), 2, as.numeric), Michigan = apply((cor.diff$michigan > 0), 2, as.numeric) - apply((cor.diff$michigan < 0), 2, as.numeric)) for (i in 1:length(cor.diff.sign)) { rownames(cor.diff.sign[[i]]) <- colnames(cor.diff.sign[[i]]) } dist.hm <- metadi(cor.diff.sign$harvard, cor.diff.sign$michigan, GS, 3, permdi = TRUE) plotmnw The function to draw a multi-edge network from a list of correlation matrices from multiple studies. Description Usage This function draws a network with multiple lines per edge, representing multiple study-specific correlations for the corresponding gene pairs. Nodes can be colored by DE information, and each line in edges can be colored based on the correlation magnitude and direction. plotmnw(cormatrix.list, genes = NULL, mycolors = bluered(201), ncolor = "white", no Arguments cormatrix.list A list of symmetric correlation matrices from multiple studies. genes mycolors ncolor node.de r nr jt lwd xlab1 text.cex Node names to be displayed in the network plot. If NULL (default), rownames(cormatrix.list[[1 is used. Edge color palette. The default is bluered(201) implemented by gplots, which sets c("blue", "white", "red") for the correlation of c(-1, 0, 1). Node color for selected genes specified by node.de. A vector of 0 s and 1 s. The "1" nodes are colored in ncolor. The radius of the big circle where nodes are placed on The radius of each node Distance between adjacent lines in an edge The edge line thickness Sub-text to be place below the network Node font size... The rest of arguments are passed to plot.

6 plotnw Details Note Edge lines are sorted first and put in order separately for every edge, for effective display of study agreement. Because edges tend to be thicker as the number of studies grow larger, including too many genes (nodes) may result in a noninformative network plot. Author(s) YounJeong Choi Choi and Kendziorski (2009) Examples GS <- LungCancer3$info$GSdef GSdesc <- LungCancer3$info$Name setid <- "GO:0008033" gid <- GS[[setid]] ss <- c("kars", "SARS", "AARS", "SSB", "POP1", "RPP30") data.list <- list(harvard = LungCancer3$data$Harvard[gid, 140:156], Stanford = LungCancer3$data$Stanford[gid, 42:46], Michigan = LungCancer3$data$Michigan[gid, 87:96]) cormatrix.list <- lapply(lapply(data.list, t), cor, use = "pairwise.complete.obs") plotmnw(cormatrix.list, genes = ss, mycolors = bluered(201), ncolor = "yellow", node.de = c(rep(1, 3), rep(0, 3)), lwd = 5, jt = 0.3) plotnw The function to draw a network from a correlation matrix Description Usage This function is a custom wrapper of plot.graph implemented in Rgraphviz. It draws a network from a given correlation matrix. Nodes can be colored by DE information, and edges can be colored based on the correlation magnitude and direction. plotnw(genes = NULL, cormatrix, node.de = NULL, ncolor = "red", ecolor = NULL, ewid

plotnw 7 Arguments genes cormatrix node.de ncolor ecolor Node names to be displayed in the network plot. If NULL (default), rownames(cormatrix) is used. A symmetric correlation matrix to draw a network from. A vector of 0 s and 1 s. The "1" nodes are colored in ncolor. Node color for selected genes specified by node.de. Edge color palette. The default is bluered(201) implemented by gplots, which sets c("blue", "white", "red") for the correlation of c(-1, 0, 1). ewidth Edge width. Default is set to 1. gtype Graph type (layout) to be passed to plot.graph. One of dot, neato, twopi, circo, and fdp. The default is neato.... The rest of arguments are passed to plot.graph. Author(s) YounJeong Choi Choi and Kendziorski (2009) See Also plot.graph, Rgraphviz Examples GS <- LungCancer3$info$GSdef GSdesc <- LungCancer3$info$Name setid <- "GO:0019216" gid <- GS[[setid]] ss <- c("serpina3", "SOD1", "SCAP", "NPC2", "ADIPOQ", "PRKAA1", "AGT", "PPARA", "BMP6", "BRCA1") plotnw(genes = ss, cormatrix = cor(t(lungcancer3$data$michigan[gid, 87:96]), use = "pairwise

8 singledc singledc The function to run a single-study GSCA, differential co-expression (DC) analysis Description This function runs a single-study GSCA, differential co-expression (DC) analysis, described in Choi and Kendziorski (2009). The condition-specific gene-gene pairwise correlations are first calculated; then for each gene set defined in GSdefList, the dispersion index is calculated across conditionspecific correlations. Samples are randomly permuted across conditions for nperm times. Permutation-based p-values are calculated, based on the rank of observed DI among permuted index values. Usage singledc(data, group, GSdefList, nperm, permdi = FALSE) Arguments data group GSdefList nperm permdi A data matrix of rows representing genes and columns representing arrays. rownames(data) is used to subset a sub-matrix from data for each gene set. (Rows must be named by gene IDs used in GSdefList. For example, if GSdefList defines gene sets in Entrez Gene IDs, rownames(data) should be Entrez Gene IDs. A numeric vector that specifies the number of arrays (columns) in each condition. For example, if c(10, 5) is provided, first 10 columns of the data matrix are used for one condition and the next 5 are used for the other condition. A list of character vectors that define gene sets. Each entry of this list is a gene set. The desired number of permutations. TRUE/FALSE. If set TRUE, dispersion index values from permutation are saved and returned; if FALSE, permutation-based dispersion index values are not returned. Default is FALSE. Details Samples (columns) are permuted across conditions. For each permutation, condition-specific correlations are re-calculated based on permuted samples, and dispersion indices (DIs) are calculated based on those permutation-based correlations. As focus is on difference, the p-value for each gene set is calculated as: p = sum(permutation DIs >= observed DI) / nperm.

singledc 9 Value DI pvalue permv The dispersion index vector for each gene set. The permutation-based p-value for each gene set. The permutation-based DI matrix, of nperm columns. The first column is identical to what is returned by DI. Note Currently, singledc implements DC analysis for two conditions (e.g., tumor vs. normal) and three conditions (e.g., AA, AB, and BB genotypes). For three conditions, pairwise DIs are first calculated and averaged (internally). Author(s) YounJeong Choi Choi and Kendziorski, submitted. Examples GS <- LungCancer3$info$GSdef GSdesc <- LungCancer3$info$Name dc.m <- singledc(data = LungCancer3$data$Michigan, group = c(86, 10), GSdefList = GS, nperm = 3, permdi = TRUE)

Index Topic DC singledc, 8 Topic GSCA GSCA-package, 1 metadi, 3 plotmnw, 5 plotnw, 6 singledc, 8 Topic lung cancer, microarray meta-analysis LungCancer3, 2 Topic meta-gsca metadi, 3 Topic multi-edge plotmnw, 5 Topic network plotmnw, 5 plotnw, 6 GSCA (GSCA-package), 1 GSCA-package, 1 LungCancer3, 2 metadi, 3 plot.graph, 7 plotmnw, 5 plotnw, 6 Rgraphviz, 7 singledc, 8 10