goprofiles: an R package for the Statistical Analysis of Functional Profiles

Similar documents
Gene expression correlates of clinical prostate cancer behavior

Predictive Biomarkers

Dinesh Singh, M.D. Assistant Professor of Surgery Director of Laparoscopy and Endourology

Intelligent Patient Profiling for Diagnosis, Staging and Treatment Selection in Colon Cancer

Package GANPAdata. February 19, 2015

AUTHOR PROOF COPY ONLY

User Guide. Association analysis. Input

Active Learning with Support Vector Machine Applied to Gene Expression Data for Cancer Classification

Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer

Package golubesets. August 16, 2014

REINVENTING THE BIOMARKER PANEL DISCOVERY EXPERIENCE

Package propoverlap. R topics documented: February 20, Type Package

Single SNP/Gene Analysis. Typical Results of GWAS Analysis (Single SNP Approach) Typical Results of GWAS Analysis (Single SNP Approach)

A Quick-Start Guide for rseqdiff

Identification of Genes Associated with Tumor Progression Using Microarrays

CURRICULUM VITA OF Xiaowen Chen

Journal of Engineering Technology

CANCER CLASSIFICATION USING SINGLE GENES

THE gene expression profiles that are obtained from

Advances in model-based gene-set analysis

The SAS contrasttest Macro

About diffcoexp. Wenbin Wei, Sandeep Amberkar, Winston Hide

QUICK REFERENCE INSTRUCTIONS For use with Sofia and Sofia 2. Rx only

The Blueprint of Life: DNA to Protein. What is genetics? DNA Structure 4/27/2011. Chapter 7

The Blueprint of Life: DNA to Protein

Case Studies of Signed Networks

A Biclustering Based Classification Framework for Cancer Diagnosis and Prognosis

Study Endpoint Considerations: Final PRO Guidance and Beyond

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

Classification of cancer profiles. ABDBM Ron Shamir

Daily inspiration. Ultrasound system HS70A SAMSUNG MEDISON CO., LTD. Scan code or visit to learn more

Analyzing Gene Expression Data: Fuzzy Decision Tree Algorithm applied to the Classification of Cancer Data

Analysis of Sample Set Enrichment Scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles

IPUMS â CPS Extraction and Analysis

QUICK REFERENCE INSTRUCTIONS

Chapter 1. New Theory of Discriminant Analysis Problem 1. Shuichi Shinmura. 1.1 Introduction Theme of Theory. 1.1.

Inflammatory Bowel Disease - Target Initial Survey

QUICK REFERENCE INSTRUCTIONS For use with Sofia only.

A micropower support vector machine based seizure detection architecture for embedded medical devices

The i2b2 transmart Foundation Community Meeting

Introduction to Annotation for Gene Expression Analyses

Ashwini K Maurya, PhD Student, MSU Supervised by Prof. Ashis Sengupta, Applied Statistical Unit, Indian Statistical Institute, Kolkata, India

PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick, MD

How To Use MiRSEA. Junwei Han. July 1, Overview 1. 2 Get the pathway-mirna correlation profile(pmset) and a weighting matrix 2

Chapter 11. Cell Communication. Signal Transduction Pathways

Supplementary Information. Deciphering the Venomic Transcriptome of Killer- Wasp Vespa velutina

Integrating Novel Non-Pharmacological Services in the Department of Veterans Affairs

Conference Program. Genomic heterogeneity in localized lung cancer Andrew Futreal, The University of Texas MD Anderson Cancer Center, Houston, TX

Advances in Brief. Abstract

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mrna-seq data

ncounter Data Analysis Guidelines for Copy Number Variation (CNV) Molecules That Count NanoString Technologies, Inc.

SHOEBOX Audiometry Pro. Quickstart Guide. SHOEBOX Audiometry Pro

A hierarchical two-phase framework for selecting genes in cancer datasets with a neuro-fuzzy system

A Bioinformatics Method for Identifying RNA Structures within Human Cells

Micro-RNA web tools. Introduction. UBio Training Courses. mirnas, target prediction, biology. Gonzalo

A General Workflow to Analyze Key Cancer-Related Genes Based on NCBI illustrated by the case of gastric cancer

Package diggitdata. April 11, 2019

STATION 1 SCIENTIFIC INVESTIGATION VOCABULARY 2015 FALL BENCHMARK BIOLOGY

Rival Plausible Explanations

Protein quantitation guidance (SXHL288)

Revealing the mechanisms of epileptogenesis to design innovative treatments what are the tools?

STAT 151B. Administrative Info. Statistics 151B: Introduction Modern Statistical Prediction and Machine Learning. Overview and introduction

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

Letter of Intent. Understanding the Chaos: What Counsellors Need to Know About Adult Attention Deficit. Channone Danychuk

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Biochip analysis of prostate cancer

Content Updates: July 2017

Impact of breastfeeding on saliva PCR results for newborn CMV screening

Unit 1 Matter & Energy for Life

Researchers often use observational data to estimate treatment effects

The Sorrow Of War. The Sorrow Of War

Cell Transport and Homeostasis

Inferring Biological Meaning from Cap Analysis Gene Expression Data

The opinions expressed are the speaker s and not his company s.

CMS-50E Instructions by Cooper Medical Supplies (These instructions are to supplement the manufacturer s user manual not to replace it!

Johns Hopkins Manual For Gastrointestinal Endoscopy Nursing By Jeanette Ogilvie RN BSN CGRN;Lynn Norwitz BS;Anthony Kalloo MD READ ONLINE

Opinion Microarrays and molecular markers for tumor classification Brian Z Ring and Douglas T Ross

04/12/2014. Research Methods in Psychology. Chapter 6: Independent Groups Designs. What is your ideas? Testing

Chromosomal Translocations and Genome Rearrangements in Cancer

Quantifying the Association Between Discrete Event Time Series

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Whole liver transcriptome analysis for the metabolic adaptation of dairy cows

Christine Vogel 1, Edward M. Marcotte 1 *

Chapter 1. Introduction

Comment on McLeod and Hume, Overlapping Mental Operations in Serial Performance with Preview: Typing

HEALTH MANAGEMENT SYSTEM NUTRITION MODULE TABLE OF CONTENTS

Page 2. When sirna binds to mrna, name the complementary base pairs holding the sirna and mrna together. One of the bases is named for you. ...with...

Matrix sentence test (Italian)

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

The Roles of Short Term Endpoints in. Clinical Trial Planning and Design

MethylMix An R package for identifying DNA methylation driven genes

Courtroom II - 2nd Floor

Implementation of Automatic Retina Exudates Segmentation Algorithm for Early Detection with Low Computational Time

Statement and Return Report for Certification

Protein Biomarker Biomarker Discover y y in Organ Organ Transplantation A A Proteomics Proteomics Approach Tara r Sig del, Sig PhD 9/26/2011

7.012 Problem Set 7. c) What % of females in this population should be red-green colorblind?

How much data is enough? Predicting how accuracy varies with training data size

Improving drug safety decisions with more comprehensive searches

A quick review. The clustering problem: Hierarchical clustering algorithm: Many possible distance metrics K-mean clustering algorithm:

Quality Management System Certification. Understanding Quality Management System (QMS) certification

Transcription:

goprofiles: an R package for the Statistical Analysis of Functional Profiles Alex Sánchez, Jordi Ocaña and Miquel Salicrú April 3, 218 1 Introduction This document presents an introduction to the use of goprofiles, an R package for the analysis of lists of genes based on their projection at a given level of the Gene Ontology following the methodology developed by [2, 3]. 2 Requirements To use the package one must have R (2.7 or greater) installed. Bioconductor 2.2 or greater is also needed. 3 Quick start For the impatient user the following lines provide a quick and simple example on the use of the package to explore and compare two experimental datasets obtained from two prostate cancer experiments ([4, 1]). Detailed explanations about the computations can be found in the following chapters and in the package help. The analysis proceeds as follows: First a dataset is loaded into memory. This dataset contains several lists of genes, from two different studies, selected as being differentially expressed in prostate cancer. help(prostateids) will provide extra information about the lists of genes. > require(goprofiles) > data(prostateids) Next a functional profile is build for each list. For simplicity it is build for the MF ontology only. > welsh.mf <- basicprofile (welsh1entrezids[1:1], onto="mf", level=2, orgpacka > singh.mf <- basicprofile (singh1entrezids[1:1], onto="mf", level=2, orgpacka > welsh.singh.mf <-mergeprofileslists(welsh.mf, singh.mf, profnames=c("welsh", "S > printprofiles(welsh.singh.mf, percentage=true) Functional Profile ================== [1] "MF ontology" Description GOID Welsh Singh 1 antioxidant activity GO:1629 1. 2. 7 binding GO:5488 96. 89.8 1 catalytic activity GO:3824 25.3 42.9 12 molecular function regula... GO:98772 13.1 15.3 2 signal transducer activit... GO:4871 8.1 5.1 1

5 structural molecule activ... GO:5198 16.2 15.3 15 transcription regulator a... GO:1411 8.1 11.2 6 transporter activity GO:5215 9.1 9.2 > plotprofiles (welsh.mf, atitle="welsh (21). Prostate cancer data") Welsh (21). Prostate cancer data. MF ontology transporter activity translation regulator act... transcription regulator a... toxin activity structural molecule activ... signal transducer activit... protein tag nutrient reservoir activi... molecular function regula... molecular carrier activit... hijacked molecular functi... catalytic activity cargo receptor activity... binding antioxidant activity 1 9 8 16 8 13 25 95 2 4 6 8 Figure 1: Basic profile for a dataset at the second level of the MF ontology Changing the parameter onto to ANY instead of MF will compute profiles for the three ontologies. > welsh <- basicprofile (welsh1entrezids[1:1], onto="any", level=2, orgpackage A visual comparison of profiles can be useful to give hints about the difference between them. Finally a numerical comparison of both profiles is performed and its summary is printed. Notice that the comparison rebuilds the profiles, that is the input for the computation are the two lists of genes not the profiles. > compared.welsh.singh.1.mf <- comparegenelists (welsh1entrezids[1:1], singh > print(compsummary(compared.welsh.singh.1.mf)) Sqr.Euc.Dist StdErr pvalue.95ci.low.39549.24192.2425 -.7866.95CI.up.86964 2

> plotprofiles (welsh.singh.mf, percentage=t,atitle="welsh vs Singh", legend=t) Welsh vs Singh. MF ontology transporter activity transcription regulator a... 9.18 9.9 11.22 8.8 Singh Welsh structural molecule activ... 15.31 16.16 signal transducer activit... 5.1 8.8 molecular function regula... 15.31 13.13 catalytic activity 25.25 42.86 binding 89.8 95.96 antioxidant activity 2.4 1.1 2 4 6 8 1 Figure 2: Comparison between two profiles at the second level of the MF ontology 3

If one finds that the lists are globally different it may be interesting to check out to which categories may be this difference attributed. Recent versions of goprofiles (from 1.11.2) allow to do a Fisher test on every category in the level of the profile. > list1 <- welsh1entrezids[1:1] > list2 <- singh1entrezids[1:1] > commprof <- basicprofile(intersect(list1, list2), onto="mf", level=2, orgpackag > fishergoprofiles(welsh.mf$mf, singh.mf$mf, commprof, method="holm") GO:1629 GO:5488 GO:3824 GO:98772 GO:4871 1..579118.76324 1. 1. GO:5198 GO:1411 GO:5215 1. 1. 1. attr(,"unadjusted") GO:1629 GO:5488 GO:3824 GO:98772 GO:4871.4977129.827311.9545.3958775.397495 GO:5198 GO:1411 GO:5215.84514.6322631 1. The reason to compare two lists from similar studies may be to combine them. In this case what is more meaningful to do is an equivalence test, instead of a difference test as above. > data(prostateids) > expandedwelsh <- expandedprofile(welsh1entrezids[1:1], onto="mf", + level=2, orgpackage="org.hs.eg.db") > expandedsingh <- expandedprofile(singh1entrezids[1:1], onto="mf", + level=2, orgpackage="org.hs.eg.db") > commongenes <- intersect(welsh1entrezids[1:1], singh1entrezids[1:1]) > commonexpanded <- expandedprofile(commongenes, onto="mf", level=2, orgpackage=" > equivmf <-equivalentgoprofiles (expandedwelsh[['mf']], + qm = expandedsingh[['mf']], + pqn= commonexpanded[['mf']]) > print(equivsummary(equivmf, decs=5)) > 4 User guide Sqr.Euc.Dist StdErr.3955.2419 pvalue CI.up.7948.7934 d Equivalent? (1=yes,=not).2. This introduction has simply shown the possibilities of the program. A more complete user guide can be found at the program s wiki in github: https://github.com/alexsanchezpla/goprofiles/wiki References [1] Singh, Dinesh., William R Sellers, Philip K. Febbo, Kenneth Ross, Donald G Jackson, Judith Manola, Christine Ladd, Pablo Tamayo, Andrew A Renshaw, Anthony V D Amico, Jerome P Richie, Eric S Lander, Massimo Loda, Philip W Kantoff, and Todd R Golub. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):23 29, 22. 4

[2] A. Sánchez-Pla, M. Salicrú, and J. Ocaña. Statistical methods for the analysis of high-throughput data based on functional profiles derived from the gene ontology. Journal of Statistical Planning and Inference, 137(12):3975 3989, 27. [3] M. Salicrú, J. Ocaña and A. Sánchez-Pla. Comparison of Gene Lists based on Functional Profiles. BMC Bioinformatics, DOI: 1.1186/1471-215-12-41, 211. [4] J.B. Welsh, Lisa M. Sapinoso, Andrew I. Su, Suzanne G. Kern, Jessica Wang-Rodriguez, Christopher A. Moskaluk-Jr., Henry F. Frierson, and Hampton Garret M. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res, 61:5974 5978, 21. 5