Introduction to antiprofiles


Héctor Corrada Bravo
hcorrada@gmail.com
Modified: March 13, 2013. Compiled: April 30, 2018

Introduction

This package implements the gene expression anti-profiles method in [1]. Anti-profiles are a new approach for developing cancer genomic signatures that specifically takes advantage of gene expression heterogeneity. They explicitly model increased gene expression variability in cancer to define robust and reproducible gene expression signatures capable of accurately distinguishing tumor samples from healthy controls.

In this vignette we will use the companion antiprofilesData package to illustrate some of the analyses in that paper.

> # these are libraries used by this vignette
> require(antiprofiles)
> require(antiprofilesData)
> require(RColorBrewer)

Colon cancer expression data

The antiprofilesData package contains expression data from normal colon tissue samples and colon cancer samples from two datasets in the Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE8671 and http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4183. The data are restricted to probesets annotated to genes within blocks of hypo-methylation in colon cancer, as defined in [2]. Let's load the data and take a look at its contents.

> data(apColonData)
> show(apColonData)
ExpressionSet (storageMode: lockedEnvironment)
assayData: 5339 features, 68 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM95473 GSM95474 ... GSM215114 (68 total)
  varLabels: filename DB_ID ... Status (7 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133plus2

> # look at sample types by experiment and status
> table(apColonData$Status, apColonData$SubType, apColonData$ExperimentID)
, ,  = GSE4183

    adenoma colorectal_cancer normal tumor
  0       0                 0      8     0
  1      15                15      0     0

, ,  = GSE8671

    adenoma colorectal_cancer normal tumor
  0       0                 0      7     0
  1       0                 0      0    23

The data is stored as an ExpressionSet. This dataset contains colon adenomas, benign but hyperplastic growths, along with the normal and tumor tissues. Let's remove the adenomas from the remaining analysis.

> drop=apColonData$SubType=="adenoma"
> apColonData=apColonData[,!drop]

Building antiprofiles

The general anti-profile idea is to find genes with hyper-variable expression in cancer with respect to normal samples, and to classify new samples as normal vs. cancer based on deviation from a normal expression profile built from normal training samples. Anti-profiles are built using the following general algorithm:

1. Find candidate differentially variable genes (anti-profile genes): rank by ratio of cancer to normal variance
2. Define region of normal expression for each anti-profile gene: normal median ± 5 * normal MAD
3. For each sample to classify:
   (a) count number of anti-profile genes outside normal expression region (anti-profile score)
   (b) if score is above threshold, then classify as cancer

We will use data from one of the experiments to train the anti-profile (steps 1 and 2 above) and test it on the data from the other experiment (step 3).

Computing variance ratios

The first step in building an anti-profile is to calculate the ratio of cancer variance to normal variance. This is done with the apStats function.

> trainSamples=pData(apColonData)$ExperimentID=="GSE4183"
> colonStats=apStats(exprs(apColonData)[,trainSamples],
+ pData(apColonData)$Status[trainSamples],minL=5)
> head(getProbeStats(colonStats))
          affyid        SD0      SD1     stat      meds0      mads0
4895 214974_x_at 0.23369554 7.777167 5.056543 -1.5182409 0.20537749
10   210118_s_at 0.12684950 3.991524 4.975750 -0.7449072 0.08599552
1318 205719_s_at 0.09529811 2.817550 4.885850 -0.8532822 0.08670758
2707   205863_at 0.12430282 3.431335 4.786839 -0.5508941 0.15574925
4896 215101_s_at 0.24540888 6.578072 4.744406 -0.9257671 0.15043660
2272   227140_at 0.24976921 5.718607 4.516996 -1.8026488 0.13726594
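To make the three steps of the general algorithm concrete, here is a minimal base-R sketch on simulated data. It does not use the antiprofiles package, and every name in it (normal, cancer, sd_ratio, sig, score, and so on) is hypothetical. Two assumptions are worth flagging: the statistic is taken to be log2(SD1/SD0), which agrees with the values printed above (for example log2(7.777167/0.23369554) is about 5.0565) but is not confirmed here as the package's internal definition, and the MAD is computed with R's default mad(), whose scaling may differ from what the package uses.

```r
# Minimal base-R sketch of the anti-profile steps on simulated data.
# All object names are hypothetical; none of this is antiprofiles package code.
set.seed(1)

n_genes  <- 200
n_normal <- 20
n_cancer <- 20

# Simulated log-expression: rows are genes, columns are samples.
# Assumption: the first 20 genes are hyper-variable in cancer (SD 3 vs 0.2).
cancer_sd <- c(rep(3, 20), rep(0.2, n_genes - 20))
normal <- matrix(rnorm(n_genes * n_normal, sd = 0.2), nrow = n_genes)
cancer <- matrix(rnorm(n_genes * n_cancer, sd = cancer_sd), nrow = n_genes)

# Train on the first 10 samples of each group, test on the remaining 10.
train_cols <- 1:10
test_cols  <- 11:20

# Step 1: rank genes by the log2 ratio of cancer SD to normal SD
# and keep the 20 most hyper-variable genes as the signature.
sd0 <- apply(normal[, train_cols], 1, sd)
sd1 <- apply(cancer[, train_cols], 1, sd)
sd_ratio <- log2(sd1 / sd0)
sig <- order(sd_ratio, decreasing = TRUE)[1:20]

# Step 2: region of normal expression for each signature gene,
# normal median +/- 5 * normal MAD (R's default scaled MAD; an assumption).
med0  <- apply(normal[sig, train_cols], 1, median)
mad0  <- apply(normal[sig, train_cols], 1, mad)
lower <- med0 - 5 * mad0
upper <- med0 + 5 * mad0

# Step 3: the score of a sample is the number of signature genes
# whose expression falls outside the normal region.
score <- function(x) {
  xs <- x[sig, , drop = FALSE]
  colSums(xs < lower | xs > upper)
}
normal_scores <- score(normal[, test_cols])
cancer_scores <- score(cancer[, test_cols])

# Compare the two score distributions.
print(summary(normal_scores))
print(summary(cancer_scores))
```

In this simulation the held-out cancer samples should score well above the held-out normal samples, because most signature genes fall outside the narrow normal region in cancer. Step 3(b) would then amount to choosing a threshold between the two score distributions.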

We can see how that ratio is distributed for these probesets:

> hist(getProbeStats(colonStats)$stat, nc=100,
+ main="Histogram of log variance ratio", xlab="log2 variance ratio")

[Figure: Histogram of log variance ratio. x-axis: log2 variance ratio; y-axis: Frequency]

Building the anti-profile

Now we construct the anti-profile by selecting the 100 most hyper-variable probesets.

> ap=buildAntiProfile(colonStats, tissueSpec=FALSE, sigsize=100)
> show(ap)

AntiProfile object with 100 probes
Normal medians
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.8026 -0.8542 -0.5102  0.6835 -0.1187 23.6885
Using cutoff 5

Computing the anti-profile score

Given the estimated anti-profile, we can get anti-profile scores for a set of testing samples.

> counts=apCount(ap, exprs(apColonData)[,!trainSamples])
> palette(brewer.pal(8,"Dark2"))
> # plot in score order
> o=order(counts)
> dotchart(counts[o],col=pData(apColonData)$Status[!trainSamples][o]+1,
+ labels="",pch=19,xlab="anti-profile score",
+ ylab="samples",cex=1.3)
> legend("bottomright", legend=c("Cancer","Normal"),pch=19,col=2:1)

[Figure: dot chart of the test samples in score order. x-axis: anti-profile score; legend: Cancer, Normal]

The anti-profile score measures deviation from the normal expression profile obtained from the training samples. We see in this case that the anti-profile score can distinguish the test samples perfectly based on deviation from the normal profile.

References

[1] Héctor Corrada Bravo, Vasyl Pihur, Matthew McCall, Rafael A Irizarry, and Jeffrey T Leek. Gene expression anti-profiles as a basis for accurate universal cancer signatures. BMC Bioinformatics, 13(1):272, October 2012.

[2] Kasper Daniel Hansen, Winston Timp, Héctor Corrada Bravo, Sarven Sabunciyan, Benjamin Langmead, Oliver G McDonald, Bo Wen, Hao Wu, Yun Liu, Dinh Diep, Eirikur Briem, Kun Zhang, Rafael A Irizarry, and Andrew P Feinberg. Increased methylation variation in epigenetic domains across cancer types. Nature Genetics, 43(8):768-775, August 2011.

SessionInfo

• R version 3.5.0 (2018-04-23), x86_64-pc-linux-gnu
• Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=C, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8, LC_IDENTIFICATION=C
• Running under: Ubuntu 16.04.4 LTS
• Matrix products: default
• BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
• LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
• Base packages: base, datasets, grDevices, graphics, methods, parallel, stats, utils
• Other packages: Biobase 2.40.0, BiocGenerics 0.26.0, RColorBrewer 1.1-2, antiprofiles 1.20.0, antiprofilesData 1.15.0, locfit 1.5-9.1, matrixStats 0.53.1
• Loaded via a namespace (and not attached): compiler 3.5.0, grid 3.5.0, lattice 0.20-35, tools 3.5.0