RchyOptimyx: Gating Hierarchy Optimization for Flow Cytometry

Similar documents
Package flowtype. R topics documented: July 18, Type Package. Title Phenotyping Flow Cytometry Assays. Version

The CAPRI-T was one of two (along with the CAPRI-NK, see reference. [29]) basic immunological studies nested within the CAMELIA trial.

Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE

Nature Methods: doi: /nmeth.3115

Network Analysis of Toxic Chemicals and Symptoms: Implications for Designing First-Responder Systems

supplemental Figure 1

Comparing Multifunctionality and Association Information when Classifying Oncogenes and Tumor Suppressor Genes

Package AbsFilterGSEA

Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 2010

Supplementary Figure 1

Reliability of Ordination Analyses

A Quick-Start Guide for rseqdiff

Expanded View Figures

What to Measure, How to Measure It

<10. IL-1β IL-6 TNF + _ TGF-β + IL-23

Biostatistics II

Fast and Easy Isolation of T Cells

Treatment with IL-7 Prevents the Decline of Circulating CD4 + T Cells during the Acute Phase of SIV Infection in Rhesus Macaques

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties

Nature Medicine: doi: /nm.2109

R documentation. of GSCA/man/GSCA-package.Rd etc. June 8, GSCA-package. LungCancer metadi... 3 plotmnw... 5 plotnw... 6 singledc...

User Guide. Association analysis. Input

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

Metabolomic Data Analysis with MetaboAnalyst

Flow Cytometry Critical Assessment of Population Identification Methods

Nature Immunology: doi: /ni Supplementary Figure 1. RNA-Seq analysis of CD8 + TILs and N-TILs.

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

Package HAP.ROR. R topics documented: February 19, 2015

IgG3 regulates tissue-like memory B cells in HIV-infected individuals

Visualization of Human Disorders and Genes Network

SUPPLEMENTARY MATERIAL

Assigning B cell Maturity in Pediatric Leukemia Gabi Fragiadakis 1, Jamie Irvine 2 1 Microbiology and Immunology, 2 Computer Science

CyTOF analyses in rheumatoid arthritis. Deepak Rao, MD PhD Rheumatology, Immunology, Allergy, BWH

Early immunologic and virologic predictors of clinical HIV-1 disease progression

SUPPLEMENTARY INFORMATION

Supplementary Figure 1

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

The Simulacrum. What is it, how is it created, how does it work? Michael Eden on behalf of Sally Vernon & Cong Chen NAACCR 21 st June 2017

Ulrich Mansmann IBE, Medical School, University of Munich

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks

Supplementary Figure 1. Using DNA barcode-labeled MHC multimers to generate TCR fingerprints

Multilevel data analysis

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities

SubLasso:a feature selection and classification R package with a. fixed feature subset

Package ORIClust. February 19, 2015

Naive, memory and regulatory T lymphocytes populations analysis

Low Avidity CMV + T Cells accumulate in Old Humans

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Network-based pattern recognition models for neuroimaging

SUPPLEMENTARY INFORMATION

Package Actigraphy. R topics documented: January 15, Type Package Title Actigraphy Data Analysis Version 1.3.

Application of Resampling Methods in Microarray Data Analysis

Predicting Breast Cancer Survivability Rates

ECDC HIV Modelling Tool User Manual

IAT 355 Visual Analytics. Encoding Information: Design. Lyn Bartram

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India

Supplementary Figure 1. Immune profiles of untreated and PD-1 blockade resistant EGFR and Kras mouse lung tumors (a) Total lung weight of untreated

Gene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Package CorMut. January 21, 2019

Numerical Integration of Bivariate Gaussian Distribution

Supporting Information

Animal tissue-specific biomolecules influence on rats with cyclophosphamide-induced immunosuppression

Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

Package cssam. February 19, 2015

Using CART to Mine SELDI ProteinChip Data for Biomarkers and Disease Stratification

Modeling Sentiment with Ridge Regression

Progress in Risk Science and Causality

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

EXPression ANalyzer and DisplayER

Package xseq. R topics documented: September 11, 2015

Package ordibreadth. R topics documented: December 4, Type Package Title Ordinated Diet Breadth Version 1.0 Date Author James Fordyce

PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science. Homework 5

A quick review. The clustering problem: Hierarchical clustering algorithm: Many possible distance metrics K-mean clustering algorithm:

Understanding Poor Vaccine Responses: Transcriptomics of Vaccine Failure

Gene Ontology 2 Function/Pathway Enrichment. Biol4559 Thurs, April 12, 2018 Bill Pearson Pinn 6-057

CNV PCA Search Tutorial

Alessandra Franco MD PhD UCSD School of Medicine Department of Pediatrics Division of Allergy Immunology and Rheumatology

Naïve Bayes classification in R

Package CorMut. August 2, 2013

Mature microrna identification via the use of a Naive Bayes classifier

ECDC HIV Modelling Tool User Manual version 1.0.0

Supplementary Figure 1. ALVAC-protein vaccines and macaque immunization. (A) Maximum likelihood

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression.

CHAPTER 3 LABORATORY PROCEDURES

Supplementary Information

Identifying or Verifying the Number of Factors to Extract using Very Simple Structure.

cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz

Journal: Nature Methods

Supplement for: CD4 cell dynamics in untreated HIV-1 infection: overall rates, and effects of age, viral load, gender and calendar time.

Vega: Variational Segmentation for Copy Number Detection

CSE 255 Assignment 9

Structured models for dengue epidemiology

Evaluation of a Single-Platform Technology for Lymphocyte Immunophenotyping

HD1 (FLU) HD2 (EBV) HD2 (FLU)

Table 1. Summary of PET and fmri Methods. What is imaged PET fmri BOLD (T2*) Regional brain activation. Blood flow ( 15 O) Arterial spin tagging (AST)

Transcription:

RchyOptimyx: Gating Hierarchy Optimization for Flow Cytometry Nima Aghaeepour and Adrin Jalali April 4, 2013 naghaeep@bccrc.ca Contents 1 Licensing 1 2 Introduction 1 3 First Example: Preparing Raw Data for RchyOptimyx 2 3.1 Processing using flowtype...................... 2 3.2 Basic RchyOptimyx Functionality.................. 4 4 Analysis of a Large Dataset 6 4.1 Second Example: Optimization against a Clinical Outcome... 7 4.2 Third Example: Optimization against Event Overlap....... 11 1 Licensing Under the Artistic License, you are free to use and redistribute this software. 2 Introduction This document demonstrates the functionality of the RchyOptimyx package, a tool for cellular hierarchy OPTIMization for flow cytometry data (named after Archeopteryx). RchyOptimyx models all possible gating strategies and marker panels that can be generated using a high-color assay, and uses dynamic programing and optimization tools from graph-theory to determine the minimal sets of markers that can identify a target population to a desired level of purity. A cellular 1

hierarchy is a directed acyclic graph (DAG), embedded in a plane as a topdown diagram, with one node on the top most level representing all cells (or a major component therefore, such as T-cells) and nodes further down showing more specific cell populations. All the intermediate cell populations are placed in the hierarchy using parent-child relationships. The graph starts from level 0 to level m including i-marker phenotypes on i th level. The phenotype with 0 markers is the top most phenotype with all cells and the phenotype with m markers is the cell population of interest. The required input phenotypes and their respective scores (target values of the optimization) can be generated either using manual gates or automated gating algorithms (see the flowtype package in Bioconductor for examples). 3 First Example: Preparing Raw Data for Rchy- Optimyx In this example, we start from a raw flowset and generated the required materials to produce an RchyOptimyx graph. The dataset consists of a flowset HIVData with 18 HIV + and 13 normals and a matrix HIVMetaData which consists of FCS filename, tube number, and patient label. In this example, we are interested in the second tube only. For more details please see the flowtype package. > library(flowtype) > data(hivdata) > data(hivmetadata) > HIVMetaData <- HIVMetaData[which(HIVMetaData[,'Tube']==2),]; We convert the subject labels so that HIV + and normal subjects are labeled 2 and 1, respectively. > Labels=(HIVMetaData[,2]=='+')+1; 3.1 Processing using flowtype We start by calculating the cell proportions using flowtype: > library(flowcore); > library(rchyoptimyx); > ##Markers for which cell proportions will be measured. > PropMarkers <- 5:10 > ##Markers for which MFIs will be measured. > MFIMarkers <- PropMarkers > ##Marker Names > MarkerNames <- c('time', 'FSC-A','FSC-H','SSC-A', + 'IgG','CD38','CD19','CD3', + 'CD27','CD20', 'NA', 'NA') 2

> ##Apply flowtype > ResList <- fsapply(hivdata, 'flowtype', PropMarkers, + MFIMarkers, 'kmeans', MarkerNames); > ##Extract phenotype names > phenotype.names=names(reslist[[1]]@cellfreqs) Then we extract all cell proportions from the list of flowtype results and normalize them by the total number of cells in each sample to create the all.proportions matrix. > all.proportions <- matrix(0,3^length(propmarkers) + -1,length(HIVMetaData[,1])) > for (i in 1:length(ResList)){ + all.proportions[,i] = ResList[[i]]@CellFreqs + all.proportions[,i] = all.proportions[,i] / + max(all.proportions + [which(names(reslist[[i]]@cellfreqs)==''),i]) + } We use a t-test to select the phenotypes that have a significantly different mean across the two groups of patients (FDR=0.05). Remember that in real world use-cases the assumptions of a t-test must be checked or a resamplingbased alternative (e.g., a permutation test) should be used. Sensitivity analysis (e.g., bootstrapping) is also necessary. Eight phenotypes are selected as statistically significant. > Pvals <- vector(); > EffectSize <- vector(); > for (i in 1:dim(all.proportions)[1]){ + + ##If all of the cell proportions are 1 (i.e., the phenotype + ##with no gates) the p-value is 1. + if (length(which(all.proportions[i,]!=1))==0){ + Pvals[i]=1; + EffectSize[i]=0; + next; + } + temp=t.test(all.proportions[i, Labels==1], + all.proportions[i, Labels==2]) + Pvals[i] <- temp$p.value + EffectSize[i] <- abs(temp$statistic) + } > Pvals[is.nan(Pvals)]=1 > names(pvals)=phenotype.names > ##Bonferroni's correction > selected <- which(p.adjust(pvals)<0.1); > print(names(selected)) 3

[1] "IgG-CD38-CD19-CD27+CD20-" "IgG-CD38-CD19-CD27+" [3] "IgG-CD38-CD27+CD20-" "IgG-CD38-CD27+" [5] "IgG-CD19-CD27+CD20-" "IgG-CD19-CD27+" [7] "IgG-CD27+CD20-" "IgG-CD27+" 3.2 Basic RchyOptimyx Functionality We select the longest one (IgG-CD38-CD19-CD27+CD20-) for further analysis using RchyOptimyx. First we need to create the Signs matrix. We use the digitsbase number to generate a matrix with 3 length(p ropmarkers) 1 rows and length(p ropm arkers) columns. flowtype produces it s results in the exact same order, making assigning row and column names easy. > library(sfsmisc) > Signs=t(digitsBase(1:(3^length(PropMarkers)-1), + 3,ndigits=length(PropMarkers))) > rownames(signs)=phenotype.names; > colnames(signs)=markernames[propmarkers] > head(signs) [1] 0 0 0 0 0 0 Now we can plot the complete hierarchy: > res<-rchyoptimyx(signs, -log10(pvals), + paste(signs['igg-cd38-cd19-cd27+cd20-',], + collapse=''), factorial(6), FALSE) > plot(res, phenotypescores=-log10(pvals), ylab='-log10(pvalue)') 4

All Cells CD20 CD27+ CD19 CD38 IgG CD20 CD27+ CD19 CD38 IgG CD27+ CD20 CD19 CD20 CD19 CD27+ CD38 CD20 CD38 CD27+CD19 IgG CD20 IgG CD27+CD19 IgG IgG CD38 CD27+CD20 CD19 CD20 CD19 CD27+ CD38 CD20 CD38 CD27+ CD38 CD19 IgG CD20 IgG CD27+ IgG CD19 IgG CD38 CD19 CD27+ CD20 CD38 CD27+ CD20 CD38 CD19 CD20 CD38 CD19 CD27+ IgG CD27+ CD20 IgG CD19 CD20 IgG CD19 CD27+ IgG CD38 CD20 IgG CD38 CD27+ IgG CD38 CD19 CD19 CD27+CD20 CD38 CD27+CD20 CD38 CD19 CD20 CD38 CD19 CD27+ IgG CD27+CD20 IgG CD19 CD20 IgG CD19 CD27+ IgG CD38 CD20 IgG CD38 CD27+ IgG CD38 CD19 CD38 CD19 CD27+ CD20 IgG CD19 CD27+ CD20 IgG CD38 CD27+ CD20 IgG CD38 CD19 CD20 IgG CD38 CD19 CD27+ CD38 CD19 CD27+CD20 IgG CD19 CD27+CD20 IgG CD38 CD27+CD20 IgG CD38 CD19 CD20 IgG CD38 CD19 CD27+ IgG CD38 CD19 CD27+ CD20 IgG CD38 CD19 CD27+CD20 log10(pvalue) 0 1 2 3 4 5 and an optimized hierarchy (with only the top 15 paths): > res<-rchyoptimyx(signs, -log10(pvals), + paste(signs['igg-cd38-cd19-cd27+cd20-',], + collapse=''), 15, FALSE) > plot(res, phenotypescores=-log10(pvals), ylab='-log10(pvalue)') 5

CD19 CD27+CD20 IgG CD27+CD20 IgG CD27+CD20 IgG CD19 CD27+CD20 CD19 CD27+ CD38 CD19 CD27+ IgG CD38 CD27+CD20 All Cells CD27+ IgG CD38 CD19 CD27+CD20 CD27+ CD20 CD19 IgG CD38 IgG CD27+ IgG CD19 CD27+ IgG CD38 CD19 CD27+ CD38 CD27+ CD19 IgG CD20 CD20 CD38 IgG CD19 CD38 IgG CD19 CD20 CD38 CD38 CD19 CD20 IgG CD20 CD38 IgG CD38 CD27+ CD19 log10(pvalue) 0 1 2 3 4 5 4 Analysis of a Large Dataset In this section we will describe two use-cases of RchyOptimyx using a real-world clinical dataset of 17-color assays of 466 HIV + subjects. We start by loading the library (for installation guidelines see the Bioconductor website). > library(rchyoptimyx) > data(hivdata) The HIVData dataset consists of a matrix (Signs) and 2 numeric vectors LogRankPvals and OverlapScores). The Signs matrix consists of 10 columns (one per measured marker) and 3 10 1 = 59048 rows (one per immunophenotype) similar to the previous example. See [1] or the flowtype package for more 6

details. For every phenotype (row), the entity corresponding to a given marker (column) can be 0, 1, and 2 for negative, neutral, and positive respectively. The row names and column names are set respectively. LogRankPvals is a vector of log-rank test P-values to determine the correlation between HIV s progression and each of the measured immunophenotypes in 466 HIV positive patients (lower values represent a stronger correlation). For more details see [1]. The names of the vector match the names of the Signs matrix. Ganesan et. al. define Naive T-cells as CD28+CD45RO-CD57-CCR5-CD27+ CCR7+ within the CD3+CD14- compartment [2]. The OverlapScores vector has the proportion of Naive T-cells (as defined above) to the total number of cells in any given immunophenotype (a higher value represents a larger overlap): Number of CD28 + CD45RO CD57 CCR5 CD27 + CCR7 + cells All Samples Number of cells in the given population (Number of Samples) The names of the vector match the names of the Signs matrix. KI67 + CD4 CCR5 + CD127 cells in HIV + patients have a negative correlation with protection against HIV [1]. The P-value assigned to the immunophenotype confirms this: > LogRankPvals['KI-67+CD4-CCR5+CD127-'] 4.1 Second Example: Optimization against a Clinical Outcome KI-67+CD4-CCR5+CD127-1.702094e-10 We first need to calculate the base-3 code of the immunophenotype using the Signs matrix: > paste(signs['ki-67+cd4-ccr5+cd127-',], collapse='') [1] "2111012110" Since 4 markers are involved in this immunophenotype, there are 4! = 24 paths to reach this population. RchyOptimyx can calculate and visualize all of these paths, alongside each intermediary population s predictive power: > par(mar=c(1,4,1,1)) > res<-rchyoptimyx(signs, -log10(logrankpvals), + '2111012110', 24,FALSE) > plot(res, phenotypescores=-log10(logrankpvals), + ylab='-log10(pvalue)') (1) 7

KI 67+CD4 KI 67+ KI 67+CD4 CCR5+ KI 67+CCR5+ KI 67+ CD4 All Cells CD4 CCR5+ KI 67+CD4 CD127 CD4 CCR5+ CCR5+ CD4 CCR5+ KI 67+ CD127 KI 67+ CCR5+ CD4 CCR5+ CD4 CD127 KI 67+ CD127 KI 67+CD127 KI 67+CD4 CCR5+CD127 CD127 KI 67+CCR5+CD127 CD127 CD127 KI 67+ CD127 CD4 CCR5+ CD4 CD127 KI 67+ CCR5+CD127 CD4 CD127 CCR5+ KI 67+ CD127 CCR5+ KI 67+CD4 CCR5+ CD4 CD4 CCR5+CD127 log10(pvalue) 0 2 4 6 8 10 12 The width of the edges demonstrates the amount of predictive power gained by moving from one node to another. The color of the nodes demonstrates the predictive power of the cell population. Node colors, edge weights, and node/edge labels can be removed from the graph as desired using the parameters of the plot 8

All Cells CD127 CCR5+ CD4 KI 67+ CCR5+CD127 CD4 CD127 CD4 CCR5+ KI 67+CD127 KI 67+CCR5+ KI 67+CD4 CD4 CCR5+CD127 KI 67+CCR5+CD127 KI 67+CD4 CD127 KI 67+CD4 CCR5+ KI 67+CD4 CCR5+CD127 function: res is an OptimizedHierarchy object: > summary(res) An optimized hierarchy with 16 nodes, 32 edges, and 24 paths This object stores the scores assigned to everyone of the calculated paths: > plot(ecdf(res@pathscores), verticals=true) 9

ecdf(res@pathscores) Fn(x) 0.0 0.2 0.4 0.6 0.8 1.0 0 10 20 30 40 x We can re-run RchyOptimyx to limit the hierarchy to the top 4 paths: > par(mar=c(1,4,1,1)) > res<-rchyoptimyx(signs, -log10(logrankpvals), '2111012110', 4,FALSE) > plot(res, phenotypescores=-log10(logrankpvals), ylab='-log10(pvalue)') 10

All Cells CCR5+ KI 67+ CCR5+ KI 67+ KI 67+ CCR5+ CD127 KI 67+CCR5+ KI 67+CD127 CD4 CD127 CCR5+ KI 67+CD4 CCR5+ KI 67+CCR5+CD127 CD127 CD4 KI 67+CD4 CCR5+CD127 log10(pvalue) 0 2 4 6 8 10 12 This suggests that the KI-67 + CCR5 + cell population is also correlated with protection against HIV but uses only 2 markers. This can be confirmed by the log-rank test P-value: > LogRankPvals['KI-67+CCR5+'] KI-67+CCR5+ 1.317502e-11 4.2 Third Example: Optimization against Event Overlap Ganesan et. al. use a strict but potentially redundant definition for naive T- cells, of CD28 + CD45RO CD57 CCR5 CD27 + CCR7 + within the CD3 + CD14 compartment [2]. Since 6 markers are involved, 720 paths can exist: 11

> res<-rchyoptimyx(signs, OverlapScores, + paste(signs['cd28+cd45ro-cd57-ccr5-cd27+ccr7+',], + collapse=''), 720, FALSE) > par(mar=c(1,4,1,1)) > plot(res, phenotypescores=overlapscores, ylab='purity', + uniformcolors=true, edgeweights=false, edgelabels=false, + nodelabels=true) All Cells CCR7+ CD45RO CD27+ CD57 CCR5 CD28+ CD45RO CCR7+ CD27+CCR7+ CD57 CCR7+ CCR5 CCR7+ CD28+CCR7+ CD45RO CD27+ CD45RO CD57 CD45RO CCR5 CD57 CD27+ CCR5 CD27+ CD57 CCR5 CD28+CD45RO CD28+CD27+ CD28+CD57 CD28+CCR5 CD45RO CD27+CCR7+ CD45RO CD57 CCR7+ CD45RO CCR5 CCR7+ CD57 CD27+CCR7+ CCR5 CD27+CCR7+ CD57 CCR5 CCR7+ CD28+CD45RO CCR7+ CD28+CD27+CCR7+ CD28+CD57 CCR7+ CD28+CCR5 CCR7+ CD45RO CD57 CD27+ CD45RO CCR5 CD27+ CD45RO CD57 CCR5 CD57 CCR5 CD27+ CD28+CD45RO CD27+ CD28+CD45RO CD57 CD28+CD45RO CCR5 CD28+CD57 CD27+ CD28+CCR5 CD27+ CD28+CD57 CCR5 CD45RO CD57 CD27+CCR7+ CD45RO CCR5 CD27+CCR7+ CD45RO CD57 CCR5 CCR7+ CD57 CCR5 CD27+CCR7+ CD28+CD45RO CD27+CCR7+ CD28+CD45RO CD57 CCR7+ CD28+CD45RO CCR5 CCR7+ CD28+CD57 CD27+CCR7+ CD28+CCR5 CD27+CCR7+ CD28+CD57 CCR5 CCR7+ CD45RO CD57 CCR5 CD27+ CD28+CD45RO CD57 CD27+ CD28+CD45RO CCR5 CD27+ CD28+CD45RO CD57 CCR5 CD28+CD57 CCR5 CD27+ CD45RO CD57 CCR5 CD27+CCR7+ CD28+CD45RO CD57 CD27+CCR7+ CD28+CD45RO CCR5 CD27+CCR7+ CD28+CD45RO CD57 CCR5 CCR7+ CD28+CD57 CCR5 CD27+CCR7+ CD28+CD45RO CD57 CCR5 CD27+ CD28+CD45RO CD57 CCR5 CD27+CCR7+ Here is the distribution of these paths: > plot(ecdf(res@pathscores), verticals=true) 12

ecdf(res@pathscores) Fn(x) 0.0 0.2 0.4 0.6 0.8 1.0 3 4 5 6 x And a cellular hierarchy with the top 5 paths: > res<-rchyoptimyx(signs, OverlapScores, + paste(signs['cd28+cd45ro-cd57-ccr5-cd27+ccr7+',], + collapse=''), 5, FALSE) > par(mar=c(1,4,1,1)) > plot(res, phenotypescores=overlapscores, ylab='purity') 13

All Cells CCR7+ CCR5 CCR7+ CCR5 CCR5 CCR7+ CD45RO CCR5 CCR7+ CD45RO CCR5 CD45RO CCR7+ CD45RO CCR5 CCR7+ CD27+ CD57 CD45RO CCR5 CD27+CCR7+ CD45RO CD57 CCR5 CCR7+ CD28+ CD57 CD27+ CD28+CD45RO CCR5 CD27+CCR7+ CD45RO CD57 CCR5 CD27+CCR7+ CD57 CD28+ CD28+CD45RO CD57 CCR5 CD27+CCR7+ Purity 0 0.2 0.4 0.6 0.8 1 This shows that a 95% pure population of strictly naive T cells can be identified using only 3 markers (CD45RO CCR5 CCR7 + ). > OverlapScores['CD45RO-CCR5-CCR7+'] CD45RO-CCR5-CCR7+ 0.9489143 References [1] N. Aghaeepour, P. K. Chattopadhyay, A. Ganesan, K. O Neill, H. Zare, A. Jalali, H. H. Hoos, M. Roederer, and R. R. Brinkman. Early Immunologic Correlates of HIV Protection can be Identified from Computational Analysis 14

of Complex Multivariate T-cell Flow Cytometry Assays. Bioinformatics, Feb 2012. [2] A. Ganesan, P.K. Chattopadhyay, T.M. Brodie, J. Qin, W. Gu, J.R. Mascola, N.L. Michael, D.A. Follmann, and M. Roederer. Immunologic and virologic events in early hiv infection predict subsequent rate of progression. Journal of Infectious Diseases, 201(2):272, 2010. 15