The power of single cells: Building a tumor immune atlas. Dana Pe'er, Department of Biological Sciences, Department of Systems Biology, Columbia University


The Precision Medicine Initiative. Most personalized medicine efforts are focused on DNA, but DNA will not cut it. "Doctors have always recognized that every patient is unique. You can match a blood transfusion to a blood type; that was an important discovery. What if matching a cancer cure to our genetic code was just as easy, just as standard? What if figuring out the right dose of medicine was as simple as taking our temperature?" (President Obama, 2015)

Precision Medicine? [Figure: C/T alleles at a locus in controls vs. cases.] 90% of the loci that associate with human traits and diseases are outside genes. Recent evidence suggests that these fall in regions that regulate gene expression. DNA, RNA, network, phenotype.

Cells: Key intermediates from genotype to phenotype. When a genetic association is found: which tissue is dysfunctional?

One Genome, Many Cell Types. [Figure: excerpt of genome sequence, 3 billion letters.] Gene expression is central to understanding genetics and disease. Expressed genes differ between cell types. The regulatory region of a gene differs between cell types! Tissues contain many different cell types.

A Human Cell Atlas. A cell atlas will be as empowering as the human genome map. Our genes are well mapped, but most cell types remain unknown. Cells are the basic biological units. Diseases are caused by malfunction of specific cell types. Goal: Construct a comprehensive map of all cell types in our body.

A Geometric Approach to Phenotype. Cell phenotype: a configuration of multidimensional expression; it defines a region in phenotypic space. [Figure: cells plotted by Protein X vs. Protein Y.] Data will consist of millions of multi-parameter cells. Emerging high-dimensional single-cell technologies (CyTOF, single-cell RNA-seq and MIBI) allow us to characterize phenotypic space.

viSNE map of healthy bone marrow. Visualizing information derived from many dimensions. t-SNE-based non-linear dimensionality reduction. Amir et al., Nature Biotech 2013.

Cell phenotypes accumulate in complex non-convex manifolds

A social network for cells (PhenoGraph; Levine*, Simonds* et al., Cell 2015). [Figure: sample H1, colored by CD34 expression, low to high.] Convert the data to a graph using the Jaccard metric. The graph approximates the phenotypic manifold. Perform density estimation in the graph. Identify regions of phenotypic stability. Produce explicit labeling of subpopulations.
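The graph-construction step on this slide can be sketched as follows. This is a minimal illustration, not the PhenoGraph implementation: it assumes Euclidean k-nearest neighbors and weights each neighbor pair by the Jaccard overlap of their neighbor sets. The function names (`knn_indices`, `jaccard_graph`) and the toy data are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """For each row of X, return the indices of its k nearest other rows (Euclidean)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                     # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def jaccard_graph(X, k):
    """Weighted adjacency: W[i, j] = |N(i) & N(j)| / |N(i) | N(j)| for each kNN pair."""
    nbrs = knn_indices(X, k)
    sets = [set(row.tolist()) for row in nbrs]
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in nbrs[i]:
            inter = len(sets[i] & sets[j])
            union = len(sets[i] | sets[j])
            W[i, j] = W[j, i] = inter / union
    return W

# Toy data: two well-separated groups of 20 "cells" measured on 5 markers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 5)), rng.normal(5, 0.3, (20, 5))])
W = jaccard_graph(X, k=5)
```

On this toy data no edge connects the two groups, so a community-detection step run on `W` would see them as separate components. The dense matrix and double loop are for readability only; a million-cell dataset would need sparse storage and an approximate neighbor search.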

PhenoGraph: community detection. Community detection identifies densely interconnected node sets. W_ij: affinity function [ij coupling]; s_i: total affinity of i; c_i: community assignment for i; 2m: vol(W) [normalization]. This is a combinatorial optimization; the Louvain method provides an efficient heuristic (Blondel et al., J. Stat. Mech. 2008).
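The objective that these symbols refer to did not survive transcription. Assuming the slide showed the standard modularity optimized by the Louvain method, it would read as follows, using the slide's symbols, with $\delta$ the Kronecker delta:

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ W_{ij} - \frac{s_i s_j}{2m} \right] \delta(c_i, c_j),
\qquad s_i = \sum_j W_{ij},
\qquad 2m = \sum_{i,j} W_{ij}
```

Each term compares the observed affinity between nodes i and j with the affinity expected if edges were placed at random given the nodes' total affinities. Only pairs assigned to the same community contribute.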

PhenoGraph outperforms leading methods for subpopulation detection. It can run on 1 million cells in the same time it takes competing methods to run on 80,000 cells.

Immunotherapy in Cancer. The miracle: 40% of metastatic melanoma patients show a durable response of many years. Success stories in many additional bad cancers, including lung, AML, bladder and glioblastoma. Immunotherapy works for a small % of cancer patients, but when it works, it works.

Precision Cancer Medicine. Current efforts are based on targeted therapy, but cancer is so smart and evolving that simple drugs will not cut it. Resistant clones are present before treatment. Need: a smart and adaptive drug, like our own immune system. Need: big-data approaches to understand how immunotherapy can be extended to all patients.

Tumor Immune System Atlas. Need thousands of CD45+ cells per tumor. Goal: Characterize sub-populations in the tumor immune ecosystem. Challenge: Substantial unknown diversity. A better understanding of the tumor immune ecosystem will aid the development of strategies to activate it against the tumor.

In-Drop: Parallel processing of RNA-seq libraries from >10,000 individual cells. [Figure: cells and hydrogel beads; encapsulation; RT/lysis.] The microfluidic device can do 30,000 cells in one experiment. Tiny wells cut the cost of reagents by 1000-fold. Highly scalable and inexpensive single-cell RNA-seq. Klein*, Mazutis* et al., Cell 2015.

In-Drop characterization of tumor immune cells in breast cancer. Data-driven approach: > 3000 CD45+ cells collected per tumor; mean molecules per cell > 3500. Carr, Mazutis, Plitas, with the Rudensky lab, MSKCC.

CD45+ TILs from 4 breast cancer tumors (t-SNE 2D projection). Entire regions on the map are tumor-specific. Are these differences real biology or technical effects?

Single-cell RNA-seq as imagined. [Figure: count matrix (genes x cells) and a 2D t-SNE projection of cells showing Clusters 1, 2 and 3.]

Problem: Single-cell RNA-seq data involves significant dropouts and library size variation. We typically sample only 5% of a cell's transcriptome. Need to impute dropouts and normalize the data. [Figure: observed count matrix (genes x cells) and a 2D t-SNE projection of cells showing Clusters 1, 2 and 3.]

Common approach: Normalizing independent of cell types. Pipeline: observed count matrix (genes x cells), then normalization (to mean/median library size, or with BASiCS/ERCCs), then clustering of cells, then downstream analysis. Problems: Dropouts are not resolved; zeros remain zero! Removes biological stochasticity specific to each cell type. Leads to improper clustering. Biased results in downstream analysis.
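A small sketch of the global normalization described on this slide makes the "zeros remain zero" problem concrete. It assumes a cells x genes count matrix in which every cell has a non-zero library size; the function name and the toy counts are hypothetical.

```python
import numpy as np

def normalize_to_median_library_size(counts):
    """Scale each cell (row) so that its total count equals the median library size."""
    lib = counts.sum(axis=1, keepdims=True).astype(float)  # library size per cell
    target = np.median(lib)
    return counts * (target / lib)

# Toy matrix: 3 cells x 3 genes, with library sizes 15, 60 and 3.
counts = np.array([[10, 0, 5],
                   [40, 20, 0],
                   [2, 0, 1]])
norm = normalize_to_median_library_size(counts)
```

After scaling, every cell has the same total, but each entry that was zero is still zero: multiplying by a per-cell factor cannot recover a dropout.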

How can we impute expression in single-cell RNA-seq data? [Figure: 2D t-SNE projection of cells, colored by expression of Gene A.] Prabhakaran*, Azizi* et al., ICML 2016.


Idea 1: Impute dropouts based on expression in cells of the same type. A cell shows no expression of Gene A, but we observe that similar cells mostly express Gene A. Impute the dropout in Gene A based on the similar cells.
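Idea 1 can be illustrated with a deliberately simple rule. This is a sketch under stated assumptions, not BISCUIT's procedure: cluster labels are taken as given, and a zero is filled with the mean of the non-zero values in the same cluster only when at least `min_frac` of that cluster expresses the gene. The function name, threshold, and data are hypothetical.

```python
import numpy as np

def impute_from_cluster(counts, labels, gene, min_frac=0.5):
    """Fill zeros of `gene` with the cluster mean of non-zero values,
    but only in clusters where at least `min_frac` of cells express the gene."""
    out = counts.astype(float).copy()
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        vals = out[idx, gene]
        expressed = vals > 0
        if expressed.mean() >= min_frac:
            out[idx[~expressed], gene] = vals[expressed].mean()
    return out

# Toy data: 6 cells x 2 genes; gene 0 plays the role of "Gene A".
counts = np.array([[4, 1], [6, 0], [0, 2], [0, 0], [0, 3], [0, 1]])
labels = np.array([0, 0, 0, 1, 1, 1])
imputed = impute_from_cluster(counts, labels, gene=0)
```

In cluster 0, two of three cells express the gene, so the third cell's zero is treated as a dropout and filled. In cluster 1 no cell expresses it, so the zeros are left alone as likely true absence.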


Idea 2: Impute dropouts based on co-expression patterns. Sometimes no significant inference can be made from similar cells alone. However, if Gene A is always co-expressed with Gene B in cells of the same type, we can impute the dropout in Gene A based on Gene B.
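Idea 2 can be sketched with a per-cluster linear fit. This is an illustration only: BISCUIT captures co-expression through cluster covariance matrices rather than an explicit regression, and the function name and values below are hypothetical. All cells are assumed to belong to one cell type.

```python
import numpy as np

def impute_from_coexpression(a, b):
    """Fit a = slope * b + intercept on cells where Gene A is observed (> 0),
    then predict Gene A where it is zero but Gene B is expressed."""
    a = a.astype(float).copy()
    observed = a > 0
    slope, intercept = np.polyfit(b[observed], a[observed], 1)
    missing = (~observed) & (b > 0)
    a[missing] = slope * b[missing] + intercept
    return a

# Toy expression values for five cells of the same type.
gene_b = np.array([1.0, 2.0, 3.0, 4.0, 0.0])
gene_a = np.array([2.0, 4.0, 6.0, 0.0, 0.0])
imputed_a = impute_from_coexpression(gene_a, gene_b)
```

The fourth cell expresses Gene B but not Gene A, so its Gene A value is predicted from the fitted relationship. The fifth cell expresses neither gene and is left at zero.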

Imputing & Normalization. [Figure: histogram of library size in an example dataset, from Zeisel et al., Science 2015.] In addition to imputing dropouts, we need to normalize the data by library size.

Problem with Global Normalization. Cells of different sizes have very different total numbers of transcripts. Example: a housekeeping gene has a high chance of dropout in smaller cells.

Problem with Global Normalization (continued). Dropouts are not resolved. Spurious differential expression.

Idea: Different normalization for each cell type. Problem: We don't know the cell types. Need to infer cell clusters.

Approach: Simultaneous inference of clusters and imputing parameters. Iterative learning alternates between clustering cells and imputing & normalizing. [Figure: per-cell scaling factors, e.g. x1, x3, x0.75, x1.]

Approach (in detail): Iterative learning alternates between (1) the parameters characterizing each cluster together with the assignment of cells to clusters, and (2) cell-specific parameters for imputing & normalization. [Figure: per-cell scaling factors, e.g. x3, x1, x0.75, x1.]


Modeling: Clusters of cells using a Bayesian mixture model. [Figure: ideal (normalized) count matrix, genes x cells, with Clusters 1, 2 and 3.] For one gene: model the distribution of the log of counts per cell type as a Gaussian distribution.

Modeling (one gene): Each gene is a mixture of log-normal models, one component per cluster.

Modeling (two or more genes): Modeling all genes together gives a mixture of multivariate log-normals. This also takes advantage of co-expression patterns in learning clusters.
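Written out, the mixture described on these slides could take the following form. The notation is an assumption, since the slides give no symbols here: $y_j$ is the vector of ideal log-counts for cell $j$, $z_j$ its cluster, $\pi_k$ the mixture weights, and $\mu_k$, $\Sigma_k$ the mean and covariance of cluster $k$.

```latex
y_j \mid z_j = k \;\sim\; \mathcal{N}(\mu_k, \Sigma_k),
\qquad
p(y_j) = \sum_{k} \pi_k \, \mathcal{N}(y_j ; \mu_k, \Sigma_k)
```

The off-diagonal entries of $\Sigma_k$ carry the gene-gene co-expression within cluster $k$. With a diagonal $\Sigma_k$ the model reduces to the independent per-gene mixtures of the previous slide.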

Generative model with technical variation. [Figure: the model without technical variation next to the model with technical variation; the latent counts, which we want to recover, give rise to the observation.]

BISCUIT (Bayesian Inference for Single-cell ClUstering and ImpuTing). [Model diagram: cluster-specific parameters and cell-specific parameters generate the observed gene expression per cell j.]

BISCUIT, full hierarchy. [Model diagram: hyper-priors (global distributions of the observed data), then hyper-parameters, then cluster-specific parameters; cell-specific parameters have priors set from the observed library size distribution; together these generate the observed gene expression per cell j.]
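One way to write the two layers of this diagram is shown below. It rests on an assumption drawn from the inference slide's wording ("scaling mean, cov per cell"): each cell $j$ has a scalar $\alpha_j$ that scales the cluster mean and a scalar $\beta_j$ that scales the cluster covariance. Here $x_j$ is the observed log-expression, $y_j$ the latent expression to be recovered, and $z_j$ the cluster of cell $j$; the symbols are illustrative, not taken from the slides.

```latex
y_j \mid z_j = k \;\sim\; \mathcal{N}(\mu_k, \Sigma_k)
\quad \text{(latent, without technical variation)}
\\[4pt]
x_j \mid z_j = k \;\sim\; \mathcal{N}(\alpha_j \mu_k, \; \beta_j \Sigma_k)
\quad \text{(observed, with technical variation)}
```

Under this form, a cell with a small library would have $\alpha_j < 1$, pulling its observed mean toward zero relative to its cluster.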

Inference algorithm: Parallel sampling from the derived conditional posterior distributions, P(parameter | data, other parameters). Estimate hyper-priors based on the data, then run Gibbs iterations: sample hyper-parameters; sample technical variation parameters (scaling of mean and covariance per cell); sample cluster-specific parameters; sample the assignment of cells to clusters using a Chinese Restaurant Process (CRP), which also allows estimating the number of clusters.
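The CRP assignment step can be sketched as below. This is a generic Chinese Restaurant Process conditional, not code from BISCUIT: a cell joins existing cluster k with probability proportional to the cluster's size times the likelihood of the cell under that cluster, or opens a new cluster with probability proportional to a concentration parameter times the likelihood under the prior. The function name and arguments are hypothetical, and cluster sizes are assumed to be positive and to exclude the cell being reassigned.

```python
import numpy as np

def crp_assignment_probs(cluster_sizes, log_likelihoods, log_lik_new, alpha):
    """Probabilities of assigning one cell to each existing cluster or to a new one.

    p(existing k) is proportional to n_k * likelihood_k
    p(new)        is proportional to alpha * likelihood_new
    """
    sizes = np.asarray(cluster_sizes, dtype=float)
    logp = np.append(np.log(sizes) + np.asarray(log_likelihoods, dtype=float),
                     np.log(alpha) + log_lik_new)
    logp -= logp.max()          # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum()

# Two existing clusters of sizes 8 and 2; equal likelihoods isolate the CRP prior.
probs = crp_assignment_probs([8, 2], [0.0, 0.0], 0.0, alpha=1.0)
```

Because the last entry is never exactly zero, a sampler built on these probabilities can open new clusters as needed, which is how the number of clusters gets estimated rather than fixed in advance.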


Cluster-dependent imputing & normalizing, step 1: Starting from the observed data (genes x cells), sort cells into the BISCUIT clusters.


Cluster-dependent imputing & normalizing, step 2: Using the BISCUIT clusters and parameters, impute and normalize the observed data with a linear transformation to obtain the imputed data. [Figure: observed and imputed matrices, genes x cells, with Clusters 1, 2 and 3.]
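A linear transformation of this kind can be sketched under a simplifying assumption: the observed log-expression of cell j in cluster k is distributed as N(alpha_j * mu_k, beta_j * Sigma_k) with scalar alpha_j and beta_j, and the goal is a value distributed as N(mu_k, Sigma_k). The exact transform used by BISCUIT may differ; the function name and numbers are hypothetical.

```python
import numpy as np

def impute_and_normalize(x, mu_k, alpha_j, beta_j):
    """Map x ~ N(alpha_j * mu_k, beta_j * Sigma_k) to y ~ N(mu_k, Sigma_k).

    Subtract the cell-scaled mean, rescale by sqrt(beta_j), add the cluster mean.
    """
    return (x - alpha_j * mu_k) / np.sqrt(beta_j) + mu_k

mu_k = np.array([2.0, 4.0])   # cluster mean for two genes (log scale)
x_j = np.array([3.0, 0.0])    # observed cell; the second gene looks like a dropout
y_j = impute_and_normalize(x_j, mu_k, alpha_j=0.5, beta_j=4.0)
```

Here the zero in the second gene is mapped to a non-zero value because the correction is anchored on the cluster mean, which a single global scaling factor cannot do.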

Performance: Testing on neuron single-cell data. 3005 mouse cortex cells from Zeisel et al., Science 2015. Deep coverage (2 million reads per cell) gives good ground truth for 7 cell types. No prior information used: selected the 558 genes with the largest standard deviation across cells. F-score: 0.91.
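The unsupervised gene filter mentioned on this slide (keep the genes with the largest standard deviation across cells) can be sketched as follows. The function name and toy matrix are hypothetical, and whether the standard deviation was taken on raw or log-transformed counts is not stated in the slides.

```python
import numpy as np

def top_variable_genes(expr, n_genes):
    """Indices of the n_genes columns with the largest standard deviation across cells."""
    sd = expr.std(axis=0)
    return np.argsort(sd)[::-1][:n_genes]

# Toy matrix: 3 cells x 3 genes; gene 0 is constant, gene 1 varies the most.
expr = np.array([[1.0, 10.0, 5.0],
                 [1.0, 0.0, 6.0],
                 [1.0, 20.0, 4.0]])
selected = top_variable_genes(expr, n_genes=2)
```

For the dataset on the slide the call would be `top_variable_genes(expr, 558)` on a 3005-cell matrix.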

Comparing BISCUIT to other methods. [Figure panels; the labels of the competing methods were not preserved in the transcript.] F-scores: 0.91 (BISCUIT), 0.79, 0.5 for 67% of cells, 0.74, 0.61.

Reminder: Breast TIL data before BISCUIT. Skewed data, with non-overlapping cells across tumors. Unclear structure of cell types; the map mostly distinguishes myeloid from lymphoid cells.

Breast cancer TIL data after BISCUIT. 12,000 cells, ~3000 molecules per cell, 4 tumors. Most of the tumor-specific regions vanish. Most of the map includes cells from all 4 tumors.

Breast cancer TIL data after BISCUIT, annotated populations (marker genes): Tregs (FOXP3); exhausted T cells; CD4+ T cells (CD3, CD4); CD8+ T cells (CD3D, CD8A, CD8B); B cells (CD79A, CD19); DCs (CD1C); NKs (NKG7); mast cells; monocytes (CD14, CD68); myeloid (CD14+, CD81+, APOE-); myeloid (CD14+, CD81+, APOE+).

Patient-specific differences in co-variation structure. [Figure: panels for Patients 1 to 4.] Some patients show stronger covariance between ICOS and CTLA4; others show weaker covariance between OX40 and GITR. These are the co-receptors that are targeted by immunotherapy.
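A direct way to look at such patient-specific co-variation is to compute the sample covariance of two genes separately for each patient. The sketch below is an empirical stand-in only: in BISCUIT the covariance would come from the inferred cluster parameters rather than from raw values, and the function name and numbers are hypothetical.

```python
import numpy as np

def per_patient_covariance(expr_a, expr_b, patient):
    """Sample covariance between two genes, computed separately for each patient."""
    result = {}
    for p in np.unique(patient):
        mask = patient == p
        result[int(p)] = float(np.cov(expr_a[mask], expr_b[mask])[0, 1])
    return result

# Toy values for two co-receptor genes in cells from two patients.
gene_a = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
gene_b = np.array([1.0, 2.0, 3.0, 3.0, 2.0, 1.0])
patient = np.array([1, 1, 1, 2, 2, 2])
cov = per_patient_covariance(gene_a, gene_b, patient)
```

In this toy example the two genes co-vary positively in patient 1 and negatively in patient 2, the kind of contrast the slide reports between patients.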

Summary for BISCUIT. We introduce BISCUIT, which: iteratively clusters and normalizes single-cell RNA-seq data based on different cell types; uses a hierarchical Bayesian mixture model with an efficient Gibbs sampler for inferring cell-specific parameters; imputes dropout gene expression values. We constructed a cell atlas of the tumor immune system: it captured a rich diversity of tumor immune cell types, and cancer-specific differences in co-receptor patterns that can guide combinatorial immunotherapy (releasing multiple brakes).

Once we have the parts, we can ask how they interact. What is the effect of tissue context, organ site, cancer vs. healthy? What happens in response to a drug?

Longitudinal studies: How does the tumor ecosystem change under drug perturbation, in both cell state and cell-to-cell interactions? How does this differ between responders and non-responders? Our measurements are genome-wide, so they can point to mechanism!

Acknowledgements: Elham Azizi, Sandhya Prabhakaran, Ambrose Carr, Linas Mazutis, Jacob Levine, Kristy Choi, Josh Nainys, Manu Setty, Vaidas Kiseliovas, Rami Eitan, Zakary Posewitz, Sasha Rudensky (MSKCC), George Plitas, Casia Konopac, and the patients.