Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 2010

Similar documents
Introduction to Gene Sets Analysis

Characteriza*on of Soma*c Muta*ons in Cancer Genomes

Paradigm. Some slide sources: Josh Stuart (UCSC) AACR 12 slides Carl Edward Rasmussen (Cambridge) Factor graphs slides.

Structured Association Advanced Topics in Computa8onal Genomics

Understanding Genotype- Phenotype relations in Cancer via Network Approaches

Classifica4on. CSCI1950 Z Computa4onal Methods for Biology Lecture 18. Ben Raphael April 8, hip://cs.brown.edu/courses/csci1950 z/

Gene-microRNA network module analysis for ovarian cancer

StratomeX Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s

Outline. What s inside this paper? My expectation. Software Defect Prediction. Traditional Method. What s inside this paper?

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks

Identifying the Zygosity Status of Twins Using Bayes Network and Estimation- Maximization Methodology

Module 3: Pathway and Drug Development

Translational Bioinformatics: Connecting Genes with Drugs

Copy Number Variations and Association Mapping Advanced Topics in Computa8onal Genomics

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

Results and Discussion of Receptor Tyrosine Kinase. Activation

Bayesian (Belief) Network Models,

Journal: Nature Methods

Bayes Linear Statistics. Theory and Methods

Assessing Functional Neural Connectivity as an Indicator of Cognitive Performance *

Predicting Breast Cancer Recurrence Using Machine Learning Techniques

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

From mirna regulation to mirna - TF co-regulation: computational

Common Data Elements: Making the Mass of NIH Measures More Useful

Package diggitdata. April 11, 2019

Propensity Score. Overview:

Risk-prediction modelling in cancer with multiple genomic data sets: a Bayesian variable selection approach

Supplementary Materials for

Predicting Kidney Cancer Survival from Genomic Data

Graphical Modeling Approaches for Estimating Brain Networks

Deep Learning Analytics for Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations

Expanded View Figures

Recording ac0vity in intact human brain. Recording ac0vity in intact human brain

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

A Versatile Algorithm for Finding Patterns in Large Cancer Cell Line Data Sets

Computational Approach for Deriving Cancer Progression Roadmaps from Static Sample Data

Identification of Tissue Independent Cancer Driver Genes

A Network Partition Algorithm for Mining Gene Functional Modules of Colon Cancer from DNA Microarray Data

A probabilistic method for food web modeling

Session 4 Rebecca Poulos

CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL

Bayesian Latent Subgroup Design for Basket Trials

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Learning Deterministic Causal Networks from Observational Data

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

Nature Methods: doi: /nmeth.3115

Package pathifier. August 17, 2018

Integration of high-throughput biological data

Simultaneous Identification of Multiple Driver Pathways in Cancer

56:134 Process Engineering

Sets, Logic, and Probability As Used in Decision Support Systems

Application of Tree Structures of Fuzzy Classifier to Diabetes Disease Diagnosis

DawnRank: discovering personalized driver genes in cancer

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

A review of approaches to identifying patient phenotype cohorts using electronic health records

An Efficient Attribute Ordering Optimization in Bayesian Networks for Prognostic Modeling of the Metabolic Syndrome

Mul$ Voxel Pa,ern Analysis (fmri) Mul$ Variate Pa,ern Analysis (more generally) Magic Voxel Pa,ern Analysis (probably not!)

INTEGRATION OF MULTI-PLATFORM HIGH-DIMENSIONAL OMIC DATA

The power of single cells: Building a tumor immune atlas Dana Pe er Department of Biological Science Department of Systems Biology Columbia

Decades of cancer research have demonstrated that. Computational approaches for the identification of cancer genes and pathways

Table of content. -Supplementary methods. -Figure S1. -Figure S2. -Figure S3. -Table legend

Data Analysis Using Regression and Multilevel/Hierarchical Models

Numerical Integration of Bivariate Gaussian Distribution

Nature Genetics: doi: /ng Supplementary Figure 1

Classification and Predication of Breast Cancer Risk Factors Using Id3

Multiplexed Cancer Pathway Analysis

Neurons and neural networks II. Hopfield network

Network-assisted data analysis

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

Package xseq. R topics documented: September 11, 2015

MolEcular Taxonomy of BReast cancer International Consortium (METABRIC)

Comparing Multifunctionality and Association Information when Classifying Oncogenes and Tumor Suppressor Genes

CS 4365: Artificial Intelligence Recap. Vibhav Gogate

ASMS 2015 ThP 459 Glioblastoma Multiforme Subtype Classification: Integrated Analysis of Protein and Gene Expression Data

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

Genomic analysis of essentiality within protein networks

Using CART to Mine SELDI ProteinChip Data for Biomarkers and Disease Stratification

Integrative analysis of survival-associated gene sets in breast cancer

Sbarrato_Supplementary_Fig1

MethylMix An R package for identifying DNA methylation driven genes

Section D: The Molecular Biology of Cancer

Micro-RNA web tools. Introduction. UBio Training Courses. mirnas, target prediction, biology. Gonzalo

Understanding DNA Copy Number Data

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Computational Investigation of Homologous Recombination DNA Repair Deficiency in Sporadic Breast Cancer

MODEL-BASED CLUSTERING IN GENE EXPRESSION MICROARRAYS: AN APPLICATION TO BREAST CANCER DATA

Network-based pattern recognition models for neuroimaging

Transcript reconstruction

Biosta's'cs Board Review. Parul Chaudhri, DO Family Medicine Faculty Development Fellow, UPMC St Margaret March 5, 2016

PREDICTION OF BREAST CANCER USING STACKING ENSEMBLE APPROACH

Comparative Effectiveness Research (CER) and Personalized Medicine: Policy, Science, and Business

A Trial Implementa.on of a High Density Health Informa.on Exchange Standard: Are We Ready for Coordinated Care in High Impact Condi.ons?

Mapping evolutionary pathways of HIV-1 drug resistance using conditional selection pressure. Christopher Lee, UCLA

Patient networks! in cancer:! a platform for data integration

Identification of Causal Genetic Drivers of Human Disease through Systems-Level Analysis of Regulatory Networks

How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection

Title: Pathway-Based Classification of Cancer Subtypes

SUPPLEMENTARY MATERIAL

RchyOptimyx: Gating Hierarchy Optimization for Flow Cytometry

Transcription:

Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 2010 C.J.Vaske et al. May 22, 2013 Presented by: Rami Eitan

Complex Genomic Rearrangements Cancer tissue experience molecular changes Varied genomic data available copy number variations mutations, gene expression Stratification of cancers can improve: diagnosis prognosis risk assessment response to treatment

Complex Genomic Rearrangements Genetic alterations differ between patients Pathways often are common

Pathways What is a pathway? Figure : The P53 pathway

Pathways A set of interactions between entities, logically grouped together around a biological process. Protein-coding genes, small molecules, complexes, gene families, abstract processes Available databases: Reactome, KEGG, NCI

Motivation Integra+ve analysis of cancer genome data Copy number varia+ons, gene expressions Leverage pathway informa+on to find frequently occurring pathway perturba+ons NCI pathway interac+on database, KEGG etc.

Observed Data Figure : Gene expression Figure : Copy number

Motivation Pathway informa+on contains informa+on on how genes are supposed to behave

Input Infer integrated pathway activity (IPA) Produce a matrix A. A ij is the inferred activity of entity i in patient j

PARADIGM

Factor graph Factor graph is a probabilistic graphical model. Variables, factors. Figure : A simple factor graph

PARADIGM Model Factor graph representa+on of various en++es corresponding to a single gene

PARADIGM Model: Gene Interactions

A factor graph for a pathway PARADIGM Model:

Model Specification Convert an NCI pathway into a factor graph NCI pathway to Bayesian network Directed network Each variable takes values of - 1 (de- ac+va+on), 0 (normal), 1 (ac+va+on) mrna: over expression for ac+va+on Copy number varia+ons: more than two copies for ac+va+ons Probability distribu+on of each node Labeled edges for posi+ve/nega+ve interac+ons Set the value of the child node as weighted votes from its parents

Model Specification Conver+ng the Bayesian network to a factor graph Assign a factor to each group of variables consis+ng of a node and its parents Z: normaliza+on constant ε = 0.001

Inference Observed variables: copy number variations, gene expressions Unobserved variables: protein, protein activity, overall pathway activity state Learn models with EM algorithm E step: Infer the probabilities of the unobserved variables M step: Change parameters to to maximize the likelihood given the probabilities

Expectation Maximization Figure : EM algorithm

Log-likelihood Ratio Test Test sta+s+c for assessing en+ty i s ac+vity given data D The probabili+es can be obtained by performing inference on the factor graph

Significance assessment Permutate the labels of the observed data Within permutation: choosing random genes from the same pathway Any permutation: choosing any random genes 1000 permutations of each type are used to determine null distribution

Decoy paths Create decoy paths by replacing genes with random genes Maintain the same structure All complexes and abstract processes remain the same

Log-likelihood Ratio Test Aggrega+ng over mul+ple values en+ty i takes

Dataset Breast cancer copy number and gene expression data TCGA Glioblastoma copy number and gene expression data Pathways from NCI pathway interac+on database (PID)

Results - breast cancer Breast Cancer dataset: 56172 IPA s (7%) found to be significantly higher 497 significant entities per patient on average 103 out of 127 pathways had at least one entity altered in 20% or more of the patients

Results - GBM GBM dataset: 141682 IPA s (9%) found to be significantly higher 616 significant entities per patient on average 110 out of 127 pathways had at least one entity altered in 20% or more of the patients

EM Convergence Original data vs. permuted data Red: real data Green: permuted data

Results - decoy paths Distinguishing decoy from real pathways Figure : PARADIGM vs SPIA: FP rate

Results - decoy paths Distinguishing decoy from real pathways Breast cancer AUC: PARADIGM: 0.669 SPIA: 0.602 GBM AUC: PARADIGM: 0.642 SPIA: 0.604

Top PARADIGM Pathways of Breast Cancer

Top PARADIGM Pathways of Glioblastoma

Glioblastoma Subtypes

Survival Rates for Each Subtypes

Results - Patient vs permutation Figure : Patient vs permuted IPA s

Results - Patient vs permutation Figure : Patient vs permuted IPA s. Source: Broad Institute/Dana-Farber Cancer Institute/Harvard Medical School

Summary PARADIGM integrates different types of data, including gene- expression, copy number varia+on, and pathway database, in order to infer pathway ac+vi+es for individual cancer pa+ents. Factor graph model for represen+ng pathway and modeling datasets Pathway ac+vi+es inferred by PARADIGM can be used to iden+fy cancer subtypes

Questions

Discussion Can the method be successfully expanded to more observed data? Instead of using the pathways as is, can this method be used to find new pathways and interactions?