Bayesian Prediction Tree Models

Similar documents
Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics. Mike West Duke University

Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes

Trans-Study Projection of Genomic Biomarkers in Analysis of Oncogene Deregulation and Breast Cancer

Data analysis and binary regression for predictive discrimination. using DNA microarray data. (Breast cancer) discrimination. Expression array data

Contemporary Classification of Breast Cancer

Introduction to Discrimination in Microarray Data Analysis

Breast cancer classification: beyond the intrinsic molecular subtypes

Triple Negative Breast Cancer

Application of Artificial Neural Network-Based Survival Analysis on Two Breast Cancer Datasets

Data Analysis Using Regression and Multilevel/Hierarchical Models

Use of Archived Tissues in the Development and Validation of Prognostic & Predictive Biomarkers

ABS04. ~ Inaugural Applied Bayesian Statistics School EXPRESSION

Outline. Outline. Phillip G. Febbo, MD. Genomic Approaches to Outcome Prediction in Prostate Cancer

Recursive Partitioning Method on Survival Outcomes for Personalized Medicine

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

8/8/2011. PONDERing the Need to TAILOR Adjuvant Chemotherapy in ER+ Node Positive Breast Cancer. Overview

Breast Surgery When Less is More and More is Less. E MacIntosh, MD June 6, 2015

Molecular Enhancement of Sentinel Node Evaluation

In-Vitro to In-Vivo Factor Profiling in Expression Genomics. 1.4 The E2F3 Signature in Ovarian, Lung, and Breast Cancer... 19

SYSTEMIC THERAPY OPTIONS FOR BREAST CANCER IN 2014

Basement membrane in lobule.

MODEL-BASED CLUSTERING IN GENE EXPRESSION MICROARRAYS: AN APPLICATION TO BREAST CANCER DATA

Bayesian Inference Bayes Laplace

Statistical Tolerance Regions: Theory, Applications and Computation

Bayesian Inference for Single-cell ClUstering and ImpuTing (BISCUIT) Elham Azizi

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections

Multigene Testing in NCCN Breast Cancer Treatment Guidelines, v1.2011

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

Predicting clinical outcomes in neuroblastoma with genomic data integration

Design for Targeted Therapies: Statistical Considerations

Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties

Wan-Ting, Lin 2017/11/21. JAMA Intern Med. doi: /jamainternmed Published online August 21, 2017.

Risk-prediction modelling in cancer with multiple genomic data sets: a Bayesian variable selection approach

Chapter 1. Introduction

BayesRandomForest: An R

Cross-Study Projections of Genomic Biomarkers: An Evaluation in Cancer Genomics

Molecular Characterization of Breast Cancer: The Clinical Significance

P. RICHARD HAHN. Research areas. Employment. Education. Research papers

Breast cancer: Molecular STAGING classification and testing. Korourian A : AP,CP ; MD,PHD(Molecular medicine)

RNA preparation from extracted paraffin cores:

2017 Breast Cancer Update

Computer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California

BAYESIAN ANALYSIS IN CANCER PATHWAY STUDIES AND PROBABILISTIC PATHWAY ANNOTATION

Biostatistical modelling in genomics for clinical cancer studies

Predicting Breast Cancer Survival Using Treatment and Patient Factors

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1

Nature Genetics: doi: /ng Supplementary Figure 1. Rates of different mutation types in CRC.

Statistical Issues in Translational Cancer Research

Genomic Profiling of Tumors and Loco-Regional Recurrence

LETTERS. Oncogenic pathway signatures in human cancers as a guide to targeted therapies

MRI-Based Biomarkers of Therapeutic Response in Triple-Negative Breast Cancer

Impact of Prognostic Factors

Mammographic density and risk of breast cancer by tumor characteristics: a casecontrol

MicroRNA expression profiling and functional analysis in prostate cancer. Marco Folini s.c. Ricerca Traslazionale DOSL

OPERATIONAL RISK WITH EXCEL AND VBA

Integrative analysis of survival-associated gene sets in breast cancer

Pathology of Inflammatory Breast Cancer (IBC) A rare tumor

Molecular in vitro diagnostic test for the quantitative detection of the mrna expression of ERBB2, ESR1, PGR and MKI67 in breast cancer tissue.

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis

Ecological Statistics


Advanced IPD meta-analysis methods for observational studies

Moving from the Means to the Standard Deviations in Symptom Management Research

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm

An Integrative Analysis of Cancer Gene Expression Studies using Bayesian Latent Factor Modeling

Should we still be performing IHC on all sentinel nodes?

Janet E. Dancey NCIC CTG NEW INVESTIGATOR CLINICAL TRIALS COURSE. August 9-12, 2011 Donald Gordon Centre, Queen s University, Kingston, Ontario

BayesOpt: Extensions and applications

FISH mcgh Karyotyping ISH RT-PCR. Expression arrays RNA. Tissue microarrays Protein arrays MS. Protein IHC

SUPPLEMENTARY APPENDIX

Carcinome du sein Biologie moléculaire. Thomas McKee Service de Pathologie Clinique Genève

Breast Cancer Basics. Clinical Oncology for Public Health Professionals. Ben Ho Park, MD, PhD

Table S2. Expression of PRMT7 in clinical breast carcinoma samples

Prediction of Lymph Node Involvement in Patients with Breast Tumors Measuring 3 5 cm in a Middle-Income Setting: the Role of CancerMath

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

Breast Cancer: Who Gets It? Who Survives? The Latest Information

Identifying Thyroid Carcinoma Subtypes and Outcomes through Gene Expression Data Kun-Hsing Yu, Wei Wang, Chung-Yu Wang

National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2009 Formula Grant

Speaker s Bureau. Travel expenses. Advisory Boards. Stock. Genentech Invuity Medtronic Pacira. Faxitron. Dune TransMed7 Genomic Health.

Post Neoadjuvant therapy: issues in interpretation

10/15/2012. Biologic Subtypes of TNBC. Topics. Topics. Histopathology Molecular pathology Clinical relevance

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Biomarker adaptive designs in clinical trials

Bayesian and Classical Approaches to Inference and Model Averaging

Lecture Outline Biost 517 Applied Biostatistics I

Bayesian Nonparametric Methods for Precision Medicine

Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23:

Advanced Bayesian Models for the Social Sciences

Biomarkers in oncology drug development

Big Image-Omics Data Analytics for Clinical Outcome Prediction

An Introduction to Bayesian Statistics

Bayesian Joint Modelling of Benefit and Risk in Drug Development

Individual Participant Data (IPD) Meta-analysis of prediction modelling studies

Landmarking, immortal time bias and. Dynamic prediction

Creating prognostic systems for cancer patients: A demonstration using breast cancer

Colon cancer subtypes from gene expression data

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

10CS664: PATTERN RECOGNITION QUESTION BANK

Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics

Transcription:

Bayesian Prediction Tree Models

Statistical Prediction Tree Modelling for Clinico-Genomics Clinical gene expression data - expression signatures, profiling Tree models for predictive sub-typing Combining clinical + molecular data Gene discovery and prioritisation Breast cancer prognosis - clinical testing

Lymph Node Metastasis - Key Breast Cancer Risk Factor - But... lymph node dissection carries morbidity, inaccuracy

Gene Subsets Associated with Axillary Lymph Node Status Gene Patient/Breast Tumour Sample

Expression Signatures - Metagenes Multiple related genes: multiplicities, redundancies idiosyncratic noise, variability Aggregate Metagene summaries: characterize patterns noise reduction in estimating common signal improve statistical efficiency, robustness multiple small clusters of genes

Metagenes Characterising Factors Y=A(M)+E Dimension reduction: Signal improvement Clustering, k-means Empirical or model-based factor analysis (Latent) factors in graphical models Related/intersecting clusters West, 2003, Bayesian Statistics 7 Reclustering, Covariance selection & Graphical Models software tools.. MetageneCreator (Dobra, West, 2004, forthcoming)

Atherosclerosis Metagenes - Disease Susceptibility -

ER Metagenes in Breast Cancers 79 genes: ER regulated: Tff1, bcl-1,2,.. 27 genes ESR1 39 genes other co-regulated, androgens,.. 49 samples - Huang et al, Duke PNAS 2001

ER Metagenes in Breast Cancers 158 samples - Pittman et al, Duke/KFSYS PNAS 2004

Lymph Node Metastasis Metagenes Classification Trees Models: - Lymph node outcomes - Recurrence outcomes Cross-validation assessment Predictive classification Huang et al, Lancet 2003

Binary Tree Models

Tree Models: Classification via Recursive Partitioning Non-linear: interactions of covariates Natural in contexts of clinical prognosis Many trees: Model & prediction uncertainty Prospective or retrospective studies

Binary Outcomes: Y=0/1 - Retrospective Sampling (Case-Control Control studies) - π = Pr(Y = 1 x 1 τ1,...,x k τ k )

Binary Outcomes: Prospective Inference from Retrospective Model π = Pr(Y = 1 x τ,...,x 1 1 k τ k ) a 1 y = Pr (x1 τ1 Y = y), y = 0, 1 a 2 y = Pr(x2 τ 2 x1 τ1, Y = y), y = 01, π = Pr(Y = 1) a 11, a 2, 1 1 π Pr(Y = 0 ) a a 1, 0 2, 0

Binary Outcomes: - Retrospective Model - Model conditional CDFs for predictors F(x x τ,...,x τ Y = k k k, 1 1 y) Want consistency as thresholds vary Nonparametric Bayes: Dirichlet model Modelling in x space joint structure Implies Beta priors on a ky

Dirichlet Process: Beta Margins F(x k -,y=1) a k1 a k1 * x k τ k * τ k

Assessing Predictor:Threshold Pairs for Association with Binary Outcome Any predictor (metagene) x, threshold τ Sample arranged in 2x2 table x τ Y=0 n 00 Y=1 n 01 N 0 Retrospective: column totals fixed x > τ n 10 n 11 N 1 Two Bernoulli sequences: columns with M 0 M 1 success probabilities a 0 & a 1 Beta priors: Conjugate (Pittman et al, Biostatistics 2004) Assess/test: Bayes factor for a 0 a 1 versus a 0 =a 1 Nonlinear association measure: Function of threshold

Growing & Using Binary Trees Node splits: assess candidate predictor:threshold pairs 2x2 table: conservative Bayesian tests Multiple trees: Multiple splits at any node Inference: 1 tree beta posteriors for a iy simulate: impute Pr(Y=1 leaf) Prediction with multiple trees: Average across trees Model uncertainty Smoothing partitions (Pittman et al, Biostatistics 2004)

Predicting ER Status With Metagene Trees Duke PNAS 2001 data Out-of-sample cross validation Probability of ER+ (Pittman et al, Biostatistics 2004)

Predicting Lymph Node Status With Metagenes Out-of-sample cross validation LN+ LN- Probability of LN+ Huang et al, Lancet 2003

Gene Identification Implicated metagenes gene subsets Genes correlated with key metagenes Breast Cancer nodal metastasis: Interferon pathway/inducible gene subset Interferons mediate anti-tumour response Evidence of disfunction of normal anti-tumour response? BRCA1, p27

Duke PNAS 2004 Breast Cancer Data Examples (see Matlab code explorations) Explore ER (0/1) regression using metagene predictors Explore tree models for cancer recurrence Look at nonlinear association with status - for each metagene across the expression (threshold) range Fit forests of binary trees, look at top trees, predictions of samples held-out

Tree Models for Survival Analysis

Prospective Tree Models - Recurrence in Breast Cancer - Survival tree models Prospective Clinical + metagene predictors Survival distributions in subgroups

Challenge: Dissect Heterogeneity Cancer is fundamentally a heterogeneous disease Understanding and dissecting the heterogeneity is key to prognosis and treatment

Clinical Predictors: Low Resolution Phenotypes Lymph node involvement Hormone receptor status Tumor size Nuclear grade Visual/path features

Refining Phenotype: Adding Molecular Information Analysis of the tumor Gene expression DNA copy status (CGH) DNA methylation Protein profiles Metabolic profiles Analysis of the host Genotypes Serum protein profiles Serum metabolic profiles Serum gene expression profiles?

Metagenes are Predictive of Breast Cancer Recurrence Poor survival Good survival Tumors It s genomic, but group prediction, yet another biomarker

A 70 Gene Predictor van de Vijver et al., N. Engl. J. Med. 347: 1999 (2002) Note the heterogeneity of the poor prognosis group

Moving Gene Expression Profiling to the Clinic

Personalised Prediction - Dissecting Heterogeneity - Poor survival Good survival Tumors

Dissecting Heterogeneity Using Multiple Metagenes Mg440 Mg408 Mg109

Subtyping by Multiple Metagenes - Can Improve Risk Discrimination -

Predictive Survival Tree Models (Pittman et al, PNAS 2004) Leaf j: Y ~ Weib(µ j, α) Hierarchical (gamma) priors: p(µ j m) = Ga(µ j c,c/m j ) - mean m j has hyperprior - m j similar for sibling leaves - shrinkage/data sharing Marginal likelihood of tree: p(y Tree,α)=Π leaves j p(y j, α)

Growing & Using Survival Trees Node splits: assess candidate predictor:threshold pairs 2 Weibulls vs 1? Bayes test Multiple trees: Multiple splits at any node Inference: 1 tree Empirical Bayes for m j, α Pareto predictions in leaf Prediction with multiple trees: Average across trees Model uncertainty Simulation (Pittman et al, PNAS 2004)

Forests of Metagene Trees

Clinico-Genomics for Modifying and Refining Risk Stratification 0-3 nodes, high Mg440 Genomic data identifies higher risk patients within clinical low risk group Treatment decisions? Clinical trials? 0-3 nodes 0-3 nodes, low Mg440 4+ nodes

Forests of Clinico-Genomic Trees Select from full sets of clinical and genomics potential predictor variables multiple trees variable subset combinations co-occurrence

Personalised Prognosis: Prediction of Individual Outcomes Patient 158: Tumor size 1.5cm LN=1 ER- Mg440= 0.25 Mg20= -.04 etc

Predictions: Uncertainty & Assessment Out-of-sample cross validation Recur+ Recur-

Personalised Prognosis

Biology behind Metagenes Erb-b2/Her-2 nu metagene

Gene Identification - Biological Signatures - Erb-B2/Her2-nu metagene Several ER metagenes Recurrence metagenes: Myc oncogene, E2Fs Lymph node & recurrence predictors - metagenes predictive of lymph node status surrogate for node counts - BRCA1, p27 - Interferon inducible genes

Some Current Related Studies Validation studies Combining data - Multi-center studies Ethnic variation in genomic patterns Treatment trials - Improved risk stratification Treatment response predictors Patterns related to genes involved in cell proliferation - > improved biological metagene characterisations > refined subtyping : Myc, Ras, Rb/E2F,, metagenes (Nature Genetics 2003, Cancer Research 2003)

Signaling Pathways That Control Cell Proliferation Huang et al, Nature Genetics 2003 Rb+/+ Rb-/-

An Rb Metagene Predicts Loss of Rb Function in Tumors Human Rb+/- Rb-/- Rb-/- Retinoblastoma Mouse Rb+/- Pituitary tumors Thyroid tumors Black et al, Cancer Research 2003

Potential for Improved Characterisation and Prediction 158 52 Mg440 61 97 28 62 Oncogene pattern ER 45 16 29 68 13 56 36 69 reflecting multiple interacting aspects of tumour genomics and also potential to aid in identification of therapeutic targets

Pathway Predictions and MetaPathways E2F1 E2F2 E2F3 Rb Ras Myc Tumor

Making Use of Pathway Predictions E2F1 E2F2 E2F3 Rb Ras Myc Tumor

Using Pathway Information to Guide Treatment? Metastatic Breast Cancer Patients Biopsy of metastastic tumour Treat with Drug A Response Non-response Pathway Prediction -Ras Pathway Specific Drug

Current Foci - Statistics & Computation - Model refinements smooth partition trees; aggregation of predictors Stochastic search MCMC analysis, and search & annealing to explore space of candidate trees, and posterior over trees Cluster/distributed computing Defining and evaluating patterns improved metagene models: factor methods, large-scale graphical models, association network

Large-scale graphical model search and evaluation Inference on large, sparse inverse covariance matrix (Dobra et al, JMVA 2004)

Duke University Joseph Nevins Erich Huang Jennifer Pitman Holly Dressman Adrian Dobra Penni Black Quanli Wang Guang Yao Chris Hans Carlos Carvalho Molecular Genetics Statistics Cancer Genomics Computational & Applied Genomics Program Koo Foundation-Sun Yat Sen Cancer Center Andrew Huang, Skye Cheng, Mei-Hua Tsou