
BREAST CANCER EPIDEMIOLOGY MODEL: Calibrating Simulations via Optimization
Michael C. Ferris, Geng Deng, Dennis G. Fryback, Vipat Kuruchittham
University of Wisconsin

University of Wisconsin Breast Cancer Simulation Model

Purpose: use a detailed, individual-woman-level discrete event simulation of the processes of
- breast cancer natural history
- breast cancer detection
- breast cancer treatment
- non-breast cancer mortality among US women
to replicate historical breast cancer surveillance data, 1975-2000.

How to screen even more!

History of Breast Cancer Incidence, 1975-2000

[Figure: stage-specific incidence per 1,000 population (In Situ, Localized, Regional, Distant), SEER vs. WCRS rates, 1975-1995.]

Natural History

- Breast cancer occult onset rate, proportional to background incidence but lagged in time
- Gompertzian growth from 2 mm to 8 cm
- Gamma distribution of initial growth rates
- Fit the mean and variance of this distribution
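A minimal Python sketch of this growth component: Gompertzian growth of tumor diameter from 2 mm toward an 8 cm asymptote, with initial growth rates drawn from a gamma distribution. The diameter bounds come from the slide; the rate values and the particular Gompertz parameterization are illustrative assumptions, not the calibrated Wisconsin model.

```python
import numpy as np

rng = np.random.default_rng(0)
D0, DMAX = 0.2, 8.0                      # occult onset at 2 mm; asymptote 8 cm

def gompertz_diameter(t, rate, d0=D0, dmax=DMAX):
    """Tumor diameter (cm) after t years of Gompertzian growth."""
    return dmax * np.exp(np.log(d0 / dmax) * np.exp(-rate * t))

# The fitted quantities are the mean and variance of the rate distribution;
# the values below are hypothetical, not the calibrated ones.
mean_g, var_g = 0.12, 0.012
shape, scale = mean_g**2 / var_g, var_g / mean_g
for r in rng.gamma(shape, scale, size=3):
    print(f"rate {r:.3f}: {gompertz_diameter(5.0, r):.2f} cm after 5 years")
```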

Breast Cancer Detection

- Screening sensitivity: probability of detection at screening, as a function of
  - size of the tumor
  - age of the woman (<50 yr, 50+ yr)
  - year in which she is being screened
- Non-screen detection ("clinical surfacing"): annual probability, as a function of
  - size of the tumor
  - year (1975-2000)
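As a rough illustration of how such a sensitivity function might look, the sketch below uses a logistic curve in tumor size with age-group and calendar-year shifts. The functional form and every coefficient are assumptions made for illustration; the model's actual sensitivity curves are empirical inputs.

```python
import math

def screen_sensitivity(size_cm, age_50_plus, year):
    z = -2.0 + 1.2 * size_cm             # larger tumors are easier to detect
    z += 0.4 if age_50_plus else 0.0     # higher sensitivity in women 50+
    z += 0.05 * (year - 1984)            # mammography improved over time
    return 1.0 / (1.0 + math.exp(-z))

print(screen_sensitivity(1.0, True, 2000))   # ~0.60 for a 1 cm tumor
```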

Mammogram Sensitivity / Clinical Surfacing

[Figure: left panel, screening detection probability vs. tumor size (cm) for women <50 and 50+ in 1984 and 2000; right panel, annual clinical-surfacing probability vs. size for 1990 and 2000.]

Non-natural-history parameters taken as fixed

- Mammography dissemination model
  - by year of simulation (1975-2000)
  - by age of simulated woman
- Breast cancer incidence in the absence of screening
  - age-period-cohort model incidence is a fixed input
  - calibrate occult onset to this
  - calibrate an average lag in time between onset and incidence
- Non-breast cancer mortality

The Problem

- Determine values for the free parameters that make the simulation closely track the observations (through 2000): optimize
- Develop a model of where the good parameters are: approximate
- Develop biological insight by understanding the model of good parameters: inference

Acceptance Sampling used to fit model parameters

1. Specify the simulation model as a function of randomly sampled free parameters.
2. Compute the implied 25-year incidence and mortality curves.
3. Determine fit to the observed data using acceptance envelopes.
Repeat this tens of thousands of times to find the best-fitting parameters (using large-scale parallel computing).
Acceptance sampling also produces a joint posterior distribution for all sampled parameters.
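A minimal sketch of this loop in Python. The helpers sample_parameters, run_simulation, and envelope_score are hypothetical stand-ins for the Wisconsin model's components, and the envelope bounds are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_parameters():
    # Draw each free parameter uniformly from its allowed range.
    return {"onsetprop": rng.uniform(0.8, 1.0),
            "meangamma": rng.uniform(0.1, 0.18)}

def run_simulation(v):
    # Placeholder for the discrete event simulation: would return the
    # implied stage-specific incidence curves for 1975-2000.
    return rng.normal(size=(4, 26))      # 4 stages x 26 years

def envelope_score(curves, lo=-2.0, hi=2.0):
    # f(v): number of curve points falling outside the acceptance envelope.
    return int(np.sum((curves < lo) | (curves > hi)))

accepted = []                            # accepted draws approximate the
for _ in range(10_000):                  # joint posterior over parameters
    v = sample_parameters()
    if envelope_score(run_simulation(v)) <= 10:   # "good" score threshold
        accepted.append(v)
```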

Acceptance Envelopes

Upper and lower bounds are placed around the SEER and WCRS stage-specific incidence rate curves to screen for acceptable model-generated curves.
Criteria for acceptance envelopes:
1. Envelops the SEER rates and most of the WCRS rates (biased toward SEER)
2. Penalizes simulated curves that lack the inflections representing stage-specific characteristics:
   - in situ and localized stages must be increasing (no downturn)
   - flattening of the regional and distant stages over time
3. Width set to encompass 95% of the variation expected in the rates given the size of the population simulated

Envelopes to Score Incidence Fit

[Figure: acceptance envelopes around the SEER and WCRS incidence rates per 1,000 for the In Situ, Localized, Regional, and Distant stages, 1975-1995.]

Objective function

Given one replication with fixed input parameters, f(v) is the count of the number of points at which the model output falls outside the envelope:
- worst score = 104
- good score ≤ 10
- best score = 0
f(v) is really a discrete-valued rank function; we are interested in its level sets.

Total sampled input parameters governing incidence of breast cancer:
- natural history: 10
- sensitivity of mammography: 4-16
- annual surfacing: 2-12
Effectively something like 30 parameters.

10 Parameters sampled

Parameter                                      Wide ranges (grid)      Focused ranges (grid)   Final value
1. Limited Malignant Potential (LMP) Fraction  [0% - 55%] (1%)         [30% - 50%] (1%)        42%
2. InSituBoundary                              [.75 - 1.0] (.01)       [.85 - .99] (.01)       .95 cm
3. Max LMP Size                                [InSitu - 1.5 cm] (.1)  [InSitu - 1 cm]         1 cm
4. LMP Dwell Time                              [1 - 3] (.5)            [1.5 - 2.5] (.5)        2 y
5. Onset Proportion                            [.85 - 1.2] (.01)       [.8 - 1.0] (.01)        .9
6. Onset Lag                                   [1 - 8] (.5)            [1.5 - 4] (.5)          3 y
7. Percent 4 nodes                             [0 - 50%] (1%)          [0 - 10%] (1%)          10%
8. Percent 5 nodes                             [0 - 50%] (1%)          [20 - 40%] (1%)         20%
9. Mean Gamma                                  [.1 - .2] (.01)         [.08 - .18] (.01)       .12
10. Var Gamma                                  [.06 - .1] (.01)        [.01 - .05] (.01)       .012

Feasible Point Generation

- 50,000 points (v) generated uniformly at random
- Using CONDOR (12 machines), we can evaluate approximately 1,000 per day
- f(v) involves simulation of 3 million women
- 363 are feasible points (in L(10))
- Can we optimize? Can we characterize these points?

Finding Feasible Points

First sample from wide ranges on all parameters in the model, holding LMP% ≤ 10%. Result of 29,000 sampled input parameter vectors:
[Histogram: scores for LMP 0-10% (wide ranges).]
Then sample 43,000 vectors with 10% < LMP% < 55%:
[Histogram: scores for LMP 11-55% (wide ranges).]
Conclude: good solutions (score ≤ 10) are very rare.
Conclude: rule out LMP% < 10%.

Now constrain 30% < LMP < 50%, hold the other parameters in more focused ranges around our best solution, and sample 30,000 new parameter vectors.
[Histogram: scores for LMP 30-50% (focused ranges).]
The result is mostly poor solutions, but 363 parameter vectors have good scores.
Conclude: good solutions are rare.
[Histogram: distribution of scores 1-10 (focused ranges), with the best solutions marked.]
Question: do good solutions occur in relatively compact regions of parameter space?

Study 2) Neighborhoods around Good and Poor Solutions

Process: pick a starting input vector and sample 500 new input parameter sets in a tight neighborhood around it, to see whether there is a good solution nearby.
[Figure: score histograms for neighborhoods around three good solutions (initial scores 2, 5, 8) and three poor solutions (initial scores 17, 20, 30).]
Conclusion: when we start with a bad input parameter set and vary parameters around it, we cannot find a good solution. Good solutions appear around good solutions, but are still rare.

Stochastic Variation

- Given fixed input parameters, there is still stochastic variation in breast cancer incidence and mortality
- Each replication is one alternative history of breast cancer incidence and mortality over the years 1975-2000 for a population the size of Wisconsin's
- Score each replication
- Multiple replications reduce the number of feasible points
- Get a histogram of scores
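A sketch of this replication check, with a hypothetical simulate_score standing in for one scored run of the simulator:

```python
import numpy as np

def simulate_score(v, seed):
    # Toy stand-in: one alternative history scored against the envelopes.
    return int(np.random.default_rng(seed).poisson(3.0))

v_best = {"onsetprop": 0.9}              # placeholder for best-fit parameters
scores = [simulate_score(v_best, s) for s in range(30)]
print(np.bincount(scores))               # histogram of scores across reps
```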

30 replications of best model (95% intervals)

[Figure: 95% intervals across replications for In Situ, Localized, Regional, and Distant incidence per 1,000 and for breast cancer mortality, 1975-2000, plus a histogram of acceptance scores.]

Process

- Split the data into training and testing sets
- Solve an optimization problem over the training set to generate a classifier
- Validate using the testing set
- Classifiers (sketched below):
  - SVM
  - kNN (nearest neighbour)
  - C4.5 (decision tree)
  - Boosting (multiple classifiers, then vote)
- Only simulate points classified as good
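A sketch of this split/train/validate process with the four classifier families named above. scikit-learn's DecisionTreeClassifier stands in for C4.5, and X, y are synthetic stand-ins for (parameter vector, good/bad) pairs.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 10))                          # parameter vectors
y = (np.sum((X - 0.5) ** 2, axis=1) < 0.5).astype(int)    # toy "good" region

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("SVM", SVC()),
                  ("kNN", KNeighborsClassifier()),
                  ("C4.5-style tree", DecisionTreeClassifier()),
                  ("boosting", AdaBoostClassifier())]:
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))    # validate on the held-out set
```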

Linear Classifier

[Figure: linear separating plane vᵀw = γ, with bounding planes vᵀw = γ + 1 and vᵀw = γ − 1 separating the +1 and −1 classes.]


Classifier

- Kernel generates a nonlinear separator
- Linear, polynomial, Gaussian

Nonlinear Support Vector Machine

- Trade off errors and margin
- Can interpret this as columns of A indexing basis functions (reduced SVM)
- Feature selection removes columns of A; can be formulated as a (non)linear MIP
- Large-scale (dense) optimization
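A small sketch comparing the three kernels named above (rbf = Gaussian) on synthetic stand-in data; C is the parameter trading margin width against training errors. The data and parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
A = rng.uniform(size=(600, 10))                           # parameter vectors
y = np.where(np.sum((A - 0.5) ** 2, axis=1) < 0.7, 1, -1)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=10.0).fit(A, y)
    print(kernel, clf.score(A, y))        # nonlinear kernels fit the region
```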

Classifier Evaluation

- Imbalanced data: many more negative points
- Accuracy is great by classifying all points as negative!
- More effective to use the TP and TN rates
- Can generate classifiers with blue or red validations

            predicted +   predicted -
actual +        TP            FN
actual -        FP            TN
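A worked example of the point above: with heavy imbalance, the all-negative classifier looks accurate while finding nothing. The counts here are made up for illustration.

```python
import numpy as np

y_true = np.array([1] * 30 + [0] * 970)      # 30 good points among 1,000
y_pred = np.zeros_like(y_true)               # classify everything negative

print("accuracy:", (y_true == y_pred).mean())          # 0.97, looks great
print("TP rate:", y_pred[y_true == 1].mean())          # 0.0, every + missed
print("TN rate:", (y_pred[y_true == 0] == 0).mean())   # 1.0
```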

Cutting planes (2-d projections)

[Figure: scatter plots of meangamma vs. vargamma and lag vs. onsetprop, with cutting planes shown.]

Training/Testing set refinement

- TP = 32, TN large!
- One-sided sampling (Kubat, Matwin):
  - generate a consistent subset of TN (C) using 1-NN (from a random initial set)
  - recursively perform Tomek reduction until C is suitable
- Resampling with replacement: increase the size of TP using score-weighted probabilities
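A sketch of the negative-set reduction step under these ideas: keep all positives and drop negatives that form Tomek links with them. The link test below (mutual nearest opposite-class neighbours, Euclidean distance) is a simplification of the textbook definition, and X_pos and X_neg are synthetic stand-ins.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def tomek_reduce(X_pos, X_neg):
    nn_neg = NearestNeighbors(n_neighbors=1).fit(X_neg)
    nn_pos = NearestNeighbors(n_neighbors=1).fit(X_pos)
    _, j_of_i = nn_neg.kneighbors(X_pos)   # nearest negative to each positive
    _, i_of_j = nn_pos.kneighbors(X_neg)   # nearest positive to each negative
    # Negative j and positive i are "linked" if each is the other's nearest
    # opposite-class neighbour; such borderline negatives are removed.
    linked = {int(j) for i, j in enumerate(j_of_i.ravel())
              if int(i_of_j[int(j), 0]) == i}
    keep = [j for j in range(len(X_neg)) if j not in linked]
    return X_neg[keep]

rng = np.random.default_rng(0)
X_pos = rng.normal(0.0, 1.0, (32, 10))       # few true positives
X_neg = rng.normal(1.0, 1.0, (2000, 10))     # many negatives
print(X_neg.shape[0], "->", tomek_reduce(X_pos, X_neg).shape[0])
```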

Accuracy (TP/TN)

[Figure: classifier accuracy results (TP/TN rates).]

Generation method

- Generate samples uniformly at random (potential values for v)
- Use naïve cutting planes from the data to remove very poor samples
- Sequentially generate classifiers using one-sided sampling with high estimated TP
- Remove negative points from the sample: some positive points are removed, many more negative points are removed (864 remaining, then 788 remaining)

Reality check

- Bad news: using the experts' values for the remaining 20 parameters, only 2% of these evaluate as P+
- Good news: using a Bayesian estimate with the 363 positives as a prior for the remaining 20 parameters gives 65% as P+
- Experts agree that the Bayes-estimate values are good

Results (II)

- Run one more classifier that has TP accuracy of .4 and TN accuracy of .9
- Maybe sacrifice some +'s
- Of the resulting 220 points, 195 are P+
- Open research: do feature selection to determine which of the 30 parameters are the most important

Stochastic function values

- New dataset with 10 replications at points with scores ≤ 30
- Update the classifiers to use the replication data
- Far fewer points in TP
- The generation process results in new points (all are good), 2 of which seem better than the experts' best solution
- Utilize a derivative-free optimization code or response-surface methodology to really optimize

Surrogate optimization

- Use the DACE toolbox to generate an interpolant
- Smoothness depends on the number of training values (use the maximum available)
- Use the Nelder-Mead simplex method, or a nonlinear linesearch method, to optimize this surrogate
- Each new point generated is evaluated 30 times
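A sketch of this surrogate step: fit a kriging-style interpolant to evaluated (parameters, score) pairs, then polish with Nelder-Mead. scikit-learn's Gaussian process regressor substitutes here for the DACE (MATLAB kriging) toolbox, and X, y are toy stand-ins for the replicated simulation scores.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0.8, 1.0, size=(200, 2))     # e.g. two sampled parameters
y = 100.0 * np.sum((X - 0.9) ** 2, axis=1)   # toy scores; lower is better

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.05)).fit(X, y)

res = minimize(lambda v: float(surrogate.predict(v.reshape(1, -1))[0]),
               x0=X[np.argmin(y)],           # start from best sampled point
               method="Nelder-Mead")
print(res.x)    # candidate parameters, to be re-evaluated by replication
```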

What objective values for DACE? Define the objective as:

                                        Average of max   Average of mean
(trained by max of reps)                    5.25             3.62
(trained by mean of reps)                   5.45             3.17
(trained by max + .1 mean)                  5.35             3.42
(first trained by max, second by mean)      5.37             3.43

Function domain/grid

Parameter      Min     Max     Min grid size   Range
onsetprop      0.8     1.0     0.01            0.2
meangamma      0.1     0.18    0.01            0.08
vargamma       0.01    0.045   0.001           0.035
boundinsitu    0.85    0.99    0.01            0.14
LMPFract       0.3     0.5     0.01            0.2
aggr4node      0       0.1     0.01            0.1
Aggr5Node      0.2     0.4     0.01            0.2
lag            1.5     4.0     0.5             2.5
LMPRegress     1.5     2.5     0.5             1.0

Point/refinement tradeoff

[Figure: score histograms trading off the number of sampled points against surrogate refinements: 240 points (100 refinements) vs. 150 points (190 refinements).]

Representation

- Describe the feasible set: can generate more members of the set
- The set is not connected; understand the biological significance of each piece
- Does a representation of the feasible set lead to a fuller understanding of flaws in the simulation?

Feasible level set

[Figure: units of the feasible level set shown as boxes [d1, d2] in projected parameter coordinates.]

Island quality

- Island 1: 97/98 points have scores ≤ 10
- Island 2: 2/49 points have scores ≤ 10
- Island 3: 97/107 points have scores ≤ 10
- Island 4: 59/69 points have scores ≤ 10
- Island 5: 59/61 points have scores ≤ 10

Conclusions

- Effectively generate parameter settings for complex simulations
- Feasible sets may form disconnected "islands" in parameter space, indicating possible biological switches governing the behavior of the system, or other complex system behaviors of interest
- Optimization improves understanding; imbalanced datasets are widespread

Key issues

- Stochastic (expensive) function values
- Derivatives not (easily) computable
- Classifiers applied to unequally sized data sets
- User-defined objective without smoothness/connectivity properties
- Approximation in good regions

Implications

- In 2000, 44% of in situ breast carcinomas diagnosed were LMP
- 31% of small localized invasive breast cancers were LMP
- 3% of all breast cancer survivors alive were LMP
- These are women who are being treated for a pseudo-disease
- If substantiated, work is needed to find a biological marker for LMP to avoid overtreating