Knowledge Discovery and Data Mining. Lecture 15 - ROC, AUC & Lift. Tom Kelsey. Notes


Knowledge Discovery and Data Mining
Lecture 15 - ROC, AUC & Lift

Tom Kelsey
School of Computer Science, University of St Andrews
http://tom.home.cs.st-andrews.ac.uk
twk@st-andrews.ac.uk

Tom Kelsey ID5059-17-AUC 13 March 2015

Testing

A useful tool for investigating model performance is the confusion matrix:

            y = 0   y = 1
    ŷ = 0     a       b
    ŷ = 1     c       d

It contains the counts for correct prediction of class 0 (a), correct prediction of class 1 (d), and the two ways a prediction can be wrong: false negatives (b) and false positives (c).

Performance Measures

    Accuracy                        = (a + d) / (a + b + c + d)
    Precision                       = d / (c + d)
    Recall (TP rate) / Sensitivity  = d / (b + d)
    TN rate / Specificity           = a / (a + c)
    FP rate                         = c / (a + c)
    FN rate                         = b / (b + d)
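These measures follow directly from the four cell counts. A minimal Python sketch, using the standard definitions for the matrix convention above; the counts a, b, c, d are made up for illustration:

```python
# Performance measures from the four confusion-matrix counts.
# a = true negatives, b = false negatives, c = false positives,
# d = true positives; the counts below are made up for illustration.
a, b, c, d = 50, 10, 5, 35

accuracy    = (a + d) / (a + b + c + d)   # fraction of all predictions correct
precision   = d / (c + d)                 # fraction of positive predictions correct
recall      = d / (b + d)                 # sensitivity / TP rate
specificity = a / (a + c)                 # TN rate
fp_rate     = c / (a + c)                 # 1 - specificity
fn_rate     = b / (b + d)                 # 1 - recall
```

Note that the FP rate is 1 - specificity, which is why ROC charts can equivalently be read as sensitivity against (1 - specificity).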

Receiver-Operator Characteristics

ROC curves apply when continuous data admit variable cutoff points for the classification, e.g. obese Y/N based on BMI, age, etc., or cancerous based on the percentage of abnormal tissue in a slide.

Given a tree, some test data and a confusion matrix, it is easy to generate a point on a ROC chart: the x-axis is the FP rate, the y-axis is the TP rate. This point depends on a probability threshold for the classification. Varying the threshold changes the confusion matrix, giving more points on the chart. Use these to tune the model w.r.t. FP and TP rates.

Example

Goldstein and Mushlin, J. Gen. Intern. Med. (1987) 2, 20-24.

[Example slides: ROC analysis of the Goldstein and Mushlin data, not reproduced here.]
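The threshold-sweeping procedure can be sketched in Python; the scores and labels below are made up for illustration (they are not the Goldstein and Mushlin data):

```python
# Trace ROC points by varying the classification threshold.
# Scores and labels are made up for illustration.
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,    0,   0,   1,   0]   # 1 = positive class

P = sum(labels)              # number of actual positives
N = len(labels) - P          # number of actual negatives

def roc_point(threshold):
    """(FP rate, TP rate) when predicting positive for score > threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    return fp / N, tp / P

# Lowering the threshold moves the point up and to the right,
# from (0, 0) at threshold 1.0 towards (1, 1) at threshold 0.
points = sorted(roc_point(t) for t in [1.0, 0.75, 0.45, 0.0])
```

Sweeping every distinct score as a threshold and sorting the resulting points by FP rate yields the full ROC polyline.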

[Further example slides, not reproduced here.]

Effect of Thresholding

How the balance between TP, TN, FP and FN changes as the threshold varies:

[Figure not reproduced here.]
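The changing balance can be tabulated directly; a small sketch on made-up scores and labels:

```python
# How TP, TN, FP and FN trade off as the threshold moves.
# Scores and labels are made up for illustration.
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,    0,   0,   1,   0]

def counts(threshold):
    """(TP, FP, FN, TN) when predicting positive for score > threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 0)
    return tp, fp, fn, tn

# Raising the threshold converts false positives into true negatives,
# but also converts true positives into false negatives.
for t in (0.25, 0.45, 0.75):
    print(t, counts(t))
```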

Area Under Curve

The area measures discrimination: the ability of the test to classify correctly. It is useful for comparing ROC curves. The standard academic banding is:

    0.90 - 1.00 = excellent
    0.80 - 0.90 = good (0.86 for the example)
    0.70 - 0.80 = fair
    0.60 - 0.70 = poor
    0.50 - 0.60 = fail

The area is computed by trapezoidal estimates (or the curve can be smoothed, then integrated).

Kelsey et al.

[Slides showing ROC curves and AUC values from Kelsey et al., not reproduced here.]
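The trapezoidal estimate is a few lines of Python; the ROC points below are illustrative, not from the example:

```python
# Trapezoidal estimate of the area under a ROC curve, given a list of
# (fp_rate, tp_rate) points sorted by fp_rate. The points are illustrative.
def auc_trapezoid(points):
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0   # trapezoid between neighbours
    return area

roc = [(0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0)]
auc = auc_trapezoid(roc)   # ≈ 0.825
```

As a sanity check, the diagonal no-discrimination line from (0, 0) to (1, 1) gives an area of exactly 0.5.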


The Case For

S. Ma & J. Huang, "Regularized ROC method for disease classification and biomarker selection with microarray data", Bioinformatics (2005) 21(24):

"An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC technique has been widely used in disease classification with low-dimensional biomarkers because (1) it does not assume a parametric form of the class probability, as required for example in the logistic regression method; (2) it accommodates case-control designs; and (3) it allows treating false positives and false negatives differently. However, due to computational difficulties, ROC-based classification has not been used with microarray data."

The Case Against

J.M. Lobo et al., "AUC: a misleading measure of the performance of predictive distribution models", Global Ecology and Biogeography (2008) 17(2):

"The AUC is currently considered to be the standard method to assess the accuracy of predictive distribution models. It avoids the supposed subjectivity in the threshold selection process, when continuous probability derived scores are converted to a binary presence-absence variable, by summarizing overall model performance over all possible thresholds... We do not recommend using AUC for five reasons: (1) it ignores the predicted probability values and the goodness-of-fit of the model; (2) it summarises the test performance over regions of the ROC space in which one would rarely operate; (3) it weights omission and commission errors equally; (4) it does not give information about the spatial distribution of model errors; and, most importantly, (5) the total extent to which models are carried out highly influences the rate of well-predicted absences and the AUC scores."

Lift

Lift measures the degree to which the predictions of a classification model are better than random predictions. In simple terms, lift is the ratio of the correct positive classifications made by the model to the actual positive classifications in the test data. For example, if 40% of patients have been diagnosed (the positive classification) in the past, and the model accurately predicts 75% of them, the lift would be 0.75 / 0.4 = 1.875.
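One common formulation computes lift as precision over the baseline positive rate. A sketch on hypothetical labels and predictions, chosen to reproduce the 0.75 / 0.4 arithmetic above:

```python
# Lift as precision over the baseline positive rate, on hypothetical data
# chosen to reproduce the 0.75 / 0.4 = 1.875 arithmetic in the text.
labels      = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]   # 40% actual positives
predictions = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]   # the model's calls

hits = [y for y, p in zip(labels, predictions) if p == 1]
precision = sum(hits) / len(hits)          # 3 of 4 positive calls correct
base_rate = sum(labels) / len(labels)      # 0.4

lift = precision / base_rate               # ≈ 1.875
```

A lift of 1 means the model does no better than guessing at the base rate; anything above 1 is value added by the model.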

Lift Charts

Lift charts for a model can be obtained in a similar manner to ROC charts. For threshold value t:

    x = (TP(t) + FP(t)) / (P + N)
    y = TP(t)

The AUC of a lift chart is no smaller than the AUC of the ROC curve for the same model. As before, we can compare lift charts for competing models, and investigate optimal threshold values.

Lift Example

Suppose we have a mailing list of former students, and we want to raise money by mailing an elaborate brochure. We have demographic information that we can relate to the response rate. From similar mail-out campaigns, we have estimated the baseline response rate at 8%; sending to everyone would result in a net loss. We build a model based on the data collected and select the 10% most likely to respond. If among these the response rate is 16%, then the lift value due to using the predictive model is 16% / 8% = 2. Analogous lift values can be computed for each percentile of the population, and from these we work out the best trade-off between expense and anticipated response.

General Chart Structure

You can think of this as a customer database ordered by predicted probability: as we move from left to right we penetrate deeper into the database, from high p̂ observations to low p̂ observations.

[Chart not reproduced here.]
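The percentile-by-percentile calculation can be sketched as follows; the scores, responses, and the helper `lift_at` are made up for illustration, not from the lecture:

```python
# Per-fraction lift: order customers by predicted probability, then compare
# the response rate in the top fraction with the baseline rate.
# Scores, responses and lift_at are made up for illustration.
scores    = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
responded = [1,    1,   0,   1,   0,   0,   0,   1,   0,   0]

baseline = sum(responded) / len(responded)   # overall response rate, 0.4

def lift_at(fraction):
    """Lift when mailing only the top `fraction` of customers by score."""
    k = int(len(scores) * fraction)
    top = responded[:k]                      # lists are already score-ordered
    return (sum(top) / k) / baseline

# Mailing the top 20% here gives lift 2.5; mailing everyone gives
# lift 1 by construction, since the top 100% is just the baseline.
```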

Lift and the Pareto Principle

Lift is closely associated with the Pareto Principle: 80% of profit comes from 20% of customers. A good model and a lift chart help identify those customers.

Why Use These Plots?

The utility of these charts is hopefully clear:

    - If we had a limited budget, we can see what level of response this would buy by targeting the (modelled) most likely responders.
    - We can see how much value our model has brought to the problem (compared to a random sample of customers) - in direct monetary terms if costs are included.
    - Perhaps we can do a smaller campaign, as the returns diminish beyond some percentage of customers targeted.
    - We can see where a level of customer targeting becomes unprofitable, if the costs are known.

Summary

Medics and management use ROC, AUC & Lift whenever possible:

    - Easy to compute.
    - Easy to understand: a simple 2D graphical expression of how Model A compares to Model B.
    - Plus useful threshold cutoff information.
    - Plus important cost-benefit information.

You are expected to be able to produce ROC curves. You are not expected to be able to produce lift charts, but you should be able to explain their design and use.