Receiver operating characteristic


From Wikipedia, the free encyclopedia

In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot of the sensitivity, or true positive rate, vs. (1 − specificity), or false positive rate, for a binary classifier system as its discrimination threshold is varied. The ROC can also be represented equivalently by plotting the fraction of true positives (TPR = true positive rate) vs. the fraction of false positives (FPR = false positive rate). It is also known as a relative operating characteristic curve, because it compares two operating characteristics (TPR and FPR) as the criterion changes. [1]

ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution. ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.

The ROC curve was first developed by electrical engineers and radar engineers during World War II for detecting enemy objects in battlefields, a body of work also known as signal detection theory, and was soon introduced in psychology to account for perceptual detection of signals. ROC analysis has since been used in medicine, radiology, and other areas for many decades, and it has been introduced relatively recently in other areas such as machine learning and data mining.

[Figure: ROC curve of three epitope predictors.]

Contents
1 Basic concept
2 ROC space
3 Curves in ROC space
4 Further interpretations
5 History
6 See also
7 References
7.1 General references
8 Further reading
9 External links

Basic concept

See also: Type I and type II errors

A classification model (classifier or diagnosis) is a mapping of instances onto predicted classes or groups. The classifier or diagnosis result can be a real value (continuous output), in which case the boundary between classes must be determined by a threshold value (for instance, to determine whether a person has hypertension based on a blood pressure measurement), or it can be a discrete class label indicating one of the classes.

Let us consider a two-class prediction problem (binary classification), in which the outcomes are labeled either as positive (p) or negative (n). There are four possible outcomes from a binary classifier. If the outcome from a prediction is p and the actual value is also p, then it is called a true positive (TP); however, if the actual value is n, then it is said to be a false positive (FP). Conversely, a true negative (TN) has occurred when both the prediction outcome and the actual value are n, and a false negative (FN) is when the prediction outcome is n while the actual value is p.

For an appropriate example from a real-world problem, consider a diagnostic test that seeks to determine whether a person has a certain disease. A false positive in this case occurs when the person tests positive but actually does not have the disease. A false negative, on the other hand, occurs when the person tests negative, suggesting they are healthy, when they actually do have the disease.

Let us define an experiment from P positive instances and N negative instances. The four outcomes can be formulated in a 2×2 contingency table or confusion matrix, as follows:

                               actual value
                               p                  n                  total
prediction      p'             True Positive      False Positive     P'
outcome         n'             False Negative     True Negative      N'
                total          P                  N

Terminology and derivations from a confusion matrix:

- true positive (TP): eqv. with hit
- true negative (TN): eqv. with correct rejection
- false positive (FP): eqv. with false alarm, Type I error
- false negative (FN): eqv. with miss, Type II error
- sensitivity or true positive rate (TPR): eqv. with hit rate, recall; TPR = TP / P = TP / (TP + FN)
- false positive rate (FPR): eqv. with false alarm rate, fall-out; FPR = FP / N = FP / (FP + TN)
- accuracy (ACC): ACC = (TP + TN) / (P + N)
- specificity (SPC) or true negative rate: SPC = TN / N = TN / (FP + TN) = 1 − FPR
- positive predictive value (PPV): eqv. with precision; PPV = TP / (TP + FP)
- negative predictive value (NPV): NPV = TN / (TN + FN)
- false discovery rate (FDR): FDR = FP / (FP + TP)
- Matthews correlation coefficient (MCC): MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))
- F1 score: F1 = 2TP / (P + P') = 2TP / (2TP + FP + FN)

Source: Fawcett (2006).
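To make these derivations concrete, here is a minimal Python sketch (our illustration, not part of the original article; the function name confusion_metrics is our own) that computes the metrics above from the four cells of a confusion matrix:

import math

def confusion_metrics(tp, fp, fn, tn):
    """Derive the evaluation metrics listed above from a 2x2 confusion matrix."""
    p = tp + fn   # actual positives (P)
    n = fp + tn   # actual negatives (N)
    return {
        "TPR": tp / p,                      # sensitivity, hit rate, recall
        "FPR": fp / n,                      # fall-out, false alarm rate
        "SPC": tn / n,                      # specificity = 1 - FPR
        "ACC": (tp + tn) / (p + n),         # accuracy
        "PPV": tp / (tp + fp),              # precision
        "NPV": tn / (tn + fn),              # negative predictive value
        "FDR": fp / (fp + tp),              # false discovery rate
        "MCC": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "F1": 2 * tp / (2 * tp + fp + fn),  # equals 2TP / (P + P')
    }

# Example: result A from the ROC space section below (TP=63, FP=28, FN=37, TN=72)
print(confusion_metrics(63, 28, 37, 72))   # TPR=0.63, FPR=0.28, ACC=0.68, ...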

ROC space

The contingency table can derive several evaluation "metrics" (see the list above). To draw an ROC curve, only the true positive rate (TPR) and false positive rate (FPR) are needed. TPR measures how well a classifier or diagnostic test classifies positive instances among all positive samples available during the test. FPR, on the other hand, measures how many incorrect positive results occur among all negative samples available during the test.

An ROC space is defined by FPR and TPR as x and y axes, respectively, and depicts the relative trade-off between true positives (benefits) and false positives (costs). Since TPR is equivalent to sensitivity and FPR is equal to 1 − specificity, the ROC graph is sometimes called the sensitivity vs. (1 − specificity) plot. Each prediction result, or one instance of a confusion matrix, represents one point in the ROC space.

The best possible prediction method would yield a point in the upper left corner, at coordinate (0,1) of the ROC space, representing 100% sensitivity (no false negatives) and 100% specificity (no false positives). The (0,1) point is also called a perfect classification. A completely random guess would give a point along a diagonal line (the so-called line of no-discrimination) from the bottom left to the top right corner. An intuitive example of random guessing is a decision made by flipping a coin (heads or tails). The diagonal line divides the ROC space into areas of good or bad classification/diagnosis. Points above the diagonal indicate good classification results, while points below the line indicate wrong results (although the prediction method can simply be inverted to obtain points above the line).

[Figure: The ROC space and plots of the four prediction examples.]

Let us look at four prediction results from 100 positive and 100 negative instances (row totals in parentheses):

A                        B                        C                        C'
TP=63   FP=28   (91)     TP=77   FP=77   (154)    TP=24   FP=88   (112)    TP=76   FP=12   (88)
FN=37   TN=72   (109)    FN=23   TN=23   (46)     FN=76   TN=12   (88)     FN=24   TN=88   (112)
TPR = 0.63               TPR = 0.77               TPR = 0.24               TPR = 0.76
FPR = 0.28               FPR = 0.77               FPR = 0.88               FPR = 0.12
ACC = 0.68               ACC = 0.50               ACC = 0.18               ACC = 0.82

Plots of the four results above in the ROC space are given in the figure. The result A is clearly the best among A, B, and C. The result B lies on the random guess line (the diagonal), and it can be seen in the table that its accuracy is 50%. However, when C is mirrored onto the diagonal line, as seen in C', the result is even better than A. [C' should not be mirrored across the diagonal line, but rather through the center point (0.5, 0.5). The calculations above are correct, and if you plot C' by hand, you will see that it should appear farther to the left and lower down. This proper location in the diagram would still be better than A.] The mirrored method or test C' simply reverses the predictions of whatever method or test produced the C contingency table: when the C method predicts p or n, the C' method predicts n or p, respectively. In this manner, the C' test performs best, and the C method has predictive power that can be recovered simply by reversing all of its decisions. While the closer a result from a contingency table is to the upper left corner, the better it predicts, the distance from the random guess line in either direction is the best indicator of how much predictive power a method has; if the point lies below the line, all of the method's predictions (including the ones that are more often wrong) must be reversed in order to utilize its power.
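The reversal argument can be checked numerically. The following sketch (our own illustration, not from the article) maps each of the four contingency tables to its (FPR, TPR) point and confirms that reversing every prediction of C, which swaps TP with FN and FP with TN, reflects its ROC point through the center (0.5, 0.5), yielding C':

import math

def roc_point(tp, fp, fn, tn):
    """Return the (FPR, TPR) coordinates of one confusion matrix."""
    return fp / (fp + tn), tp / (tp + fn)

examples = {
    "A":  (63, 28, 37, 72),
    "B":  (77, 77, 23, 23),
    "C":  (24, 88, 76, 12),
    "C'": (76, 12, 24, 88),
}
for name, cells in examples.items():
    fpr, tpr = roc_point(*cells)
    print(f"{name}: FPR={fpr:.2f}, TPR={tpr:.2f}")

# Reversing C's decisions turns each TP into an FN, each FP into a TN,
# and vice versa, so the reversed classifier is exactly C'.
tp, fp, fn, tn = examples["C"]
fpr_c, tpr_c = roc_point(tp, fp, fn, tn)
fpr_r, tpr_r = roc_point(fn, tn, tp, fp)   # swapped cells = reversed decisions
assert math.isclose(fpr_r, 1 - fpr_c) and math.isclose(tpr_r, 1 - tpr_c)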
Curves in ROC space

Discrete classifiers, such as a decision tree or a rule set, yield a single class label per instance; applied to a set of instances, such a classifier yields a single point in the ROC space. Other classifiers, such as naive Bayes and neural networks, produce a probability or score representing the degree to which an instance belongs to a class. For these methods, setting a threshold value determines a point in the ROC space. For instance, if instances with probability values at or above a threshold of 0.8 are assigned to the positive class, and all other instances are assigned to the negative class, then a confusion matrix can be calculated. Plotting the ROC point for each possible threshold value results in a curve.
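As a sketch of this procedure (our own example with made-up scores, not code from the article), the following Python function sweeps every distinct score as a threshold, assigns instances scoring at or above the threshold to the positive class, and collects one (FPR, TPR) point per threshold:

def roc_curve(scores, labels):
    """Sweep each distinct score as a threshold (predict positive when
    score >= threshold) and return the resulting (FPR, TPR) points."""
    p = sum(labels)          # number of positive instances
    n = len(labels) - p      # number of negative instances
    points = []
    # An infinite threshold classifies everything as negative: point (0, 0).
    for t in [float("inf")] + sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n, tp / p))
    return points            # the lowest threshold yields (1, 1)

# Hypothetical scores from a probabilistic classifier; label 1 = positive.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
print(roc_curve(scores, labels))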

Further interpretations

Sometimes the ROC is used to generate a summary statistic. Common versions are:

- the intercept of the ROC curve with the line at 90 degrees to the no-discrimination line [citation needed]
- the area between the ROC curve and the no-discrimination line
- the area under the ROC curve, or "AUC", or A' (pronounced "a-prime") [2]
- d' (pronounced "d-prime"), the distance between the mean of the distribution of activity in the system under noise-alone conditions and its mean under signal-alone conditions, divided by their common standard deviation, under the assumption that both distributions are normal with the same standard deviation. Under these assumptions, it can be proved that the shape of the ROC depends only on d'.

The AUC is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. [3] It can be shown that the area under the ROC curve is closely related to the Mann–Whitney U statistic, which tests whether positives are ranked higher than negatives. It is also equivalent to the Wilcoxon test of ranks. The AUC is related to the Gini coefficient (G1) by the following formula: [4]

    G1 + 1 = 2·AUC, where G1 = 1 − Σ_{k=1..n} (X_k − X_{k−1})(Y_k + Y_{k−1}).

In this way, it is possible to calculate the AUC by using an average of a number of trapezoidal approximations. However, any attempt to summarize the ROC curve into a single number loses information about the pattern of trade-offs of the particular discriminator algorithm.

The machine learning community most often uses the ROC AUC statistic for model comparison. [5] This measure can be interpreted as the probability that, when we randomly pick one positive and one negative example, the classifier will assign a higher score to the positive example than to the negative one. In engineering, the area between the ROC curve and the no-discrimination line is often preferred because of its useful mathematical properties as a non-parametric statistic [citation needed]; this area is often known simply as the discrimination. In psychophysics, d' is the most commonly used measure.

The illustration at the top right of the page shows the use of ROC graphs to compare the quality of different epitope-predicting algorithms. If you wish to discover at least 60% of the epitopes in a virus protein, you can read from the graph that about one third of the output would be falsely marked as an epitope. What is not visible in the graph is that the person using the algorithms knows which threshold settings produce a given point on the ROC curve.

Sometimes it can be more useful to look at a specific region of the ROC curve rather than at the whole curve, and it is possible to compute a partial AUC. [6] For example, one could focus on the region of the curve with a low false positive rate, which is often of prime interest for population screening tests. [7] Another common approach for classification problems in which P << N (common in bioinformatics applications) is to use a logarithmic scale for the x-axis. [8]
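The two views of the AUC, the area computed by trapezoidal approximation and the probability that a random positive outranks a random negative, can be demonstrated with a short, self-contained sketch (ours, not the article's; it assumes distinct scores, and ties in the pairwise count receive 1/2 credit):

def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the threshold one score at a time."""
    p = sum(labels)
    n = len(labels) - p
    tp = fp = 0
    pts = [(0.0, 0.0)]
    for s, y in sorted(zip(scores, labels), reverse=True):
        if y == 1:
            tp += 1
        else:
            fp += 1
        pts.append((fp / n, tp / p))
    return pts

def auc_trapezoid(pts):
    """Area under the curve by the trapezoid rule (pts sorted by FPR)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

def auc_pairwise(scores, labels):
    """P(random positive scores higher than random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
auc = auc_trapezoid(roc_points(scores, labels))
assert abs(auc - auc_pairwise(scores, labels)) < 1e-9   # both give 0.8125
gini = 2 * auc - 1   # the G1 + 1 = 2*AUC relation stated above
print(f"AUC = {auc}, Gini = {gini}")

The brute-force pairwise count is O(P·N); computing the same Mann–Whitney statistic from ranks of the sorted scores achieves the identical result in O(m log m) time.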
History

The ROC curve was first used during World War II for the analysis of radar signals before it was employed in signal detection theory. [9] Following the attack on Pearl Harbor in 1941, the United States Army began new research to improve the prediction of correctly detected Japanese aircraft from their radar signals.

In the 1950s, ROC curves were employed in psychophysics to assess human (and occasionally non-human animal) detection of weak signals. [9] In medicine, ROC analysis has been extensively used in the evaluation of diagnostic tests. [10][11] ROC curves are also used extensively in epidemiology and medical research and are frequently mentioned in conjunction with evidence-based medicine. In radiology, ROC analysis is a common technique for evaluating new radiology techniques. [12] In the social sciences, ROC analysis is often called the ROC accuracy ratio, a common technique for judging the accuracy of default-probability models.

ROC curves also proved useful for the evaluation of machine learning techniques. The first application of ROC in machine learning was by Spackman, who demonstrated the value of ROC curves in comparing and evaluating different classification algorithms. [13]

See also

Constant false alarm rate
Detection theory
False alarm
Gain (information retrieval)

References

1. Swets, J.A. (1996). Signal Detection Theory and ROC Analysis in Psychology and Diagnostics: Collected Papers.
2. Fogarty, J., Baker, R., and Hudson, S. (2005). "Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction". Proceedings of Graphics Interface 2005. Waterloo, Ontario, Canada: Canadian Human-Computer Communications Society. http://portal.acm.org/citation.cfm?id=1089530
3. Fawcett, T. (2006). "An introduction to ROC analysis". Pattern Recognition Letters, 27, 861–874.
4. Hand, D.J., and Till, R.J. (2001). "A simple generalization of the area under the ROC curve to multiple class classification problems". Machine Learning, 45, 171–186.
5. Hanley, J.A.; McNeil, B.J. (1983). "A method of comparing the areas under receiver operating characteristic curves derived from the same cases". Radiology 148(3): 839–843. PMID 6878708. http://radiology.rsnajnls.org/cgi/content/abstract/148/3/839
6. McClish, D.K. (1989). "Analyzing a portion of the ROC curve". Medical Decision Making 9(3): 190–195. doi:10.1177/0272989X8900900307. PMID 2668680.
7. Dodd, L.E.; Pepe, M.S. (2003). "Partial AUC estimation and regression". Biometrics 59(3): 614–623. doi:10.1111/1541-0420.00071. PMID 14601762.
8. http://www.soe.ucsc.edu/~karplus/papers/better-than-chance-sep-07.pdf
9. Green, D.M., and Swets, J.A. (1966). Signal Detection Theory and Psychophysics. New York: John Wiley and Sons. ISBN 0-471-32420-5.
10. Zweig, M.H., and Campbell, G. (1993). "Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine". Clinical Chemistry 39(8): 561–577. PMID 8472349.
11. Pepe, M.S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press.
12. Obuchowski, N.A. (2003). "Receiver operating characteristic curves and their use in radiology". Radiology 229(1): 3–8. doi:10.1148/radiol.2291010898. PMID 14519861.
13. Spackman, K.A. (1989). "Signal detection theory: Valuable tools for evaluating inductive learning". Proceedings of the Sixth International Workshop on Machine Learning. San Mateo, CA: Morgan Kaufmann. pp. 160–163.

General references

Zhou, X.H., Obuchowski, N.A., and McClish, D.M. (2002). Statistical Methods in Diagnostic Medicine. New York, USA: Wiley & Sons. ISBN 9780471347729.

Further reading

Zou, K.H., O'Malley, A.J., and Mauri, L. (2007). "Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models". Circulation, 115(5):654–657.
Lasko, T.A., Bhagwat, J.G., Zou, K.H., and Ohno-Machado, L. (2005). "The use of receiver operating characteristic curves in biomedical informatics". Journal of Biomedical Informatics, 38(5):404–415.
Balakrishnan, N. (1991). Handbook of the Logistic Distribution. Marcel Dekker, Inc. ISBN 978-0824785871.
Gonen, M. (2007). Analyzing Receiver Operating Characteristic Curves Using SAS. SAS Press. ISBN 978-1-59994-298-1.
Green, W.H. (2003). Econometric Analysis, fifth edition. Prentice Hall. ISBN 0-13-066189-9.
Hosmer, D.W., and Lemeshow, S. (2000). Applied Logistic Regression, 2nd ed. New York; Chichester: Wiley. ISBN 0-471-35632-8.
Brown, C.D., and Davis, H.T. (2006). "Receiver operating characteristic curves and related decision measures: a tutorial". Chemometrics and Intelligent Laboratory Systems, 80:24–38.
Mason, S.J., and Graham, N.E. (2002). "Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation". Q.J.R. Meteorol. Soc., 128:2145–2166.
Pepe, M.S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford. ISBN 0198565828.
Carsten, S., Wesseling, S., Schink, T., and Jung, K. (2003). "Comparison of eight computer programs for receiver-operating characteristic analysis". Clinical Chemistry, 49:433–439.
Swets, J.A. (1995). Signal Detection Theory and ROC Analysis in Psychology and Diagnostics: Collected Papers. Lawrence Erlbaum Associates.
Swets, J.A., Dawes, R., and Monahan, J. (2000). "Better Decisions through Science". Scientific American, October, pp. 82–87.

External links

Kelly H. Zou's Bibliography of ROC Literature and Articles: http://www.spl.harvard.edu/archive/spl-pre2007/pages/ppl/zou/roc.html
An introduction to ROC analysis: http://www.csee.usf.edu/~candamo/site/papers/rocintro.pdf
A more thorough treatment of ROC curves and signal detection theory: http://www-psych.stanford.edu/~lera/psych115s/notes/signal/
Diagnostic test evaluation (online calculator): http://www.medcalc.be/calc/diagnostic_test.php
Tom Fawcett's ROC Convex Hull: tutorial, program and papers: http://home.comcast.net/~tom.fawcett/public_html/ROCCH/index.html
Peter Flach's tutorial on ROC analysis in machine learning: http://www.cs.bris.ac.uk/~flach/icml04tutorial/index.html
The magnificent ROC, an explanation and interactive demonstration of the connection of ROCs to archetypal bi-normal test result plots: http://www.anaesthetist.com/mnm/stats/roc/