1 Outline
1. Review sensitivity and specificity
2. Define an ROC curve
3. Define AUC
4. Non-parametric tests for whether or not the test is informative
5. Introduce the binormal ROC model
6. Discuss non-parametric estimation of the ROC curve

2 Review
Imagine the following table being obtained as a random sample:

Test decision    Diseased    Not diseased
Positive         TP          FP
Negative         FN          TN

3 Sensitivity and Specificity
Recall that the sensitivity is $P(+ \mid D)$; it can be estimated with
$$\widehat{\text{sens}} = \frac{TP}{TP + FN}$$
The specificity is $P(- \mid D^c)$; it can be estimated with
$$\widehat{\text{spec}} = \frac{TN}{TN + FP}$$
Sensitivity and specificity can be estimated under the currently assumed sampling scheme and also when the column margins are fixed.
Estimation is typically performed using methods developed for binomial proportions.
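As a small illustration of the estimators above, the following R sketch computes the two proportions and attaches exact binomial intervals. The counts are hypothetical (they are not from the slides), and the Clopper-Pearson interval from `binom.test` is only one of several binomial interval methods that could be used here.

```r
# Hypothetical 2x2 counts, made up purely for illustration
TP <- 45; FP <- 15
FN <- 5;  TN <- 135

sens_hat <- TP / (TP + FN)   # estimates P(+ | D)
spec_hat <- TN / (TN + FP)   # estimates P(- | D^c)

# Exact (Clopper-Pearson) binomial confidence intervals; other interval
# methods for a binomial proportion would work equally well
sens_ci <- binom.test(TP, TP + FN)$conf.int
spec_ci <- binom.test(TN, TN + FP)$conf.int

print(c(sensitivity = sens_hat, specificity = spec_hat))
print(rbind(sensitivity = sens_ci, specificity = spec_ci))
```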

4 Diagnostic accuracy
Diagnostic accuracy is defined as the probability of a correct decision, $P\{(+ \cap D) \cup (- \cap D^c)\}$.
Diagnostic accuracy can be estimated with
$$\widehat{DA} = \frac{TP + TN}{TP + TN + FP + FN} = \widehat{\text{sens}}\,\widehat{\text{prev}} + \widehat{\text{spec}}\,(1 - \widehat{\text{prev}})$$
Note that if the column margins are fixed, diagnostic accuracy cannot be estimated without a separate estimate of the disease prevalence.
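The identity between the two forms of the estimate follows by splitting the numerator by disease status. A short worked version is below, where $n$ is shorthand introduced here for the table total and $\widehat{\text{prev}} = (TP + FN)/n$ is the sample prevalence under random sampling.

```latex
\begin{align*}
\widehat{DA}
  &= \frac{TP + TN}{n}, \qquad n = TP + TN + FP + FN \\
  &= \frac{TP}{TP + FN}\cdot\frac{TP + FN}{n}
   + \frac{TN}{TN + FP}\cdot\frac{TN + FP}{n} \\
  &= \widehat{\text{sens}}\,\widehat{\text{prev}}
   + \widehat{\text{spec}}\,(1 - \widehat{\text{prev}})
\end{align*}
```

When the column margins are fixed by design, $(TP + FN)/n$ reflects the design rather than the population, which is why an outside prevalence estimate is needed.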

5 Comparisons
Suppose that two diagnostic tests are being compared with regard to their sensitivity or specificity.
A first question to ask is whether the data are matched or not, i.e. were both tests applied to each subject, or were the tests applied to independent groups of subjects?
If they are matched, then McNemar's test can be used.
If they are independent, then any of the methods for comparing two proportions can be used.
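A minimal sketch of the two situations in R, comparing the sensitivities of two tests. All counts are hypothetical and chosen only for illustration; `prop.test` stands in for "any of the methods for comparing proportions".

```r
# Hypothetical paired results: two tests applied to the same 100 diseased
# subjects (counts are made up for illustration)
paired <- matrix(c(70,  5,
                   15, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(test_A = c("positive", "negative"),
                                 test_B = c("positive", "negative")))

# Matched design: McNemar's test uses only the discordant cells (5 and 15)
matched <- mcnemar.test(paired)
print(matched)

# Independent design (hypothetical): 75/100 positive for test A and
# 85/100 positive for test B in two separate groups of diseased subjects
unmatched <- prop.test(x = c(75, 85), n = c(100, 100))
print(unmatched)
```

The same approach applies to specificity by restricting attention to the non-diseased subjects.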

6 ROC curves
Often the diagnostic measure is continuous rather than dichotomous.
For example, consider BMI as a marker for sleep disordered breathing.
For each cutpoint of the marker, there is a sensitivity and a specificity.
A plot of the sensitivity against 1 - specificity (the false positive rate) over all cutpoints is called a receiver operating characteristic (ROC) curve.

7 ROC curves
Let $Y$ be the marker and assume that we conclude that a subject is diseased if $Y > c$.
Let $S_D$ and $S_{\bar D}$ be the survivor functions of the marker for the diseased and non-diseased populations.
Then the ROC curve is
$$\text{ROC}(t) = S_D\{S_{\bar D}^{-1}(t)\}, \quad t \in [0, 1]$$

8 Proof
Consider the point $t$ on the horizontal axis of the ROC curve.
First, what cutpoint $c$ does $t$ correspond to?
$$t = 1 - \text{specificity} = P(Y > c \mid \bar D) = S_{\bar D}(c)$$
This implies
$$c = S_{\bar D}^{-1}(t)$$
Therefore the corresponding sensitivity, i.e. the point on the ROC curve, is
$$P(Y > c \mid D) = S_D(c) = S_D\{S_{\bar D}^{-1}(t)\}$$
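The two steps of the proof can be traced numerically. The sketch below assumes exponentially distributed markers, a choice made here only because the survivor functions have a simple closed form; the rates are hypothetical.

```r
# Hypothetical exponential markers: diseased subjects have the larger mean
rate_d <- 0.5   # diseased, mean 2
rate_n <- 1.0   # non-diseased, mean 1

t_grid <- seq(0.05, 0.95, by = 0.05)   # false positive rates

# Step 1: cutpoint c with P(Y > c | non-diseased) = t
cutpoint <- qexp(t_grid, rate = rate_n, lower.tail = FALSE)

# Step 2: sensitivity at that cutpoint, P(Y > c | diseased)
roc <- pexp(cutpoint, rate = rate_d, lower.tail = FALSE)

# For exponential markers the composition simplifies to t^(rate_d / rate_n)
closed_form <- t_grid^(rate_d / rate_n)
print(all.equal(roc, closed_form))
```

With these rates the curve is the square root of the false positive rate, which lies above the diagonal, as expected for an informative marker.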

9 Some facts about ROC curves
An ROC curve that is a diagonal line (sensitivity = 1 - specificity) corresponds to an uninformative test, where a positive test corresponds to flipping a coin with success probability equal to the sensitivity.
The ROC curve always starts at (0, 0) and ends at (1, 1).
Curves that are higher represent better tests.
A marker with an ROC curve that is uniformly below the diagonal line is worse than guessing, and hence can be improved upon by taking the opposite decision.

10 [Figure: two panels, "Good discrimination marker" and "Poor discrimination marker"; axis labels: Specificity, density, Sensitivity]

11 AUC
Comparison of ROC curves is often done by comparing their areas under the curve (AUC).
Better markers have higher AUCs.
The largest possible AUC is 1; the smallest (for an informative test) is 0.5.
If the ROC curve for one test dominates another, then its AUC will also be larger.
The converse is not true: a larger AUC does not imply a uniformly better test.

12 AUC
The AUC can be shown to be $P(Y_D \ge Y_{\bar D})$, where $Y_D$ is a marker value from the diseased population and $Y_{\bar D}$ is a marker value from the non-diseased population.
Recall that the Wilcoxon rank sum statistic tests for a stochastic shift between two distributions.
This motivates the use of the Wilcoxon rank sum test to test whether or not the AUC is 0.5 (equal to a coin flip) versus greater than 0.5.
More thorough derivations can be found in Pepe's book, The Statistical Evaluation of Medical Tests for Classification and Prediction.
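The link between the AUC and the Wilcoxon rank sum statistic can be checked directly on a small sample. The sketch below uses hypothetical marker values; it estimates the AUC as the proportion of (diseased, non-diseased) pairs in which the diseased value is larger, counting ties as one half, and compares that with the statistic reported by `wilcox.test` divided by the number of pairs.

```r
# Hypothetical marker values, made up for illustration
diseased     <- c(3.1, 4.5, 2.2, 5.0, 3.8)
non_diseased <- c(1.0, 2.5, 3.0, 1.7, 2.9, 0.6)

# Proportion of pairs with the diseased value larger (ties count one half)
greater   <- outer(diseased, non_diseased, ">")
ties      <- outer(diseased, non_diseased, "==")
auc_pairs <- mean(greater + 0.5 * ties)

# The same quantity from the rank sum statistic W
w     <- wilcox.test(diseased, non_diseased, alternative = "greater")
auc_w <- unname(w$statistic) / (length(diseased) * length(non_diseased))

print(c(pairs = auc_pairs, wilcoxon = auc_w))
print(w$p.value)   # one-sided test of AUC = 0.5 versus AUC > 0.5
```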

13 Example
Consider scores given by a computer-aided diagnosis tool for breast cancer (higher scores indicate presence of the disease), with the non-diseased scores stored in N and the diseased scores stored in D.
Consider testing $H_0: \text{AUC} = 0.5$ versus $H_a: \text{AUC} > 0.5$:
    wilcox.test(D, N, alternative = "greater")
The P-value is estimated to be nearly zero, giving strong evidence that this test is informative.

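14 The binormal ROC
Assume that $Y_D \sim \text{Normal}(\mu_D, \sigma_D^2)$ and $Y_{\bar D} \sim \text{Normal}(\mu_{\bar D}, \sigma_{\bar D}^2)$.
Then the ROC curve works out to be
$$\text{ROC}(t) = \Phi\{a + b\,\Phi^{-1}(t)\}$$
where
$$a = \frac{\mu_D - \mu_{\bar D}}{\sigma_D} \quad \text{and} \quad b = \frac{\sigma_{\bar D}}{\sigma_D}$$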

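15 Proof
Recall that the ROC curve equals $S_D\{S_{\bar D}^{-1}(t)\}$.
Note that
$$S_D(c) = 1 - \Phi\left(\frac{c - \mu_D}{\sigma_D}\right) = \Phi\left(\frac{\mu_D - c}{\sigma_D}\right)$$
and
$$S_{\bar D}^{-1}(t) = \mu_{\bar D} + \sigma_{\bar D}\,\Phi^{-1}(1 - t) = \mu_{\bar D} - \sigma_{\bar D}\,\Phi^{-1}(t)$$
Plugging the second expression into the first gives
$$S_D\{S_{\bar D}^{-1}(t)\} = \Phi\left\{\frac{\mu_D - \mu_{\bar D} + \sigma_{\bar D}\,\Phi^{-1}(t)}{\sigma_D}\right\} = \Phi\{a + b\,\Phi^{-1}(t)\}$$
which is the result.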
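As a numerical sanity check of the binormal formula, the sketch below evaluates the curve two ways: through $a$ and $b$, and through the composition of the two normal survivor functions. The means and standard deviations are hypothetical values chosen for illustration.

```r
# Hypothetical binormal parameters
mu_d <- 2; sigma_d <- 1.5   # diseased
mu_n <- 0; sigma_n <- 1     # non-diseased

a <- (mu_d - mu_n) / sigma_d
b <- sigma_n / sigma_d

t_grid <- seq(0.01, 0.99, by = 0.01)   # false positive rates

# Closed form: Phi{a + b * Phi^{-1}(t)}
roc_formula <- pnorm(a + b * qnorm(t_grid))

# Direct composition: cutpoint from the non-diseased survivor function,
# then the diseased survivor function evaluated at that cutpoint
cutpoint   <- qnorm(t_grid, mean = mu_n, sd = sigma_n, lower.tail = FALSE)
roc_direct <- pnorm(cutpoint, mean = mu_d, sd = sigma_d, lower.tail = FALSE)

print(all.equal(roc_formula, roc_direct))
```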

16 Binormal ROC model
Note that the ROC curve is invariant to monotone increasing transformations of the data.
Therefore, the binormal ROC model doesn't actually require the data to be normal on the measured scale.
Instead, it only requires that the data be normally distributed on some scale.
For this reason, estimating $\mu_D$, $\mu_{\bar D}$, $\sigma_D$ and $\sigma_{\bar D}$ assuming normality on the original data scale is not generally done.
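The invariance can be seen on a toy data set: an increasing transformation preserves the ordering of every (diseased, non-diseased) pair, so the empirical AUC is unchanged. The data and the helper name `empirical_auc` below are hypothetical, introduced only for this sketch.

```r
# Hypothetical positive marker values, made up for illustration
diseased     <- c(3.1, 4.5, 2.2, 5.0, 3.8)
non_diseased <- c(1.0, 2.5, 3.0, 1.7, 2.9, 0.6)

# Illustrative helper: proportion of pairs with the diseased value larger,
# counting ties as one half
empirical_auc <- function(d, n) {
  mean(outer(d, n, ">") + 0.5 * outer(d, n, "=="))
}

auc_raw  <- empirical_auc(diseased, non_diseased)
auc_log  <- empirical_auc(log(diseased), log(non_diseased))   # log scale
auc_cube <- empirical_auc(diseased^3, non_diseased^3)         # cubed scale

print(c(raw = auc_raw, log = auc_log, cube = auc_cube))
```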

17 A non-parametric estimator of the ROC
Given a set of data, the conceptual way to calculate the ROC curve is to
1. Choose a sequence of thresholds
2. For each threshold, estimate the true positive fraction (TPF) and false positive fraction (FPF)
3. Plot the pairs (FPF, TPF), i.e. TPF against FPF
Note that we could speed this procedure up quite a bit by restricting the sequence of thresholds to the observed data values, since the estimated fractions only change at those points.
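The three steps can be sketched in base R as follows. The function name `empirical_roc` and the data are hypothetical; the decision rule is "positive if the marker exceeds the threshold", and infinite thresholds are added so the curve includes (0, 0) and (1, 1). The trapezoidal area at the end is one common way to summarise the resulting curve.

```r
# Illustrative helper: empirical ROC points using observed values as thresholds
empirical_roc <- function(diseased, non_diseased) {
  # Step 1: thresholds at the observed values, from largest to smallest
  thresholds <- c(Inf,
                  sort(unique(c(diseased, non_diseased)), decreasing = TRUE),
                  -Inf)
  # Step 2: fraction of each group called positive (marker > threshold)
  tpf <- sapply(thresholds, function(cut) mean(diseased > cut))
  fpf <- sapply(thresholds, function(cut) mean(non_diseased > cut))
  data.frame(threshold = thresholds, fpf = fpf, tpf = tpf)
}

# Hypothetical marker values, made up for illustration
diseased     <- c(3.1, 4.5, 2.2, 5.0, 3.8)
non_diseased <- c(1.0, 2.5, 3.0, 1.7, 2.9, 0.6)

roc <- empirical_roc(diseased, non_diseased)

# Step 3: plot TPF against FPF, with the uninformative diagonal for reference
plot(roc$fpf, roc$tpf, type = "s",
     xlab = "False positive fraction", ylab = "True positive fraction")
abline(0, 1, lty = 2)

# Area under the empirical curve by the trapezoidal rule
auc <- sum(diff(roc$fpf) * (head(roc$tpf, -1) + tail(roc$tpf, -1)) / 2)
print(auc)
```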

18 Verifying the binormal model
One can use the non-parametric ROC curve to verify the binormal assumption.
Recall that for the binormal model
$$\text{ROC}(t) = \Phi\{a + b\,\Phi^{-1}(t)\}$$
Hence
$$\Phi^{-1}\{\text{ROC}(t)\} = a + b\,\Phi^{-1}(t)$$
If the empirical ROC curve looks like a straight line when plotted on normal quantile axes, then the binormal model appears to hold.
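To see why a straight line is the signature of the binormal model, the sketch below starts from an exact binormal curve with hypothetical values of $a$ and $b$, moves both axes to the normal quantile scale, and recovers the intercept and slope with an ordinary least squares line. This is meant only as a picture of the linear relationship, not as a recommended way to estimate $a$ and $b$ from data.

```r
# Hypothetical binormal parameters
a <- 1.2
b <- 0.8

t_grid <- seq(0.05, 0.95, by = 0.05)   # false positive rates
roc    <- pnorm(a + b * qnorm(t_grid)) # exact binormal ROC values

# On normal quantile axes the curve is a line with intercept a and slope b
fit <- lm(qnorm(roc) ~ qnorm(t_grid))
print(coef(fit))

plot(qnorm(t_grid), qnorm(roc),
     xlab = "qnorm(false positive rate)", ylab = "qnorm(true positive rate)")
abline(fit)
```

With an empirical ROC curve in place of the exact one, curvature in this plot would suggest that the binormal model does not fit.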

19 Example
These are example data from the ROCR package in R.
The data contain a probability of disease and a disease status label.
    library(ROCR)
    data(ROCR.simple)
    # Build a prediction object from the scores and the true labels
    pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
    # True positive rate against false positive rate: the empirical ROC curve
    perf <- performance(pred, "tpr", "fpr")
    plot(perf)
    abline(0, 1)
    # The same curve on normal quantile axes
    plot(qnorm(perf@x.values[[1]]), qnorm(perf@y.values[[1]]))
    points(qnorm(perf@x.values[[1]]), qnorm(perf@y.values[[1]]))

20 [Figure: two panels. First panel: true positive rate against false positive rate. Second panel: qnorm(perf@y.values[[1]]) against qnorm(perf@x.values[[1]])]

21 Summary
We've really only given a brief introduction to this topic; also of interest are
  Variance of the non-parametric ROC curve
  Comparison of multiple ROC curves
  Estimating a and b from the binormal model
  Including covariate effects
  Ordinal (instead of continuous) scores
  Handling matching
A good treatment of the subject can be found in the (previously mentioned) Pepe book.
