Sensitivity, Specificity, and Relatives
Brani Vidakovic
ISyE 6421 / BMED 6700
January 17, 2017

Overview

Today:
- Definitions of Sensitivity, Specificity, Positive and Negative Predictive Values, Likelihood Ratio Positive and Negative, Measure of Agreement.
- Performance of Tests: ROC Curves and Area Under ROC.
- Assessing Combined Tests.
- Examples.

Casscells et al. 1978

The following problem was posed by Casscells, Schoenberger, and Graboys (1978) to 60 students and staff at an elite medical school:

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?

Assuming that the probability of a positive result given the disease is 1, the answer to this problem is approximately 2%. Casscells et al. found that only 18% (11/60) of participants gave this answer. The most frequent response was 95% (27/60).

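
The 2% answer follows from Bayes' rule. A minimal MATLAB sketch, assuming (as the problem statement does) that the probability of a positive result given the disease is 1; the variable names are illustrative and do not come from the course files:

```matlab
% Casscells et al. (1978) problem: chance of disease given a positive test
pre = 1/1000;   % prevalence
fpr = 0.05;     % false positive rate, i.e. 1 - specificity
se  = 1;        % assumed: P(positive | disease) = 1

% Bayes' rule: P(disease | positive)
ppv = se*pre / (se*pre + fpr*(1 - pre));
fprintf('P(disease | positive) = %.4f\n', ppv);   % about 0.0196
```

The most frequent response, 95%, corresponds to confusing P(positive | no disease) = 5% with P(no disease | positive).
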
Fundamental 2 x 2 Table

|                   | disease (D)  | no disease (C) | total        |
|-------------------|--------------|----------------|--------------|
| test positive (P) | TP           | FP             | np = TP + FP |
| test negative (N) | FN           | TN             | nn = FN + TN |
| total             | nd = TP + FN | nc = FP + TN   | n            |

- TP: true positives (test positive, disease present)
- FP: false positives (test positive, disease absent)
- FN: false negatives (test negative, disease present)
- TN: true negatives (test negative, disease absent)
- np: total number of positives (TP + FP)
- nn: total number of negatives (TN + FN)
- nd: total number with disease present (TP + FN)
- nc: total number without disease present (TN + FP)
- n: total sample size (TP + FP + FN + TN)

Definitions

| Measure                                | Formula                                   |
|----------------------------------------|-------------------------------------------|
| Sensitivity (Se)                       | Se = TP/(TP + FN) = TP/nd                 |
| Specificity (Sp)                       | Sp = TN/(FP + TN) = TN/nc                 |
| Prevalence (Pre)                       | Pre = (TP + FN)/(TP + FP + FN + TN) = nd/n |
| Positive Predictive Value (PPV)        | PPV = TP/(TP + FP) = TP/np                |
| Negative Predictive Value (NPV)        | NPV = TN/(TN + FN) = TN/nn                |
| Likelihood Ratio Positive (LRP)        | LRP = Se/(1 - Sp)                         |
| Likelihood Ratio Negative (LRN)        | LRN = (1 - Se)/Sp                         |
| Apparent Prevalence (APre)             | APre = np/n                               |
| Accuracy, Concordance, Agreement (Ag)  | Ag = (TP + TN)/n                          |

Alternative Names
- Sensitivity = True Positive Rate, Recall, Hit Rate
- Specificity = True Negative Rate
- PPV = Precision
- 1 - Specificity = Fall-out, False Positive Rate

Discussion

Imagine a test that classifies all subjects as positive. Trivially, the sensitivity is 100%, and since there are no negatives, the specificity is zero. Likewise, a test that classifies all subjects as negative has a specificity of 100% and zero sensitivity.

One of the most important measures is the Positive Predictive Value, PPV. It is the proportion of true positives among all positives, TP/np. This expression is correct only if the population prevalence is well estimated by nd/n, that is, if the table is representative of its population. If the table is formed from a convenience sample, the prevalence (Pre) has to be supplied as external information, and the positive predictive value is calculated as

$$\mathrm{PPV} = \frac{\mathrm{Se} \cdot \mathrm{Pre}}{\mathrm{Se} \cdot \mathrm{Pre} + (1 - \mathrm{Sp})(1 - \mathrm{Pre})}.$$

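
The prevalence-based expression for PPV is Bayes' rule written in terms of the quantities defined earlier. As a short derivation, write D for disease, C for no disease, and + for a positive test, so that P(+ | D) = Se, P(+ | C) = 1 - Sp, and P(D) = Pre:

$$\mathrm{PPV} = P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid C)\,P(C)} = \frac{\mathrm{Se} \cdot \mathrm{Pre}}{\mathrm{Se} \cdot \mathrm{Pre} + (1 - \mathrm{Sp})(1 - \mathrm{Pre})}.$$

When the table is representative, Pre = nd/n, Se = TP/nd, and 1 - Sp = FP/nc, so that Se Pre = TP/n and (1 - Sp)(1 - Pre) = FP/n, and the expression reduces to TP/(TP + FP).
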
More Discussion

Why is the Positive Predictive Value so important? Imagine an almost perfect test for a particular disease, with a sensitivity of 100% and a specificity of 99%. If the prevalence of the disease in the population is 10%, then roughly one in every 12 positives would be a false positive. However, if the population prevalence is 1/10000, then for each true positive there would be approximately 100 false positives.

The Likelihood Ratio Positive (LRP) represents the odds that a positive test result would be found in a patient with, versus without, the disease. The Likelihood Ratio Negative (LRN) represents the odds that a negative test result would be found in a patient with, versus without, the disease. Both ratios convert pre-test odds into post-test odds:

Post-test disease odds after a positive result = LRP × pre-test disease odds.
Post-test disease odds after a negative result = LRN × pre-test disease odds.

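
The effect of prevalence on PPV in the two scenarios can be checked numerically. This is a small illustrative sketch using Bayes' rule with Se = 1 and Sp = 0.99; the names `ppv` and `fp_per_tp` are introduced here for illustration only:

```matlab
% PPV as a function of prevalence for a test with Se = 1 and Sp = 0.99
se = 1.00;
sp = 0.99;
ppv = @(pre) se*pre ./ (se*pre + (1 - sp)*(1 - pre));   % Bayes' rule

pre = [0.10, 1/10000];   % the two prevalences discussed in the text
p   = ppv(pre);

% Expected number of false positives per true positive
fp_per_tp = (1 - p) ./ p;

for k = 1:numel(pre)
    fprintf('Pre = %.4f: PPV = %.4f, FP per TP = %.2f\n', ...
            pre(k), p(k), fp_per_tp(k));
end
```

With a prevalence of 10% the PPV is about 0.92, while at 1/10000 it drops to about 0.01, even though the test itself is unchanged.
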
D-Dimer Example

The data below consist of quantitative plasma D-dimer levels among patients undergoing pulmonary angiography for suspected pulmonary embolism (PE). Patients with a D-dimer level of at least 500 ng/ml are classified as positive. The gold standard for PE is the pulmonary angiogram.

Goldhaber et al. (1993), from Brigham and Women's Hospital at Harvard Medical School, considered a population of patients who are suspected of PE based on a battery of symptoms. The summarized data for 173 patients are provided in the table below:

|                            | acute PE present | no PE present | total |
|----------------------------|------------------|---------------|-------|
| positive (D-d ≥ 500 ng/ml) | 42               | 96            | 138   |
| negative (D-d < 500 ng/ml) | 3                | 32            | 35    |
| total                      | 45               | 128           | 173   |

sesp.m

```matlab
function [se sp pre ppv npv lrp ag yi] = sesp(tp, fp, fn, tn)
%
% INPUT: 2x2 Contingency (Confusion) Table
%   tp - true positives;  fp - false positives;
%   fn - false negatives; tn - true negatives
%---------
% OUTPUT
%   se - sensitivity, sp - specificity, pre - prevalence (for random sample)
%   ppv - positive predictive value, npv - negative predictive value,
%   lrp - likelihood ratio positive, ag - agreement, yi - Youden index
% EXAMPLE OF USE:
%   D-dimer as a test for acute PE (Goldhaber et al, 1993)
%   [s1, s2, p1, p2, p3, lr, a, yi] = sesp(42,96,3,32);

% [a b c d e f g h] = sesp(42,96,3,32);
%     Se      Sp     Pre     PPV     NPV     LRP      Ag      YI
% 0.9333  0.2500  0.2601  0.3043  0.9143  1.2444  0.4277  0.1296
```

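
The slide shows only the interface of sesp.m. The quantities it returns can be sketched directly from the definitions. The script below is an illustrative reconstruction, not the author's file, and it assumes that the Youden index is reported with the divisor sqrt(2), which is the choice that reproduces the printed value 0.1296:

```matlab
% D-dimer table (Goldhaber et al., 1993), threshold 500 ng/ml
tp = 42; fp = 96; fn = 3; tn = 32;

n   = tp + fp + fn + tn;        % total sample size
se  = tp/(tp + fn);             % sensitivity
sp  = tn/(fp + tn);             % specificity
pre = (tp + fn)/n;              % prevalence (valid for a random sample)
ppv = tp/(tp + fp);             % positive predictive value
npv = tn/(tn + fn);             % negative predictive value
lrp = se/(1 - sp);              % likelihood ratio positive
ag  = (tp + tn)/n;              % agreement (accuracy)
yi  = (se + sp - 1)/sqrt(2);    % Youden index (assumed sqrt(2) scaling)

fprintf('%8.4f', [se sp pre ppv npv lrp ag yi]);
fprintf('\n');
```

The printed row should agree with the eight values listed for the D-dimer example.
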
Comparing Tests: Youden Index

To choose the best test out of a multiplicity of tests obtained by changing the threshold and generating the ROC curve, select the test corresponding to the point on the ROC curve most distant from the diagonal. This point corresponds to the Youden index

$$\mathrm{YI} = \max_i \frac{\mathrm{Se}_i + \mathrm{Sp}_i - 1}{\sqrt{2}},$$

where $\mathrm{Se}_i$ and $\mathrm{Sp}_i$ are, respectively, the sensitivity and specificity of the ith test. The constant $\sqrt{2}$ in the denominator comes from the geometric interpretation (distance from the diagonal); it is often omitted.

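
The geometric interpretation can be made explicit. A test with sensitivity Se and specificity Sp is the ROC point $(x_0, y_0) = (1 - \mathrm{Sp}, \mathrm{Se})$, and the diagonal is the line $y = x$, that is, $x - y = 0$. The distance from a point to this line is

$$d = \frac{|x_0 - y_0|}{\sqrt{1^2 + (-1)^2}} = \frac{|(1 - \mathrm{Sp}) - \mathrm{Se}|}{\sqrt{2}} = \frac{\mathrm{Se} + \mathrm{Sp} - 1}{\sqrt{2}},$$

where the last equality assumes the point lies on or above the diagonal, $\mathrm{Se} \ge 1 - \mathrm{Sp}$. For the D-dimer test at 500 ng/ml this gives $(0.9333 + 0.2500 - 1)/\sqrt{2} = 0.1296$.
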
Comparing Tests: F-measure

Another objective criterion for choosing among the multiplicity of thresholds in a test is the F-measure, or F-index. It is defined as the harmonic average of Sensitivity and Positive Predictive Value,

$$F = \frac{2}{1/\mathrm{Se} + 1/\mathrm{PPV}}.$$

It is easy to see that F is the ratio of TP to the average of nd and np,

$$F = \frac{TP}{(n_d + n_p)/2}.$$

The test that maximizes the F-measure is favored.

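
The equivalence of the two expressions for F can be checked on the D-dimer table at the 500 ng/ml threshold. This is an illustrative sketch; the variable names are not from the course files:

```matlab
% F-measure for the D-dimer table at the 500 ng/ml threshold
tp = 42; fp = 96; fn = 3;

nd  = tp + fn;      % number with disease
np  = tp + fp;      % number of test positives
se  = tp/nd;        % sensitivity
ppv = tp/np;        % positive predictive value

F_harmonic = 2/(1/se + 1/ppv);     % harmonic average of Se and PPV
F_ratio    = tp/((nd + np)/2);     % TP over the average of nd and np

fprintf('F = %.4f (harmonic), %.4f (ratio)\n', F_harmonic, F_ratio);
```

Both forms give F of about 0.459, since 1/Se + 1/PPV = (nd + np)/TP.
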
k Independent Tests, Parallel and Serial Strategies

In the parallel strategy, the combination is positive if at least one test is positive. It is negative if all tests are negative.

$$\mathrm{Se} = 1 - (1 - \mathrm{Se}_1)(1 - \mathrm{Se}_2) \cdots (1 - \mathrm{Se}_k), \qquad \mathrm{Sp} = \mathrm{Sp}_1\, \mathrm{Sp}_2 \cdots \mathrm{Sp}_k.$$

The overall sensitivity is larger than any individual sensitivity, and the specificity is smaller than any individual specificity.

In the serial strategy, the combination is positive if all tests are positive. It is negative if at least one test is negative.

$$\mathrm{Se} = \mathrm{Se}_1\, \mathrm{Se}_2 \cdots \mathrm{Se}_k, \qquad \mathrm{Sp} = 1 - (1 - \mathrm{Sp}_1)(1 - \mathrm{Sp}_2) \cdots (1 - \mathrm{Sp}_k).$$

The overall sensitivity is smaller than any individual sensitivity, while the specificity is larger than any individual specificity.

Parikh et al. (2008)

Parikh et al. (2008) provide an example of combining two tests for sarcoidosis. Ocular sarcoidosis is an idiopathic multi-system granulomatous disease, where the diagnosis is made by a combination of clinical, radiological and laboratory findings. The gold standard is a tissue biopsy showing noncaseating granuloma.

The angiotensin-converting enzyme (ACE) test has a sensitivity of 73% and a specificity of 83% to diagnose sarcoidosis. Abnormal gallium scan (AGS) has a sensitivity of 91% and a specificity of 84%.

Though individually the specificity of either test is not impressive, for the serial combination the specificity becomes

$$\mathrm{Sp} = 1 - (1 - 0.84)(1 - 0.83) = 1 - (0.16 \times 0.17) = 0.97.$$

The combination sensitivity becomes $0.73 \times 0.91 = 0.66$.

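
The serial and parallel formulas can be evaluated for the ACE and AGS figures quoted above. The sketch below assumes the two tests are independent, as the formulas require; the parallel combination is not part of the Parikh et al. example and is shown only for comparison:

```matlab
% Combining independent tests: ACE and abnormal gallium scan (AGS)
se = [0.73, 0.91];   % sensitivities of ACE, AGS
sp = [0.83, 0.84];   % specificities of ACE, AGS

% Serial strategy: positive only if all tests are positive
se_serial = prod(se);
sp_serial = 1 - prod(1 - sp);

% Parallel strategy: positive if at least one test is positive
se_parallel = 1 - prod(1 - se);
sp_parallel = prod(sp);

fprintf('serial:   Se = %.4f, Sp = %.4f\n', se_serial, sp_serial);
fprintf('parallel: Se = %.4f, Sp = %.4f\n', se_parallel, sp_parallel);
```

Because the vectors can have any length, the same lines apply to k tests.
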
ROC Curves

The ROC (receiver-operating-characteristic) curve is defined as a graphical plot of sensitivity vs. (1 - specificity).

To increase the apparently low specificity in the previous D-dimer analysis, the threshold for a positive is increased from 500 ng/ml to 650 ng/ml.

|                            | acute PE present | no PE present |
|----------------------------|------------------|---------------|
| positive (D-d ≥ 650 ng/ml) | 31               | 33            |
| negative (D-d < 650 ng/ml) | 14               | 95            |
| total                      | 45               | 128           |

This new table results in

```matlab
[a b c d e f g h] = sesp(31,33,14,95);
%     Se      Sp     Pre     PPV     NPV     LRP      Ag      YI
% 0.6889  0.7422  0.2601  0.4844  0.8716  2.6721  0.7283  0.3048
```

ROC Curves, continued

Combining this with the output for the 500 ng/ml threshold, we get the vectors 1-sp = [0 1-0.7422 1-0.25 1] and se = [0 0.6889 0.9333 1].

[Figure: ROC curve for the D-dimer test, sensitivity versus 1 - specificity, through the points (0.2578, 0.6889) and (0.75, 0.9333); Area = 0.7297.]

The m-file RocDdimer.m plots this rudimentary ROC curve.

ROC Curves, continued

The curve is rudimentary since it is based on only two tests. Note that the points (0,0) and (1,1) always belong to ROC curves. These two points correspond to the trivial tests in which all patients test negative or all patients test positive.

The area under the ROC curve (AUC) is a well-accepted measure of test performance. The closer the area is to 1, the more unbalanced the ROC curve (the further it bows away from the diagonal toward the upper-left corner), implying that both the sensitivity and the specificity of the test are high.

It is interesting that some researchers assign an academic scale to AUC as an informal measure of test performance.

ROC Curves, continued

| AUC     | performance |
|---------|-------------|
| 0.9-1.0 | A           |
| 0.8-0.9 | B           |
| 0.7-0.8 | C           |
| 0.6-0.7 | D           |
| 0.0-0.6 | F           |

MATLAB code auc.m calculates AUC when the vectors 1-specificity and sensitivity are supplied.

```matlab
function A = auc(csp, se)
% A = auc(csp,se) computes the area under the ROC curve
% where csp and se are vectors representing (1-specificity)
% and (sensitivity), used to plot the ROC curve
% The length of the vectors has to be the same
```

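
One way to compute the area from the supplied vectors is the trapezoidal rule. The sketch below is an assumption about how such a function could work, not the contents of auc.m, and `auc_sketch` is a hypothetical name. It reproduces the area 0.7297 quoted for the two-threshold D-dimer curve:

```matlab
% Trapezoidal area under a piecewise-linear ROC curve.
% csp must be sorted in increasing order and have the same length as se.
auc_sketch = @(csp, se) trapz(csp, se);

% D-dimer ROC points: trivial tests plus thresholds 650 and 500 ng/ml
csp = [0, 1 - 0.7422, 1 - 0.25, 1];
se  = [0, 0.6889, 0.9333, 1];

A = auc_sketch(csp, se);
fprintf('AUC = %.4f\n', A);   % about 0.7297
```

On the informal academic scale this area corresponds to a grade of C.
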
Exercise 1. HAAH Improves the Test for Prostate Cancer (Keith et al., 2008)

A new procedure based on a protein called human aspartyl (asparaginyl) beta-hydroxylase, or HAAH, adds to the accuracy of standard prostate-specific antigen (PSA) testing for prostate cancer. The findings were presented at the 2008 Genitourinary Cancers Symposium (Keith et al., 2008).

The research involved 233 men with prostate cancer and 43 healthy men, all over 50 years old. Researchers reported that the HAAH test had an overall sensitivity of 95% and specificity of 93%.

Exercise 1, continued

(a) From the reported percentages, form a table with true positives, false positives, true negatives and false negatives (tp, fp, tn, and fn). You will need to round to the nearest integers since the specificity and sensitivity were reported as integer percents.

(b) Suppose that for men of age 50+ in the US, the prevalence of prostate cancer is 7%. Suppose Jim Smith is randomly selected from this group and tests positive with the HAAH test. What is the probability that Jim has prostate cancer?

(c) Suppose that Bill Schneider is a randomly selected person from the sample of n = 276 (= 233 + 43) subjects involved in the HAAH study. What is the probability that Bill has prostate cancer if he tests positive and no other information is available? What is this probability called? What is different here from (b)?

Solution (a)

(a) Recall that sensitivity is the ratio of true positives to the total number of subjects with the disease. Since 233 subjects have the disease, the sensitivity of 95% means that there are 233 × 0.95 = 221.35 ≈ 221 true positives. Thus tp = 221. This gives 233 - 221 = 12 false negatives, thus fn = 12.

Similarly, 43 subjects do not have the disease. Since the specificity is 0.93, the number of true negatives is 43 × 0.93 = 39.99 ≈ 40. This means tn = 40 and fp = 3. The table is

|               | disease       | no disease    | total          |
|---------------|---------------|---------------|----------------|
| test positive | tp = 221      | fp = 3        | tot.pos = 224  |
| test negative | fn = 12       | tn = 40       | tot.neg = 52   |
| total         | tot.dis = 233 | tot.ndis = 43 | total = 276    |

(b) PPV is 0.5058. If the prevalence is external information,

$$P(\text{disease} \mid \text{test positive}) = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})} = \frac{221/233 \times 7/100}{221/233 \times 7/100 + 3/43 \times 93/100} = 0.5058.$$

(c) PPV is 0.9866. If the table is representative of the population,

$$\mathrm{PPV} = \frac{tp}{tp + fp} = \frac{221}{224} = 0.9866.$$

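
Both answers can be reproduced from the table found in part (a). A minimal sketch, with variable names chosen here for illustration:

```matlab
% HAAH exercise: PPV with external prevalence (b) and from the table (c)
tp = 221; fp = 3; fn = 12; tn = 40;

se  = tp/(tp + fn);    % 221/233
sp  = tn/(tn + fp);    % 40/43
pre = 0.07;            % external prevalence for US men aged 50+

ppv_b = se*pre / (se*pre + (1 - sp)*(1 - pre));   % Bayes' rule
ppv_c = tp/(tp + fp);                             % table taken as representative

fprintf('(b) PPV = %.4f\n(c) PPV = %.4f\n', ppv_b, ppv_c);
```

The only input that changes between the two computations is the prevalence: 0.07 in (b) versus the study's own 233/276 in (c).
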
Discussion

In both (b) and (c) we have found the positive predictive value, that is, P(disease present | test positive). However, (b) and (c) differ in the information about where the subject comes from, which is critical for the prevalence.

If the subject comes from the general population, then the prevalence is 0.07. If we selected the subject from the group involved in this study (that is, the selected person is one of the 276 subjects), then the prevalence refers to this particular group and is (tp + fn)/n = 233/276 ≈ 0.8442.

Exercise 2. Hypothyroidism

Low values of a total thyroxine (T4) test can be indicative of hypothyroidism (Goldstein and Mushlin, 1987). Hypothyroidism is a condition in which the body lacks sufficient thyroid hormone. Since the main purpose of thyroid hormone is to run the body's metabolism, it is understandable that people with this condition will have symptoms associated with a slow metabolism. Over five million Americans have this common medical condition.

A total of 195 patients, among which 59 have confirmed hypothyroidism, have been tested for the level of T4. If the patients with T4 level ≤ 5 are considered positive for hypothyroidism, the following table is obtained:

| T4 value         | Hypothyroid | Euthyroid | Total |
|------------------|-------------|-----------|-------|
| Positive, T4 ≤ 5 | 35          | 5         | 40    |
| Negative, T4 > 5 | 24          | 131       | 155   |
| Total            | 59          | 136       | 195   |

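Exercise 2, continued

However, if the thresholds for T4 are 6, 7, 8 and 9, the following tables are obtained.

| T4 value         | Hypothyroid | Euthyroid | Total |
|------------------|-------------|-----------|-------|
| Positive, T4 ≤ 6 | 39          | 10        | 49    |
| Negative, T4 > 6 | 20          | 126       | 146   |
| Total            | 59          | 136       | 195   |

| T4 value         | Hypothyroid | Euthyroid | Total |
|------------------|-------------|-----------|-------|
| Positive, T4 ≤ 7 | 46          | 29        | 75    |
| Negative, T4 > 7 | 13          | 107       | 120   |
| Total            | 59          | 136       | 195   |

| T4 value         | Hypothyroid | Euthyroid | Total |
|------------------|-------------|-----------|-------|
| Positive, T4 ≤ 8 | 51          | 61        | 112   |
| Negative, T4 > 8 | 8           | 75        | 83    |
| Total            | 59          | 136       | 195   |

| T4 value         | Hypothyroid | Euthyroid | Total |
|------------------|-------------|-----------|-------|
| Positive, T4 ≤ 9 | 57          | 96        | 153   |
| Negative, T4 > 9 | 2           | 40        | 42    |
| Total            | 59          | 136       | 195   |
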
Solution to Exercise 2

hypothiroidism.m

```matlab
[a b c d e f g h] = sesp(35, 5, 24, 131);
%     Se      Sp     Pre     PPV     NPV      LRP      Ag      YI
% 0.5932  0.9632  0.3026  0.8750  0.8452  16.1356  0.8513  0.3935
[a b c d e f g h] = sesp(39, 10, 20, 126);
% 0.6610  0.9265  0.3026  0.7959  0.8630   8.9898  0.8462  0.4154
[a b c d e f g h] = sesp(46, 29, 13, 107);
% 0.7797  0.7868  0.3026  0.6133  0.8917   3.6563  0.7846  0.4005
[a b c d e f g h] = sesp(51, 61, 8, 75);
% 0.8644  0.5515  0.3026  0.4554  0.9036   1.9272  0.6462  0.2941
[a b c d e f g h] = sesp(57, 96, 2, 40);
% 0.9661  0.2941  0.3026  0.3725  0.9524   1.3686  0.4974  0.1840

% ROC points for thresholds 5..9, plus the trivial tests (0,0) and (1,1)
se  = [0, 0.5932, 0.6610, 0.7797, 0.8644, 0.9661, 1];
csp = [0, 1-0.9632, 1-0.9265, 1-0.7868, 1-0.5515, 1-0.2941, 1];

figure(1)
plot(csp, se, 'r-')          % ROC curve
hold on
plot(csp, se, 'ro')          % mark the individual tests
plot([0 1], [0 1], 'r-')     % diagonal
xlabel('1 - specificity')
ylabel('sensitivity')

a = auc(csp, se)
% a = 0.8527 (Grade of B).
```

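
The five thresholds can also be compared by the Youden index. The sketch below reuses the Se and Sp values printed above and assumes the sqrt(2) scaling that matches the YI column; it is an illustration, not part of the original m-file:

```matlab
% Se and Sp for T4 thresholds 5, 6, 7, 8, 9 (taken from the sesp outputs)
thr = [5 6 7 8 9];
se  = [0.5932 0.6610 0.7797 0.8644 0.9661];
sp  = [0.9632 0.9265 0.7868 0.5515 0.2941];

yi = (se + sp - 1)/sqrt(2);    % Youden index with the sqrt(2) scaling
[yi_max, k] = max(yi);         % index of the most distant ROC point

fprintf('Best threshold by Youden index: T4 <= %d (YI = %.4f)\n', ...
        thr(k), yi_max);
```

Under this criterion the threshold T4 ≤ 6 is favored, with a Youden index of about 0.4154.
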
Solution to Exercise 2: ROC Curve

[Figure: ROC curve for the T4 test, sensitivity versus 1 - specificity, with the point (1 - 0.9265, 0.6610) marked; Area = 0.8527.]