OCW Epidemiology and Biostatistics, 2010 Michael D. Kneeland, MD November 18, 2010 SCREENING. Learning Objectives for this session:

OCW Epidemiology and Biostatistics, 2010 Michael D. Kneeland, MD November 18, 2010 SCREENING Learning Objectives for this session: 1) Know the objectives of a screening program 2) Define and calculate sensitivity, specificity and predictive values 3) Understand impact of disease prevalence in determining predictive values 4) Be familiar with the likelihood ratio in clinical decisions Outside Preparation: Gordis, Chapter 5: Assessing the Validity and Reliability of Diagnostic and Screening Tests (Skip: Use of Multiple Tests, Sequential (Two-stage) Testing and Simultaneous Testing) Chapter 18: The Epidemiologic Approach to Evaluation of Screening Programs (Skip: Methodologic Issues, Selection Biases and Referral Bias, Volunteer Bias) Screening is the application of a test to detect a potential disease or condition in a person who has no known signs of that disease or condition. So notes Dr. David Eddy in his chapter How to think About Screening in Screening for Diseases, Prevention in Primary Care. There are two main purposes for screening, he continues. One is to detect a disease early in its natural history when treatment might be more effective, less expensive, or both. The other purpose is to detect risk factors that put a person at a higher than average risk for developing a disease, with the goal of modifying the risk factor or factors to prevent the disease. Patients who have symptoms of a disease, or have signs of a disease on physical examination, have a work-up for the disease to detect its presence or absence. A work-up is not screening. 1) Screening Levels of Prevention A) Primary: To prevent the disease from occurring; Example: cholesterol screening to prevent heart disease. B) Secondary: To reduce the impact of a disease; Example: mammogram screening to identify patients at an early stage of breast cancer that may be favorably altered. C) Tertiary: To improve the quality of life associated with a disease; Example: Metastatic bone screen in certain cancers to prevent pathologic fractures. 2) Some Goals of Screening Programs 1

A) Exclusionary; Examples: Military; sport physicals; pilot license B) Societal Protection; Examples: TB; eye test for driver's license; pilot license C) Prevention and Control of Disease; Examples: Diabetes Mellitus; Hemophilia; Sickle-Cell Disease D) Surveillance and Monitoring; Examples: HIV status in high risk groups; Hepatitis B and HIV in IV drug abusers 3) Diseases Appropriate for Screening A) Serious Diseases B) Treatment begun before symptoms develop should be more beneficial than treatment begun after symptoms develop. Natural History of Disease a b c d a: disease begins b: disease detectable by screening c. symptoms develop d: outcome C) Prevalence of disease should warrant testing 4) Examples of screening programs in primary care medicine A) Pap smears re cervical cancer (Primary and Secondary) B) Blood sugars re diabetes mellitus (Secondary) C) Mammograms re breast cancer (Secondary) D) Colonoscopies re colo-rectal cancer (Secondary) E) Cholesterol re vascular disease (Primary and Secondary) F) PKU blood testing in newborns (Secondary) G) Blood pressure re vascular disease (Primary and Secondary) 5) Diseases not routinely screened in primary care medicine A) Coronary artery disease in young adults B) Degenerative joint disease in the elderly 6) Criteria for Test Selection A) Available B) Inexpensive Cost Benefit Analysis: Total costs saved as a result of the screening process divided by the total costs of the screening program Cost Effective Analysis: a.)total cost of screening program per diagnosis, or b.) Total cost of 2

the screening program per life year saved, or c.) Total cost of the screening program divided by the quality adjusted life year saved. C) Low Risk D) Easily performed E) Reliable: results reproducible F) Accurate: results are correct Various groups, such as the American Cancer Society or the American College of Cardiology, publish screening recommendations. Sometimes their recommendations differ. Ideally, such decisions are based on evidence supported by studies. Dr. Eddy points out that RCTs to determine the benefits of a screening program can be subject to dilution bias, patients offered the screening test don t receive it, and contamination bias, patients not offered the screening test receive it anyway. In addition to these potential internal validity biases, trials done in an experimental setting might not reflect what would happen in actual practice, an external validity concern. Another issue to consider is lead time bias. Because the screening process has presumably identified a disease earlier in its natural history, it might appear that patients live longer than they would have had they not had a screening test. Because of this, Dr. Eddy cautions, a comparison of survival rates in screened or unscreened populations can be misleading. 7) Measures of Test Performance A) Sensitivity: Given the disease is present, the likelihood of testing positive. Example: 100 people are known to have AIDS. A new HIV test done on these people says 90 are positive and 10 are negative. The sensitivity is 90/100 = 90%. The 90 patients are true positives because they really have the disease. The other 10 patients who test negative are false negatives because they were incorrectly called negative for the disease. Disease Positive Test Positive 90 (True Positives) Test Negative 10 (False Negatives) Total 100 Sensitivity = 90 / [ 90 + 10 ] = 90% = TP/ [ TP + FN ] 3

B) Specificity: Given the disease is not present, the likelihood of testing negative. Example: 200 people are known to be fully healthy. A screening VDRL indicates 190 are negative while 10 have latent syphilis. The specificity is 190 / 200 = 95% Disease Negative Test Positive 10 (False Positives) Test Negative 190 (True Negatives) Total 200 Specificity = 190 / [ 190 + 10 ] = 95% = TN / [ TN + FP ] Further Examples: 80 people are known to have alcoholic hepatitis and 60 people are known not to have alcoholic hepatitis. The sensitivity of serum SGOT for alcoholic hepatitis is 90% and its specificity is 70%. How many true positives and false positives were identified? Disease Positive Disease Negative SGOT Positive (80) (0.9) = 72 (60) - (42) = 18 SGOT Negative (80) - (72) = 8 (60) (0.7) = 42 TP = 72; FN = 8; TN = 42; FP = 18 8) Interpreting Test Results A) Predictive Value Positive: Given a test is positive, the likelihood disease is present. Example: 60 patients with a positive exercise stress test are referred for coronary artery angiography (cardiac catheterization) that showed 10 of them had coronary artery disease (CAD) and 50 did not. What was the predictive value positive of the stress test for CAD? CAD Positive CAD Negative Stress Test Positive 10 (True Positives) 50 (False Positives) PV+ = 10 / [ 10 + 50 ] = 10/60 = 16.7% = TP / [ TP + FP ] 4

B) Predictive Value Negative: Given a test is negative, the likelihood disease is not present. Example: 250 people have a negative lung scan for pulmonary embolus (PE). After undergoing a pulmonary artery angiography, 25 of them are shown to have a pulmonary embolus (PE). What was the predictive value negative of the lung scan for PE? (Note: A lung scan is not a screening test as defined in this lecture. Rather, it is part of a work-up when a concern has been raised. However, the concept as it pertains to predictive values is the same as it is for screening tests.) PE Positive PE Negative Lung Scan Negative 25 (False Negatives) 225 (True Negatives) PV (-) = 225 / [ 225 + 25 ] = 225/250 = 90% = TN / [ TN + FN ] 9) Baye's Theorem: Combines the sensitivity and specificity of a given test with the disease prevalence of a tested group to determine predictive values. Example: An HIV test has 90% sensitivity and 95% specificity. Researchers test 1000 highschool students with an HIV+ prevalence of 1%. What are the predictive values? HIV Positive HIV Negative Test Positive (10) (0.90) = 9 (990) - (941) = 49 Test Negative (10) - (9) = 1 (990) (0.95) = 941 Total (1000) (0.01) = 10 (1000) - (10) = 990 PV(+) = 9/58 = 15.5% PV(-) = 941/942 = 99.9% 5

The researchers then test 1200 IV drug abusers with an HIV+ prevalence of 40%. What are the predictive values with this group? HIV Positive HIV Negative Test Positive (480) (0.90) = 432 (720) - (684) = 36 Test Negative (480) - (432) = 48 (720) (0.95) = 684 Total (1200) (0.40) = 480 (1200) - (480) = 720 PV(+) = 432/468 = 92.3% PV(-) = 684/732 = 93.4% As the prevalence of HIV(+) increased from the high-school students to the IV drug abusers, the PV(+) increased from 15.5% to 92.3% while the PV(-) decreased from 99.9% to 93.4%. This change had nothing to do with the test's sensitivity and/or specificity that did not change. These results demonstrate why it may not be beneficial to test low prevalence groups for a disease, no matter how well the screening test itself performs. As the prevalence increases, the predictive value positive increases and the predictive value negative decreases. Conversely, as the prevalence decreases, the predictive value negative increases and the predictive value positive decreases. Following is a diagram that may help you to remember how to calculate sensitivity, specificity and the predictive values. Predictive Value Positive Disease + - Sensitivity Test + TP FP - FN TN Specificity Predictive Value Negative 6

When a New Test is More Sensitive than the Old Test This is a situation raised by Sarah Lord et al. in the June 6, 2006 Annals of Internal Medicine, When is Measuring Sensitivity and Specificity Sufficient to Evaluate a Diagnostic Test, and When Do We Need Randomized Trials? The authors note that, If a new test is more sensitive than an old test but has similar specificity, its value is directly related to the treatment response in the extra cases detected. The issue is whether or not the additional cases diagnosed by the new test s increased sensitivity will lead to improved patient outcomes. Clinicians first need to ask whether the extra cases detected by a new diagnostic test represent the same spectrum or subtypes of disease as those included in treatment trials. If they do, then randomized trials may not be required. If they do not, or if clinicians cannot be certain that they do, then clinicians also need to ask whether treatment response is known to be similar across the range of disease spectrum or subtype. These considerations are important regardless of the magnitude of the difference in sensitivity between tests. The authors cite the following example. A new test may also produce a shift in the spectrum of disease detected. For example, breast magnetic resonance imaging in addition to mammography is more sensitive that mammography alone for the early detection of invasive breast cancer in young women at high risk. However, the extra cases detected may represent a different spectrum of disease that does not show improved treatment response. Therefore, the value of breast magnetic resonance imaging is uncertain without trials comparing patient outcomes after early versus standard detection. Clinical Scenario You re seeing a 45-year-old female who has a slightly enlarged liver. She says she just lost her job because her boss claimed she repeatedly reported late to work. The patient maintains that rarely happened and the real reason she lost her job was that her boss didn t like her. She says she drinks alcohol socially, which she says is one glass of wine each night and occasionally a little more on weekends. Upon questioning, she says her husband wants her to drink less but she quickly adds that he believes it s unhealthy to drink any alcohol at all. After reviewing her medications, you note that she takes a medication that can occasionally cause an enlarged liver. You conclude that there is a possibility that her slightly enlarged liver could be due to her medicine, but unfortunately there is no blood test that can confirm or rule out that possibility. You re also concerned that she might be an alcoholic in denial and you consider acute alcoholic hepatitis in your differential diagnoses. You recall that the lab has added a new blood test that is 98% sensitive and 70% specific for acute alcoholic liver disease. You order the test and it returns at 81 U with 80 U being the lab s upper limit of normal; hence the lab reports this as a positive test for acute alcoholic hepatitis. You re facing a diagnostic challenge. You note that the test result is barely positive so you re concerned about a false positive result, especially given the test s specificity. You think about 7

calculating a predictive value positive, but you have no idea how likely it is that a patient presenting like her would have acute alcoholic hepatitis. Without this a priori estimate, the prior probability, you re unable to calculate a predictive value positive. You know that sensitivity is a measure of how well a test correctly identifies patients with a disease and that specificity is a measure of how well a test correctly identifies patients as not having a disease. It would be unusual for a test to be 100% sensitive and 100% specific. This is because there is often an overlap of the distributions from patients with and without disease, shown below. Suppose the curve on the left represents the distribution of results in patients without acute alcoholic hepatitis while the distribution on the right is the distribution of results of patients with acute alcoholic hepatitis. Because there is an overlap involving the right tail from the first distribution and the left tail from the other distribution, a cutoff point must be selected to separate a normal test result from an abnormal result. Such a cutoff level is illustrated below by the straight vertical line at 80 U. 30 U 80 U 130 U A test result is either positive or negative. The results can reflect truth true positives or true negatives or they can reflect mistakes false positives and false negatives. Using sensitivity and specificity alone, one can t quantify how likely it is that a given result represents truth. In the diagram above, the cutoff level was set at 80 U so a result of 81 U would be reported as a positive test. 8

One might ask, however, how likely it is that a result of 81 U represents a true positive; said differently, how likely is it that a patient with 81 U has alcoholic liver disease? What about a result of 100 U? One might intuitively think that a patient with this latter result is more likely to have acute alcoholic hepatitis vs. a patient with a value of 81 U. In other words, the higher the U, the more likely it is that the patient truly has acute alcoholic hepatitis. You will note that sensitivity and specificity can t quantify the likelihood that a given result represents true disease. The test is either positive or negative. There is, however, a method to quantify the likelihood that a given test result represents true disease or not. This is the likelihood ratio. Sackett et al in their Clinical Epidemiology text note that it expresses the odds that a given level of a diagnostic test result would be expected in a patient with (as opposed to one without) the disease. Suppose one calculated the following in 100 patients known to have acute alcoholic hepatitis: U Acute Alcoholic Hepatitis 1-90 5/100 patients = 0.05 91-99 15/100 patients = 0.15 100 or higher 80/100 patients = 0.80 Note that the higher the U, the more likely it is that the patient has acute alcoholic hepatitis. Suppose one then calculated the following in 50 patients known not to have alcoholic liver disease: U No Acute Alcoholic Hepatitis 1-90 42/50 = 0.84 91-99 7/50 = 0.14 100 or higher 1/50 = 0.02 Note that the lower the U, the more likely it is that the patient does not have acute alcoholic hepatitis. Said differently, the lower the U, the less likely it is that the patient has acute alcoholic hepatitis. 9

Calculating and Interpreting the Likelihood Ratio Using the data from the previous charts, we would say that a value of more than 100 U has a likelihood ratio of 40. This number is calculated as follows: 0.80/0.02 = 40 We would therefore conclude that a result greater than 100 U is 40 times as likely to come from patients with acute alcoholic hepatitis as from patients without acute alcoholic hepatitis. Using sensitivity and specificity values alone, all we can say about a test result higher than 100 U is that the test is positive. Using the likelihood ratio, we can quantify the odds by saying it s 40 times more likely to represent disease vs. non-disease. (There are further uses of likelihood ratios that are beyond the goals of this lecture.) One might ask about the relationship between sensitivity, specificity and the likelihood ratio. The cut-off value chosen to report a positive vs. a negative test would determine the sensitivity and specificity of the test. For example, if a cutoff value of 100 were used to determine a positive from a negative test, then the sensitivity of the test would be 80% because 80 of 100 patients with the disease would be correctly identified as positive while 20% would be false negatives. (20/100 patients with the disease had a value less than 100.) As for specificity, with a cut-off value of 100 the test would be 98% specific as only 1/50 patients without the disease had a value of 100 or higher. 10