OCW Epidemiology and Biostatistics, 2010 Michael D. Kneeland, MD November 18, 2010 SCREENING. Learning Objectives for this session:

Similar documents
Introduction to Epidemiology Screening for diseases

Screening for Disease

Lecture 14 Screening tests and result interpretation

Welcome to this series focused on sources of bias in epidemiologic studies. In this first module, I will provide a general overview of bias.

The recommended method for diagnosing sleep

Handout 11: Understanding Probabilities Associated with Medical Screening Tests STAT 100 Spring 2016

7/17/2013. Evaluation of Diagnostic Tests July 22, 2013 Introduction to Clinical Research: A Two week Intensive Course

Screening for Disease

Chapter 10. Screening for Disease

Screening (Diagnostic Tests) Shaker Salarilak

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests

THERAPEUTIC REASONING

Finding Good Diagnosis Studies

Biostatistics Lecture April 28, 2001 Nate Ritchey, Ph.D. Chair, Department of Mathematics and Statistics Youngstown State University

Data that can be classified as belonging to a distinct number of categories >>result in categorical responses. And this includes:

Let s look a minute at the evidence supporting current cancer screening recommendations.

Bayes Theorem Application: Estimating Outcomes in Terms of Probability

INTERNAL VALIDITY, BIAS AND CONFOUNDING

CHAPTER 2 MAMMOGRAMS AND COMPUTER AIDED DETECTION

OCW Epidemiology and Biostatistics, 2010 David Tybor, MS, MPH and Kenneth Chui, PhD Tufts University School of Medicine October 27, 2010

Epidemiology 2200b Lecture 3 (continued again) Review of sources of bias in Case-control, cohort and ecologic(al) studies

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

Controversies in Breast Cancer Screening

Biostatistics 3. Developed by Pfizer. March 2018

Health Studies 315: Handouts. Health Studies 315: Handouts

Chapter 8: Two Dichotomous Variables

The cross sectional study design. Population and pre-test. Probability (participants). Index test. Target condition. Reference Standard

INTRODUCTION TO MACHINE LEARNING. Decision tree learning

CPS331 Lecture: Coping with Uncertainty; Discussion of Dreyfus Reading

Folland et al Chapter 4

Evidence-based Cancer Screening & Surveillance

An introduction to power and sample size estimation

Critical reading of diagnostic imaging studies. Lecture Goals. Constantine Gatsonis, PhD. Brown University

Bioengineering and World Health. Lecture Twelve

Bayes Theorem and diagnostic tests with application to patients with suspected angina

Table of Contents. Early Identification Chart Biometric Screening Comprehensive Cardiovascular Disease Risk Assessment...

Systematic Reviews and meta-analyses of Diagnostic Test Accuracy. Mariska Leeflang

Bayesian Analysis by Simulation

Functionalist theories of content

Diagnostic Test. H. Risanto Siswosudarmo Department of Obstetrics and Gynecology Faculty of Medicine, UGM Jogjakarta. RS Sardjito

PREVENTIVE HEALTHCARE GUIDELINES INTRODUCTION

Actor-Observer Bias One of the most established phenomenon in social psychology YOUR behavior is attributed to OTHER S behavior is attributed to

SYSTEMATIC REVIEWS OF TEST ACCURACY STUDIES

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH DIAGNOSTICS: SOME FUNDAMENTAL ISSUES

Evidence Based Medicine: Articles of Diagnosis

Breast Cancer Screening

RANDOLPH E. PATTERSON, MD, FACC, STEVEN F. HOROWITZ, MD, FACC*

Statistics. Dr. Carmen Bruni. October 12th, Centre for Education in Mathematics and Computing University of Waterloo

Checking In on the Annual Check-up MICHAEL MIGNOLI, MD, FACP

Selecting Research Participants. Conducting Experiments, Survey Construction and Data Collection. Practical Considerations of Research

Oct. 21. Rank the following causes of death in the US from most common to least common:

Understanding diagnostic tests. P.J. Devereaux, MD, PhD McMaster University

Challenges of Observational and Retrospective Studies

Bayes theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

Statistics, Probability and Diagnostic Medicine

Diagnostic Reasoning: Approach to Clinical Diagnosis Based on Bayes Theorem

Conduct an Experiment to Investigate a Situation

Preventive Care Guideline for Asymptomatic Elderly Patients Age 65 and Over

EBM Diagnosis. Denise Campbell-Scherer Stefanie R. Brown. Departments of Medicine and Pediatrics University of Miami Miller School of Medicine

Screening. Mario Merialdi. Training Course in Sexual and Reproductive Health Research Geneva 2009

BMI 541/699 Lecture 16

Handout 16: Opinion Polls, Sampling, and Margin of Error

Probability Revision. MED INF 406 Assignment 5. Golkonda, Jyothi 11/4/2012

Bradford Hill Criteria for Causal Inference Based on a presentation at the 2015 ANZEA Conference

Hereditary Breast and Ovarian Cancer (HBOC) Information for individuals and families

Genetic Counselor: Hi Lisa. Hi Steve. Thanks for coming in today. The BART results came back and they are positive.

Evidence-Based Medicine: Diagnostic study

Why Prevention? Why is Prevention Difficult? Overview of Preventive Medicine for Family Physicians. Levels of Prevention

SISCR Module 7 Part I: Introduction Basic Concepts for Binary Biomarkers (Classifiers) and Continuous Biomarkers

Meta-analysis of diagnostic research. Karen R Steingart, MD, MPH Chennai, 15 December Overview

Welcome to this four part series focused on epidemiologic and biostatistical methods related to disease screening. In this first segment, we will

Blue represents coding updates. G0389 with diagnosis V81.2, V15.82, or with diagnosis V79.1, or

Probability II. Patrick Breheny. February 15. Advanced rules Summary

The Activity Students will read extracts from the Making sense of testing document, and answer questions about what they have read.

Why Prevention? Why is Prevention Difficult? Overview of Preventive Medicine for Family Physicians. Levels of Prevention

! Mainly going to ignore issues of correlation among tests

Prevention Live sensibly, among a thousand people only one dies a natural death, the rest succumb to irrational modes of living. Maimonides,

Alcohol s Good for You? Some Scientists Doubt It

Optima Health. Adult Health Maintenance Guidelines. Guideline History. Original Approve Date 04/93

1 The conceptual underpinnings of statistical power

Designing Studies of Diagnostic Imaging

Perspectives on Large Simple Trials

Gold and Hohwy, Rationality and Schizophrenic Delusion

A parable in four visits. A parable THE ARCHITECTURE OF MEDICAL TEST EVALUATIONS. παραβολή

EVIDENCE-BASED GUIDELINE DEVELOPMENT FOR DIAGNOSTIC QUESTIONS

Psychology, 2010, 1: doi: /psych Published Online August 2010 (

Bias. A systematic error (caused by the investigator or the subjects) that causes an incorrect (overor under-) estimate of an association.

Unit 4 Probabilities in Epidemiology

sickness, disease, [toxicity] Hard to quantify

MIDTERM EXAM. Total 150. Problem Points Grade. STAT 541 Introduction to Biostatistics. Name. Spring 2008 Ismor Fischer

METHODS FOR DETECTING CERVICAL CANCER

Biomarkers for Underreported Alcohol Use

Subject: Preventive Services Policy Effective Date: 08/2017 Revision Date: 05/2018

Appendix G: Methodology checklist: the QUADAS tool for studies of diagnostic test accuracy 1

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

MITOCW conditional_probability

Preventive Health Guidelines

Research methods in sensation and perception. (and the princess and the pea)

STUDIES OF THE ACCURACY OF DIAGNOSTIC TESTS: (Relevant JAMA Users Guide Numbers IIIA & B: references (5,6))

Questionnaire design. Questionnaire Design: Content. Questionnaire Design. Questionnaire Design: Wording. Questionnaire Design: Wording OUTLINE

Transcription:

OCW Epidemiology and Biostatistics, 2010 Michael D. Kneeland, MD November 18, 2010 SCREENING Learning Objectives for this session: 1) Know the objectives of a screening program 2) Define and calculate sensitivity, specificity and predictive values 3) Understand impact of disease prevalence in determining predictive values 4) Be familiar with the likelihood ratio in clinical decisions Outside Preparation: Gordis, Chapter 5: Assessing the Validity and Reliability of Diagnostic and Screening Tests (Skip: Use of Multiple Tests, Sequential (Two-stage) Testing and Simultaneous Testing) Chapter 18: The Epidemiologic Approach to Evaluation of Screening Programs (Skip: Methodologic Issues, Selection Biases and Referral Bias, Volunteer Bias) Screening is the application of a test to detect a potential disease or condition in a person who has no known signs of that disease or condition. So notes Dr. David Eddy in his chapter How to think About Screening in Screening for Diseases, Prevention in Primary Care. There are two main purposes for screening, he continues. One is to detect a disease early in its natural history when treatment might be more effective, less expensive, or both. The other purpose is to detect risk factors that put a person at a higher than average risk for developing a disease, with the goal of modifying the risk factor or factors to prevent the disease. Patients who have symptoms of a disease, or have signs of a disease on physical examination, have a work-up for the disease to detect its presence or absence. A work-up is not screening. 1) Screening Levels of Prevention A) Primary: To prevent the disease from occurring; Example: cholesterol screening to prevent heart disease. B) Secondary: To reduce the impact of a disease; Example: mammogram screening to identify patients at an early stage of breast cancer that may be favorably altered. C) Tertiary: To improve the quality of life associated with a disease; Example: Metastatic bone screen in certain cancers to prevent pathologic fractures. 2) Some Goals of Screening Programs 1

A) Exclusionary; Examples: Military; sport physicals; pilot license B) Societal Protection; Examples: TB; eye test for driver's license; pilot license C) Prevention and Control of Disease; Examples: Diabetes Mellitus; Hemophilia; Sickle-Cell Disease D) Surveillance and Monitoring; Examples: HIV status in high risk groups; Hepatitis B and HIV in IV drug abusers 3) Diseases Appropriate for Screening A) Serious Diseases B) Treatment begun before symptoms develop should be more beneficial than treatment begun after symptoms develop. Natural History of Disease a b c d a: disease begins b: disease detectable by screening c. symptoms develop d: outcome C) Prevalence of disease should warrant testing 4) Examples of screening programs in primary care medicine A) Pap smears re cervical cancer (Primary and Secondary) B) Blood sugars re diabetes mellitus (Secondary) C) Mammograms re breast cancer (Secondary) D) Colonoscopies re colo-rectal cancer (Secondary) E) Cholesterol re vascular disease (Primary and Secondary) F) PKU blood testing in newborns (Secondary) G) Blood pressure re vascular disease (Primary and Secondary) 5) Diseases not routinely screened in primary care medicine A) Coronary artery disease in young adults B) Degenerative joint disease in the elderly 6) Criteria for Test Selection A) Available B) Inexpensive Cost Benefit Analysis: Total costs saved as a result of the screening process divided by the total costs of the screening program Cost Effective Analysis: a.)total cost of screening program per diagnosis, or b.) Total cost of 2

the screening program per life year saved, or c.) Total cost of the screening program divided by the quality adjusted life year saved. C) Low Risk D) Easily performed E) Reliable: results reproducible F) Accurate: results are correct Various groups, such as the American Cancer Society or the American College of Cardiology, publish screening recommendations. Sometimes their recommendations differ. Ideally, such decisions are based on evidence supported by studies. Dr. Eddy points out that RCTs to determine the benefits of a screening program can be subject to dilution bias, patients offered the screening test don t receive it, and contamination bias, patients not offered the screening test receive it anyway. In addition to these potential internal validity biases, trials done in an experimental setting might not reflect what would happen in actual practice, an external validity concern. Another issue to consider is lead time bias. Because the screening process has presumably identified a disease earlier in its natural history, it might appear that patients live longer than they would have had they not had a screening test. Because of this, Dr. Eddy cautions, a comparison of survival rates in screened or unscreened populations can be misleading. 7) Measures of Test Performance A) Sensitivity: Given the disease is present, the likelihood of testing positive. Example: 100 people are known to have AIDS. A new HIV test done on these people says 90 are positive and 10 are negative. The sensitivity is 90/100 = 90%. The 90 patients are true positives because they really have the disease. The other 10 patients who test negative are false negatives because they were incorrectly called negative for the disease. Disease Positive Test Positive 90 (True Positives) Test Negative 10 (False Negatives) Total 100 Sensitivity = 90 / [ 90 + 10 ] = 90% = TP/ [ TP + FN ] 3

B) Specificity: Given the disease is not present, the likelihood of testing negative. Example: 200 people are known to be fully healthy. A screening VDRL indicates 190 are negative while 10 have latent syphilis. The specificity is 190 / 200 = 95% Disease Negative Test Positive 10 (False Positives) Test Negative 190 (True Negatives) Total 200 Specificity = 190 / [ 190 + 10 ] = 95% = TN / [ TN + FP ] Further Examples: 80 people are known to have alcoholic hepatitis and 60 people are known not to have alcoholic hepatitis. The sensitivity of serum SGOT for alcoholic hepatitis is 90% and its specificity is 70%. How many true positives and false positives were identified? Disease Positive Disease Negative SGOT Positive (80) (0.9) = 72 (60) - (42) = 18 SGOT Negative (80) - (72) = 8 (60) (0.7) = 42 TP = 72; FN = 8; TN = 42; FP = 18 8) Interpreting Test Results A) Predictive Value Positive: Given a test is positive, the likelihood disease is present. Example: 60 patients with a positive exercise stress test are referred for coronary artery angiography (cardiac catheterization) that showed 10 of them had coronary artery disease (CAD) and 50 did not. What was the predictive value positive of the stress test for CAD? CAD Positive CAD Negative Stress Test Positive 10 (True Positives) 50 (False Positives) PV+ = 10 / [ 10 + 50 ] = 10/60 = 16.7% = TP / [ TP + FP ] 4

B) Predictive Value Negative: Given a test is negative, the likelihood disease is not present. Example: 250 people have a negative lung scan for pulmonary embolus (PE). After undergoing a pulmonary artery angiography, 25 of them are shown to have a pulmonary embolus (PE). What was the predictive value negative of the lung scan for PE? (Note: A lung scan is not a screening test as defined in this lecture. Rather, it is part of a work-up when a concern has been raised. However, the concept as it pertains to predictive values is the same as it is for screening tests.) PE Positive PE Negative Lung Scan Negative 25 (False Negatives) 225 (True Negatives) PV (-) = 225 / [ 225 + 25 ] = 225/250 = 90% = TN / [ TN + FN ] 9) Baye's Theorem: Combines the sensitivity and specificity of a given test with the disease prevalence of a tested group to determine predictive values. Example: An HIV test has 90% sensitivity and 95% specificity. Researchers test 1000 highschool students with an HIV+ prevalence of 1%. What are the predictive values? HIV Positive HIV Negative Test Positive (10) (0.90) = 9 (990) - (941) = 49 Test Negative (10) - (9) = 1 (990) (0.95) = 941 Total (1000) (0.01) = 10 (1000) - (10) = 990 PV(+) = 9/58 = 15.5% PV(-) = 941/942 = 99.9% 5

The researchers then test 1200 IV drug abusers with an HIV+ prevalence of 40%. What are the predictive values with this group? HIV Positive HIV Negative Test Positive (480) (0.90) = 432 (720) - (684) = 36 Test Negative (480) - (432) = 48 (720) (0.95) = 684 Total (1200) (0.40) = 480 (1200) - (480) = 720 PV(+) = 432/468 = 92.3% PV(-) = 684/732 = 93.4% As the prevalence of HIV(+) increased from the high-school students to the IV drug abusers, the PV(+) increased from 15.5% to 92.3% while the PV(-) decreased from 99.9% to 93.4%. This change had nothing to do with the test's sensitivity and/or specificity that did not change. These results demonstrate why it may not be beneficial to test low prevalence groups for a disease, no matter how well the screening test itself performs. As the prevalence increases, the predictive value positive increases and the predictive value negative decreases. Conversely, as the prevalence decreases, the predictive value negative increases and the predictive value positive decreases. Following is a diagram that may help you to remember how to calculate sensitivity, specificity and the predictive values. Predictive Value Positive Disease + - Sensitivity Test + TP FP - FN TN Specificity Predictive Value Negative 6

When a New Test is More Sensitive than the Old Test This is a situation raised by Sarah Lord et al. in the June 6, 2006 Annals of Internal Medicine, When is Measuring Sensitivity and Specificity Sufficient to Evaluate a Diagnostic Test, and When Do We Need Randomized Trials? The authors note that, If a new test is more sensitive than an old test but has similar specificity, its value is directly related to the treatment response in the extra cases detected. The issue is whether or not the additional cases diagnosed by the new test s increased sensitivity will lead to improved patient outcomes. Clinicians first need to ask whether the extra cases detected by a new diagnostic test represent the same spectrum or subtypes of disease as those included in treatment trials. If they do, then randomized trials may not be required. If they do not, or if clinicians cannot be certain that they do, then clinicians also need to ask whether treatment response is known to be similar across the range of disease spectrum or subtype. These considerations are important regardless of the magnitude of the difference in sensitivity between tests. The authors cite the following example. A new test may also produce a shift in the spectrum of disease detected. For example, breast magnetic resonance imaging in addition to mammography is more sensitive that mammography alone for the early detection of invasive breast cancer in young women at high risk. However, the extra cases detected may represent a different spectrum of disease that does not show improved treatment response. Therefore, the value of breast magnetic resonance imaging is uncertain without trials comparing patient outcomes after early versus standard detection. Clinical Scenario You re seeing a 45-year-old female who has a slightly enlarged liver. She says she just lost her job because her boss claimed she repeatedly reported late to work. The patient maintains that rarely happened and the real reason she lost her job was that her boss didn t like her. She says she drinks alcohol socially, which she says is one glass of wine each night and occasionally a little more on weekends. Upon questioning, she says her husband wants her to drink less but she quickly adds that he believes it s unhealthy to drink any alcohol at all. After reviewing her medications, you note that she takes a medication that can occasionally cause an enlarged liver. You conclude that there is a possibility that her slightly enlarged liver could be due to her medicine, but unfortunately there is no blood test that can confirm or rule out that possibility. You re also concerned that she might be an alcoholic in denial and you consider acute alcoholic hepatitis in your differential diagnoses. You recall that the lab has added a new blood test that is 98% sensitive and 70% specific for acute alcoholic liver disease. You order the test and it returns at 81 U with 80 U being the lab s upper limit of normal; hence the lab reports this as a positive test for acute alcoholic hepatitis. You re facing a diagnostic challenge. You note that the test result is barely positive so you re concerned about a false positive result, especially given the test s specificity. You think about 7

calculating a predictive value positive, but you have no idea how likely it is that a patient presenting like her would have acute alcoholic hepatitis. Without this a priori estimate, the prior probability, you re unable to calculate a predictive value positive. You know that sensitivity is a measure of how well a test correctly identifies patients with a disease and that specificity is a measure of how well a test correctly identifies patients as not having a disease. It would be unusual for a test to be 100% sensitive and 100% specific. This is because there is often an overlap of the distributions from patients with and without disease, shown below. Suppose the curve on the left represents the distribution of results in patients without acute alcoholic hepatitis while the distribution on the right is the distribution of results of patients with acute alcoholic hepatitis. Because there is an overlap involving the right tail from the first distribution and the left tail from the other distribution, a cutoff point must be selected to separate a normal test result from an abnormal result. Such a cutoff level is illustrated below by the straight vertical line at 80 U. 30 U 80 U 130 U A test result is either positive or negative. The results can reflect truth true positives or true negatives or they can reflect mistakes false positives and false negatives. Using sensitivity and specificity alone, one can t quantify how likely it is that a given result represents truth. In the diagram above, the cutoff level was set at 80 U so a result of 81 U would be reported as a positive test. 8

One might ask, however, how likely it is that a result of 81 U represents a true positive; said differently, how likely is it that a patient with 81 U has alcoholic liver disease? What about a result of 100 U? One might intuitively think that a patient with this latter result is more likely to have acute alcoholic hepatitis vs. a patient with a value of 81 U. In other words, the higher the U, the more likely it is that the patient truly has acute alcoholic hepatitis. You will note that sensitivity and specificity can t quantify the likelihood that a given result represents true disease. The test is either positive or negative. There is, however, a method to quantify the likelihood that a given test result represents true disease or not. This is the likelihood ratio. Sackett et al in their Clinical Epidemiology text note that it expresses the odds that a given level of a diagnostic test result would be expected in a patient with (as opposed to one without) the disease. Suppose one calculated the following in 100 patients known to have acute alcoholic hepatitis: U Acute Alcoholic Hepatitis 1-90 5/100 patients = 0.05 91-99 15/100 patients = 0.15 100 or higher 80/100 patients = 0.80 Note that the higher the U, the more likely it is that the patient has acute alcoholic hepatitis. Suppose one then calculated the following in 50 patients known not to have alcoholic liver disease: U No Acute Alcoholic Hepatitis 1-90 42/50 = 0.84 91-99 7/50 = 0.14 100 or higher 1/50 = 0.02 Note that the lower the U, the more likely it is that the patient does not have acute alcoholic hepatitis. Said differently, the lower the U, the less likely it is that the patient has acute alcoholic hepatitis. 9

Calculating and Interpreting the Likelihood Ratio Using the data from the previous charts, we would say that a value of more than 100 U has a likelihood ratio of 40. This number is calculated as follows: 0.80/0.02 = 40 We would therefore conclude that a result greater than 100 U is 40 times as likely to come from patients with acute alcoholic hepatitis as from patients without acute alcoholic hepatitis. Using sensitivity and specificity values alone, all we can say about a test result higher than 100 U is that the test is positive. Using the likelihood ratio, we can quantify the odds by saying it s 40 times more likely to represent disease vs. non-disease. (There are further uses of likelihood ratios that are beyond the goals of this lecture.) One might ask about the relationship between sensitivity, specificity and the likelihood ratio. The cut-off value chosen to report a positive vs. a negative test would determine the sensitivity and specificity of the test. For example, if a cutoff value of 100 were used to determine a positive from a negative test, then the sensitivity of the test would be 80% because 80 of 100 patients with the disease would be correctly identified as positive while 20% would be false negatives. (20/100 patients with the disease had a value less than 100.) As for specificity, with a cut-off value of 100 the test would be 98% specific as only 1/50 patients without the disease had a value of 100 or higher. 10