Welcome to this third module in a three-part series focused on epidemiologic measures of association and impact.

Similar documents
Welcome to this series focused on sources of bias in epidemiologic studies. In this first module, I will provide a general overview of bias.

Research Designs and Potential Interpretation of Data: Introduction to Statistics. Let s Take it Step by Step... Confused by Statistics?

Critical Appraisal of Evidence A Focus on Intervention/Treatment Studies

Critical Appraisal of Evidence

Understanding Statistics for Research Staff!

Case-control studies. Hans Wolff. Service d épidémiologie clinique Département de médecine communautaire. WHO- Postgraduate course 2007 CC studies

Measuring association in contingency tables

Measures of association: comparing disease frequencies. Outline. Repetition measures of disease occurrence. Gustaf Edgren, PhD Karolinska Institutet

Causal Association : Cause To Effect. Dr. Akhilesh Bhargava MD, DHA, PGDHRM Prof. Community Medicine & Director-SIHFW, Jaipur

Controlling Bias & Confounding

Chapter 19. Confidence Intervals for Proportions. Copyright 2010 Pearson Education, Inc.

Is There An Association?

Measures of Association

Confounding and Interaction

Gender Analysis. Week 3

In the 1700s patients in charity hospitals sometimes slept two or more to a bed, regardless of diagnosis.

Chapter 19. Confidence Intervals for Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Analysis of Variance (ANOVA) Program Transcript

UNIT 5 - Association Causation, Effect Modification and Validity

Data that can be classified as belonging to a distinct number of categories >>result in categorical responses. And this includes:

Stratified Tables. Example: Effect of seat belt use on accident fatality

Incorporating Clinical Information into the Label

5.3: Associations in Categorical Variables

EVect of measurement error on epidemiological studies of environmental and occupational

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels;

Dylan Small Department of Statistics, Wharton School, University of Pennsylvania. Based on joint work with Paul Rosenbaum

Biases in clinical research. Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University

Measurement of Association and Impact. Attributable Risk Attributable Risk in Exposed Population Attributable risk

Regression Discontinuity Designs: An Approach to Causal Inference Using Observational Data

Measuring association in contingency tables

ADENIYI MOFOLUWAKE MPH APPLIED EPIDEMIOLOGY WEEK 5 CASE STUDY ASSIGNMENT APRIL

Glossary of Practical Epidemiology Concepts

Epidemiology: Overview of Key Concepts and Study Design. Polly Marchbanks

Making comparisons. Previous sessions looked at how to describe a single group of subjects However, we are often interested in comparing two groups

Risk Ratio and Odds Ratio

Supplemental tables/figures

INTERNAL VALIDITY, BIAS AND CONFOUNDING

WHAT S THE ANSWER? MEASURES OF EFFECT

Strategies for Data Analysis: Cohort and Case-control Studies

Epidemiologic Measure of Association

Confounding and Bias

Measure of Association Examples of measure of association

A Brief Introduction to Bayesian Statistics

Evidence-Based Medicine Journal Club. A Primer in Statistics, Study Design, and Epidemiology. August, 2013

Module 2: Fundamentals of Epidemiology Issues of Interpretation. Slide 1: Introduction. Slide 2: Acknowledgements. Slide 3: Presentation Objectives

I. Identifying the question Define Research Hypothesis and Questions

OCW Epidemiology and Biostatistics, 2010 David Tybor, MS, MPH and Kenneth Chui, PhD Tufts University School of Medicine October 27, 2010

Logistic Regression Predicting the Chances of Coronary Heart Disease. Multivariate Solutions

ACT Aspire. Science Test Prep

2 Assumptions of simple linear regression

Do the sample size assumptions for a trial. addressing the following question: Among couples with unexplained infertility does

Lecture 8: Statistical Reasoning 1. Lecture 8. Confidence Intervals for Two-Population Comparison Measures

MS&E 226: Small Data

Biases in clinical research. Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University

University of Wollongong. Research Online. Australian Health Services Research Institute

Biases in clinical research. Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University

First author. Title: Journal

Sensitivity Analysis in Observational Research: Introducing the E-value

VALIDITY OF QUANTITATIVE RESEARCH

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Beyond the intention-to treat effect: Per-protocol effects in randomized trials

Bias. Sam Bracebridge

EPIDEMIOLOGY. Training module

BMI 541/699 Lecture 16

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

Chapter 2 A Guide to Implementing Quantitative Bias Analysis

INTRODUCTION TO EPIDEMIOLOGICAL STUDY DESIGNS PHUNLERD PIYARAJ, MD., MHS., PHD.

CHAPTER ONE CORRELATION

Biostatistics 2 - Correlation and Risk

Probability. Esra Akdeniz. February 26, 2016

Correlation and regression

III. WHAT ANSWERS DO YOU EXPECT?

115 remained abstinent. 140 remained abstinent. Relapsed Remained abstinent Total

Reflection Questions for Math 58B

Epidemiology 101. Nutritional Epidemiology Methods and Interpretation Criteria

17/10/2012. Could a persistent cough be whooping cough? Epidemiology and Statistics Module Lecture 3. Sandra Eldridge

Epidemiology & Evidence Based Medicine. Patrick Linden & Matthew McCall

Testing Means. Related-Samples t Test With Confidence Intervals. 6. Compute a related-samples t test and interpret the results.

Georgina Salas. Topics EDCI Intro to Research Dr. A.J. Herrera

GLOSSARY OF GENERAL TERMS

Cigarette Smoking and Lung Cancer

observational studies Descriptive studies

C-1: Variables which are measured on a continuous scale are described in terms of three key characteristics central tendency, variability, and shape.

Smoking among shift workers: more than a confounding factor

EPIDEMIOLOGY-BIOSTATISTICS EXAM Midterm 2004 PRINT YOUR LEGAL NAME:

Validity and Quantitative Research. What is Validity? What is Validity Cont. RCS /16/04

Contingency Tables Summer 2017 Summer Institutes 187

USING STATCRUNCH TO CONSTRUCT CONFIDENCE INTERVALS and CALCULATE SAMPLE SIZE

W e have previously described the disease impact

Epidemiologic Methods and Counting Infections: The Basics of Surveillance

Observational Medical Studies. HRP 261 1/13/ am

STAT 113: PAIRED SAMPLES (MEAN OF DIFFERENCES)

Chapter 11. Experimental Design: One-Way Independent Samples Design

Common Statistical Issues in Biomedical Research

Supplementary Online Content

Module 28 - Estimating a Population Mean (1 of 3)

Chapter 8 Estimating with Confidence

Systematic Reviews and Meta-analysis

Module 4 Introduction

The t-test test and ANOVA

Transcription:

Welcome to this third module in a three-part series focused on epidemiologic measures of association and impact. 1

This three-part series focuses on the estimation of the association between exposures and outcomes. In the first two modules, we learned about epidemiologic measures that can be used to determine if there an association between exposure and disease. Furthermore, we learned about measures that can be used to summarize the strength of the association between exposure and disease outcome. In this module, we will learn about epidemiologic measures that can be used to quantify the potential for prevention. Specifically, we will learn about measures that can be used to quantify the amount of disease incidence that can be attributed to a particular exposure. Throughout this series, we will focus on the interpretation of study results, and in particular, measures of the association between exposure and disease outcome. 2

In this module, we will move beyond measures of association and will discuss measures for estimating the potential for prevention, which relates to measures of impact. 3

The four measures of impact that we will discuss are: Attributable risk Attributable risk % Population attributable risk Population attributable risk % 4

When estimating the attributable risk, or the impact of a particular exposure, we will utilize an attributable risk model in which we assume that the risk of disease in the unexposed group is a function of the risk not due to the particular exposure under investigation. This is the background incidence of disease. Then, among participants in the exposed group, we assume that the risk of disease is a function of the risk not due to the particular exposure under investigation (the background incidence) as well as the risk due to the particular exposure under investigation. Under this model, we can subtract the risk in the unexposed group from the risk in the exposed group to derive an estimate of the risk of the disease outcome that can be attributed to the exposure of interest. 5

Based on the model summarized on the previous slide, we define the attributable risk, or the risk difference, as the amount of disease risk, among exposed, that can be attributed to a specific exposure. The attributable risk is of importance for public health and clinical practice because it helps us to determine how much of the disease risk, among the exposed, can be prevented if we eliminate the exposure? The attributable risk is calculated as the risk in the exposed group minus the risk in the unexposed group where the risk in the unexposed group is assumed to be the background risk. 6

Let s now consider an example where we focus on the risk of lung cancer that can be attributed to smoking. In practice, we would like to know how much of the risk of lung cancer, among smokers, is due to smoking? Also, we would like to know how much of the risk of lung cancer, among smokers, can be prevented if they did not smoke? Attributable risk can be calculated to answer these questions. 7

Let s consider a data example where we followed 800 smokers and 1200 nonsmokers for the development of lung cancer over a 20 year period. We first calculate the risk of lung cancer among the smokers. Among the 800 smokers, 90 developed lung cancer. This results in a risk estimate of 0.113 or 113 per 1,000 people. 8

Next, we will calculate the risk of lung cancer among the non-smokers. Among the 1200 non-smokers, 10 developed lung cancer. This results in a risk estimate of 0.008 or 8 per 1,000 people. 9

The attributable risk is then calculated as the risk of lung cancer among the smokers minus the risk of lung cancer among the non-smokers. The attributable risk is 0.113 0.008 = 0.105. We would interpret this as 105 of the 113 incident cases of lung cancer among 1,000 smokers are attributable to smoking. 10

We can then calculate an attributable risk proportion or percentage to determine what proportion of risk in exposed persons is due to the exposure. The attributable risk proportion is of importance for public health and clinical practice because it helps us to determine what proportion of the disease, among exposed, can be prevented if we eliminate the exposure? The attributable risk proportion is calculated as the risk in the exposed group minus the risk in the unexposed group and then this difference is divided by the risk in the exposed group. The percentage is found by multiplying the proportion by 100%. 11

Let s now consider an example where we focus on the risk of lung cancer that can be attributed to smoking. In practice, we would like to know what proportion of the risk of lung cancer, among smokers, is due to smoking? Also, we would like to know what proportion of the risk of lung cancer, among smokers, can be prevented if they did not smoke? The attributable risk proportion can be calculated to answer these questions. 12

We will consider the same data example where we followed 800 smokers and 1200 non-smokers for the development of lung cancer over a 20 year period. We first calculate the risk of lung cancer among the smokers. Among the 800 smokers, 90 developed lung cancer. This results in a risk estimate of 0.113 or 113 per 1,000 people. 13

Next, we will calculate the risk of lung cancer among the non-smokers. Among the 1200 non-smokers, 10 developed lung cancer. This results in a risk estimate of 0.008 or 8 per 1,000 people. 14

The attributable risk proportion is then calculated as the risk of lung cancer among the smokers minus the risk of lung cancer among the non-smokers, where this difference is divided by the risk among the smokers. The attributable risk proportion is (0.113 0.008)/0.113 = 0.105/0.113 = 0.93. We would interpret this as 93% of the risk of lung cancer is attributable to smoking. 15

Now, let s return to our listing of measures of impact. We have discussed two measures of impact, namely, the attributable risk and the attributable risk percentage. Both of these measures provide a measure of the impact of the exposure among the exposed. In practice, we may want to derive measures of the impact that are relevant for the entire population, noting that the population is a mixture of exposed and unexposed. We will now discuss the population attributable risk and the population attributable risk percentage; these measures of impact are relevant for the entire population. 16

We define the population attributable risk, as the amount of disease risk, in the population, that can be attributed to a specific exposure. The population attributable risk is of importance for public health and clinical practice because it helps us to determine how much of the disease risk, in the population, can be prevented if we eliminate the exposure? The population attributable risk is calculated as the risk in the population minus the risk in the unexposed group where the risk in the unexposed group is assumed to be the background risk. 17

Let s now continue our discussion of the example where we focus on the risk of lung cancer that can be attributed to smoking. In practice, we would like to know how much of the total risk of lung cancer, in the population, is attributable to smoking? Also, we would like to know how much of the total risk of lung cancer, in the population, can be prevented if we eliminated smoking? Population attributable risk can be calculated to answer these questions. 18

Let s consider the same data example where we followed 800 smokers and 1200 non-smokers for the development of lung cancer over a 20 year period. We first calculate the risk of lung cancer in the total population. Among the 2000 participants, 100 developed lung cancer. This results in a total risk estimate of 0.050 or 50 per 1,000 people. 19

Next, we will calculate the risk of lung cancer among the non-smokers. Among the 1200 non-smokers, 10 developed lung cancer. This results in a risk estimate of 0.008 or 8 per 1,000 people. 20

The population attributable risk is then calculated as the risk of lung cancer in the total population minus the risk of lung cancer among the non-smokers. The population attributable risk is 0.050 0.008 = 0.042. Based on these results, we can state that if smoking were eliminated, the risk of lung cancer in the population would be reduced by 42 cases per 1,000 population. 21

We can then calculate a population attributable risk proportion or percentage to determine what proportion of risk in population is due to the exposure. The population attributable risk proportion is of importance for public health and clinical practice because it helps us to determine what proportion of the disease, in the total population, can be prevented if we eliminate the exposure? The population attributable risk proportion is calculated as the risk in the total population minus the risk in the unexposed group and then this difference is divided by the risk in the total population. The percentage is found by multiplying the proportion by 100%. 22

Let s now continue the discussion of our data example where we focus on the risk of lung cancer that can be attributed to smoking. In practice, we would like to know what proportion of the total risk of lung cancer, in the population, is due to smoking? Also, we would like to know what proportion of the total risk of lung cancer, in the population, can be prevented if they did not smoke? The population attributable risk proportion can be calculated to answer these questions. 23

We first calculate the risk of lung cancer in the total population. Among the 2000 participants, 100 developed lung cancer. This results in a total risk estimate of 0.050 or 50 per 1,000 people. Next, we will calculate the risk of lung cancer among the non-smokers. Among the 1200 non-smokers, 10 developed lung cancer. This results in a risk estimate of 0.008 or 8 per 1,000 people. The population attributable risk proportion is then calculated as the risk of lung cancer in the total population minus the risk of lung cancer among the nonsmokers and then dividing this result by the risk of lung cancer in the population. The population attributable risk proportion is (0.050 0.008)/0.050 = 0.042/0.050 = 0.84. We would interpret this result as 84% of lung cancer incidence in total pop is due to smoking. 24

Note that the attributable risk model, upon which our calculations are based, includes several key assumptions. First, the model assumes that the relation between the exposure and outcome is causal (the exposure causes the disease outcome). Next, we assume that all other variables, or patient characteristics and factors, are equally distributed, or balanced, between the exposed and unexposed groups. Therefore, under these assumptions, we can focus on the causal impact of the exposure of interest alone and avoid the confounding impact of other related factors. These assumptions may not hold in practice, and a more complicated attributable risk model would need to be used. 25

Now, let s relate a few of the concepts that we have learned in this module. Let s discuss the relation between relative risk and population attributable risk. 26

We will consider two different diseases, lung cancer and cardiovascular disease (CVD). In this table, we have the mortality rates per 1,000 person years for each disease type according to smoking status. 27

We first note that among smokers, the risk of CVD death is higher than the risk of death from lung cancer. 28

We can use the incidence rates on the previous slide to calculate the relative risk of lung cancer and CVD associated with smoking. We can also calculate the attributable risk percentage for the risk of death due to each cause that is attributable to smoking. Based on these summaries, which cause of death has a stronger association with smoking? 29

Lung cancer-related deaths have a stronger association with smoking. The risk of lung cancer-related death for smokers is 18.5 times the risk for nonsmokers. Furthermore, 95% of the risk of lung cancer-related death among smokers can be attributed to smoking. These values are much higher than the values of 1.3 and 23% seen when we focus on CVD-related death. 30

Now, let s consider the impact of smoking at a population level for deaths due to lung cancer and deaths due to CVD. When we calculate the population attributable risk associated with smoking we see that the PAR value for CVD is much higher than the PAR for lung cancerrelated deaths. Why is this the case? 31

Recall that the population attributable risk is calculated as the risk in the population minus the risk in the unexposed. If the risk in the total population is high, the population attributable risk will be high, assuming a fixed risk in the unexposed. 32

In our example, the mortality rates for CVD are much higher than the mortality rates for lung cancer. Meaning, more people die from CVD than from lung cancer in the population. As a result, the population attributable risk will be higher for CVD than for lung cancer. In summary, although smoking has a stronger association with lung cancerrelated deaths, at a population level, because death due to CVD is more common, the PAR associated with smoking is higher for CVD than for lung cancer-related deaths. 33

Up to this point, we have focused on point estimates for the measures of association. In practice, we will couple the point estimates, say, the estimate for the relative risk, with a confidence interval. Recall from a previous module that the confidence interval is useful for quantifying the degree of uncertainty and variability in our estimate. 34

The confidence interval provides a range of reasonable estimates for the true parameter value, for example, the relative risk or the odds ratio. The confidence interval width reflects the precision of our estimation. A wide interval corresponds to poor precision while a narrow interval corresponds to better precision. The level of confidence reflects our certainty in the estimates upon repeated sampling from the population. The confidence interval provides an indication of how much the estimates would vary if we drew a sample from the population and calculated the parameter estimate multiple times with each repeated sample. 35

The level of confidence reflects the probability, across repeated samples from the population, that our calculated interval includes the true population parameter value. If we sample the same population in the same way 100 times, for 95 of the times, the true value of the parameter of interest (OR or RR or IRR or PPR) would be included in our 95% CI. The figure represents this repeated sampling process. This means that across repeated samples, 5% of the time, our interval will not include the true parameter value. A given confidence interval either will or will not include the true parameter value; we don t know for certain if our interval includes the true value or not, we only know that across repeated samples, 95% of the time, the interval will include the true value. 36

Let s consider an example. Let s say that we calculated the odds of smoking for patients with lung cancer and compared that to the odds of smoking among participants without lung cancer. If the resulting odds ratio is 2.5, we would state that the odds of smoking for participants with lung cancer are 2.5 times the odds of smoking for participants without lung cancer. We will couple this point estimate with an estimated confidence interval. The confidence interval ranges from 1.4 to 4.6. We interpret this interval as, 95 times out of 100, the true OR will lie between the calculated interval (1.4 and 4.6 in this example). We are 95% confident that the true population OR lies between 1.4 and 4.6. We note that this interval does not include the null hypothesis value of 1, which would indicate that the odds of smoking are equal between those with lung cancer and those without lung cancer. Therefore, we can state, that because the confidence interval lies entirely above 1, that the odds of smoking as significantly higher for participants with lung cancer compared to those without lung cancer based on a 2-sided 0.05 alpha level (corresponding to the 95% level of confidence). 37

37

Recall from an earlier module that we can use the estimated confidence interval to conduct a hypothesis test. When considering differences in parameter values, say the difference in the mean total cholesterol between treated and control patients, we compared the confidence interval to a null hypothesis value of 0 (meaning, a value of no difference). When analyzing ratios, such as the risk ratio or the odds ratio, the null hypothesis value is 1 instead of 0. A ratio of 1 indicates that the numerator and denominator are equal. For example, the risk of lung cancer among smokers (numerator) is equal to the risk of lung cancer among non-smokers (denominator). To test the null hypothesis, of equal risk or equal odds between the groups, we can compare the confidence interval to a null hypothesis value of 1. If the 95% confidence interval includes the value of 1 then we know that the p-value will be greater than 0.05. On the other hand, if the interval does not include 1, then we know that the p-value is less than or equal to 0.05 and the results are statistically significant. 38

Consider the confidence intervals in each of these three examples. Let s assume that these intervals are for the odds of smoking comparing cases with CVD to controls without CVD. In the first example, the 95% confidence interval for the OR is wide and includes the null hypothesis value of 1. We would conclude that there is no significant association between the odds of smoking and CVD. In the second example, the 95% confidence interval for the OR lies entirely above the null hypothesis value of 1. We would conclude that there is a significantly increased odds of smoking for patients with CVD compared to those without CVD. In the third example, the 95% confidence interval for the OR lies entirely below the null hypothesis value of 1. We would conclude that there is a significantly lower odds of smoking for patients with CVD compared to those without CVD. 39

We noted that the width of the confidence interval conveys information regarding the precision of our estimation. The width of the interval is a function of the degree of uncertainty in our estimate (i.e., the level of confidence) and also the standard error of our estimate, which is a function of the sample size and variance. A larger sample size results in a narrower interval and as a result, our estimate is more precise. A smaller sample size results in a wider interval and as a result, our estimate is less precise. 40

Let s consider an example, which confidence interval would arise from a larger sample size when considering sampling from a study setting and population with a fixed OR value? Answer: the interval 1.4 to 1.9 is narrower and therefore, we expect that this interval would arise from a larger sample size. As the sample size increases, the width of the confidence interval decreases. 41

Let s now consider a few review questions. Choose the measure that would best address the following question. How many lung cancer cases could be prevented among smokers if smoking were eliminated? The key phrases are how many and among smokers. Therefore, we recognize that the attributable risk would answer this question. 42

Now, choose the measure that would best address the following question. What proportion of lung cancer risk in the total population is attributable to smoking? The key phrases are proportion and total population. Therefore, we recognize that the population attributable risk proportion would answer this question. 43

In summary, we have discussed four new measures of the impact of a given exposure on health outcomes among the exposed and among the total population. We recognize the importance of coupling our point estimates with confidence intervals when presenting estimates. The confidence interval conveys information regarding the variability and uncertainty in our estimates. Wider intervals correspond to less precise estimates. As the sample size increases, the width of the confidence interval will decrease. Finally, we can use confidence intervals to implement hypothesis tests. For tests of differences, we compare the interval to 0, for tests of ratios, we compare the interval to a null hypothesis value of 1. This concludes the series focused on epidemiologic measures of association and impact. 44