Contingency Tables. Summer Institutes, Summer 2017


Overview

1) Types of Variables
2) Comparing (2) Categorical Variables
   - Contingency (two-way) tables
   - $\chi^2$ Tests
3) 2 x 2 Tables
   - Sampling designs
   - Testing for association
   - Estimation of effects
   - Paired binary data
4) Stratified Tables
   - Confounding
   - Effect Modification

Factors and Contingency Tables

Definition: A factor is a categorical (discrete) variable taking a small number of values that represent the levels of the factor.

Examples:
- Gender with two levels: 1 = Male and 2 = Female
- Disease status with three levels: 1 = Progression, 2 = Stable, 3 = Improved
- AgeFactor with 4 levels: 1 = 20-29 yrs, 2 = 30-39, 3 = 40-49, 4 = 50-59

Factors and Contingency Tables

Data description: Form one-way, two-way or multiway tables of frequencies of factor levels and their combinations. To assess whether two factors are related, we often construct an R x C table that cross-classifies the observations according to the factors. Examining two-way tables of Factor A vs Factor B at each level of a third Factor C shows how the A/B association may be explained or modified by C (later).

Data summary: Categorical data are often summarized by reporting the proportion or percent in each category. Alternatively, one sometimes sees a summary of the relative proportion (odds) in each category (relative to a baseline category).

Testing: We can test whether the factors are related using a $\chi^2$ test.

Categorical Data

Example: From Doll and Hill (1952), a retrospective assessment of smoking frequency. The table displays the daily average number of cigarettes for lung cancer patients and control patients. Note that there are equal numbers of cancer patients and controls.

                        Daily # cigarettes
            None    < 5    5-14   15-24   25-49    50+   Total
Cancer         7     55     489     475     293     38    1357
            0.5%   4.1%   36.0%   35.0%   21.6%   2.8%
Control       61    129     570     431     154     12    1357
            4.5%   9.5%   42.0%   31.8%   11.3%   0.9%
Total         68    184    1059     906     447     50    2714

$\chi^2$ Test

We want to test whether the smoking frequency is the same for each of the populations sampled. That is, we want to test whether the groups are homogeneous with respect to a characteristic.

H0: smoking probability same in both groups
HA: smoking probability not the same

Q: What does H0 predict we would observe if all we knew were the marginal totals?

                        Daily # cigarettes
            None    < 5    5-14   15-24   25-49    50+   Total
Cancer                                                    1357
Control                                                   1357
Total         68    184    1059     906     447     50    2714

$\chi^2$ Test

A: H0 predicts the following expectations:

                        Daily # cigarettes
            None    < 5    5-14   15-24   25-49    50+   Total
Cancer        34     92   529.5     453   223.5     25    1357
Control       34     92   529.5     453   223.5     25    1357
Total         68    184    1059     906     447     50    2714

Each group has the same proportion in each cell as the overall marginal proportion. The equal expected number for each group is the result of the equal sample size in each group (what would change if there were half as many cases as controls?)

$\chi^2$ Test

Summing the squared differences between the observed and expected counts, each scaled by its expected count, provides an overall assessment of H0:

$$ X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2\big((r-1) \times (c-1)\big) $$

$X^2$ is known as Pearson's Chi-square Statistic.
- Large values of $X^2$ suggest the data are not consistent with H0.
- Small values of $X^2$ suggest the data are consistent with H0.

$\chi^2$ Test

In example 3 the contributions to the $X^2$ statistic are:

                        Daily # cigarettes
            None            < 5             5-14   15-24   25-49    50+
Cancer      $(7-34)^2/34$   $(55-92)^2/92$  etc.
Control     $(61-34)^2/34$  etc.

            None    < 5    5-14   15-24   25-49    50+
Cancer     21.44  14.88    3.10    1.07   21.61   6.76
Control    21.44  14.88    3.10    1.07   21.61   6.76

$$ X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 137.7 $$

$p = P(\chi^2(5) > 137.7 \mid H_0 \text{ true}) < 0.0001$

Conclusion?

$\chi^2$ Test

                    Factor Levels
Group       1       2      ...     C      Total
1          O11     O12     ...    O1C      N1
2          O21     O22     ...    O2C      N2
3          O31     O32     ...    O3C      N3
...
R          OR1     OR2     ...    ORC      NR
Total       M1      M2     ...     MC       T

1. Compute the expected cell counts under the homogeneity assumption: $E_{ij} = N_i M_j / T$
2. Compute the chi-square statistic: $X^2 = \sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}$
3. Compare $X^2$ to $\chi^2(df)$ where df = (R-1) x (C-1)
4. Interpret acceptance/rejection or p-value.
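The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pearson_chi_square` is hypothetical, and the counts are the Doll and Hill smoking data from the earlier example.

```python
def pearson_chi_square(observed):
    """Return (statistic, df, expected) for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]          # N_i
    col_totals = [sum(col) for col in zip(*observed)]    # M_j
    total = sum(row_totals)                              # T
    # Step 1: expected counts under homogeneity, E_ij = N_i * M_j / T
    expected = [[n_i * m_j / total for m_j in col_totals] for n_i in row_totals]
    # Step 2: sum of (O - E)^2 / E over all cells
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Step 3: degrees of freedom for the reference chi-square distribution
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Doll and Hill counts: rows are cancer patients and controls,
# columns are the six daily-cigarette categories.
doll_hill = [
    [7, 55, 489, 475, 293, 38],
    [61, 129, 570, 431, 154, 12],
]
x2, df, expected = pearson_chi_square(doll_hill)
print(round(x2, 1), df)
```

The statistic is then compared with a chi-square distribution on `df` degrees of freedom (step 4), for example with a table or a statistics package.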

2 x 2 Tables

Example 1: Pauling (1971)
Patients are randomized to receive either Vitamin C or placebo. Patients are followed up to ascertain the development of a cold.

             Cold - Y   Cold - N   Total
Vitamin C          17        122     139
Placebo            31        109     140
Total              48        231     279

Q: Is treatment with Vitamin C associated with a reduced probability of getting a cold?
Q: If Vitamin C is associated with reducing colds, then what is the magnitude of the effect?

2 x 2 Tables

Example 2: Keller (AJPH, 1965)
Patients with (cases) and without (controls) oral cancer were surveyed regarding their smoking frequency (this table collapses over the smoking frequency categories).

              Case   Control   Total
Smoker         484       385     869
Non-Smoker      27        90     117
Total          511       475     986

Q: Is oral cancer associated with smoking?
Q: If smoking is associated with oral cancer, then what is the magnitude of the risk?

2 x 2 Tables

Example 3: Sex-linked traits
Suppose we collect a random sample of Drosophila and cross-classify eye color and sex.

          male   female   Total
red        165      300     465
white      176       81     257
Total      341      381     722

Q: Is eye color associated with sex?
Q: If eye color is associated with sex, then what is the magnitude of the effect?

2 x 2 Tables

Example 4: Matched case control study
213 subjects with a history of acute myocardial infarction (AMI) were matched by age and sex with one of their siblings who did not have a history of AMI. The prevalence of a particular polymorphism was compared between the siblings.

                             AMI
                     carrier   noncarrier   Total
No AMI  carrier           73           14      87
        noncarrier        23          103     126
Total                     96          117     213

Q: Is there an association between the polymorphism and AMI?
Q: If there is an association then what is the magnitude of the effect?

2 x 2 Tables

Each of these tables (except for example 4) can be represented as follows:

                          Disease Status
Exposure Status     D               not D            Total
E                   a               b                (a + b) = n1
not E               c               d                (c + d) = n2
Total               (a + c) = m1    (b + d) = m2     N

The question of association can be addressed with Pearson's $X^2$ (except for example 4). We compute the expected cell counts as follows:

Expected:           D               not D            Total
E                   n1 m1 / N       n1 m2 / N        (a + b) = n1
not E               n2 m1 / N       n2 m2 / N        (c + d) = n2
Total               (a + c) = m1    (b + d) = m2     N

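2 x 2 Tables

Pearson's chi-square is given by:

$$ X^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \frac{(a - n_1 m_1/N)^2}{n_1 m_1/N} + \frac{(b - n_1 m_2/N)^2}{n_1 m_2/N} + \frac{(c - n_2 m_1/N)^2}{n_2 m_1/N} + \frac{(d - n_2 m_2/N)^2}{n_2 m_2/N} = \frac{N (ad - bc)^2}{n_1 n_2 m_1 m_2} $$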

2 x 2 Tables

Example 1: Pauling (1971)

             Cold - Y    Cold - N    Total
Vitamin C    17 (12%)    122 (88%)     139
Placebo      31 (22%)    109 (78%)     140
Total        48          231           279

H0: probability of disease does not depend on treatment
HA: probability of disease does depend on treatment

$$ X^2 = \frac{N (ad - bc)^2}{n_1 n_2 m_1 m_2} = \frac{279 \, (17 \times 109 - 122 \times 31)^2}{139 \times 140 \times 48 \times 231} = 4.81 $$

For the p-value we compute $P(\chi^2(1) > 4.81) = 0.028$. Therefore, we reject the homogeneity of disease probability in the two treatment groups.
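The shortcut formula for a 2 x 2 table can be checked numerically. The sketch below is illustrative only; the function names are hypothetical, and the p-value uses the identity $P(\chi^2(1) > x) = \mathrm{erfc}(\sqrt{x/2})$, which holds only for 1 degree of freedom.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table via N(ad - bc)^2 / (n1 n2 m1 m2)."""
    n1, n2 = a + b, c + d      # row totals
    m1, m2 = a + c, b + d      # column totals
    n = n1 + n2
    return n * (a * d - b * c) ** 2 / (n1 * n2 * m1 * m2)


def chi_square_1df_p_value(x2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x2 / 2))


# Vitamin C example: a = 17, b = 122 (Vitamin C), c = 31, d = 109 (placebo)
x2 = chi_square_2x2(17, 122, 31, 109)
p = chi_square_1df_p_value(x2)
print(round(x2, 2), round(p, 3))
```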

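2 x 2 Tables: Applications in Epidemiology

Example 1 fixed the number of E and not E, then evaluated the disease status after a fixed period of time (same for everyone). This is a prospective study. Given this design we can estimate the relative risk:

$$ RR = \frac{P(D \mid E)}{P(D \mid \bar{E})} $$

The range of RR is $[0, \infty)$. By taking the logarithm, we have $(-\infty, +\infty)$ as the range for $\ln(RR)$ and a better approximation to normality for the estimated $\ln(\widehat{RR})$:

$$ \ln(\widehat{RR}) = \ln\left(\frac{\hat{P}(D \mid E)}{\hat{P}(D \mid \bar{E})}\right) = \ln\left(\frac{\hat{p}_1}{\hat{p}_2}\right) = \ln\left(\frac{a/n_1}{c/n_2}\right) $$

$$ \ln(\widehat{RR}) \overset{approx}{\sim} N\left(\ln(p_1/p_2),\ \frac{1 - p_1}{p_1 n_1} + \frac{1 - p_2}{p_2 n_2}\right) $$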

Relative Risk

             Cold - Y   Cold - N   Total
Vitamin C          17        122     139
Placebo            31        109     140
Total              48        231     279

The estimated relative risk is:

$$ \widehat{RR} = \frac{\hat{P}(D \mid E)}{\hat{P}(D \mid \bar{E})} = \frac{17/139}{31/140} = 0.55 $$

We can obtain a 95% confidence interval for the relative risk by first obtaining a confidence interval for the log-RR:

$$ \ln(\widehat{RR}) \pm 1.96 \sqrt{\frac{1 - \hat{p}_1}{\hat{p}_1 n_1} + \frac{1 - \hat{p}_2}{\hat{p}_2 n_2}} $$

and exponentiating the endpoints of the CI.
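As a worked illustration of the log-scale interval, the following sketch computes the relative risk and its 95% confidence interval for the Vitamin C data. The function name `relative_risk_ci` and the default multiplier `z=1.96` are assumptions made for this example.

```python
import math


def relative_risk_ci(a, n1, c, n2, z=1.96):
    """Relative risk p1/p2 with a CI built on the log scale.

    a of n1 exposed subjects and c of n2 unexposed subjects developed disease.
    """
    p1, p2 = a / n1, c / n2
    rr = p1 / p2
    # Standard error of ln(RR) from the normal approximation
    se = math.sqrt((1 - p1) / (p1 * n1) + (1 - p2) / (p2 * n2))
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Vitamin C: 17 colds among 139; placebo: 31 colds among 140
rr, lower, upper = relative_risk_ci(17, 139, 31, 140)
print(round(rr, 3), round(lower, 3), round(upper, 3))
```

Because the interval is symmetric on the log scale, it is not symmetric around the estimated relative risk itself.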

Note that disease status and exposure status are transposed here compared to previous tables.

. csi 17 31 122 109

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        17          31  |         48
        Noncases |       122         109  |        231
-----------------+------------------------+------------
           Total |       139         140  |        279
                 |                        |
            Risk |  .1223022    .2214286  |    .172043
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0991264       |   -.1868592   -.0113937
      Risk ratio |         .5523323       |    .3209178    .9506203
 Prev. frac. ex. |         .4476677       |    .0493797    .6790822
 Prev. frac. pop |         .2230316       |
                 +-------------------------------------------------
                               chi2(1) =     4.81  Pr>chi2 = 0.0283


2 x 2 Tables: Applications in Epidemiology

In Example 2 we fixed the number of cases and controls, then ascertained exposure status. Such a design is known as a case-control study. Based on this we are able to directly estimate:

$$ P(E \mid D) \quad \text{and} \quad P(E \mid \bar{D}) $$

However, we generally are interested in the relative risk of disease given exposure, which is not estimable from these data alone. We have fixed the number of diseased and disease-free subjects, and it can be shown that in general:

$$ P(D \mid E) \ne P(E \mid D) \qquad \text{and} \qquad \frac{P(D \mid E)}{P(D \mid \bar{E})} \ne \frac{P(E \mid D)}{P(E \mid \bar{D})} $$

Odds Ratio

Instead of the relative risk we can estimate the exposure odds ratio, which (surprisingly) is equivalent to the disease odds ratio:

$$ \frac{P(E \mid D) / P(\bar{E} \mid D)}{P(E \mid \bar{D}) / P(\bar{E} \mid \bar{D})} = \frac{P(D \mid E) / P(\bar{D} \mid E)}{P(D \mid \bar{E}) / P(\bar{D} \mid \bar{E})} $$

In other words, the odds ratio can be estimated regardless of the sampling scheme.

Furthermore, for rare diseases, $P(D \mid E) \approx 0$, so that the disease odds ratio approximates the relative risk:

$$ \frac{P(D \mid E) / P(\bar{D} \mid E)}{P(D \mid \bar{E}) / P(\bar{D} \mid \bar{E})} \approx \frac{P(D \mid E)}{P(D \mid \bar{E})} $$

Since with case-control data we are able to effectively estimate the exposure odds ratio, we are then able to equivalently estimate the disease odds ratio, which for rare diseases approximates the relative risk.

For rare diseases (e.g., prevalence < 5%), the (sample) odds ratio estimates the (population) relative risk.

Odds Ratio

[Figure: odds ratio compared with relative risk, plotted against disease prevalence from 0 to 0.4.]
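The relationship between the odds ratio and the relative risk can be explored numerically. The sketch below holds the relative risk at a hypothetical value of 2 (an assumption for illustration, not a value stated in the slides) and shows how the odds ratio drifts away from it as the risk in the unexposed group grows.

```python
def odds_ratio_from_risks(p1, p2):
    """Odds ratio comparing risk p1 (exposed) with risk p2 (unexposed)."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))


relative_risk = 2.0  # hypothetical fixed relative risk
for p2 in (0.01, 0.05, 0.10, 0.20, 0.30, 0.40):
    p1 = relative_risk * p2          # risk in the exposed group
    odds_ratio = odds_ratio_from_risks(p1, p2)
    print(f"risk in unexposed = {p2:.2f}  RR = {relative_risk:.1f}  OR = {odds_ratio:.2f}")
```

When the disease is rare in both groups the two measures nearly coincide; when it is common, the odds ratio lies further from 1 than the relative risk.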

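Odds Ratio

Like the relative risk, the odds ratio has $[0, \infty)$ as its range. The log odds ratio has $(-\infty, +\infty)$ as its range, and the normal approximation is a better approximation to the distribution of the estimated log odds ratio.

$$ OR = \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)} \qquad \widehat{OR} = \frac{\hat{p}_1 / (1 - \hat{p}_1)}{\hat{p}_2 / (1 - \hat{p}_2)} = \frac{ad}{bc} $$

Confidence intervals are based upon:

$$ \ln(\widehat{OR}) \sim N\left(\ln(OR),\ \frac{1}{n_1 p_1} + \frac{1}{n_1 (1 - p_1)} + \frac{1}{n_2 p_2} + \frac{1}{n_2 (1 - p_2)}\right) $$

Therefore, a 95% confidence interval for the log odds ratio is given by:

$$ \ln\left(\frac{ad}{bc}\right) \pm 1.96 \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} $$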
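The interval above can be illustrated with the oral cancer case-control counts from Example 2. This is a minimal sketch with a hypothetical function name; it implements the normal-approximation interval on the log scale, so its endpoints can differ somewhat from an exact interval reported by statistical software.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a CI from the normal approximation to ln(OR)."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Smokers: 484 cases, 385 controls; non-smokers: 27 cases, 90 controls
odds_ratio, lower, upper = odds_ratio_ci(484, 385, 27, 90)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```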

Odds Ratio

. cci 484 27 385 90

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |       484          27  |        511       0.9472
        Controls |       385          90  |        475       0.8105
-----------------+------------------------+------------------------
           Total |       869         117  |        986       0.8813
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         4.190476       |    2.633584    6.836229 (exact)
 Attr. frac. ex. |         .7613636       |    .6202893    .8537205 (exact)
 Attr. frac. pop |          .721135       |
                 +-------------------------------------------------
                               chi2(1) =    43.95  Pr>chi2 = 0.0000

Interpreting Odds Ratios

1. What is the outcome of interest? (e.g., disease)
2. What are the two groups being contrasted? (e.g., exposed and unexposed)

$$ OR = \frac{\text{odds of OUTCOME in EXPOSED}}{\text{odds of OUTCOME in UNEXPOSED}} $$

- Similar to RR for rare diseases
- Meaningful for both cohort and case-control studies
- OR > 1: increased risk of OUTCOME with EXPOSURE
- OR < 1: decreased risk of OUTCOME with EXPOSURE


2 x 2 Tables: Applications in Epidemiology

Example 3 is an example of a cross-sectional study, since only the total for the entire table is fixed in advance. The row totals or column totals are not fixed in advance.

          male         female       Total
red       165 (48%)    300 (79%)      465
white     176           81            257
Total     341          381            722

Cross-sectional studies:
- Sample from the entire population, not by disease status or exposure status
- Use chi-square test to test for association
- Use RR or OR to summarize association
- Cases of disease are prevalent cases (compared to incident cases in a prospective or cohort study)

2 x 2 Tables: Applications in Epidemiology

Case = red eye color
Noncase = white eye color

                 |      male      female  |      Total
-----------------+------------------------+------------
           Cases |       165         300  |        465
        Noncases |       176          81  |        257
-----------------+------------------------+------------
           Total |       341         381  |        722
                 |                        |
            Risk |   .483871    .7874016  |   .6440443
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.3035306       |   -.3706217   -.2364395
      Risk ratio |         .6145161       |     .544263    .6938375
 Prev. frac. ex. |         .3854839       |    .3061625     .455737
 Prev. frac. pop |         .1820637       |
      Odds ratio |          .253125       |     .183063    .3500144
                 +-------------------------------------------------
                               chi2(1) =    72.32  Pr>chi2 = 0.0000


Paired Binary Data

Example 4 measures a binary response in sibs. This is an example of paired binary data. One way to display these data is the following:

          Carrier   Noncarrier   Total
AMI            96          117     213
No AMI         87          126     213
Total         183          243     426

Q: Can't we simply use the $X^2$ Test of Homogeneity to assess whether this is evidence of an association between the polymorphism and AMI?

A: NO!!! The $X^2$ tests assume that the rows are independent samples. In this design the 213 with AMI are genetically related to the 213 w/o AMI.

Paired Binary Data

For paired binary data we display the results as follows (1 = carrier, 0 = noncarrier; the first subscript refers to the sibling with AMI and the second to the sibling without AMI):

                    AMI
                  1       0
No AMI    1     n11     n01
          0     n10     n00

This analysis explicitly recognizes the heterogeneity of subjects. Thus, those that score (0,0) and (1,1) provide no information about the association between AMI and the polymorphism. These are known as the concordant pairs. The information regarding the association is in the discordant pairs, (0,1) and (1,0).

$p_1$ = P(carrier | AMI)
$p_0$ = P(carrier | No AMI)

H0: $p_1 = p_0$
HA: $p_1 \ne p_0$

$$ \hat{p}_1 = \frac{n_{11} + n_{10}}{N} \qquad \hat{p}_0 = \frac{n_{11} + n_{01}}{N} $$

Paired Binary Data: McNemar's Test

Under the null hypothesis, H0: $p_1 = p_0$, we expect equal numbers of 01's and 10's ($E[n_{01}] = E[n_{10}]$). Specifically, under the null:

$$ n_{10} \mid M \sim \mathrm{Bin}\left(M, \tfrac{1}{2}\right), \qquad M = n_{10} + n_{01} $$

$$ Z = \frac{n_{10} - M/2}{\sqrt{M/4}} $$

Under H0, $Z^2 \sim \chi^2(1)$, and this forms the basis for McNemar's Test for Paired Binary Responses.

The odds ratio comparing the odds of carrier in those with AMI to the odds of carrier in those w/o AMI is estimated by:

$$ \widehat{OR} = \frac{n_{10}}{n_{01}} $$

Confidence intervals can be obtained as described in Breslow and Day (98), section 5., or in Armitage and Berry (1987), chapter 6.

Example 4:

                             AMI
                     carrier   noncarrier   Total
No AMI  carrier           73           14      87
        noncarrier        23          103     126
Total                     96          117     213

We can test H0: $p_1 = p_0$ using McNemar's Test:

$$ Z = \frac{n_{10} - M/2}{\sqrt{M/4}} = \frac{23 - (23 + 14)/2}{\sqrt{(23 + 14)/4}} = 1.48 $$

Comparing $1.48^2 = 2.19$ to a $\chi^2(1)$ distribution we find that p > 0.05. Therefore, we do not reject the null hypothesis and find little evidence of association between gene and disease.

We estimate the odds ratio as

$$ \widehat{OR} = 23/14 = 1.64 $$
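The McNemar calculation uses only the two discordant counts, which the short sketch below makes explicit. The function name is hypothetical, and the p-value uses the large-sample normal approximation (no continuity correction and no exact binomial calculation).

```python
import math


def mcnemar(n10, n01):
    """McNemar's test and matched-pairs odds ratio from the discordant pairs."""
    m = n10 + n01                              # number of discordant pairs
    z = (n10 - m / 2) / math.sqrt(m / 4)       # n10 ~ Bin(m, 1/2) under H0
    chi2 = z ** 2                              # compared with chi-square(1)
    p_value = math.erfc(abs(z) / math.sqrt(2)) # two-sided normal p-value
    odds_ratio = n10 / n01
    return z, chi2, p_value, odds_ratio


# 23 pairs: AMI sibling carrier, non-AMI sibling noncarrier; 14 pairs: the reverse
z, chi2, p_value, odds_ratio = mcnemar(23, 14)
print(round(z, 2), round(chi2, 2), round(p_value, 3), round(odds_ratio, 2))
```

The 73 + 103 concordant pairs do not enter the calculation at all.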

Matched case-control data

. mcci 73 23 14 103

                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |        73          23  |         96
       Unexposed |        14         103  |        117
-----------------+------------------------+------------
           Total |        87         126  |        213

McNemar's chi2(1) =      2.19    Prob > chi2 = 0.1390
Exact McNemar significance probability       = 0.1877

Proportion with factor
        Cases       .4507042
        Controls    .4084507     [95% Conf. Interval]
                   ---------     --------------------
        difference  .0422535     -.0181247   .1026318
        ratio       1.103448      .968494    1.257207
        rel. diff.  .0714286     -.0197486   .1626057

        odds ratio  1.642857      .80776     3.45833   (exact)

Two-way tables - Review

How were data collected?
- Cohort design
- Case-control design
- Cross-sectional design
- Matched pairs

Is there an association?
- R x C Tables: Chi-square tests of Homogeneity & Independence
- 2 x 2 Tables: Chi-square test
- Paired data and McNemar's

What is the magnitude of the association?
- Relative risk
- Odds ratio (≈ relative risk for rare diseases)
- Risk difference (attributable risk)

SUMMARY: Measures of Association for 2 x 2 Tables

RD = $p_1 - p_2$ = risk difference (null: RD = 0)
- also known as attributable risk or excess risk
- measures absolute effect: the proportion of cases among the exposed that can be attributed to exposure

RR = $p_1 / p_2$ = relative risk (null: RR = 1)
- measures relative effect of exposure
- bounded above by $1/p_2$

OR = $[p_1 (1 - p_2)] / [p_2 (1 - p_1)]$ = odds ratio (null: OR = 1)
- range is 0 to $\infty$
- approximates RR for rare events
- invariant to switching rows and columns
- good behavior of p-values and CI even for small to moderate sample size
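The three summary measures can be computed side by side from the four cell counts. The sketch below is illustrative; the function name and the dictionary layout are assumptions, and the data are the eye color counts from Example 3 with red eye color treated as the outcome and males as group 1.

```python
def association_measures(a, b, c, d):
    """RD, RR and OR for a 2 x 2 table.

    a, b = outcome present/absent in group 1; c, d = present/absent in group 2.
    """
    p1 = a / (a + b)
    p2 = c / (c + d)
    return {
        "RD": p1 - p2,              # absolute effect, null value 0
        "RR": p1 / p2,              # relative effect, null value 1
        "OR": (a * d) / (b * c),    # odds ratio, null value 1
    }


# Males: 165 red, 176 white; females: 300 red, 81 white
measures = association_measures(165, 176, 300, 81)
for name, value in measures.items():
    print(name, round(value, 3))
```

Because the outcome here is common, the odds ratio is noticeably further from 1 than the relative risk.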

SUMMARY: Models for 2 x 2 Tables

1. Cohort ("Prospective", "Followup")
- Sample $n_1$ exposed and $n_2$ unexposed
- Follow everyone for equal period of time
- Observe incident disease: $r_1$ cases among exposed, $r_2$ cases among unexposed
- Model: Two independent binomials
    $r_1 \sim \mathrm{Bin}(n_1, p_1)$, $r_2 \sim \mathrm{Bin}(n_2, p_2)$
    $p_1 = P(D \mid E)$, $p_2 = P(D \mid \bar{E})$
- Useful measures of association: RR, OR, RD
- Examples:
    $r_i$ = number of cases of HIV during year followup of $n_i$ individuals in arm i of HIV prevention trial
    $r_i$ = number of low birthweight babies among $n_i$ live births

SUMMARY: Models for 2 x 2 Tables

2. Case-Control
- Sample $n_1$ cases and $n_2$ controls
- Observe exposure history: $r_1$ exposed among cases, $r_2$ exposed among controls
- Model: Two independent binomials
    $r_1 \sim \mathrm{Bin}(n_1, q_1)$, $r_2 \sim \mathrm{Bin}(n_2, q_2)$
    $q_1 = P(E \mid D)$, $q_2 = P(E \mid \bar{D})$
- Useful measures of association: OR
- Examples:
    $r_i$ = consistent condom use (yes/no) among those with/without HPV infection
    $r_i$ = number exposed to alcohol during pregnancy among $n_i$ low birthweight/normal birthweight babies

SUMMARY: Models for 2 x 2 Tables

3. Cross-sectional
- Sample n individuals from population
- Observe both exposure and (prevalent) disease status. No longitudinal followup
- Useful measures of association: RR, OR, RD
- Example: $n_{ij}$ = number of gay men with gonorrhea in random sample of STD clinic attendees