Confounding, Effect modification, and Stratification Tunisia, 30th October 2014 Acknowledgment: Kostas Danis Takis Panagiotopoulos National School of Public Health, Athens, Greece takis.panagiotopoulos@gmail.com
Main points The concept of confounding and of effect modification How to check if confounding and if effect modification is present: stratification How to show results if confounding or if effect modification is present How to prevent confounding
Galaxy, Vol 11, Issue 2, Feb 1871 Mark Twain... Source: http://ebooks.library.cornell.edu/g/gala/gala.1871.html...
Confounding
Q. 1: Is there a third variable, itself a risk factor for the outcome, which may be hidden behind the exposure?
Exposure: lying in bed. Outcome.
The real player: disease (a third variable), associated with the outcome, and also with the exposure (independently of the outcome).
The lying-in-bed association: a fallacy. Unmask the real player.
Effect modification
Q. 2: Is there a third variable, itself a risk factor for the outcome, which may be acting in combination with the exposure?
Exposure: driving fast. Outcome.
The co-player: the third variable.
The driving-fast association: a half-truth. Disentangle the effects of the co-players.
Exposure Outcome Third variable Confounding: To eliminate (bias) Effect modification: To analyse (useful info)
Cohort studies marching towards outcomes
Cohort study

               Total   Cases   Non-cases   Risk %
Exposed          100      50          50     50 %
Not exposed      100      10          90     10 %

Risk ratio: 50% / 10% = 5
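The risk ratio on this slide can be checked with a few lines of code. This is a minimal sketch; the function name `risk_ratio` and its argument names are illustrative choices, not part of the original slides.

```python
def risk_ratio(cases_exp, total_exp, cases_unexp, total_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    risk_exp = cases_exp / total_exp
    risk_unexp = cases_unexp / total_unexp
    return risk_exp / risk_unexp


# Counts from the cohort table: 50/100 exposed and 10/100 unexposed are cases.
rr = risk_ratio(50, 100, 10, 100)
print(f"Risk ratio: {rr:.1f}")
```

The same function applies to the prevalence ratio of a cross-sectional study, since the arithmetic is identical.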
Case-control studies
Source population Exposed Unexposed Sample Cases Controls: Sample of the source population Representative with regard to exposure Controls
Case-control study

            Exposed   Not exposed   Total   % exposed   % unexposed   Odds of exposure
Cases       a         c             a+c     a/(a+c)     c/(a+c)       a/c
Controls    b         d             b+d     b/(b+d)     d/(b+d)       b/d

Odds ratio = (a/c) / (b/d) = ad / bc
Cross-sectional (prevalence) studies
Cross-sectional study: Sampling Sampling Population Sample Target Population
Cross-sectional study

               Total   Cases   Non-cases   Prevalence %
Exposed         1000     500         500           50 %
Not exposed     1000     100         900           10 %

Prevalence ratio (PR): 50% / 10% = 5
Should I believe my measurement? Exposure RR = 4 Outcome Chance? Bias? Confounding? True association - Causal? - Non-causal?
Exposure Outcome Third variable Confounding: To eliminate (bias) Effect modification: To analyse (useful info)
What should you do?
In practice, we deal with effect modification FIRST and confounding SECOND:
1. Check for effect modification. Analyse it.
2. Check for confounding. Eliminate it.
How?
1. Stratification
2. Stratified analysis
Create strata according to levels of exposure to the third variable.
Effect modification
Effect modification Variation in the magnitude of measure of effect across levels of a third variable Happens when RR or OR is different between strata (subgroups of population)
Why study effect modification To sort out (quantitatively) effect of the exposure under study and of a third variable (a potential effect modifier) To identify a subgroup with a lower or higher risk ratio To target public health action To study interaction between risk factors
Effect modification Factor A (asbestos) Factor B (smoking) Disease (lung cancer) Effect modifier / Interaction
Asbestos (As) and lung cancer (Ca)
Case-control study, unstratified data

As       Ca     Controls   OR
Yes      693    320        4.8
No       307    680        Ref.
Total    1000   1000
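The crude odds ratio above follows from the cross-product ad / bc. The sketch below recomputes it and adds a 95% confidence interval. The interval uses the Woolf (logit) method, which is an assumption here: the slides do not say how their intervals were obtained. The function name is illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR = ad/bc with a Woolf (logit) confidence interval.

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    or_ = (a * d) / (b * c)
    # Standard error of ln(OR) under the logit method (assumed method).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Asbestos and lung cancer, unstratified counts from the slide.
or_, lower, upper = odds_ratio_ci(693, 320, 307, 680)
print(f"OR = {or_:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```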
Asbestos Lung cancer Smoking?
Smokers
As       Ca    Controls   OR
Yes      517   160        6.0
No       183   340        Ref.
Total    700   500

Nonsmokers
As       Ca    Controls   OR
Yes      176   160        3.0
No       124   340        Ref.
Total    300   500

Are stratum-specific RRs/ORs different between them?
Asbestos (As), smoking and lung cancer (Ca)

Smoking   As    Cases   Controls   OR
Yes       Yes   517     160        8.9
Yes       No    183     340        1.5
No        Yes   176     160        3.0
No        No    124     340        Ref.

(Reference group is the same for all ORs)
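Both presentations of the asbestos data, stratum-specific ORs and ORs against a single reference group, can be reproduced from the same four pairs of counts. A minimal sketch follows; the dictionary layout and names are illustrative.

```python
def odds_ratio(cases_exp, controls_exp, cases_ref, controls_ref):
    """Cross-product odds ratio of an exposed group against a reference group."""
    return (cases_exp * controls_ref) / (controls_exp * cases_ref)


# (smoking, asbestos) -> (cases, controls), taken from the slides.
counts = {
    ("smoker", "asbestos"): (517, 160),
    ("smoker", "no asbestos"): (183, 340),
    ("nonsmoker", "asbestos"): (176, 160),
    ("nonsmoker", "no asbestos"): (124, 340),
}

# Stratum-specific ORs: asbestos vs no asbestos within each smoking stratum.
or_smokers = odds_ratio(*counts[("smoker", "asbestos")],
                        *counts[("smoker", "no asbestos")])
or_nonsmokers = odds_ratio(*counts[("nonsmoker", "asbestos")],
                           *counts[("nonsmoker", "no asbestos")])

# Joint ORs: every group against the doubly unexposed reference group.
reference_key = ("nonsmoker", "no asbestos")
joint = {
    key: odds_ratio(*value, *counts[reference_key])
    for key, value in counts.items()
    if key != reference_key
}

print(f"Smokers: {or_smokers:.1f}, nonsmokers: {or_nonsmokers:.1f}")
for key, value in joint.items():
    print(key, f"{value:.1f}")
```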
Asbestos, smoking and lung cancer: interpretation
In smokers, the odds of exposure to asbestos are 6 times higher among lung cancer cases than among controls.
In non-smokers, the odds of exposure to asbestos are 3 times higher among lung cancer cases than among controls.
Therefore, smoking modifies (increases by a factor of 2) the effect of exposure to asbestos.
Present data by stratum.
Public health implications?
Physical activity and myocardial infarction (MI)

Physical activity   MI    Controls   OR (95% CI)
≥ 2500 kcal/d       190   264        0.64 (0.6-0.9)
< 2500 kcal/d       176   157        Ref.
Physical activity Infarction Gender?
Men
Physical activity   MI    Controls   OR (95% CI)
≥ 2500 kcal/d       141   208        0.53 (0.4-0.7)
< 2500 kcal/d       144   112        Ref.

Women
Physical activity   MI    Controls   OR (95% CI)
≥ 2500 kcal/d       49    56         1.2 (0.7-2.2)
< 2500 kcal/d       32    45         Ref.

Are stratum-specific RRs/ORs different between them?
Interpretation
Different effects (OR) in different strata (men, women).
Therefore, the effect of physical activity is modified by gender (protective for men, no protective effect shown for women).
Present data by stratum (gender).
Vaccine efficacy

$$\mathrm{VE} = \frac{\mathrm{ARU} - \mathrm{ARV}}{\mathrm{ARU}} = 1 - \mathrm{RR}$$

VE: vaccine efficacy
ARU: attack rate in unvaccinated
ARV: attack rate in vaccinated
RR: risk ratio
Vaccine efficacy (VE)

Status   Pop.      Cases   Cases per 1000   RR
V        301 545   150     0.49             0.28
NV       298 655   515     1.72             Ref.
Total    600 200   665     1.11

VE = 1 - RR = 1 - 0.28 = 72%
Vaccine Disease Age?
Vaccine efficacy by age group

Age      Status   Pop.     Cases   Cases/1000   RR     VE
<1y      V        35 625   38      1.07         0.87   13%
         NV       24 375   30      1.23
1-4y     V        44 220   34      0.77         0.42   58%
         NV       46 780   86      1.84
5-9y     V        78 200   50      0.64         0.19   81%
         NV       75 000   250     3.33
10-24y   V        83 400   18      0.22         0.15   85%
         NV       82 600   120     1.45
>24y     V        60 100   10      0.17         0.40   60%
         NV       69 900   29      0.41

Are stratum-specific RRs/ORs different between them?
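The age-specific vaccine efficacy figures follow directly from VE = (ARU - ARV) / ARU = 1 - RR. The sketch below recomputes them from the counts in the table; the data structure and function name are illustrative. The overall VE comes out at about 71% from unrounded attack rates, against the slide's 72% obtained from the rounded RR of 0.28.

```python
def vaccine_efficacy(cases_v, pop_v, cases_nv, pop_nv):
    """VE = (ARU - ARV) / ARU, which equals 1 - RR."""
    arv = cases_v / pop_v    # attack rate in vaccinated
    aru = cases_nv / pop_nv  # attack rate in unvaccinated
    return (aru - arv) / aru


# age group -> (cases V, population V, cases NV, population NV)
strata = {
    "<1y": (38, 35625, 30, 24375),
    "1-4y": (34, 44220, 86, 46780),
    "5-9y": (50, 78200, 250, 75000),
    "10-24y": (18, 83400, 120, 82600),
    ">24y": (10, 60100, 29, 69900),
}

ve_by_age = {age: vaccine_efficacy(*row) for age, row in strata.items()}

# Crude (all ages) table obtained by summing the strata column by column.
totals = [sum(column) for column in zip(*strata.values())]
ve_crude = vaccine_efficacy(*totals)

for age, ve in ve_by_age.items():
    print(f"{age:>7}: VE = {ve:.0%}")
print(f"  crude: VE = {ve_crude:.0%}")
```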
Interpretation Different effects (RR) in different strata (age groups) Therefore, VE is modified by age Present data by age-stratum Public health implications?
Any statistical test to help us?
- Breslow-Day
- Woolf test
- Test for trends: Chi square
Comparison of strata-specific RRs/ORs Question: Are stratum-specific RRs/ORs different between them? Answer: 1. Do stratum-specific estimates look different? 2. 95% CI of OR/RR do NOT overlap? 3. Is the Test of Homogeneity significant?
Death from diarrhoea according to breast feeding, Brazil, 1980s (crude analysis)

                    Diarrhoea   Controls   OR (95% CI)
No breast feeding   120         136        3.6 (2.4-5.5)
Breast feeding      50          204        Ref
No breast feeding Diarrhoea Age?
Death from diarrhoea according to breast feeding, Brazil, 1980s

Infants < 1 month of age
                    Cases   Controls   OR (95% CI)
No breast feeding   10      3          32 (6-203)
Breast feeding      7       68         Ref

Infants ≥ 1 month of age
                    Cases   Controls   OR (95% CI)
No breast feeding   110     133        2.6 (1.7-4.1)
Breast feeding      43      136        Ref

Woolf test (test of homogeneity): p=0.03
Are stratum-specific RRs/ORs different between them?
Interpretation
Different effects (OR) in different strata (age groups).
Therefore, the protective effect of breast feeding is modified by age (more protective in neonates < 1 month of age than in infants ≥ 1 month, by a factor of 12).
Present data by age stratum.
Public health implications?
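A test of homogeneity asks whether the stratum-specific ORs differ by more than chance would explain. The sketch below implements one common form of Woolf's test (inverse-variance weighted log ORs, no continuity correction). This particular variant is an assumption, and its p-value need not match the p=0.03 reported on the slide, which may come from a different variant or software. The p-value helper is valid only for 1 degree of freedom, that is, for two strata.

```python
import math


def woolf_homogeneity(strata):
    """Woolf chi-square for homogeneity of ORs across strata.

    strata: list of (a, b, c, d) with a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls. Returns (chi2, df).
    """
    log_ors, weights = [], []
    for a, b, c, d in strata:
        log_ors.append(math.log((a * d) / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))
    pooled = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    chi2 = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_ors))
    return chi2, len(strata) - 1


def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Exposure = no breast feeding; strata = infants < 1 month and >= 1 month.
brazil = [(10, 3, 7, 68), (110, 133, 43, 136)]
chi2, df = woolf_homogeneity(brazil)
p_value = chi2_p_value_df1(chi2)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

A small p-value supports what the stratum-specific ORs already suggest: the strata should be reported separately.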
Risk of gastroenteritis by exposure, outbreak X, place Y, time Z (crude analysis)

             Exposed: Yes       Exposed: No
Exposure     n     AR (%)*      n     AR (%)*     RR (95% CI)
Pasta        94    77           7     4.2         18.0 (8.8-38)
Tuna         49    68           49    24          2.9 (2.1-3.8)

* AR = attack rate; RR = risk ratio; 95% CI = 95% confidence interval of the RR
Tuna Gastroenteritis Pasta?
Risk of gastroenteritis by exposure, Outbreak X, Place, time X

Pasta: Yes
           Cases   Total   AR (%)   RR (95% CI)
Tuna       43      52      83       1.1 (0.9-1.3)
No tuna    46      60      77       Ref

Pasta: No
           Cases   Total   AR (%)   RR (95% CI)
Tuna       4       17      24       11 (2.6-46)
No tuna    3       144     2        Ref

Woolf's test (test of homogeneity): p=0.0007
Are stratum-specific RRs/ORs different between them?
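The contrast between the two pasta strata can be recomputed from the counts on this slide. The sketch below also collapses the strata to recover a crude RR for tuna, which lands close to the 2.9 shown in the crude analysis. Names are illustrative.

```python
def risk_ratio(cases_exp, total_exp, cases_unexp, total_unexp):
    """Attack rate in the exposed divided by attack rate in the unexposed."""
    return (cases_exp / total_exp) / (cases_unexp / total_unexp)


# (tuna cases, tuna total, no-tuna cases, no-tuna total) per pasta stratum
pasta_yes = (43, 52, 46, 60)
pasta_no = (4, 17, 3, 144)

rr_pasta_yes = risk_ratio(*pasta_yes)
rr_pasta_no = risk_ratio(*pasta_no)

# Crude table for tuna: add the two strata cell by cell.
crude = tuple(x + y for x, y in zip(pasta_yes, pasta_no))
rr_crude = risk_ratio(*crude)

# How much larger the tuna effect is among those who did not eat pasta.
modification_factor = rr_pasta_no / rr_pasta_yes

print(f"RR (pasta eaten):     {rr_pasta_yes:.1f}")
print(f"RR (pasta not eaten): {rr_pasta_no:.1f}")
print(f"Crude RR:             {rr_crude:.1f}")
```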
Interpretation Different effects (RR) in different strata Therefore, effect of exposure to tuna is modified by exposure to pasta Exposure to pasta reduces the effect of exposure to tuna (by a factor of 10) Present data by stratum Negative modification of effect is also possible
How to check for effect modification in stratified analysis
1. Perform crude analysis: measure the strength of association (RR/OR with CIs)
2. List potential effect modifiers
3. Stratify data by level of exposure to potential modifier(s)
4. Check for effect modification: are stratum-specific RRs/ORs different between them? [Yes = EM]
   - Observe RRs/ORs
   - CIs: overlapping?
   - Woolf's test: statistical significance?
5. If YES (effect modification): show data by stratum
   If NO (no effect modification): check for confounding
Crazy findings!! Why??
A trial compared two treatments for kidney stones, Treatment A (surgery) and Treatment B (percutaneous nephrolithotomy). Successful outcome was defined as stones reduced to <2 mm in size. 350 patients were included in each treatment branch of the study. It was found that:

Successful outcome, crude results
Treatment A      Treatment B
273/350 (78%)    289/350 (83%)
Therefore, Treatment B appears better (83% vs 78%).

Successful outcome, stratified results by size of stones
                        Treatment A      Treatment B
Small stones (<2 cm)    81/87 (93%)      234/270 (87%)
Large stones (≥2 cm)    192/263 (73%)    55/80 (69%)
TOTAL                   273/350          289/350
Therefore, Treatment A is better within each stratum (93% vs 87% and 73% vs 69%).

Source: http://en.wikipedia.org/wiki/Simpson's_paradox; Charig CR et al, BMJ 29/03/1986.
Crazy findings!! Why?? Simpson's paradox (reversal paradox)

Successful outcome, stratified results by size of stones
                        Treatment A      Treatment B
Small stones (<2 cm)    81/87 (93%)      234/270 (87%)
Large stones (≥2 cm)    192/263 (73%)    55/80 (69%)
TOTAL                   273/350 (78%)    289/350 (83%)

Treatment A was dominated by large stones: 263/350 (75%)
Treatment B was dominated by small stones: 270/350 (77%)
AND the outcome is less successful for large stones.
SOS: crude rates can be misleading in some circumstances!!

Source: http://en.wikipedia.org/wiki/Simpson's_paradox; Charig CR et al, BMJ 29/03/1986.
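The reversal can be verified directly from the kidney stone counts: Treatment A wins within each stratum, yet Treatment B wins once the strata are pooled. A minimal sketch, with an illustrative data layout:

```python
# stone size -> treatment -> (successes, patients), from the slide
results = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def success_rate(successes, patients):
    return successes / patients


# Within each stratum, compare the two treatments.
better_in_stratum = {
    size: max(arms, key=lambda arm: success_rate(*arms[arm]))
    for size, arms in results.items()
}

# Pool the strata to get the crude comparison.
pooled = {
    arm: tuple(sum(results[size][arm][i] for size in results) for i in range(2))
    for arm in ("A", "B")
}
better_crude = max(pooled, key=lambda arm: success_rate(*pooled[arm]))

print("Better by stratum:", better_in_stratum)
print("Pooled counts:", pooled)
print("Better in crude analysis:", better_crude)
```

The pooled comparison mixes the treatment effect with the unequal distribution of stone size between the two arms, which is exactly what confounding does.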
Confounding
Exposure, Outcome, Third factor (potential confounding factor)
Confounding is a distortion of the measure of effect because of a third (confounding) factor, which:
- must be related to the exposure
- must be a risk factor itself
- must not be in the causal chain of exposure-outcome
Confounding should be prevented or controlled for.
Example: third factor in the causal chain
Smoking, Atherosclerosis, Heart attack
Atherosclerosis lies in the causal chain between smoking and heart attack, so it is NOT a confounding factor.
Example 1
Skateboarding, Chlamydia infection, Age?
Age is not evenly distributed between the two exposure groups:
- 90% of skateboarders are young
- 20% of non-skateboarders are young
In conclusion:
(1) Age is a possible confounding factor (CF)
(2) Thus, the effect of skateboarding on chlamydia infection (RR) is likely close to 1
Example 2
Coffee drinking, Lung cancer, Smoking?
NOTE: New study says that coffee drinkers are more likely to get lung cancer
Smoking is not evenly distributed between coffee drinkers and non-drinkers:
- Drinkers: 70% smokers
- Non-drinkers: 10% smokers
In conclusion:
(1) The effect of coffee drinking on lung cancer is likely confounded by smoking. Smoking is a likely CF
(2) The effect of coffee drinking on lung cancer (RR) is likely close to 1
Example 3 Birth order Down syndrome Age of mother?
Example 3 (ii) [Figure: Incidence of Down syndrome by birth order. Y axis: cases per 100 000 live births; X axis: birth order (1 to 5).]
Example 3 (iii) [Figure: Incidence of Down syndrome by mother's age group. Y axis: cases per 100 000 live births; X axis: age groups (< 20, 20-24, 25-29, 30-34, 35-39, 40+).]
[Figure: Incidence of Down syndrome by birth order AND age of mother. Y axis: cases per 100 000; X axis: birth order (1 to 5); series: age groups (< 20, 20-24, 25-29, 30-34, 35-39, 40+).]
Example 3 (v)
Birth order, Down syndrome, Age of mother?
In conclusion:
(1) The effect of birth order on Down syndrome is confounded by age of mother (possible CF)
(2) The effect of birth order on Down syndrome (RR) is likely close to 1
So, remember!
Exposure, Outcome, Third variable
Confounding factors must meet the two following conditions:
- Be associated with the outcome, independently of the exposure
- Be associated with the exposure, without being a consequence of the exposure
The distortion introduced by confounding factors May simulate an association May hide an association that does exist
How to prevent/control confounding?
Prevention of confounding (in study design):
- Randomization (experiment)
- Restriction to one stratum
- Matching
Control of confounding (in analysis):
- Stratified analysis
- Multivariable analysis
Example: Are Mercedes more dangerous than Porsche?
A paper looking at 1000 Mercedes and 1000 Porsche drivers who were followed for one year (cohort study)

Type       Total   Accidents   AR %   RR
Porsche    1 000   300         30     1.5
Mercedes   1 000   200         20     Ref.
Total      2 000   500         25

95% CI = 1.3-1.8
Distribution by age of driver Proportion of < 25 yo among Porsche drivers: 55 %... among Mercedes drivers: 30 %
Car type=porsche Accidents Confounding factor: Age of driver?
< 25 years
Type       Total   Accidents   AR %   RR (95% CI)
Porsche    550     250         45.5   1.14 (0.9-1.3)
Mercedes   300     120         40.0   Ref.

≥ 25 years
Type       Total   Accidents   AR %   RR (95% CI)
Porsche    450     50          11.1   0.97 (0.7-1.4)
Mercedes   700     80          11.4   Ref.

Crude RR = 1.5
Adjusted RR = 1.1 (0.94-1.27)
Is adjusted RR/OR different from crude?
Incidence of malaria according to the presence of a radio set, Kahinbhi Pradesh
Crude data

            Malaria   Total   AR %   RR
Radio set   80        520     15     0.7
No radio    220       1080    20     Ref

95% CI = 0.6-0.9
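The adjusted RR on this slide can be approximated with a Mantel-Haenszel weighted risk ratio. The sketch below uses the standard cohort (cumulative incidence) form of the Mantel-Haenszel estimator; the slides only spell out the OR version, so treating this as the method behind the slide's adjusted RR is an assumption.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel risk ratio for cohort data.

    strata: list of (cases_exp, total_exp, cases_unexp, total_unexp).
    """
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator


# Porsche (exposed) vs Mercedes (unexposed), by age of driver.
strata = [
    (250, 550, 120, 300),  # drivers under 25 years
    (50, 450, 80, 700),    # drivers aged 25 years or more
]

crude_rr = (300 / 1000) / (200 / 1000)
adjusted_rr = mantel_haenszel_rr(strata)
print(f"Crude RR = {crude_rr:.1f}, MH-adjusted RR = {adjusted_rr:.2f}")
```

Once age is held fixed the apparent excess risk of Porsche drivers largely disappears, which is the signature of confounding by age.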
Radio Malaria Confounding factor: Mosquito net?
Sleeping under mosquito net
           Malaria   Total   AR %   RR
Radio      30        400     7.5    1.02
No radio   50        680     7.4    Ref

No mosquito net
           Malaria   Total   AR %   RR
Radio      50        120     41.7   0.98
No radio   170       400     42.5   Ref

Crude RR = 0.7
Adjusted RR = 1.01
Is adjusted RR/OR different from crude?
To identify confounding Compare crude measure of effect (RR or OR) to adjusted (weighted) measure of effect Mantel-Haenszel RR or OR
Any statistical test to help us?
When is the Mantel-Haenszel OR different from the crude OR?
(Adjusted - Crude) / Crude > 10-20%
Mantel-Haenszel summary measure
Adjusted or weighted RR or OR

$$\mathrm{OR}_{MH} = \frac{\sum_i (a_i d_i) / n_i}{\sum_i (b_i c_i) / n_i}$$

Advantages of MH: zeroes allowed
Mantel-Haenszel summary measure
Mantel-Haenszel (adjusted or weighted) OR

For each stratum i (stratum total n_i):
        Cases   Controls
Exp+    a_i     b_i
Exp-    c_i     d_i

$$\mathrm{OR}_{MH} = \frac{\sum_i (a_i d_i) / n_i}{\sum_i (b_i c_i) / n_i} = \frac{(a_1 \times d_1)/n_1 + (a_2 \times d_2)/n_2 + \dots}{(b_1 \times c_1)/n_1 + (b_2 \times c_2)/n_2 + \dots}$$
How to check for confounding in stratified analysis
1. Perform crude analysis: measure the strength of association (RR/OR with CIs)
2. List potential confounders
3. Stratify data by level of exposure to potential confounder(s)
4. Check for effect modification
5. If NO effect modification, check for confounding: is adjusted RR/OR different from crude? [Yes = CF]
   - Adjusted RR/OR: Mantel-Haenszel
   - (Adjusted - Crude) / Crude > 10-20%?
6. If YES (confounding): show adjusted data
   If NO (no confounding): show crude data
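The 10-20% rule of thumb is easy to apply in code. The sketch below compares the magnitude of the relative change with a threshold; using the absolute value and the lower bound of 10% as the default are assumptions made for illustration, as are the function names. The inputs are the crude and adjusted RRs from the car and malaria examples.

```python
def relative_change(crude, adjusted):
    """(Adjusted - Crude) / Crude, as a proportion."""
    return (adjusted - crude) / crude


def suggests_confounding(crude, adjusted, threshold=0.10):
    """True if the adjusted estimate departs from the crude one by more than
    the threshold (10% by default, an assumed choice within the 10-20% rule)."""
    return abs(relative_change(crude, adjusted)) > threshold


# Crude and adjusted RRs quoted on the slides.
examples = {
    "Porsche vs Mercedes": (1.5, 1.1),
    "Radio set and malaria": (0.7, 1.01),
}

for name, (crude, adjusted) in examples.items():
    change = relative_change(crude, adjusted)
    verdict = "confounding likely" if suggests_confounding(crude, adjusted) else "no clear confounding"
    print(f"{name}: change = {change:+.0%} ({verdict})")
```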
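The Mantel-Haenszel OR can be computed stratum by stratum from the a, b, c, d cells. The sketch below uses hypothetical counts, chosen only to illustrate the "zeroes allowed" advantage: the second stratum has an empty cell, so its own OR is undefined, yet the summary OR is still computable.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio.

    strata: list of (a, b, c, d) with a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts (not from the slides). Stratum 2 has b = 0, so its
# stratum-specific OR (a*d)/(b*c) cannot be computed on its own.
hypothetical_strata = [
    (10, 20, 5, 40),
    (4, 0, 2, 6),
]

or_mh = mantel_haenszel_or(hypothetical_strata)
print(f"Mantel-Haenszel OR = {or_mh:.2f}")
```

With a single stratum the function reduces to the usual cross-product ad / bc.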
How to define the strata?
Strata defined according to third variable:
- Usual confounders (e.g. age, sex, socio-economic status)
- Any other suspected confounder, effect modifier or additional risk factor
- Strata of public health interest
For two risk factors: stratify on one to study the effect of the second on outcome
Two or more exposure categories: each is a stratum
Residual confounding?
How to deal with multiple risk factors Logical order of data analysis 1) Crude analysis 2) Stratified analysis 3) Multivariate analysis (mathematical model) - linear regression - logistic regression - Simultaneous adjustment for multiple risk factors/confounders - Can address effect modification
Stratified analysis: an example

. csinter case pesto, by(pasta)

Pasta = Exposed
Pesto        Total   Cases   Risk %
Exposed      56      43      76.79
Unexposed    65      51      78.46
Risk difference     -0.02 [-0.17 0.13]
Risk Ratio           0.98 [0.81 1.19]
Attrib.risk.exp      0.02 [-0.19 0.19]
Attrib.risk.pop      0.01 [..]

Pasta = Unexposed
Pesto        Total   Cases   Risk %
Exposed      20      1       5.0
Unexposed    145     6       4.14
Risk difference      0.01 [-0.09 0.11]
Risk Ratio           1.21 [0.15 9.53]
Attrib.risk.exp      1.17 [-5.52 0.90]
Attrib.risk.pop      0.02 [..]

Test of Homogeneity (M-H) : pvalue : 0.8366301
Crude RR for pesto : 2.08 [1.56 2.79]
MH RR for pesto adjusted for pasta : 0.99 [0.81 1.20]
Adjusted/crude relative change : -52.67% (more than 10-20%)
Examples of stratified analysis

Example   Stratum 1   Stratum 2   Crude RR
1         1.07        9.40        4.00
2         1.01        1.03        4.00
3         3.05        5.20        4.00
4         1.02        0.96        4.00
5         4.00        4.00        4.00

1) Are RRs/ORs different between strata? (EM)
2) Is adjusted RR/OR different from crude? (CF)
Summary: effect modification and confounding
Effect modification:
- Belongs to nature
- Modifies effect of exposure under study
- Need to disentangle effect of risk factors
- Useful: increases knowledge of biological mechanism
- Allows targeting of public health action
Confounding:
- Belongs to study
- Distorts effect of exposure under study
- Need to unmask effect of third variable
- Prevent (study design)
- Control (analysis)
Exposure Effect modification The co-player Outcome The driving-fast association: a half-truth disentangle effects of co-players
Exposure Confounding The real player Outcome The lying-in-bed association: a fallacy unmask the real player
Summary: how to conduct a stratified analysis
1. Perform crude analysis: measure the strength of association (RR/OR with CIs)
2. List potential effect modifiers and/or confounders
3. Stratify data by level of exposure to potential modifiers or confounders
4. Check for effect modification (Are stratum-specific RRs/ORs different between them? CIs / Woolf's test)
   If YES (effect modification): show data by stratum
   If NO (no effect modification): go to step 5
5. Check for confounding (Is adjusted RR/OR different from crude? >10-20%)
   If YES (confounding): show adjusted data
   If NO (no confounding): show crude data
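The decision sequence above can be written as a small function. This is a sketch under stated assumptions: effect modification is judged only from the homogeneity test p-value (the slides also ask you to look at the stratum-specific estimates and their CIs), and the cut-offs of 0.05 and 10% are illustrative defaults, not values prescribed by the slides.

```python
from typing import Optional


def stratified_analysis_decision(
    crude: float,
    homogeneity_p: float,
    adjusted: Optional[float] = None,
    alpha: float = 0.05,             # assumed significance level
    change_threshold: float = 0.10,  # assumed lower bound of the 10-20% rule
) -> str:
    """Effect modification is checked first, confounding second."""
    if homogeneity_p < alpha:
        return "effect modification: show data by stratum"
    if adjusted is None:
        raise ValueError("an adjusted (Mantel-Haenszel) estimate is needed")
    if abs(adjusted - crude) / crude > change_threshold:
        return "confounding: show adjusted data"
    return "neither: show crude data"


# Tuna by pasta: crude RR 2.9, homogeneity p = 0.0007 (from the slides).
print(stratified_analysis_decision(crude=2.9, homogeneity_p=0.0007))

# Pesto by pasta: crude RR 2.08, MH RR 0.99, homogeneity p = 0.8366301.
print(stratified_analysis_decision(crude=2.08, homogeneity_p=0.8366301, adjusted=0.99))
```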
A train can mask a second train A variable can mask another variable!
Thank you Takis Panagiotopoulos National School of Public Health, Athens, Greece takis.panagiotopoulos@gmail.com