Joseph W Hogan Brown University & AMPATH February 16, 2010
Drinking and lung cancer
Gender bias and graduate admissions
AMPATH nutrition study
Stratification and regression: drinking and lung cancer; graduate admissions
Matching (related to stratification): nutrition study
Weighting
                 Lung cancer?
                 Yes      No     % with cancer
Heavy drinker     33    1667     1.9%
Non-drinker       27    2273     1.2%
Odds ratio = 1.67
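As a quick arithmetic check, the crude odds ratio is the cross-product ratio of the table above:

```latex
\widehat{\mathrm{OR}}_{\text{crude}}
  = \frac{33 \times 2273}{1667 \times 27}
  = \frac{75009}{45009}
  \approx 1.67
```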
Stratified by smoking (HD = heavy drinker, ND = non-drinker):

Smokers       CA   No CA   % CA
HD            24    776     3%
ND             6    194     3%
OR = 1.0

Non-smokers   CA   No CA   % CA
HD             9    891     1%
ND            21   2079     1%
OR = 1.0
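Of the 1000 smokers, 800 are heavy drinkers (80%) and 30 develop lung cancer (3%).
Of the 3000 non-smokers, 900 are heavy drinkers (30%) and 30 develop lung cancer (1%).
Smoking is associated with both heavy drinking and lung cancer, so it confounds the crude association between drinking and lung cancer.
Source: Rosner, Fundamentals of Biostatistics, Duxbury Press, 1995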
Method 1: Mantel-Haenszel odds ratio. Stratify the analysis on the confounding variable, then take a weighted average of the stratum-specific odds ratios. Here, the weighted average is 1.0.
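A minimal sketch of the calculation, using the stratified counts from the smoking tables. It uses the standard Mantel-Haenszel estimator, sum(a*d/n) / sum(b*c/n) over strata, which is a weighted average of the stratum odds ratios with weights b*c/n. The function and variable names are illustrative only.

```python
# Stratified counts from the slides, per stratum:
# (CA among heavy drinkers, no CA among heavy drinkers,
#  CA among non-drinkers, no CA among non-drinkers)
strata = {
    "smokers": (24, 776, 6, 194),
    "non-smokers": (9, 891, 21, 2079),
}

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio: sum(a*d/n) / sum(b*c/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Collapsing over smoking reproduces the crude (unstratified) table.
crude = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude)
stratum_ors = {name: odds_ratio(*t) for name, t in strata.items()}
mh_or = mantel_haenszel_or(strata.values())

print("crude table:", crude)
print(f"crude OR = {crude_or:.2f}")
for name, value in stratum_ors.items():
    print(f"OR among {name} = {value:.2f}")
print(f"Mantel-Haenszel OR = {mh_or:.2f}")
```

Summing the two strata gives back 33, 1667, 27 and 2273, so the crude odds ratio of 1.67 is what you get by ignoring smoking, while each stratum and the Mantel-Haenszel summary give 1.00.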
Method 2: logistic regression. Dependent variable = lung CA; independent variable = drinking (yes/no); confounder = smoking (yes/no). Including the confounder in the model performs the adjustment. In large samples, the adjusted odds ratio is essentially equivalent to the M-H odds ratio.
(1) Drinking only
Variable    Coef.   s.e.   O.R.
Drinker     0.51    0.26   1.67

(2) Drinking and smoking
Variable    Coef.   s.e.   O.R.
Drinker     0.00    0.30   1.00
Smoker      1.12    0.30   3.06
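Coefficients in logistic regression are log odds ratios, e.g. exp(0.51) ≈ 1.67 and exp(1.12) ≈ 3.06. Adding a yes/no variable as a predictor stratifies the analysis on that variable (assuming a common odds ratio across its levels).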
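To see where the two regression tables come from, here is a dependency-free sketch that fits both models to the grouped counts by Newton-Raphson. In practice one would use a statistics package; the helper names (`fit_logistic`, `score_and_info`, `invert`) are hypothetical and exist only for this illustration.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def score_and_info(beta, groups):
    """Score vector and Fisher information for grouped binomial logistic regression."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, events, n in groups:
        xi = [1.0] + list(x)                      # prepend the intercept
        eta = sum(b * v for b, v in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))         # fitted probability of lung cancer
        for j in range(p):
            score[j] += xi[j] * (events - n * mu)
            for k in range(p):
                info[j][k] += n * mu * (1.0 - mu) * xi[j] * xi[k]
    return score, info

def fit_logistic(groups, n_iter=30):
    """Newton-Raphson fit; returns (coefficients, standard errors)."""
    beta = [0.0] * (len(groups[0][0]) + 1)
    for _ in range(n_iter):
        score, info = score_and_info(beta, groups)
        inv = invert(info)
        beta = [b + sum(inv[j][k] * score[k] for k in range(len(beta)))
                for j, b in enumerate(beta)]
    _, info = score_and_info(beta, groups)        # information at the final estimate
    se = [math.sqrt(row[i]) for i, row in enumerate(invert(info))]
    return beta, se

# (covariates, lung cancer cases, group size); covariates = (drinker,) or (drinker, smoker)
model1 = [((1,), 33, 1700), ((0,), 27, 2300)]
model2 = [((1, 1), 24, 800), ((0, 1), 6, 200), ((1, 0), 9, 900), ((0, 0), 21, 2100)]

b1, se1 = fit_logistic(model1)
b2, se2 = fit_logistic(model2)
print(f"(1) drinker: coef {b1[1]:.2f}, se {se1[1]:.2f}, OR {math.exp(b1[1]):.2f}")
print(f"(2) drinker: coef {b2[1]:.2f}, se {se2[1]:.2f}, OR {math.exp(b2[1]):.2f}")
print(f"(2) smoker:  coef {b2[2]:.2f}, se {se2[2]:.2f}, OR {math.exp(b2[2]):.2f}")
```

Up to rounding, this should reproduce the coefficients, standard errors and odds ratios in the two tables. Exponentiating a coefficient gives the corresponding odds ratio.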
Observational study of sex bias in graduate admissions (1973)
Admission rates:
Women: 4,321 applied, 35% admitted
Men: 8,442 applied, 44% admitted
A clear case of discrimination?
Admissions by major:

         Men                     Women
Major    Applied   % Admitted    Applied   % Admitted
A          825        62           108        82
B          560        63            25        68
C          325        37           593        34
D          417        33           375        35
E          191        28           393        24
F          373         6           341         7
The first two majors are the easiest to get into (over 50% of the men applied to these). The rest are harder (over 90% of the women applied to these). This time, the major department is the confounder.
As with odds ratios, stratify and average: take a weighted average of the sex-specific admission rates across majors, where the weight for each major is its total number of applicants.
Unweighted (aggregated) rates: men 44%, women 35%.
Weighted (within-department) rates: men 39%, women 43%.
Within departments, women have higher admission rates.
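A minimal sketch of the weighted average, using the six-major table and weighting each major by its total applicants (men plus women). The function names are illustrative.

```python
# Six-major table from the slides:
# major: (men applied, % men admitted, women applied, % women admitted)
majors = {
    "A": (825, 62, 108, 82),
    "B": (560, 63, 25, 68),
    "C": (325, 37, 593, 34),
    "D": (417, 33, 375, 35),
    "E": (191, 28, 393, 24),
    "F": (373, 6, 341, 7),
}

def weighted_rate(pct_col):
    """Average a sex-specific admission rate over majors, weighting each major
    by its total number of applicants (men + women)."""
    total = sum(row[0] + row[2] for row in majors.values())
    return sum((row[0] + row[2]) * row[pct_col] for row in majors.values()) / total

def crude_rate(applied_col, pct_col):
    """Admission rate after pooling the six majors, with no stratification."""
    admitted = sum(row[applied_col] * row[pct_col] / 100 for row in majors.values())
    applied = sum(row[applied_col] for row in majors.values())
    return 100 * admitted / applied

men_std, women_std = weighted_rate(1), weighted_rate(3)
men_crude, women_crude = crude_rate(0, 1), crude_rate(2, 3)
print(f"weighted: men {men_std:.0f}%, women {women_std:.0f}%")
print(f"crude (six majors only): men {men_crude:.0f}%, women {women_crude:.0f}%")
```

The weighted rates come out at about 39% for men and 43% for women. Pooling only these six majors gives crude rates of roughly 45% and 30%. The 44% and 35% quoted earlier cover all 8,442 men and 4,321 women, not just the six majors in the table.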
Study objectives: assess the effect of food assistance for patients initiating cART on weight, clinic adherence and mortality. We have data on those in the food program, but need a comparable control group to assess program effectiveness.
Food program: 1864 patients identified on food assistance; 74% female; mean age 37 years; mean weight 52 kg.
How to identify a control group? We cannot simply use randomly selected controls.
Idea: match each treated person to one or more untreated controls on one or more characteristics. Result: those characteristics are controlled for. The analysis is similar to stratification methods (though technically more complicated).
Simple example: match these lists on age.
Group 1 (treated): 20 22 24 24 32 50 58 60
Group 2 (controls): 18 20 26 27 40 41 60 61
Find the closest possible match for each treated subject.
Match within a 2-year window.
This simple example illustrates that algorithms are needed to get an optimal matching, e.g. one that minimizes the total difference in age over all possible matched sets. Thresholds (calipers) can be specified for matching, and matching can be done on more than one variable.
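A small sketch comparing greedy and optimal matching on the age example with the 2-year window. The "optimal" criterion here is an assumption chosen for illustration: first maximize the number of pairs within the window, then minimize their total age difference. Brute force over all assignments is only feasible for toy examples. Real matching software uses network-flow or assignment algorithms instead. The function names are hypothetical.

```python
from itertools import permutations

# Ages from the slide example.
treated = [20, 22, 24, 24, 32, 50, 58, 60]
controls = [18, 20, 26, 27, 40, 41, 60, 61]
CALIPER = 2  # the 2-year matching window

def greedy_match(treated, controls, caliper):
    """Take treated subjects in order; give each the nearest unused control,
    or leave it unmatched if that control is outside the caliper."""
    available = list(controls)
    pairs = []
    for t in treated:
        c = min(available, key=lambda a: abs(a - t))  # ties go to the earlier control
        if abs(c - t) <= caliper:
            available.remove(c)
            pairs.append((t, c))
    return pairs

def optimal_match(treated, controls, caliper):
    """Brute force over all 1:1 assignments: maximize the number of pairs within
    the caliper, then minimize their total age difference."""
    best_key, best_pairs = None, []
    for perm in permutations(controls):
        pairs = [(t, c) for t, c in zip(treated, perm) if abs(t - c) <= caliper]
        key = (-len(pairs), sum(abs(t - c) for t, c in pairs))
        if best_key is None or key < best_key:
            best_key, best_pairs = key, pairs
    return best_pairs

greedy = greedy_match(treated, controls, CALIPER)
optimal = optimal_match(treated, controls, CALIPER)
print("greedy: ", greedy)
print("optimal:", optimal)
```

Greedy matching takes the exact match for age 20 and thereby leaves 22 with nothing inside the window. The optimal search gives up that exact match (20 to 18, 22 to 20) and ends up pairing five treated subjects instead of four.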
Information comes from discordance in the outcome within matched sets: matched sets in which every member has the same outcome contribute no information. Special analysis routines are needed, such as conditional logistic regression or stratified regression.
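For the simplest case, 1:1 matched pairs with a yes/no treatment, conditioning on the number of events in each pair leads to a closed form. Concordant pairs cancel out of the conditional likelihood, and the conditional maximum likelihood odds ratio is the ratio of the two kinds of discordant pairs. The pairs below are made up purely for illustration and are not study data.

```python
# Hypothetical 1:1 matched pairs (not from the study):
# (outcome of treated member, outcome of matched control), 1 = event.
pairs = [(1, 1), (1, 0), (0, 0), (1, 0), (0, 1), (1, 1), (1, 0), (0, 0)]

# Concordant pairs (same outcome) drop out of the conditional likelihood.
discordant = [(t, c) for t, c in pairs if t != c]
n10 = sum(1 for t, c in discordant if t == 1)  # only the treated member had the event
n01 = sum(1 for t, c in discordant if c == 1)  # only the control had the event

# With 1:1 matching, the conditional maximum likelihood odds ratio is n10 / n01.
conditional_or = n10 / n01
print(f"{len(pairs) - len(discordant)} concordant pairs ignored; OR = {conditional_or:.1f}")
```

Changing the outcome in any concordant pair to another concordant value leaves the estimate untouched. That is the sense in which such pairs carry no information about the treatment effect.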
When matching is done effectively, the analysis can have more power than an unmatched analysis, but this is not always the case.
Example matched sets (case = 1 marks the food-program member):

case  male  basewt  basecd4  adh3
---------------------------------
   0     0      56       37     0
   0     0      50       27     1
   0     0      54       47     1
   1     0      55       39     0

case  male  basewt  basecd4  adh3
---------------------------------
   0     0      68      167     1
   0     0      66      126     1
   0     0      72      133     1
   0     0      72      162     0
   1     0      71      117     1

case  male  basewt  basecd4  adh3
---------------------------------
   0     0      65      740     1
   0     0      70      545     1
   1     0      67      801     1
Method: conditional logistic regression. Outcome = adherence (yes/no); independent variable = case status (food program yes/no). There is no need to add the matching covariates to the model, and the effects of the variables used for matching cannot be estimated. Odds ratio interpretation: the effect of the food program within matched sets, i.e. within sets having a similar covariate profile.
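A minimal sketch of the conditional likelihood behind this analysis. It is restricted to the special case seen in the example sets: exactly one food-program member per set and a yes/no exposure. It is applied only to the three example sets listed above. With just two informative sets, the resulting odds ratio (0.5) means nothing substantively. The point is the mechanics, including the all-adherent third set being dropped, just as the full analysis drops concordant groups. The names `matched_sets` and `set_terms` are hypothetical.

```python
import math

# The three example matched sets listed above, as (case, adh3) per member.
matched_sets = [
    [(0, 0), (0, 1), (0, 1), (1, 0)],
    [(0, 1), (0, 1), (0, 1), (0, 0), (1, 1)],
    [(0, 1), (0, 1), (1, 1)],
]

# Sets where everyone has the same outcome carry no information and are dropped.
informative = [s for s in matched_sets if 0 < sum(y for _, y in s) < len(s)]

def set_terms(beta, members):
    """Score and information contributions of one matched set that has exactly
    one treated member, conditioning on m1 = number of adherent members.

    P(treated member is among the adherent | m1) = A e^beta / (A e^beta + B),
    with A = C(n-1, m1-1) and B = C(n-1, m1) counting the possible subsets."""
    n = len(members)
    m1 = sum(y for _, y in members)
    y_treated = next(y for x, y in members if x == 1)
    a, b = math.comb(n - 1, m1 - 1), math.comb(n - 1, m1)
    p = a * math.exp(beta) / (a * math.exp(beta) + b)
    return y_treated - p, p * (1.0 - p)

beta = 0.0
for _ in range(50):  # Newton-Raphson on the concave conditional log-likelihood
    terms = [set_terms(beta, s) for s in informative]
    beta += sum(t[0] for t in terms) / sum(t[1] for t in terms)

print(f"{len(matched_sets) - len(informative)} set(s) dropped; OR = {math.exp(beta):.2f}")
```

The matching covariates (sex, baseline weight, baseline CD4) never enter the calculation. They are absorbed by conditioning within each set, which is also why their own effects cannot be estimated.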
note: 1052 groups (2338 obs) dropped because of all positive or all negative outcomes.

Number of obs = 3821

------------------------------------------------------
        adh3 |     OR     SE    [95% Conf. Interval]
-------------+----------------------------------------
        case |   1.31   .097       1.13       1.51
------------------------------------------------------
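The reported interval is consistent with one built on the log-odds scale and then exponentiated. The SE shown is on the odds-ratio scale, approximately OR × SE(log OR). This is why the interval is not symmetric around 1.31:

```latex
\hat\beta = \log(1.31) \approx 0.270, \qquad
\widehat{\mathrm{SE}}(\hat\beta) \approx \frac{0.097}{1.31} \approx 0.074,
\qquad
\exp\!\left(0.270 \pm 1.96 \times 0.074\right) \approx (1.13,\ 1.51)
```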
Ideal solution: a randomized trial. Another solution: covariate balance, via stratified analyses or matching. These can only reduce confounding bias to the extent that you can measure and adjust for the important confounders.
Stratification: stratify the population across levels of the confounder. Works well when the confounders are low-dimensional.
Matching: match individual treated subjects to untreated ones.
Confounders should be thought about in advance. For observational studies, confounder adjustment is essential. Matching or stratification?
Use stratification or regression adjustment for: small to moderate-sized studies; situations with a small number of confounders.
Use matching for: large studies, where many controls are available; situations where you are not interested in the effects of the confounders themselves.
Weighting and propensity scores: closely related to missing-data methods. To be discussed in the next lecture.