Bias and confounding. Mads Kamper-Jørgensen, associate professor, Section of Social Medicine

Bias and confounding Mads Kamper-Jørgensen, associate professor, maka@sund.ku.dk PhD-course in Epidemiology l 7 February 2017 l Slide number 1

The world according to an epidemiologist
Exposure -> Outcome
We estimate the association between an exposure and an outcome. But does the association reflect causality, or is it due to error?
We will talk about
- Chance
- Information bias
- Selection bias
- Confounding

Two types of error
Type I error
- We demonstrate an association although no such association exists
- We typically accept a risk of type I error (α-level) of 5%
Type II error
- We do not demonstrate an association although one actually exists
- We typically accept a risk of type II error (β-level) of 20%
For a fixed sample size, the two error rates are traded off against each other. The only way to reduce both is to increase the sample size.
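The trade-off can be made concrete with a small calculation. The following is a minimal sketch that is not part of the original slides: it uses a normal approximation for a two-sided comparison of two proportions, and the function name `approx_power` and the risks 0.20 and 0.30 are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided test of two proportions.

    Normal approximation with an unpooled standard error; the tiny chance
    of rejecting in the wrong direction is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)


# Hypothetical risks of 20% vs. 30% in the two compared groups.
for n in (100, 200, 400):
    power = approx_power(0.20, 0.30, n)
    print(f"n per group = {n}: power = {power:.2f}, beta = {1 - power:.2f}")
```

With the sample size held fixed, lowering `alpha` lowers the power (raises β); raising `n_per_group` is what allows both error rates to be kept small.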

Type I and type II error
Testing a statistical hypothesis

Result of study              | Truth: association exists                  | Truth: no association exists
Association demonstrated     | Reject null hypothesis (correct inference) | Reject null hypothesis (type I error)
Association not demonstrated | Accept null hypothesis (type II error)     | Accept null hypothesis (correct inference)

Precision and bias
Blood pressure measured once for 20 people
A) Precise, unbiased: blood pressure meter
B) Precise, biased: poorly calibrated blood pressure meter
C) Imprecise, unbiased: iPhone
D) Imprecise, biased: poorly calibrated iPhone
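The four situations can be imitated with a small simulation. This is only a sketch under stated assumptions, not material from the slides: every person is given the same hypothetical true blood pressure (`TRUE_BP`), and the bias and noise levels of the four instruments are invented for illustration.

```python
import random
from statistics import mean, stdev

TRUE_BP = 120.0  # hypothetical true systolic blood pressure (mmHg)


def simulate(bias, sd, n=20, seed=1):
    """Return n readings: truth + systematic error (bias) + random error."""
    rng = random.Random(seed)
    return [TRUE_BP + bias + rng.gauss(0.0, sd) for _ in range(n)]


# Hypothetical instruments matching situations A-D.
instruments = {
    "A) precise, unbiased": dict(bias=0.0, sd=2.0),
    "B) precise, biased": dict(bias=10.0, sd=2.0),
    "C) imprecise, unbiased": dict(bias=0.0, sd=15.0),
    "D) imprecise, biased": dict(bias=10.0, sd=15.0),
}

for name, params in instruments.items():
    readings = simulate(**params)
    print(f"{name}: mean = {mean(readings):.1f}, SD = {stdev(readings):.1f}")
```

Random error shows up in the spread (SD) of the readings, while systematic error shifts the mean away from the true value no matter how many people are measured.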

Precision and bias

RANDOM ERROR                             | SYSTEMATIC ERROR
Reduces the precision                    | Reduces the validity
Has no direction                         | Leads to over- or underestimation
Depends on sample size: bigger is better | Bigger is not better
Does not necessarily lead to bias        | Leads to bias

Types of bias
Information bias
- Has to do with the information about study participants
Selection bias
- Has to do with the selection of study participants
Confounding
- Has to do with mixing of effects because the compared study participants are not comparable

Why information bias?
Because we can over- or underestimate frequencies or associations and draw the wrong inference if the information on participants is incorrect
- So far we assumed correct information: (hardly) ever the case
- Pertains to exposure, covariates and/or outcome
- Due to e.g. biologic variation, poor memory, imprecise questions, ignorance etc.
Information bias is due to systematically incorrect information about participants
You can't undo information bias once data have been collected, so use precise instruments and questions, standardized procedures, blinding and training

Sensitivity and specificity
Sensitivity: the ability of a test to classify true positives (TP) as positives. Calculation: TP/(TP+FN)
Specificity: the ability of a test to classify true negatives (TN) as negatives. Calculation: TN/(TN+FP)

Test result  | Truly diseased | Truly non-diseased
Diseased     | TP             | FP
Non-diseased | FN             | TN
Total        | TP+FN          | FP+TN

Most often related to the quality of a biologic test, but can also describe how well a question reflects the truth
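The two calculations can be sketched in a few lines. The counts below are hypothetical and chosen only to illustrate the formulas TP/(TP+FN) and TN/(TN+FP).

```python
def sensitivity(tp, fn):
    """Proportion of truly diseased who test positive: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of truly non-diseased who test negative: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical test results: 100 truly diseased, 200 truly non-diseased.
tp, fn = 90, 10
tn, fp = 160, 40

print(f"Sensitivity = {sensitivity(tp, fn):.2f}")
print(f"Specificity = {specificity(tn, fp):.2f}")
```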

Misclassification
Wrong classification of participants
- If misclassification is similar in the compared groups, it is called non-differential misclassification
- If misclassification is not similar in the compared groups, it is called differential misclassification
Both non-differential and differential misclassification may cause information bias

Examples from own research
- Ignorance: Few adult Americans received transfusion
- Culture: Few adult Frenchmen drink alcohol
- Poor question: Few Danish children have age-appropriate motor skills

Quiz
- Turn on your phone/tablet/laptop
- Visit www.madskamper.dk/phd
- Take only the HPV quiz
- Discuss with your neighbour

Misclassification
Fictitious cohort study of the association between alcohol consumption and self-perceived health, using a poor measure of alcohol consumption

          | Good | Bad | Total
Abstinent |  236 |  59 |   295
Consumer  |  846 | 419 |  1265
Total     | 1082 | 478 |  1560

True information on alcohol: RR = (419/1265) / (59/295) = 1.66 (risk of bad health, consumers vs. abstinent)

Non-differential misclassification

          | Good | Bad | Total
Abstinent |  321 | 101 |   422
Consumer  |  761 | 377 |  1138
Total     | 1082 | 478 |  1560

True information on alcohol: RR=1.66
10% of consumers are misclassified as abstinent: RR=1.38

Non-differential misclassification

          | Good | Bad | Total
Abstinent |  405 | 143 |   548
Consumer  |  677 | 335 |  1012
Total     | 1082 | 478 |  1560

True information on alcohol: RR=1.66
10% of consumers are misclassified as abstinent: RR=1.38
20% of consumers are misclassified as abstinent: RR=1.27
The association goes towards no difference between groups, i.e. 0 if the scale is absolute and 1 if the scale is relative
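The non-differential example can be recomputed from the true table. This is a minimal sketch: the function names are hypothetical, and it assumes that the misclassified consumers are moved to the abstinent row and that counts are rounded to whole persons, which reproduces the cell counts shown in the tables.

```python
def risk_ratio(table):
    """RR of bad self-perceived health, consumers vs. abstinent."""
    risk_consumer = table["consumer"]["bad"] / sum(table["consumer"].values())
    risk_abstinent = table["abstinent"]["bad"] / sum(table["abstinent"].values())
    return risk_consumer / risk_abstinent


def misclassify_consumers(table, fraction, outcomes=("good", "bad")):
    """Move `fraction` of consumers to the abstinent row for the given outcomes."""
    new = {row: dict(cells) for row, cells in table.items()}
    for outcome in outcomes:
        moved = round(table["consumer"][outcome] * fraction)
        new["consumer"][outcome] -= moved
        new["abstinent"][outcome] += moved
    return new


TRUE_TABLE = {
    "abstinent": {"good": 236, "bad": 59},
    "consumer": {"good": 846, "bad": 419},
}

# Same misclassification fraction in both outcome groups (non-differential).
for fraction in (0.0, 0.1, 0.2):
    observed = misclassify_consumers(TRUE_TABLE, fraction)
    print(f"{fraction:.0%} misclassified: RR = {risk_ratio(observed):.2f}")
```

The printed risk ratios move from 1.66 towards 1 as the misclassified fraction grows.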

Differential misclassification

          | Good | Bad | Total
Abstinent |  236 | 101 |   337
Consumer  |  846 | 377 |  1223
Total     | 1082 | 478 |  1560

True information on alcohol: RR=1.66
10% of consumers are misclassified as abstinent, but only among those with self-perceived bad health: RR=1.03

Differential misclassification

          | Good | Bad | Total
Abstinent |  236 | 143 |   379
Consumer  |  846 | 335 |  1181
Total     | 1082 | 478 |  1560

True information on alcohol: RR=1.66
10% of consumers are misclassified as abstinent, but only among those with self-perceived bad health: RR=1.03
20% of consumers are misclassified as abstinent, but only among those with self-perceived bad health: RR=0.75
Can reverse the association
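The differential example can be recomputed in the same way. In this sketch the function name `rr_differential` is hypothetical, and the assumption is that only consumers with bad self-perceived health are moved to the abstinent row, with counts rounded to whole persons.

```python
def rr_differential(fraction):
    """RR of bad health (consumers vs. abstinent) after moving `fraction` of
    the consumers with bad health, and only those, to the abstinent row."""
    abst_good, abst_bad = 236, 59    # true counts, abstinent
    cons_good, cons_bad = 846, 419   # true counts, consumers
    moved = round(cons_bad * fraction)
    cons_bad -= moved
    abst_bad += moved
    risk_consumer = cons_bad / (cons_good + cons_bad)
    risk_abstinent = abst_bad / (abst_good + abst_bad)
    return risk_consumer / risk_abstinent


for fraction in (0.0, 0.1, 0.2):
    print(f"{fraction:.0%} misclassified among bad health: RR = {rr_differential(fraction):.2f}")
```

Unlike the non-differential case, the estimate does not stop at 1: at 20% it falls below 1, so the direction of the association is reversed.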

Examples of differential misclassification
Case-control study
- Recall bias: cases remember exposures differently (often better) than controls. NOT the same as poor memory
- Interviewer bias: the interviewer asks differently (often in more detail) about exposures among cases compared with controls
Cohort study
- Detection bias: exposed are at different (often higher) risk of having the outcome detected compared with unexposed
- Interviewer bias: exposed are asked differently (often in more detail) about the outcome compared with unexposed

BREAK
What are the sources of information bias in your project, and are they non-differential or differential?

Why selection bias?
Because we can over- or underestimate frequencies or associations and draw the wrong inference if the study population does not represent the target population
- So far we assumed that participants in our study are comparable to those who do not participate: not always the case
Selection bias is due to systematic differences between participants and those who do not participate
- Selection into the cohort and attrition

Selection bias
Target population -> Source population -> Study population
Systematic differences

An example
Target population
- Pregnant women in Denmark
Source population
- Pregnant women at selected GPs
Study population
- Participants in the Danish National Birth Cohort (DNBC): participation depended on whether the woman wanted to participate
Selection bias?
- Is the study population different from the source population, and is the source population different from the target population?

It depends
DNBC women are different
- They drink less, are better educated, eat healthier, use less medication etc.
Scientific question
- How many use painkillers during pregnancy? Yes, selection bias is very likely
- Is folic acid associated with neural tube defects? No, not very likely
Because
- In comparative studies, both the exposure and the outcome must be associated with the likelihood of participating in the study

Validity
Internal validity
- Do the results apply to the target population?
- Threatened by selection bias, information bias and confounding
External validity
- Do results apply beyond the target population?
- Dependent on internal validity
- Qualitative statement of the direction and strength of an association

Are the results biased?
- We often do not know if the frequency or association is biased by selection, because we often do not have information about non-participants
- The risk of selection bias must be considered depending on the scientific question, the study design, and the applied data
Example: Texan study of HIV prevalence (Matthew McConaughey in Dallas Buyers Club)

What to do?
Data collection
- Maximize the response rate through reminders, competitions, payment etc.
- Response rates have dropped over the past 30 years
- Snowball sampling (hard-to-reach groups)
- National registers without selection

Quiz
- Visit www.madskamper.dk/phd
- Take only the hepatitis quiz
- Discuss with your neighbour

Examples of selection bias
Randomized and cohort studies
- Generally not a problem, because selection must relate to both exposure and outcome (and the outcome happens in the future)
- Attrition bias, e.g. a new antidepressant and depression: underestimates the effect of the new antidepressant, because the most depressed users of the old drug drop out
Case-control studies
- Poor selection of controls, e.g. pancreatic cancer and coffee: overestimates the effect of coffee, because controls have been advised not to drink coffee

Examples of selection bias
Cross-sectional studies
- Survival bias, e.g. smoking and COPD: underestimates the effect of smoking, because smokers with COPD are at high risk of dying

Can selection bias explain it?
1000 people were invited to participate in a study of the association between sex and hair loss. Of those, 650 (65%) agreed.

      | + Hair loss | - Hair loss
Man   |         100 |         200
Woman |          50 |         300

OR = (100/200) / (50/300) = 3.00 (95% CI 2.04-4.40)
We suspect that men losing their hair are more interested in participating than the other groups.
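The odds ratio and its confidence interval can be recomputed from the 2x2 table. The slide does not say how the interval was obtained; the sketch below assumes Woolf's method (a normal approximation on the log odds ratio), which reproduces the reported 2.04-4.40. The function name `odds_ratio_ci` is hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) 95% confidence interval.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    odds_ratio = (a / b) / (c / d)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Men: 100 with and 200 without hair loss; women: 50 with and 300 without.
odds_ratio, lower, upper = odds_ratio_ci(100, 200, 50, 300)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```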

Can selection bias explain it?
All men losing their hair participate, while participation in the other groups is 61%

      | + Hair loss    | - Hair loss
Man   | a = 100 (100%) | b = 200 (61%)
Woman | c = 50 (61%)   | d = 300 (61%)

Observed OR = [part%(a) / part%(c)] / [part%(b) / part%(d)] x true OR
3 = [(100/61) / (61/61)] x true OR
True OR = 1.83 (95% CI 1.32-2.53)
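The correction can be sketched as follows. The function name `corrected_or` is hypothetical. The slide does not say how the interval for the true OR was derived; one way to reproduce it, assumed here, is to back-calculate the counts that full participation would have given and apply Woolf's method to those counts.

```python
from math import exp, log, sqrt


def corrected_or(observed_or, part_a, part_b, part_c, part_d):
    """Divide the observed OR by the selection factor
    (part_a / part_c) / (part_b / part_d)."""
    selection_factor = (part_a / part_c) / (part_b / part_d)
    return observed_or / selection_factor


# Participation proportions per cell: a = 100%, b = c = d = 61%.
participation = (1.00, 0.61, 0.61, 0.61)
true_or = corrected_or(3.0, *participation)

# Assumption: counts under full participation = observed count / participation.
observed_counts = (100, 200, 50, 300)
full_counts = [n / p for n, p in zip(observed_counts, participation)]
se_log_or = sqrt(sum(1 / n for n in full_counts))
lower = exp(log(true_or) - 1.96 * se_log_or)
upper = exp(log(true_or) + 1.96 * se_log_or)

print(f"True OR = {true_or:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```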

BREAK
Do you have reason to fear selection bias in your studies? Can you justify it?

Confounding
What is it?
- To mix up, confuse, mistake
- Used in epidemiology to describe mixing up of causes of a given effect
- Leads to misinterpretation and wrong inference
An example
- Does birth order affect the risk of Down's syndrome?

Birth order and Down's syndrome
DK in 2005-2009: ~0.5 per 1000 births
From: K Rothman: Epidemiology: An Introduction, 2002

Maternal age and Down's syndrome
From: K Rothman: Epidemiology: An Introduction, 2002

Birth order, maternal age and Down's syndrome
From: K Rothman: Epidemiology: An Introduction, 2002

Confounding
Is present when
- An observed association between exposure and outcome can be fully or partly attributed to a different distribution of risk factors for the outcome among exposed and unexposed, i.e. non-exchangeability
Criteria
- Independent risk factor for the outcome
- Associated with the exposure
- Not an intermediate step between exposure and outcome

Confounder model
Exposure -> Outcome
Confounder
- Associated with the exposure
- Independent risk factor for the outcome
- Not intermediate between exposure and outcome

Quiz
- Visit www.madskamper.dk/phd
- Take the last quiz
- Discuss with your neighbour

Confounder identification
Methods
- Stepwise selection (forwards or backwards)
- Change-in-estimate
- Causal diagrams (DAGs)
Recommendation
- Common sense
- Do not necessarily do what others have done before
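The change-in-estimate method can be sketched in a few lines. The slides give no cut-off; the 10% threshold below is a commonly used convention and is an assumption here, as are the function names and the example estimates.

```python
def relative_change(crude, adjusted):
    """Relative change in the estimate when the candidate variable is adjusted for."""
    return abs(crude - adjusted) / adjusted


def flag_as_confounder(crude, adjusted, threshold=0.10):
    """Flag the candidate if adjustment changes the estimate by more than
    `threshold` (10% is an assumed convention, not taken from the slides)."""
    return relative_change(crude, adjusted) > threshold


# Hypothetical crude and adjusted risk ratios for two candidate variables.
print(flag_as_confounder(crude=1.50, adjusted=1.20))  # 25% change
print(flag_as_confounder(crude=1.25, adjusted=1.20))  # about 4% change
```

The rule is purely numerical, so a flagged variable still has to satisfy the confounder criteria, for example not being an intermediate step between exposure and outcome.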

Confounder control
DESIGN
Randomization
- Not possible in observational design
Matching
- Not possible to investigate the effect of the matching variable
- May remove the effect you are interested in studying
- Twin and sibling design
ANALYSIS
Standardization
- Indirect standardization (one population is standard)
- Direct standardization (external standard population)
Stratified analysis
- Only possible to stratify according to a few variables
Multivariate analysis
- Adjust simultaneously for several variables
- Estimates from such analysis are called adjusted
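Stratified analysis can be illustrated with a Mantel-Haenszel summary risk ratio. This is a minimal sketch on invented data, not an example from the slides: the two strata are constructed so that the confounder is more common among the exposed and raises the risk of the outcome, while the exposure has no effect within either stratum.

```python
# Each stratum: (cases_exposed, n_exposed, cases_unexposed, n_unexposed).
# Hypothetical data for illustration only.
STRATA = [
    (2, 200, 8, 800),    # confounder absent: risk 1% in both groups
    (80, 800, 20, 200),  # confounder present: risk 10% in both groups
]


def crude_rr(strata):
    """Risk ratio after collapsing all strata into one table."""
    cases_exp = sum(s[0] for s in strata)
    n_exp = sum(s[1] for s in strata)
    cases_unexp = sum(s[2] for s in strata)
    n_unexp = sum(s[3] for s in strata)
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio across strata."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator


print(f"Crude RR = {crude_rr(STRATA):.2f}")
print(f"Adjusted (Mantel-Haenszel) RR = {mantel_haenszel_rr(STRATA):.2f}")
```

In these data the crude association disappears once the confounder is stratified on, which is the pattern the birth order and maternal age example describes.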

Unmeasured vs. residual confounding
Unmeasured
- Variables on which we have no data
Residual
- If the categorization is too crude or the information regarding the confounder is imprecise
Look out for mix-ups

Design and bias

Sir Bradford Hill's criteria of causality

Criterion: Explanation
- Strength: Strength depends on the prevalence. A strong association is unlikely to be due to confounding alone
- Consistency: Several investigations point in the same direction, i.e. replicated in other designs and settings
- Specificity: One cause leads to one outcome
- Temporality: Cause must predate effect
- Dose-response: The risk of the outcome increases with increasing exposure
- Plausibility: Plausible biological explanation?
- Experimental evidence: Designs with control of conditions (RCT or animal models)
- Analogy: If some exposures are harmful, similar exposures are probably harmful too