Mendelian Randomisation and Causal Inference in Observational Epidemiology. Nuala Sheehan

Similar documents
Sample size and power calculations in Mendelian randomization with a single instrumental variable and a binary outcome

Mendelian Randomization

Brief introduction to instrumental variables. IV Workshop, Bristol, Miguel A. Hernán Department of Epidemiology Harvard School of Public Health

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

For more information about how to cite these materials visit

Mendelian Randomisation and Causal Inference in Observational Epidemiology

Association of plasma uric acid with ischemic heart disease and blood pressure:

MEA DISCUSSION PAPERS

A NEW TRIAL DESIGN FULLY INTEGRATING BIOMARKER INFORMATION FOR THE EVALUATION OF TREATMENT-EFFECT MECHANISMS IN PERSONALISED MEDICINE

Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology

Advanced IPD meta-analysis methods for observational studies

Mendelian randomization with a binary exposure variable: interpretation and presentation of causal estimates

Relationships between adiposity and left ventricular function in adolescents: mediation by blood pressure and other cardiovascular measures

Exploring the utility of alcohol flushing as an instrumental variable for alcohol intake in Koreans

Regression Discontinuity Designs: An Approach to Causal Inference Using Observational Data

Mendelian randomization analysis with multiple genetic variants using summarized data

Use of allele scores as instrumental variables for Mendelian randomization

Bayesian methods for combining multiple Individual and Aggregate data Sources in observational studies

(b) empirical power. IV: blinded IV: unblinded Regr: blinded Regr: unblinded α. empirical power

MATERNAL INFLUENCES ON OFFSPRING S EPIGENETIC AND LATER BODY COMPOSITION

Outline. Introduction GitHub and installation Worked example Stata wishes Discussion. mrrobust: a Stata package for MR-Egger regression type analyses

MRC Integrative Epidemiology Unit (IEU) at the University of Bristol. George Davey Smith

Use of allele scores as instrumental variables for Mendelian randomization

Does prenatal alcohol exposure affect neurodevelopment? Attempts to give causal answers

Fatty acids and cardiovascular health: current evidence and next steps

Mendelian randomization for strengthening causal inference in observational studies: application to gene by environment interaction.

Score Tests of Normality in Bivariate Probit Models

Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways

Testing for non-linear causal effects using a binary genotype in a Mendelian randomization study: application to alcohol and cardiovascular traits

Analyzing diastolic and systolic blood pressure individually or jointly?

Trial designs fully integrating biomarker information for the evaluation of treatment-effect mechanisms in stratified medicine

Genetic Meta-Analysis and Mendelian Randomization

An instrumental variable in an observational study behaves

Methods for meta-analysis of individual participant data from Mendelian randomization studies with binary outcomes

Careful with Causal Inference

Causal Mediation Analysis with the CAUSALMED Procedure

What is Multilevel Modelling Vs Fixed Effects. Will Cook Social Statistics

Supplementary Table 1. Association of rs with risk of obesity among participants in NHS and HPFS

THE FIRST NINE MONTHS AND CHILDHOOD OBESITY. Deborah A Lawlor MRC Integrative Epidemiology Unit

What is: regression discontinuity design?

Association of a nicotine receptor polymorphism with reduced ability to quit smoking in pregnancy

Introduction and motivation

Investigating causality in the association between 25(OH)D and schizophrenia

Best (but oft-forgotten) practices: the design, analysis, and interpretation of Mendelian randomization studies 1

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Consideration of Anthropometric Measures in Cancer. S. Lani Park April 24, 2009

Lecture 14: Adjusting for between- and within-cluster covariates in the analysis of clustered data May 14, 2009

Dylan Small Department of Statistics, Wharton School, University of Pennsylvania. Based on joint work with Paul Rosenbaum

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

in Medicine Tutorial in Biostatistics: Instrumental Variable Methods for Causal Inference

Selection Bias in the Assessment of Gene-Environment Interaction in Case-Control Studies

Assessing causal relationships using genetic proxies for exposures: an introduction to Mendelian randomization

How should the propensity score be estimated when some confounders are partially observed?

Studying the effect of change on change : a different viewpoint

Instrumental Variables Estimation: An Introduction

G , G , G MHRN

16:35 17:20 Alexander Luedtke (Fred Hutchinson Cancer Research Center)

ESM1 for Glucose, blood pressure and cholesterol levels and their relationships to clinical outcomes in type 2 diabetes: a retrospective cohort study

Biostatistics and Epidemiology Step 1 Sample Questions Set 2. Diagnostic and Screening Tests

Article (Published version) (Refereed)

Gene-Environment Interactions

Lecture II: Difference in Difference. Causality is difficult to Show from cross

Treatment effect estimates adjusted for small-study effects via a limit meta-analysis

Beyond the intention-to treat effect: Per-protocol effects in randomized trials

Reflection Questions for Math 58B

Genetics, epigenetics, and Mendelian randomization: cuttingedge tools for causal inference in epidemiology

Confounding. Confounding and effect modification. Example (after Rothman, 1998) Beer and Rectal Ca. Confounding (after Rothman, 1998)

MENDELIAN GENETIC CH Review Activity

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions.

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1

Looking Toward State Health Assessment.

A re-randomisation design for clinical trials

Chapter 1: Exploring Data

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN)

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes

EVect of measurement error on epidemiological studies of environmental and occupational

Web Extra material. Comparison of non-laboratory-based risk scores for predicting the occurrence of type 2

Laws of Inheritance. Bởi: OpenStaxCollege

Technical Specifications

programme. The DE-PLAN follow up.

Fetal exposure to alcohol and cognitive development: results from a Mendelian randomization study. Sarah Lewis

Running head: MENDELIAN RANDOMISATION and PSYCHOPATHOLOGY 1. studies aiming to identify environmental risk factors for psychopathology

Continuous update of the WCRF-AICR report on diet and cancer. Protocol: Breast Cancer. Prepared by: Imperial College Team

Mediation Analysis With Principal Stratification

BST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): 10.

Simulation study of instrumental variable approaches with an application to a study of the antidiabetic effect of bezafibrate

Measurement Error in Nonlinear Models

Econometric Game 2012: infants birthweight?

Low-density lipoproteins cause atherosclerotic cardiovascular disease (ASCVD) 1. Evidence from genetic, epidemiologic and clinical studies

8/10/2012. Education level and diabetes risk: The EPIC-InterAct study AIM. Background. Case-cohort design. Int J Epidemiol 2012 (in press)

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide

Instrumental variable analysis in randomized trials with noncompliance. for observational pharmacoepidemiologic

Bayesian and Frequentist Approaches

Examining Relationships Least-squares regression. Sections 2.3

investigate. educate. inform.

Importance of factors contributing to work-related stress: comparison of four metrics

Choice of axis, tests for funnel plot asymmetry, and methods to adjust for publication bias

Biostatistics II

Objective: To describe a new approach to neighborhood effects studies based on residential mobility and demonstrate this approach in the context of

Transcription:

Mendelian Randomisation and Causal Inference in Observational Epidemiology Nuala Sheehan Department of Health Sciences UNIVERSITY of LEICESTER MRC Collaborative Project Grant G0601625 Vanessa Didelez, Sha Meng, Roger Harbord, Jonathan Sterne, Debbie Lawlor, Frank Windmeijer, John Thompson, Paul Clarke, Tom Palmer, Paul Burton, George Davey Smith Department of Epidemiology and Public Health Imperial College London March 2009

Causal Inference Always inference about interventions whatever framework is adopted Epidemiology is concerned with finding and assessing the size of the effect of modifiable risk factors on diseases so that (public health) interventions can be informed always about causality! Examples for interventions: Adding folic acid to flour Banning smoking in pubs Dietary advice: 5 portions of fruit & vegetables a day etc. 2

Problems Association causation i.e. might find an association but intervention turns out to be useless, Randomised/controlled experiments are not always possible in Epidemiology sometimes unavoidable and often desirable to have to work with observational data. observational findings often not reproduced in randomised trials when these are possible: reverse causation confounding others? Instrumental variable methods as alternative? 3

Interventions We need notation to formally distinguish between association and causation. Intervention: setting X to a value x denoted by do(x = x). p(y do(x = x)) not necessarily the same as p(y X = x). p(y do(x = x)) depends on x only if X is causal for Y observed in a randomised study. p(y X = x) will also depend on x when there is confounding or reverse causation observed in an observational study. 4

Causal Effect Some contrast in the effects of different interventions on X on the outcome Y i.e. compare p(y do(x = x 1 )) with p(y do(x = x 2 )). Average Causal Effect: ACE(x 1,x 2 ) = E(Y do(x 1 )) E(Y do(x 2 )) or analogously Causal Risk Ratio CRR or Causal Odds Ratio COR. CRR(x 1,x 2 ) = p(y = 1 do(x = x 1)) p(y = 1 do(x = x 2 )) COR(x 1,x 2 ) = p(y = 1 do(x = x 1))p(Y = 0 do(x = x 2 )) p(y = 0 do(x = x 1 ))p(y = 1 do(x = x 2 )) Alternative: causal effect on particular subgroups. 5

Identifiability We are often interested in estimating the causal effect consistently from observational data. Mathematically, the causal effect is identifiable if we can re-express it without do(x) notation using only the distribution of observable variables. If a sufficient set C of confounders is measured, the causal effect is identified using p(y do(x = x)) = c p(y X = x,c = c)p(c) usual adjustment for known confounders. This is not always possible in the observational regime need to deal with confounding by other means, e.g. instrumental variables (IV) 6

Identifiability using Instrumental Variables There are different types of assumption required: (in)dependencies structural parametric form } allow testing for causal effect for estimation 7

Core Conditions For the effect of X (phenotype/exposure) on Y (disease), a variable G (genotype) is an instrument if variable(s) U (unobserved confounders) exist such that 1. G U: G and U are independent 2. G / X: G and X are (strongly) associated 3. G Y (X, U): G and Y conditionally independent given X and U. G is only associated with Y via its association with X, Cannot forget about U! 8

Core Conditions Graphically (1) U G (2) X Y (3) NOTE: In particular, these conditions do not imply that G Y X or G Y. NOTE: Assumptions 1 and 3 cannot be tested from data as U is typically not known/measured justification must be based on background/subject matter knowledge. 9

Core Conditions Graphically U G X Y Equivalent to factorisation Structural assumptions: p(y x, u)p(x u, g)p(u)p(g). p(y x, u), p(g) and p(u) are not changed by intervention in X, i.e. when conditioning on do(x). 10

Core Conditions Graphically U G X Y With structural assumption: under intervention in X p(y,u,g do(x = x )) = p(y x,u)p(u)p(g) Graphically, the intervention corresponds to removing all arrows leading into X. 12

Testing for Causal Effect No causal effect G independent of Y U G X Y From factorisation p(y, x, u, g) = p(y u)p(x u, g)p(u)p(g) p(y,g) = x,u p(y u)p(x u, g)p(u)p(g) = p(y)p(g). So a test for association between G and Y can be taken as a test for a causal effect of X on Y (Katan 1986) 13

Mendelian Randomisation Consider risk factors that are modifiable behaviours or phenotypes known to be caused by, or strongly related to, certain genotypes; Mendel s Second Law (law of assortment): genotypes can reasonably be assumed to be independent of life style etc. Independent of typical confounding factors; kind of randomised ; Genes are determined before birth, no reverse causation possible; Conjecture: if and only if phenotype is causal for disease should find an association between genotype and disease! This is known in statistics and econometrics as an Instrumental Variable (IV) method here the genotype is the instrument. Katan (1986) letter to Lancet, Davey Smith & Ebrahim (2003),Lawlor et al. (2008),Greenland(2000), Hernán & Robins (2006), Didelez & Sheehan (2007) 14

Example: Alcohol Consumption unobserved Lifestyle / confounders Alcohol consumption? Disease Chen et al. (2008) Alcohol consumption has been found in observational studies to have positive effects (coronary heart disease) as well as negative effects (liver cirrhosis, some cancers, mental health problems). But also strongly associated with all kinds of confounders (lifestyle etc.), as well as subject to self report bias. Hence doubts in causal meaning of above effects. 15

Example: Alcohol Consumption Genetic Instrumental Variable? Genotype: ALDH2 determines blood acetaldehyde, the principal metabolite for alcohol. Two alleles/variants: wildtype *1 and null variant *2. *2*2 homozygous individuals suffer facial flushing, nausea, drowsiness and headache after alcohol consumption. *2*2 homozygotes have low alcohol consumption regardless of their other lifestyle behaviours i.e. the gene can be taken as a proxy for alcohol intake. IV Idea: check if these individuals have a reduced risk for alcoholrelated health problems! 16

Example: Alcohol Consumption (1) unobserved Lifestyle / confounders ALDH2 genotype (2) Alcohol consumption? Disease Note 1: due to random allocation of genes at conception, can be fairly confident that genotype is not associated with unobserved confounders. Further evidence: in extensive studies no evidence for association with observed confounders, e.g. age, smoking, BMI, cholesterol. (see also Davey Smith et al. 2007) 17

Example: Alcohol Consumption (1) unobserved Lifestyle / confounders ALDH2 genotype (2) Alcohol consumption? Disease Note 2: due to known functionality of ALDH2 gene, we can exclude that it affects the typical diseases considered by another route than through alcohol consumption. important to use well studied genes as instruments! 18

Example: Alcohol Consumption unobserved Lifestyle / confounders ALDH2 genotype (3) Alcohol consumption? Disease Note 3: association of ALDH2 with alcohol consumption well established, strong, and underlying biochemistry well understood. 19

Example: Alcohol Consumption Alcohol intake ml/day 40 30 20 10 0 2*2/2*2 2*2/2*1 1*1/1*1 ALDH2 genotype Note 3: association of ALDH2 with alcohol consumption well established, strong, and underlying biology well understood. 20

Example: Alcohol Consumption unobserved Lifestyle / confounders ALDH2 genotype Alcohol consumption? Disease Note 4: if the above is our causal graph, then under the null hypothesis of no causal effect of alcohol consumption, there should be no association between ALDH2 and disease; While if alcohol consumption has a causal effect we would expect an association between ALDH2 and disease. 21

Example: Alcohol Consumption Findings: (Meta-analysis by Chen et al. 2008) Blood pressure on average 7.44mmHg higher and risk of hypertension 2.5 higher for *1*1 homozygotes than for *2*2 homozygotes (only males). mimics the effect of large versus low alcohol consumption. Blood pressure on average 4.24mmHg higher and risk of hypertension 1.7 higher for *1*2 heterozygotes than for *2*2 homozygotes (only males). mimics the effect of moderate versus low alcohol consumption. it seems that even moderate alcohol consumption is harmful. Note: studies mostly in Japanese populations (where ALDH2*2*2 is common) and where women drink only little alcohol in general. 22

Problems with Mendelian Randomisation Poor inferences may occur due to poor estimates of G X and G Y associations a genetic epidemiology problem The core conditions can be violated in many different ways an instrumental variable problem But some situations that look like violations are okay. GRAPHS can be used to check these conditions. 23

Estimation of Causal Effect Requires parametric assumptions e.g. linearity & no interactions. Plus: structural assumption E(Y X = x,u = u) = E(Y do(x = x),u = u) = µ + βx + δu Then: consistent estimator for ACE(x + 1, x) = β is ˆβ IV = ˆβ Y G ˆβ X G and st.dev(ˆβ IV ) = σ Gσ Y X σ G,X where ˆβ Y G and ˆβ X G are least squares regression coefficients. Note: weak instrument (σ G,X 0) makes ˆβ IV unstable. 24

However Typical applications of IV with Mendelian randomisation in epidemiology: Y is binary (X continuous, G categorical), p(y x, u) usually non-linear. Not always clear how relevant causal parameter is related to the relevant coefficients from the two separate regressions. case control studies: want to use causal odds ratio (COR) or causal risk ratio (CRR) not ACE. 25

IV Methods for Binary Outcome Various IV estimators for binary outcomes are used in Epidemiology. They all make different additional parametric assumptions (i.e. besides the core conditions and structural assumption). When assumptions are violated, resulting estimates will be biased estimates of the target causal effect. Can all behave unreliably in the all-binary situation. Didelez, Meng & Sheehan (2008) Research Report 08-20 Department of Mathematics, University of Bristol 26

IV Methods for Binary Outcome Naïve estimators: ORs and RRs from a regression of Y on X. Biased if there is unobserved confounding. Wald-type estimators: Based on ratios of differences e.g. Wald OR: OR (µ 1 µ 0 ) 1 Y G where µ 1 µ 0 = E(X G = 1) E(X G = 0) and for binary G. OR Y G = p(y = 1 G = 1)p(Y = 0 G = 0) p(y = 0 G = 1)p(Y = 1 G = 0) Heuristic no model assumptions for theoretical justification. 27

IV Methods for Binary Outcome Wald-type estimators: If all relationships are log-linear and conditional variance of X is variation independent of the mean WaldRR = is consistent for the CRR. { } 1/ E(Y G = 1) (1) E(Y G = 0) = RR(Y G) 1/ For rare disease, the Wald OR is as good as consistent under the same model assumptions. Wald-type estimators: useful when we do not have joint data on X,G and Y e.g. meta-analysis. 28

IV Methods for Binary Outcome Multiplicative structural mean model: { E(Y X = x,g,do( log X } = x)) E(Y X = x,g,do( X = 0)) = γx. (2) where X denote the natural exposure level while X denotes the exposure that is set by an intervention (overruling the natural X) and X = 0 stands for an appropriate baseline value. Y depends causally on X but still associated with X since X is informative for unobserved confounding that also predicts Y. Explicit form for binary X estimating equations for continuous X. 29

IV Methods for Binary Outcome Multiplicative structural mean model: Unlike Wald-type, no distributional assumptions for X (given G). Requires joint information on all relevant variables. Assumes effect of exposure on exposed is same for different levels of G no effect modification by instrument local causal risk ratio. Reasonable? Population CRR if no interaction between U and X on multiplicative scale. Can we ever rule out such interactions? Simulations studies for the all-binary case SMM performed better in terms of asymptotic bias. 30

ALSPAC FTO, BMI and Asthma About 4,000 seven-year old children. G FTO genotype dichotomised TT & AT = 0, AA = 1 (A allele known to be associated with fat mass) X Body mass index (kg/m 2 ) Y doctor diagnosed asthma (yes,no) at 91 months (prevalence 14%) Test for association between G and Y OR not significantly different from 1 (p = 0.134). 31

ALSPAC FTO, BMI and Asthma IV estimates of causal effect of BMI on asthma: Naïve OR: 1.0806 Wald OR: 2.8776 Wald RR: 2.4584 SMM RR: 0.8799 (GMM RR: 0.7088) All have wide confidence intervals and we are, perhaps, estimating too close to the null value BUT why should we get different directions of causal effect? SMM and GMM only use information on exposure in the disease group (Y = 1): asthmatics heavier on average and go against population trend with genotype. 32

Conclusion Need a formal causal framework to disentangle associational and causal concepts in observational epidemiology. Causal inference always requires background knowledge to verify that assumptions are met. In case of Mendelian randomisation we have background knowledge genetics. IV methods avoid the assumption of no unobserved confounding but make other assumptions instead! What do these mean in epidemiological applications? 33

Discussion Points IV assumptions that crop up in other contexts: No effect modification by instrument. No treatment received (i.e. no defiers) in control group what does that mean here? Monotonicity assumption: X (G = 1) X (G = 0). Local causal effects effect of exposure on the exposed (effect of treatment received) as targeted by SMM. Is this what we want? effect of treatment on the compliers?? In order for IV methods to be useful for epidemiological applications, we need assumptions that can be verified in our context. 34