Methods for meta-analysis of individual participant data from Mendelian randomization studies with binary outcomes


Stephen Burgess
Simon G. Thompson
CRP CHD Genetics Collaboration

May 24, 2012

Abstract

Mendelian randomization is an epidemiological method for estimating causal associations from observational data by using genetic variants as instrumental variables. Typically the genetic variants explain only a small proportion of the variation in the risk factor of interest, and so large sample sizes are required, necessitating data from multiple sources. Meta-analysis based on individual patient data requires synthesis of studies which differ in many aspects. A proposed Bayesian framework is able to estimate a causal effect from each study, and combine these using a hierarchical model. The method is illustrated for data on C-reactive protein (CRP) and coronary heart disease (CHD) from the CRP CHD Genetics Collaboration (CCGC). Studies from the CCGC differ in terms of the genetic variants measured, the study design (prospective or retrospective, population-based or case-control), whether CRP was measured, the time of CRP measurement (pre- or post-disease), and whether full or tabular data were shared. We show how these data can be combined in an efficient way to give a single estimate of causal association based on the totality of the data available. Compared to a two-stage analysis, the Bayesian method is able to incorporate data on 23% additional participants and 51% more events, leading to a 23–26% gain in efficiency.

Keywords: Mendelian randomization, meta-analysis, individual participant data, causal inference.

Address: Department of Public Health & Primary Care, Strangeways Research Laboratory, Worts Causeway, Cambridge, CB1 8RN, UK. Correspondence to: sb452@medschl.cam.ac.uk.

1 Introduction

A fundamental epidemiological question of interest is whether an observed correlation between a risk factor and a disease is a causal or a non-causal association. Mendelian randomization is a technique for determining the causal association between a risk factor (X) and an outcome (Y) in the presence of several possibly unmeasured confounders (U) [1]. A genetic variant (G) is sought which is associated with the risk factor, not associated with any of the confounders, and independent of the outcome conditional on the risk factor and confounders [2] [3]. Such a variable is known as an instrumental variable [4].

1.1 Causal inference

A causal association refers to the effect of an intervention on a specific risk factor, and is usually the target of interest for an epidemiologist [5]. This is contrasted with an observational association, which describes how the outcome depends on an observed difference in a risk factor. Unfortunately, interpreting the association between a risk factor and a disease outcome in observational data as a causal association relies on untestable and often implausible assumptions. This has led to several high-profile cases where a risk factor has been widely advocated as important in disease prevention from observational data, only to be later discredited when the evidence from randomized trials did not support a causal interpretation [6]. For example, observational studies reported a strong inverse association between vitamin C and coronary heart disease, which did not attenuate on adjustment for a variety of risk factors [7]. However, experimental data from randomized controlled trials (RCTs) showed a null association, with a positive point estimate [8].

1.2 Instrumental variables

An instrumental variable (IV) is associated with the risk factor of interest and is used to estimate the causal effect of change in the risk factor while all other risk factors remain constant [9] [10].
The fundamental conditions for an IV to satisfy are summarized as [3] [4] [11]:

i. the IV is associated with the risk factor ($G \not\perp X$),
ii. the IV is not associated with any confounder ($G \perp U$),
iii. the IV is conditionally independent of the outcome given the risk factor and confounders ($G \perp Y \mid X, U$).

Subgroups defined as those with a given value of the IV are analogous to treatment arms in an RCT [12]. From the IV assumptions, these subgroups differ systematically in the risk factor, but not in any other factor [13]. A difference in disease incidence between these subgroups would therefore indicate a true causal relationship between risk factor and outcome [14].
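The argument above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original analysis: it assumes a continuous outcome (rather than the binary outcomes considered in this paper), a single SNP coded as an allele count, and arbitrary illustrative effect sizes. The function name `simulate_iv` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_iv(n=200_000, beta=0.5, seed=0):
    """Return (naive, ratio) estimates of the causal effect beta.

    Illustrative assumptions: G satisfies the three IV conditions,
    U confounds the X-Y association, and all relationships are linear.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n)        # instrument: allele count 0, 1 or 2
    u = rng.normal(size=n)                  # unmeasured confounder
    x = 0.5 * g + u + rng.normal(size=n)    # risk factor depends on G and U
    y = beta * x + u + rng.normal(size=n)   # outcome depends on X and U, not on G directly

    # Observational (confounded) slope from regressing Y on X.
    naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # Ratio of the G-Y and G-X associations, which removes the confounding by U.
    ratio = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]
    return naive, ratio

naive, ratio = simulate_iv()
print(f"naive estimate: {naive:.3f}, ratio (IV) estimate: {ratio:.3f}")
```

Under these assumptions the naive slope is pulled away from the true value by the confounder, while the ratio of genetic associations recovers a value close to `beta`.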

1.3 Mendelian randomization

Mendelian randomization is instrumental variable analysis using genetic instruments [15]. Although not all Mendelian randomization studies have used IV methodology [16] [17], the use of genetic variants as IVs is at the core of Mendelian randomization. Genetic variants are ideal candidates for IVs, as genes are typically specific in function, affecting a single risk factor [18]. Genetic variation is determined at conception, so no reverse causation of an outcome on a genetic variant is possible. Genetic markers used as IVs are usually single nucleotide polymorphisms (SNPs) [11].

We consider data on C-reactive protein (CRP) and coronary heart disease (CHD) collated by the CRP CHD Genetics Collaboration (CCGC) [19] [20]. Although the methods in this paper were specifically designed for the data from the collaboration, we believe that they cover a wide range of study designs and scenarios and so will be useful for meta-analysis of Mendelian randomization data in other contexts.

The use of any particular genetic variant as an IV requires caution as the IV assumptions may be violated for various epidemiological and biological reasons, such as the gene being associated with variables on multiple risk pathways (pleiotropy) [3] [11] [21] [22] [23]. Although the assumptions cannot be fully tested, when the function of the gene where the genetic variant is located is known, we have good biological plausibility for use of the genetic variant as an IV. We assume that the genetic variants used as IVs in this paper satisfy the necessary conditions; this has been discussed at length elsewhere [19]. To summarize, the SNPs used as IVs are taken from the CRP gene region on chromosome 1 and are not known to be associated with any potential confounding factors either from experimental knowledge or empirical testing in this dataset (out of 84 associations between 4 SNPs and 21 alternative risk factors for CHD, 3 had p < 0.05, with minimal p-value 0.003).
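As a rough check on the empirical testing summarized above, the observed count of nominally significant associations can be compared with what chance alone would produce. This is an illustrative calculation only: it assumes the 84 tests are independent, which is not exactly true because the SNPs are correlated, and the Bonferroni threshold is introduced here for comparison rather than taken from the original analysis.

\[
4 \times 21 = 84, \qquad 84 \times 0.05 = 4.2 > 3, \qquad \frac{0.05}{84} \approx 0.0006 < 0.003 .
\]

On this reckoning, the number of associations with p < 0.05 is no larger than expected under the null hypothesis, and the smallest p-value would not pass a Bonferroni-corrected threshold.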
1.4 Meta-analysis

In general, the variation in the risk factor of interest explained by genetic variants is small, and so adequately powered Mendelian randomization studies typically require large sample sizes, demanding synthesis of evidence from multiple, possibly heterogeneous studies [2]. We combat the problems raised by this heterogeneity by extending a Bayesian hierarchical method designed for continuous outcomes [24]. By making certain simplifying assumptions which are fully detailed below, we demonstrate how a range of different designs of studies with binary outcomes can be analysed using a logistic model, and how these causal estimates can be combined in a hierarchical model. We show how the parameters of genetic association can also be combined across studies, to strengthen the instrument and increase precision. By using the random effects distribution as an implicit prior for the genetic association parameters, we show how studies with no data on the risk factor can still be included in the analysis. By including both prevalent disease events (those reported at baseline) and incident events in prospective studies, we use all available data on disease outcomes.

1.5 Structure of paper

Having discussed the data available and sources of heterogeneity between studies in the CCGC (Section 2), the methodological framework and statistical model for analysis is introduced (Section 3). We show how this can be used to analyse each study in the collaboration (Section 4), assessing the model assumptions by sensitivity analysis. Extensions are discussed which efficiently deal with issues of combining evidence across studies (Section 5), and then results are presented for the causal change in CHD due to CRP (Section 6). We conclude by discussing the interpretation and potential applications of this method (Section 7).

2 The CRP CHD Genetics Collaboration

The CCGC is a collaboration of 47 epidemiological studies seeking to ascertain the causal role of C-reactive protein (CRP) in coronary heart disease (CHD) using a Mendelian randomization approach [20]. CRP is an acute-phase protein found in the blood which is associated with inflammation. It is known that CRP is observationally associated with CHD [25], but it is not known whether this association is causal [26] [27] [28]. Studies from the collaboration measure CRP levels, genes relating to CRP and CHD events. Individual participant data (IPD) have been collated by the coordinating centre. Table 1 lists the major statistical features of the studies in the CCGC. Further epidemiological characterization of the studies can be found in Appendix 1 of the published paper from the collaboration [19]. A list of study abbreviations as used in this paper can be found in Web Table A1.

In all analyses, we restrict attention to participants of European descent, excluding the four studies with no participants of European descent from analysis. This is to ensure greater homogeneity of the study populations and to counteract violations of the IV assumptions due to population stratification [11]. CRP is positively skewed, and so we take log(CRP) as the risk factor.
We use the term risk ratio as a generic term meaning hazard ratio or odds ratio as appropriate.

2.1 Issues leading to difficulties in evidence synthesis

The studies to be combined differ in several aspects. We list some aspects of between-study variability such as clinical and methodological diversity, and differences in which variables have been measured. These lead to difficulties in evidence synthesis and possible statistical heterogeneity:

1. Study design: The collaboration includes prospective studies (cohort studies and nested case-control studies, both matched and unmatched) and retrospective studies (unmatched case-control studies). Four of the studies in the collaboration did not provide IPD but only summary data on numbers of individuals with and without CHD events for each genotype. Different study designs are usually analysed using different methods and provide estimates which represent different quantities.

Table 1: Summary of studies from the CRP CHD Genetics Collaboration with subjects of European descent

Columns: Study (note 2); Study type; Total participants; Number of subjects with incident CHD, prevalent CHD, and CRP data (note 3); SNP data (note 1) for g1, g2, g3, g4. Only the Study and Study type columns are reproduced here.

Study          Study type
BRHS           Cohort with prevalent cases
BWHHS          Cohort with prevalent cases
CCHS           Cohort with prevalent cases
CGPS           Cohort with prevalent cases
CHS            Cohort with prevalent cases
EAS            Cohort with prevalent cases
ELSA           Cohort with prevalent cases
FRAMOFF        Cohort with prevalent cases
PROSPER        Cohort with prevalent cases
ROTT           Cohort with prevalent cases
NPHSII         Cohort without prevalent cases
WOSCOPS        Cohort without prevalent cases
EPICNOR        Nested matched case-control
HPFS           Nested matched case-control
NHS            Nested matched case-control
NSC            Nested matched case-control
CAPS           Nested unmatched case-control
DDDD           Nested unmatched case-control
EPICNL         Nested unmatched case-control
WHIOS          Nested unmatched case-control
MALMO          Nested unmatched case-control with prevalent cases
SPEED          Nested unmatched case-control with prevalent cases
ARIC           Unmatched case-control
CUDAS          Unmatched case-control
CUPID          Unmatched case-control
HIFMECH        Unmatched case-control
HIMS           Unmatched case-control
ISIS           Unmatched case-control (see Web Appendix)
LURIC          Unmatched case-control
PROCARDIS      Unmatched case-control
SHEEP          Unmatched case-control
WHITE2         Unmatched case-control
CIHDS          Unmatched case-control (CRP in controls only)
BHF-FHS        Unmatched case-control (no CRP data)
CHAOS          Unmatched case-control (no CRP data)
GISSI          Unmatched case-control (no CRP data)
HVHS           Unmatched case-control (no CRP data)
INTHEART       Unmatched case-control (no CRP data)
UCP            Unmatched case-control (no CRP data)
AGES           Tabular data
HEALTHABC      Tabular data
MONICA/KORA    Tabular data
PENNCATH       Tabular data
Total

Note 1: g1 = rs1205, g2 = rs[...], g3 = rs[...], g4 = rs[...] or equivalent proxies (see Web Appendix).
Note 2: A list of study abbreviations is given in Web Table A1.
Note 3: In retrospective case-control studies, CRP data are taken in controls only; in prospective studies, in subjects without prevalent CHD.

2. Outcome data: The outcome was defined as fatal CHD (based on International Classification of Diseases codings) or nonfatal myocardial infarction (using World Health Organization criteria). In five studies, coronary stenosis (more than 50% narrowing of at least one coronary artery assessed by angiography) was also included as a disease outcome. As our case definition is almost uniform across studies, we do not expect this to be a source of heterogeneity. We refer to all outcomes as CHD, using the term prevalent to refer to a CHD event prior to blood draw for CRP measurement and incident to refer to a CHD event subsequent to blood draw. In some prospective studies, CRP measurements have not been taken at baseline, but rather at a later occasion, which we have redefined as our baseline. Hence, some of the individuals who had incident events in the original study will not have incident events in the baseline-transformed study. In most of the prospective cohort studies, the hazard function appears to be a smooth function of follow-up time (Web Figure A1). In later sections, we will investigate the sensitivity of regression of the outcome in cohort studies on parametric assumptions, and on ignoring variable follow-up. Although there are anomalous results in a few of the studies, it seems that these assumptions may not severely misrepresent the data.

3. Risk factor data: Some of the studies do not measure CRP level for all individuals, and others do not measure it for any individuals. In case-control data, cases have been oversampled with respect to the general population, and so we make inferences on the gene–risk factor association from controls [29] [30]. In prospective studies, as a CHD event may affect CRP levels, to prevent problems of reverse causation, we only make inferences on the gene–risk factor association from individuals without a prevalent CHD event [3].
Different studies measured CRP using different assays, after different storage periods, and with and without prior fasting, leading to potential heterogeneity. Apart from low levels of CRP, where assays are not sensitive enough to distinguish between values, the distribution of log(CRP) can be approximated by a normal distribution (Web Figure A2). The mean of the risk factor distribution in genetic subgroups varies, as shown for one of the studies in the collaboration (CGPS), but the standard deviation is similar in each subgroup with no clear trend (Web Figure A3).

4. Genetic data: The 43 studies in the collaboration with participants of European descent measure different genetic information in the form of SNPs. The number of SNPs measured in each study varies from 1 to 13. Over 20 SNPs in total are measured by at least one study. Only 6 studies measure all of the four pre-specified SNPs [20]: rs1205, rs[...], rs[...] and rs[...]. Some studies measure SNPs which are in complete linkage disequilibrium (LD) with one of the pre-specified SNPs, and which can be used as proxies for these SNPs [31]. In total 20 studies measure all four SNPs or proxies thereof and an additional 17 measure some three out of these four. Others measure fewer than

this. The SNPs measured all come from the CRP-regulatory gene on chromosome 1 and so display varying degrees of correlation. These genetic variants can be summarized by five haplotypes, which comprise 99% of the variation in European populations (Web Tables A2 and A3). The frequency of the haplotype patterns is similar in each of the studies in the collaboration (Web Figure A4), which provides evidence for the homogeneity of European descent populations, and supports our claims for use of proxy SNPs and determination of haplotypes. Further details about the genetic variants used as SNPs are found in the Web Appendix.

2.2 Weak instruments

Although IV methods give consistent estimates of causal association, IV estimates in finite samples are biased towards the confounded observational estimate of association [32]. The magnitude of this bias depends on the strength of the statistical association between the IV and the risk factor, and is related to the F statistic in the regression of the risk factor on the IV. The F statistics in each study for different models of genetic association are given in Web Table A4. As there is little evidence for a more complex model (see Web Appendix for discussion), an additive per-allele SNP-based model of genetic association in each study is used throughout. Even using this parsimonious model with one parameter for each IV, the F statistics in some studies are less than 10. IVs with an F statistic less than 10 are often labelled as weak instruments [33]. Such classification is misleading for several reasons. First, it gives a binary classification of IVs as either weak or strong based on an arbitrarily chosen threshold F value, whereas the bias is a continuous phenomenon. Secondly, it leads researchers to think that weak instrument bias is due to an intrinsic property of the instruments, whereas any instrument can be made stronger by increasing the sample size.
Thirdly, the measured F statistic in a given dataset is an unreliable guide to the true strength of an instrument, due to the large sampling variability of the F statistic [34]. Fourthly, the use of rules based on measured F statistics, such as the choice of IVs or the exclusion of studies from a meta-analysis with F < 10, can lead to more bias than it prevents [35]. For these reasons, we seek to combat weak instrument bias through careful choice of analysis method rather than by post hoc selection of data. We discuss how weak instrument bias affects the methods used in this paper in Section [...].
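For concreteness, the F statistic referred to in this section can be computed from the first-stage linear regression of the risk factor on the instruments. The following is a minimal sketch using ordinary least squares in NumPy; the function name `first_stage_f` and the toy data are hypothetical and are not taken from the CCGC analysis.

```python
import numpy as np

def first_stage_f(x, g):
    """F statistic for the linear regression of risk factor x on instrument(s) g.

    x: array of shape (n,) with risk factor values (e.g. log(CRP)).
    g: array of shape (n,) or (n, k) with allele counts for k SNPs.
    """
    x = np.asarray(x, dtype=float)
    g = np.asarray(g, dtype=float)
    if g.ndim == 1:
        g = g[:, None]
    n, k = g.shape
    design = np.column_stack([np.ones(n), g])          # intercept plus per-allele terms
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)  # least-squares fit
    resid = x - design @ coef
    rss = float(resid @ resid)                         # residual sum of squares
    tss = float(((x - x.mean()) ** 2).sum())           # total sum of squares
    # Explained mean square over residual mean square.
    return ((tss - rss) / k) / (rss / (n - k - 1))

# Toy example with a single SNP and four individuals.
print(first_stage_f([1.0, 2.0, 3.0, 5.0], [0, 1, 2, 3]))
```

Because the residual mean square in the denominator is divided by n - k - 1, the same underlying genetic association yields a larger expected F statistic in a larger sample, which is the sense in which instrument strength is not an intrinsic property of the variant.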

3 Methods for data analysis

In this section, we present methods for instrumental variable analysis. Firstly, we present the classical two-stage method. This is equivalent to the commonly used ratio of coefficients method, which is valid for a single IV, but can be used with multiple instruments [36]. Then, we present a Bayesian method similar to the two-stage method, which extends naturally to a meta-analysis model where studies are combined in a hierarchical model on the causal parameter. Finally, we discuss approaches to IV estimation with a survival outcome.

3.1 Two-stage analysis

The causal association can be estimated using a two-stage approach. With continuous outcomes, this is known as two-stage least squares (2SLS) [37]. In 2SLS, we first fit a linear regression of the risk factor on the IVs (G–X regression), and secondly a linear regression of the outcome on the fitted values for the risk factor from the first-stage regression ($\hat{X}$–Y regression). The 2SLS estimate ($\hat{\beta}_{2SLS}$) is the coefficient for the increase in outcome per unit increase in risk factor. With binary outcomes, an analogous estimate has been proposed, called a two-stage [38], pseudo-2SLS [39], two-stage predictor substitution [40] [41], or Wald-type estimator [42]. This replaces the second linear $\hat{X}$–Y regression with a logistic regression. With a single instrument, the 2SLS and two-stage estimators coincide with the ratio of the coefficient from the appropriate G–Y regression (linear or logistic) to the coefficient from the G–X regression [43].

There are several difficulties with this approach. Firstly, the fitted values for the risk factor are plugged into the second-stage regression without accounting for uncertainty, meaning that the precision in the causal estimate may be overestimated. Secondly, the distribution of the causal parameter is assumed to be normal, which may result in overly narrow confidence intervals when the instrument is weak [44].
Finally, in the non-linear case, the two-stage estimate is uncorrelated with the residuals in the G–X regression, but not with the residuals in the $\hat{X}$–Y regression, leading to bias compared to the conditional causal effect [39] and the coining of such two-stage analyses as forbidden regressions in the field of econometrics [45] [46]. However, when the two-stage estimate is compared to a marginal or population-averaged causal effect, which is different to the conditional causal effect due to non-collapsibility of the logistic function [3] [47], the two-stage estimate is shown to be close to unbiased with strong IVs [48] [49]. This contrasts with the control function approach [50], also known as the adjusted IV [51] or two-stage residual inclusion method [40], which has been shown to be biased for the conditional causal effect when there is confounding [41]. In this method, the first-stage residuals from the G–X regression are included in the second-stage $\hat{X}$–Y regression. The parameter estimated by the adjusted two-stage method does not have a clear interpretation in general, as the residuals represent a univariate combination of the unmeasured confounders, independent variation and measurement error in X [49]. Although we prefer the two-stage method on theoretical grounds, we include results from this method, which we label as the adjusted two-stage

method, when pre-CHD event measures of the risk factor are available on the entire cohort, as using risk factor values measured after a CHD event may lead to reverse causation.

We use the term two-stage to refer to a two-stage IV analysis and two-step to a two-step meta-analysis based on combining summary estimates from individual studies. All two-step meta-analyses in this paper use inverse-variance weighting and the DerSimonian–Laird method of moments to estimate heterogeneity in a random-effects model [52].

3.2 Bayesian methods

To combat some of the difficulties with the two-stage methods, we use a Bayesian model with vague priors. We divide our population using genetic information into subgroups, where a subgroup contains all individuals in a study with a certain genotype. For each subgroup $j$, we estimate the mean level of the risk factor, assuming that, for each individual $i$, the measured values of the risk factor $x_{ij}$ come from a normal distribution with mean $\xi_j$ and variance $\sigma^2$, common across subgroups. Using a logistic model of outcome on risk factor, we model the probability of an event $\pi_j$ in subgroup $j$ by assuming a binomial distribution of the number of events $n_j$ from the total number at risk $N_j$ and a linear relationship between the log-odds of an event $\eta_j = \mathrm{logit}(\pi_j)$ and the mean level of the risk factor ($\xi_j$). The coefficient $\beta_1$, the increase in log-odds of an event per unit increase in the risk factor, is taken as our causal parameter of interest. As in the two-stage methods, we only use the risk factor values $x_{ij}$ for individuals from the control population in a case-control study, and for individuals without previous history of disease in a cohort study. Individuals with missing risk factor values are still included as cases or controls in the logistic regression.
\[
\begin{aligned}
x_{ij} &\sim N(\xi_j, \sigma^2) \\
n_j &\sim \mathrm{Binomial}(N_j, \pi_j) \\
\mathrm{logit}(\pi_j) &= \eta_j = \beta_0 + \beta_1 \xi_j
\end{aligned}
\tag{1}
\]

We model the risk factor as additive across SNPs with a per-allele model for each SNP; justification for this is provided in the Web Appendix. For each subgroup $j$ comprising all people with $g_{jk}$ (= 0, 1, 2) variant allele copies for SNP $k$, $k = 1, \ldots, K$, we estimate the change in risk factor per allele $\alpha_k$ to give average levels of the risk factor $\xi_j$ for each subgroup.

\[
\xi_j = \alpha_0 + \sum_{k=1}^{K} \alpha_k g_{jk}
\tag{2}
\]

A haplotype-based model for the risk factor was also considered; details are given in the Web Appendix.

In a meta-analysis context, we jointly estimate the causal parameter across studies in a hierarchical model. In a fixed-effect model, the causal parameter $\beta_1$ is the same for each study $m = 1, \ldots, M$.

\[
\begin{aligned}
x_{ijm} &\sim N(\xi_{jm}, \sigma_m^2) \\
n_{jm} &\sim \mathrm{Binomial}(N_{jm}, \pi_{jm}) \\
\mathrm{logit}(\pi_{jm}) &= \eta_{jm} = \beta_{0m} + \beta_1 \xi_{jm}
\end{aligned}
\tag{3}
\]

In a random-effects model, the causal parameter is allowed to vary between studies, with a normal distribution imposed on the study-level causal parameters. Here, the causal parameter of interest $\mu_\beta$ is the mean causal effect across studies. We replace the final line from (3) with:

\[
\begin{aligned}
\mathrm{logit}(\pi_{jm}) &= \eta_{jm} = \beta_{0m} + \beta_{1m} \xi_{jm} \\
\beta_{1m} &\sim N(\mu_\beta, \tau^2)
\end{aligned}
\tag{4}
\]

where $\tau^2$, the variance of the random-effects distribution, is a measure of the between-study heterogeneity in the $\beta_{1m}$. Hence, unlike the two-stage IV method, the Bayesian analysis is performed in one stage, and the meta-analysis is performed in one step.

3.3 Survival regression models

Using the two-stage paradigm with survival outcomes, we perform second-stage Cox and Weibull proportional hazards regressions, censoring non-CHD deaths. It is not clear what the parameter estimated by such regressions represents [53], and the results presented here are for comparative purposes only. We also convert the survival outcome into a binary outcome, ignoring variable follow-up, and use a logistic regression model.

In the Bayesian framework, a Weibull distribution can be assumed for survival times, with shape parameter $r$ and a log-linear model for the rate parameter $\mu_j$ for each individual $i$ in genotypic group $j$ with time-to-event $t_{ij}$:

\[
\begin{aligned}
x_{ij} &\sim N(\xi_j, \sigma^2) \\
t_{ij} &\sim \mathrm{Weibull}(r, \mu_j) \\
\log(\mu_j) &= \eta_j = \beta_0 + \beta_1 \xi_j
\end{aligned}
\tag{5}
\]

If there is no event but an individual is right-censored, then we introduce a censoring indicator and use the likelihood contribution from the probability of not having an event until the time of censoring.
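The treatment of right-censoring described above can be sketched as a log-likelihood function. This is an illustration only, not the WinBUGS code used for the analyses: it assumes the Weibull density is parameterized as f(t) = mu * r * t^(r-1) * exp(-mu * t^r), so that the survival function is exp(-mu * t^r), and it treats the subgroup means of the risk factor as known inputs rather than estimating them jointly. The function name `weibull_loglik` is hypothetical.

```python
import numpy as np

def weibull_loglik(t, event, xi, beta0, beta1, r):
    """Log-likelihood of right-censored Weibull survival times.

    t:     follow-up time for each individual (event or censoring time).
    event: 1 if a CHD event was observed at time t, 0 if right-censored.
    xi:    mean risk factor level of each individual's genotypic subgroup.
    Assumed model: log(mu) = beta0 + beta1 * xi, hazard = mu * r * t**(r - 1).
    """
    t = np.asarray(t, dtype=float)
    event = np.asarray(event, dtype=float)
    xi = np.asarray(xi, dtype=float)
    log_mu = beta0 + beta1 * xi
    log_hazard = log_mu + np.log(r) + (r - 1.0) * np.log(t)
    cum_hazard = np.exp(log_mu) * t ** r
    # Events contribute log f(t) = log hazard - cumulative hazard;
    # censored individuals contribute only log S(t) = -cumulative hazard.
    return float(np.sum(event * log_hazard - cum_hazard))

# One event at t = 1 and one individual censored at t = 2, exponential case (r = 1).
print(weibull_loglik([1.0, 2.0], [1, 0], [0.0, 0.0], beta0=0.0, beta1=0.0, r=1.0))
```

In a full Bayesian fit this term would be combined with the normal likelihood for the risk factor values and the prior distributions, so that uncertainty in the subgroup means feeds through to the causal parameter.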
A gamma distribution is used for the prior distribution of $r$ with shape parameter 0.1 and rate parameter [...].

Details of Bayesian analyses

In each of the Bayesian analyses below, vague independent N(0, [...]) priors were placed throughout on all regression parameters, independent U(0, 20) priors on the standard deviation parameters in the normal distributions of the risk factor, independent

U(0, 1) priors on the standard deviation parameters of random effects distributions, and inverse-Wishart priors on the variance-covariance matrix of multivariate normal distributions, where the scale matrix in the Wishart distribution is diagonal with 10 as each diagonal element and 0 as each off-diagonal element. We use Markov chain Monte Carlo (MCMC) methods in WinBUGS [54] with at least [...] iterations, of which the first 1000 are discarded as burn-in. We assess convergence of the posterior distribution by running three parallel chains with different starting values and examining the Gelman–Rubin plots [55], and perform sensitivity analyses to show lack of dependence on the prior distributions. For ease of expression, we regard the mean of the posterior distribution as the estimate of the parameter of interest, the standard deviation of the posterior distribution as the standard error (SE), and the 2.5th to the 97.5th percentile range as the 95% confidence interval.

4 Analysis of individual studies

For each of the study designs in the CCGC, we use a logistic model of disease association. This is for two reasons: first, to simplify calculations in the computationally intensive Bayesian framework, and secondly, to aim to estimate the same target parameter in each of the studies. We describe below how a logistic model is approximately valid for each study design. The differences between IV estimates based on different approaches (two-stage and Bayesian) and different models of association are examined in Section 6 as a sensitivity analysis for the simplifying assumptions to be made.

In cohort studies, where possible, two analyses are performed. A retrospective analysis is performed by viewing the cohort at baseline as a cross-sectional study with cases taken as individuals with previous history of disease (prevalent cases) and controls as all individuals free from disease at baseline.
A prospective analysis excludes all prevalent cases and considers CHD events within the reporting period. An individual who is censored at the end of the follow-up period is taken as a control in both the retrospective and prospective analyses, having had two separate opportunities to become a case.

We look in turn at unmatched case-control studies and cohort studies viewed cross-sectionally, then matched case-control studies, and finally cohort studies viewed prospectively. In each case, we use both two-stage and Bayesian methods to estimate a causal effect.

4.1 Unmatched case-control studies and cross-sectional analysis of cohort studies

For the case-control studies and cohort studies viewed cross-sectionally, we use a logistic model in the second-stage regression. In both cases, this is the correct analysis, although with a cohort study, a log-linear model could also be used to estimate a relative risk, which is close to the odds ratio estimated by the logistic model under the rare-disease assumption. Table 2 shows that the two-stage and Bayesian methods give similar answers in most large studies. Some studies give less consistent results,

especially ISIS and HIFMECH, where no Bayesian results are given as the posterior distribution of the causal effect did not converge. In both of these studies, only one SNP is available and the F statistic in the additive G–X model is less than 1, indicating that the IV explains less of the variation in the risk factor than would be expected by chance.

4.2 Analysis of matched case-control studies

For the matched case-control studies, in the two-stage approach, we use conditional and unconditional logistic models in the second-stage regression. In a matched case-control study, the effect size should be estimated using conditional logistic regression [56], although the bias from ignoring the matching is generally small [56] [57]. In the Bayesian approach, we use an unconditional logistic model, due to issues of computational complexity and difficulty of Bayesian inference on a conditional likelihood. Table 3 shows that for most studies the two approaches give broadly similar estimates. The Bayesian and two-stage random-effects pooled results are quite different due to different assumptions about heterogeneity. The lack of information on between-study heterogeneity due to the paucity of studies and diffuse prior on the heterogeneity parameter in the Bayesian approach gives a large estimate of $\tau$. This conflict can be redressed by use of a more informative prior; two-stage and Bayesian fixed-effect meta-analyses (effectively a point-mass prior for $\tau$ concentrated at 0) give much closer results.

4.3 Analysis of cohort studies

For the cohort studies viewed prospectively, we use Cox and Weibull proportional hazards, and logistic models in the second-stage regression. The adjusted two-stage method with a logistic model is also used (Web Table A5). In the Bayesian approach, we use a logistic model (1) and a Weibull model (5). For most studies, Table 4 shows that the approaches give similar estimates.
There is a slight loss in precision in using a logistic model over a Cox or Weibull model, due to the loss of time-to-event information. We note that the Bayesian and two-stage analyses give similar inference throughout, especially in studies with over 100 events. The random-effects meta-analysis results are different between the Bayesian and two-stage analyses, but the fixed-effect results are almost identical. The correlation between the two-stage IV estimates in the ten cohort studies viewed prospectively and cross-sectionally (using a logistic model in both analyses; similar results are obtained using a Cox or Weibull model) is [...] (Web Figure A6).

4.4 Differences between two-stage and Bayesian IV estimates

Although there is broad agreement between the Bayesian and two-stage IV results in this section, there are several differences in results. We discuss some possible reasons for the differences.

1. Weak instrument bias: A simulation study has shown that a similar Bayesian method with a continuous outcome gives estimates which are free from weak instrument bias for IVs which would be conventionally thought of as weak (expected F statistic 5) [48]. This corresponds with the limited information maximum likelihood (LIML) method [58], which is based on the same likelihood function. LIML estimates are known to suffer less from weak instrument bias than two-stage methods [45]. With a binary outcome, simulations are less clear, although it seems that the Bayesian method gives similar results to the two-stage method, which estimates a population-averaged causal effect [49], but suffers less from weak instrument bias [48].

2. Measurement error: When the estimates differ more substantially (say by more than 15% of the standard error), the Bayesian estimates are generally greater in magnitude than their two-stage counterparts, with standard errors similarly greater. The increase in size of effect may be due to random error in the mean risk factor estimates in genotypic groups, leading to dilution of the regression coefficients in the second-stage regression and attenuation in the two-stage estimates [59]. As the Bayesian analyses allow for error in X, the Bayesian estimates should be unaffected by regression dilution bias.

3. Propagation of uncertainty: The Bayesian model estimates the causal association in one stage, allowing for propagation of error and feedback throughout the model. In the two-stage model, there is no possibility of propagation of error or feedback from the second-stage to the first-stage regression.

4. Asymptotic assumptions: The Bayesian analysis gives a posterior distribution rather than a single point estimate. When the posterior distribution cannot be well-approximated by a normal distribution, the mean and median of the posterior can be quite different, and neither may be an adequate summary of the posterior.
The two-stage estimate may be closer to one of the posterior mean or median than the other. The sampling distribution of IV estimates is typically non-normal with long tails, especially when the IV is weak [34]. This means that the asymptotic standard errors of the two-stage method may underestimate the true uncertainty in the causal parameter. Simulations in the continuous case for the two-stage method have shown poor coverage properties [44]. As the Bayesian method does not make asymptotic assumptions, the shape of the posterior distribution will reflect the true uncertainty in the parameter estimate. This may result in wider confidence intervals than those of the two-stage method, but the coverage levels of the confidence intervals in simulations have been shown to be better [48].

With regard to the Bayesian causal estimates which did not converge, this is usually due to lack of differentiation in mean risk factor levels between genetic subgroups, leading to gradients between risk factor and outcome which may be compatible with an infinite (vertical) association [48]. This is expressed in the two-stage method by a large standard error on the causal parameter, but represented more accurately by the confidence interval in the ratio method from Fieller's theorem, which may cover the entire real line, or by the Bayesian method, where the posterior distribution fails to converge. Hence, failure to converge in the Bayesian method is not (necessarily) a negative feature, but can be an indication that no proper posterior distribution reflects the uncertainty due to the weakness in the G-X association.

5. Treatment of heterogeneity: The Bayesian model estimates a causal association in one stage. Similarly, the Bayesian meta-analysis model estimates a pooled association in one step. In the Bayesian meta-analysis, the prior for the heterogeneity parameter ensures that the heterogeneity is always positive. In a two-step meta-analysis, the DerSimonian-Laird estimate of heterogeneity can be (and is often) zero. If there are not many studies or studies have imprecise estimates, the DerSimonian-Laird estimate may be zero due to lack of evidence of heterogeneity, whereas the Bayesian one-step model recognizes only a lack of information on the between-study variance, and the posterior for τ is similar to the prior. The point estimate changes as heterogeneity increases, as larger studies are down-weighted in comparison to small studies [60].

For these reasons, while we would expect the results from a Bayesian and two-stage IV analysis to be close for large studies, they may well give different estimates if the sample size is small, if there are few events, or if the IV is weak. Random-effects meta-analysis estimates may be different if the number of studies is small and the prior on the between-study heterogeneity is diffuse.

4.5 Summary

In this section, we have seen that despite the logistic model relying on certain assumptions, the causal estimates are not particularly sensitive to these assumptions, and the loss of information in discarding survival outcomes is not great. We conclude that using a logistic model in all studies is a reasonable simplifying assumption.
The Bayesian and two-stage approaches make different assumptions in terms of feedback and propagation of errors between the regression stages, normality of the causal estimate, and heterogeneity in the random-effects models. We have seen that, where the number of cases is fairly large (n > 100) and the instrument strength is moderate (F-statistic in the G-X regression > 5), the Bayesian and two-stage analyses give similar inferences. In meta-analysis models, the fixed-effect two-stage and Bayesian meta-analyses agree throughout, and the random-effects meta-analyses agree when the number of studies is large (e.g. Table 2 with M = 27).
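The two-step pooling discussed in Section 4.4 can be illustrated with a minimal sketch of inverse-variance weighting and the DerSimonian-Laird moment estimate of the between-study variance. The function names and the numerical inputs are hypothetical and are not taken from the CCGC data; the sketch is intended only to show how the estimate of τ² is truncated at zero when the study estimates are imprecise, in which case the random-effects and fixed-effect pooled results coincide.

```python
import math


def inverse_variance_pool(estimates, std_errors, tau2=0.0):
    """Inverse-variance weighted pooled estimate and its standard error.

    tau2 = 0 gives the fixed-effect result; tau2 > 0 gives a random-effects result.
    """
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def dersimonian_laird_tau2(estimates, std_errors):
    """Moment estimate of the between-study variance, truncated at zero."""
    w = [1.0 / se ** 2 for se in std_errors]
    fixed, _ = inverse_variance_pool(estimates, std_errors)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)


# Hypothetical study-specific causal estimates (log odds ratios) and standard errors.
estimates = [0.10, -0.05, 0.20, 0.05]
std_errors = [0.30, 0.40, 0.35, 0.30]

tau2 = dersimonian_laird_tau2(estimates, std_errors)
fixed = inverse_variance_pool(estimates, std_errors)
random_effects = inverse_variance_pool(estimates, std_errors, tau2)
print(tau2, fixed, random_effects)
```

With few, imprecise studies such as these, Q falls below its degrees of freedom and the moment estimate is set to zero, whereas a Bayesian hierarchical model with a diffuse prior on τ would return a posterior for τ close to that prior.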

Study        N    n    Two-stage analysis    Bayesian analyses

Case-control studies
ARIC                   (0.279)               (0.314)
CAPS                   (0.505)               (0.600)
CIHDS                  (0.225)               (0.235)
CUDAS                  (1.392)               (2.176)
CUPID                  (0.326)               (0.491)
DDDD                   (0.446)               (0.628)
EPICNL                 (0.340)               (0.347)
HIFMECH                (2.508)               -¹
HIMS                   (0.318)               (0.333)
ISIS                   (1.480)               -¹
LURIC                  (0.212)               (0.235)
MALMO                  (0.158)               (0.194)
PROCARDIS              (0.180)               (0.185)
SHEEP                  (0.216)               (0.250)
SPEED                  (0.488)               (0.608)
WHIOS                  (0.202)               (0.216)
WHITE                  (0.901)               (1.238)

Cohort studies
BRHS                   (0.491)               (0.500)
BWHHS                  (0.475)               (0.531)
CCHS                   (0.772)               (0.792)
CGPS                   (0.325)               (0.326)
CHS                    (0.358)               (0.375)
EAS                    (0.974)               (1.209)
ELSA                   (0.461)               (0.496)
FRAMOFF                (0.747)               (0.852)
PROSPER                (0.258)               (0.261)
ROTT                   (0.388)               (0.417)

Pooled                 (0.061)               (0.065)
Heterogeneity          I² = 0% (0-33%)       τ̂ =

Table 2: Unmatched case-control studies and cohort studies viewed cross-sectionally. Log odds ratio of CHD per unit increase in log(CRP) using two-stage and Bayesian IV methods with standard error, number of participants in study (N), number of events (n), pooled results from two-step inverse-variance weighted random-effects meta-analysis (two-stage) or hierarchical random-effects meta-analysis model (Bayesian), heterogeneity estimate (I² with 95% confidence interval for two-step method, τ̂ for hierarchical model).
¹ Posterior distribution of causal effect did not converge.

                       Two-stage analyses                                  Bayesian analyses
Study        N    n    Conditional logistic model   Unconditional logistic model   Logistic model
EPICNOR                (0.284)                      (0.280)                        (0.319)
HPFS                   (0.405)                      (0.362)                        (0.543)
NHS                    (0.327)                      (0.308)                        (0.374)
NSC                    (0.327)                      (0.316)                        (0.338)
Pooled (FE)            (0.164)                      (0.156)                        (0.166)
Pooled (RE)            (0.164)                      (0.156)                        (0.266)
Heterogeneity          I² = 0% (0-83%)              I² = 0% (0-82%)                τ̂ =

Table 3: Matched case-control studies. Conditional and unconditional logistic models for causal log odds ratio of CHD per unit increase in log(CRP) using two-stage and Bayesian IV methods with standard error, number of participants in study (N), number of events (n), pooled results from two-step inverse-variance weighted fixed-effects/random-effects (FE/RE) meta-analysis (two-stage) or hierarchical FE/RE meta-analysis model (Bayesian), heterogeneity estimate (I² with 95% confidence interval for two-step method, τ̂ for hierarchical model) from random-effects meta-analysis.

                       Two-stage analyses                                      Bayesian analyses
                       log-HR             log-HR             log-OR             log-HR             log-OR
Study        N    n    Cox model          Weibull model      Logistic model     Weibull model¹     Logistic model
BRHS                   (0.305)            (0.306)            (0.323)            0.51 (0.33)        (0.351)
BWHHS                  (1.034)            (1.036)            (1.042)            (1.08)             (1.085)
CCHS                   (0.457)            (0.457)            (0.472)            0.04 (0.48)        (0.482)
CGPS                   (0.699)            (0.700)            (0.702)            (0.71)             (0.709)
CHS                    (0.258)            (0.259)            (0.288)            0.68 (0.28)        (0.307)
EAS                    (0.689)            (0.692)            (0.722)            0.67 (0.84)        (0.891)
ELSA                   (0.828)            (0.829)            (0.833)            (0.85)             (0.857)
FRAMOFF                (0.965)            (0.965)            (0.974)            0.51 (1.21)        (1.204)
NPHSII                 (0.815)            (0.830)            (0.837)            (0.97)             (1.008)
PROSPER                (0.311)            (0.312)            (0.328)            0.25 (0.32)        (0.337)
ROTT                   (0.564)            (0.565)            (0.582)            (0.61)             (0.635)
WOSCOPS                (2.539)            (2.540)            (2.806)            -²
Pooled (FE)            (0.137)            (0.137)            (0.145)            0.26 (0.13)        (0.145)
Pooled (RE)            (0.159)            (0.156)            (0.175)            0.20 (0.21)        (0.234)
Heterogeneity          I² = 14% (0-54)    I² = 12% (0-51)    I² = 19% (0-57)    τ̂ = 0.35           τ̂ =

Table 4: Cohort studies. Cox, Weibull and logistic models for causal log risk ratio of CHD per unit increase in log(CRP) using two-stage and Bayesian IV methods with standard error, number of participants in study (N), number of events (n), pooled results from two-step inverse-variance weighted fixed-effects/random-effects (FE/RE) meta-analysis (two-stage) or hierarchical FE/RE meta-analysis model (Bayesian), heterogeneity estimate (I² with 95% confidence interval for two-step method, τ̂ for hierarchical model): log hazard ratio (HR) and log odds ratio (OR).
¹ The Weibull models were slower to run and mixed poorly, so results are only given to 2 decimal places due to Monte Carlo random error.
² Posterior distributions of causal effect did not converge.

5 Dealing with issues of evidence synthesis

In this section, we detail how evidence from heterogeneous sources can be combined efficiently in the Bayesian model detailed above.

5.1 Cohort studies

We want to include participants in cohort studies up to twice in the analysis, once in the study viewed retrospectively and once prospectively. However, we do not want to include the individual's risk factor data twice, and we want to ensure that the same parameter is estimated in both analyses. In the corresponding model (6), we consider genetic subgroup $j$. This subgroup contains $N_{1j}$ individuals, $n_{1j}$ of whom are prevalent cases, and $N_{2j}$ ($= N_{1j} - n_{1j}$) non-prevalent individuals, $n_{2j}$ of whom have incident events.

\begin{equation}
\begin{aligned}
X_{ij} &\sim N(\xi_j, \sigma^2) \quad \text{for } i = 1, \dots, N_{2j} \text{ non-prevalent individuals} \\
\xi_j &= \alpha_0 + \sum_{k=1}^{K} \alpha_k g_{jk} \\
n_{1j} &\sim \text{Binomial}(N_{1j}, \pi_{1j}) \\
n_{2j} &\sim \text{Binomial}(N_{2j}, \pi_{2j}) \\
\text{logit}(\pi_{1j}) &= \eta_{1j} = \beta_{01} + \beta_1 \xi_j \\
\text{logit}(\pi_{2j}) &= \eta_{2j} = \beta_{02} + \beta_1 \xi_j
\end{aligned}
\tag{6}
\end{equation}

This model ensures that the same fitted values of the risk factor are used in both logistic regressions without including individuals twice in the regression of risk factor on genotype. Moreover, a single causal parameter $\beta_1$ is estimated.

For comparison, in the two-stage method, we calculate the causal effect separately using prospectively and retrospectively assessed events, combine the two estimates using an inverse-variance weighted fixed-effect meta-analysis, and take the result of this as the study-specific effect. This assumes, incorrectly, that the two estimates are independent; such an assumption is not made in the Bayesian method. Although in this case the risk factor data are used twice, the main source of uncertainty in the causal estimates comes from the second-stage regression, and so inclusion of the risk factor data twice may not add undue precision to the overall pooled result.
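As an illustration of the structure of model (6), the following sketch evaluates the fitted risk factor value and the two event probabilities for a genetic subgroup, and sums the binomial log-likelihood contributions of the prevalent and incident events. It is not the fitting code used in the paper (which uses Bayesian estimation); the function names, parameter values and subgroup counts are hypothetical. The point is that a single fitted value ξ_j and a single causal parameter β₁ enter both logistic regressions, while the intercepts differ.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def subgroup_predictions(genotype, alpha0, alphas, beta01, beta02, beta1):
    """Return (xi_j, pi_1j, pi_2j) for one genetic subgroup under model (6)."""
    xi = alpha0 + sum(a * g for a, g in zip(alphas, genotype))
    pi1 = expit(beta01 + beta1 * xi)  # probability of a prevalent event
    pi2 = expit(beta02 + beta1 * xi)  # probability of an incident event
    return xi, pi1, pi2


def binomial_loglik(n, N, p):
    """Log of the Binomial(N, p) probability of observing n events."""
    log_choose = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return log_choose + n * math.log(p) + (N - n) * math.log(1.0 - p)


def outcome_loglik(subgroups, alpha0, alphas, beta01, beta02, beta1):
    """Sum the prevalent and incident binomial contributions over subgroups.

    Each subgroup is (genotype, N1, n1, n2), with N2 = N1 - n1 non-prevalent
    individuals at risk of an incident event.
    """
    total = 0.0
    for genotype, N1, n1, n2 in subgroups:
        _, pi1, pi2 = subgroup_predictions(
            genotype, alpha0, alphas, beta01, beta02, beta1
        )
        N2 = N1 - n1
        total += binomial_loglik(n1, N1, pi1) + binomial_loglik(n2, N2, pi2)
    return total


# Hypothetical subgroups defined by allele counts at K = 2 SNPs.
subgroups = [((0, 0), 1000, 50, 80), ((1, 0), 600, 35, 50), ((1, 1), 200, 14, 18)]
params = dict(alpha0=0.2, alphas=(0.1, 0.3), beta01=-3.0, beta02=-2.5, beta1=0.4)
print(outcome_loglik(subgroups, **params))
```

Because ξ_j is shared, the difference between the two linear predictors in any subgroup is β₀₁ − β₀₂, whatever the genotype.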
5.2 Common SNPs

Where the same subset of SNPs has been used in several studies, we can combine the estimates of genetic association $\alpha_{km}$ across studies. This should give a more precise model of association in smaller studies and should reduce weak instrument bias, as instrument strength will be combined across the studies. Due to possible heterogeneity between populations, we use a random-effects model, where we impose a multivariate normal distribution on the study-level parameters $\alpha_{km}$, $k = 1, \dots, K$, with mean vector $\mu_\alpha$ and variance-covariance matrix $\Psi$. Note that the intercept parameters $\alpha_{0m}$ for $m = 1, \dots, M$ are not pooled.

\begin{equation}
\begin{aligned}
X_{ijm} &\sim N(\xi_{jm}, \sigma_m^2) \\
\xi_{jm} &= \alpha_{0m} + \sum_{k=1}^{K} \alpha_{km} g_{jkm}
\end{aligned}
\tag{7}
\end{equation}

\begin{equation}
(\alpha_{1m}, \dots, \alpha_{Km})^T \sim N_K(\mu_\alpha, \Psi)
\tag{8}
\end{equation}

5.3 Different sets of SNPs

In the CCGC, there is no common set of SNPs measured in all studies. Due to correlation between the SNPs, it would not be valid to use the same parameters of genetic association $\alpha_k$ in studies measuring different SNPs. In the overall meta-analysis, we use four different sets of parameters of genetic association, corresponding to four different patterns of measured SNPs in studies (see Web Appendix for details).

5.4 Lack of risk factor data

Where a study $m$ has not measured the risk factor (X) but has genetic data in common with other studies, we use the random-effects distributions for the genetic association parameters defined in equation (8) as a predictive distribution or implicit prior for the unknown parameters. This requires an assumption of exchangeability: that the change in risk factor per additional allele is similar to that in the other studies (i.e. can be drawn from the same random-effects distribution). We set $\alpha_{0m} = 0$ as, with no data on the G-X association, this parameter cannot be identified. Incorporation of studies with information on only some of the gene-risk factor-outcome triangle needed for Mendelian randomization analysis is known in econometrics circles as the two-sample problem [61].

5.5 Tabular data

For studies providing tabular data only, we had for each genetic subgroup $j$ the total number of individuals ($N_j$) and the number with an event ($n_j$). We are able to incorporate such studies into our analysis using the random-effects distributions for the parameters of genetic association as above.

6 Meta-analysis of studies

We apply the methods of the previous section to the CCGC data. Firstly, we look at estimation of the causal effect using a single instrument. We then present overall meta-analysis results from pooled two-stage estimates and from Bayesian hierarchical models.

6.1 Using instruments one at a time

The G-X and G-Y associations are estimated in each study. A linear regression model is used for the G-X association; the G-Y association is estimated using logistic regression in unmatched case-control studies and cross-sectional analysis of cohort studies, conditional logistic regression in matched case-control studies and Cox regression in prospective analysis of cohort studies. The pooled results using each of the four pre-specified SNPs in turn are given in Table 5 (study-specific results in Web Figures A7 and A8). Using the method of Thompson et al. [62], we calculate causal estimates using each SNP in turn. Confidence intervals are constructed assuming the within-study correlation between G-X and G-Y association is zero, as recommended by Thompson et al. [62].

       SNP¹   Number of studies   N   n   Pooled per allele effect (SE)   p-value   Heterogeneity (I² and 95% CI)
G-X    g1                                 (0.0097)                                  (37-72%)
       g2                                 (0.0070)                                  (0-54%)
       g3                                 (0.0194)                                  (0-51%)
       g4                                 (0.0125)                                  (0-41%)
G-Y    g1                                 (0.0129)                                  (0-54%)
       g2                                 (0.0105)                                  (0-37%)
       g3                                 (0.0241)                                  (0-41%)
       g4                                 (0.0227)                                  (0-32%)

       SNP    Number of studies   N   n   Causal estimate (95% CI)
X-Y    g1                                 (-0.056, 0.237)
       g2                                 (-0.146, 0.168)
       g3                                 (-0.163, 0.195)
       g4                                 (-0.223, 0.203)

Table 5: Pooled estimates from two-step inverse-variance weighted random-effects meta-analysis of per allele effect on log(CRP) (G-X association) and log odds of CHD (G-Y association) in regression on each SNP in turn with standard error (SE), heterogeneity estimate; causal estimates (X-Y association) of log odds ratio of CHD per unit increase in log(CRP) from meta-analysis using method of Thompson et al. [62]; number of studies, total sample size (N), and total number of events (n) where appropriate.
¹ g1 = rs1205, g2 = rs , g3 = rs , g4 = rs or equivalent proxies (see Web Appendix)

The causal estimates from each SNP are similar; heterogeneity of estimates would be evidence against the validity of one or more of the instruments [37]. As the causal estimates are derived from the same data and are correlated, they cannot be naively combined. As none of these analyses uses the totality of the genetic data, an integrated two-stage or Bayesian approach would be preferred.
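A single-SNP causal estimate of the kind described in this section can be sketched as the ratio of the G-Y to the G-X association, with the correlation between the two taken as zero. The code below is a generic illustration and should not be read as the exact procedure of Thompson et al. [62]; the function name and all input values are hypothetical. It returns a delta-method interval and, for comparison, the interval from Fieller's theorem mentioned in Section 4.4, which can be unbounded when the G-X association is weak.

```python
import math
from statistics import NormalDist


def ratio_estimate(b_x, se_x, b_y, se_y, level=0.95):
    """Ratio estimate b_y / b_x with delta-method and Fieller intervals.

    Assumes zero correlation between the G-X and G-Y association estimates.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    theta = b_y / b_x

    # First-order delta-method standard error of the ratio.
    se_delta = math.sqrt(se_y ** 2 / b_x ** 2 + b_y ** 2 * se_x ** 2 / b_x ** 4)
    delta_ci = (theta - z * se_delta, theta + z * se_delta)

    # Fieller: values t with (b_y - t * b_x)^2 <= z^2 * (se_y^2 + t^2 * se_x^2),
    # i.e. a * t^2 + b * t + c <= 0 with the coefficients below.
    a = b_x ** 2 - z ** 2 * se_x ** 2
    b = -2.0 * b_x * b_y
    c = b_y ** 2 - z ** 2 * se_y ** 2
    disc = b * b - 4.0 * a * c
    if a > 0:
        root = math.sqrt(disc)
        fieller = ("interval", (-b - root) / (2 * a), (-b + root) / (2 * a))
    elif a < 0 and disc > 0:
        root = math.sqrt(disc)
        lo, hi = sorted(((-b - root) / (2 * a), (-b + root) / (2 * a)))
        fieller = ("complement", lo, hi)  # (-inf, lo] together with [hi, inf)
    elif a < 0:
        fieller = ("real_line", None, None)
    else:
        raise ValueError("boundary case a == 0 is not handled in this sketch")
    return theta, delta_ci, fieller


# Hypothetical per allele associations: a strong and a weak instrument.
strong = ratio_estimate(b_x=0.20, se_x=0.01, b_y=0.01, se_y=0.015)
weak = ratio_estimate(b_x=0.01, se_x=0.02, b_y=0.01, se_y=0.015)
print(strong)
print(weak)
```

For the strong instrument the two intervals are close; for the weak instrument the delta-method interval is finite while the Fieller set is the whole real line, which is the situation described in Section 4.4 for studies in which the Bayesian posterior did not converge.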
