EPSE 594: Meta-Analysis: Quantitative Research Synthesis
Ed Kroc
University of British Columbia
ed.kroc@ubc.ca
March 28, 2019
Last Time
- Publication bias
- Funnel plots, trim-and-fill procedures
Today
- Simpson's Paradox
- Psychometric considerations in meta-analysis
Funnel plots
A useful visual tool to diagnose possible publication bias is a funnel plot:
- Plot each study's outcome effect size against its standard error.
- If the scatter of points is a symmetric blob around the summary effect size, then there is no evidence of significance bias.
- If the scatter of points trails off to the right (positive effect size) or to the left (negative effect size), then we have possible evidence of significance bias.
Note: it is typical to draw a triangle (funnel) around the scatterplot of points: the triangle is centred at the summary effect, with vertex angle defined by the 95% CI of the summary effect.
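As a concrete sketch, the funnel's boundary lines at each standard error are just the summary effect plus or minus 1.96 standard errors. The effect sizes and standard errors below are made-up illustrative numbers, and the summary uses a simple fixed-effect (inverse-variance) estimate:

```python
# Hypothetical effect sizes and standard errors (made-up numbers).
effects = [0.42, 0.31, 0.55, 0.28, 0.47, 0.36]
ses = [0.10, 0.15, 0.22, 0.08, 0.18, 0.12]

# Fixed-effect (inverse-variance weighted) summary effect.
weights = [1 / se ** 2 for se in ses]
summary = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

# The funnel's 95% boundaries at each standard error: points outside
# these lines are individually "surprising" given the summary effect.
funnel = [(se, summary - 1.96 * se, summary + 1.96 * se) for se in ses]

for se, lo, hi in funnel:
    print(f"SE = {se:.2f}: funnel spans ({lo:.2f}, {hi:.2f})")
```

Asymmetry is then assessed by eye: do the points with large standard errors pile up on one side of the summary line?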
Funnel plots
[Figure] Funnel plot for Zheng et al. (2016): no evidence of PB
Funnel plots
[Figure] Funnel plot for a hypothetical meta-analysis: possible evidence of PB
Asymmetry in funnel plots does not always imply PB
Notice that we have been careful to say that nonsymmetric plots only show possible evidence of publication bias.
- This is because, under a random effects model, we should expect some variation in the true effect sizes.
- Moreover, true effect size is often correlated with sample size (and so with standard error).
- For example, when meta-analyzing well-designed RCTs, studies targeting smaller effect sizes will have larger sample sizes (to achieve reasonable power). Thus, we might expect studies with smaller standard errors to arise from studies estimating smaller true effect sizes.
- More generally, a moderating variable may explain the asymmetry in the funnel plot.
Asymmetry in funnel plots does not always imply PB
[Figure] Funnel plot for a meta-analysis with no PB: skew explained by a moderator
Asymmetry in funnel plots does not always imply PB
When meta-analyzing well-designed studies, power will correlate with true effect size.
[Figure] Simulated meta-analysis in which the true effects correlate 0.6 with the targeted effects (all studies powered at 80%). No PB.
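A simulation in this spirit can be sketched as follows. The details here are my own assumptions, not the lecture's exact setup: each study powers a one-sample z-test at 80% against its targeted effect (so n = ((1.96 + 0.84) / d_target)^2 and SE = d_target / 2.8), and the true effect is a positively correlated perturbation of the target:

```python
import math
import random

random.seed(1)

studies = []
for _ in range(200):
    d_target = random.uniform(0.2, 0.8)   # effect the study was powered for
    # True effect positively correlated with the target (illustrative mixture).
    d_true = 0.5 * d_target + 0.5 * random.uniform(0.2, 0.8)
    n = ((1.96 + 0.84) / d_target) ** 2   # sample size for 80% power
    se = 1 / math.sqrt(n)                 # = d_target / 2.8, taking sigma = 1
    d_obs = random.gauss(d_true, se)      # observed effect
    studies.append((d_obs, se))

# Small targeted effects force large n and hence small SE, so effect size
# and SE are correlated: the funnel is asymmetric even though every
# simulated study is "published" (no PB).
```

Plotting `d_obs` against `se` for these 200 studies would produce a skewed funnel of the kind shown in the figure.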
Trim and fill procedures
If we see evidence of PB in the funnel plot, then we may want to adjust for it. How?
The most common procedure is trim and fill (assume a positive mean effect):
- Remove the study furthest to the right (biggest effect size);
- Compute the new summary effect;
- Repeat until the funnel plot is symmetric;
- Then, to ensure we don't artificially deflate uncertainty, add the removed studies back in, and also add their mirror images on the opposite side of the new summary effect;
- Now we have an unbiased estimate of the summary effect and a semi-reasonable estimate of its uncertainty, assuming the initial asymmetry actually reflects true PB.
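The steps above can be sketched in code. This is a deliberately simplified caricature: the real Duval and Tweedie algorithm uses rank-based estimators to decide how many studies to trim, whereas this sketch takes the trim count as a given argument:

```python
def inv_var_summary(studies):
    """Fixed-effect (inverse-variance) summary of (effect, se) pairs."""
    weights = [1 / se ** 2 for _, se in studies]
    return sum(w * y for w, (y, _) in zip(weights, studies)) / sum(weights)

def trim_and_fill(studies, n_trim):
    """Trim the n_trim largest effects, then fill with their mirror images."""
    by_effect = sorted(studies, key=lambda s: s[0])
    trimmed = by_effect[:len(studies) - n_trim]
    removed = by_effect[len(studies) - n_trim:]
    # Summary after trimming the extreme right tail.
    new_summary = inv_var_summary(trimmed)
    # Fill: restore the trimmed studies AND add mirror images about the
    # new summary, so uncertainty is not artificially deflated.
    mirrored = [(2 * new_summary - y, se) for y, se in removed]
    filled_summary = inv_var_summary(studies + mirrored)
    return new_summary, filled_summary

# Toy data: three clustered studies plus one far-right outlier.
demo = [(0.1, 0.1), (0.2, 0.1), (0.3, 0.1), (0.9, 0.1)]
trimmed_est, filled_est = trim_and_fill(demo, 1)
```

With equal standard errors, mirroring preserves the trimmed summary (both estimates come out near 0.2 here) while the filled set carries more spread, and hence more uncertainty, than the trimmed set alone.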
Trim and fill procedures
[Figure] Trim...
Trim and fill procedures
[Figure] ... and fill
Trim and fill procedures
Trim and fill is a nice technique, but it comes with major caveats:
- The technique assumes that asymmetry actually reflects true PB.
- The technique does not explicitly consider Type M error.
- The actual algorithm that does the trimming tends to perform poorly when there are too few studies, or too many aberrant studies.
- The fill algorithm relies on imputation to create the missing effect sizes: this comes with a host of other modelling assumptions that we will not be able to test in a meta-analysis.
- In particular, a good technical argument can be made that the fill procedure artificially deflates uncertainty quite badly; it can also severely distort the true mean effect size.
Use trim-and-fill to see if your substantive conclusions change; if they do, then you should attempt to find the source of the alleged PB and adjust for it directly.
Simpson's Paradox
Simpson's Paradox (also called the Simpson-Yule Paradox or Lord's Paradox) occurs when a trend present in an aggregate dataset disappears or reverses when the dataset is split into groups, or more generally, when an omitted confounding variable is accounted for.
This has major implications for inference. It is particularly troublesome in the context of meta-analysis, where we are combining many group (study) effects into a single (composite) effect.
Simpson's Paradox: Ex. 1
[Figure]
Simpson's Paradox: Ex. 2
[Figure]
Simpson's Paradox: Berkeley admissions
In 1973, alleged gender bias in grad school admissions at UC-Berkeley:
[Table]
A chi-squared test yields p-value < 0.000001. So ostensible evidence for gender bias, but...
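To see where a p-value that small comes from, here is the aggregate test computed by hand. The admitted counts below are rounded approximations from the commonly cited aggregate figures (about 8442 male applicants with 44% admitted, and 4321 female applicants with 35% admitted), not the exact published table:

```python
# Approximate aggregate 1973 Berkeley admissions counts (rounded).
men_admit, men_reject = 3714, 8442 - 3714       # ~44% of 8442 admitted
women_admit, women_reject = 1512, 4321 - 1512   # ~35% of 4321 admitted

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

stat = chi_squared_2x2(men_admit, men_reject, women_admit, women_reject)
# On 1 df, a statistic near 100 corresponds to a p-value far below 0.000001.
```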
Simpson's Paradox: Berkeley admissions
Broken down by department, a very different picture emerges:
[Table]
Simpson's Paradox: Berkeley admissions
(1) No consistent evidence of gender bias; in fact, one could argue that a possible gender bias exists in favour of women applicants in Department A.
Simpson's Paradox: Berkeley admissions
(2) Women tend to apply to departments with higher overall rejection rates; this may reflect underlying societal gender biases at work, but not in the admissions process.
Simpson's Paradox: Kidney stone treatments
Two treatments for kidney stones: A = open surgery (invasive), B = laparoscopy (mildly invasive).
Large kidney stones are a more severe condition than small stones.
Simpson's Paradox: Kidney stone treatments
Ignoring severity (a confounding variable), Treatment B is more effective. Yet Treatment A is more effective at treating both mild and severe cases.
Simpson's Paradox: Kidney stone treatments
Why does this happen? Notice the cell counts: Groups 2 and 3 dominate. Thus, the combined estimates are driven by the proportions in Groups 2 and 3, and the Group 2 success rate is higher.
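The arithmetic of the reversal can be checked directly with the classic published counts for this example (Charig et al., 1986):

```python
# (successes, total) by treatment and stone size, Charig et al. (1986).
A = {"small": (81, 87), "large": (192, 263)}    # open surgery
B = {"small": (234, 270), "large": (55, 80)}    # less invasive treatment

def rate(successes, total):
    return successes / total

# Within each stratum, Treatment A has the higher success rate...
assert rate(*A["small"]) > rate(*B["small"])    # ~93% vs ~87%
assert rate(*A["large"]) > rate(*B["large"])    # ~73% vs ~69%

# ...but aggregated over strata, Treatment B looks better, because B was
# given mostly to the easy (small-stone) cases and A to the hard ones.
A_all = rate(A["small"][0] + A["large"][0], A["small"][1] + A["large"][1])
B_all = rate(B["small"][0] + B["large"][0], B["small"][1] + B["large"][1])
assert B_all > A_all                            # ~83% vs ~78%
```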
Simpson's Paradox as Ecological Fallacy
Simpson's Paradox is an example of a more general phenomenon known as the ecological fallacy.
An ecological fallacy occurs when we use an inference at an ecological (aggregate) level to make claims about what happens at the individual (group) level.
Classic example: income positively correlates with tendency to vote Republican (USA). Thus, richer states tend to vote Republican more than poorer states... FALSE!
Here, voting preference is affected by the overall wealth of the state even after controlling for individual wealth: self-perceived relative wealth?
Simpson's Paradox as Ecological Fallacy
In a meta-analysis, this is a potentially serious concern. Why?
We are aggregating group (study) level effects to estimate a combined (ecological) effect.
Thus, a positive aggregate association of treatment with condition may actually mask negative associations within each individual study.
Psychometric issues in meta-analysis
In psychometrics, we are often very concerned with issues of measurement, namely:
- Reliability: how variable, or imprecise, a measurement process is.
- Validity: how well (how accurately) the measurement captures the phenomenon it is trying to quantify.
Psychometric issues in meta-analysis
Classically, one proposes the following framework:
- Each subject (e.g. person) has a unique true value (score), T, of some particular phenomenon of interest.
- This true value cannot be measured directly; instead, we observe (measure) only a proxy for it: the observed score, X.
- This observed score may differ from the true score; thus we propose a generic measurement error model:
    X = T + E,
  where E denotes the measurement error.
- Usually, further assumptions are then imposed on the structure of the errors to more accurately model a real-life phenomenon and measurement process.
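A minimal simulation of this model, with illustrative variances of my own choosing, shows the key consequence: when E is independent of T, observed-score variance decomposes as Var(X) = Var(T) + Var(E):

```python
import random

random.seed(0)

N = 100_000
T = [random.gauss(0, 1) for _ in range(N)]     # true scores, Var(T) = 1
E = [random.gauss(0, 0.5) for _ in range(N)]   # errors, Var(E) = 0.25
X = [t + e for t, e in zip(T, E)]              # observed scores, X = T + E

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

# With independent errors, Var(X) should be close to 1 + 0.25 = 1.25.
print(var(X))
```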
Psychometric issues in meta-analysis
In the context of meta-analysis, it is natural to ask how reliable or how valid measurements (observed effects) are for the actual phenomenon (true effects) they are trying to quantify.
Notice: this is not the same thing as sampling error.
- Sampling error occurs because our sample will not capture every relevant feature of the overall population.
- In contrast, measurement error speaks to how well our measurement process captures the phenomenon it is trying to quantify.
- E.g. one could have census-level data (no sampling error) that are still subject to substantial measurement error.
Psychometric issues in meta-analysis
In the context of meta-analysis, one may want study weights to explicitly account for the reliability or validity of a measurement from a particular study. For many reasons, you really only see this done with estimates of reliability.
For most (but not all) measurement error situations, extra variance due to measurement error (i.e. imperfect reliability) will have an attenuating effect on model estimates; i.e. measurement error tends to cause our estimates to shrink towards the null.
However, if we could adjust our estimates before meta-analyzing them, then we could potentially remove at least some of this attenuation bias.
Psychometric issues in meta-analysis
First, we need to understand what the reliability of a measurement process is, and how to quantify it.
Formally, the reliability of a measurement X for a true score T is defined as
    R := Var(T) / Var(X) = 1 - Var(E) / Var(X).
Under the classical test theory measurement error model, if one has two parallel measurements X and X' for T, then reliability is also equal to
    R = ρ_XT²,
the squared correlation between X and T.
X and X' are parallel measurements for T if their variances are equal and their corresponding errors are uncorrelated.
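These identities can be verified by simulation under the classical model X = T + E. The variances below are illustrative choices giving a theoretical reliability of 1 / 1.25 = 0.8; the definition, the squared correlation with the true score, and the parallel-forms correlation should all land near that value:

```python
import random

random.seed(0)

N = 100_000
T = [random.gauss(0, 1) for _ in range(N)]      # true scores, Var(T) = 1
X1 = [t + random.gauss(0, 0.5) for t in T]      # Var(X) = 1.25, so R = 0.8
X2 = [t + random.gauss(0, 0.5) for t in T]      # a parallel measurement

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (var(u) * var(v)) ** 0.5

R_def = var(T) / var(X1)   # definition: Var(T)/Var(X)
R_cor = corr(X1, T) ** 2   # squared correlation with the true score
R_par = corr(X1, X2)       # parallel-forms correlation
# All three should be close to 0.8.
```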
Psychometric issues in meta-analysis
It can be shown that if ρ is a correlation (effect size) between variables, with one subject to this kind of classical measurement error, and if ρ_adj is the corrected correlation (free of measurement error), then the ratio ρ / ρ_adj is equal to the square root of the reliability of the measurement process.
Thus, to adjust for attenuation due to this kind of measurement error, we need a way to estimate reliability. There are many methods for this: Cronbach's α is the most common.
Crucially, all these methods always yield underestimates of the actual (theoretical) reliability.
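For concreteness, here is Cronbach's α computed from scratch on a made-up toy dataset. The formula is α = k/(k−1) · (1 − Σ item variances / variance of total score); the item scores below are purely illustrative:

```python
def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k lists, one per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var_sum = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Toy example: three items answered by five respondents.
items = [
    [3, 4, 2, 5, 4],
    [2, 4, 3, 5, 3],
    [3, 5, 2, 4, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Note the ratio inside the formula is unchanged whether one divides variances by n or n−1, since the factor cancels.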
Psychometric issues in meta-analysis
With some (under)estimate a of reliability in place, we can now adjust our observed effects for measurement error:
    r_adj = r / √a.
Similarly, we can adjust the corresponding variance of the observed effects via:
    Var(r_adj) = Var(r) / (√a)² = Var(r) / a.
Now we can proceed with the meta-analysis as usual, but using these adjusted estimates of effect size and standard error.
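In code, the two adjustments above are one line each. The observed correlation, its variance, and the reliability estimate below are illustrative numbers, with measurement error assumed in only one of the two variables:

```python
import math

# Illustrative inputs: observed correlation, its variance, reliability estimate.
r, var_r, a = 0.30, 0.004, 0.75

r_adj = r / math.sqrt(a)   # disattenuated effect size: r / sqrt(a)
var_r_adj = var_r / a      # Var(r) / (sqrt(a))^2 = Var(r) / a

print(round(r_adj, 4), round(var_r_adj, 6))
```

Since a ≤ 1, the adjustment inflates both the effect size and its variance; with a a known underestimate of reliability, the correction errs on the side of over-adjusting.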