Results Univariable analyses showed that heterogeneity variances were, on average, increased among trials at

Similar documents
Keywords: meta-epidemiology; randomised trials; heterogeneity; Bayesian methods; Cochrane

Between-trial heterogeneity in meta-analyses may be partially explained by reported design characteristics

Empirical evidence on sources of bias in randomised controlled trials: methods of and results from the BRANDO study

Tucker, L. R, & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor

How Should Blood Glucose Meter System Analytical Performance Be Assessed?

A Comparison of Poisson Model and Modified Poisson Model in Modelling Relative Risk of Childhood Diabetes in Kenya

Challenges and Implications of Missing Data on the Validity of Inferences and Options for Choosing the Right Strategy in Handling Them

Bivariate Quantitative Trait Linkage Analysis: Pleiotropy Versus Co-incident Linkages

Models for potentially biased evidence in meta-analysis using empirically based priors

CHAPTER 7 THE HIV TRANSMISSION DYNAMICS MODEL FOR FIVE MAJOR RISK GROUPS

Predicting Time Spent with Physician

Matching Methods for High-Dimensional Data with Applications to Text

Fuzzy Analytical Hierarchy Process for Ecological Risk Assessment

Survival and Probability of Cure Without and With Operation in Complete Atrioventricular Canal

Optical coherence tomography (OCT) is a noninvasive

Cochrane Pregnancy and Childbirth Group Methodological Guidelines

OBESITY EPIDEMICS MODELLING BY USING INTELLIGENT AGENTS

FAST ACQUISITION OF OTOACOUSTIC EMISSIONS BY MEANS OF PRINCIPAL COMPONENT ANALYSIS

AIDS Epidemiology. Min Shim Math 164: Scientific Computing. April 30, 2004

Physical Activity Training for

Implications of ASHRAE s Guidance On Ventilation for Smoking-Permitted Areas

Bayesian Networks Modeling for Crop Diseases

A CLOUD-BASED ARCHITECTURE PROPOSAL FOR REHABILITATION OF APHASIA PATIENTS

LONG-TERM PROGNOSIS OF SEIZURES WITH ONSET IN CHILDHOOD LONG-TERM PROGNOSIS OF SEIZURES WITH ONSET IN CHILDHOOD. Patients

DIET QUALITY AND CALORIES CONSUMED: THE IMPACT OF BEING HUNGRIER, BUSIER AND EATING OUT

Evolution of Indirect Reciprocity by Social Information: The Role of

Toll Pricing. Computational Tests for Capturing Heterogeneity of User Preferences. Lan Jiang and Hani S. Mahmassani

Learning the topology of the genome from protein-dna interactions

The sensitivity analysis of hypergame equilibrium

Direct in situ measurement of specific capacitance, monolayer tension, and bilayer tension in a droplet interface bilayer

Data Mining Techniques for Performance Evaluation of Diagnosis in

Assessment of Human Random Number Generation for Biometric Verification ABSTRACT

Adaptive visual attention model

Application of Factor Analysis on Academic Performance of Pupils

Combining Risks from Several Tumors Using Markov Chain Monte Carlo

Introduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015

arxiv: v1 [cs.lg] 28 Nov 2017

A new approach for epileptic seizure detection: sample entropy based feature extraction and extreme learning machine

Biomedical Research 2016; Special Issue: S178-S185 ISSN X

Research Article Association Patterns of Ontological Features Signify Electronic Health Records in Liver Cancer

Performance Measurement Parameter Selection of PHM System for Armored Vehicles Based on Entropy Weight Ideal Point. Yuanhong Liu

Systematic review with multiple treatment comparison metaanalysis. on interventions for hepatic encephalopathy

Evaluation of an In-Situ Output Probe-Microphone Method for Hearing Aid Fitting Verification*

Policy Trap and Optimal Subsidization Policy under Limited Supply of Vaccines

The Roles of Beliefs, Information, and Convenience. in the American Diet

Spatiotemporal Dynamics of Tuberculosis Disease and Vaccination Impact in North Senatorial Zone Taraba State Nigeria

Diabetologia 9 Springer-Verlag 1995

Beating by hitting: Group Competition and Punishment

Did Modeling Overestimate the Transmission Potential of Pandemic (H1N1-2009)? Sample Size Estimation for Post-Epidemic Seroepidemiological Studies

Hierarchical Cellular Automata for Visual Saliency

Investigation of Binaural Interference in Normal-Hearing and Hearing-Impaired Adults

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy

interactions (mechanism of folding/contact free energies/range of interactions/monte Carlo)

Seizure recurrence after a 1st unprovoked seizure:

The Price OF POLLUTION. Cost Estimates of Environmentally-Related Disease in Oregon

Sudden Noise Reduction Based on GMM with Noise Power Estimation

Community Health Environment Scan Survey (CHESS): a novel tool that captures the impact of the built environment on lifestyle factors

Outcome measures in palliative care for advanced cancer patients: a review

Where a licence is displayed above, please note the terms and conditions of the licence govern your use of this document.

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

Methods Research Report. An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy

Bayesian and Frequentist Approaches

Follicle Detection in Digital Ultrasound Images using Bidimensional Empirical Mode Decomposition and Fuzzy C-means Clustering Algorithm

Controlled Trials. Spyros Kitsiou, PhD

Survival and Predictors of Survival in Patients With Congestive Heart Failure

Fructosamine. Claudia E. Reusch, DVM, PhD, Marianne R. Liehs, Martine Hoyer, and Renate Vochezer

Human Biology Open Access Pre-Prints

Automatic Seizure Detection Based on Wavelet-Chaos Methodology from EEG and its Sub-bands

How to Conduct a Meta-Analysis

(From the Department of Biology, St. Louis University, and the Department of Pathology, St. Louis University School of Medicine, St.

Speech Enhancement Using Temporal Masking in the FFT Domain

Magdalena Wojtczak, Anna C. Schroder, Ying-Yee Kong, and David A. Nelson

Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data

Health Technology Assessment NIHR HTA programme September /hta16350

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison

A novel technique for stress recognition using ECG signal pattern.

The Level of Participation in Deliberative Arenas

Department of Radiology and Radiation Oncology, Gunma University School of Medicine, Maebashi, Japan

The RoB 2.0 tool (individually randomized, cross-over trials)

Sex differences in binding of human growth hormone to isolated rat hepatocytes (hormone receptors/prolactin/estrogen effect)

Fixed Effect Combining

Meta-analyses: analyses:

Types of Data. Systematic Reviews: Data Synthesis Professor Jodie Dodd 4/12/2014. Acknowledgements: Emily Bain Australasian Cochrane Centre

Systematic Reviews. Simon Gates 8 March 2007

Comparison of Meta-Analytic Results of Indirect, Direct, and Combined Comparisons of Drugs for Chronic Insomnia in Adults: A Case Study

AUC Optimization vs. Error Rate Minimization

Fig.1. Block Diagram of ECG classification. 2013, IJARCSSE All Rights Reserved Page 205

Lung Volume Reduction Surgery Using the NETT Selection Criteria

Evolutionary and Ecological Perspectives on Systems Diseases using Agent-based Modeling

plant tissue. III. The role of the root cortex cells

Strain-rate Dependent Stiffness of Articular Cartilage in Unconfined Compression

Supplementary Online Content

Modelling heterogeneity variances in multiple treatment comparison meta-analysis Are informative priors the better solution?

changes in extracellular sodium or potassium concentration or in extracellular fluid volume could not account for these effects (1, 2).

Prognostic effect of factors involved in revised Tokuhashi score system for patients with spinal metastases: a systematic review and Meta-analysis

In many cleft palate centers, especially in Europe, infant orthopedics (IO) is used in the comprehensive care of children

ARCHE Risk of Bias (ROB) Guidelines

1. Introduction. {< > 1, < > 2 }

Live WebEx meeting agenda

Transcription:

Between-trial heterogeneity in eta-analyses ay be partially explained by reported design characteristics KM Rhodes 1, RM Turner 1,, J Savović 3,4, E Jones 3, D Mawdsley 5, JPT iggins 3 1 MRC Biostatistics Unit, School of Clinical Medicine, University of Cabridge, Cabridge, UK MRC Clinical Trials Unit, University College London, UK 3 Population ealth Sciences, Bristol Medical School, University of Bristol, UK 4 NIR CLARC West, University ospitals Bristol NS Foundation Trust, Bristol, UK 5 University of Manchester, Manchester, UK Abstract Objective We investigated the associations between risk of bias judgents fro Cochrane reviews for sequence generation, allocation concealent and blinding and between-trial heterogeneity. Study Design and Setting Bayesian hierarchical odels were fitted to binary data fro 117 eta-analyses, to estiate the ratio λ by which heterogeneity changes for trials at high/unclear risk of bias, copared to trials at low risk of bias. We estiated the proportion of between-trial heterogeneity in each eta-analysis that could be explained by the bias associated with specific design characteristics. Results Univariable analyses showed that heterogeneity variances were, on average, increased aong trials at high/unclear risk of bias for sequence generation ( ˆλ 1.14, 95% interval: 0.57 to.30) and blinding ( ˆλ 1.74, 95% interval: 0.85 to 3.47). Trials at high/unclear risk of bias for allocation concealent were on average less heterogeneous ( ˆλ 0.75, 95% interval: 0.35 to 1.61). Multivariable analyses showed that a edian of 37% (95% interval: 0% to 71%) heterogeneity variance could be explained by trials at high/unclear risk of bias for sequence generation, allocation concealent and/or blinding. All 95% intervals for changes in heterogeneity were wide and included the null of no difference. Conclusion Our interpretation of the results is liited by iprecise estiates. There is soe indication that between-trial heterogeneity could be partially explained by reported design characteristics, and hence adjustent for bias could potentially iprove accuracy of eta-analysis results. Keywords: eta-analysis; heterogeneity; sequence generation; allocation concealent; blinding; randoized trials

Introduction In published eta-analyses, the original studies are often affected by varying aounts of internal bias caused by ethodological flaws. Epirical studies have investigated the extent of between-study heterogeneity in a eta-analysis [1, ]. This is likely to coprise a ixture of variation caused by true diversity aong the study designs, variation due to within-study biases and unexplained variation. For this reason, it would be preferable to separate heterogeneity due to bias fro other sources of between-study variation, as proposed by iggins et al. [3]. Biases associated with reported study design characteristics can be investigated within etaepideiologica l studies that analyse a collection of eta-analyses. An early exaple is that of Schulz et al. [4], where the ethodological quality of 50 randoized controlled trials fro 33 etaanalyses within the Cochrane Pregnancy and Childbirth database was assessed. Schulz et al. provided epirical evidence to suggest that trials in which randoization is inadequately concealed report exaggerated estiates of intervention effect copared with adequately concealed trials. There was also soe indication that trials with inadequate blinding yield larger effect estiates. More recently, the BRANDO (Bias in Randoized and Observational Studies) study and ROBES (Risk of Bias in Evidence Synthesis) study have investigated the associations between reported design characteristics and intervention effects and heterogeneity [5, 6]. The BRANDO study cobined data fro all existing eta-epideiological studies (collections of eta-analyses) into a single database, coprising 1973 independent trials included in 34 eta-analyses. The ROBES database included 8 binary outcoe eta-analyses fro Cochrane reviews that had ipleented the Cochrane riskof-bias tool [7]. In the BRANDO study, all trials included in the database had been categorised according to whether they were judged as adequate, inadequate, or unclear for sequence generation, allocation concealent and double-blinding. Trials included in the ROBES study database had been categorised as being at high, low or unclear risk of bias for sequence generation, allocation concealent, blinding and incoplete outcoe data, using the Cochrane risk-of-bias tool. The results of both eta-epideiological studies showed that the relative intervention effect in favour of the experiental treatent is, on average, odestly exaggerated in trials with inadequate randoization and lack of blinding. The ROBES study found no evidence of bias due to a high or unclear risk of bias assessent for incoplete outcoe data. Both studies also found that bias in intervention effect estiates associated with the lack of blinding in trials with subjective outcoe easures ay be unpredictable in its direction and agnitude, leading to increased within eta-analysis heterogeneity.

When deciding how to handle suspected biases, eta-analysts often consider whether to restrict their analyses to studies at lower risk of bias or to include all available evidence. Restricting analyses to studies at lower risk of bias ay lead to an unbiased result, but this result would be iprecise if high quality evidence is sparse. On the other hand, cobining all available studies and ignoring flaws in their conduct could lead to biased suary estiates with inappropriate clinical or policy decisions as a possible consequence. Welton et al. [8] proposed a ethod for eta-analysis that uses all available data, while adjusting for and down-weighting the evidence fro lower quality studies, based on evidence fro a eta-epideiological study. The analyses of BRANDO and ROBES followed ethods proposed by Welton et al.[8], which odel the effects of lower quality design characteristics on average bias and between-trial heterogeneity. Under these odels, the trials judged to be of poorer quality were assued to be at least as heterogeneous as those of higher quality, which ay not be the case. We have since proposed labelinvariant odels that avoid this constraint [9]. Our odels are ore flexible than the odels of Welton et al. in allowing us to quantify the ratio by which heterogeneity changes for studies with lower quality design characteristics. This facilitates investigation of how uch between-study heterogeneity in a eta-analysis is attributable to lower quality studies. Bias can lead to overestiation or underestiation of the true intervention effect in a study, and we could expect differences in risk of bias across studies to contribute to variation aong the results of studies included in a eta-analysis. ere we re-analyse trial data fro the ROBES database, using our label-invariant odels to investigate the associations between risk of bias judgents fro Cochrane reviews and heterogeneity aong randoized controlled trials. We investigate the extent of heterogeneity in a eta-analysis that is due to within-trial biases. The epirical evidence provided gives useful inforation on the extent to which we ight expect the between-trial variance to change in a eta-analysis, if we adjust for known sources of bias. Methods Data description We ake use of data fro the ROBES (Risk of Bias in Evidence Synthesis) [5] study, which is a large collection of eta-analyses extracted fro the Cochrane Database of Systeatic Reviews. These data were originally used to exaine the associations between reported design characteristics and intervention effect estiates in eta-analyses. Meta-analyses with fewer than five trials were excluded, as were eta-analyses where the review authors considered pooling to be inappropriate or where nuerical data were unavailable. One or ore binary outcoe eta-analysis fro each eligible review was included in the database, corresponding to a priary outcoe where possible.

The dataset includes 8 eta-analyses fro Cochrane reviews that had inforation on all five of the following Risk of Bias ites: sequence generation; allocation concealent; blinding; incoplete outcoe data and selective outcoe reporting. In this paper we do not consider the influence of accounting for bias caused by incoplete outcoe data or selective outcoe reporting on heterogeneity. The ROBES study found no evidence of exaggerated intervention effect aong trials at high or unclear risk of bias (copared with low risk of bias) for assessent of incoplete outcoe data [5] and it is not generally recoended to try to adjust for selective outcoe reporting bias in eta-analysis [10]. Our statistical analyses were carried out on a subset of the ROBES study, coprising 1473 trials fro 117 eta-analyses. These eta-analyses contained at least one trial at low risk of bias and at least one trial at high or unclear risk of bias for each of the three characteristics of interest: sequence generation, allocation concealent and blinding. Focusing on one subset of the data throughout all analyses allowed for direct coparison of results assessing the influences of accounting for different cobinations of study design characteristics on heterogeneity. Table 1 shows the structure of the dataset. For each trial included in the ROBES database, we have binary outcoe data consisting of the nuber of events in each treatent ar and the total nuber of participants in each ar. The direction of outcoe events in the ROBES database is coded such that the outcoe for each trial corresponds to a harful event. All eta-analyses in the database have been categorised according to the type of outcoe under assessent and the types of interventions evaluated, in the sae way as Turner et al. [1]. Outcoes in the ROBES database were classified into three broad categories (allcause ortality, other objective, subjective) in the sae way as the BRANDO study [6]. Table 1 Structure of the dataset N Min Median Max IQR No. of trials per 117 eta-analyses 5 10 75 6 to 14 eta-analysis No. of participants per trial 1473 trials 8 119 18,000 60 to 67 Statistical analysis We used label-invariant hierarchical odels to analyse trial data fro all included eta-analyses siultaneously. The odels were fitted as described in an earlier paper [9] and are based on an extension of the odel described as Model 3 by Welton et al. [8]. Within each eta-analysis, a odel with binoial within-trial likelihoods was fitted to the binary outcoe data fro each trial on

the log odds ratio scale. The odel assues that the higher quality trials at low risk of bias provide an unbiased estiate of intervention effect, assued to have a noral rando-effects distribution with varianceτ specific to each eta-analysis indexed. Throughout our analyses, we used a dichotoised variable for each design characteristic (high or unclear risk of bias copared with low risk of bias for sequence generation, allocation concealent and blinding). The trials at high or unclear risk of bias are assued to estiate the su of two coponents: the sae intervention effect as the trials at low risk of bias plus soe trial-specific bias. Within each eta-analysis, we quantify variation aong trials at high or unclear risk of bias by λτ, which can be lower or higher than the variation τ aong trials at low risk of bias. For each design characteristic, the hierarchical odels allow us to estiate: the average bias in estiated intervention effect within eta-analysis (b ); the average bias in estiated intervention effect across eta-analyses (b 0 ); the ratio by which betweentrial heterogeneity in intervention effects changes for trials with potential flaws (λ); and variation in average bias across eta-analyses (φ). We first conducted univariable analyses exaining the influence of accounting for a single trial design characteristic on heterogeneity before carrying out ultivariable analyses exaining the influence of accounting for all three design characteristics. In ultivariable analyses, interactions between the different design characteristics were assued to have distinct variance coponents λ and φ. Following the approach of Turner et al. [1], we fitted a log-noral odel to underlying values of heterogeneity varianceτ in intervention effect aong trials with low risk of bias across etaanalyses. Previous research has shown that the extent of total heterogeneity in a eta-analysis differs according to the type of outcoe exained in the eta-analysis [1, ]. To investigate association between the type of outcoe under assessent and the heterogeneity variance aong trials with low risk of bias, we included indicators for the different types of outcoe as covariates in the odel for τ. All odels were fitted using Markov chain Monte Carlo (MCMC) ethods within WinBUGS Version 1.4.3 [11]. We based results on 100,000 iterations, following a burn-in period of 10,000 iterations, which was sufficient to achieve convergence and produced low MC error rates. Convergence was assessed according to the Brooks-Gelan-Rubin diagnostic tool [1], using two chains starting fro widely dispersed initial values. As in our earlier paper [9], we assigned noral(0,1000) prior distributions to location paraeters and a log-noral(0,1) prior to λ. Variation in average bias across eta-analyses, φ, was assigned an inverse-gaa(0.001,0.001) prior with increased weight on sall values. Model fit was assessed using the deviance inforation criterion (DIC), as recoended by

Spiegelhalter et al. [13,14]. Due to the non-linearity between the likelihood and the odel paraeters, we calculated the effective nuber of paraeters at the posterior ean of the fitted values rather than at the posterior ean of the basic odel paraeters [15]. The WinBUGS code for fitting the label-invariant odels is available in the Supporting Inforation of an earlier paper [9]. Quantifying heterogeneity due to bias It is of interest to quantify the proportion of between-trial heterogeneity in a eta-analysis that can be explained by the bias associated with reported design characteristics. This requires an estiate of total heterogeneity variance aong all trials included in a eta-analysis and an estiate of the heterogeneity variance after accounting for biases. The latter is estiated fro the odel above, where the three design characteristics are assued to be responsible for all of the within-trial biases. In univariable analyses for the influence of accounting for a single characteristic, we estiated total heterogeneity variance τ in a eta-analysis, using the forula total, τ / ((1 π ) τ + π λτ + π (1 π ) b ), where π is the proportion of trials at high or unclear risk of bias in eta-analysis. The derivation of this forula for total heterogeneity variance is provided in Suppleentary aterial (S1), together with the forula used in ultivariable analyses for the influence of accounting for two characteristics. We note that the forula used in ultivariable analyses for the influence of accounting for three characteristics is derived in the sae way. For each eta-analysis within the subset of 117 eta-analyses in ROBES, we used WinBUGS to obtain the posterior edian for the ratio of between-trial variance aong trials at high or unclear risk of bias to total between-trial variance 1 τ / τ. For each individual design characteristic and all total, cobinations of design characteristics, we suarise the proportion of heterogeneity attributable to trials at high or unclear risk of bias by the edian and 95% interval of posterior edians for 1 τ / τ across eta-analyses indexed. total, Negative estiates of the proportion of heterogeneity due to trials at high or unclear risk of bias occur where the estiate of total heterogeneity variance t aong all trials included in a eta-analysis is total less than the estiate of the heterogeneity variance τ aong trials at low risk of bias. We note that t is not only increased fro total τ by the heterogeneity variance aong trials at high or unclear risk of bias, but also the difference in intervention effect between the trials at high or unclear risk of bias and the trials at low risk of bias (see forula in Suppleentary aterial (S1)). We set the negative values of the ratio to zero, since total between-trial heterogeneity in the eta-analysis cannot be explained by the trials at high or unclear risk of bias.

We graphically explored the influence of accounting for reported design characterises on heterogeneity on randoized trials in eta-analysis. For each eta-analysis within the subset of ROBES, we plotted the posterior edian of heterogeneity variance τ aong trials at low risk of bias against the posterior edian of heterogeneity variance τ total, aong all trials. Results Descriptive analyses Table reports the nuber of trials with each cobination of reported design characteristics. The frequency of trials categorised as being at high or unclear risk of bias for a single design characteristic was 303 (1%), of which 75 (5%) were at high or unclear risk of bias for sequence generation, 98 (3%) were at high or unclear risk of bias for allocation concealent and 130 (43%) were at high or unclear risk of bias for blinding. The nuber of trials categorised as being at high or unclear risk of bias for precisely two design characteristics was soewhat higher at 413 (8%). All three design characteristics were judged as high or unclear risk in 396 (7%) of trials. For each design characteristic, Table shows the breakdown of the trial nubers into high risk of bias and unclear risk of bias, overall and according to the type of outcoe under assessent. Of all 1473 trials in the dataset, sequence generation was assessed as high risk of bias in 41 (3%) trials, unclear in 736 (50%) trials, and low risk of bias in 696 (47%) trials. Allocation concealent was assessed as high risk of bias in 80 (5%) trials, unclear in 760 (5%) trials, and low risk of bias in 633 (43%) trials. Blinding was assessed as high risk of bias in 317 (%) trials, unclear in 383 (6%) trials, and low risk of bias in 773 (5%) trials. The proportions of trials judged as being at high or unclear risk of bias are greatest aong trials with subjectively easured outcoes, and lowest aong trials assessing allcause ortality.

Table The overall nuber of trials with each cobination of reported design characteristics, within the subset of 117 eta-analyses extracted fro ROBES, and the nuber of trials at high or unclear risk of bias for each reported design characteristics overall and according to type of outcoe easure. Risk of bias No. of trials at unclear risk of bias (% of No. of trials at high risk of bias (% of trials) No. of trials trials) Sequence Allocation (%) Sequence Allocation Sequence Allocation Blinding Blinding generation concealent generation concealent generation concealent Blinding Low Low Low 361 (5%) - - - - - - igh or unclear Low Low 75 (5%) 0 - - 75 (100 %) - - Low Low igh or unclear igh or unclear Low igh or unclear igh or unclear Low igh or unclear Low igh or unclear igh or unclear Low 98 (7%) - 8 (8%) - - 90 (9%) - igh or unclear 130 (9%) - - 75 (58%) - - 55 (4%) Low 39 (16%) 9 (4%) 8 (3%) - 30 (96%) 31 (97%) - igh or unclear igh or unclear igh or unclear 67 (5%) 1 (1%) - 8 (4%) 66 (99%) - 39 (58%) 107 (7%) - 19 (18%) 60 (56%) - 88 (8%) 47 (44%) 396 (7%) 31 (8%) 45 (11%) 154 (39%) 365 (9%) 351 (89%) 4 (61%) Overall 1473 (100%) 41 (3%) 80 (5%) 317 (%) 736 (50%) 760 (4%) 383 (6%) Mortality outcoe 71 (18%) 7 (3%) (8%) 76 (8%) 100 (37%) 104 (38%) 36 (13%) Objective outcoe 1 301 (0%) 9 (3%) 13 (4%) 74 (5%) 145 (48%) 15 (51%) 54 (18%) Subjective outcoe 901 (61%) 5 (3%) 45 (5%) 167 (19%) 491 (55%) 504 (56%) 93 (33%) 1 10 (37%) eta-analyses easured objective outcoes other than all-cause ortality including laboratory assessed outcoes, pregnancy and perinatal outcoes. 17 (6%) eta-analyses assessed objective outcoes potentially influenced by judgent such as caesarean section and hospital adissions; Subjectively easured outcoes include pain, ental health outcoes, cause-specific ortality, clinically-assessed outcoes, signs and syptos reflecting continuation/end of condition and lifestyle outcoes.

Model coparison Results fro odel coparison are provided in Suppleentary aterial (S). The ultivariable odel for the influence of accounting for high or unclear risk of bias for sequence generation and blinding (Model B) had an iproved fit when an interaction ter was included. owever, after adjustent for trials at high or unclear risk of bias for allocation concealent (Model B4) there was no evidence of interaction between sequence generation and blinding. Despite this, we base our results on odels including interaction ters aong reported design characteristics, because we would expect reported design characteristics to interact in practice. The inclusion of outcoe type indicators in the odel for heterogeneity variance τ did not lead to a substantial iproveent in odel fit. For this reason, our results are based on hierarchical odels for τ fitted without these covariates. Exploring the associations between reported trial design characteristics and heterogeneity Reported in Table 3 are estiates of λ representing the ratio by which heterogeneity variance changes for trials at high or unclear risk of bias for specific design characteristics, copared to trials at low risk of bias. Estiates of average bias (b 0 ) and variation in ean bias across eta-analyses (φ) were alost identical to those reported elsewhere [5], and hence not reported here. Each estiate of λ in Table 3 is very iprecisely estiated; the 95% credible intervals for λ are wide and contain the null value 1 representing no difference in heterogeneity aong trials at high or unclear risk of bias and trials at low risk of bias. For this reason we interpret the results that follow with caution. Univariable analyses Based on univariable analyses for the influence of accounting for a single reported design characteristic, variation aong trials at high or unclear risk of bias for sequence generation is, on average, 14% greater than that aong trials at low risk of bias for sequence generation ( ˆλ 1.14, 95% interval: 0.57 to.30). eterogeneity aong trials judged as high or unclear risk of bias for allocation concealent is, on average, 75% that aong trials assessed as low risk of bias for allocation concealent ( ˆλ 0.75, 95% interval: 0.35 to 1.61). The central estiate for λ suggests that variation aong trials at high or unclear risk of bias for blinding is, on average, 74% greater than that aong trials at low risk of bias for blinding ( ˆλ 1.74, 95% interval: 0.85 to 3.47).

Table 3 Results fro univariable and ultivariable analyses for the influence of accounting for trials at high or unclear risk of bias for specific design characteristics on heterogeneity. Posterior edians and 95% intervals are reported. Model Univariable analyses λ igh or unclear risk (vs low risk) of bias for: A1 sequence generation 1.14 (0.57 to.30) A allocation concealent 0.75 (0.35 to 1.61) A3 blinding 1.74 (0.85 to 3.47) Multivariable analyses (fro odels including interaction ters) * igh or unclear risk (vs low risk) of bias for: B1 sequence generation, in trials at low risk of bias for allocation concealent 0.76 (0.14 to 1.79) allocation concealent, in trials at low risk of bias for sequence generation 0.54 (0.10 to 1.41) sequence generation and allocation concealent 0.94 (0.39 to 1.90) B sequence generation, in trials at low risk of bias for blinding 0.59 (0.14 to 1.46) blinding, in trials at low risk of bias for sequence generation 1.01 (0.41 to.73) sequence generation and blinding 1.58 (0.59 to 4.65) B3 allocation concealent, in trials at low risk of bias for blinding 0.65 (0.0 to.14) blinding, in trials at low risk of bias for allocation concealent 1.69 (0.44 to 5.68) allocation concealent and blinding 1.41 (0.55 to 4.0) B4 sequence generation, in trials at low risk of bias for allocation concealent & blinding 0.46 (0.11 to 1.13) allocation concealent in trials at low risk of bias for sequence generation & blinding 0.49 (0.1 to 1.71) blinding, in trials at low risk of bias for sequence generation & allocation concealent 0.99 (0.43 to.31) sequence generation and allocation concealent, in trials at low risk of bias for blinding 0.39 (0.07 to 1.9) sequence generation and blinding, in trials at low risk of bias for allocation concealent 1.44 (0.34 to 5.34) allocation concealent and blinding, in trials at low risk of bias for sequence generation 0.50 (0.16 to 1.9) sequence generation, allocation concealent and blinding 1. (0.39 to 3.01) λ ratio of heterogeneity variance aong trials at high or unclear risk of bias to heterogeneity variance aong trials at low risk of bias. *Note that results for ultiple characteristics are not iplied by the results for each individual bias doain in the ultivariable analysis, due to the presence of all possible interactions between bias doains.

Multivariable analyses Also reported in Table 3 are results fro ultivariable analyses for the influence of accounting for cobinations of design characteristics. Based on results fro fitting Model B1, heterogeneity aong trials at high or unclear risk of bias for both sequence generation and allocation concealent is, on average, 94% that aong trials at low risk of bias for both sequence generation and allocation concealent ( ˆλ 0.94, 95% interval: 0.39 to 1.90). eterogeneity aong trials at high or unclear risk of bias for both sequence generation and blinding is, on average, 58% greater than that aong trials at low risk of bias for both characteristics based on results fro fitting Model B ( ˆλ 1.58, 95% interval: 0.59 to 4.65). Results fro fitting Model B3 show that heterogeneity is, on average, 41% greater aong trials at high or unclear risk of bias for both allocation concealent and blinding, copared with trials at low risk of bias for both characteristics ( ˆλ 1.41, 95% interval: 0.55 to 4.0). Results fro ultivariable analyses for the influence of accounting for all three design characteristics (Model B4) iply that heterogeneity is, on average, % greater aong trials at high or unclear risk of bias (copared with low risk of bias) for all three reported design characteristics ( ˆλ 1., 95% interval: 0.39 to 3.01). As in univariable analyses, estiates of association between heterogeneity and reported design characteristics are very uncertain; 95% credible intervals for λ all contain the null effect. Investigating the extent of heterogeneity due to reported trial design characteristics We investigate the extent to which one ight expect between-trial heterogeneity in a rando-effects eta-analysis to change, on average, if we adjust for potential bias attributable to specific design characteristics in a new eta-analysis. Table 4 suarises posterior edians of the proportion of total between-trial heterogeneity attributable to trials at high or unclear risk of bias across the subset of 117 eta-analyses in ROBES. Univariable analyses In univariable analyses for the influence of accounting for a single reported design characteristic, central estiates for the proportion of between-trial variance explained by trials at high or unclear risk of bias for sequence generation have edian 30% (95% interval: 7% to 46%) across eta-analyses (Model A1). There is less evidence that between-trial heterogeneity in a eta-analysis is attributable to the bias associated with low or unclear quality for allocation concealent; central estiates for the proportion of heterogeneity aong trials at high or unclear risk of bias have edian 6% (95% interval: 0% to 17%) across eta-analyses (Model A). Across eta-analyses, central estiates for the proportions of between-trial heterogeneity explained by bias associated with trials at high or unclear risk of bias for blinding have edian 40% (95% interval: 8% to 56%) based on fitting Model A3.

For each of the 117 eta-analyses included within the subset of ROBES, Figure 1 presents a coparison of the central estiate of heterogeneity variance aong trials at low risk of bias and the central estiate of heterogeneity variance aong all trials. In separate univariable analyses for the influences of high or unclear risk of bias for sequence generation and blinding, the central estiate of heterogeneity variance aong trials at low risk of bias tends to be lower than the central estiate of heterogeneity aong all trials. In contrast, the central estiate of heterogeneity variance aong trials at low risk of bias for allocation concealent is slightly higher than that aong all trials in 73 (6%) eta-analyses. Multivariable analyses Based on results fro ultivariable analyses for the influence of accounting for ultiple reported design characteristics, one ight hypothesize that heterogeneity aong trials in eta-analyses within ROBES can be explained by the bias associated with sequence generation and/or allocation concealent (Model B1); across eta-analyses within the subset of ROBES, central estiates for the proportion of heterogeneity due to trials at high or unclear risk of bias have edian 19% (95% interval: 0% to 48%). Estiates of the proportion of heterogeneity due to trials at high or unclear risk of bias due to sequence generation and/or blinding have edian 37% (95% interval: 0% to 57%) across eta-analyses (Model B). This edian is slightly lower at 31% (95% interval: 0% to 51%) for heterogeneity variance explained by bias associated with trials at high or unclear risk of bias for allocation concealent and/or blinding (Model B3). Across eta-analyses in ROBES, central estiates for the proportion of between-trial heterogeneity explained by bias associated with trials at high or unclear risk of bias for sequence generation, allocation concealent and/or blinding have edian 37% (95% interval: 0% to 71%) based on fitting Model B4. In ultivariable analyses for the influence of accounting for all three characteristics, the central estiate of heterogeneity variance aong trials at low risk of bias for all three characteristics is lower than the central estiate of heterogeneity variance aong all trials in the ajority of 107 (91%) etaanalyses (Figure 1).

Table 4 Suaries of posterior edians for the proportion of heterogeneity due to trials at high or unclear risk of bias for each design characteristic and cobinations of design characteristics within the subset of 117 eta-analyses extracted fro ROBES. Model A1 A A3 B1 B B3 B4 Design characteristic/s Sequence generation Allocation concealent Blinding Sequence generation and/or allocation concealent Sequence generation and/or blinding Allocation concealent and/or blinding Sequence generation, allocation concealent and/or blinding Proportion of heterogeneity due to trials at high or unclear risk of bias for the design characteristic/s * Median 0.30; 95% interval 0.07 to 0.46 Median 0.06; 95% interval 0 to 0.17 Median 0.40; 95% interval 0.08 to 0.56 Median 0.19; 95% interval 0 to 0.48 Median 0.37; 95% interval 0 to 0.57 Median 0.31; 95% interval 0 to 0.51 Median 0.37; 95% interval 0 to 0.71 * Negative estiates suggest that heterogeneity aong trials in a eta-analysis cannot be explained by trials at high or unclear risk of bias and were hence set to zero.

Figure 1 For each of the 117 eta-analyses within the subset of ROBES, the central estiate of heterogeneity variance aong trials at low risk of bias plotted against the central estiate of heterogeneity variance aong all trials. Central estiates of heterogeneity variance are based on results fro univariable odel A1 for sequence generation, univariable odel A for allocation concealent, univariable odel A3 for blinding, and ultivariable odel B4 for sequence generation, allocation concealent and blinding. Solid lines indicate that estiates are identical.

Discussion Within-study biases can lead to overestiation or underestiation of the true intervention effect in a study and are expected to contribute to between-study variation in eta-analyses [5, 6, 16]. With access to a eta-epideiological data set including eta-analyses which have ipleented the Cochrane risk-of-bias tool, it was possible to explore the extent to which accounting for suspected biases influences levels of heterogeneity. We have investigated the ipact of risk of bias judgents fro Cochrane reviews for sequence generation, allocation concealent and blinding on between-trial heterogeneity, using data fro 117 eta-analyses included in the ROBES study. Between-trial heterogeneity in intervention effect is a coon proble in eta-analysis. The results of this epirical study show that roughly a third of between-trial heterogeneity ight be explained by trial design characteristics, on average. Prediction intervals are becoing increasingly widely used to provide a predicted range for the true intervention effect in an individual study [3, 17], and are useful in decision aking [18]. The iplications of our research are that prediction intervals for true effects could be narrowed to account for biases, if they are to represent genuine variation in true effects. This epirical study builds on previous eta-epideiological studies [4-6] that have focussed on the influence of accounting for reported design characteristics on intervention effect rather than betweentrial heterogeneity. Recent eta-epideiological studies have tended to use the ethods proposed by Welton et al.[8], which are less general in that they constrain trials at high or unclear risk of bias to be at least as heterogeneous as trials at low risk of bias. We previously proposed a ore general odel for the analysis of eta-epideiological data [9]. In this study, the advantage of using our odel was that we could estiate the quantity λ, representing the ratio by which heterogeneity changes for trials at high or unclear risk of bias, copared to trials at low risk of bias. Rando-effects eta-analysis ay be appropriate when between-study heterogeneity exists. owever, in soe situations, studies differ substantially in quality so the rando-effects assuption ay be inadequate. When confronted with evidence of varying quality in practice, eta-analysts ay decide to restrict their analyses to studies at lower risk of bias. owever, this would not be practical in the typical situation where few studies are available to be included in the eta-analysis. The results of our eta-epideiological study give soe indication of increased heterogeneity aong studies with high or unclear risk of bias judgeents. These findings support recoendations to adjust for bias in eta-analyses of evidence of varying quality. Methods are available to adjust for and down-weight studies of lower quality in eta-analysis, using generic data-based evidence or expert opinion infored by detailed trial assessent [8, 19]. Based on our findings, these ethods could be expected to reduce between-study variation in eta-analyses. Since the between-study variance paraeter would be iprecisely estiated in any eta-analyses that only contain a sall nuber of

studies, we recoend assigning an inforative prior distribution to this paraeter, based on epirical evidence fro historical eta-analyses [1, ]. For each reported design characteristic and cobinations of design characteristics, we calculated the proportion of heterogeneity in each eta-analysis that could be explained by trials at high or unclear risk of bias. Suaries of posterior edians for these proportions across eta-analyses give soe indication of the reduction in between-trial heterogeneity we ight expect to see in a eta-analysis, if we adjust for the bias associated with each reported design characteristic or cobination of reported design characteristics. There is epirical evidence to suggest that flaws in the rando sequence generation and lack of blinding ay lead to increased levels of heterogeneity aong randoized controlled trials, on average, but flawed ethods of allocation concealent ight have little ipact. These findings should be interpreted with caution due to the liited statistical power to detect differences in heterogeneity between higher and lower quality trials. In each analysis the ratio of heterogeneity variance λ attributable to bias was very iprecisely estiated. Although it would be expected for λ to be iprecisely estiated in a single eta-analysis, we hoped to gain precision when estiating across the collection of eta-analyses included in the ROBES database; however, variability across eta-analyses was high. In our analyses of the ROBES data, we wanted to allow the data to doinate and used a vague lognoral(0,1) prior distribution for the heterogeneity paraeter λ. owever, given the sall aount of inforation available on λ in the dataset, there was a possibility that results could have been sensitive to the choice of vague prior distribution. In an earlier paper, we used the sae dataset in a sensitivity analysis to copare the effects of 5 different prior distributions for λ [9]. Posterior estiates for the scale paraeter λ were consistent aong the different priors, with siilar edians and overlapping credible intervals. eterogeneity aong trials at low risk of bias could be explained by clinical differences, for exaple difference in participants, or in the dosage or tiing of an intervention. In each univariable and ultivariable analysis, we did not find evidence of association between heterogeneity variance aong trials at low risk of bias and the type of outcoe under assessent in the eta-analysis. This ight be explained by the fact that the ajority of the outcoes exained in the eta-analyses included in our analyses were subjectively easured. In future work it would be of interest to explore how the extent of between-trial heterogeneity due to bias ay depend on the type of outcoe under assessent and the types of interventions being copared. Another liitation is the accuracy of reported design characteristics which ay not well represent how a trial was actually conducted. Trials that are conducted well could be poorly reported [0]. ill

et al. [1] investigated discrepancies between published reports and actual conduct of randoized clinical trials and found that sequence generation and allocation concealent were reported as unclear in over 75% of studies where these two characteristics were actually at low risk of bias. A ore recent study found that descriptions of blinding in trial protocols and corresponding reports were often in agreeent []. These investigations provide soe insight as to why the influences of accounting for high or unclear risk of bias for sequence generation and high or unclear risk of bias for allocation concealent on intervention effect and between-trial variance are saller, copared with the effects of high or unclear risk of bias for blinding. Ideally we would have investigated this further, by separating trials at unclear risk of bias fro trials at high risk of bias and coparing heterogeneity estiates between trials at high risk of bias and trials at low or unclear risk of bias. owever the data on trials at high risk of bias were sparse. It is possible that our results were confounded by the influence of other types of biases that could not be accounted for in our analyses. For exaple, there is epirical evidence of bias in the results of eta-analyses due to publication bias and selective reporting of outcoes arising fro the lack of inclusion of statistically non-significant results [3, 4]. Methods to adjust for reporting biases are available, but it would have been ipractical to apply these ethods to each eta-analysis in our dataset. Meta-analyses affected by reporting biases would be expected to overestiate intervention effect and so the extent of heterogeneity that we observed aong trials in the ROBES database could be higher than expected. The ROBES dataset was extracted fro the April 011 issue of the Cochrane Database of Systeatic Reviews, for which the risk of bias in trials ay have been assessed prior to 011. As of early 011, Cochrane review authors have assessed risk of bias due to blinding of participants and personnel separately fro blinding of outcoe assessors. In the future, it would be of interest to investigate separate influences of accounting for blinding of participants and personnel and blinding of outcoe assessors on intervention effect and between-trial heterogeneity, once large collections of etaanalyses with such assessents becoe available. It would also be of interest to investigate the ipact of bias on intervention effect and heterogeneity in other types of eta-analyses; our analyses were conducted using binary outcoe data fro Cochrane reviews only. These include a wide range of application areas but ay not be representative of all healthcare eta-analyses, and so the findings in this paper ay not be generalizable to eta-analyses included in other systeatic reviews. In conclusion, the overall iplications of this research are that the accuracy of eta-analysis results could be iproved by adjusting for reported study design characteristics in the eta-analysis odel. After conducting a rando-effects eta-analysis, it is iportant to consider the potential effect of the intervention when it is applied within an individual study setting because this ight be different fro

the average effect. In the presence of substantial heterogeneity aong studies, prediction intervals for the true intervention effect in an individual study will be wide and uncertain. This epirical study gives soe indication that adjustent for bias could reduce the uncertainty in predictive inferences, and better reflect the potential effectiveness of the intervention. A strategy of including all studies with such adjustents ay produce a ore favourable trade-off between bias and precision than excluding studies assessed to be at high risk of bias. owever, interpretation of our results is liited by extreely iprecise estiates. Acknowledgeents This project was supported by UK Medical Research Council (MRC) fellowship (G0701659/1) and grant (MR/K014587/1). KR and RT were supported by the MRC prograe grant (U10560558), and RT also by the MRC grant MC_UU_103/1, and JS was supported by the National Institute for ealth Research (NIR) Collaboration for Leadership in Applied ealth Research and Care West (CLARC West) at University ospitals Bristol NS Foundation Trust. EJ was supported by a MRC career developent award in biostatistics (MR/M014533/1). The views expressed are those of the authors and not necessarily those of the MRC, the National ealth Service, the NIR or the Departent of ealth.

References [1] R. M. Turner, J. Davey, M. J. Clarke, S. G. Thopson, and J. P. iggins, "Predicting the extent of heterogeneity in eta-analysis, using epirical data fro the Cochrane Database of Systeatic Reviews," International Journal of Epideiology, vol. 41, no. 3, pp. 818-87, 01. [] K. M. Rhodes, R. M. Turner, and J. P. T. iggins, "Predictive distributions were developed for the extent of heterogeneity in eta-analyses of continuous outcoe data," Journal of Clinical Epideiology, vol. 68, no. 1, pp. 5-60, 015. [3] J. P. T. iggins, S. G. Thopson, and D. J. Spiegelhalter, "A re-evaluation of rando-effects eta-analysis," Journal of the Royal Statistical Society, Series A, vol. 17, 009. [4] K. F. Schulz, I. Chalers, RJ ayes, D. G. Altan, "Epirical evidence of bias: Diensions of ethodological quality associated with estiates of treatent effects in controlled trials," vol. 73, ed. JAMA, pp. 408-41, 1995. [5] J. Savović et al., " Association between risk-of-bias assessents and results of randoized trials in Cochrane reviews: the ROBES eta-epideiologic study," Aerican Journal of Epideiology, in press. [6] J. Savović et al., "Influence of reported study design characteristics on intervention effect estiates fro randoised controlled trials: cobined analysis of eta-epideiological studies," ealth Technol. Assess, vol. 16, no. 35, pp. 1-8, 01. [7] J. P. T. iggins et al., "The Cochrane Collaboration's tool for assessing risk of bias in randoised trials," BMJ, vol. 343, 011. [8] N. J. Welton, A. E. Ades, J. B. Carlin, D. G. Altan, and J. A. C. Sterne, "Models for potentially biased evidence in eta-analysis using epirically based priors," Journal of the Royal Statistical Society: Series A (Statistics in Society), vol. 17, no. 1, pp. 119-136, 009. [9] K. M. Rhodes, D. Mawdsley, R. M. Turner,. E. Jones, J. Savović, and J. P. T. iggins, "Label-invariant odels for the analysis of eta-epideiological data," Statistics in Medicine, 1 11, 017. https://doi.org/10.100/si.7491. [10] J. P. T. iggins, D. G. Altan, and J. A. C. Sterne, "Chapter 8: Assessing risk of bias in included studies. In: iggins JPT, Green S (editors). Cochrane andbook for Systeatic Reviews of Interventions Version 5.1.0 (updated March 011).", ed, 011. [11] D. J. Lunn, A. Thoas, N. Best, and D. Spiegelhalter, "WinBUGS - A Bayesian odelling fraework: Concepts, structure, and extensibility," Statistics and Coputing, vol. 10, no. 4, pp. 35-337, 000. [1] S. P. Brooks and A. Gelan, "General Methods for Monitoring Convergence of Iterative Siulations," presented at the Journal of Coputational and Graphical Statistics, 1/1/1998, 1998. Available: http://www.tandfonline.co/doi/abs/10.1080/10618600.1998.10474787 [13] D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. Van Der Linde, "Bayesian easures of odel coplexity and fit," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 64, no. 4, pp. 583-639, 00.

[14] D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. Linde, "The deviance inforation criterion: 1 years on," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 76, no. 3, pp. 485-493, 014. [15] N. J. Welton and A. E. Ades, "A odel of toxoplasosis incidence in the UK: evidence synthesis and consistency of evidence," Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 54, no., pp. 385-404, 005. [16] M. J. Page, J. P. T. iggins, G. Clayton, J. A. C. Sterne, A. róbjartsson, and J. Savović, "Epirical Evidence of Study Design Biases in Randoized Trials: Systeatic Review of Meta-Epideiological Studies," PLOS ONE, vol. 11, no. 7, p. e015967, 016. [17] R. D. Riley, J. P. T. iggins, and J. J. Deeks, "Interpretation of rando effects etaanalyses," BMJ, vol. 34, 011. [18] A. E. Ades, G. Lu, and J. P. T. iggins, "The Interpretation of Rando-Effects Meta- Analysis in Decision Models," Medical Decision Making, vol. 5, no. 6, pp. 646-654, 005. [19] R. M. Turner, D. J. Spiegelhalter, G. C. S. Sith, and S. G. Thopson, "Bias odelling in evidence synthesis," Journal of the Royal Statistical Society: Series A (Statistics in Society), vol. 17, no. 1, pp. 1-47, 009. [0] K. uwiler-müntener, P. Jüni, C. Junker, M. Egger, "Quality of reporting of randoized trials as a easure of ethodologic quality," JAMA, vol. 87, no. 1, pp. 801-804, 00. [1] C. L. ill, M. P. LaValley, and T. Felson, "Discrepancy between published report and actual conduct of randoized clinical trials," presented at the Journal of Clinical Epideiology, Available: http://dx.doi.org/10.1016/s0895-4356(0)00440-7 [] A. robjartsson, J. Pildal, A. Chan, M. aahr, D. Altan, and P. Gotzsche, "Reporting on blinding in trial protocols and corresponding publications was often inadequate but rarely contradictory," presented at the Journal of Clinical Epideiology, 009. Available: http://dx.doi.org/10.1016/j.jclinepi.009.04.003 [3] K. Dwan et al., "Systeatic review of the epirical evidence of study publication bias and outcoe reporting bias," PLoS One, vol. 3, no. 8, p. e3081, 008. [4] J. J. Kirkha et al., "The ipact of outcoe reporting bias in randoised controlled trials on a cohort of systeatic reviews," BMJ, vol. 340, 010.

Suppleentary aterials S1 Estiating total heterogeneity variance fro the label-invariant odel We used label-invariant hierarchical odels to analyse trial data fro 117 eta-analyses in ROBES siultaneously. The odels have been proposed in an earlier paper [9], but we describe the odels briefly here to show how to derive the forulae for heterogeneity variance t aong all trials in total, a eta-analysis. S1.1 Univariable odel for the influence of accounting for a single trial design characteristic In a given eta-analysis, trials are categorised as low risk of bias (L-trials) or high/unclear risk of bias (-trials) for a specific design characteristic. The L-trials provide an estiate of the underlying intervention effect θ L, assued to have a noral i rando-effects distribution with ean d and variance τ, specific to eta-analysis. The -trials are assued to estiate an underlying intervention effect θ i, assued to be norally distributed with ean d b and variance λτ : + θ θ L i ~ Nd (, τ) i Nd+ b λτ ~ (, ). The average bias b in intervention effect in eta-analysis is assued to be exchangeable across eta-analyses, with overall ean b 0 and between-eta-analysis variance in ean bias φ : b b ~ Nb ( 0, ϕ ) ~ N( B, V ) 0 0 0 We set an indicator X i to be 1 for trials and 0 for L trials such that X i 1 p = with probability 0 1 p. Each trial is assued to provide an underlying estiate of intervention effect: L θ = (1- X ) θ + X θ. i i i The first ter of the su will return θ L if trial i is at low risk of bias. The second ter will return i θ i if the trial i is at high/unclear risk of bias. i i

The total heterogeneity variance aong trials in eta-analysis is given by: τ τoτal, = var( θi) L = var((1- X ) θ + X θ ) i i i i L L = var((1- X ) θ ) + var( X θ ) + cov((1- X ) θ, X θ ) i i i i i i i i = E X + E X + X L L L (1- i) var( θi) ( θi) var(1- i) var(1- i) var( θi) + EX + E X + X ( i) var( θi ) ( θi ) var( i) var( i) var( θi ) L L + [ E((1- X ) θ X θ ) - E((1- X ) θ ) E( X θ )] i i i i i i i i = E X θ + E θ X + X θ L L L (1- i) var( i) ( i) var(1- i) var(1- i) var( i) + EX θ + Eθ X + X θ ( i) var( i ) ( i ) var( i) var( i) var( i ) L L + [ E((1- X ) X ) E( θ θ ) - E((1- X ) θ ) E( X θ )] i i i i i i i i = (1- π ) τ + d (1- π ) π + (1- π ) π τ + π λτ + ( d + b ) π (1- π ) + π (1- π ) λτ -(1- π ) d π ( d + b ) (1- π) τ d(1- π) π = + + + ( + ) (1- ) π λτ d b π π -(1- π ) d π ( d + b ) = (1- π ) τ + π λτ + π (1- π ) b