This is an author produced version of Meta-analysis of absolute mean differences from randomised trials with treatment-related clustering associated with care providers.

White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/118781/

Article: Walwyn, R and Roberts, C (2015) Meta-analysis of absolute mean differences from randomised trials with treatment-related clustering associated with care providers. Statistics in Medicine, 34 (6). pp. 966-983. ISSN 0277-6715

https://doi.org/10.1002/sim.6379

© 2014 John Wiley & Sons, Ltd. This is the peer reviewed version of the following article: Walwyn Rebecca, and Roberts Chris (2015), Meta-analysis of absolute mean differences from randomised trials with treatment-related clustering associated with care providers, Statist. Med., 34, pages 966-983, which has been published in final form at https://doi.org/10.1002/sim.6379. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.

Full Title: Meta-analysis of absolute mean differences from randomised trials with treatment-related clustering associated with care providers

Short Title: Meta-analysis of mean differences from trials with clustering effects

Authors: Rebecca Walwyn (University of Leeds), Chris Roberts (University of Manchester)

Contact Information for Corresponding Authors: Rebecca Walwyn, Leeds Institute for Clinical Trials Research, University of Leeds, Leeds, United Kingdom, LS2 9JT. Email: R.E.A.Walwyn@leeds.ac.uk

Keywords: mean difference; meta-analysis; therapist effects

Acknowledgements: Rebecca Walwyn was funded by a Medical Research Council Special Training Fellowship in Health Services and Health of the Public (ref: G0501886). The authors would like to thank Pamela Gillies, Clair Chilvers, Michael Dewey, Karin Friedli, Ian Harvey, Adrian Hemmings, Michael King, Peter Bower, Roslyn Corney and Sharon Simpson for access to the datasets used in the example. Rebecca Walwyn and Chris Roberts are members of the UK Mental Health Research Network (MHRN) Methodology Research Group.

Abstract

Nesting of patients within care providers in trials of physical and talking therapies creates an additional level within the design. The statistical implications of this are analogous to those of cluster-randomised trials, except that the clustering effect may interact with treatment and can be restricted to one or more of the arms. The statistical model that is recommended at the trial-level includes a random effect for the care provider, but allows the provider and patient level variances to differ across arms. Evidence suggests that, while potentially important, such within-trial clustering effects have rarely been taken into account in trials and do not appear to have been considered in meta-analyses of these trials. This paper describes summary measures and individual-patient-data (IPD) methods for meta-analysing absolute mean differences from randomised trials with two-level nested clustering effects, contrasting fixed and random effects meta-analysis models. It extends methods for incorporating trials with unequal variances and homogeneous clustering to allow for between-arm and between-trial heterogeneity in ICC estimates. The work is motivated by a meta-analysis of trials of counselling in primary care, where the control is no counselling and the outcome is the Beck Depression Inventory (BDI). Assuming equal counsellor ICCs across trials, the recommended random-effects heteroscedastic model gave a pooled absolute mean difference of -2.53 (95% CI -5.33 to 0.27) using summary measures and -2.51 (95% CI -5.35 to 0.33) with the IPD. Pooled estimates were consistently below a minimally important clinical difference of 4 to 5 points on the BDI.

1. INTRODUCTION

Where the treatment a patient receives is delivered by a health professional, such as in talking or physical therapies or surgery, patient outcomes may vary systematically by care provider. Variation between clusters, or, in this case, care providers, leads to correlation among patient outcomes within clusters, thereby violating the assumption of independence on which standard methods of analysis are based. Such correlation arises when care providers differ in characteristics related to outcome, such as training, skill, experience or empathy. The usual situation in psychotherapy is that treatment is provided by different samples of clusters in each arm in what will be referred to as a nested therapist design (patients are allocated to care providers within treatments). As this is a special case of the more generic fully-nested design (where clusters formed at recruitment, treatment or outcome assessment are nested within treatments), the statistical implications of provider clustering in nested therapist designs are analogous to the implications of recruitment-related clustering in standard cluster randomised trials, in which clusters are randomly allocated to treatments. The latter are now widely recognised [1]. Ignoring provider clustering can also result in treatment estimates that are too precise and standard errors that are too small. There are also crossed designs in which all treatments are provided in each cluster so that the clusters and treatments are crossed. This covers a cluster randomised crossover design [2-4] in which sequences of treatments are randomised to clusters as well as a crossed therapist design in which patients are allocated to treatments within care providers (see Walwyn and Roberts [5] for further details).

Cluster randomised trials often assume that the clustering effect is homogeneous across treatment arms, so a random intercept model is appropriate and a single intra-class correlation coefficient (ICC) is estimated. Care provider clustering may be treatment-specific, however, in that provider characteristics may differ across arms, for instance with greater skill or different training being required for one therapy compared to another. There may also be greater standardisation of one therapy, or one may be more established so that there is greater experience associated with it. Between-arm heterogeneity in the clustering effect, or treatment-related clustering, complicates matters so methods outlined for cluster randomised trials need to be extended for therapist designs. The statistical model that is recommended for nested therapist designs [6] includes a random effect for the care provider but allows the provider and patient level variances to differ across arms. We refer to this as a two-level heteroscedastic model [5]. As such, a separate ICC is estimated in each treatment arm. For crossed therapist designs, the recommended model [5] is a random coefficient model, which includes a random intercept for the care provider but also allows the treatment effect to vary across care providers. In this case, between-provider variation in outcome increases precision of the treatment effect while between-provider variation in treatment effects decreases it. In the situation where clustering is absent from one arm, for example where the control is a waitlist or no treatment, the design is referred to as partially-nested or partially-crossed [5]. In this case, the between-cluster variance is constrained to zero in the no clustering arm. Incorporating crossed designs into meta-analyses raises different issues. These are beyond the scope of this paper and so will not be considered further here.

Care provider variation has widespread implications for the design and analysis of trials with nested designs. It affects not only the precision of treatment effect estimates [5-8] but also their internal and external validity [5, 6, 8]. It is now accepted that it needs to be considered in trials of non-pharmacological treatments [9]. However, an as yet unpublished systematic methodological review of Cochrane reviews of comparative studies involving psychotherapy found that, while potentially important, such within-trial variation has rarely been taken into account in psychotherapy trials and does not appear to have been considered in meta-analyses of these trials [10]. Statistical pooling or meta-analysis of summary-data across trials can be viewed as a two-stage process in which summary statistics are first extracted from each trial and then a weighted average of them is calculated [11-12]. Where outcomes are normally distributed, the summary statistic for the treatment effect may be an absolute or standardised mean difference. Our methodological review included 101 Cochrane reviews and 1816 unique studies, 1345 of which involved psychotherapy given by care providers.
Similar issues would apply to meta-analyses of surgical or educational interventions, physiotherapy, occupational therapy or speech therapy, where nested trial designs have been used. Where a published trial analysis has adequately allowed for the care provider, there would be no need to make further allowance in a summary-data meta-analysis. Problems only arise where no allowance has been made at the trial-level or where an inappropriate model has been used. In the current context it is likely that no allowance will have been made at the trial-level. As such, the problems outlined in this paper are expected to be quite common in practice.

The past decade has seen growing interest in the specific methodological challenges faced in the meta-analysis of randomised trials with correlated data. Methods have been proposed for pooling trials with repeated-measures [13-16], for crossover trials [17-20], and for cluster-randomised designs [21-26]. What is common across this literature is a consideration of the impact of within-trial clustering when combining data from trials with complex data structures, particularly where this has been ignored in published trial analyses. Drawing on this literature, the Cochrane Handbook [27] cites methods for the meta-analysis of cluster-randomised and crossover trials. It briefly mentions clustering in individually randomised trials arising from health professionals but gives no specific guidance beyond stating that the issues are similar to those in cluster randomised trials, citing Lee and Thompson [7]. It makes no mention of treatment-related clustering effects, which may arise in individually-, or indeed in cluster-, randomised trials where interventions are delivered by care providers.

The presence of between-trial heterogeneity in ICCs for care providers raises further issues not previously considered in the literature. This heterogeneity might arise from disparities in the cluster or patient level variances across trials. Possible causes could be differences in the level of treatment standardisation or patient eligibility criteria between trials. One option would be to estimate separate ICCs in each trial for each arm. An alternative might be to estimate a single ICC across trials for each arm. Here, treatment-specific ICCs are pooled across trials. A further option would be to adopt a middle road and investigate the use of meta-regression models for the variance parameters.

This paper considers methods for meta-analysing absolute mean differences from individually-randomised trials with two-level nested designs and treatment-related clustering. Both fixed- and random-effects meta-analysis models are considered, along with both summary-data and individual-patient-data (IPD) approaches. As with any meta-analysis of absolute mean differences using a summary-data approach, the sample means, variances and sizes are needed in each trial arm.
To implement the methods described here, the ICC and average cluster size are also required for each trial by arm. The IPD approach assumes researchers have collected cluster identifiers, linking clusters to participants. The feasibility of obtaining these is commented on in the discussion.

We begin in section 2 by outlining the example that motivated this work. In section 3 we go on to review the recommended model at the trial level for fully and partially nested therapist designs. We then extend standard summary-data and IPD approaches to the meta-analysis of absolute mean differences in sections 4 and 5, respectively, outlining meta-regression models in section 6 and illustrating the proposed methods with our example in section 7. Section 8 contains a discussion, including limitations.

Focusing initially on absolute mean differences has several advantages. Firstly, their large-sample estimates are unbiased, their sampling variances are independent of the population parameter, and their sampling distribution is normal [28]. As this is not the case for standardised mean differences, this avoids some of the added complications encountered when pooling the latter, allowing the general implications to be considered first. A separate paper, drawing on earlier work [10], is currently in preparation focusing on problems associated with pooling standardised mean differences in this context.

2. MOTIVATING EXAMPLE

The main point of contact for patients presenting in primary care in the UK is their general practitioner (GP) and associated primary care team. One in three is estimated to be affected by mental health problems [29]. The case for providing psychological therapies, including counselling, within the NHS has been made, with a rapid rise in counselling in primary care seen since 1990. Half the general practices in England were estimated to have a counsellor attached by 2000 [30]. The background of counsellors working in this setting is variable [31]. Counselling is typically brief, usually involving 6 to 10 sessions, each of 50 minutes [32]. The counselling process is characterised by three stages, operating by means of the relationship between the counsellor and the patient [31]. The focus is initially on building trust. The counsellor encourages the patient to describe the situation that is affecting them and makes a systematic assessment. The emphasis then turns to creating changes which give the patient additional resources they can subsequently draw upon. The way this is done depends on the theoretical model the counsellor is applying. Finally, alternative means of using the resources are considered, put into action and reflected upon. It is usual for counsellors to apply eclectic therapeutic approaches for a wide range of social and clinical problems.

Bower and Rowland [33] published a systematic review and meta-analysis of the clinical and cost-effectiveness of counselling in primary care, including eight trials.
The largest meta-analysis compared counselling plus GP care to GP care alone, using the short-term outcomes measuring the extent of mental health symptoms. Each trial could be viewed as having a partially nested therapist design, with counsellors in the intervention but not the control arm. There was a single counsellor per patient. This meta-analysis gave a standardised mean difference (SMD) of -0.24 (95% CI -0.38 to -0.10). The primary meta-analysis assumed a common underlying treatment effect across trials (i.e. a fixed-effects meta-analysis model) while a sensitivity analysis assumed the population treatment effects were normally distributed (i.e. a random-effects meta-analysis model). Neither made allowance for within-trial clustering due to counsellors or for between-arm heteroscedasticity.

As four of the trials [34-37] reported the Beck Depression Inventory (BDI) [38], allowing a meta-analysis of the absolute mean differences, this subset will serve to illustrate the methods outlined below. The BDI is one of the most widely used instruments for measuring the severity of depression. It is a 21-item self-report questionnaire, with total scores ranging from 0 to 63. Higher scores indicate more severe depressive symptoms. While a minimally important clinical difference for the BDI in this population has not been defined, a change of 4 to 5 points, corresponding to 0.5 standard deviations, is generally regarded to be minimally important.

Although the trials all had partially nested designs, Friedli et al [35] and King et al [36] used a treatment manual, training or monitoring to standardise the delivery of counselling. Chilvers et al [34] and Simpson et al [37], instead, took a pragmatic approach. Patient eligibility was restricted to depression, or comorbid depression and anxiety, in Chilvers et al [34], King et al [36] and Simpson et al [37]. Friedli et al [35], in contrast, accepted a broad set of referrals. As such, this subset of trials also serves to illustrate meta-regression models for the variance parameters.

3. STATISTICAL MODELLING OF TWO-LEVEL NESTED TRIALS

First consider a cluster-randomised trial in which $J$ clusters are randomly allocated to one of two treatments, with the only source of clustering in a fully nested design being recruitment-related. Suppose $y_i$ is a continuous outcome for the $i$th patient, where $i = 1, \ldots, N$, $\theta$ is the treatment effect, $x_i$ and $\beta$ are matrices signifying fixed patient or cluster level baseline covariates and their coefficients, and $K_i$ is an indicator variable for the intervention versus control. For simplicity of presentation let $x_i\beta$ equal $\mu$, where $\mu$ is the constant. Using Goldstein's [39] notation, between-cluster variation can be represented by a random effect $u^{(2)}_{\mathrm{cluster}(i)}$ with distribution $N(0, \sigma^2_u)$; $e^{(1)}_i$, distributed $N(0, \sigma^2_e)$, is the patient level error term. A random-intercept model for the outcome for the $i$th patient in the $k$th treatment is therefore appropriate, given by

$$y_i = \mu + \theta K_i + u^{(2)}_{\mathrm{cluster}(i)} + e^{(1)}_i \tag{1}$$

In this notation, the bracketed superscript refers to the level of the random effect and $\mathrm{cluster}(i)$ in the subscript is the mapping of patients to clusters. Intra-cluster variability is measured by a single intraclass correlation coefficient defined using a variance components model by $\rho = \sigma^2_u / (\sigma^2_u + \sigma^2_e)$.

Consider now any randomised trial in which care providers are allocated to patients within two treatments ($k = 0, 1$) in a fully nested design. In the context of an individually randomised trial, Roberts and Roberts [6] suggest the following two-level heteroscedastic model

$$y_i = \mu + \theta K_i + u^{(2)}_{\mathrm{therapist}(i)0}(1 - K_i) + u^{(2)}_{\mathrm{therapist}(i)1} K_i + e^{(1)}_{i0}(1 - K_i) + e^{(1)}_{i1} K_i \tag{2}$$

Model (2) would also be appropriate for a cluster randomised trial in which care providers are randomly allocated to two treatments, because one source of clustering is treatment provision and therefore treatment-related. In this parameterisation, $u^{(2)}_{\mathrm{therapist}(i)0}$ and $u^{(2)}_{\mathrm{therapist}(i)1}$ are random intercepts for the control and intervention arms respectively, distributed $N(0, \sigma^2_{u0})$ and $N(0, \sigma^2_{u1})$, with covariance zero, as they relate to independent samples. Note there are also separate patient level error terms for the control and intervention arms rather than just one across arms, given respectively by $e^{(1)}_{i0}$ and $e^{(1)}_{i1}$, and distributed $N(0, \sigma^2_{e0})$ and $N(0, \sigma^2_{e1})$, included to prevent bias in the estimation of $\sigma^2_{u0}$ and $\sigma^2_{u1}$ [6]. Separate intraclass correlation coefficients under a variance components model are then

$$\rho_0 = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}} \quad \text{and} \quad \rho_1 = \frac{\sigma^2_{u1}}{\sigma^2_{u1} + \sigma^2_{e1}}.$$

Where an individually-randomised trial has a partially-nested design, the random intercept for the control arm is constrained to equal zero so $u^{(2)}_{\mathrm{therapist}(i)0}(1 - K_i)$ is dropped from the model, giving

$$y_i = \mu + \theta K_i + u^{(2)}_{\mathrm{therapist}(i)1} K_i + e^{(1)}_{i0}(1 - K_i) + e^{(1)}_{i1} K_i \tag{3}$$

Each patient in the treatment arm without clustering is assumed to be a cluster of size one.
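As an informal illustration of the partially nested model, the sketch below simulates a clustered intervention arm and an unclustered control arm, then recovers the intervention-arm ICC with a simple one-way ANOVA (method-of-moments) estimator. All numerical values, the function names and the choice of estimator are assumptions made for illustration; they are not taken from the trials or from the estimation methods used in this paper.

```python
import random
import statistics


def icc(sigma_u2, sigma_e2):
    """ICC for one arm: therapist variance divided by total variance."""
    return sigma_u2 / (sigma_u2 + sigma_e2)


def simulate_clustered_arm(n_therapists, patients_per_therapist,
                           arm_mean, sigma_u2, sigma_e2, rng):
    """Return one list of patient outcomes per therapist (equal cluster sizes)."""
    clusters = []
    for _ in range(n_therapists):
        u_j = rng.gauss(0.0, sigma_u2 ** 0.5)  # therapist random intercept
        clusters.append([arm_mean + u_j + rng.gauss(0.0, sigma_e2 ** 0.5)
                         for _ in range(patients_per_therapist)])
    return clusters


def anova_icc(clusters):
    """Method-of-moments (one-way ANOVA) ICC estimate for equal cluster sizes."""
    m = len(clusters[0])
    cluster_means = [statistics.fmean(c) for c in clusters]
    msb = m * statistics.variance(cluster_means)  # between-therapist mean square
    msw = statistics.fmean([statistics.variance(c) for c in clusters])  # within
    sigma_u2_hat = max(0.0, (msb - msw) / m)
    return sigma_u2_hat / (sigma_u2_hat + msw)


rng = random.Random(2015)  # arbitrary seed
# Hypothetical values: therapist variance 4 and patient variance 36 (ICC 0.1).
intervention = simulate_clustered_arm(1000, 8, 12.0, 4.0, 36.0, rng)
# Control arm without therapists: independent patients, therapist variance 0.
control = [15.0 + rng.gauss(0.0, 6.0) for _ in range(8000)]

print("true intervention ICC:", icc(4.0, 36.0))
print("estimated intervention ICC:", round(anova_icc(intervention), 3))
print("control ICC under partial nesting:", icc(0.0, 36.0))
```

The large number of simulated therapists is chosen only so that the estimate is stable; with the handful of therapists typical of a single trial, the same estimator would be far less precise.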

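4. SUMMARY-DATA META-ANALYSIS METHODS

4.1 Fixed- and Random-Effects Meta-Analysis Models without Clustering

In the simplest meta-analysis model, an underlying treatment effect common to all $H$ trials is assumed, such that $\theta_1 = \cdots = \theta_H = \theta$. The fixed-effects model [40] implies

$$\hat{\theta}_h = \theta + e_h, \quad h = 1, \ldots, H, \tag{4}$$

where $\hat{\theta}_h$ is the treatment effect observed in trial $h$, $\theta$ is the population value, and $e_h$ are the sampling errors, with $e_h \sim N(0, \sigma^2_{\hat{\theta}_h})$. Heterogeneity in the treatment effects across trials is ascribed to sampling error. The arguably more realistic random-effects model permits the population treatment effects to vary across trials, with $\theta_h = \theta + \nu_h$ and $\nu_h \sim N(0, \tau^2)$, where $\tau^2$ is the between-trial variance and $\theta$ is now the mean of the population treatment effects. Thus [40]

$$\hat{\theta}_h = \theta + \nu_h + e_h, \quad h = 1, \ldots, H, \tag{5}$$

and $\hat{\theta}_h \sim N(\theta, \sigma^2_{\hat{\theta}_h} + \tau^2)$. The total variance of $\hat{\theta}_h$ is therefore $\sigma^2_{T_h} = \sigma^2_{\hat{\theta}_h} + \tau^2$, the sum of the within and between trial variances. The random-effects model reduces to a fixed-effects meta-analysis model when $\tau^2$, the between trial variance, is zero.

The uniformly minimum-variance unbiased estimate of a pooled treatment effect is given by [41-42]

$$\hat{\theta}_w = \frac{\sum_{h=1}^{H} w_h \hat{\theta}_h}{\sum_{h=1}^{H} w_h} \tag{6}$$

where $w_h = 1 / \sigma^2_{T_h}$ is the weight assigned to trial $h$ under a random-effects meta-analysis model. Its standard error is given by

$$\sigma_{\hat{\theta}_w} = \sqrt{\frac{1}{\sum_{h=1}^{H} w_h}} \tag{7}$$

so an approximate two-sided $100(1 - \alpha)\%$ confidence interval for the pooled treatment effect is given by

$$\hat{\theta}_w \pm z_{1 - \alpha/2}\, \sigma_{\hat{\theta}_w} \tag{8}$$

It is usual for $\sigma^2_{\hat{\theta}_h}$ and $\tau^2$ to simply be replaced by their respective estimators $\hat{\sigma}^2_{\hat{\theta}_h}$ and $\hat{\tau}^2$, although Sidik and Jonkman [43] suggest an alternative approach that is robust to sampling errors in the estimated weights. A commonly used estimator of $\tau^2$ is DerSimonian-Laird's (D-L) [44] method of moments estimator

$$\hat{\tau}^2 = \max\left\{0,\; \frac{Q - (H - 1)}{\sum_{h=1}^{H} \hat{\sigma}^{-2}_{\hat{\theta}_h} - \sum_{h=1}^{H} \hat{\sigma}^{-4}_{\hat{\theta}_h} \Big/ \sum_{h=1}^{H} \hat{\sigma}^{-2}_{\hat{\theta}_h}}\right\} \tag{9}$$

The Q-statistic is estimated by $Q = \sum_{h=1}^{H} (\hat{\theta}_h - \bar{\theta})^2 / \hat{\sigma}^2_{\hat{\theta}_h}$, where $\bar{\theta}$ is the inverse-variance weighted mean of the $\hat{\theta}_h$. Variation in the precision of the trial estimates between trials is indexed by the denominator of (9), $\sum_{h} \hat{\sigma}^{-2}_{\hat{\theta}_h} - \sum_{h} \hat{\sigma}^{-4}_{\hat{\theta}_h} / \sum_{h} \hat{\sigma}^{-2}_{\hat{\theta}_h}$.

In order to obtain the standard error of the absolute mean difference from each trial, $\hat{\sigma}_{\hat{\theta}_h}$ (used in calculating trial weights and the standard error of the pooled treatment effect), one needs to first derive the sampling distribution of the absolute mean difference. Where outcomes are statistically independent within and across arms, suppose $\mu_{1h}$ and $\mu_{0h}$ are the true mean outcomes in the intervention and control arm of trial $h$ respectively. The population mean difference is then

$$\theta_{MD,h} = \mu_{1h} - \mu_{0h} \tag{10}$$

The outcome of patient $i$ in the $k$th arm of the $h$th study is denoted by $y_{ikh}$. Assuming the population variances are homogeneous ($\sigma^2_{1h} = \sigma^2_{0h} = \sigma^2_h$) and the sample means ($\bar{y}_{1h}$ and $\bar{y}_{0h}$), variances ($s^2_{1h}$ and $s^2_{0h}$) and sizes ($n_{1h}$ and $n_{0h}$) are available, the trial estimate and its sampling distribution are given by [45]

$$\hat{\theta}_{MD,h} = \bar{y}_{1h} - \bar{y}_{0h} \sim N\left(\mu_{1h} - \mu_{0h},\; \sigma^2_h\left(\frac{1}{n_{1h}} + \frac{1}{n_{0h}}\right)\right), \quad h = 1, \ldots, H, \tag{11}$$

where

$$\hat{\sigma}^2_{\hat{\theta}_{MD,h}} = s^2_h\left(\frac{1}{n_{1h}} + \frac{1}{n_{0h}}\right) \quad \text{and} \quad s^2_h = \frac{(n_{1h} - 1)s^2_{1h} + (n_{0h} - 1)s^2_{0h}}{n_{1h} + n_{0h} - 2}.$$

If the outcome variances are heterogeneous across arms (i.e. $\sigma^2_{1h} \neq \sigma^2_{0h}$) with unknown ratio, the trial estimate $\hat{\theta}_{MD,h}$ is unaffected but its variance becomes

$$\sigma^2_{\hat{\theta}_{MD,h}} = \frac{\sigma^2_{1h}}{n_{1h}} + \frac{\sigma^2_{0h}}{n_{0h}} \tag{12}$$

The variances are replaced by $s^2_{1h}$ and $s^2_{0h}$ to give the estimator $\hat{\sigma}^2_{\hat{\theta}_{MD,h}}$, a scenario that is classically referred to as the Behrens-Fisher problem [46].

4.2 Sampling Distribution of the Summary Statistic for Two-Level Nested Designs

Suppose now that the outcome of patient $i$ is nested within the $j$th cluster of arm $k$ and is denoted by $y_{ijkh}$. For the sake of generality, assume that model (2) applies. Then assume, for each trial $h$, that a sample of $J_{kh}$ clusters of size $m_{kh}$ is assigned to each arm under a fully nested design. The trial estimate $\hat{\theta}_{MD,h} = \bar{y}_{1h} - \bar{y}_{0h}$ remains an unbiased estimator of $\theta_{MD,h}$, but the sample means are now given by

$$\bar{y}_{kh} = \frac{\sum_{j=1}^{J_{kh}} \sum_{i=1}^{m_{kh}} y_{ijkh}}{\sum_{j=1}^{J_{kh}} m_{kh}}, \quad \text{with sample variances} \quad s^2_{kh} = \frac{\sum_{j=1}^{J_{kh}} \sum_{i=1}^{m_{kh}} (y_{ijkh} - \bar{y}_{kh})^2}{n_{kh} - \mathit{deff}_{kh}}, \tag{13}$$

where the design effect $\mathit{deff}_{kh} = 1 + (m_{kh} - 1)\rho_{kh}$ in the clustered arms when the cluster sizes are equal within each arm of each trial.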
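The role of the design effect can be made concrete with a short sketch. It assumes equal cluster sizes within an arm and uses the standard result that clustering inflates the variance of an arm mean by the design effect; the function names and the numbers in the usage lines are hypothetical.

```python
def design_effect(cluster_size, icc):
    """deff = 1 + (m - 1) * rho for an arm with equal cluster sizes m."""
    if cluster_size < 1:
        raise ValueError("cluster size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return 1.0 + (cluster_size - 1) * icc


def variance_of_arm_mean(variance, n, cluster_size, icc):
    """Sampling variance of an arm mean, inflated by the design effect."""
    return design_effect(cluster_size, icc) * variance / n


def effective_sample_size(n, cluster_size, icc):
    """Number of independent patients carrying the same information."""
    return n / design_effect(cluster_size, icc)


# Hypothetical arm: 50 patients, 11 per therapist, ICC 0.05, outcome variance 100.
print(design_effect(11, 0.05))
print(variance_of_arm_mean(100.0, 50, 11, 0.05))
print(effective_sample_size(50, 11, 0.05))
# An unclustered arm (clusters of size one) has a design effect of exactly 1.
print(design_effect(1, 0.05))
```

Even a small ICC matters once clusters are moderately large, which is why the ICC and average cluster size are needed for each trial by arm.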

For this scenario, Kwong and Higgins [unpublished] gave the sampling distribution of $\hat{\theta}_{MD,h}$ as

$$\hat{\theta}_{MD,h} = \bar{y}_{1h} - \bar{y}_{0h} \sim N\left(\mu_{1h} - \mu_{0h},\; \frac{\mathit{deff}_{1h}\,\sigma^2_{1h}}{n_{1h}} + \frac{\mathit{deff}_{0h}\,\sigma^2_{0h}}{n_{0h}}\right), \quad h = 1, \ldots, H, \tag{14}$$

where

$$\hat{\sigma}^2_{\hat{\theta}_{MD,h}} = \frac{\mathit{deff}_{1h}\, s^2_{1h}}{n_{1h}} + \frac{\mathit{deff}_{0h}\, s^2_{0h}}{n_{0h}}.$$

The sampling variance simplifies to

$$\mathrm{Var}\left(\hat{\theta}_{MD,h}\right) = \mathrm{Var}\left(\frac{\sum_{j=1}^{J_{1h}} \sum_{i=1}^{m_{1h}} y_{ij1h}}{n_{1h}} - \frac{\sum_{i=1}^{n_{0h}} y_{i0h}}{n_{0h}}\right) = \frac{\mathit{deff}_{1h}\,\sigma^2_{1h}}{n_{1h}} + \frac{\sigma^2_{0h}}{n_{0h}} \tag{15}$$

in the case of partial nesting and to $\mathit{deff}_h\, \sigma^2_h \left(\frac{1}{n_{1h}} + \frac{1}{n_{0h}}\right)$ for cluster randomised trials where the only source of clustering is recruitment-related.

5. INDIVIDUAL-PATIENT-DATA META-ANALYSIS METHODS

Going back to Goldstein's [39] notation, where $y_i$ denotes a continuous outcome for the $i$th patient, a standard fixed-effects meta-analysis model [40, 47] is

$$y_i = \mu_{\mathrm{trial}(i)} + \theta K_i + e^{(1)}_i \tag{16}$$

where $\mu_h$ represents the mean outcome in the control arm of trial $h$ and $\theta$ the fixed treatment effect. It is commonly assumed that patient residuals $e^{(1)}_i$ are iid $N(0, \sigma^2_e)$, although relaxing this has been discussed [40, 47]. It is also possible to let the patient variance vary across arms, in which case the model becomes

$$y_i = \mu_{\mathrm{trial}(i)} + \theta K_i + e^{(1)}_{i0}(1 - K_i) + e^{(1)}_{i1} K_i \tag{17}$$

with the $e^{(1)}_{ik}$ iid $N(0, \sigma^2_{ek})$. This model can be extended to give the fixed-effects meta-analysis corresponding to a two-level heteroscedastic model by combining model (16) with that given by equation (2),

$$y_i = \mu_{\mathrm{trial}(i)} + \theta K_i + u^{(2)}_{\mathrm{therapist}(i)0}(1 - K_i) + u^{(2)}_{\mathrm{therapist}(i)1} K_i + e^{(1)}_{i0}(1 - K_i) + e^{(1)}_{i1} K_i \tag{18}$$

with the random effects iid $N(0, \sigma^2_{uk})$. If all the trials are partially nested, $u^{(2)}_{\mathrm{therapist}(i)0}$ can be omitted from the model, corresponding to equation (3).

A standard random-effects meta-analysis is one in which the trial effects are fixed but the treatment effect is permitted to vary randomly across trials [40, 47]. That is, the term $\nu^{(3)}_{\mathrm{trial}(i)} K_i$ is added to model (16), where the $\nu^{(3)}_{\mathrm{trial}(i)}$ are iid $N(0, \tau^2)$ and the random effects are mutually independent. The random-effects meta-analysis corresponding to a two-level heteroscedastic model for the trials is given by

$$y_i = \mu_{\mathrm{trial}(i)} + \theta K_i + \nu^{(3)}_{\mathrm{trial}(i)} K_i + u^{(2)}_{\mathrm{therapist}(i)0}(1 - K_i) + u^{(2)}_{\mathrm{therapist}(i)1} K_i + e^{(1)}_{i0}(1 - K_i) + e^{(1)}_{i1} K_i \tag{19}$$

As before, $u^{(2)}_{\mathrm{therapist}(i)0}$ is constrained to zero, and the term omitted from the model, if all trials are partially nested.

Models (18) and (19) constrain the therapist variance to be equal across trials for each treatment. An alternative would be a saturated model in which all trials are allowed to have their own therapist variance. Suppose $H_{h,\mathrm{trial}(i)}$ is an indicator variable equal to 1 when $\mathrm{trial}(i) = h$ and 0 otherwise; the saturated model can then be defined as follows:

$$y_i = \mu_{\mathrm{trial}(i)} + \theta K_i + \nu^{(3)}_{\mathrm{trial}(i)} K_i + \sum_{h=1}^{H} H_{h,\mathrm{trial}(i)} \left[ u^{(2)}_{\mathrm{therapist}(i)0h}(1 - K_i) + u^{(2)}_{\mathrm{therapist}(i)1h} K_i + e^{(1)}_{i0h}(1 - K_i) + e^{(1)}_{i1h} K_i \right] \tag{20}$$

With $4H$ variance parameters in a meta-analysis of fully-nested trials and $3H$ variance parameters in a meta-analysis of partially-nested trials, Model (20) is likely to be difficult to fit. It was not possible in our motivating example. One option is to add constraints to the saturated model that can be motivated by the characteristics of the trials, a possibility we now consider.
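For comparison with the IPD models, the summary-data pooling described in section 4.1 (inverse-variance weighting with a DerSimonian-Laird estimate of the between-trial variance) can be sketched as follows. The sketch takes per-trial mean differences and their sampling variances as given; for nested designs these variances would already include the design effect. The function names and the two-trial input are illustrative assumptions, not data from the counselling trials.

```python
import math


def pool(estimates, variances, tau2=0.0):
    """Inverse-variance pooled estimate and its standard error.

    tau2 = 0 gives the fixed-effects result; tau2 > 0 gives random effects.
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def dersimonian_laird_tau2(estimates, variances):
    """Method-of-moments estimate of the between-trial variance."""
    w = [1.0 / v for v in variances]
    fixed, _ = pool(estimates, variances)
    q = sum(wi * (est - fixed) ** 2 for wi, est in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(estimates) - 1)) / c)


def random_effects_meta_analysis(estimates, variances, z=1.96):
    """Pooled effect, standard error, approximate 95% CI and tau-squared."""
    tau2 = dersimonian_laird_tau2(estimates, variances)
    pooled, se = pool(estimates, variances, tau2)
    return pooled, se, (pooled - z * se, pooled + z * se), tau2


# Hypothetical input: two trials with mean differences 1 and 3, each with variance 1.
pooled, se, ci, tau2 = random_effects_meta_analysis([1.0, 3.0], [1.0, 1.0])
print(pooled, se, ci, tau2)
```

With only a handful of trials, as in the motivating example, the between-trial variance is poorly estimated, so the random-effects interval should be read with that in mind.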

6. META-REGRESSION MODELS USING INDIVIDUAL-PATIENT-DATA

Meta-regression models have been described that allow the pooled treatment effect to vary according to trial characteristics [47-49], such as whether the trial intervention was manualised or the trial quality. These models explore explanations for between-trial variation and require large numbers of trials. Incorporation of a categorical trial-level covariate into model (19) gives

$$y_i = \beta_{\mathrm{trial}(i)} + \delta K_i + \gamma x_i K_i + u^{(3)}_{\mathrm{trial}(i)} K_i + u^{(2)}_{\mathrm{therapist}(i)0}(1 - K_i) + u^{(2)}_{\mathrm{therapist}(i)1} K_i + e^{(1)}_{i0}(1 - K_i) + e^{(1)}_{i1} K_i \tag{21}$$

where $\gamma$ is a fixed treatment-by-covariate interaction effect and $x_i$ is an indicator variable for the fixed trial-level covariate. Further covariates could be added.

Where data are available on therapist-level characteristics such as training or experience, one might be interested in exploring whether the treatment effect varies according to these. Here, the covariate varies within trials, but is the same for every patient seen by a therapist. As the number of therapists per trial is usually small, it may only begin to be feasible to address such questions in a meta-regression. As with other IPD meta-regressions, patient-level covariates, such as severity, can also be investigated [47]. In this case, the covariate varies between patients within therapists and trials.

Up to this point, the meta-regressions considered are of fixed effects, and in particular of the treatment effect. Meta-regressions of random parameters may also be of interest. A complex random structure may be realistic if the trial designs vary. Under these circumstances, there is reason to expect between-trial variation in therapist or patient level random effects even if there is insufficient statistical power available to detect it. It is realistic to suppose that patient and therapist level variances are affected by standardising patient or therapist characteristics and behaviour via the use of selection criteria and therapist training, certification, monitoring and supervision.
If the trial designs are comparable in all other respects, a categorical trial-level covariate can be incorporated for the therapist random intercept in model (19). For example, where $T_i$ is an indicator variable that is equal to 1 if therapist characteristics or behaviour are standardised and 0 otherwise,

$$\begin{aligned} y_i = {} & \beta_{\mathrm{trial}(i)} + \delta K_i + u^{(3)}_{\mathrm{trial}(i)} K_i \\ & + u^{(2)}_{\mathrm{therapist}(i)00}(1 - K_i)(1 - T_i) + u^{(2)}_{\mathrm{therapist}(i)01}(1 - K_i) T_i \\ & + u^{(2)}_{\mathrm{therapist}(i)10} K_i (1 - T_i) + u^{(2)}_{\mathrm{therapist}(i)11} K_i T_i \\ & + e^{(1)}_{i0}(1 - K_i) + e^{(1)}_{i1} K_i \end{aligned} \tag{22}$$

where the four $u^{(2)}_{\mathrm{therapist}(i)kt}$ are random intercepts for the control and intervention arms (k=0,1) in the unstandardised and standardised trials (t=0,1) respectively, distributed $N(0, \sigma^2_{ukt})$, with covariance zero, as they relate to independent samples.

This might be considered if some of the trials used treatment manuals, while others did not, or if therapists were selected for their expertise, given training, accreditation, monitoring or supervision in some trials but not others. It is assumed that these design features do not have a simultaneous effect at the patient level in Model (22), as this leads to the saturated model (20) in our motivating example.

One could instead incorporate a categorical trial-level covariate for the patient-level residual error. For example, where $P_i$ is 1 for trials where patient characteristics are standardised and 0 otherwise,

$$\begin{aligned} y_i = {} & \beta_{\mathrm{trial}(i)} + \delta K_i + u^{(3)}_{\mathrm{trial}(i)} K_i + u^{(2)}_{\mathrm{therapist}(i)0}(1 - K_i) + u^{(2)}_{\mathrm{therapist}(i)1} K_i \\ & + e^{(1)}_{i00}(1 - K_i)(1 - P_i) + e^{(1)}_{i01}(1 - K_i) P_i \\ & + e^{(1)}_{i10} K_i (1 - P_i) + e^{(1)}_{i11} K_i P_i \end{aligned} \tag{23}$$

where the four $e^{(1)}_{ikp}$ are patient residuals for the control and intervention arms (k=0,1) in the unstandardised and standardised trials (p=0,1) respectively, distributed $N(0, \sigma^2_{ekp})$, with covariance zero, as they again relate to independent samples. This might be considered if trials adopt a mix of explanatory and pragmatic approaches to patient eligibility.

Models (22) and (23) may be considered parsimonious or constrained versions of the saturated model (20). The potential complexity increases with the variability in the trial designs. If the number of trials is small, as we have seen, there may be a trade-off between a realistic model for the random effects and computational feasibility. In theory, these models could be extended to include therapist- and patient-level predictors of the random effects.
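The structure of a meta-regression of the therapist random intercept can be made concrete with a small simulation. The sketch below is illustrative only and is not the code used for the analyses reported here: it generates data for one partially nested trial in which the therapist standard deviation in the intervention arm depends on whether therapist behaviour is standardised. The function name, argument names and all numerical defaults are hypothetical choices.

```python
import numpy as np


def simulate_partially_nested_trial(rng, n_per_arm, n_therapists, standardised,
                                    control_mean=20.0, effect=-2.5,
                                    sd_therapist=(2.0, 1.0),
                                    sd_patient=(9.0, 9.0)):
    """Simulate one partially nested trial (all values are hypothetical).

    sd_therapist: therapist SD in (unstandardised, standardised) trials.
    sd_patient:   patient residual SD in the (control, intervention) arm.
    Control patients see no therapist and are coded therapist = -1.
    """
    t = int(standardised)                      # trial-level indicator T
    arm = np.repeat([0, 1], n_per_arm)         # treatment indicator K
    therapist = np.full(2 * n_per_arm, -1)
    therapist[arm == 1] = rng.integers(0, n_therapists, size=n_per_arm)

    # Therapist random intercepts, variance depending on standardisation.
    u = rng.normal(0.0, sd_therapist[t], size=n_therapists)
    # Patient residuals, variance depending on arm.
    e = rng.normal(0.0, np.where(arm == 1, sd_patient[1], sd_patient[0]))

    y = control_mean + effect * arm + e
    y[arm == 1] += u[therapist[arm == 1]]      # clustering in one arm only
    return {"y": y, "arm": arm, "therapist": therapist}


rng = np.random.default_rng(2024)
trial = simulate_partially_nested_trial(rng, n_per_arm=60, n_therapists=8,
                                        standardised=True)
```

Repeating the call with `standardised=False` for some trials and stacking the results gives a data set in which the counsellor ICC differs between the two groups of trials, which is the pattern such a meta-regression is intended to capture.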
As an aside, Model (20) can also be simplified to allow inclusion of fully and partially nested trials and inclusion of trials with and without clustering effects. In the case of a mixture of fully and partially nested designs, where $X_i$ is an indicator variable equal to 1 when the trial has a fully nested design and 0 if it is partially nested,

$$\begin{aligned} y_i = {} & \beta_{\mathrm{trial}(i)} + \delta K_i + u^{(3)}_{\mathrm{trial}(i)} K_i + u^{(2)}_{\mathrm{therapist}(i)0}(1 - K_i) X_i + u^{(2)}_{\mathrm{therapist}(i)1} K_i \\ & + e^{(1)}_{i00}(1 - K_i)(1 - X_i) + e^{(1)}_{i01}(1 - K_i) X_i + e^{(1)}_{i1} K_i \end{aligned} \tag{24}$$

Here, the residual error in the control arm is allowed to differ across trial designs, ensuring the therapist ICC in the control arm is based on the subset of trials with fully nested designs. As before, it is assumed that the therapist ICC in the control arm is homogeneous for all fully nested trials. If the independence assumption is reasonable in some of the trials, Model (24) can be extended, with $C_i$ an indicator variable equal to 1 if a trial has any clustering effects and 0 otherwise, to give

$$\begin{aligned} y_i = {} & \beta_{\mathrm{trial}(i)} + \delta K_i + u^{(3)}_{\mathrm{trial}(i)} K_i + u^{(2)}_{\mathrm{therapist}(i)0}(1 - K_i) X_i C_i + u^{(2)}_{\mathrm{therapist}(i)1} K_i C_i \\ & + e^{(1)}_{i000}(1 - K_i)(1 - C_i) + e^{(1)}_{i001}(1 - K_i)(1 - X_i) C_i + e^{(1)}_{i011}(1 - K_i) X_i C_i \\ & + e^{(1)}_{i10} K_i (1 - C_i) + e^{(1)}_{i11} K_i C_i \end{aligned} \tag{25}$$

Each random intercept applies only to the clustered arms. The residual error again varies by trial design. For non-clustered trials, it is $e^{(1)}_{i000}(1 - K_i)(1 - C_i)$ in the control arm and $e^{(1)}_{i10} K_i (1 - C_i)$ in the intervention arm. The latter term can be omitted if the patient-level variance is assumed to be homogeneous across arms.

An example, albeit a rather contrived one, in which fully-nested, partially-nested and non-clustered trials might be pooled is a comparison between counselling and cognitive-behavioural therapy where both have web-based and face-to-face versions. Some trials might compare web-based versions, thereby incorporating no therapist involvement, and so be non-clustered. Others might compare face-to-face versions to web-based versions and be partially-nested. Others might compare the face-to-face versions and be fully-nested. Another situation in which one might be justified in considering Model (25) is when the number of therapists cannot be identified in one or more of the trials. In this case they may be included as non-clustered.

7. APPLICATION TO THE MOTIVATING EXAMPLE

Short-term outcomes relating to the Beck Depression Inventory (BDI) were available for 460 patients from four [34-37] of the counselling in primary care trials.
Of these, 224 (49%) were allocated counselling with one of 39 counsellors. Overall, the cluster sizes ranged from 1 to 33, with a median of 3 and an IQR of 1 to 8. Data were available for 5 or more patients for 18 of the counsellors.

Table 1 gives descriptive statistics for the four included trials. It can be seen that the trials with the largest treatment effects also had the smallest counsellor ICCs. The ANOVA estimates of the counsellor ICC are negative for two of the four trials. This is possible because ANOVA estimation is consistent with a common correlation model rather than a variance components model [50]. By definition, the lower bound on the ICC is zero for a variance components model since a between-cluster variance cannot be negative. It is the design effect rather than the ICC that cannot be negative in ANOVA estimation. If clusters are of size two, the range of the ICC is ±1, but as the cluster size increases the minimum approaches zero. One ICC across trials was initially assumed for the counselling arm.

[Insert Table 1 about here]

7.1 Summary-Data versus Individual-Patient-Data Meta-Analyses

To reflect a common lack of knowledge about the cluster size distribution, equal cluster sizes within trials were assumed for all summary-data meta-analyses. A pooled ICC of 0.033 was used, based on a weighted average of the trial-specific ICCs [10], regardless of the model. IPD models were implemented in MLwiN using RIGLS, due to its flexibility in modelling random effects. RIGLS is comparable to REML [39] implemented in mixed in Stata Version 13. The preceding command xtmixed was updated in Version 11 to permit inclusion of one covariate for the patient level error. The mixed command uses the same syntax but seems to be faster, with a more stable algorithm. Details of the programming for both packages are given as supporting web materials.

Tables 2 and 3 summarise, respectively, the summary-data and IPD estimates and standard errors for the fixed- and random-effects meta-analyses, progressively relaxing independence and common variance assumptions within the trials.
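The remark above Table 1 that an ANOVA estimate of the ICC can be negative, with a lower limit that depends on cluster size, can be checked numerically. The following is a minimal sketch that assumes equal cluster sizes and uses the usual one-way ANOVA estimator; the function names are hypothetical and this is not the estimation code used for Table 1.

```python
import numpy as np


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equal-sized clusters.

    clusters: 2-D array-like with one row per cluster.
    """
    data = np.asarray(clusters, dtype=float)
    k, m = data.shape                          # k clusters of size m
    cluster_means = data.mean(axis=1)
    # Between- and within-cluster mean squares.
    msb = m * ((cluster_means - data.mean()) ** 2).sum() / (k - 1)
    msw = ((data - cluster_means[:, None]) ** 2).sum() / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def icc_lower_bound(m):
    """Smallest value the ANOVA ICC can take for clusters of size m."""
    return -1.0 / (m - 1)


# Identical cluster means but variation within clusters: the estimate
# reaches the lower bound, which is -1 for clusters of size two.
print(anova_icc([[1, 2], [2, 1]]), icc_lower_bound(2))
# Larger clusters pull the lower bound towards zero.
print(icc_lower_bound(11))
```

The bound of $-1/(m-1)$ follows from requiring the design effect $1 + (m-1)\rho$ to be non-negative, which is why it is the design effect rather than the ICC that cannot be negative.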
As all of the trials have partially nested designs, the Level 2 variance, where it applies, is heterogeneous in all analyses. The common variance assumptions therefore relate only to the Level 1 variance.

As can be seen, the pooled mean difference and its standard error for a usual summary-data fixed-effects analysis are -2.43 and 0.89 (95% CI -4.17 to -0.69), indicating that counselling reduces short-term symptoms of depression by an average of 2.4 points and that this reduction is statistically significant at the 5% level. A mean difference of 2.5 points corresponds to a standardised effect size of about 0.25. According to Cohen [51] this represents a small effect. Based on results with similar effects, the authors of the Cochrane review concluded that counselling is associated with modest improvement in short-term outcome and that it may be a useful addition to mental health services in primary care [5]. The equivalent IPD estimate and its standard error are -2.47 and 0.90, with a two-sided 95% CI of -4.23 to -0.71. The similarity of these results implies that bias and sampling error in the summary-data within-trial variance estimates are not important here.

The pooled mean difference and its standard error in the analogous summary-data random-effects analysis are -2.50 and 1.40 (95% CI -5.24 to 0.24). The increase in standard error arises from between-trial heterogeneity in the mean differences. The reduction in BDI is no longer statistically significant. If an IPD approach had been used, the estimate and its standard error would be -2.47 and 1.42 (95% CI -5.25 to 0.31). The slight disparity in standard errors is explained by that of the between-trial variance estimates, which is in turn due to bias arising from sampling error or heterogeneity in the within-trial variances. Even so, the evidence in favour of counselling in primary care is less clear if between-trial heterogeneity is taken into consideration.

[Insert Tables 2 and 3 about here]

The impact of between-arm heteroscedasticity and within-trial clustering is minimal if pooled treatment effects and their standard errors are compared across random-effects summary-data or IPD models (see Figure 1 below). The effect is a little more pronounced for both summary-data and IPD fixed-effects models, however. The disparity between the summary-data and IPD results enlarges as the model becomes more realistic. It is of note that the DerSimonian-Laird (D-L) and IPD between-trial variance estimates differ (see Table 3), with both estimates being larger, assuming independence, where patient-level variances are allowed to differ between arms.
The IPD estimate, in contrast to the D-L estimate, is not only smaller for both clustered models but also smaller for the clustered model where patient-level variances are allowed to differ between arms. IPD estimates of the counsellor ICC are larger than the summary-data estimate of 0.033, varying from model to model. These differences arise, in part, because the variances are estimated simultaneously in an IPD model, making appropriate allowance for all other effects in the model. In this example, the results continue to be dominated by between-trial heterogeneity in the treatment effects. The most realistic IPD pooled mean difference and standard error are -2.51 and 1.45 (95% CI -5.35 to 0.33). The summary-data equivalent is -2.53 and 1.43 (95% CI -5.33 to 0.27). The two are very similar.
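The summary-data calculations compared above can be sketched in a few lines: each trial's variance is inflated by a design effect in the clustered arm and the trial estimates are then pooled with inverse-variance weights, using the DerSimonian-Laird moment estimator for the between-trial variance. This is an illustration under the assumption of equal cluster sizes and a partially nested design, not the code in the supporting web materials; all function and argument names are hypothetical.

```python
import numpy as np


def mean_difference_variance(sd1, n1, sd0, n0, m1, icc1):
    """Variance of a mean difference with clustering in arm 1 only.

    Assumes equal cluster sizes m1 and design effect 1 + (m1 - 1) * icc1.
    """
    deff1 = 1.0 + (m1 - 1.0) * icc1
    return sd1 ** 2 * deff1 / n1 + sd0 ** 2 / n0


def pool_fixed(estimates, variances):
    """Inverse-variance fixed-effects pooled estimate and standard error."""
    y = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    return (w * y).sum() / w.sum(), np.sqrt(1.0 / w.sum())


def pool_dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, standard error and tau^2 (D-L)."""
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v
    fixed = (w * y).sum() / w.sum()
    q = (w * (y - fixed) ** 2).sum()           # Cochran's Q
    c = w.sum() - (w ** 2).sum() / w.sum()
    tau2 = max(0.0, (q - (len(y) - 1)) / c)    # truncated at zero
    w_star = 1.0 / (v + tau2)
    pooled = (w_star * y).sum() / w_star.sum()
    return pooled, np.sqrt(1.0 / w_star.sum()), tau2
```

Because a larger ICC increases each trial's variance, it lowers Cochran's Q and hence the D-L estimate of the between-trial variance, so the pooled random-effects standard error can move in either direction.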

In the IPD case, the confidence interval is marginally wider than the standard random-effects one. The conclusion remains unchanged.

[Insert Figure 1 about here]

7.2 Sensitivity of the Summary-Data Approach to the Choice of Population ICC

The sensitivity of the mean difference and its standard error to the choice of population ICC was explored for ICCs between zero and one. The trial estimates are unaffected as the ICC increases but the pooled estimates become slightly more extreme. This is because King et al [36] has more weight as the ICC increases, in part due to its mean cluster size. This effect is slightly more pronounced for the fixed-effects estimate. The slope of the pooled standard error, when plotted against the population ICC, is not steep, indicating the results are not sensitive to the ICC in the anticipated range (i.e. for ICCs between zero and 0.0). The D-L estimate of the between-trial variance, $\tau^2$, decreases as the ICC increases, implying heterogeneity in mean differences across trials contributes to, rather than simply explaining, heterogeneity between counsellors.

7.3 Meta-Regression of the Random Effects

Table 4 gives results of two meta-regression models, one for the therapist random effect (Model 22) and the other for the patient residual (Model 23). Both explore trial-level sources of heterogeneity in the counsellor ICC, the first treatment standardisation (yes, no) and the second patient eligibility (mixed diagnosis, depression). There were insufficient trials available to fit random-effects meta-regression models in this instance so the results are compared to model (18). As all the trials have partially nested designs, the random intercept for the control arm, $u^{(2)}_{\mathrm{therapist}(i)0}$, is omitted from all models.

[Insert Table 4 about here]

A reduction of 8.7 was seen in the log likelihood by including separate residual terms for trials with mixed and depression patient referrals. The pooled treatment effect reduced very slightly, as did its standard error.
The counsellor ICC was higher when patients were more homogeneous as the patient residual was smaller relative to the counsellor variance.

If distinct therapist-level terms were included for trials standardising counselling and those that did not, the log likelihood reduced by 1.6. The pooled treatment effect increased appreciably, reflecting an association between the trial estimate and counsellor ICC. That is, the trials with the largest estimates (i.e. Friedli [35] and King [36]) also had the smallest counsellor ICCs, so carried more weight in the meta-regression analysis. The standard error was similar to that for Model (18). Since the pooled counsellor ICC is negative for trials standardising counselling, a different parameterisation of the model was used, including a covariance term rather than an explicitly negative estimate, to allow the model to converge. A covariance, in contrast to a variance, can be negative. Including the covariance between the therapist level random effects in place of the negative variance therefore indirectly enabled a negative variance to be estimated within a variance components model. The counsellor ICC was lower when counselling was standardised as the counsellor variance was smaller relative to the patient residual. This corresponds to the ANOVA estimates of the ICC in Table 1.

The standard errors for the variance estimates are large due to the number of trials and counsellors. It was not computationally possible to simultaneously allow for heterogeneity from both sources (i.e. Models 22 and 23 combined) or to fit the model of choice (i.e. a random-effects meta-regression). The potential to do so when the number of trials available is larger is clear, however. The facility to disentangle the predictors of the components of an ICC is also attractive as the predictors may differ between the components.

8. DISCUSSION

While potentially important, treatment-related clustering effects in individually-randomised psychotherapy trials have rarely been taken into account in trial reports and do not appear to have been considered in meta-analyses [10]. Fitting fixed- and random-effects meta-analysis models to trials of counselling in primary care, adopting summary-data and IPD approaches and allowing for these effects, had minimal impact on the pooled estimate and its standard error.
This is not surprising for two reasons. Firstly, the cluster sizes were small in the example so the design effect was also small. Secondly, assuming a common ICC across trials in the counselling arm meant that the contribution of each trial to the pooled treatment estimate remained essentially the same, despite some variability in the mean cluster size. Although hardly noticeable in the example, the impact was instead on the precision of the pooled treatment effect.

Nevertheless, as we have seen in Tables 2 and 3, failure to take account of therapist variation will give an overly precise pooled estimate in a fixed-effects meta-analysis because failing to include a therapist random effect in the analysis of a single trial generally results in the variance of the treatment effect being underestimated. The picture is more complex for a random-effects meta-analysis. If the variance of single trials is underestimated, the between-trial variance may be overestimated, as was seen in Tables 2 and 3. The combined effect of a reduction in the variance between trials and an increased variance of each trial can result in either a reduction or an increase in the standard error of the random-effects pooled estimate. Whilst in our example this estimate had a marginally larger variance when therapist clustering had been taken into account, a different set of cluster sizes or trial variances could have led to a reduction.

By contrast, an appreciable impact of treatment-related clustering was observed on the pooled treatment effect in the meta-regression models. Here, between-trial heterogeneity in the counsellor ICC had a greater impact on the weight given to particular trials and in so doing affected the pooled estimate and its standard error.

Collection of the IPD is made attractive by the potential of meta-regression analyses for exploring trial-, therapist- and patient-level predictors of the treatment effect and of the random effects. Increased sample sizes open up opportunities not usually present at a trial level but computational problems may still arise, largely due to the presence of negative estimates. Allowing the ICCs to vary by trial as well as by treatment arm is particularly likely to lead to problems, as many of the trial-level ICC estimates involve very small numbers of clusters. The middle road suggested here is one way of circumventing these problems while maintaining a more realistic model.

An advantage of the proposed methods is their generality. A two-level heteroscedastic model relaxes common variance and independence assumptions, being appropriate for all fully nested designs. It simplifies to the models recommended for unequal patient-level variances across arms and for partially-nested, cluster-randomised and non-clustered designs.
In each of these special cases, additional assumptions may be made so constraints can be added to the model at the trial level.

It is possible to envisage scenarios where one might want to allow the ICC to vary by treatment arm in a cluster randomised trial. Here, the source of clustering is traditionally conceptualised as recruitment-related. For example, if at baseline GP practices, rather than patients, are randomised to treatments, patients within a GP practice are likely to be more similar to one another than to other patients in the trial. As a consequence, clustering arises from use of a two-stage or clustered sample in a cluster-randomised trial but not in an individually-randomised trial. Such clustering is expected to be maintained at follow-up.

The unit of randomisation may not be the only source of clustering in a cluster randomised trial, however, particularly where the intervention is directed at the cluster (e.g. GP practice) rather than at the patient level. If there is also treatment-related clustering then, as long as the unit of randomisation and the clusters relating to treatment are the same (e.g. GPs are the unit of randomisation and the care providers), a two-level heteroscedastic model, outlined here for a fully-nested therapist design, may be appropriate in a cluster-randomised trial even if the treatment-related clustering is restricted to one or more arms in the trial. Consider, as an example, a trial in which groups of patients are cluster-randomised to intervention or control, where the intervention is some kind of group therapy and the control is no therapy. Clustering related to recruitment would still apply in the control arm, where in an individually-randomised trial it may be constrained to zero, but one might not expect the clustering effect to be equal in both arms as one might in a traditional cluster-randomised trial. Where there is interest in comparing group therapy to no therapy, one might want to consider pooling trials using an individually- and a cluster-randomised design with meta-regression models similar in principle to those described here. The general principle we have adopted is that the cluster and patient-level components of an ICC should be allowed to differ by trial design, at a minimum.

In the motivating example there was also the potential for clustering by the GP. GP care was generally a co-intervention delivered by the same sample of GPs. As such, GPs were crossed with treatment arms. As they were not blinded to whether patients were allocated counselling or no counselling, an interaction between GPs and treatment arm is plausible. Information on GP involvement in the motivating example was very limited, however.
GP identifiers were not recorded for the majority of the trials so it was not straightforward to include GPs in IPD analyses, nor was it often possible for researchers to report the level of between-GP variability in the treatment effect. The number of GPs treating trial patients was also often unavailable so there was very limited information on cluster size distributions. As such, while a literature is starting to develop on the statistical implications of multiple therapist-per-patient designs [53], it is likely to be generally the case that details of multiple therapists treating particular participants are unavailable in this setting. This is likely to be true of multiple therapists of the same type (e.g. if more than one counsellor had treated patients) or of different types (e.g. in the case of a counsellor and a GP, as was the case here), even though both multiple therapist-per-patient trial designs are common in psychotherapy [10]. That is, trials in which the relationship between therapists and patients can be described as multiple-membership or cross-classified [5]. Extensions are needed to the methods proposed here for these more complex data structures, as well as for crossed designs and trials with further levels (e.g. centres) or repeated measurements over time.

An important consideration when implementing the summary-data methods proposed here is the feasibility of obtaining, by trial arm, the ICC and average cluster size when researchers have made no allowance for clustering by care providers. To our knowledge, ICC estimates are currently only very rarely reported in the principal reports of psychotherapy trials [e.g. 56]. Subsequent papers may be published focusing on therapist effects, such as a series of papers relating to the NIMH Treatment for Depression Collaborative Research Program trial [55-58], or for the purpose of generating a database of therapist effects [59-60]. The number of therapists involved in a psychotherapy trial is commonly reported though and tends to be no greater than ten per arm. It is therefore likely to be possible to calculate average cluster sizes. The distribution of the cluster sizes may however be skewed and highly variable, with only a few therapists treating the majority of participants, as was the case in Chilvers et al [34] and King et al [36]. As this is not likely to be clear from the principal paper, no allowance was made for it in the methods described here. More generally, variability in cluster sizes within trials is likely to be common, and while it is difficult to make appropriate allowance for this if the cluster size distribution is unknown, the assumption of equal cluster sizes is a limitation of our methods.

For these reasons, the IPD approach is preferred, but this assumes researchers are able to link clusters to participants. From experience of collecting the therapist data for this meta-analysis, it is likely that cluster identifiers are collected in the paper records of psychotherapy trials and it is common for them to be somewhere in the electronic dataset.
Although time consuming, it was possible to get hold of IPD for all the trials of counselling in primary care. Contact started with the lead author of the Cochrane review and progressed to the lead author (and statistician where appropriate) for each trial. In two of seven trials, data were re-entered from the paper case report forms. This was the entire dataset for one but for the other it was just the counsellor identifiers. Every trial recorded the counsellor who provided treatment. The age of the trial is likely to be a factor in how accessible data are more generally. Establishing a collaborative group and making use of personal contacts both helped to facilitate permissions to use data.

In other meta-analyses, it is possible that only the summary data will be available in one or more eligible trials. Where this is the case, assumptions could be made about the size of the