Critical Thinking: A tour through the science of neuroscience (NEBM 10032/5)
Publication Bias
Emily S Sena, Centre for Clinical Brain Sciences, University of Edinburgh
Slides at @CAMARADES_
Bias
- Biases in research and reporting are presumably exacerbated by systems of publication and career evaluation that reward impact and productivity over quality and the ability to replicate studies.
- Systematic reviews can only include data made available.
- If unpublished data are systematically different from the published literature, then the summarised data, opinions and understanding will be biased.
Growing Research Area
- Increased efforts in the study of conscious and unconscious biases in research, which:
  - threaten human health
  - waste economic resources
  - threaten scientific progress
(Fanelli and Ioannidis, 2013)
Reporting bias: neutral and negative studies
- Publication bias
- Time lag bias
- Language bias
- Vibration effects
- Selective analysis reporting
- Selective outcome reporting
Publication Bias
- Neutral and negative studies remain unpublished, so they are less likely to be identified in a systematic review, which leads to the overstatement of efficacy in meta-analysis.
- A hot topic for clinical trials: summary results of clinical trials conducted in Europe are made publicly available.
How to assess for it?
- Requires an effect size and its corresponding standard error.
- To assess for its presence: funnel plot; Egger regression.
- To estimate efficacy in the absence of publication bias: trim and fill.
Funnel Plots
- Work on the basis that small studies are likely to be more spread around the mean, and small studies that are significant are more likely to be published, distorting the inverted funnel.
- Funnel plots are visually assessed and are most useful with a large number of studies.
- Relative measures are plotted on a log scale, which ensures that effects of the same magnitude but opposite directions are equidistant from 1.0.
- Used to examine whether smaller studies report larger treatment effects.
Funnel Plot [figure: precision (0.0 to 0.5) plotted against effect size (-150 to 150)]
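A minimal sketch of how such a funnel plot can be drawn in Python (illustrative only; the function name, the `matplotlib` dependency and the fixed-effect summary line are assumptions, not the code behind the slide's figure):

```python
import numpy as np
import matplotlib.pyplot as plt

def funnel_plot(effects, ses):
    """Illustrative funnel plot: precision (1/SE) against effect size.

    Without small-study effects, points form a symmetric inverted funnel
    around the summary estimate; a missing lower corner on one side
    suggests unpublished neutral or negative studies.
    """
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    precision = 1.0 / ses
    weights = precision ** 2
    summary = np.sum(weights * effects) / np.sum(weights)  # fixed-effect summary
    plt.scatter(effects, precision, s=15)
    plt.axvline(summary, linestyle="--", label="fixed-effect summary")
    plt.xlabel("Effect Size")
    plt.ylabel("Precision (1/SE)")
    plt.legend()
    plt.show()
```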
Egger Regression
- Egger regression statistically assesses asymmetry of the funnel plot.
- The standardised effect is regressed on the precision: equivalent to a weighted regression of the effect size on its standard error, with the weight equal to the precision.
- With no asymmetry, the regression line and its 95% CI will pass through the origin.
- Again, this is not proof of bias, but it does raise questions regarding the interpretation of results.
Egger Regression [figure: standardised effect (effect size/standard error) against precision, with regression line]
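A minimal sketch of Egger's regression test (illustrative; the `statsmodels` dependency and function name are assumptions). The standardised effect is regressed on precision, and an intercept whose 95% CI excludes zero signals asymmetry:

```python
import numpy as np
import statsmodels.api as sm

def egger_test(effects, ses):
    """Egger regression: regress standardised effect (effect/SE) on
    precision (1/SE). An intercept far from zero indicates funnel
    plot asymmetry (small-study effects)."""
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    std_effect = effects / ses
    precision = 1.0 / ses
    X = sm.add_constant(precision)
    fit = sm.OLS(std_effect, X).fit()
    intercept = fit.params[0]
    ci_low, ci_high = fit.conf_int()[0]   # 95% CI for the intercept
    return intercept, (ci_low, ci_high), fit.pvalues[0]
```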
Trim and Fill
- Missing studies are iteratively identified ("trimming") and replaced ("filling") in order to calculate an adjusted estimate of effect size:
  - Studies with the largest deviation from the mean are trimmed.
  - The remaining symmetrical plot is used to recalculate the summary effect.
  - Trimmed studies are replaced and their counterparts filled to correct the variance.
  - The mirror axis is placed along the adjusted estimate.
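A simplified sketch of the Duval and Tweedie procedure using the L0 estimator, assuming asymmetry toward larger positive effects (a real analysis would first test which side is deficient and would use an established package; function names are illustrative):

```python
import numpy as np

def fixed_effect(effects, ses):
    """Inverse-variance fixed-effect summary estimate."""
    w = 1.0 / ses ** 2
    return np.sum(w * effects) / np.sum(w)

def trim_and_fill(effects, ses, max_iter=20):
    """Simplified trim and fill: estimate the number of missing studies
    (k0), trim that many extreme studies, recompute the summary, iterate,
    then mirror the trimmed studies about the adjusted estimate."""
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    n = len(effects)
    k0 = 0
    for _ in range(max_iter):
        order = np.argsort(effects)                # ascending
        kept = order[: n - k0]                     # trim the k0 largest effects
        mu = fixed_effect(effects[kept], ses[kept])
        dev = effects - mu
        ranks = np.argsort(np.argsort(np.abs(dev))) + 1  # ranks of |deviations|
        t_n = ranks[dev > 0].sum()                 # Wilcoxon-type statistic
        k0_new = max(0, int(round((4 * t_n - n * (n + 1)) / (2 * n - 1))))
        if k0_new == k0:
            break
        k0 = k0_new
    trimmed = np.argsort(effects)[n - k0:]
    filled_effects = 2 * mu - effects[trimmed]     # mirror about adjusted summary
    adjusted = fixed_effect(np.concatenate([effects, filled_effects]),
                            np.concatenate([ses, ses[trimmed]]))
    return mu, k0, adjusted
```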
Trim and Fill
- Overall efficacy was reduced from 32% (95% CI 30 to 34%) to 26% (95% CI 24 to 28%).
Publication bias in experimental stroke
- Trim and fill suggested 16% of experiments remain unpublished: our best estimate of the magnitude of the problem.
- Overstatement of efficacy: 31%.
- Only 2% of publications reported no significant treatment effects.
Cancer Induced Bone Pain - animals
- 103 filled studies
- Effect size reduced from 2.79 (2.6 to 3.0; n=257) to 1.6 (1.4 to 1.8; n=360)
- 42.3% relative overstatement
Publication bias: 20% - 32%

Outcome                  n expts   Estimated unpublished   Reported efficacy   Corrected efficacy
Stroke infarct volume    1359      214                     31.3%               27.5%
EAE neurobehaviour       1892      505                     33.1%               15.0%
EAE inflammation         818       14                      38.2%               37.5%
EAE demyelination        290       74                      45.1%               30.5%
EAE axon loss            170       46                      54.8%               41.7%
AD Water Maze            80        15                      0.688 sd            0.498 sd
AD plaque burden         632       154                     0.999 sd            0.610 sd
Few Studies
- 16 studies; OR 6.7 (3.7 to 12.2)
Few Studies
- 7 filled trials reduce the OR from 6.7 to 2.7
- A 150% relative overstatement in effect?
More Data
- 165 comparisons; OR 1.8 (1.7 to 1.9)
More Data
- 34 filled trials reduce the OR from 1.8 to 1.6
- A 10% relative overstatement in effect
Limitations
- Too few studies: ideally want >25.
- Need a reasonable dispersion of sample sizes.
- For sparse data the Mantel-Haenszel methods are probably more appropriate for weighting, but the trim and fill function only has the inverse-variance method enabled.
- Small study effects may have other causes.
- ORs rather than NNT: a measure of variance is difficult to derive for NNT, and ORs are more extreme than RRs when the event rate is high, so asymmetry may be observed with ORs.
Note of caution: funnel plot asymmetry has a number of potential sources
- Selection bias (publication bias, reporting bias, biased inclusion criteria)
- True heterogeneity (study effect differs according to study size; differences in underlying risk)
- Data irregularities (poor methodological design, inadequate analysis, fraud)
- Artefact (poor choice of effect measure)
- Chance
Is the effect an artefact of bias?
- We checked for its presence and impact, but not whether the overall effect is robust.
- Rosenthal's fail-safe N: how many studies do we need to nullify the observed effect?
  - Statistical rather than substantive
  - Assumes missing studies have an effect of zero rather than negative or small positive effects
- Orwin's fail-safe N attempts to address both issues.
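Minimal sketches of both calculations (illustrative; the function names, defaults and `scipy` dependency are assumptions):

```python
import numpy as np
from scipy.stats import norm

def rosenthal_failsafe_n(z_values, alpha=0.05):
    """Rosenthal's fail-safe N: how many unpublished null (z = 0) studies
    would be needed to raise the combined one-tailed p-value above alpha.
    N = (sum(z) / z_alpha)^2 - k, floored at zero."""
    z = np.asarray(z_values, dtype=float)
    z_crit = norm.ppf(1 - alpha)      # 1.645 for one-tailed alpha = 0.05
    k = len(z)
    return max(0.0, (z.sum() / z_crit) ** 2 - k)

def orwin_failsafe_n(mean_effect, k, criterion, missing_effect=0.0):
    """Orwin's fail-safe N: how many studies with effect `missing_effect`
    would reduce the pooled effect to the substantive `criterion` value.
    Unlike Rosenthal's version, the missing studies need not be exact nulls."""
    return k * (mean_effect - criterion) / (criterion - missing_effect)
```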
Other options: cumulative meta-analysis by size
- This allows you to see whether adding smaller studies shifts a stable effect size established by the larger studies.
- Smaller studies are generally given less weight, but are they influencing the pooled estimate?
- Used a fixed-effect model (Mantel-Haenszel) where less weight is given to smaller studies, so it will not be thrown by a few aberrant studies.
- A sketch of the procedure follows the figures below.
Cumulative MA by size [figures: cumulative meta-analysis plots by study size]
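A minimal sketch of the procedure (illustrative; it uses simple inverse-variance weighting rather than the Mantel-Haenszel weighting used on the slides, and the function name is an assumption):

```python
import numpy as np

def cumulative_ma_by_size(effects, ses, sample_sizes):
    """Cumulative fixed-effect meta-analysis, adding studies from
    largest to smallest sample size.

    Returns the running summary estimate and its standard error, so you
    can see whether adding the smaller studies shifts a stable estimate
    established by the larger ones."""
    order = np.argsort(sample_sizes)[::-1]               # largest studies first
    eff = np.asarray(effects, dtype=float)[order]
    w = 1.0 / np.asarray(ses, dtype=float)[order] ** 2   # inverse-variance weights
    cum_est = np.cumsum(w * eff) / np.cumsum(w)
    cum_se = np.sqrt(1.0 / np.cumsum(w))
    return cum_est, cum_se
```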
Other options: excess significance
- To identify whether there is an excess of significant studies within previously conducted systematic reviews of neurological disorders.
Excess Significance Test
- Examines whether too many individual studies in a meta-analysis report statistically significant results, driven by selective reporting of analyses or outcomes.
- Low power to detect bias in a single meta-analysis with few studies, but applicable across many meta-analyses in a field.
Methods: Dataset
- 6 diseases: Alzheimer's disease, EAE, focal ischaemia, intracerebral haemorrhage, Parkinson's disease, and spinal cord injury
- 60 meta-analyses
- 4,445 experiments
- Median sample size = 16 (IQR 11-20)
Methods: Data extraction
- Year of publication
- Intervention
- Effect size + standard error
- Sample size
- Quality score items: peer review, randomisation, allocation concealment, blinded assessment of outcome, sample size calculation, conflict of interest
- Funnel plot asymmetry
Methods: Excess Significance Test [O > E]
- Observed (O): nominally positive studies, p<0.05
- Expected (E): positive studies expected from power, with the true effect size taken as the fixed effect size of the most precise study
- Binomial test; Wilcoxon rank sum test
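A minimal sketch of the test (illustrative; the function name and `scipy` dependency are assumptions). The true effect is taken as that of the most precise study, each study's power against that effect gives the expected count E, and a binomial test compares the observed count O with E:

```python
import numpy as np
from scipy.stats import norm, binomtest

def excess_significance(effects, ses, alpha=0.05):
    """Excess-significance test (after Ioannidis & Trikalinos).

    E = sum of each study's power to detect the 'true' effect (the
    effect of the most precise study) at two-sided alpha; O = studies
    that are nominally significant. O >> E suggests bias."""
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    true_eff = effects[np.argmin(ses)]          # most precise study
    z_crit = norm.ppf(1 - alpha / 2)
    # Power of each study, given its own standard error
    power = (1 - norm.cdf(z_crit - true_eff / ses)
             + norm.cdf(-z_crit - true_eff / ses))
    expected = power.sum()
    observed = int(np.sum(np.abs(effects / ses) > z_crit))
    k = len(effects)
    # Binomial test of O against the mean expected probability
    p = binomtest(observed, k, expected / k, alternative="greater").pvalue
    return observed, expected, p
```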
Assessed excess significance
- In aggregate across the six diseases
- Within each disease
- By subgroup: amount of heterogeneity; significant summary fixed effect; Egger regression; reporting of measures to minimise bias (quality score)
- Expected significant results = 919 (21%)
- Observed significant results = 1719 (39%)
- Excess significance was present across all six neurological disorders and in all subgroups defined by methodological or reporting characteristics.
- The strongest excess significance was observed in meta-analyses where small study effects were also observed with the least precise studies.
Some research areas may be more susceptible to bias
- Softer sciences
  - The proportion of studies reporting positive results increases moving from the physical sciences to the medical sciences to the social sciences.
  - Research choices are more influenced by a scientist's own beliefs, expectations and wishes.
  - Behavioural studies
- Regional
  - "Publish or perish" has long characterised US-based research.
  - Research from Asian and developing countries tends to be rejected unless it reports extraordinary results.
(Fanelli and Ioannidis, 2013)
Further analysis of research bias
- 82 meta-analyses; 1,174 primary outcomes
- Samples from health-related biological and behavioural research
- Measured how individual results deviated from the overall summary estimate of effect within their respective meta-analysis
- Stratified by research type and country of origin
(Fanelli and Ioannidis, 2013)
Hypotheses
- In the life sciences there are perverse incentives (publication, funding, promotion) to produce positive results, with little attention paid to their validity.
- This leads to a body of evidence with an inflated proportion of published studies with statistically significant results.
- This potentially compromises the utility of animal models and contributes to translational failure.
What happens
- Small (underpowered), poorly conducted studies reach spurious (falsely positive) conclusions but are published because they are seen to be interesting.
- Small (perhaps), poorly conducted (sometimes) studies not reaching the same conclusions are not published.
- Investigators become conditioned by the apparent success that comes from conducting small underpowered studies, and keep trying to replicate the positive studies.
- Exacerbated by the combination of pressure to publish and a winner-takes-all system of rewards.
What can we do?
- Problem: not all outcomes are reported; a priori analyses are not always reported.
- What we need: to know whether scientists report what they set out to report.
- Solution: published protocols; registries.
Thanks to...
- Edinburgh: Malcolm Macleod, Kieren Egan, Hanna Vesterinen, Gill Currie, Rustam Al-Shahi Salman, Joseph Frantzis, project students
- Melbourne: David Howells, Ana Antonic, Peter Batchelor, Taryn Wills, Sarah McCann
- Stanford/Ioannina: John Ioannidis, Kostas Tsilidis, Orestis Panagiotou, Eleni Aretouli, Vangelis Evangelou
- Utrecht: Bart van der Worp
- Nottingham: Philip Bath
- Translators, NHS R&D methodology program, Chief Scientist Office
http://www.ted.com/talks/ben_goldacre_what_doctors_don_t_know_about_the_drugs_they_prescribe.html