Fixing the replicability crisis in science. Jelte M. Wicherts



Supporting responsible research practices
Diminishing biases in research
Lowering false positive ratios
Fixing the replicability crisis in science
Enhancing the trust in scientific findings
Making science more efficient
Improving scientific reproducibility
Empowering the truth in science
Scaring away scientific cowboys

Empirical cycle
Observe (literature) → Hypothesize → Predict (set up exp.) → Test (collect & analyze data) → Evaluate (present) → Observe

Success rates across the sciences
Source: Fanelli, D. (2010). Positive results increase down the hierarchy of the sciences. PLoS ONE, 5(4), e10068.

Fraud
Empirical cycle: Observe (literature) → Hypothesize → Predict (set up exp.) → Test (collect & analyze data) → Evaluate (present)

How to counter scientific misconduct
- Improve regulations & procedures
- Training in responsible conduct of research
- Lower questionable research practices
- Enhance transparency and accountability

Open science practices
- Heighten reproducibility and data re-use
- Lead to loss of sleep among scientific fraudsters
Sources:
Wicherts, J. M. (2011). Psychology must learn a lesson from fraud case. Nature, 480, 7.
Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40, 73-76.

HARKing
Empirical cycle: Observe (literature) → Hypothesize → Predict (set up exp.) → Test (collect & analyze data) → Evaluate (present)

HARKing
[Figure: "White Noise", a time series y(t) plotted against time (0 to 500), annotated "the infamous one-year dip! (at t=365)"]
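A minimal sketch of why a "dip" in white noise is unremarkable: generate pure Gaussian noise, then search after the fact for its most extreme point. The series length (500), the seeds, and the function name `most_extreme_point` are illustrative assumptions, not taken from the slide.

```python
import random


def most_extreme_point(n_points=500, seed=1):
    """Generate white noise and return (time index, value) of its lowest point."""
    rng = random.Random(seed)
    # Pure noise: independent standard-normal draws, no real signal anywhere.
    series = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    # Post hoc search: pick whichever time point happens to be lowest.
    t_min = min(range(n_points), key=lambda t: series[t])
    return t_min, series[t_min]


if __name__ == "__main__":
    for seed in range(1, 6):
        t, y = most_extreme_point(seed=seed)
        print(f"seed={seed}: deepest 'dip' at t={t}, y={y:.2f}")
```

Every simulated series has some deepest dip. Building a hypothesis around that point after seeing the data turns noise into a story.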

HARKing

Explorative research

Suboptimal designs
Empirical cycle: Observe (literature) → Hypothesize → Predict (set up exp.) → Test (collect & analyze data) → Evaluate (present)

Poor design

[Figure: distribution under H0 and distribution under HA for each study, plotted against actual effect size]

Study number    Power    N
1               .85      128
2               .55      72
3               .40      48
4               .25      28

Power failure in neuroscience
Excerpts from Button et al. (2013), partly cut off on the slide:
"... the median statistical power of these studies was 8% across 461 individual studies ..."
"Our results indicate that the average statistical power of studies in the field of neuroscience is probably no more than between ~8% and ~31% ..."
"... the probability that a research finding reflects a true effect (PPV) decreases as statistical power decreases for any given pre-study odds and a fixed type I error level."
"Ethical implications. Low average power in neuroscience studies also has ethical implications. In our analysis of animal model studies, the average sample size of 22 animals for the water maze experiments was only sufficient to detect an effect size of d = 1.26 with ..."
[Figure 3: Median power of studies included in neuroscience meta-analyses. The figure shows a histogram of median study power calculated for each of the n = 49 meta-analyses included in our analysis, with the number of meta-analyses (N) on the left axis and percent of meta-analyses (%) on the right axis. There is a clear bimodal distribution; n = 15 (31%) of the meta-analyses ...]
Source: Button, K. S. et al. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 1-12.
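The link between power and the positive predictive value (PPV) in Button et al. (2013) can be sketched with their formula PPV = ((1 - β) × R) / ((1 - β) × R + α), where R is the pre-study odds that a tested effect is real. The value R = 0.25 below is purely an illustrative assumption; the power values .08 and .31 are the bounds quoted from the paper.

```python
def ppv(power, prior_odds, alpha=0.05):
    """Positive predictive value: P(effect is real | result is significant).

    power      -- 1 - beta, the probability of detecting a true effect
    prior_odds -- R, pre-study odds that a tested effect is real (assumed value)
    alpha      -- type I error level
    """
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


if __name__ == "__main__":
    assumed_odds = 0.25  # hypothetical: one real effect per four null effects
    for power in (0.08, 0.31, 0.80):
        print(f"power={power:.2f} -> PPV={ppv(power, assumed_odds):.2f}")
```

Holding R and α fixed, PPV falls as power falls, which is the point the excerpt makes.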

Fixing the power!
- Powerful designs
- Collaborate

File-drawer problem
Empirical cycle: Observe (literature) → Hypothesize → Predict (set up exp.) → Test (collect & analyze data) → Evaluate (present)

Publication bias
When is a study truly failed?
[Photo: blackboard in the office of a couple of PhD students]

Power intuitions
Marjan Bakker asked 291 psychologists to indicate their:
- typical effect size
- typical sample size
- typical power (1 - β), at α = 0.05
"I usually aim for 20-25 subjects per cell of the experimental design, which is typically what it takes to detect a medium effect size with .80 probability."
Actual power = .35
80% of respondents overestimated the power of their studies
Source: Bakker, M., Wicherts, J. M., Hartgerink, C. H. J., & van der Maas, H. L. J. (2016). Researchers' intuitions about power in psychological research. Psychological Science, 27, 1069-1077.
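A rough check of the quoted intuition, as a sketch under stated assumptions: a two-group design, a "medium" effect taken as Cohen's d = 0.5, a two-sided test at α = 0.05, and a normal approximation instead of the exact t distribution. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is d.
    ncp = d * sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value.
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)


def n_per_group_for_power(d, target=0.80, alpha=0.05):
    """Smallest per-group sample size whose approximate power reaches the target."""
    n = 2
    while two_group_power(d, n, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    print(f"power at n=20 per cell: {two_group_power(0.5, 20):.2f}")
    print(f"n per cell for .80 power: {n_per_group_for_power(0.5)}")
```

Under these assumptions, 20 subjects per cell gives power of roughly .35, and reaching .80 takes a bit over 60 per cell, about three times the sample the respondent aims for.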

Poor statistical intuitions
Researchers are overly optimistic about finding evidence when they are right
Source: Bakker, M., Wicherts, J. M., Hartgerink, C. H. J., & van der Maas, H. L. J. (2016). Researchers' intuitions about power in psychological research. Psychological Science, 27, 1069-1077.

Failed study /fāld stuhd-ee/
1. An empirical study in which unforeseen problems occurred during the data collection
2. Colloquial expression used in the sciences before 2018 to denote studies with (disappointing) nonsignificant outcomes that were deemed unpublishable
Source: van Assen, M. A., van Aert, R. C., Nuijten, M. B., & Wicherts, J. M. (2014). Why publishing everything is more effective than selective publishing of statistically significant results. PLOS ONE, 9, e84896.
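A small simulation sketch of what happens when nonsignificant studies are treated as "failed" and left unpublished. All numbers (true effect d = 0.2, 20 participants per group, 2000 studies, known SD of 1) are illustrative assumptions, and this is not the analysis used by van Assen et al. (2014).

```python
import random
from math import sqrt
from statistics import mean


def simulate_literature(true_d=0.2, n_per_group=20, n_studies=2000, seed=7):
    """Return (mean effect over all studies, mean over 'published' ones, share published)."""
    rng = random.Random(seed)
    se = sqrt(2 / n_per_group)  # standard error of the mean difference when SD = 1
    all_d, published_d = [], []
    for _ in range(n_studies):
        treatment = [rng.gauss(true_d, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        d_hat = mean(treatment) - mean(control)  # SD is 1, so this estimates d
        all_d.append(d_hat)
        if d_hat / se > 1.96:  # selective publishing: significant positive results only
            published_d.append(d_hat)
    return mean(all_d), mean(published_d), len(published_d) / n_studies


if __name__ == "__main__":
    everything, selected, share = simulate_literature()
    print(f"mean effect, all studies:      {everything:.2f}")
    print(f"mean effect, significant only: {selected:.2f}")
    print(f"share of studies 'published':  {share:.2f}")
```

Averaging every study recovers something close to the true effect, while averaging only the significant ones overstates it several times over.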

Overly positive reporting
Empirical cycle: Observe (literature) → Hypothesize → Predict (set up exp.) → Test (collect & analyze data) → Evaluate (present)

Selective outcome reporting
Stress, measured three ways:
- Physiol. measure: p<.05
- Observ. behavior: p<.05
- Self-report: p>.05
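Why several outcome measures invite selective reporting can be shown with simple arithmetic: if no effect exists and each outcome is tested at α = .05, the chance that at least one comes out significant is 1 - (1 - α)^k. The sketch below assumes the k outcomes are independent, which real measures of the same construct usually are not; the function name is hypothetical.

```python
def chance_of_any_significant(n_outcomes, alpha=0.05):
    """P(at least one p < alpha) when every null is true and outcomes are independent."""
    return 1 - (1 - alpha) ** n_outcomes


if __name__ == "__main__":
    for k in (1, 3, 5, 10):
        print(f"{k} outcome(s): {chance_of_any_significant(k):.3f}")
```

With three independent outcomes, the chance of at least one false positive is already about 14%, so reporting only the outcomes that "worked" hides how many tests were run.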

Evidence for this in clinical trials
Source: COMPare Trials project (Ben Goldacre et al.)

Errors in the reporting of statistical results
p = .06
Source: Bakker, M. & Wicherts, J. M. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43, 666-678.

Source: Nuijten, M. B., Hartgerink, C. H. J., Van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The Prevalence of Statistical Reporting Errors in Psychology (1985-2013). Behavior Research Methods.

Fixing misreporting & selective reporting
Reporting guidelines
- STROBE
- PRISMA
- ARRIVE
- STARD
- CARE
Peer review with checklists
Statcheck & other tools
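The idea behind automated consistency checks can be sketched as follows: recompute the p-value from the reported test statistic and compare it with the reported p-value. This is only an illustration for z statistics using the Python standard library; it is not statcheck's implementation (statcheck is an R package that also covers t, F, chi-square and r, and accounts for rounding of the test statistic). The function names and the example values are hypothetical.

```python
from statistics import NormalDist


def recompute_p_from_z(z_value):
    """Two-sided p-value implied by a reported z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z_value)))


def check_reported_p(z_value, reported_p, decimals=2, alpha=0.05):
    """Compare a reported p-value with the one recomputed from the z statistic."""
    recomputed = recompute_p_from_z(z_value)
    consistent = round(recomputed, decimals) == round(reported_p, decimals)
    # A "decision error": the mismatch also flips the significance verdict.
    flips_verdict = (recomputed < alpha) != (reported_p < alpha)
    return {
        "recomputed_p": recomputed,
        "consistent": consistent,
        "decision_error": (not consistent) and flips_verdict,
    }


if __name__ == "__main__":
    # Hypothetical report: "z = 1.88, p = .04"
    print(check_reported_p(1.88, 0.04))
```

In the hypothetical example the statistic implies p of about .06, so a reported p = .04 would be flagged as inconsistent and as changing the significance decision.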

P-hacking
Empirical cycle: Observe (literature) → Hypothesize → Predict (set up exp.) → Test (collect & analyze data) → Evaluate (present)

P-hacking
[Flowchart with branches labelled p<.05 or p>.05. Boxes: Planned analysis; Remove outliers (Z > 2); Add 10 cases; Redo analysis with adapted dependent var.; Perform new study; Call this a failed study; Misreport the p-value as being <.05; Effect!; Write paper]
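One branch of the flowchart, "Add 10 cases", can be simulated to show how it inflates false positives. The sketch below assumes no true effect, a known SD of 1 (so a z-test applies), 20 cases per group to start, and 10 extra cases per group on each retry; whether the slide means 10 per group or 10 in total is not specified, so that detail is an assumption, as are all function names.

```python
import random
from math import sqrt
from statistics import mean


def z_test_significant(a, b):
    """Two-sided z-test at alpha = .05 for two groups with known SD = 1."""
    z = (mean(a) - mean(b)) / sqrt(1 / len(a) + 1 / len(b))
    return abs(z) > 1.96


def false_positive_rate(max_additions, n_start=20, step=10, n_sims=4000, seed=11):
    """Share of null studies declared significant when cases are added after p > .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_start)]
        b = [rng.gauss(0, 1) for _ in range(n_start)]
        significant = z_test_significant(a, b)
        additions = 0
        # Keep adding cases and re-testing until significant or out of retries.
        while not significant and additions < max_additions:
            a += [rng.gauss(0, 1) for _ in range(step)]
            b += [rng.gauss(0, 1) for _ in range(step)]
            additions += 1
            significant = z_test_significant(a, b)
        hits += significant
    return hits / n_sims


if __name__ == "__main__":
    for retries in (0, 1, 3):
        print(f"up to {retries} additions: {false_positive_rate(retries):.3f}")
```

With no retries the rate stays near the nominal 5%. Each extra look at the data pushes it higher, even though every individual test uses α = .05.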

Many ways to analyse the data imply many ways to reach the stars*
*p<.05
Source: Wicherts et al. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832.

P-hacking pervasive?
12.5% articles misreporting p-values
87.5% articles using more subtle ways of p-hacking??

Scientists are only human
"This one SHOULD really be higher!"
"If not my reviewers will kill my paper"
"There MUST be something wrong with this analysis or with these data"
"And I can forget about getting tenure"
"And I cannot buy the house I wanted"

Pre-register studies
- Specify hypotheses & analyses in advance
- Publish the pre-registration
- Or publish a Registered Report (RR) in which the peer review is focused on the rationale, hypotheses, and methods (and the article is published regardless of the results)
Sources:
Chambers, C. D. (2013). Registered Reports: A new publishing initiative at Cortex. Cortex.
Wagenmakers et al. (2012). An Agenda for Purely Confirmatory Research. Perspectives on Psychological Science, 7, 632-638.

Pre-registration challenge

Big money
Empirical cycle: Observe (literature) → Hypothesize → Predict (set up exp.) → Test (collect & analyze data) → Evaluate (present)

Big money
The business model of publishers is not necessarily in line with goals of furthering science.
Is non-profit publishing the answer?
Robert Maxwell
https://www.theguardian.com/science/2017/jun/27/profitable-business-scientific-publishing-bad-for-science

Big money
Incentivize the right behaviors

Lack of replication
[Diagram: two copies of the empirical cycle, Observe (literature) → Hypothesize → Predict (set up exp.) → Test (collect & analyze data) → Evaluate (present)]
Use cross-validation or other holdout sample techniques
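A holdout approach can be sketched as a random split made before any analysis: explore freely in one part, then test the resulting hypothesis once in the untouched part. The 50/50 split, the seed, and the function name `split_holdout` are illustrative assumptions.

```python
import random


def split_holdout(records, holdout_fraction=0.5, seed=3):
    """Randomly split records into an exploration set and a confirmation (holdout) set."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy, so the caller's data stays in its original order
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    participant_ids = list(range(100))  # hypothetical participant identifiers
    explore, confirm = split_holdout(participant_ids)
    print(f"explore on {len(explore)} cases, confirm on {len(confirm)} cases")
```

Because the confirmation set plays no role in choosing the hypothesis or analysis, a result that holds up there works as an internal replication.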

Problems and how to fix them

Problem                                              Solution
Scientific misconduct                                Open data & regulations
Observer bias & effects                              Blinding
File drawer problem                                  Power, banning "failed study"
HARKing                                              Pre-registration
Errors in reporting & selective outcome reporting    Reporting guidelines, reviewer checklists, statcheck
P-hacking                                            Pre-registration
Lack of replication                                  Incentivize & analyze sensibly

"The first principle is that you must not fool yourself and you are the easiest person to fool."
Richard P. Feynman

Metaresearch.nl
Hilde Augusteijn, Marjan Bakker, Marcel van Assen, Amir Abdol, Michèle Nuijten, Coosje Veldkamp, Esther Maassen, Andrea Stoevenbelt, Robbie van Aert, Linda Dominguez Alvarez, Chris Hartgerink, Paulette Flore, Olmo van den Akker
@JelteWicherts