UC Davis Previously Published Works

Title: Introduction to the Special Section on Improving Research Practices: Thinking Deeply Across the Research Cycle
Permalink: https://escholarship.org/uc/item/54f712t3
Journal: Perspectives on Psychological Science, 11(5)
ISSN: 1745-6916
Author: Ledgerwood, A.
Publication Date: 2016-09-01
DOI: 10.1177/1745691616662441
License: CC BY-NC-ND 4.0
Peer reviewed

RUNNING HEAD: INTRODUCTION TO THE SPECIAL SECTION

Introduction to the Special Section on Improving Research Practices: Thinking Deeply Across the Research Cycle

Alison Ledgerwood
University of California, Davis

Address correspondence to:
Alison Ledgerwood
Department of Psychology
UC Davis
Davis, CA 95616
aledgerwood@ucdavis.edu

Introduction to the Special Section on Improving Research Practices: Thinking Deeply Across the Research Cycle

The past five years have witnessed a profound shift in the way psychological scientists think about methods and practices. Whereas the prevailing sentiment was once a general contentment with the status quo, despite occasional rumblings from methodologists and statisticians (Cohen, 1992; Greenwald, 1975; Maxwell, 2004; Rosenthal, 1979), most psychologists now agree that we could do better. Our growing momentum has placed psychological science at the cutting edge of a broad movement to improve methods and practices across scientific disciplines. And as we move from debating whether we should change to investigating how best to do so, we are increasingly coming to grips with the fact that there are no magic bullet solutions.

Of course, as psychologists, we know that humans (including ourselves) often opt for cognitive shortcuts: people tend to love a good heuristic, a simple decision rule, an easy answer (Chaiken & Ledgerwood, 2012; Kool, McGuire, Rosen, & Botvinick, 2010; Taylor, 1981). Yet we know, too, that oversimplified decision rules contributed to the problems with our methods and practices that we now face. For example, enshrining p < .05 as the ultimate arbiter of truth created the motivation to p-hack. Heuristics about sample sizes allowed us to ignore power considerations. Bean-counting publications created immense pressure to build long CVs. The single most important lesson we can draw from our past in this respect is that we need to think more carefully and more deeply about our methods and our data. Heuristics got us into this mess. Careful thinking will help get us out. After all, science is hard, and reality is messy. These are complex issues we are tackling, and they deserve nuanced and thoughtful solutions.

To that end, the May 2014 and November 2014 special sections in Perspectives on Psychological Science provided a toolbox of concrete strategies that researchers can use to think more carefully about their methods and to learn more from their data (Ledgerwood, 2014a, 2014b). The current special section hammers home the importance of thinking carefully at every stage of the research process, from selecting among possible research strategies, to analyzing our data, to aggregating across studies to build a more comprehensive picture of a given topic area.

The special section begins by asking how we can think carefully about optimizing our choice of research strategy. Of course, selecting any one research strategy necessarily involves tradeoffs (Brewer & Crano, 2014; Campbell, 1957; Finkel, Eastwick, & Reis, in press). For instance, increasing the sample size of a given study boosts the power of that study, but at the cost of decreasing the total number of studies that can be conducted (given a finite pool of resources such as time, money, and/or available participants). Miller and Ulrich (this issue) propose a quantitative model that enables researchers to start weighing and integrating the costs and benefits of various research outcomes (true positives, false positives, true negatives, and false negatives) and to calculate an optimal sample size that maximizes what they call "total research payoff" across these study outcomes. Importantly, the model allows researchers to specify the value they place on different study outcomes, so that the calculation of optimal sample size can be tailored to a specific researcher's values or to a particular research area (e.g., how important is it to this field at this time for true positives and true negatives to be correctly identified?). And using such a model enables researchers to be explicit about the value they are placing on different study outcomes and the assumptions they are making about the effect size and base rate of true effects in a given research area. This kind of clarity helps reveal when conflicting recommendations about optimal research practices stem from variations in the starting assumptions of their proponents, allowing us to move beyond intuition-based arguments and toward quantitatively based discussions about best practices.
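To make the logic of such a cost-benefit calculation concrete, here is a minimal Python sketch in the spirit of this approach; it is not Miller and Ulrich's actual model. The assumed true effect size, base rate of true effects, participant budget, and payoff values for each study outcome are all hypothetical choices made for illustration. The sketch computes approximate power at each candidate sample size and picks the per-group n that maximizes the expected total payoff across all studies the budget can support.

```python
# Illustrative sketch only: a toy "research payoff" calculation in the spirit of
# Miller and Ulrich's cost-benefit approach. All payoff values, the assumed true
# effect size, the base rate of true effects, and the participant budget are
# hypothetical choices made for this example, not values from their model.
from scipy.stats import norm

ALPHA = 0.05                 # two-sided significance threshold
TRUE_EFFECT = 0.4            # assumed effect size (Cohen's d) when an effect is real
BASE_RATE = 0.3              # assumed proportion of tested hypotheses that are true
BUDGET = 4000                # total participants available across all studies
PAYOFFS = {"TP": 1.0, "TN": 0.3, "FP": -1.5, "FN": -0.5}  # hypothetical values

def power_two_group(n_per_group, d=TRUE_EFFECT, alpha=ALPHA):
    """Approximate power of a two-sided, two-sample test with n per group."""
    z_crit = norm.ppf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    return 1 - norm.cdf(z_crit - noncentrality)

def total_payoff(n_per_group):
    """Expected payoff summed over all studies the budget can support."""
    n_studies = BUDGET // (2 * n_per_group)
    pwr = power_two_group(n_per_group)
    expected_per_study = (
        PAYOFFS["TP"] * BASE_RATE * pwr
        + PAYOFFS["FN"] * BASE_RATE * (1 - pwr)
        + PAYOFFS["FP"] * (1 - BASE_RATE) * ALPHA
        + PAYOFFS["TN"] * (1 - BASE_RATE) * (1 - ALPHA)
    )
    return n_studies * expected_per_study

best_n = max(range(10, 201, 5), key=total_payoff)
print(f"Optimal n per group under these assumptions: {best_n}")
print(f"Power at that n: {power_two_group(best_n):.2f}")
```

Changing the assumed base rate of true effects or the relative cost of false positives shifts the optimum, which is exactly the kind of explicit, assumption-driven reasoning this class of model is meant to encourage.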

Just as we need to think carefully about how our assumptions can influence the conclusions we draw about optimal research strategies, the next two papers emphasize the need to think carefully about how the assumptions that we make as we construct our datasets and analyze our results can influence the conclusions that we draw from our studies. For instance, we all know that even the most common statistical analyses rely on certain assumptions, and yet we may have only a hazy memory left over from past statistics courses about what those assumptions are and how to check them. Ignoring these assumptions can lead us astray when interpreting our results: we might miss an important effect altogether, or conclude that our data provide support for one prediction when in fact they support another. To address this issue, Tay and colleagues (Tay, Parrigon, Huang, & LeBreton, this issue) provide a new tool called Graphical Descriptives, which allows researchers to easily visualize their data, check their statistical assumptions, and transparently communicate rich information about their dataset when writing up their results.
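As a minimal illustration of what such assumption checking looks like in practice, the sketch below simulates a small dataset, fits a simple regression, and plots the raw distribution, the fitted relationship, and the residuals. It is not the Graphical Descriptives tool itself, and the simulated variables are purely hypothetical; the point is simply that a few quick plots can reveal violations (here, heavy-tailed errors) that summary statistics alone would hide.

```python
# Illustrative sketch only: a bare-bones version of the kind of visual checks
# that a tool like Graphical Descriptives automates. This is not the Tay et al.
# tool itself; the variable names and the simulated dataset are hypothetical.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + rng.standard_t(df=3, size=200)   # heavy-tailed noise violates normality

# Fit a simple least-squares line and inspect the residuals visually.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].hist(y, bins=30)                                  # raw distribution of the outcome
axes[0].set_title("Distribution of y")
axes[1].scatter(x, y, s=10)                               # functional form: is linear plausible?
axes[1].plot(np.sort(x), intercept + slope * np.sort(x))
axes[1].set_title("y vs. x with fitted line")
axes[2].scatter(intercept + slope * x, residuals, s=10)   # spread and outliers in the residuals
axes[2].axhline(0)
axes[2].set_title("Residuals vs. fitted values")
plt.tight_layout()
plt.show()
```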

Even before we analyze our data, we make a series of choices about how to combine and/or transform our variables, what data to include or exclude, and how to code various participant responses. Sometimes these choices are arbitrary, or perhaps a researcher prefers one choice (e.g., including all participants who completed the study), but others are also defensible (e.g., excluding participants who failed an attention check). As Steegen, Tuerlinckx, Gelman, and Vanpaemel (this issue) point out, depending on the various data processing choices that a researcher makes, a given raw dataset can give rise to a multiplicity of potential datasets and therefore a multiplicity of results. Steegen and colleagues offer a new analytic approach, called a multiverse analysis, that researchers can use to unpack how the various data processing decisions they made could have influenced their results. By examining study results across the full range of plausible data processing scenarios, researchers can explore the extent to which their conclusions might fluctuate across the multiverse of data processing decisions, and they can identify which decisions matter and which don't in shaping their results. This tool can also aid researchers in trying to deflate the multiverse: Researchers in a given subfield can work collectively to identify ahead of time whether certain data processing strategies are optimal. It can also highlight how, in the absence of such collective standards, reviewers and editors may unintentionally create sprawling multiverses over time by requesting alternative data processing decisions across different papers.
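A minimal sketch of the idea, assuming a hypothetical dataset and two data-processing dimensions (an exclusion rule and an outlier rule), is shown below; it simply reruns the same analysis for every combination of choices and reports how the estimated effect and p-value shift across that small multiverse. The Steegen et al. article develops the approach far more fully; this is only meant to make the basic mechanics concrete.

```python
# Illustrative sketch only: a toy multiverse analysis in the spirit of Steegen,
# Tuerlinckx, Gelman, and Vanpaemel. The dataset, the processing choices, and
# the analysis are all hypothetical stand-ins for a researcher's real decisions.
from itertools import product
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 120
data = {
    "condition": rng.integers(0, 2, size=n),          # 0 = control, 1 = treatment
    "outcome": rng.normal(0, 1, size=n),
    "failed_attention_check": rng.random(n) < 0.10,
}
data["outcome"] = data["outcome"] + 0.25 * data["condition"]   # small simulated effect

# Each processing dimension has several defensible options.
exclusion_rules = {
    "keep_all": lambda d: np.ones(n, dtype=bool),
    "drop_failed_check": lambda d: ~d["failed_attention_check"],
}
outlier_rules = {
    "none": lambda y: np.ones_like(y, dtype=bool),
    "within_3sd": lambda y: np.abs(y - y.mean()) < 3 * y.std(),
}

# Run the same analysis once per combination of choices: the "multiverse".
for (ex_name, ex_rule), (out_name, out_rule) in product(
    exclusion_rules.items(), outlier_rules.items()
):
    keep = ex_rule(data)
    y, cond = data["outcome"][keep], data["condition"][keep]
    keep2 = out_rule(y)
    y, cond = y[keep2], cond[keep2]
    diff = y[cond == 1].mean() - y[cond == 0].mean()
    t, p = stats.ttest_ind(y[cond == 1], y[cond == 0])
    print(f"{ex_name:>18} | {out_name:>10} | mean diff = {diff:+.2f} | p = {p:.3f}")
```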

Finally, we need to be thoughtful about how we aggregate across studies to form a cumulative understanding of a given topic area. Meta-analytic techniques that perform well under a highly simplified set of starting assumptions may lead to incorrect conclusions when we enter the inevitably messier world of real data. For instance, a simplified set of starting assumptions might posit a single true population effect size, no p-hacking, and an oversimplified kind of publication bias where p-values less than .05 are published and those greater than .05 are not. In the real world, however, effect sizes are often heterogeneous, researchers may employ various kinds of p-hacking, and publication bias can take many complex forms. The last two articles in this special section seek to improve our tools for aggregating research findings by investigating how various meta-analytic techniques perform when some of their idealized assumptions are violated. First, van Aert, Wicherts, and van Assen (this issue) explore how two recently proposed techniques, p-curve and p-uniform, perform under conditions that involve several forms of p-hacking and effect size heterogeneity. Their simulations suggest that p-curve and p-uniform produce biased estimates of the average population effect size when various kinds of p-hacking have been employed or when there is substantial effect size heterogeneity. Given these issues, van Aert and colleagues caution against using p-curve or p-uniform when p-hacking and/or effect size heterogeneity may be present.

Next, McShane, Böckenholt, and Hansen (this issue) take a step back to think broadly about how meta-analysts can best tackle the thorny real-world complexities of (a) publication bias and (b) effect size heterogeneity. They point out that many meta-analytic techniques account for one while ignoring the other, which can lead these techniques to fall apart when both are present (as they often are in the real world). Regular meta-analysis, for instance, typically accounts for heterogeneity but not publication bias; when publication bias is present, regular meta-analysis not only gives upwardly biased effect size estimates but also typically gives a false impression of homogeneity (leading researchers to erroneously conclude that effect size heterogeneity is not present when it is). Meanwhile, p-curve, p-uniform, and an earlier version of these approaches (Hedges, 1984) each account for a very simple kind of publication bias while ignoring heterogeneity; when these highly idealized assumptions are violated, they perform poorly. McShane and colleagues therefore urge meta-analysts to fit models that account for both publication bias and heterogeneity (e.g., a basic three-parameter variant of Hedges's 1984 method) and point readers toward a user-friendly website and other resources that enable them to apply this recommendation when conducting meta-analyses of their own.
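For readers who want to see the basic problem in action, here is a small Python sketch; it is not McShane and colleagues' selection model, and the simulation settings are entirely hypothetical. It generates heterogeneous study-level effects, "publishes" mostly the significant ones, and compares a naive inverse-variance pooled estimate computed on all studies versus on the published subset. The published-only estimate will typically come out noticeably inflated, which is precisely the pattern that motivates models accounting for both selection and heterogeneity.

```python
# Illustrative sketch only: a small simulation of the problem McShane, Böckenholt,
# and Hansen describe, not their selection-model solution. It shows how selective
# publication of significant results inflates a naive inverse-variance-weighted
# meta-analytic estimate. All simulation settings are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
k, n = 200, 40                       # number of simulated studies; participants per group
mu, tau = 0.20, 0.20                 # mean and SD of the heterogeneous true effects

true_d = rng.normal(mu, tau, size=k)            # heterogeneous population effects
se = np.sqrt(2 / n + true_d**2 / (4 * n))       # approximate SE of Cohen's d
obs_d = rng.normal(true_d, se)                  # observed study-level estimates
p = 2 * (1 - stats.norm.cdf(np.abs(obs_d) / se))

def pooled_estimate(d, s):
    """Naive fixed-effect (inverse-variance-weighted) pooled estimate."""
    w = 1 / s**2
    return np.sum(w * d) / np.sum(w)

# Significant results are always "published"; most null results stay in the file drawer.
published = (p < 0.05) | (rng.random(k) < 0.20)

print(f"True average effect:             {mu:.2f}")
print(f"Pooled estimate, all studies:    {pooled_estimate(obs_d, se):.2f}")
print(f"Pooled estimate, published only: {pooled_estimate(obs_d[published], se[published]):.2f}")
```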

Together, these articles underscore the importance of questioning and probing the assumptions we make when selecting among research strategies, when interpreting the results of our own individual studies, and when aggregating across studies to draw conclusions about a literature. They suggest that the choices we make about how to create our datasets and the choices we make about the starting assumptions for a simulation constrain the results we observe on the output side. Any set of results, whether empirical or simulated, gives us only a partial picture of reality. Reality itself is always more complex. If we want to study it, we need to be honest and open about the simplifying choices that we make so that everyone, including ourselves, can evaluate these choices, question them, and explore what happens when different choices and assumptions are made.

More broadly, as our field continues to strive for better and better research practices, we need to remember to think carefully about every stage of the research process, and to be skeptical of any simple heuristic, cut-off value, or one-size-fits-all approach. As Spellman (2015) suggested, after the revolution, we will come to a sensible middle ground. For those interested in constructing that sensible, thoughtful middle ground, the articles in this section provide an excellent starting point.

References

Brewer, M. B., & Crano, W. D. (2014). Research design and issues of validity. In H. T. Reis & C. Judd (Eds.), Handbook of research methods in social and personality psychology (2nd ed., pp. 11-26). New York: Cambridge University Press.

Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54, 297-312.

Chaiken, S., & Ledgerwood, A. (2012). A theory of heuristic and systematic information processing. In P. A. M. van Lange, A. W. Kruglanski, & E. T. Higgins (Eds.), Handbook of theories of social psychology (pp. 246-266). Thousand Oaks, CA: Sage.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.

Finkel, E. J., Eastwick, P. W., & Reis, H. T. (in press). Replicability and other features of a high-quality science: Toward a balanced and empirical approach. Journal of Personality and Social Psychology.

Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1-20.

Kool, W., McGuire, J. T., Rosen, Z. B., & Botvinick, M. M. (2010). Decision making and the avoidance of cognitive demand. Journal of Experimental Psychology: General, 139, 665.

Ledgerwood, A. (2014a). Introduction to the special section on advancing our methods and practices. Perspectives on Psychological Science, 9, 275-277.

Ledgerwood, A. (2014b). Introduction to the special section on moving toward a cumulative science: Maximizing what our research can tell us. Perspectives on Psychological Science, 9, 610-611.

Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, 147-163.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638.

Spellman, B. A. (2015). A short (personal) future history of Revolution 2.0. Perspectives on Psychological Science, 10, 886-899.

Taylor, S. E. (1981). The interface of cognitive and social psychology. In J. H. Harvey (Ed.), Cognition, social behavior, and the environment (pp. 189-211). Hillsdale, NJ: Erlbaum.