Response to the ASA's statement on p-values: context, process, and purpose


Edward L. Ionides, Alexander Giessing, Yaacov Ritov, and Scott E. Page
Departments of Complex Systems, Political Science, and Economics, University of Michigan
Version of a letter to appear in The American Statistician. Submitted 16 March 2016, accepted 22 August 2016.

The ASA's statement on p-values: context, process, and purpose (Wasserstein & Lazar 2016) makes several reasonable and practical points on the use of p-values in empirical scientific inquiry. The statement then goes beyond this mandate, in opposition to mainstream views on the foundations of scientific reasoning, to advocate that researchers should move away from the practice of frequentist statistical inference and deductive science. Mixed with the sensible advice on how to use p-values comes a message that is being interpreted across academia, the business world, and policy communities as, "Avoid p-values. They don't tell you what you want to know."

We support the idea of an activist ASA that reminds the statistical community of the proper use of statistical tools. However, any tool that is as widely used as the p-value will also often be misused and misinterpreted. The ASA's statement, while warning statistical practitioners against these abuses, simultaneously warns practitioners away from legitimate use of the frequentist approach to statistical inference. In particular, the ASA's statement ends by suggesting that other approaches, such as Bayesian inference and Bayes factors, should be used to solve the problems of using and interpreting p-values. Many committed advocates of the Bayesian paradigm were involved in writing the ASA's statement, so perhaps this conclusion should not surprise the alert reader. Other applied statisticians feel that adding priors to the model often does more to obfuscate the challenges of data analysis than to solve them. It is formally true that difficulties in carrying out frequentist inference can be avoided by following the Bayesian paradigm, since the challenges of properly assessing and interpreting the size and power of a statistical procedure disappear if one does not attempt to calculate them. However, avoiding frequentist inference is not a constructive approach to carrying out better frequentist inference.
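To fix terminology: the size of a frequentist test is its probability of rejecting a true null hypothesis, and its power is its probability of rejecting a false one. Both are properties of the procedure that can be checked directly by simulation. The sketch below is ours, not the letter's; the two-sample t-test, effect size, sample size, and replication count are arbitrary illustrative choices.

# Illustrative sketch: estimating the size and power of a two-sample t-test
# by Monte Carlo simulation. All numerical settings are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, reps = 30, 0.05, 10_000

def rejection_rate(effect):
    """Fraction of simulated datasets in which H0 (equal means) is rejected."""
    rejections = 0
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(effect, 1.0, n)
        _, p = stats.ttest_ind(x, y)
        rejections += (p < alpha)
    return rejections / reps

print("size  (effect = 0.0):", rejection_rate(0.0))   # should be close to alpha
print("power (effect = 0.5):", rejection_rate(0.5))   # chance of detecting the effect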

On closer inspection, the key issue is a fundamental position of the ASA's statement on the scientific method, related to but formally distinct from the differences between Bayesian and frequentist inference. Let's focus on a critical paragraph from the ASA's statement:

"In view of the prevalent misuses of and misconceptions concerning p-values, some statisticians prefer to supplement or even replace p-values with other approaches. These include methods that emphasize estimation over testing, such as confidence, credibility, or prediction intervals; Bayesian methods; alternative measures of evidence, such as likelihood ratios or Bayes Factors; and other approaches such as decision-theoretic modeling and false discovery rates. All these measures and approaches rely on further assumptions, but they may more directly address the size of an effect (and its associated uncertainty) or whether the hypothesis is correct."

Some people may want to think about whether it makes scientific sense to directly address whether the hypothesis is correct. Some people may have already concluded that usually it does not, and be surprised that a statement on hypothesis testing at odds with mainstream scientific thought is apparently being advocated by the ASA leadership. Albert Einstein's views on the scientific method are paraphrased by the assertion that "No amount of experimentation can ever prove me right; a single experiment can prove me wrong" (Calaprice 2005). This approach to the logic of scientific progress, that data can serve to falsify scientific hypotheses but not to demonstrate their truth, was developed by Popper (1959) and has broad acceptance within the scientific community. In the words of Popper (1963), "It is easy to obtain confirmations, or verifications, for nearly every theory," while "Every genuine test of a theory is an attempt to falsify it, or to refute it. Testability is falsifiability." The ASA's statement appears to be contradicting the scientific method described by Einstein and Popper.

In case the interpretation of this paragraph is unclear, the position of the ASA's statement is clarified in their Principle 2: "P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither."

Here, the ASA's statement misleads through omission: a more accurate end of the paragraph would read, "The p-value is neither. Nor is any other statistical test used as part of a deductive argument." It is implicit in the way the authors have stated this principle that they believe alternative scientific methods may be appropriate to assess more directly the truth of the null hypothesis. Many readers will infer the ASA to imply the inferiority of deductive frequentist methods for scientific reasoning. The ASA statement, in its current form, will therefore make it harder for scientists to defend a choice of frequentist statistical methods during peer review. Frequentist papers will become more difficult to publish, which will create a cascade of effects on data collection, research design, and even research agendas.
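Principle 2 is, on its own, a correct statement about what a p-value measures: the probability, computed under the null hypothesis, of data at least as extreme as those observed, which is not the probability that the null hypothesis is true given the data. The simulation sketch below, which is ours rather than the letter's, illustrates the gap; the proportion of true nulls, the effect size, and the sample size are arbitrary assumptions made for illustration.

# Illustrative sketch: a p-value is Pr(data at least this extreme | H0), not Pr(H0 | data).
# Among tests rejected at the 0.05 level, the fraction of true nulls depends on the
# prevalence of true nulls and on power, and need not be anywhere near 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 25, 10_000
true_null = rng.random(reps) < 0.8            # assumption: 80% of hypotheses are truly null
effects = np.where(true_null, 0.0, 0.5)       # assumption: effect size 0.5 when H0 is false

rejected = rejected_true_nulls = 0
for is_null, effect in zip(true_null, effects):
    x = rng.normal(effect, 1.0, n)
    _, p = stats.ttest_1samp(x, 0.0)
    if p < 0.05:
        rejected += 1
        rejected_true_nulls += is_null

print("fraction of rejections where H0 was true:", rejected_true_nulls / rejected)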

Goodman (2016) provided an example of how the scientific community is interpreting the ASA's statement. Goodman (2016) noted that the title of the ASA's statement is deceptively innocuous, and then proceeded to paraphrase the ASA's statement in support of inductive over deductive scientific reasoning: "What scientists want is a measure of the credibility of their conclusions, based on observed data. The P value neither measures that nor is part of a formula that provides it."

Gelman & Shalizi (2013) wrote a relevant discussion of the distinction between deductive reasoning (based on deducing conclusions from a hypothesis and checking whether they can be falsified, permitting data to argue against a scientific hypothesis but not directly for it) and inductive reasoning (which permits generalization, and therefore allows data to provide direct evidence for the truth of a scientific hypothesis). It is held widely, though less than universally, that only deductive reasoning is appropriate for generating scientific knowledge. Usually, frequentist statistical analysis is associated with deductive reasoning, and Bayesian analysis is associated with inductive reasoning. Gelman & Shalizi (2013) argued that it is possible to use Bayesian analysis to support deductive reasoning, though that is not currently the mainstream approach in the Bayesian community. Bayesian deductive reasoning may involve, for example, refusing to use Bayes factors to support scientific conclusions. The Bayesian deductive methodology proposed by Gelman & Shalizi (2013) is a close cousin to frequentist reasoning, and in particular emphasizes the use of Bayesian p-values.
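A Bayesian p-value in this spirit arises from a posterior predictive check: replicate datasets are simulated from the fitted model, and one asks how often a chosen test statistic is at least as extreme on the replicates as on the observed data, so the data can discredit a model without the model ever being assigned a probability of being true. The sketch below is ours, not taken from the letter or from Gelman & Shalizi (2013); the normal model with known variance, the conjugate prior, and the choice of test statistic are all arbitrary assumptions made for illustration.

# Illustrative posterior predictive check yielding a "Bayesian p-value".
# Assumed model: y_i ~ Normal(mu, 1) with prior mu ~ Normal(0, 10^2).
# Test statistic: the sample variance, which this model can fail to reproduce.
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(0.0, 2.0, 50)                  # "observed" data, generated here for the sketch
n = len(y)

# Conjugate posterior for mu when the data variance is fixed at 1.
prior_var, data_var = 10.0**2, 1.0
post_var = 1.0 / (1.0 / prior_var + n / data_var)
post_mean = post_var * y.sum() / data_var
post_sd = np.sqrt(post_var)

t_obs = np.var(y)
reps, exceed = 5_000, 0
for _ in range(reps):
    mu = rng.normal(post_mean, post_sd)       # draw a parameter value from the posterior
    y_rep = rng.normal(mu, 1.0, n)            # simulate replicate data from the model
    exceed += (np.var(y_rep) >= t_obs)

print("posterior predictive p-value:", exceed / reps)   # values near 0 or 1 indicate misfit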

The ASA probably did not intend to make a philosophical statement on the possibility of acquiring scientific knowledge by inductive reasoning. However, it ended up doing so, by making repeated assertions implying, directly and indirectly, the legitimacy and desirability of using data to directly assess the correctness of a hypothesis. This philosophical aspect of the ASA statement is far from irrelevant for statistical practice, since the ASA position encourages the use of statistical arguments that might be considered inappropriate.

A judgment against the validity of inductive reasoning for generating scientific knowledge does not rule out its utility for other purposes. For example, the demonstrated utility of standard inductive Bayesian reasoning for some engineering applications is outside the scope of our current discussion. This amounts to the distinction Popper (1959) made between common sense knowledge and scientific knowledge.

References

Calaprice, A. (2005), The new quotable Einstein, Princeton University Press, Princeton, NJ.

Gelman, A. & Shalizi, C. R. (2013), Philosophy and the practice of Bayesian statistics, British Journal of Mathematical and Statistical Psychology 66(1), 8–38.

Goodman, S. N. (2016), Aligning statistical and scientific reasoning, Science 352(6290), 1180–1181.

Popper, K. R. (1959), The logic of scientific discovery, Hutchinson, London.

Popper, K. R. (1963), Conjectures and refutations: The growth of scientific knowledge, Routledge and Kegan Paul, New York, NY.

Wasserstein, R. L. & Lazar, N. A. (2016), The ASA's statement on p-values: context, process, and purpose, The American Statistician, pre-published online.