Statistical Decision Theory and Bayesian Analysis

Statistical Decision Theory and Bayesian Analysis
Chapter 3: Prior Information and Subjective Probability
Lili MOU
moull12@sei.pku.edu.cn
http://sei.pku.edu.cn/~moull12
11 May 2015

Reference
James O. Berger, Statistical Decision Theory and Bayesian Analysis, Springer, 1985. This book covers the basic material of statistical decision theory in an easy-to-understand yet critical manner, and the prerequisites are rather low.
Statistical level: moderately serious statistics
Mathematical level: easy advanced calculus
These slides mainly pick textual material from Chapter 3; for the detailed mathematics, please refer to other resources.

Subjective Probability
Some even argue that the frequency concept never applies, it being impossible to have an infinite sequence of i.i.d. repetitions of any situation, except in a certain imaginary (subjective) sense.

Criticisms of Noninformative Priors
Noninformative priors depend on the experimental structure, and they are subject to the marginalization paradox. Perhaps the most embarrassing feature of noninformative priors, however, is simply that there are often so many of them.

Hierarchical Priors
It is somewhat difficult to subjectively specify second stage priors, such as π_2(λ) in the above example, but it will be seen in Chapter 4 that there is more robustness in second stage specifications than in first stage specifications (i.e., there is less danger that misspecification of π_2 will lead to a bad answer). The difficulty of specifying second stage priors has made common the use of noninformative priors at the second stage. Note that a hierarchical structure is merely a convenient representation for a prior, rather than an entirely new entity; any hierarchical prior can be written as a standard prior:
π(θ) = ∫_Λ π_1(θ | λ) dF^{π_2}(λ).
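To make that last point concrete, here is a minimal numerical sketch (not from the slides; the Normal-Gamma pair and all parameter values are illustrative assumptions): drawing λ from the second stage prior π_2 and then θ from π_1(θ | λ) reproduces the single "standard" prior obtained by marginalizing λ out, which in this conjugate case is a Student-t distribution.

```python
# Minimal sketch (illustrative assumption): a two-stage prior collapses to a
# standard prior by marginalizing out lambda.  Here pi_1(theta|lambda) is
# N(0, 1/lambda) and pi_2(lambda) is Gamma(a, rate=b); the marginal prior on
# theta is then a Student-t with 2a degrees of freedom and scale sqrt(b/a).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b = 3.0, 2.0                                   # second stage Gamma(shape=a, rate=b)

# Hierarchical sampling: lambda ~ pi_2, then theta | lambda ~ pi_1
lam = rng.gamma(shape=a, scale=1.0 / b, size=200_000)
theta = rng.normal(loc=0.0, scale=1.0 / np.sqrt(lam))

# Closed-form "standard" prior obtained by integrating lambda out
t_marginal = stats.t(df=2 * a, loc=0.0, scale=np.sqrt(b / a))

# The empirical CDF of the hierarchical draws should match the Student-t CDF
# up to Monte Carlo error.
grid = np.linspace(-4, 4, 9)
print(np.round(np.array([np.mean(theta <= g) for g in grid]), 3))
print(np.round(t_marginal.cdf(grid), 3))
```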

Criticisms
Few statisticians would object to a Bayesian analysis when θ was indeed a random quantity with a known prior distribution, or even when the prior distribution could be estimated with reasonable accuracy (from, say, previous data). The major objections are to the use of subjective or formal prior distributions, especially when θ is random only in a subjective sense.

Objectivity
To most non-Bayesians, classical statistics is objective and hence suitable for the needs of science, while Bayesian statistics is subjective and hence (at best) only useful for making personal decisions. Bayesians respond to this in several ways... very few statistical analyses are even approximately objective. It is indeed rather peculiar that some decision-theorists are ardent anti-Bayesians, precisely because of the subjectivity of the prior, and yet have no qualms about the subjectivity of the loss.

Model assumptions are priors per se
Box (1980) says, "In the past, the need for probabilities expressing prior belief has often been thought of, not as a necessity for all scientific inference, but rather as a feature peculiar to Bayesian inference. This seems to come from the curious idea that an outright assumption does not count as a prior belief... I believe that it is impossible logically to distinguish between model assumptions and the prior distribution of the parameters." More bluntly, Good (1973) says, "The subjectivist states his judgements, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science."

Data Snooping
The potential for misuse by careless or unscrupulous people is obvious. Savage (1962): "It takes a lot of self-discipline not to exaggerate the probabilities you would have attached to hypotheses before they were suggested to you."

Digression on the Machine Learning Research Regime
Data snooping is inevitable, at least in machine learning research. Suppose we have proposed a new algorithm or model. The only correct regime is to train the model on the training set, validate it on the cross-validation (CV) set until satisfied, test it on the test set exactly once, and report that result in the paper. A severe problem arises if your test accuracy does not outperform the state-of-the-art result: you then have essentially no chance to publish your work. If you go back and improve your model afterwards, you are committing data snooping.
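As an illustration of that regime, here is a minimal sketch (the synthetic dataset, the logistic-regression model, and the hyperparameter grid are assumptions for illustration, not part of the slides): all tuning decisions are made on the validation split, and the test split is scored exactly once, after the hyperparameter choice is frozen.

```python
# Minimal sketch of the train / validate / test-once regime on synthetic data.
# The classifier and hyperparameter grid are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_C, best_val_acc = None, -np.inf
for C in [0.01, 0.1, 1.0, 10.0]:                  # tune on the validation set only
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_C, best_val_acc = C, val_acc

# The test set is touched exactly once, after all tuning decisions are frozen.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print(f"chosen C={best_C}, validation acc={best_val_acc:.3f}, "
      f"test acc={final_model.score(X_test, y_test):.3f}")
```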

Digression on the Machine Learning Research Regime (2)
Non-monotonicity is a peculiar property of machine learning research. Suppose your proposed algorithm yields 61% CV accuracy and 60% test accuracy; you are happy because the state-of-the-art result is, say, 59%. You hope to tune the hyperparameters further to pursue a better result. Unfortunately, you then achieve 64% CV accuracy but only 58% test accuracy. Now comes the problem: you would have to report the lower accuracy of 58% in your paper (which would probably not be accepted). Some researchers may instead change the model a little, so that a "new" approach is born, yielding 61% accuracy for both validation and testing. This is a relief to those researchers, because they are not cheating, at least ostensibly. In fact they are, because (1) the test set can be used at most once, and (2) Box (1980) suggests there is no logical difference between models, parameters, or, here, hyperparameters. Ironically, the unfavorable hyperparameter above only affects your own research career; other researchers do not care about it. The training-validation-testing regime should be applied to all researchers collectively, because CV accuracy is an unbiased estimate of test accuracy: all of us anticipate a high test accuracy, and thus should choose the model with the highest CV accuracy in the world. However, even if you announce your results, other researchers pretend not to have seen them.
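The selection effect behind this non-monotonicity is easy to simulate. In the toy sketch below (an illustrative assumption, not from the slides), every "hyperparameter setting" is a pure coin-flip classifier, so the true accuracy is 50%; picking the setting with the best validation accuracy nevertheless yields an optimistic validation number, while its test accuracy stays near chance.

```python
# Toy simulation (illustrative assumption): selecting the hyperparameter with
# the best validation accuracy inflates the estimate.  All "models" here are
# pure coin flips, so the true accuracy of every setting is 50%.
import numpy as np

rng = np.random.default_rng(0)
n_val, n_test, n_settings = 500, 500, 50

y_val = rng.integers(0, 2, size=n_val)
y_test = rng.integers(0, 2, size=n_test)

val_accs, test_accs = [], []
for _ in range(n_settings):                       # each "setting" is a random predictor
    val_accs.append(np.mean(rng.integers(0, 2, size=n_val) == y_val))
    test_accs.append(np.mean(rng.integers(0, 2, size=n_test) == y_test))

best = int(np.argmax(val_accs))                   # pick the setting that looks best on validation
print(f"selected validation acc: {val_accs[best]:.3f}")   # optimistic, typically well above 0.5
print(f"its test acc:            {test_accs[best]:.3f}")  # close to chance
```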