A Brief Introduction to Bayesian Statistics

A Brief Introduction to Bayesian Statistics
David Kaplan
Department of Educational Psychology
Methods for Social Policy Research, Washington, DC, 2017

The Reverend Thomas Bayes, 1701–1761

Pierre-Simon Laplace, 1749–1827

"[I]t is clear that it is not possible to think about learning from experience and acting on it without coming to terms with Bayes' theorem." - Jerome Cornfield

Bayesian statistics has long been overlooked in the quantitative methods training of social scientists. Typically, the only introduction that a student might have to Bayesian ideas is a brief overview of Bayes' theorem while studying probability in an introductory statistics class.
1 Until recently, it was not feasible to conduct statistical modeling from a Bayesian perspective owing to its complexity and the lack of available software.
2 Bayesian statistics represents a powerful alternative to frequentist (classical) statistics, and is, therefore, controversial.
Bayesian statistical methods are now more popular owing to the development of powerful statistical software tools that make the estimation of complex models feasible from a Bayesian perspective.

Outline
Major differences between the Bayesian and frequentist paradigms of statistics
Bayesian hypothesis testing
Bayesian model building and evaluation within the Bayesian school

Paradigm Difference I: Conceptions of Probability
For frequentists, the basic idea is that probability is represented by the model of long-run frequency. Frequentist probability underlies the Fisher and Neyman-Pearson schools of statistics, the conventional methods of statistics we most often use. The frequentist formulation rests on the idea of equally probable and independent events. The physical representation is the coin toss, which relates to the idea of a very large (actually infinite) number of repeated experiments.

The entire structure of Neyman-Pearson hypothesis testing and Fisherian statistics (together referred to as the frequentist school) is based on frequentist probability. Our conclusions regarding null and alternative hypotheses presuppose the idea that we could conduct the same experiment an infinite number of times. Our interpretation of confidence intervals also assumes a fixed parameter and CIs that vary over an infinitely large number of identical experiments.

But there is another view: probability as subjective belief. The physical model in this case is that of the bet. Consider the situation of betting on who will win the World Series. Here, probability is not based on an infinite number of repeatable and independent events, but rather on how much knowledge you have and how much you are willing to bet. Subjective probability allows one to address questions such as: what is the probability that my team will win the World Series? Relative frequency supplies information, but it is not the same as probability and can be quite different. This notion of subjective probability underlies Bayesian statistics.

Consider the joint probability of two events, Y and X, for example observing lung cancer and smoking jointly. The joint probability can be written as
p(cancer, smoking) = p(cancer | smoking) p(smoking) (1)
Similarly,
p(smoking, cancer) = p(smoking | cancer) p(cancer) (2)

Because these joint probabilities are symmetric, we can set them equal to each other to obtain the following:
p(cancer | smoking) p(smoking) = p(smoking | cancer) p(cancer) (3)
p(cancer | smoking) = p(smoking | cancer) p(cancer) / p(smoking) (4)
The inverse probability theorem (Bayes' theorem) states
p(smoking | cancer) = p(cancer | smoking) p(smoking) / p(cancer) (5)

Why do we care? Because this is how you could go from the probability of having cancer given that the patient smokes, to the probability that the patient smokes given that he/she has cancer. We simply need the marginal probability of smoking and the marginal probability of cancer. We will refer to these marginal probabilities as prior probabilities.
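To make the inversion concrete, here is a minimal Python sketch. The marginal and conditional probabilities below are invented purely for illustration, not values from the lecture:

```python
# A minimal sketch of Bayes' theorem (Equation 5) with made-up numbers.
# All three input probabilities are illustrative assumptions.

p_smoking = 0.20               # marginal (prior) probability of smoking
p_cancer = 0.03                # marginal (prior) probability of lung cancer
p_cancer_given_smoking = 0.10  # p(cancer | smoking)

# Invert the conditional probability via Bayes' theorem.
p_smoking_given_cancer = p_cancer_given_smoking * p_smoking / p_cancer
print(f"p(smoking | cancer) = {p_smoking_given_cancer:.3f}")  # ≈ 0.667
```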

Statistical Elements of Bayes' Theorem
What is the role of Bayes' theorem for statistical inference? Denote by Y a random variable that takes on a realized value y. For example, a person's socio-economic status could be considered a random variable taking on a very large set of possible values. Once the person identifies his/her socio-economic status, the random variable Y is realized as y. Because Y is random, we need to specify a probability model to explain how we obtained the actual data values y.

Next, denote by θ a parameter that we believe characterizes the probability model of interest. The parameter θ can be a single number, such as the mean or the variance of a distribution, or it can be a set of numbers, such as a set of regression coefficients in regression analysis or factor loadings in factor analysis. We are concerned with determining the probability of observing y given the unknown parameters θ, which we write as p(y | θ). In statistical inference, the goal is to obtain estimates of the unknown parameters given the data.

Paradigm Difference II: The Nature of Parameters
The key difference between Bayesian statistical inference and frequentist statistical inference concerns the nature of the unknown parameters θ. In the frequentist tradition, the assumption is that θ is unknown, but that estimation of this parameter does not incorporate our uncertainty about what is reasonable to believe about the parameter. In Bayesian statistical inference, θ is considered unknown and random, possessing a probability distribution that reflects our uncertainty about the true value of θ. Because both the observed data y and the parameters θ are assumed random, we can model the joint probability of the parameters and the data as a function of the conditional distribution of the data given the parameters, and the prior distribution of the parameters.

More formally,
p(θ, y) = p(y | θ) p(θ), (6)
where p(θ, y) is the joint distribution of the parameters and the data. Following Bayes' theorem described earlier, we obtain
p(θ | y) = p(y | θ) p(θ) / p(y) ∝ p(y | θ) p(θ), (7)
where p(θ | y) is referred to as the posterior distribution of the parameters θ given the observed data y.

Equation (7) represents the core of Bayesian statistical inference and is what separates Bayesian statistics from frequentist statistics. Equation (7) states that our uncertainty regarding the parameters of our model, as expressed by the prior distribution p(θ), is weighted by the actual data p(y | θ), yielding an updated estimate of our uncertainty, as expressed in the posterior distribution p(θ | y).
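As a minimal sketch of Equation (7) in practice, the following code computes a posterior on a grid for an assumed beta-binomial example; the prior, the data, and the grid are all illustrative assumptions, not part of the lecture:

```python
import numpy as np

# Sketch of Equation (7) on a grid: y successes in n Bernoulli trials,
# with an assumed Beta(2, 2) prior on theta. Numbers are illustrative.

n, y = 20, 14
theta = np.linspace(0.001, 0.999, 999)        # grid over the parameter space

prior = theta * (1 - theta)                   # Beta(2, 2) prior, up to a constant
likelihood = theta**y * (1 - theta)**(n - y)  # binomial kernel p(y | theta)
posterior = prior * likelihood                # numerator of Equation (7)
posterior /= posterior.sum()                  # normalize so grid masses sum to 1

eap = (theta * posterior).sum()               # posterior mean (EAP)
map_est = theta[posterior.argmax()]           # posterior mode (MAP)
print(f"EAP = {eap:.3f}, MAP = {map_est:.3f}")  # ≈ 0.667 and ≈ 0.682
```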

Bayesian statistics requires that prior distributions be specified for ALL model parameters. There are three classes of prior distributions:
1 Non-informative priors: used to quantify actual or feigned ignorance; they let the data maximally speak.
2 Weakly informative priors: allow some weak bounds to be placed on parameters.
3 Informative priors: allow cumulative information to be considered.
A small sketch contrasting these three choices follows below.
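The sketch below assumes the same beta-binomial data as the grid example above and conjugate Beta priors, so the posterior mean has a closed form; all numbers are illustrative assumptions. It shows the posterior mean being pulled toward the prior as the prior becomes more informative:

```python
# Contrast of the three prior classes under an assumed beta-binomial model.
# With a Beta(a, b) prior, the posterior is Beta(a + y, b + n - y),
# whose mean is (a + y) / (a + b + n).

n, y = 20, 14  # assumed data: 14 successes in 20 trials

priors = {
    "non-informative Beta(1, 1)": (1, 1),     # flat: lets the data maximally speak
    "weakly informative Beta(2, 2)": (2, 2),  # weak bounds, centered at 0.5
    "informative Beta(30, 30)": (30, 30),     # strong prior information at 0.5
}

for name, (a, b) in priors.items():
    post_mean = (a + y) / (a + b + n)  # mean of Beta(a + y, b + n - y)
    print(f"{name}: posterior mean = {post_mean:.3f}")
```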

MCMC
The increased popularity of Bayesian methods in the social and behavioral sciences has been driven by the (re)discovery of numerical algorithms for estimating the posterior distribution of the model parameters given the data. It was virtually impossible to analytically derive summary measures of the posterior distribution, particularly for complex models with many parameters. Instead of analytically solving for estimates of a complex posterior distribution, we can draw samples from p(θ | y) and summarize the distribution formed by those samples. This is referred to as Monte Carlo integration. The most popular methods of MCMC are the Gibbs sampler, the Metropolis-Hastings algorithm, and, more recently, the Hamiltonian MCMC algorithm.
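The following is a minimal random-walk Metropolis-Hastings sketch, not the lecture's code, assuming the same beta-binomial posterior as in the earlier sketches. It shows the core accept/reject logic that the MCMC variants named above elaborate on:

```python
import numpy as np

# Random-walk Metropolis-Hastings for the assumed beta-binomial posterior:
# y = 14 successes in n = 20 trials, Beta(2, 2) prior. Illustrative only.

rng = np.random.default_rng(0)
n, y = 20, 14

def log_posterior(theta):
    """Unnormalized log posterior: log p(y | theta) + log p(theta)."""
    if not 0 < theta < 1:
        return -np.inf
    return (y + 1) * np.log(theta) + (n - y + 1) * np.log(1 - theta)

theta = 0.5                                    # starting value
samples = []
for _ in range(20000):
    proposal = theta + rng.normal(0, 0.1)      # symmetric random-walk proposal
    log_alpha = log_posterior(proposal) - log_posterior(theta)
    if np.log(rng.uniform()) < log_alpha:      # Metropolis acceptance step
        theta = proposal
    samples.append(theta)

samples = np.array(samples[5000:])             # discard burn-in draws
print(f"posterior mean ≈ {samples.mean():.3f}")  # ≈ 0.667, matching the conjugate answer
```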

Bayesian hypothesis testing begins first by obtaining summaries of relevant distributions. The difference between Bayesian and frequentist statistics is that with Bayesian statistics we wish to obtain summaries of the posterior distribution. The expressions for the mean (EAP), variance, and mode (MAP) of the posterior distribution come from the expressions for the mean and variance of conditional distributions generally.

Posterior Probability Intervals
In addition to point summary measures, it may also be desirable to provide interval summaries of the posterior distribution. Recall that the frequentist confidence interval requires that we imagine an infinite number of repeated samples from the population characterized by µ. For any given sample, we can obtain the sample mean x̄ and then form a 100(1 − α)% confidence interval. The correct frequentist interpretation is that 100(1 − α)% of the confidence intervals formed this way capture the true parameter µ under the null hypothesis. Notice that the probability that the parameter is in any particular interval is either zero or one.

Posterior Probability Intervals (cont'd)
In contrast, the Bayesian framework assumes that a parameter has a probability distribution. Sampling from the posterior distribution of the model parameters, we can obtain its quantiles. From the quantiles, we can directly obtain the probability that a parameter lies within a particular interval. So, a 95% posterior probability interval means that the probability that the parameter lies in the interval is 0.95. The posterior probability interval can, of course, include zero; zero is then a credible value for the parameter. Notice that this is entirely different from the frequentist interpretation, and arguably aligns with common sense.
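Here is a minimal sketch of this computation, assuming posterior draws are already in hand; the draws are simulated from a known Beta posterior purely for illustration (this is the conjugate result of the earlier assumed example):

```python
import numpy as np

# A 95% posterior probability (credible) interval from posterior draws.
# Beta(16, 8) is the assumed posterior from the earlier beta-binomial sketch;
# in practice the draws would come from MCMC.

rng = np.random.default_rng(1)
draws = rng.beta(16, 8, size=50000)  # stand-in for MCMC samples of theta

lower, upper = np.quantile(draws, [0.025, 0.975])
print(f"95% posterior probability interval: [{lower:.3f}, {upper:.3f}]")
# Read directly: P(lower < theta < upper | y) ≈ 0.95.
```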

Model Building
The frequentist and Bayesian goals of model building are the same:
1 Model specification based on prior knowledge
2 Model estimation and fitting to data obtained from a relevant population
3 Model evaluation and modification
4 Model choice

Despite these similarities, there are important differences. A major difference between the Bayesian and frequentist goals of model building lies in the model specification stage. Because the Bayesian perspective explicitly incorporates uncertainty regarding model parameters using priors, the first phase of model building requires the specification of a full probability model for the data and the parameters of the model. Model fit implies that the full probability model fits the data. Lack of model fit may be due to incorrect specification of the likelihood, the prior distribution, or both.

Another difference between the Bayesian and frequentist goals of model building relates to the justification for choosing a particular model among a set of competing models. Model building and model choice in the frequentist domain is based primarily on choosing the model that best fits the data. This has certainly been the key motivation for model building, re-specification, and model choice in the context of structural equation modeling (Kaplan, 2009). In the Bayesian domain, the choice among a set of competing models is based on which model provides the best posterior predictions. That is, the choice among a set of competing models should be based on which model will best predict what actually happened.

Posterior Predictive Checking
A very natural way of evaluating the quality of a model is to examine how well the model fits the actual data. In the context of Bayesian statistics, the approach to examining how well a model predicts the data is based on the notion of posterior predictive checks, and the accompanying posterior predictive p-value. The general idea behind posterior predictive checking is that there should be little, if any, discrepancy between data generated by the model (including the priors) and the actual data itself. Posterior predictive checking is a method for assessing the specification quality of the model. Any deviation between the data generated from the model and the actual data, as measured by the posterior predictive distribution and the p-value, implies model misspecification.
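A minimal sketch of a posterior predictive check under the assumed beta-binomial setup from the earlier examples (all numbers illustrative): for each posterior draw of θ we generate a replicated data set and compare a statistic of the replications with the observed data.

```python
import numpy as np

# Posterior predictive check for the assumed beta-binomial example.
# Beta(16, 8) is the assumed posterior; y_obs and n are the assumed data.

rng = np.random.default_rng(2)
n, y_obs = 20, 14
theta_draws = rng.beta(16, 8, size=10000)  # posterior draws of theta

y_rep = rng.binomial(n, theta_draws)       # replicated data sets from the model

# Posterior predictive p-value for the simple statistic T(y) = y itself:
ppp = np.mean(y_rep >= y_obs)
print(f"posterior predictive p-value: {ppp:.3f}")
# Values near 0 or 1 would flag misspecification; mid-range values do not.
```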

Bayes Factors
A very simple and intuitive approach to model building and model selection uses so-called Bayes factors (Kass & Raftery, 1995). Consider two regression models with a different number of variables, or two structural equation models specifying very different directions of mediating effects. The Bayes factor provides a way to quantify the odds that the data favor one hypothesis over another. A key benefit of Bayes factors is that the models do not have to be nested. The Bayes factor gives rise to the popular Bayesian information criterion (BIC).
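As a hedged illustration of that last point, the BIC difference between two models can be used to approximate a Bayes factor (Kass & Raftery, 1995). The log-likelihoods, parameter counts, and sample size below are invented for the example:

```python
import numpy as np

# BIC approximation to a Bayes factor for two assumed fitted models.
# All numbers are illustrative inventions, not real model output.

n_obs = 100                  # sample size (assumed)
loglik_1, k_1 = -210.0, 3    # model 1: maximized log-likelihood, parameter count
loglik_2, k_2 = -205.0, 5    # model 2

bic_1 = -2 * loglik_1 + k_1 * np.log(n_obs)
bic_2 = -2 * loglik_2 + k_2 * np.log(n_obs)

# 2 * log(BF_12) is approximated by BIC_2 - BIC_1.
bf_12 = np.exp((bic_2 - bic_1) / 2)
print(f"approximate Bayes factor in favor of model 1: {bf_12:.2f}")
```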

Model Averaging
The selection of a particular model from a universe of possible models can also be characterized as a problem of uncertainty. This problem was succinctly stated by Hoeting, Raftery & Madigan (1999), who write: "Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-confident inferences and decisions that are more risky than one thinks they are." (pg. 382) An approach to addressing the problem of model selection is the method of Bayesian model averaging (BMA). I will discuss this on Friday.

Within the Bayesian School
Bayesian statistics represents a powerful alternative to frequentist (classical) statistics, and is, therefore, controversial. The controversy lies in differing perspectives regarding the nature of probability, and the implications for statistical practice that arise from those perspectives. Recall that the frequentist framework views probability as synonymous with long-run frequency, and that the infinitely repeating coin toss represents the canonical example of the frequentist view.

Summarizing the Bayesian Advantage
The major advantages of Bayesian statistical inference over frequentist statistical inference are:
1 Coherence: probability axioms constrain how degrees-of-belief translate into action.
2 Flexibility in handling very complex models (through MCMC)
3 Inferences based on the data actually observed
4 Quantifying evidence
5 Incorporating prior knowledge

Subjective v. Objective Bayes
Subjective Bayesian practice attempts to bring prior knowledge directly into an analysis. This prior knowledge represents the analyst's (or others') degree of uncertainty. An analyst's degree of uncertainty is encoded directly into the prior distribution, and specifically in the degree of precision around the parameter of interest. The advantages, according to Press (2003), include:
1 Subjective priors are proper
2 Priors can be based on factual prior knowledge
3 Small sample sizes can be handled, though this is not a free lunch

The disadvantages of subjective priors, according to Press (2003), are:
1 It is not always easy to encode prior knowledge into probability distributions.
2 Subjective priors are not always appropriate in public policy or clinical situations.

Within objective Bayesian statistics, the focus is on letting the data maximally speak. In terms of the advantages of the objective Bayes school, Press (2003) notes that:
1 Objective priors can be used as benchmarks against which choices of other priors can be compared
2 Objective priors reflect the view that little information is available about the process that generated the data
3 An objective prior provides results equivalent to those based on a frequentist analysis
4 Objective priors are sensible public policy priors

In terms of the disadvantages of objective priors, Press (2003) notes that:
1 Objective priors can lead to improper results when the domain of the parameters lies on the real number line.
2 Parameters with objective priors are often independent of one another, whereas in most multi-parameter statistical models, parameters are correlated.
3 Expressing complete ignorance about a parameter via an objective prior leads to incorrect inferences about functions of the parameter.

In its extreme form, the subjectivist school:
allows for personal opinion to be elicited and incorporated into a Bayesian analysis;
places no restriction on the source, reliability, or validity of the elicited opinion as long as the rules of probability (coherence) are adhered to.
In its extreme form, the objectivist school:
views personal opinion as the realm of psychology, with no place in a statistical analysis;
would require formal rules for choosing reference priors.

To conclude, the Bayesian school of statistical inference is, arguably, superior to the frequentist school as a means of creating and updating new knowledge in the social sciences. The Bayesian school of statistical inference:
1 is the only coherent approach to incorporating prior information, thus leading to updated inferences and growth in knowledge;
2 represents an internally consistent and rigorous way out of the p-value problem;
3 approaches model building, evaluation, and choice from a predictive point of view.
As always, the full benefit of the Bayesian approach to research in the social sciences will be realized when it is more widely adopted and yields reliable predictions that advance knowledge.

THANK YOU