An Introduction to Bayesian Statistics


An Introduction to Bayesian Statistics
Robert Weiss
Department of Biostatistics, UCLA Fielding School of Public Health
robweiss@ucla.edu
UCLA CHIME, September 2015 (33 slides)

Bayesian Statistics: What Exactly is Bayesian Statistics?
- A philosophy of statistics.
- A generalization of classical statistics.
- An approach to statistics that explicitly incorporates expert knowledge in modeling data.
- A language to talk about statistical models and expert knowledge.
- An approach that allows complex models and explains why they are helpful.
- A different method for fitting models.
- An easier approach to statistics.

Prior to Posterior: Bayes' Theorem
- Prior: which parameter values you think are likely and unlikely.
- Collect data. The data give us the Likelihood: which parameter values the data consider likely.
- Update the prior to the Posterior: which values you think are likely and unlikely given the prior information and the data.
- The prior and posterior are both probability distributions.

Bayes' Theorem: Posterior = c × Prior × Likelihood   (1)
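
Equation (1) can be sketched numerically on a grid. This is an illustrative approximation, not the talk's code: the N(123, 3²) prior and the five Rite-Aid readings come from the later slides, and treating the sampling sd as known (here, the sample sd) is a simplifying assumption.

```python
import numpy as np

# Grid approximation of Posterior = c * Prior * Likelihood for a normal mean.
mu = np.linspace(105, 135, 601)              # candidate values of the mean
prior = np.exp(-0.5 * ((mu - 123) / 3) ** 2)  # N(123, 3^2), unnormalized

data = np.array([125.0, 126, 112, 120, 116])  # the five readings
sigma = data.std(ddof=1)                      # assumed-known sampling sd
like = np.ones_like(mu)
for y in data:
    like *= np.exp(-0.5 * ((y - mu) / sigma) ** 2)

post = prior * like                           # unnormalized posterior
post /= post.sum() * (mu[1] - mu[0])          # normalize to integrate to 1

post_mean = np.sum(mu * post) * (mu[1] - mu[0])
print(round(post_mean, 1))                    # close to the 121.2 reported later
```

The normalizing constant c never needs to be computed analytically; dividing by the grid sum does the same job.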

Prior Distributions
- Prior: a probability distribution summarizing beliefs about the unknown parameters prior to seeing the data.
- In practice: a combination of expert knowledge and practical choices.
- Priors can be specified by:
  - a point estimate ("My systolic blood pressure is around 123"), and
  - a measure of uncertainty ("I might be in error by 3 points"). (Is that a maximal error, an SD, or perhaps 2 SDs? Need to decide: 1 SD.) And
  - a practical choice about the density: a normal distribution, N(123, 3²).

A Prior by Any Other Name Would Smell as Sweet
Priors and Bayesian methods keep getting reinvented under other names:
- penalized likelihood; penalized regression
- random effects models (REMs)
- generalized linear mixed models (GLMMs)
- hierarchical models; multilevel models (MLMs)
- the Pitman estimator
- shrinkage estimators
- ridge regression
- Stein estimation; James-Stein estimators
- regularization
- best linear unbiased prediction (BLUP)
- restricted maximum likelihood (REML)
- the lasso

Bayesian Inference: From Before the Data to After the Data
- Data are random before you see them, in both classical and Bayesian statistics.
- After you see the data, they are fixed in Bayesian statistics (not in classical statistics).
- Uncertainty in parameter values is quantified by probability distributions.
- Parameters have distributions before you see the data [the prior distribution].
- Parameters have updated distributions after you see the data [the posterior distribution].
- The conclusion of an analysis is a posterior distribution.
- Point estimates, SDs, 95% CIs: summaries of the posterior distribution.

Interpreting the Prior (Five Data Points)
- 123 is the value I find most likely.
- 3 is my assessment of the standard deviation.
- 123 ± 3, i.e. (120, 126), is about a 68% interval.
- (117, 129), i.e. 123 ± 6: I expect my true blood pressure to fall in this interval 95% of the time.
- That 68% or 95% would be over all prior intervals that I specify for various quantities.
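
The interval coverages quoted above can be checked directly from the normal CDF. A small sketch using only the standard library (the helper function name is mine):

```python
import math

# Coverage of the slide's prior intervals under a N(123, 3^2) prior:
# 123 +/- 3 (one sd) should cover ~68%, and 123 +/- 6 (two sd) ~95%.
def normal_coverage(center, sd, lo, hi):
    """P(lo < X < hi) for X ~ N(center, sd^2), via the error function."""
    cdf = lambda x: 0.5 * (1 + math.erf((x - center) / (sd * math.sqrt(2))))
    return cdf(hi) - cdf(lo)

print(round(normal_coverage(123, 3, 120, 126), 3))  # ~0.683
print(round(normal_coverage(123, 3, 117, 129), 3))  # ~0.954
```

Strictly, ±2 SD covers about 95.4%; the 95% on the slide is the usual rounding.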

Collecting My Data: From CVS and from Kaiser
- I went into Rite-Aid and took 5 measurements in a row. The measures were: 125, 126, 112, 120, 116.
- Compare to a Kaiser visit: a measurement of 115. I was surprised.
- At Kaiser, 115 was common 3 years ago. The last 5 Kaiser measures were 127 (2010), 115 (2010), 117 (2007), 115, 116.

Graphical Results for Rob's Blood Pressure
[Figure: prior, data likelihood, and posterior densities for the CVS data, plotted against mu from 105 to 135.]

Graphical Results for Rob's Blood Pressure 2
[Figure: prior, data likelihood, and posterior densities for the Kaiser data, plotted against mu from 105 to 135.]

Interpreting the Figure: Rob's Blood Pressure
- The red curve is the prior: what I believed before seeing the data. X-axis values where the red curve is highest are the most plausible values a priori; where the red curve is low, values are less plausible.
- The black curve is the likelihood. Where the black curve is high are the values supported by the data.
- The blue curve is the posterior: the combination of prior and likelihood. Where the blue curve is high are the most plausible values given the data and my prior beliefs.

Inferences for Rob, CVS Data

Inference    Mean    SD    95% CI
Posterior    121.2   2.0   (117.2, 125.2)
Likelihood   119.8   2.7   (114.5, 125.1)
Prior        123     3     (117.0, 129.0)

Table: The likelihood gives the classical inference. The posterior is the Bayesian solution. The prior is what you guessed prior to seeing the data. The posterior SD is smaller than the classical answer and the posterior CI is narrower, i.e. more precise.

Making Sausage: The Formula for the Posterior Mean

    Posterior mean = [ (n/σ²) ȳ + (1/τ²) μ₀ ] / [ n/σ² + 1/τ² ]

where
- μ₀ is the prior mean,
- τ is the prior SD,
- σ is the sampling (data) SD,
- n is the sample size, and
- ȳ is the data mean.

The posterior mean is a weighted average of the prior mean and the data mean.
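
The weighted-average formula is short enough to compute directly. A sketch using the CVS example's numbers (the function name is mine, and the sampling SD is taken to be the sample SD of the five readings, an assumption):

```python
import numpy as np

def posterior_mean_sd(mu0, tau, ybar, sigma, n):
    """Conjugate normal update for a mean with known sampling sd."""
    w_data = n / sigma**2            # precision contributed by the data
    w_prior = 1 / tau**2             # precision contributed by the prior
    mean = (w_data * ybar + w_prior * mu0) / (w_data + w_prior)
    sd = (w_data + w_prior) ** -0.5  # posterior sd from total precision
    return mean, sd

y = np.array([125.0, 126, 112, 120, 116])
m, s = posterior_mean_sd(123, 3, y.mean(), y.std(ddof=1), len(y))
print(round(m, 1), round(s, 1))      # ~121.2 2.0, matching the table
```

Because precisions add, the posterior SD is always smaller than both the prior SD and the likelihood SD.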

Other Sources of Prior Information
- In your analyses, the prior information came from you, the experts on your own blood pressure.
- There are other places to get prior information. Knowledge of usual blood pressure values can be used as prior knowledge.
- For example: nurses' average SBP is 118, and the nurse population SD is 8.7. The sampling SD of SBP measurements is 13.

UCLA Nurses' Blood Pressure Study
- An average of 47 ambulatory blood pressure measurements on each of 203 nurses.
- Calculate the average SBP for each nurse. Pretend this is her "true" average SBP.
- The average SBP is 118; the SD of average SBPs is 8.7. The SD of SBP measures around a nurse's average is 13.
- Suppose we record a single SBP measurement of 140. Does the nurse likely have an SBP of 140, or is it a somewhat high reading from someone with an SBP of 125, or a somewhat low reading from someone with SBP = 155?

Nurses' Blood Pressure Study
[Figure: the population distribution of blood pressures (density over roughly 100 to 150).]

Nurses' Blood Pressure Study
[Figure: the population distribution of blood pressures, with the observed measurement SBP = 140 marked.]

Compare Bayesian Analysis to Classical Analysis
- Sample a single observation y₁ from one nurse in the data set.
- Model the true mean μ a priori as normal with a prior mean of 118 and a standard deviation of 8.7. The sampling SD is 13.
- The classical estimate is just the value of the single measurement.
- The Bayes estimate is calculated as illustrated for the measurement of 140, and as you did in the homework.
- Repeat for all subjects. Which result is closer to the true value? Calculate the root mean squared error, and count which statistical approach comes closer to the true value more often.

Bayesian Analysis Beats Classical Analysis Badly
- Mean squared error: Bayesian method 51 units (RMSE 7.1); classical method 153 units (RMSE 12.4).
- The Bayes method is 3 times more precise. The Bayesian prior is worth roughly 3 observations.
- The Bayesian estimate was closer 140 times; the classical estimate was closer 63 times.
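
The comparison above can be mimicked with a small simulation. This uses synthetic nurses drawn from the stated population (true SBPs ~ N(118, 8.7²), one reading per nurse with SD 13), so the numbers will approximate, not reproduce, the 51 vs. 153 on the slide:

```python
import numpy as np

# Simulate the nurses comparison: Bayes shrinks each single reading toward
# the population mean 118; the classical estimate is the reading itself.
rng = np.random.default_rng(0)
n = 203
true_sbp = rng.normal(118, 8.7, n)           # each nurse's true mean
reading = rng.normal(true_sbp, 13)           # one noisy measurement each

shrink = 8.7**2 / (8.7**2 + 13**2)           # weight the data gets (~0.31)
bayes = 118 + shrink * (reading - 118)       # posterior mean per nurse

mse_bayes = np.mean((bayes - true_sbp) ** 2)
mse_classical = np.mean((reading - true_sbp) ** 2)
print(mse_bayes < mse_classical)             # Bayes wins on MSE
```

The theoretical MSEs here are σ²τ²/(σ²+τ²) ≈ 52 for Bayes and σ² = 169 for the classical estimate, in line with the slide's 3-to-1 advantage.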

Nurses' Blood Pressure Study
[Figure: histograms of the Bayesian errors (posterior mean minus true) and the classical errors (sample minus true), on a common scale from roughly -40 to +60.]

Nurses' Blood Pressure Study: Errors from Bayesian and Classical Estimates
- The Bayesian errors are packed in more closely toward zero: almost never larger than 20 units, and mostly between -10 and 10 units.
- Some classical errors are as large as -40 and +60. They are generally between -20 and 20, but roughly 10% fall outside that range.
- Bayesian analysis tends to eliminate or reduce ridiculous estimates and make those results more reasonable.

More Complex Statistical Models: Hierarchical Models for Blood Pressure
- Model all the data from the Nurses' Blood Pressure data set.
- The model is n = 203 copies of our simpler blood pressure model, plus a model of the population of true blood pressures.
- The SBP population mean (~118) and SD (~8.7) are unknown parameters. The sampling SD (σ, ~13) is estimated.
- Estimate all 203 individual nurses' average SBPs.
- This is a random effects model. The random effects are the individual nurses' (unknown) true SBP values.
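
One way to sketch this two-level structure is an empirical-Bayes version on synthetic data: hyperparameters are estimated by moments rather than given priors, which a full Bayesian fit (e.g. via MCMC) would do instead. The data here are simulated from the stated population, not the real study.

```python
import numpy as np

# Two-level model: theta_i ~ N(mu, tau^2), y_ij ~ N(theta_i, sigma^2),
# with 203 nurses and 47 readings each, as in the study's design.
rng = np.random.default_rng(1)
n_nurses, n_obs = 203, 47
theta = rng.normal(118, 8.7, n_nurses)              # true means (simulated)
y = rng.normal(theta[:, None], 13, (n_nurses, n_obs))

ybar = y.mean(axis=1)
sigma2 = y.var(axis=1, ddof=1).mean()               # within-nurse variance
mu_hat = ybar.mean()                                # population mean estimate
tau2 = max(ybar.var(ddof=1) - sigma2 / n_obs, 0)    # between-nurse variance

# Shrink each nurse's average toward the estimated population mean.
w = (n_obs / sigma2) / (n_obs / sigma2 + 1 / tau2)
theta_hat = w * ybar + (1 - w) * mu_hat

# Every estimate lies between the nurse's raw average and the population mean.
print(bool(np.all((theta_hat - ybar) * (theta_hat - mu_hat) <= 0)))
```

With 47 readings per nurse the shrinkage weight w is close to 1, so individual averages are moved only slightly; with a single reading per nurse the pull toward 118 would be much stronger.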

Philosophy: Confidence Intervals
- Classical statistics confidence interval: 95% confidence intervals constructed from repeated data sets will contain the true parameter value 95% of the time. Throwing horseshoes: the true parameter is fixed and the data are random; the data attempt to throw a ringer, capturing the true parameter.
- Bayesian confidence (posterior or credible) interval: the probability that this interval contains the true parameter is 95%. The data are fixed and the parameter is unknown. Throwing a dart.
- Both: the parameter is fixed but unknown.
- Confidence is a statement about the statistician. Bayesian posterior probability is a statement about the particular scientific problem being studied.

Philosophy: Hypothesis Tests
- Classical hypothesis test: the p-value is the probability of observing a test statistic as extreme or more extreme, assuming the null hypothesis is true.
- Bayesian hypothesis test: the probability that the null (or alternative) hypothesis is true.
- The classical statement requires one more leap of faith: if the p-value is small, either something unusual occurred or the null hypothesis must be false.
- The Bayesian statement is a direct statement about the probability of the null hypothesis.

Philosophy: Hypothesis Tests II
- Classical hypothesis test: H0 is treated differently from HA. There are only two hypotheses, and H0 must be nested in HA.
- Bayesian hypothesis test: there may be 2 or 3 or more hypotheses. Hypotheses are treated symmetrically. Hypotheses need not be nested.
- Example: three hypotheses H1: β1 < β2, H2: β1 = β2, H3: β1 > β2 can be handled in a Bayesian framework. The result is three probabilities: the probability that each hypothesis is true given the data.
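
Given posterior draws of the two slopes, these three probabilities are just proportions of draws. A sketch with synthetic stand-in draws (not output from the talk's model); note the talk's approach puts prior mass on exact equality, whereas this sketch uses a practical-equivalence margin, since continuous draws are never exactly equal:

```python
import numpy as np

# Three-hypothesis comparison from posterior samples of beta1 and beta2.
rng = np.random.default_rng(2)
beta1 = rng.normal(1.0, 0.3, 10_000)   # hypothetical posterior draws
beta2 = rng.normal(1.5, 0.3, 10_000)
eps = 0.1                              # equivalence margin (a modeling choice)

p_less = np.mean(beta1 < beta2 - eps)      # P(H1: beta1 < beta2 | data)
p_equal = np.mean(np.abs(beta1 - beta2) <= eps)  # P(H2: beta1 = beta2 | data)
p_greater = np.mean(beta1 > beta2 + eps)   # P(H3: beta1 > beta2 | data)
print(round(p_less + p_equal + p_greater, 6))  # 1.0, since the three partition
```

The three events partition the possibilities, so the probabilities sum to one — exactly the structure shown for the orthodontic example below.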

Orthodontic Example: An Example with Many Hypotheses
- Orthodontic fixtures (braces) gradually reposition teeth by sliding them along an arch wire attached to the teeth by brackets.
- Each bracket was attached to the arch wire at one of four angles: 0, 2, 4, or 6 degrees.
- The outcome is frictional resistance, which is expected to increase linearly as the angle increases.
- Four different types of fixtures are used in this study: 4 groups.
- Interest is in the intercepts and slopes (on angle) in each group. Are the intercepts the same or different? Which slopes are equal? For any two slopes, which is higher? Which is lower?

ANOVA Table: Treatment, Angle, Treatment × Angle

Effect              F value   Degrees of Freedom   p-value
Treatment           0.25      3                    .86
Angle               185       1                    <.0001
Treatment × Angle   7.43      3                    <.0001

Table: Results from the analysis of covariance with interaction. The p-value for treatment is for differences between treatment intercepts, and the p-value for treatment × angle is for differences between treatment-specific slopes.

Pairwise p-values: Are they significant? What about multiple comparisons?

              Treatment 2   Treatment 3   Treatment 4
Treatment 1   .03           .04           .04
Treatment 2                 <.0001        .80
Treatment 3                               .0003

Table: p-values for differences between treatment-specific slopes from the previous ANOVA.

Orthodontic Example: Plot of the Data
- The lines are fitted values from the usual least squares regression.
- The points are the average responses for each group 1 through 4 at each angle.
- There are four intercepts and four slopes in the plot.
- Which intercepts appear possibly equal? Which slopes appear possibly equal?

Orthodontic Example
[Figure: friction (roughly 200 to 800) plotted against angle (0 to 6 degrees), with data points and fitted lines for groups 1 through 4.]

Orthodontic Example: Model Description
- Alber and Weiss (2009). A model selection approach to analysis of variance and covariance. Statistics in Medicine, 28, 1821-1840.
- Data: y_ij is the jth observation in group i and has angle x_ij.
- Model: y_ij = α_i + β_i x_ij + error. Simple linear regression within each group, with a different intercept α_i and a different slope β_i in each group.
- The prior on the intercepts is that any two groups may or may not be equal. The same holds for the slopes: any two slopes may or may not be equal.
- Results are on the next page. For each pair of intercepts or slopes, the figure plots the probability that the first slope (intercept) is less than the second, the probability that they are equal, and the probability that the first is greater than the second.

Orthodontic Example: Results
[Figure 4: Posterior probabilities of intercept differences (a) and slope differences (b) from the independent partitions model. For each intercept pair (μ0j, μ0k) and each slope pair (μ1j, μ1k), bars show the posterior probabilities that the first is less than, equal to, or greater than the second.]

Discussion of the Orthodontic Results
- All pairs of intercepts have probability about .6 of being equal.
- Intercepts, groups 1, 2, 3: roughly equal probabilities of one being larger or smaller than the other, if not equal. There is a modest probability that group 4's intercept is larger than the others, if not equal.
- Slopes: slope 1 is less than slopes 2 and 4. Slope 3 is less than slopes 1, 2, and 4. Only slopes 2 and 4 may be equal.