Bayesian performance

In this section we will study the statistical properties of Bayesian estimates. Major topics include:
- The likelihood principle
- Decision theory/Bayes rules
- Shrinkage estimators
- Frequentist properties of Bayesian estimators

ST740 (2) Bayes Performance - Part 1 Page 1
The likelihood principle

In a Bayesian analysis, everything we know about the parameters is summarized by the posterior (likelihood × prior). Is it true that classical methods only use the likelihood?

Likelihood principle: Once the data are observed, the likelihood contains all the information in the data about the parameters.

The p-value can violate the likelihood principle because it depends on both the data and unobserved events.

Example (Lindley and Phillips): A coin with P(heads) = θ is flipped 12 times and we observe 9 heads. Now test H0: θ = 0.5 versus H1: θ > 0.5.

Analysis 1: The number of flips n = 12 was fixed in advance, so the number of heads is Binomial(12, θ) and the p-value is P(X ≥ 9 | θ = 0.5) ≈ 0.073.

Analysis 2: Flipping continued until the 3rd tail, so the number of heads is Negative Binomial(3, θ) and the p-value is P(X ≥ 9 | θ = 0.5) ≈ 0.033.

The two analyses have proportional likelihoods, θ^9(1 − θ)^3, yet give different p-values: at the 0.05 level one rejects and the other does not.
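As a check, both p-values can be computed directly. Both analyses observe 9 heads and 3 tails, but the stopping rule changes the sample space over which the p-value sums:

```python
from math import comb

theta0 = 0.5  # null value of P(heads)
x, n = 9, 12  # 9 heads observed in 12 flips (3 tails)

# Analysis 1: n = 12 fixed in advance, X ~ Binomial(12, theta).
# p-value = P(X >= 9 | theta = 0.5)
p_binom = sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k)
              for k in range(x, n + 1))

# Analysis 2: flip until the 3rd tail, X = number of heads ~ NegBinomial(3, theta).
# p-value = P(X >= 9 | theta = 0.5) = 1 - P(X <= 8)
r = 3  # number of tails required by the stopping rule
p_negbin = 1 - sum(comb(k + r - 1, r - 1) * theta0**k * (1 - theta0)**r
                   for k in range(0, x))

print(round(p_binom, 4))   # 0.073  -> fail to reject at alpha = 0.05
print(round(p_negbin, 4))  # 0.0327 -> reject at alpha = 0.05
```

Same data, same likelihood θ^9(1 − θ)^3, different conclusions, which is exactly the violation of the likelihood principle the slide describes.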
The likelihood principle

A Bayesian analysis adheres to the likelihood principle. For example, in both analyses of the coin-flipping example the likelihood is proportional to θ^9(1 − θ)^3, so with the same prior both give the same posterior, and hence the same inference.

Do you think the likelihood principle is important?
Calibrated Bayes

Now we'll begin to study the frequentist properties of Bayesian estimators (unbiasedness, consistency, etc.). First, should we (Bayesians) really care about frequentist properties of estimators?
Calibrated Bayes

Some quotes from Little (2011), "Calibrated Bayes, for Statistics in General, and Missing Data in Particular," Statistical Science, 26, 162-174.

Little: "To summarize, Bayesian statistics is strong for inference under an assumed model, but relatively weak for the development and assessment of models. Frequentist statistics provides useful tools for model development and assessment, but has weaknesses for inference under an assumed model. If this summary is accepted, then the natural compromise is to use frequentist methods for model development and assessment, and Bayesian methods for inference under a model. This capitalizes on the strengths of both paradigms, and is the essence of the approach known as Calibrated Bayes."

Rubin: "The applied statistician should be Bayesian in principle and calibrated to the real world in practice - appropriate frequency calculations help to define such a tie ... frequency calculations are useful for making Bayesian statements scientific, scientific in the sense of capable of being shown wrong by empirical test; here the technique is the calibration of Bayesian probabilities to the frequencies of actual events."
In a Bayesian analysis all inference is based on the posterior distribution p(θ | y). What is the best one-number summary of the posterior, θ̂, to be used as the estimator? This depends on the situation, and in particular, on the penalty associated with different types of errors (e.g., maybe overestimation is way worse than underestimation). We will use decision theory to form estimators with good properties.
We need a definition of "best" to get started. Let θ0 be the true value of the parameter and θ̂(y) be our estimator (perhaps the posterior mean). The loss function l[θ0, θ̂(y)] is the cost associated with estimating θ to be θ̂(y) when the truth is θ0.

Examples:
- Squared error loss: l[θ0, θ̂(y)] = [θ0 − θ̂(y)]^2
- Absolute loss: l[θ0, θ̂(y)] = |θ0 − θ̂(y)|
- Zero/one loss: l[θ0, θ̂(y)] = I[θ0 ≠ θ̂(y)]
The loss function l[θ0, θ̂(y)] depends on both the true value (θ0) and the data (via θ̂). We need to average over one or both of these to compare methods. Bayesian analysis is conditioned on the data, so we average over θ0:

Risk = ∫ l[θ, θ̂(y)] w(θ) dθ.

Which values of θ0 should be weighted the highest? Those most plausible given the data, so we take the weight w(θ) to be the posterior:

Bayesian risk = ∫ l[θ, θ̂(y)] p(θ | y) dθ.

The Bayes rule is the estimator θ̂(y) that minimizes the Bayesian risk.
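This minimization can be illustrated numerically. As a hypothetical example (not from the slides), suppose a Uniform(0,1) prior combined with the 9-heads/3-tails coin data gives a Beta(10, 4) posterior; the sketch below approximates the Bayesian risk of each candidate estimate on a grid and finds the minimizer:

```python
import numpy as np

# Hypothetical setup: Beta(10, 4) posterior from a Uniform(0,1) prior
# and 9 heads / 3 tails.  Approximate the posterior by Monte Carlo draws.
rng = np.random.default_rng(0)
theta = rng.beta(10, 4, size=200_000)   # posterior draws
candidates = np.linspace(0, 1, 1001)    # candidate estimates on a grid

# Bayesian risk under squared error loss, for each candidate estimate c:
# approx of  Risk(c) = E[(theta - c)^2 | y]
risk = [np.mean((theta - c) ** 2) for c in candidates]
best = candidates[int(np.argmin(risk))]

print(round(best, 3))  # close to the posterior mean 10/14 = 0.714
```

Swapping the squared-error line for `np.mean(np.abs(theta - c))` instead recovers the posterior median, previewing the results on the next slides.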
Under squared error loss l[θ0, θ̂(y)] = [θ0 − θ̂(y)]^2, the Bayes rule is the posterior mean, θ̂(y) = E(θ | y).
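The standard argument, sketched in LaTeX: differentiate the Bayesian risk with respect to the candidate estimate c and set it to zero.

```latex
\[
\frac{\partial}{\partial c}\int (\theta - c)^2\, p(\theta \mid y)\, d\theta
  = -2\int (\theta - c)\, p(\theta \mid y)\, d\theta = 0
  \;\Longrightarrow\;
  c = \int \theta\, p(\theta \mid y)\, d\theta = E(\theta \mid y).
\]
```

The second derivative is 2 > 0, so this critical point is indeed the minimizer.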
Under absolute loss l[θ0, θ̂(y)] = |θ0 − θ̂(y)|, the Bayes rule is the posterior median.

Under zero/one loss l[θ0, θ̂(y)] = I[θ0 ≠ θ̂(y)], the Bayes rule is the posterior mode.

Hypothesis testing: Say θ = 0 if H0 is true and θ = 1 if H1 is true. Give the Bayes rule under the loss

l[θ0, θ̂(y)] = λ1 I[θ0 = 0, θ̂(y) = 1] + λ2 I[θ0 = 1, θ̂(y) = 0].

The Bayesian risk of θ̂(y) = 1 is λ1 P(θ = 0 | y) and of θ̂(y) = 0 is λ2 P(θ = 1 | y), so the Bayes rule chooses H1 when P(θ = 1 | y) > λ1/(λ1 + λ2).

How to pick λ1 and λ2? They encode the relative costs of the two errors: λ1 is the cost of a false rejection of H0 and λ2 the cost of a false acceptance.
We've seen loss functions for point estimation and hypothesis testing; which loss functions are appropriate for interval estimation?