Bayesian Analysis with Stata
John Thompson
University of Leicester
A Stata Press Publication
StataCorp LP
College Station, Texas
Contents
List of figures
List of tables
Preface
Acknowledgments
1 The problem of priors
1.1 Case study 1: An early phase vaccine trial
1.2 Bayesian calculations
1.3 Benefits of a Bayesian analysis
1.4 Selecting a good prior
1.5 Starting points
1.6 Exercises
2 Evaluating the posterior
2.1 Introduction
2.2 Case study 1: The vaccine trial revisited
2.3 Marginal and conditional distributions
2.4 Case study 2: Blood pressure and age
2.5 Case study 2: BP and age continued
2.6 General log posteriors
2.7 Adding distributions to logdensity
2.8 Changing parameterization
2.9 Starting points
2.10 Exercises
3 Metropolis-Hastings
3.1 Introduction
3.2 The MH algorithm in Stata
3.3 The mhs commands
3.4 Case study 3: Polyp counts
3.5 Scaling the proposal distribution
3.6 The mcmcrun command
3.7 Multiparameter models
3.8 Case study 3: Polyp counts continued
3.9 Highly correlated parameters
3.9.1 Centering
3.9.2 Block updating
3.10 Case study 3: Polyp counts yet again
3.11 Starting points
3.12 Exercises
4 Gibbs sampling
4.1 Introduction
4.2 Case study 4: A regression model for pain scores
4.3 Conjugate priors
4.4 Gibbs sampling with nonstandard distributions
4.4.1 Griddy sampling
4.4.2 Slice sampling
4.4.3 Adaptive rejection
4.5 The gbs commands
4.6 Case study 4 continued: Laplace regression
4.7 Starting points
4.8 Exercises
5 Assessing convergence
5.1 Introduction
5.2 Detecting early drift
5.3 Detecting too short a run
5.3.1 Thinning the chain
5.4 Running multiple chains
5.5 Convergence of functions of the parameters
5.6 Case study 5: Beta-blocker trials
5.7 Further reading
5.8 Exercises
6 Validating the Stata code and summarizing the results
6.1 Introduction
6.2 Case study 6: Ordinal regression
6.3 Validating the software
6.4 Numerical summaries
6.5 Graphical summaries
6.6 Further reading
6.7 Exercises
7 Bayesian analysis with Mata
7.1 Introduction
7.2 The basics of Mata
7.3 Case study 6: Revisited
7.4 Case study 7: Germination of broomrape
7.4.1 Tuning the proposal distributions
7.4.2 Using conditional distributions
7.4.3 More efficient computation
7.4.4 Hierarchical centering
7.4.5 Gibbs sampling
7.4.6 Slice, Griddy, and ARMS sampling
7.4.7 Timings
7.4.8 Adding new densities to logdensity()
7.5 Further reading
7.6 Exercises
8 Using WinBUGS for model fitting
8.1 Introduction
8.2 Installing the software
8.2.1 Installing OpenBUGS
8.2.2 Installing WinBUGS
8.3 Preparing a WinBUGS analysis
8.3.1 The model file
8.3.2 The data file
8.3.3 The initial values file
8.3.4 The script file
8.3.5 Running the script
8.3.6 Reading the results into Stata
8.3.7 Inspecting the log file
8.3.8 Reading WinBUGS data files
8.4 Case study 8: Growth of sea cows
8.4.1 WinBUGS or OpenBUGS
8.5 Case study 9: Jawbone size
8.5.1 Overrelaxation
8.5.2 Changing the seed for the random-number generator
8.6 Advanced features of WinBUGS
8.6.1 Missing data
8.6.2 Censoring and truncation
8.6.3 Nonstandard likelihoods
8.6.4 Nonstandard priors
8.6.5 The cut() function
8.7 GeoBUGS
8.8 Programming a series of Bayesian analyses
8.9 OpenBUGS under Linux
8.10 Debugging WinBUGS
8.11 Starting points
8.12 Exercises
9 Model checking
9.1 Introduction
9.2 Bayesian residual analysis
9.3 The mcmccheck command
9.4 Case study 10: Models for Salmonella assays
9.4.1 Generating the predictions in WinBUGS
9.4.2 Plotting the predictive distributions
9.4.3 Residual plots
9.4.4 Empirical probability plots
9.4.5 A summary plot
9.5 Residual checking with Stata
9.6 Residual checking with Mata
9.7 Further reading
9.8 Exercises
10 Model selection
10.1 Introduction
10.2 Case study 11: Choosing a genetic model
10.2.1 Plausible models
10.2.2 Bayes factors
10.3 Calculating a BF
10.4 Calculating the BFs for the NTD case study
10.5 Robustness of the BF
10.6 Model averaging
10.7 Information criteria
10.8 DIC for the genetic models
10.9 Starting points
10.10 Exercises
11 Further case studies
11.1 Introduction
11.2 Case study 12: Modeling cancer incidence
11.3 Case study 13: Creatinine clearance
11.4 Case study 14: Microarray experiment
11.5 Case study 15: Recurrent asthma attacks
11.6 Exercises
12 Writing Stata programs for specific Bayesian analysis
12.1 Introduction
12.2 The Bayesian lasso
12.3 The Gibbs sampler
12.4 The Mata code
12.5 A Stata ado-file
12.6 Testing the code
12.7 Case study 16: Diabetes data
12.8 Extensions to the Bayesian lasso program
12.9 Exercises
A Standard distributions
References
Author index
Subject index