Bayesian hierarchical modelling
Matthew Schofield
Department of Mathematics and Statistics, University of Otago
What is a statistical model?
A statistical model is a data-generating process f(y | θ).
- The model can be used to simulate data: fix the parameters θ and consider realizations y.
- The model can be used for statistical inference: observe data y, estimate the parameters θ̂, and infer to the population of interest.
What is a hierarchical model?
The parameters θ are themselves described by a probability model f(θ | ψ): θ is considered a random variable.
Special cases include:
- Mixed models
- Latent variable models
- Missing data models
- Various forms of overdispersion
- Penalized regression
- ...
What is Bayesian statistics?
- An alternative approach to statistical inference: probability is used to express uncertainty.
- We update our knowledge with data:
  f(θ) (prior distribution) → f(y | θ) (collect data) → f(θ | y) (posterior distribution)
- The posterior distribution f(θ | y) reflects our updated knowledge and is used for inference.
- Often the prior is chosen to reflect ignorance: a reference or default prior.
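The prior-to-posterior update is easiest to see in a conjugate special case. As an illustrative sketch (not from the talk; names are mine), a Beta prior for a binomial success probability updates in closed form:

```python
def beta_binomial_update(a, b, y, n):
    """Beta(a, b) prior for a binomial success probability theta;
    after observing y successes in n trials the posterior is
    Beta(a + y, b + n - y)."""
    return a + y, b + n - y

# Uniform Beta(1, 1) prior; observe 7 successes in 10 trials.
a_post, b_post = beta_binomial_update(1, 1, 7, 10)
post_mean = a_post / (a_post + b_post)  # posterior mean of theta
```

The posterior mean, (a + y)/(a + b + n), is a compromise between the prior mean and the sample proportion, which is the same logic that drives partial pooling later in the talk.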
Bayesian hierarchical modelling
- Combine the previous two slides: use Bayesian statistics for inference from hierarchical models.
- The two are often combined:
  - Hierarchical modelling is natural within a Bayesian context.
  - It is relatively simple to specify and fit hierarchical models.
Example I: muscle fibres
- Observe fibre-level data across a muscle cross-section.
- Binary response: slow-twitch or fast-twitch fibre.
Example I: muscle fibres
- Fibres are grouped within fascicles; multiple fascicles make up a muscle.
- Goal: understand how fibre composition depends on location.
- Conjecture: function declines near the fascicle and muscle edges.
- The model occurs at two levels:
  - Fibre level: parameters describing how fibres vary within a fascicle.
  - Fascicle level: model the fibre-level parameters based on fascicle location.
- Complexity: allow for additional spatial covariation.
Example II: genetic mapping
- SNP data from high-throughput sequencing.
- Full-sibling family population: an outcrossing of two individuals.
- Output: a genetic map
  - Locating the (SNP) markers on the genome.
  - Estimating the genetic distance between markers.
Example II: genetic mapping
The statistical model includes:
- A parameter that accounts for genotyping error (a nuisance parameter).
- A collection of parameters that describe crossover; functions of these parameters determine genetic distance.
- One parameter for each marker: data hungry.
Consider the parameters as a realization from a hierarchical model:
- Borrow strength and improve estimation?
- Other advantages: prior specification, the potential for model extensions describing relationships, and consideration of map uncertainty.
Example III: animal abundance (a cautionary tale)
- Avoid tagging animals (difficult): use repeated counts to estimate abundance.
- Assume the count distribution (binomial) is the same on each visit.
- Both the index (N) and the probability (p) are unknown; with 2 replicates both parameters can be estimated.
- The properties have long been studied; see Peter Hall (1992), "On the Erratic Behavior of Estimators of N in the Binomial N, p distribution".
- Use repeated trials (across space) and consider the abundances (the N's) as realizations from a hierarchical model: borrow strength and improve estimation?
Other examples
- Climate reconstruction
- Missing data in earthquake records
- Density dependence from mark-recapture data
- ...
Some advantages
- Model latent variables:
  - Describe a model for a hidden or partially observed process.
  - Separate data collection (nuisance) from process modelling.
- Specify a complex marginal model for the data as a series of simple conditional models (we return to this point later).
- Improved estimation:
  - Specifying hierarchical models can improve estimation; broadly applicable.
  - The ideas go back to work by James and Stein.
- Look at some simulation results.
Simulation: ANOVA-type model
- Five groups, each with 10 observations; the variance is known: 1.
- Two scenarios:
  1. The five means are similar: µ = (0, 0.1, 0.1, 0.2, 0.2)
  2. The five means are unrelated: µ = (0, 100, 100, 200, 200)
- Compare the mean square error of the µ̂_j's under:
  - the standard ANOVA model: y_ij ~ iid N(µ_j, 1)
  - the hierarchical model: y_ij ~ iid N(µ_j, 1), µ_j ~ iid N(α, κ²)
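This simulation can be mimicked without MCMC by using an empirical-Bayes shrinkage estimate as a stand-in for the hierarchical model's posterior mean. A sketch under the slide's assumptions (known variance 1, five groups of 10); the function and variable names are mine, not from the talk:

```python
import random
from statistics import mean, variance

def simulate_mse(mu, n=10, reps=500, seed=1):
    """Compare the MSE of the raw group means with an empirical-Bayes
    shrinkage estimate (a stand-in for the hierarchical posterior mean),
    assuming a known within-group variance of 1."""
    rng = random.Random(seed)
    v = 1.0 / n                     # sampling variance of each group mean
    se_raw = se_shrunk = 0.0
    for _ in range(reps):
        ybar = [mean(rng.gauss(m, 1.0) for _ in range(n)) for m in mu]
        alpha = mean(ybar)                          # estimate of alpha
        kappa2 = max(0.0, variance(ybar) - v)       # between-group variance
        B = v / (v + kappa2)                        # shrinkage weight
        shrunk = [alpha + (1 - B) * (y - alpha) for y in ybar]
        se_raw += sum((y - m) ** 2 for y, m in zip(ybar, mu))
        se_shrunk += sum((s - m) ** 2 for s, m in zip(shrunk, mu))
    return se_raw / reps, se_shrunk / reps

raw1, shrunk1 = simulate_mse([0, 0.1, 0.1, 0.2, 0.2])   # similar means
raw2, shrunk2 = simulate_mse([0, 100, 100, 200, 200])   # unrelated means
```

When the means are similar the estimated κ² is small, B is near 1, and the estimates are pooled heavily; when the means are unrelated κ² is huge, B is near 0, and almost no pooling occurs.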
Simulation: similar values of µ
[Figure: differences in squared errors, roughly 0.0 to 0.6 (+ve: hierarchical model preferred).]
- The hierarchical model has lower MSE than the standard ANOVA model.
Simulation: unrelated values of µ
[Figure: differences in squared errors, roughly -0.02 to 0.02 (+ve: hierarchical model preferred).]
- When the µ_j's are unrelated the hierarchical model has done no harm.
- We return to this later.
Relatively straightforward to fit
The model above is straightforward to specify and fit in freely available software, e.g. JAGS:

model{
  for(j in 1:G){
    for(i in 1:n[j]){
      y[i,j] ~ dnorm(mu[j],1)
    }
    mu[j] ~ dnorm(alpha,tau)
  }
  ### Prior distributions -- their specification is for another talk
  tau <- 1/kappa^2
  kappa ~ dt(0,0.04,3)T(0,)
  alpha ~ dnorm(0,0.0001)
}
Relatively straightforward to fit
- When fitting hierarchical models using MCMC, computational issues can and do arise, but it is generally easier than finding MLEs.
- Extending the hierarchical model is relatively easy. E.g. we could allow the variance of y to be:
  - Unknown
  - Varying by group
  - Given a hierarchical distribution
Model extensions: JAGS

model{
  for(j in 1:G){
    for(i in 1:n[j]){
      y[i,j] ~ dnorm(mu[j],tauy[j])
    }
    mu[j] ~ dnorm(alpha[1],tau[1])
    tauy[j] <- 1/sdy[j]^2
    sdy[j] ~ dlnorm(alpha[2],tau[2])
  }
  ### Prior distributions
  for(h in 1:2){
    tau[h] <- 1/kappa[h]^2
    kappa[h] ~ dt(0,0.04,3)T(0,)
    alpha[h] ~ dnorm(0,0.0001)
  }
}
What are we doing? Specifying a marginal model
- We specify the model conditionally: f(y | θ) f(θ | ψ).
- The model is fitted marginally:
  f(y | ψ) = ∫ f(y | θ) f(θ | ψ) dθ
- MCMC performs the (numerical) integration for us.
- Simple conditional models result in complex marginal models; some care is required.
Marginal and conditional models
Many common distributions are marginals of hierarchical models:
- Negative binomial:
  - Conditional: y ~ Pois(θ), θ ~ Gamma(α, β)
  - Marginal: y ~ NB(α, β)
- t distribution:
  - Conditional: y ~ N(µ, θ), θ ~ Gamma⁻¹(ν/2, νσ²/2)
  - Marginal: y ~ t_ν(µ, σ²)
- Beta-binomial
- Mixture models
- Probit regression
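The negative-binomial case can be checked numerically: integrating the Poisson pmf against the Gamma mixing density reproduces the closed-form marginal. An illustrative sketch (mine, not from the talk):

```python
from math import exp, log, lgamma

def nb_pmf(k, alpha, beta):
    """Closed-form marginal: the Gamma(alpha, beta) mixture of
    Poisson(theta) distributions is negative binomial."""
    log_coef = lgamma(k + alpha) - lgamma(alpha) - lgamma(k + 1)
    return exp(log_coef) * (beta / (beta + 1)) ** alpha * (1 + beta) ** (-k)

def mixture_pmf(k, alpha, beta, grid=20000, upper=40.0):
    """Midpoint-rule approximation to the integral of
    Pois(k | theta) * Gamma(theta | alpha, beta) over theta."""
    h = upper / grid
    total = 0.0
    for i in range(grid):
        t = (i + 0.5) * h
        log_pois = k * log(t) - t - lgamma(k + 1)
        log_gam = alpha * log(beta) + (alpha - 1) * log(t) - beta * t - lgamma(alpha)
        total += exp(log_pois + log_gam) * h
    return total
```

With α = 2, β = 1 and k = 3 both routes give 0.125, so the quadrature that MCMC performs implicitly matches the analytic marginal here.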
What are we doing? Partial pooling
Consider the ANOVA model. Two choices:
1. The means are different (no pooling)
2. The means are the same (complete pooling)
Hierarchical modelling gives an intermediate option: the means are different but related (partial pooling).
What are we doing? Partial pooling
[Figure: batting averages (roughly 0.15 to 0.40), comparing the James-Stein (JS) estimates with the MLEs.]
What are we doing? Biased estimation
Consider the ANOVA model. By the Gauss-Markov theorem the sample means are BLUE, so the hierarchical model is introducing bias:
- Simulation 1: E[µ̂_5] ≈ 0.1 with µ_5 = 0.2.
The increased bias is associated with decreased variance:
- Simulation 1: Var(µ̂_5) ≈ 0.05 for the hierarchical model compared to ≈ 0.1 for the standard ANOVA estimate.
We introduce bias to improve (decrease) the mean square error. This goes back to work by James, Stein, Efron, Morris, ...
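The classical estimator behind these ideas can be written down directly. An illustrative positive-part James-Stein estimator (my sketch, not code from the talk), shrinking group means with known sampling variance v toward their grand mean:

```python
def james_stein(ybar, v):
    """Positive-part James-Stein estimator: shrink k group means,
    each with known sampling variance v, toward the grand mean."""
    k = len(ybar)
    grand = sum(ybar) / k
    s = sum((y - grand) ** 2 for y in ybar)
    # Shrinkage factor; (k - 3) because the grand mean is estimated.
    shrink = max(0.0, 1.0 - (k - 3) * v / s) if s > 0 else 0.0
    return [grand + shrink * (y - grand) for y in ybar]

est = james_stein([0.0, 0.1, 0.1, 0.2, 0.4], v=0.01)
```

Each estimate moves toward the grand mean (0.16 here): individually biased, but the bias is traded for a variance reduction that lowers the total mean square error.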
Example 1
- Goal: describe the spatial distribution of fibres on the muscle.
- Probit regression at the fibre level (for each fascicle):
  - The predictor is the distance from the edge of the fascicle.
  - Spatial model on the error structure.
- Both the intercept and the slope are modelled at the fascicle level:
  - The intercept describes the relative abundance of fast/slow-twitch fibres; modelled as a function of distance from the muscle edge.
  - The slope tells us how much the fast/slow-twitch composition changes within a fascicle as a function of distance; also modelled as a function of distance from the muscle edge.
- Common spatial process at the fascicle level.
Example 1
[Figure: maps of the estimated fascicle-level intercepts β_0 and slopes β_1 across the muscle.]
Example 1
- Distance explained the spatial variability of fibre type at the fibre level.
- Distance only partially explained the variability of the parameters in the fascicle-level model: considerable spatial clustering remained after accounting for distance.
- Distance from the edge appears to be an important predictor within fascicles; assess its importance at multiple levels within the muscle.
- Future: embed this within a larger hierarchical model to assess demographic changes in muscle composition.
Example 2
Work in progress showing considerable promise.
Cautions and limitations
Hierarchical modelling has the potential for abuse: replacing data with model assumptions. Several examples:
- Example 3 (the N-mixture model)
- Latent class analysis for diagnostic testing
- Estimating abundance from occupancy data
- Including heterogeneity in mark-recapture models
- Factor analysis
- ...
The simplicity of model fitting can lead to pushing the boundaries.
A continuum of models
[Diagram: models M1-M5 placed on a continuum. Left: the data model is estimable without the hierarchy. Middle: stage-2 partial pooling (Examples 1 and 2). Right: the data model is overspecified and only estimable with the hierarchy.]
As we move from left to right:
- Sensitivity to the hierarchical model increases.
- There is increasing reliance on the specification of the hierarchical model.
Model checking
- Model fitting is done marginally, which suggests we need to assess fit marginally.
- On the RHS of the continuum the hierarchical model is essential, so checking it is an important part of assessing model adequacy.
- Should we also check model fit conditionally?
- There is a trade-off between the data and process models. How do we assess fit?
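One common marginal check is the posterior predictive p-value: simulate replicate datasets from the fitted model and compare a test statistic to its observed value. A minimal sketch, assuming a N(µ, 1) data model with a flat prior on µ (my example, not a method the talk prescribes):

```python
import random
from statistics import mean

def posterior_predictive_pvalue(y, T, draws=2000, seed=7):
    """For a N(mu, 1) model with a flat prior, mu | y ~ N(ybar, 1/n).
    Draw mu from the posterior, simulate a replicate dataset, and
    record how often T(y_rep) >= T(y)."""
    rng = random.Random(seed)
    n, t_obs = len(y), T(y)
    ybar = mean(y)
    exceed = 0
    for _ in range(draws):
        mu = rng.gauss(ybar, (1.0 / n) ** 0.5)           # posterior draw
        y_rep = [rng.gauss(mu, 1.0) for _ in range(n)]   # replicate data
        exceed += T(y_rep) >= t_obs
    return exceed / draws

# Data far more spread out than N(mu, 1) allows: the check should flag it.
p_bad = posterior_predictive_pvalue([-6.0, -3.0, 0.0, 3.0, 6.0], T=max)
```

A p-value near 0 or 1 indicates the model cannot reproduce the statistic T; here p_bad is close to 0 because the observed maximum is far larger than replicates from the unit-variance model allow.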
What do the latent variables represent?
- When the latent variable is a first moment:
  - We can assess it directly against the data (we may need to pool).
- When the latent variable is not a first moment:
  - We cannot directly assess the variables against the data.
  - Estimation can be sensitive to minor changes in the data.
  - The process variables need not reflect any physical quantity.
- On the RHS of the continuum, and when the latent variable is not related to a moment, we can have a good marginal fit with latent variables that do not reflect reality.
Example 3
- Challenges when fitting the model to one site:
  - Erratic behaviour of the standard estimators.
  - Sensitivity to the model assumptions (cf. Poisson).
- The marginal model is multivariate Poisson:
  - Mean = λp = µ
  - Variance = λp = µ
  - Correlation = p
- The latent abundances do not relate to the first moment:
  - There is good information regarding µ.
  - Information about λ (and the site-specific N's) depends on p, which is estimated from the second moment; λ = µ/p is a ratio.
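The marginal moments quoted above can be verified by simulation. A sketch (stdlib only; the function names are mine) drawing N ~ Pois(λ) and two conditionally independent binomial counts per site:

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's multiplication method for a Poisson(lam) draw."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def nmixture_moments(lam, p, sites=50000, seed=3):
    """Two repeated counts per site: N ~ Pois(lam), y_t | N ~ Bin(N, p).
    Returns the sample mean, variance and between-visit correlation."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(sites):
        N = sample_poisson(rng, lam)
        y1.append(sum(rng.random() < p for _ in range(N)))
        y2.append(sum(rng.random() < p for _ in range(N)))
    m1, m2 = sum(y1) / sites, sum(y2) / sites
    v1 = sum((y - m1) ** 2 for y in y1) / (sites - 1)
    v2 = sum((y - m2) ** 2 for y in y2) / (sites - 1)
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / (sites - 1)
    return m1, v1, cov / (v1 * v2) ** 0.5

mean_y, var_y, corr_y = nmixture_moments(lam=5.0, p=0.5)
```

With λ = 5 and p = 0.5 the sample mean and variance are both close to λp = 2.5 and the between-visit correlation is close to p = 0.5: the first moment only identifies the product λp, which is why λ and p must be disentangled from the second moment.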
Summary and discussion
- Hierarchical models have considerable appeal:
  - A degree of flexibility in model specification.
  - Separate data and process models.
- Hierarchical models can offer improved estimation: partial pooling, borrowing strength, regularization.
- The Bayesian approach offers advantages.
- Hierarchical modelling cannot absolve all statistical sins: there is the potential for a poor model to attain a veneer of respectability.
- We need an improved understanding of model adequacy.
Acknowledgements
Collaborators on the various examples:
- Tilman Davies, Phil Sheard, Jon Cornwall
- Timothy Bilton et al.
- Richard Barker, Bill Link, John Sauer