Bayesian Tolerance Intervals for Sparse Data Margin Assessment

Benjamin Schroeder and Lauren Hund. ASME V&V Symposium, May 3, 2017, Las Vegas, NV. SAND2017-4590 C (UUR). Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.

Outline
1. What is margin assessment?
2. What are the risks involved with current margin assessment practices?
3. How can Bayesian tolerance intervals alleviate some of those risks?
Take Away: The transparent bias involved with Bayesian tolerance intervals may be preferable to the hidden risk associated with current practices.

Margin Assessment
Is there sufficient margin between the performance requirement and the observed behavior?
Quantification of Margin and Uncertainty (QMU): given a performance measure, a requirement, and data,
1. Conduct an engineering analysis to assess data quality.
2. Select a probability distribution for the data. Probability plots and distributional hypothesis tests are advised for selecting the distribution.
3. Estimate a tolerance bound and margin ratio for the data.
Figure: QQ plot (performance characteristic vs. theoretical quantiles).
Figure: QMU plot (density vs. performance, marking the margin M, the uncertainty U, and the estimates $\hat{Q}$ and $\hat{Q}_{0.95}$).
References & Additional Information: Pilch (2011), Newcomer (2012)

Parametric Percentile Estimation
1. Hypothesize that the data come from a certain distributional form, based on prior knowledge or expert opinion.
2. Test the hypothesis with a goodness-of-fit test. Goodness-of-fit tests assess whether the hypothesis can be rejected, not whether it is true.
3. If the hypothesis is not rejected, use the distributional form to make percentile estimates. The comparison of the percentile estimate to the requirement is the margin.
4. Classical tolerance intervals can be used for common distributions (normal, lognormal, Weibull). Tolerance intervals give percentile estimates with a specified confidence level in the estimate. The difference between the tolerance bound and the distribution's percentile estimate is the uncertainty.
Reference: Krishnamoorthy (2009)
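As a minimal sketch of steps 3 and 4 for the normal case, the following computes a percentile estimate, a one-sided upper tolerance bound (using the noncentral t distribution for the tolerance factor), and the resulting margin and uncertainty. The data values, the requirement, the function name `one_sided_k`, and the sign convention for margin (requirement minus percentile estimate, for an upper requirement) are illustrative assumptions, not values from the slides.

```python
import numpy as np
from scipy import stats

def one_sided_k(n, coverage, confidence):
    """One-sided normal tolerance factor from the noncentral t distribution."""
    z_p = stats.norm.ppf(coverage)
    return stats.nct.ppf(confidence, df=n - 1, nc=z_p * np.sqrt(n)) / np.sqrt(n)

# Hypothetical measurements and upper requirement (assumptions for illustration).
data = np.array([9.1, 10.4, 10.0, 9.6, 11.2, 10.3, 9.8, 10.9, 9.4, 10.5])
requirement = 15.0
coverage, confidence = 0.99, 0.95

mean, sd = data.mean(), data.std(ddof=1)

# Step 3: percentile estimate under the assumed normal form.
percentile_estimate = mean + stats.norm.ppf(coverage) * sd

# Step 4: tolerance bound = percentile estimate with a stated confidence level.
tolerance_bound = mean + one_sided_k(data.size, coverage, confidence) * sd

margin = requirement - percentile_estimate           # assumed convention
uncertainty = tolerance_bound - percentile_estimate  # bound minus estimate
margin_ratio = margin / uncertainty                  # assumed M/U definition

print(f"bound={tolerance_bound:.2f}, M={margin:.2f}, U={uncertainty:.2f}, "
      f"M/U={margin_ratio:.2f}")
```

The sketch skips the goodness-of-fit step; it only applies once the normal form has been accepted.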

Non-Parametric Percentile Estimation
Order Statistics, Wilks (1941)
Figure: minimum number of points for a 1-sided non-parametric tolerance interval estimate with coverage $p$ and confidence $1 - \alpha$: $n \ge \log(\alpha) / \log(p)$. Axes: number of points (log scaling, 1e+01 to 1e+05) vs. confidence (0.5 to 1.0), with one curve per coverage level (0.75, 0.9, 0.99, 0.999, 0.9999, 0.99999).
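The sample-size requirement for the one-sided order-statistic bound can be checked with a few lines of arithmetic. This sketch assumes the sample maximum is used as the bound, so the smallest n is the one satisfying 1 - coverage**n >= confidence; the function name `wilks_min_samples` is hypothetical.

```python
import math

def wilks_min_samples(coverage, confidence):
    """Smallest n such that the sample maximum bounds the `coverage`
    fraction of the population with at least `confidence` probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))

n_95_95 = wilks_min_samples(0.95, 0.95)
n_9999_95 = wilks_min_samples(0.9999, 0.95)
print(n_95_95, n_9999_95)
```

For instance, 95% confidence at 99.99% coverage requires tens of thousands of points, which is why purely non-parametric estimates are rarely available for extreme percentiles in sparse data settings.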

Model Form Assumptions
QMU is concerned with extreme tail behaviors.
Example QMU-relevant question: Are we 95% confident that 99.99% of the population will pass the requirement, given 150 tests?
Figure: strength of assumptions (risk?) vs. amount of data needed (log scale) for parametric, Bayesian, semi-parametric, and non-parametric approaches.

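Quantifying Model Form Risk
Risk Assessment Metrics
Change in Confidence: observed confidence compared to specified confidence, $P(Q_r < \hat{Q}^{k}_{r,\gamma} \mid M_0) = \gamma'_k$ vs. $\gamma$.
Change in Tolerance: observed mean tolerance interval estimate minus true percentile, $\hat{Q}^{k}_{r,\gamma} - Q_r$.
Here $Q_r$ is the true percentile, $\hat{Q}^{k}_{r,\gamma}$ is the tolerance bound estimated under the assumed model $M_0$, $\gamma$ is the specified confidence, and $\gamma'_k$ is the observed confidence.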

Statistical Simulation Algorithm
1. Take n samples from the distribution of interest.
2. Estimate a 1-sided normal tolerance bound.
3. Compare the tolerance bound estimate with the true percentile.
4. Repeat 30K times to gather statistics.
5. Repeat for n = 4, 6, 40.
Apply the risk metrics to simulation studies of Sandia-motivated examples.
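The simulation loop might be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it uses an upper bound, a standard normal baseline and a Student's t distribution with 5 degrees of freedom as the sampled distributions, the sample sizes exactly as listed (4, 6, 40), and hypothetical names such as `simulate_normal_bound`.

```python
import numpy as np
from scipy import stats

def simulate_normal_bound(sampler, true_percentile, n, coverage=0.999,
                          confidence=0.95, reps=30_000, seed=0):
    """Return (change in confidence, change in tolerance) for a one-sided
    normal tolerance bound applied to samples drawn by `sampler`."""
    rng = np.random.default_rng(seed)
    z_p = stats.norm.ppf(coverage)
    k = stats.nct.ppf(confidence, df=n - 1, nc=z_p * np.sqrt(n)) / np.sqrt(n)
    x = sampler(rng, (reps, n))                      # reps samples of size n
    bounds = x.mean(axis=1) + k * x.std(axis=1, ddof=1)
    observed_confidence = np.mean(bounds > true_percentile)
    return observed_confidence - confidence, bounds.mean() - true_percentile

coverage = 0.999
cases = {
    "normal": (lambda rng, size: rng.standard_normal(size),
               stats.norm.ppf(coverage)),
    "t5": (lambda rng, size: rng.standard_t(5, size),
           stats.t.ppf(coverage, df=5)),
}

results = {}
for name, (sampler, q_true) in cases.items():
    for n in (4, 6, 40):
        results[name, n] = simulate_normal_bound(sampler, q_true, n, coverage)
        d_conf, d_tol = results[name, n]
        print(f"{name:6s} n={n:2d}  change in confidence={d_conf:+.3f}  "
              f"change in tolerance={d_tol:+.3f}")
```

When the sampled distribution really is normal, the change in confidence should sit near zero; for the heavier-tailed case the normal bound tends to fall short of the true percentile as n grows.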

Model Form Risk Assessment
Statistical Simulation: Baseline - How tolerance intervals based on a normal distribution perform for a normal distribution
Figure: Sampled distribution (PDF vs. characteristic).
Figure: AD normality test's statistical power (statistical power vs. number of points).
Figure: 99.9% coverage normal tolerance interval performance (panels: change in confidence, change in tolerance).
Figure: 95% confidence normal tolerance interval performance (panels: change in confidence, change in tolerance).

Model Form Risk Assessment
Statistical Simulation: Tail Subpopulation - How tolerance intervals based on a normal distribution perform for a distribution with a tail subpopulation
Figure: Sampled distribution vs. normal equivalent (PDF vs. characteristic).
Figure: AD normality test's statistical power (statistical power vs. number of points).
Figure: 99.9% coverage normal tolerance interval performance (panels: change in confidence, change in tolerance).
Figure: 95% confidence normal tolerance interval performance (panels: change in confidence, change in tolerance).

Model Form Risk Assessment
Statistical Simulation: T5 - How tolerance intervals based on a normal distribution perform for a T5 distribution
Figure: Sampled distribution vs. normal equivalent (PDF vs. characteristic).
Figure: AD normality test's statistical power (statistical power vs. number of points).
Figure: 99.9% coverage normal tolerance interval performance (panels: change in confidence, change in tolerance).
Figure: 95% confidence normal tolerance interval performance (panels: change in confidence, change in tolerance).

Model Form Risk Assessment
Statistical Simulation Results Summary
- Incorrect distributional assumptions are not adequately rejected by hypothesis tests in sparse data situations.
- The normality assumption's performance quickly degrades as tolerance intervals become increasingly informative.
- Predicting extreme percentiles is not forgiving when making parametric model form assumptions. More central percentiles are more forgiving.

Bayesian Tolerance Intervals
Figure: workflow for constructing a Bayesian tolerance interval.
1) Data.
2) Assume a distributional form, e.g. $N(\mu, \sigma)$.
3) Explore the hyper-parameter space: $P(\text{data} \mid \mu = \mu_i, \sigma = \sigma_i)\, P(\mu)\, P(\sigma)$.
4) Hyper-parameter posterior: $P(\mu, \sigma \mid \text{data})$.
5) Family of distributions: $P(x < x' \mid \mu = \mu_i, \sigma = \sigma_i)$, shown as PDFs and CDFs vs. data value, with the 0.95 level marked on the CDF axis.
6) Confidence distribution: $P(a)$ where $P(x < a \mid \mu, \sigma) = 0.95$, i.e. the distribution of the data value for 0.95 coverage, with the 0.9 confidence level marked.
The Stan software package is used to perform MCMC to characterize the posterior distributions.
Additional information sources can be integrated into the estimates through the distributional form assumption and the distribution parameter priors. For example, a mixture of normal distributions could be used to describe a distribution with a known subpopulation in the tail.
References: Aitchison (1964), Krishnamoorthy (2009), Stan (2016)
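The six steps can be illustrated without MCMC for a two-parameter normal model. The sketch below swaps in a simple grid approximation of the posterior in place of Stan, and it assumes a N(0, 5) prior on the location and a U(0, 5) prior on the scale (borrowed from the priors listed for the T5 example), hypothetical data, and a hypothetical function name `bayes_normal_upper_bound`.

```python
import numpy as np
from scipy import stats

def bayes_normal_upper_bound(data, coverage=0.95, confidence=0.90,
                             mu_prior_sd=5.0, sigma_max=5.0, grid=500):
    """Grid-approximated Bayesian upper tolerance bound for a normal model."""
    x = np.asarray(data, dtype=float)
    # Step 3: grid over the hyper-parameter space (mu, sigma).
    mu = np.linspace(-3 * mu_prior_sd, 3 * mu_prior_sd, grid)
    sigma = np.linspace(sigma_max / grid, sigma_max, grid)
    M, S = np.meshgrid(mu, sigma, indexing="ij")
    log_like = stats.norm.logpdf(x[:, None, None], loc=M, scale=S).sum(axis=0)
    # Uniform prior on sigma is constant over the grid, so only P(mu) enters.
    log_post = log_like + stats.norm.logpdf(M, loc=0.0, scale=mu_prior_sd)
    # Step 4: normalized posterior weights on the grid.
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    # Step 5: each (mu, sigma) gives one distribution; a is its percentile.
    a = (M + stats.norm.ppf(coverage) * S).ravel()
    # Step 6: posterior-weighted distribution of a; read off the quantile.
    order = np.argsort(a)
    cdf = np.cumsum(weights.ravel()[order])
    idx = min(np.searchsorted(cdf, confidence), a.size - 1)
    return a[order][idx]

# Hypothetical sparse data set (assumption for illustration).
data = np.array([-0.4, 0.3, 1.1, -1.2, 0.7, 0.1, -0.6, 0.9])
bound = bayes_normal_upper_bound(data)
print(f"0.95 coverage / 0.90 confidence upper bound: {bound:.2f}")
```

A grid is only practical for a small number of parameters; with more parameters or non-standard likelihoods, a sampler such as Stan's becomes the natural choice.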

Bayesian Tolerance Intervals
T5 Distribution: Assume Tail Weight
Priors: location $P(\mu) = N(0, 5)$; scaling $P(\sigma) = U(0, 5)$; tail weight $\nu = 5$.
Likelihood: $L = T(\mu, \sigma, \nu = 5)$.
Figure: 99.9% coverage Bayesian tolerance interval performance (panels: change in confidence, log10(change in tolerance)).
Figure: 95% confidence Bayesian tolerance interval performance (panels: change in confidence, log10(change in tolerance)).

Bayesian Tolerance Intervals
T5 Distribution: Uncertain Tail Weight
Model: $P(\mu) = N(0, 3)$; $P(\sigma) = U(0, 3)$; $P(\nu) = \Gamma(2, 0.3)$; $L = T(\mu, \sigma, \nu)$.
The degrees-of-freedom (DOF) prior is specified to favor thicker-tailed distributions. Results are sensitive to the prior specification.
Figure: 99.9% coverage Bayesian tolerance interval performance (panels: change in confidence, log10(change in tolerance)).
Figure: 95% confidence Bayesian tolerance interval performance (panels: change in confidence, log10(change in tolerance)).

Bayesian Tolerance Intervals
Current Conclusions
- Bayesian tolerance intervals appear to have the potential to alleviate the risk involved with the rigid distributional form assumptions used in QMU analyses.
- Information integration is a problem larger than QMU at Sandia. All types of extrapolative prediction could benefit.

Future Directions
- Study use of the risk metrics when the true distribution is not available. Use more robust tail estimation methods as a reference.
- Develop visual methods of communicating risk in percentile extrapolations.
- Change the mindset for sparse data to concentrate on percentiles supported by the data.
- Formalize the process of assessing QMU credibility.

Thank you for your attention. Questions?

References
Pilch, M., Trucano, T., and Helton, J. C. Ideas underlying quantification of margins and uncertainties. Reliability Engineering & System Safety, 96(9):965-975 (2011).
Newcomer, J., Rutherford, B., Thomas, E., Bierbaum, R., Hickman, L., Lane, J., Fitchett, S., Urbina, A., Robertson, A., and Swiler, L. Handbook of Statistical Methodologies for QMU. SAND2012-7912, Sandia National Laboratories (2012).
Krishnamoorthy, K. and Mathew, T. Statistical Tolerance Regions: Theory, Applications, and Computation. Wiley Series in Probability and Statistics. Hoboken, N.J., USA: John Wiley & Sons, Inc. (2009).
Wilks, S. S. Determination of sample sizes for setting tolerance limits. The Annals of Mathematical Statistics, 12(1):91-96 (1941).
Stan Development Team. PyStan: the Python interface to Stan, Version 2.14.0.0. http://mc-stan.org (2016).
Aitchison, J. Two Papers on the Comparison of Bayesian and Frequentist Approaches to Statistical Problems of Prediction: Bayesian Tolerance Regions. Journal of the Royal Statistical Society, Series B, 26(2):161-175 (1964).