Bayesian Tolerance Intervals for Sparse Data Margin Assessment

Benjamin Schroeder and Lauren Hund
ASME V&V Symposium, May 3, 2017, Las Vegas, NV
SAND2017-4590 C (UUR)
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.

Outline
1. What is margin assessment?
2. What are the risks involved with current margin assessment practices?
3. How can Bayesian tolerance intervals alleviate some of those risks?
Take Away: The transparent bias involved with Bayesian tolerance intervals may be preferable to the hidden risk associated with current practices.

Margin Assessment
Is there sufficient margin between the performance requirement and observed behavior?
Quantification of Margin and Uncertainty (QMU)
- Given a performance measure, requirement, and data:
1. Conduct an engineering analysis to assess data quality
2. Select a probability distribution for the data
 - advised to use probability plots and distributional hypothesis tests to select the probability distribution
3. Estimate a tolerance bound and margin ratio for the data
Figure: QQ plot (performance characteristic, theoretical quantiles)
Figure: QMU plot (density, performance; margin M, uncertainty U, estimates $\hat{Q}$ and $\hat{Q}_{0.95}$)
References & Additional Information: Pilch (2011), Newcomer (2012)

Parametric Percentile Estimation
1. Hypothesize that the data come from a certain distributional form based on prior knowledge, expert opinion, etc.
2. Test the hypothesis with a goodness-of-fit test.
 - Goodness-of-fit tests check whether the hypothesis can be rejected, not whether the hypothesis might be true.
3. If the hypothesis is not rejected, use the distributional form to make percentile estimates.
 - The comparison to the requirement is the margin.
4. Classical tolerance intervals can be used for common distributions (normal, lognormal, Weibull).
 - Tolerance intervals give percentile estimates with a specified confidence level in the estimate.
 - The difference between the tolerance bound and the distribution's percentile estimate is the uncertainty.
Krishnamoorthy (2009)

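Non-Parametric Percentile Estimation
Order Statistics
Minimum number of points $n$ for a 1-sided non-parametric tolerance interval (TI) estimate with coverage $p$ and confidence $1-\alpha$, written $(p, 1-\alpha)$:
$n \geq \log(\alpha) / \log(p)$
Figure: minimum number of points for 1-sided non-parametric TI estimate (number of points on log scaling, 1e+01 to 1e+05; confidence, 0.5 to 1.0; one curve per coverage: 0.75, 0.9, 0.99, 0.999, 0.9999, 0.99999)
Wilks (1941)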
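The order-statistics sample size requirement can be checked with a few lines of Python. This is a minimal sketch, assuming the sample extreme (largest or smallest observation) is used as the one-sided bound, so that the requirement is 1 - coverage^n >= confidence. The function name `min_samples_one_sided` is a hypothetical helper, not something from the slides.

```python
import math


def min_samples_one_sided(coverage: float, confidence: float) -> int:
    """Smallest n for which the sample extreme is a one-sided
    non-parametric tolerance bound with the given coverage and confidence.

    Assumption: the bound is the sample maximum (or minimum), so the
    requirement is 1 - coverage**n >= confidence.
    """
    if not (0.0 < coverage < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("coverage and confidence must lie strictly in (0, 1)")
    # Solve coverage**n <= 1 - confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))


if __name__ == "__main__":
    # Coverage levels taken from the figure legend, at 95% confidence.
    for coverage in (0.75, 0.9, 0.99, 0.999, 0.9999, 0.99999):
        print(coverage, min_samples_one_sided(coverage, 0.95))
```

The required sample size grows roughly tenfold for each additional "9" of coverage, which is why the figure uses a log scale for the number of points.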

Model Form Assumptions
QMU is concerned with extreme tail behaviors.
Example QMU-relevant question: Are we 95% confident that 99.99% of the population will pass the requirement given 150 tests?
Figure: strength of assumptions ("Risk?") against amount of data needed (log scale), with the approaches labeled Parametric, Bayesian, Semi-Parametric, and Non-Parametric
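For a sense of scale, the example question can be answered for the assumption-free end of the spectrum. The sketch below assumes a purely non-parametric analysis in which all tests pass and the sample extreme serves as the bound; `achieved_confidence` is a hypothetical helper name.

```python
def achieved_confidence(coverage: float, n_tests: int) -> float:
    """Confidence that the sample extreme bounds the `coverage` quantile.

    Assumption: non-parametric, one-sided, all n_tests units pass, so the
    confidence is 1 - coverage**n_tests.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly in (0, 1)")
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return 1.0 - coverage ** n_tests


if __name__ == "__main__":
    # The slide's example: 99.99% coverage with 150 tests.
    print(f"{achieved_confidence(0.9999, 150):.4f}")
```

With 150 tests the non-parametric confidence for 99.99% coverage is only about 1.5%, far from the 95% in the question, so any stronger statement has to come from distributional assumptions.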

Quantifying Model Form Risk
Risk Assessment Metrics
- Change in Confidence: observed confidence compared to specified confidence, $P(Q_r < \hat{Q}^k_{r,\gamma} \mid M_0) = \gamma'_k$ vs. $\gamma$
- Change in Tolerance: observed mean tolerance interval estimate minus true percentile, $\bar{\hat{Q}}^k_{r,\gamma} - Q_r$

Statistical Simulation Algorithm
- Take n samples from the distribution of interest
- Estimate a 1-sided normal tolerance bound
- Compare the tolerance bound estimate with the true percentile
- Repeat 30K times to gather statistics
- Repeat for n = 4, 6, ..., 40
Apply risk metrics to simulation studies of Sandia-motivated examples
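The simulation loop can be sketched in Python with NumPy and SciPy. This is an illustration under stated assumptions, not the authors' code: the bound is taken to be an upper bound computed with the exact noncentral-t tolerance factor, "change in confidence" is taken as observed minus specified confidence, and "change in tolerance" as the mean bound minus the true percentile. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def normal_upper_tolerance_factor(n, coverage, confidence):
    """One-sided normal tolerance factor k, so the bound is mean + k * sd.

    Uses the standard noncentral-t result:
    k = t'_{confidence}(df = n - 1, nc = z_coverage * sqrt(n)) / sqrt(n).
    """
    z_p = stats.norm.ppf(coverage)
    return stats.nct.ppf(confidence, df=n - 1, nc=z_p * np.sqrt(n)) / np.sqrt(n)


def simulate_risk_metrics(sampler, true_percentile, n, coverage=0.999,
                          confidence=0.95, n_rep=30_000, seed=0):
    """Return (change in confidence, change in tolerance) for sample size n.

    sampler(rng, size) must return an array of draws with shape `size`.
    Both metric definitions are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    k = normal_upper_tolerance_factor(n, coverage, confidence)
    samples = sampler(rng, (n_rep, n))            # one row per repetition
    bounds = samples.mean(axis=1) + k * samples.std(axis=1, ddof=1)
    observed_confidence = np.mean(bounds >= true_percentile)
    return observed_confidence - confidence, bounds.mean() - true_percentile


if __name__ == "__main__":
    coverage = 0.999
    true_q = stats.norm.ppf(coverage)             # true percentile of N(0, 1)
    for n in (4, 6, 40):
        d_conf, d_tol = simulate_risk_metrics(
            lambda rng, size: rng.standard_normal(size), true_q, n, coverage)
        print(n, round(d_conf, 4), round(d_tol, 3))
```

Swapping the sampler for a different distribution (and supplying that distribution's true percentile) turns the same loop into a model form risk study, since the normal tolerance factor is then applied to non-normal data.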

Model Form Risk Assessment
Statistical Simulation: Baseline
- How tolerance intervals based on a normal distribution perform for a normal distribution
Figure: Sampled distribution (PDF, characteristic)
Figure: 99.9% coverage normal tolerance interval performance (change in confidence, change in tolerance)
Figure: 95% confidence normal tolerance interval performance (change in confidence, change in tolerance)
Figure: AD normality test's statistical power (statistical power, number of points)

Model Form Risk Assessment
Statistical Simulation: Tail Subpopulation
- How tolerance intervals based on a normal distribution perform for a distribution with a tail subpopulation
Figure: Sampled distribution vs normal equivalent (PDF, characteristic)
Figure: 99.9% coverage normal tolerance interval performance (change in confidence, change in tolerance)
Figure: 95% confidence normal tolerance interval performance (change in confidence, change in tolerance)
Figure: AD normality test's statistical power (statistical power, number of points)

Model Form Risk Assessment
Statistical Simulation: T5
- How tolerance intervals based on a normal distribution perform for a T5 distribution
Figure: Sampled distribution vs normal equivalent (PDF, characteristic)
Figure: 99.9% coverage normal tolerance interval performance (change in confidence, change in tolerance)
Figure: 95% confidence normal tolerance interval performance (change in confidence, change in tolerance)
Figure: AD normality test's statistical power (statistical power, number of points)

Model Form Risk Assessment
Statistical Simulation Results Summary
- Incorrect distributional assumptions are not adequately rejected by hypothesis tests in sparse data situations
- The normality assumption's performance quickly degrades as tolerance intervals become increasingly informative
- Predicting extreme percentiles is not forgiving when making parametric model form assumptions
 - More central percentiles are more forgiving
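The first point, that goodness-of-fit tests rarely reject a wrong distributional form when data are sparse, can be explored with a small power study. The sketch below assumes the Anderson-Darling (AD) normality test as implemented in `scipy.stats.anderson`, a 5% significance level, and a Student t distribution with 5 degrees of freedom as the non-normal alternative; the sample size of 10 and the function name are illustrative choices.

```python
import numpy as np
from scipy import stats


def ad_rejection_rate(sampler, n, n_rep=2000, seed=0):
    """Fraction of simulated samples for which the AD test rejects normality.

    sampler(rng, n) must return n draws. The 5% level is an assumption of
    this sketch; SciPy tabulates critical values for 15, 10, 5, 2.5 and 1%.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        result = stats.anderson(sampler(rng, n), dist="norm")
        # Pick the critical value tabulated for the 5% significance level.
        idx = int(np.argmin(np.abs(np.asarray(result.significance_level) - 5.0)))
        if result.statistic > result.critical_values[idx]:
            rejections += 1
    return rejections / n_rep


if __name__ == "__main__":
    n = 10
    null_rate = ad_rejection_rate(lambda rng, m: rng.standard_normal(m), n)
    t5_power = ad_rejection_rate(lambda rng, m: rng.standard_t(5, m), n)
    print("rejection rate for normal data:", null_rate)
    print("rejection rate (power) for t5 data:", t5_power)
```

When the rejection rate for the heavy-tailed data is not much higher than the rate for truly normal data, a "not rejected" outcome carries little evidence that the normal form is adequate for tail extrapolation.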

Bayesian Tolerance Intervals
1) data
2) distributional form assumption: $N(\mu, \sigma)$
3) explore hyper-parameter space: $P(\text{data} \mid \mu = \mu_i, \sigma = \sigma_i)\, P(\mu)\, P(\sigma)$
4) hyper-parameter posterior: $P(\mu, \sigma \mid \text{data})$
5) family of distributions: $P(X < x \mid \mu = \mu_i, \sigma = \sigma_i)$
6) confidence distribution: $P(a)$ where $P(x < a \mid \mu, \sigma) = 0.95$ (0.9 confidence, data value for 0.95 coverage)
Figure: one panel per step (data, PDF and CDF against data value, hyper-parameter posterior over $\mu$ and $\sigma$, family of distributions, confidence distribution)
- Using the Stan software package to perform MCMC to characterize posterior distributions
- Additional information sources can be integrated into estimates through the distributional form assumption and distribution parameter priors
 - for example, a mixture of normal distributions could be used to describe a distribution with a known subpopulation in the tail
Aitchison (1964), Krishnamoorthy (2009), Stan (2016)
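The six steps can be sketched for the normal case without MCMC. The slides use Stan; this sketch swaps in direct sampling from the closed-form posterior of a normal model under the noninformative prior $p(\mu, \sigma^2) \propto 1/\sigma^2$, which is an assumption made here for brevity and not the priors used in the talk. The function name and example data are hypothetical; the 0.95 coverage and 0.9 confidence values follow the slide.

```python
from statistics import NormalDist

import numpy as np


def bayesian_normal_upper_tolerance_bound(data, coverage=0.95, confidence=0.9,
                                          n_draws=20_000, seed=0):
    """Upper Bayesian tolerance bound for normally distributed data.

    Assumption: noninformative prior p(mu, sigma^2) proportional to
    1/sigma^2, so the posterior can be sampled directly instead of by MCMC.
    """
    x = np.asarray(data, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")
    rng = np.random.default_rng(seed)
    xbar, s2 = x.mean(), x.var(ddof=1)

    # Step 4: posterior draws of the hyper-parameters (mu_i, sigma_i).
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
    mu = rng.normal(xbar, np.sqrt(sigma2 / n))

    # Step 5: each draw defines one member of the family of distributions;
    # its coverage percentile is a_i = mu_i + z * sigma_i.
    z_p = NormalDist().inv_cdf(coverage)
    percentile_draws = mu + z_p * np.sqrt(sigma2)

    # Step 6: the confidence distribution of a; take its confidence quantile.
    return float(np.quantile(percentile_draws, confidence))


if __name__ == "__main__":
    example = [9.8, 10.1, 10.4, 9.9, 10.3, 10.0]  # made-up illustrative data
    print(bayesian_normal_upper_tolerance_bound(example))
```

With an informative prior or a non-normal likelihood the posterior no longer has this closed form, which is where an MCMC tool such as Stan becomes necessary; steps 5 and 6 stay the same once posterior draws are available.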

Bayesian Tolerance Intervals
T5 Distribution: Assume Tail Weight
Priors
- location: $P(\mu) = N(0, 5)$
- scaling: $P(\sigma) = U(0, 5)$
- tail weight: $\nu = 5$
Likelihood: $L = T(\mu, \sigma, \nu = 5)$
Figure: 99.9% coverage Bayesian tolerance interval performance (change in confidence, log10(change in tolerance))
Figure: 95% confidence Bayesian tolerance interval performance (change in confidence, log10(change in tolerance))

Bayesian Tolerance Intervals
T5 Distribution: Uncertain Tail Weight
Model
- $P(\mu) = N(0, 3)$
- $P(\sigma) = U(0, 3)$
- $P(\nu) = \Gamma(2, 0.3)$
- $L = T(\mu, \sigma, \nu)$
DOF prior specified to favor thicker tailed distributions
Sensitive to prior specification
Figure: 99.9% coverage Bayesian tolerance interval performance (change in confidence, log10(change in tolerance))
Figure: 95% confidence Bayesian tolerance interval performance (change in confidence, log10(change in tolerance))

Bayesian Tolerance Intervals
Current Conclusions
- Bayesian tolerance intervals appear to have the potential to alleviate the risk involved with rigid distributional form assumptions used in QMU analyses
- Information integration is a problem larger than QMU at Sandia; all types of extrapolative prediction could benefit

Future Directions
- Study use of the metric when the true distribution is not available
 - Use more robust tail estimation methods as reference
- Develop visual methods of communicating risk in percentile extrapolations
- Change the mindset for sparse data to concentrate on percentiles supported by the data
- Formalize the process of assessing QMU credibility

Thank you for your attention
Questions?

References
Pilch, M., Trucano, T., and Helton, J. C. Ideas underlying quantification of margins and uncertainties. Reliability Engineering & System Safety, 96(9):965-975 (2011).
Newcomer, J., Rutherford, B., Thomas, E., Bierbaum, R., Hickman, L., Lane, J., Fitchett, S., Urbina, A., Robertson, A., and Swiler, L. Handbook of Statistical Methodologies for QMU. SAND2012-7912, Sandia National Laboratories (2012).
Krishnamoorthy, K. and Mathew, T. Statistical Tolerance Regions: Theory, Applications, and Computation. Wiley Series in Probability and Statistics. Hoboken, N.J., USA: John Wiley & Sons, Inc. (2009).
Wilks, S. S. Determination of sample sizes for setting tolerance limits. The Annals of Mathematical Statistics, 12(1):91-96 (1941).
Stan Development Team. PyStan: the Python interface to Stan, Version 2.14.0.0. http://mc-stan.org (2016).
Aitchison, J. Two Papers on the Comparison of Bayesian and Frequentist Approaches to Statistical Problems of Prediction: Bayesian Tolerance Regions. Journal of the Royal Statistical Society, Series B, 26(2):161-175 (1964).