DRAFT REPORT HIERARCHY OF METHODS TO CHARACTERIZE UNCERTAINTY: STATE OF SCIENCE OF METHODS FOR DESCRIBING AND QUANTIFYING UNCERTAINTY
EPA Contract No. 68-D, Work Assignment No.

Prepared by:
Chris Frey
Doug Crawford-Brown
Zheng Junyu
Dan Loughlin

For: Jim Wilson, Project Manager
E.H. Pechan & Associates, Inc.
B Hempstead Way
Springfield, VA

For submittal to:
Lisa Conner, Work Assignment Manager
U.S. Environmental Protection Agency
Office of Air Quality Planning and Standards
Innovative Strategies and Economics Group (MD-C339-01)
Research Triangle Park, NC

/12/2003
Table of Contents

1 Introduction
   Objective of this Document
   Motivations
      National Academy of Science Report
      State of Practice
   Risk, Cost, Uncertainty and Decisions
   Sources of Variability and Uncertainty
      Sources of Variability
      Sources of Uncertainty
   General Classification of Methods
      Statistical Methods Based Upon Empirical Data
      Statistical Methods Based Upon Judgment
      Other Quantitative Methods
      Qualitative Methods
      Sensitivity Analysis
Statistical Methods Based Upon Empirical Data
   Introduction
   Characterizing Distributions Based Upon Data Analysis
      Empirical Distributions
      Parametric Distributions
      Mixture Distributions
      Censored Data
      Other Issues
   Quantification of Uncertainty Based Upon Variability
   Propagating Uncertainty
      Analytical propagation techniques
      Approximation methods
      Numerical propagation techniques
Statistical Methods Based Upon Judgment
   Heuristics and Biases to Avoid
   Protocols for Expert Elicitation
   Expert Groups vs. Individuals
   Resource Requirements for Expert Elicitation
   Bayesian Methods
      Introduction
      Bayesian Framework for Uncertainty Analysis
      3.5.3 Prior Distribution
      Likelihood Functions
      Bayesian Updating: Posterior Distribution
      Propagation of Uncertainty through Model
      Research Needs for Bayesian Methods
Other Quantitative Methods, including Approximation or Interval Methods
   Interval Methods
   Other Quantitative Methods
      Fuzzy Methods
      Meta-Analysis
Methods for Scenario and Model Uncertainty
   Scenario Uncertainty
   Model Uncertainty
Qualitative Methods
   Qualitative Assessment and Rationality
   Principles of Rationality
   Lines of Reasoning
      Direct empirical evidence
      Semi-empirical evidence
      Empirical correlations
      Theory-based inference
      Existential insight
      Combining the five categories of evidence
   Structuring the Discourse
   Interpreting the Results
   Weight-of-Evidence
Sensitivity Analysis
   Local Sensitivity Analysis Methodologies
   Global Sensitivity Analysis Methodologies
Summary and Conclusions
   Key Criteria for Classification of Input Uncertainty Characterization Methods
   Key Criteria for Classification of Uncertainty Propagation Methods
   Identification of Key Criteria for Classification of Sensitivity Analysis Methods
References
1 Introduction

1.1 Objective of this Document

The analysis of uncertainty is playing an increasing role in regulatory decisions, driven by several factors: (i) the tools of uncertainty analysis have improved significantly over the past decade; (ii) the rise of decision processes involving stakeholders (e.g., regulatory negotiation) has brought competing estimates of risk to the forefront; (iii) the success of the EPA and other environmental managers in removing some of the more obvious threats to health has caused the regulatory arena to focus on sources of risk that are less obvious and less well studied; and (iv) legal challenges to regulatory decisions have necessitated a focus on the quality of the risk estimates, which must withstand these challenges in court.

While the importance of uncertainty analysis in decisions has increased, so has the awareness that such analyses often require significant resources of time, money, and expertise. Resources spent on uncertainty analysis may draw resources away from analyses of multiple policy options, since it may be possible to conduct full uncertainty analyses on only a few of the policy options considered in a regulatory decision. It is also the case that some decisions require a full uncertainty analysis, while others may be based on a less detailed analysis of uncertainties, or on decision tools that do not involve quantitative uncertainty analysis. Finally, an issue can be raised as to whether some aspects of uncertainty (model quality comes to mind) are susceptible to fully quantitative analysis, or may be better approached through a more qualitative process. All of these issues suggest that uncertainty analyses may need to be tailored to the specific decision problem, with the level of detail required of the analysis being related to the kind of decision problem posed. One way to view this situation is to imagine a hierarchy of tools for uncertainty analysis.
This hierarchy proceeds from methods that are relatively quick and require few resources, but provide the least detailed understanding of uncertainty, to methods that require significantly greater resources but produce highly detailed quantitative analyses of uncertainty that can be used in formal tools of decision analysis.

1.2 Motivations

There are three motivations for this report. The first is that the EPA, in essentially all of its offices, is moving rapidly in the use of uncertainty analyses. This suggests it is timely to consider some common understanding of the available tools, in part to ensure uniformity of application in regulatory and other decisions. An example is the publication of Agency-wide guidelines on the use of Monte Carlo analysis (EPA, Proposed Policy for Use of Monte Carlo Analysis in Agency Risk Assessments, Washington, DC, 1997). The second is that the field of uncertainty analysis has progressed rapidly in the past decade. Many new tools have been developed, and computational power has increased to the point where highly detailed, probabilistic assessments of uncertainty, unwieldy in the past, can now be performed routinely in regulatory assessments. The third is a recent review of a specific use of uncertainty analysis by the Agency in estimating costs and benefits of reductions in air pollution. This review, by the National Academy of Sciences Committee on Estimating the Health-Risk-Reduction Benefits of Proposed Air Pollution Regulations (Estimating Public Health Benefits of Air Pollution Regulations, National Academy Press, 2003), recommended several ways in which the Agency might improve the use of uncertainty analysis. These three motivations are discussed in the subsections below, in reverse order from the listing above.

National Academy of Science Report

The National Academy of Sciences was asked by Congress to review recent EPA estimates indicating that thousands of premature deaths and numerous cases of illness, such as chronic bronchitis and asthma attacks, could be prevented by reducing exposure to air pollution (NAS, Estimating Public Health Benefits of Air Pollution Regulations, National Academy Press, 2003). The report further comments that the estimates are often controversial, and that the methods used to prepare them have been questioned.
The NAS was asked to conduct a study of this issue and to recommend to the agency a common methodology to be followed in all future analyses. Notice that the charge itself suggests a belief that there is a single methodology, rather than a suite of methodologies, any one of which might be appropriate in some decision problems but not in others. The committee reviewed a large number of aspects of the agency assessment, much of it focused on the risk and cost estimation processes themselves. Those parts of the NAS recommendations are not considered here. Instead, we focus on those parts of the NAS recommendations related to uncertainty analysis.

There is, however, one aspect of the NAS report related to the conduct of an assessment, but with clear implications for uncertainty analysis. The committee notes that "in three of the four EPA analyses reviewed by the committee, EPA focused on evaluating a single regulatory option." They go on to state that "a realistic range of options guided by expert opinion and technical feasibility" should be represented in EPA's benefit analyses. Two points are evident here. First, there is a call for defining decision problems as ones requiring choice between several policy options. This means risk, cost, and benefit assessments would need to be performed for each of these options. This in turn raises issues related to the amount of resources needed to conduct full uncertainty analyses for each of these assessments. As a larger number of policy options is considered, it may be necessary to balance the resource requirements of these assessments against the resources required to conduct fully quantitative analyses of uncertainty. Second, note the use of the term "expert opinion" in the NAS text. This stems from the sense on the committee that many aspects of the scientific study of policy options, in both the risk and cost calculations, will not be determined entirely by the available data. These data will need to be abstracted, interpreted through theories, weighed one against the other when they conflict, and so on. In these cases, expert judgment will be needed. While the committee does not specify whether this need will arise in the primary assessments of risk and cost, or in the secondary analysis of uncertainty, it is likely to arise in both.

The committee goes on to consider two ways in which the EPA might improve uncertainty analyses. The first issue focuses on the conduct of the analysis, and the second on the use of uncertainty analyses in decisions. With respect to the conduct of the analysis, they describe the EPA approach as being two-tiered. The first tier produces a probability distribution for each health outcome evaluated.
They express concern that the agency has considered only random sampling error in generating this distribution (and that being restricted to the concentration-response function, rather than all steps of the assessment), and that the focus was on uncertainty in mean values of risk rather than on uncertainty in other characteristics of the inter-subject variability distribution. This leads, they claim, to a view of the results as more certain than they are, presumably because the uncertainties in the risks associated with the tails of inter-subject variability distributions would be larger than the uncertainties in means.
The second tier consists of ancillary uncertainty analyses, which include alternative and supplementary calculations for some uncertainties and sensitivity analyses for others. Ancillary analyses might, for example, consist of re-generating the uncertainty analyses from the first tier under alternative models of atmospheric dispersion, or under alternative assumptions about the presence of a dose-response threshold. The critique by the committee is that the ancillary analyses usually examine one source of uncertainty at a time and therefore do not adequately convey the aggregate uncertainty from other sources, nor do they discern the relative degrees of uncertainty in the various components of the health benefit analysis. The committee is calling for a more comprehensive method of uncertainty analysis in which all sources of uncertainty are considered simultaneously (including source characterization; dispersion and fate; exposure as it relates to human activity patterns; and exposure-response relationships). This same methodology would form the groundwork for a detailed sensitivity analysis, which could then be used to direct research resources most effectively towards those aspects of the assessments contributing most heavily to uncertainty in the final risk, cost, and benefits picture and, hence, to any decision. The recommended methodology is not fully described, but is mentioned as being a probabilistic, multiple-source uncertainty analysis, which seems to suggest some form of nested variability and uncertainty analysis accounting simultaneously for all significant sources of uncertainty.

The committee goes on to make several other recommendations that are germane here. They note that the uncertainty distributions, however they are generated, will need to be based on available data and expert judgment. This calls for a methodology capable of incorporating both kinds of information.
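The committee's concern about one-at-a-time ancillary analyses can be illustrated numerically. The sketch below uses an entirely hypothetical toy model (a benefit estimate formed as the product of four uncertain factors, with illustrative lognormal distributions; none of the values comes from the EPA analyses discussed here) to show that the aggregate uncertainty from simultaneous Monte Carlo propagation is wider than any interval obtained by varying a single source while holding the others at nominal values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical toy model: risk = emissions * dispersion * intake * slope.
# Illustrative lognormal uncertainty for each factor (not EPA values).
nominal = {"emissions": 100.0, "dispersion": 0.01, "intake": 0.5, "slope": 2.0}
gsd = {"emissions": 1.3, "dispersion": 1.5, "intake": 1.2, "slope": 1.4}

def sample(name, size):
    """Draw lognormal samples with median nominal[name] and geometric sd gsd[name]."""
    return rng.lognormal(np.log(nominal[name]), np.log(gsd[name]), size)

base = np.prod(list(nominal.values()))  # point estimate with everything nominal

# One-at-a-time ancillary analysis: vary a single source, hold the rest fixed.
oat_width = {}
for name in nominal:
    varied = sample(name, n) * base / nominal[name]
    lo, hi = np.percentile(varied, [2.5, 97.5])
    oat_width[name] = hi - lo

# Simultaneous propagation of all four sources.
joint = np.ones(n)
for name in nominal:
    joint *= sample(name, n)
joint_lo, joint_hi = np.percentile(joint, [2.5, 97.5])
joint_width = joint_hi - joint_lo

print("widest single-source 95% interval width:", round(max(oat_width.values()), 3))
print("joint 95% interval width:               ", round(joint_width, 3))
```

Because the log-variances of independent multiplicative sources add, the joint interval necessarily exceeds the widest single-source interval, which is the point behind the committee's call for simultaneous treatment of all sources.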
They also note that experts whose judgments are used should be identified, and the rationales and empirical bases for their judgments described. This clearly calls not only for encoding of expert judgments, but for some way of describing how those judgments were formed, and the degree of rationality behind them. Finally, the committee notes that there may always be some sources of uncertainty that are not reflected in the primary analysis (i.e., the analysis producing the quantitative probability density function describing uncertainty), and that the agency should clearly identify these. No guidance is provided on how to use this identification, but it is reasonable to expect that the significance of these non-quantified sources of uncertainty for decisions should be described.
The NAS report (Estimating Public Health Benefits of Air Pollution Regulations, National Academy Press, 2003), therefore, presents several challenges:

- To formalize the analysis of uncertainty as much as possible;
- To make the results of the analysis as detailed as feasible given resources and the need to simulate a number of policy options;
- To make the results as quantitative as possible, summarized in a probability density function (PDF);
- To reflect as many sources of uncertainty as possible in this PDF, including components from source characterization to risk;
- To perform uncertainty analysis not only on mean values of risk, but on other aspects of the inter-subject variability distribution;
- To use not only expectation values of risk in decisions, but other characteristics of the uncertainty PDF;
- To find a way to characterize expert judgment, and to fold this into the analysis of uncertainty as appropriate; and
- To find a way to reflect sources of uncertainty that have not been included in developing the PDF, either because they weren't considered or (more interestingly) because they are not necessarily amenable to mathematical treatment.

State of Practice

We turn next to the issue of how the conduct of uncertainty analyses has improved recently, and the ways in which these improvements have been incorporated into assessment and decision-making processes in a variety of settings. A goal of this report is to present a picture of the state of development of uncertainty analysis in decision-making, so that a decision-maker can determine what is feasible at present and what will meet current standards of practice in the risk assessment and decision community.
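The challenge of using more than expectation values can be illustrated with a short sketch. For a right-skewed uncertainty PDF (here an arbitrary lognormal, chosen purely for illustration and not tied to any EPA assessment), the mean, median, and 95th percentile differ by several-fold, so a decision based only on the mean conveys little about the tail of the distribution:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative right-skewed uncertainty PDF for a risk estimate.
# Parameters are arbitrary; they only demonstrate why a single
# expectation value is an incomplete summary of an uncertainty PDF.
risk = rng.lognormal(mean=np.log(1e-5), sigma=1.0, size=200_000)

summary = {
    "mean": risk.mean(),
    "median": np.median(risk),
    "p95": np.percentile(risk, 95),
}
for name, value in summary.items():
    print(f"{name:>6s}: {value:.2e}")
```

For this distribution the mean sits above the median and far below the 95th percentile, which is exactly why the NAS asks for characteristics of the PDF beyond its expectation.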
The use of probabilistic analysis methods for dealing with variability and uncertainty is becoming more widely recognized and recommended for environmental modeling and assessment applications. The National Research Council (NRC) and others have recommended that EPA use quantitative probabilistic analysis methods that distinguish between variability and uncertainty (NRC, 1994). One of the recommendations of the Emission Inventory Improvement Program (EIIP), which is jointly sponsored by EPA and other organizations, is to encourage the use of quantitative methods to characterize variability and uncertainty in emission inventories (Radian, 1996). In 2002, the NRC recommended that EPA begin to move the assessment of uncertainties from its ancillary analyses to its primary analyses in estimating public health benefits of air pollution regulations (NRC, 2002).

EPA has been responsive to these recommendations. For example, EPA has sponsored workshops regarding Monte Carlo simulation methods, has developed a guidance document on Monte Carlo methods, and has included guidance regarding probabilistic analysis in Risk Assessment Guidance for Superfund (EPA, 1996; EPA, 1997; EPA, 1999a; EPA, 1999b; EPA, 2001a). It has also developed computational methods for sensitivity and uncertainty analysis for environmental and biological models (EPA, 2001b) and methods for characterizing uncertainty in exposure data (EPA, 2000). EPA has applied two-stage Monte Carlo simulation to characterize both variability and uncertainty in human cumulative exposure and dose to pollutants, such as pesticides and particulate matter (PM), in the development of the SHEDS (Stochastic Human Exposure and Dose Simulation) series of models (Zartarian, 2000). Uncertainty analysis is now part of the planning process for major assessments performed by EPA, such as the National Air Toxics Assessment and related exposure assessments.
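The two-stage (sometimes called two-dimensional) Monte Carlo approach mentioned above can be sketched as follows: an outer loop samples the uncertain population parameters, and an inner loop samples inter-individual variability given those parameters, yielding an uncertainty distribution for any statistic of the variability distribution, such as its 99th percentile. All distributions, units, and parameter values below are illustrative assumptions, not taken from SHEDS or any EPA model:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two-stage Monte Carlo: outer loop = uncertainty, inner loop = variability.
n_uncertainty = 500   # outer realizations of the uncertain parameters
n_variability = 2000  # simulated individuals per realization

pop_p99 = np.empty(n_uncertainty)
for k in range(n_uncertainty):
    # Uncertainty: the population geometric mean and geometric sd of daily
    # dose are themselves imperfectly known (illustrative distributions).
    gm = rng.normal(10.0, 1.0)    # uncertain geometric mean (ug/day, assumed)
    gsd = rng.uniform(1.5, 2.5)   # uncertain geometric standard deviation
    # Variability: doses across individuals for this parameter set.
    doses = rng.lognormal(np.log(gm), np.log(gsd), n_variability)
    pop_p99[k] = np.percentile(doses, 99)  # a tail statistic of variability

# Uncertainty about the 99th percentile of the variability distribution:
lo, mid, hi = np.percentile(pop_p99, [5, 50, 95])
print(f"99th-percentile dose: median {mid:.1f}, 90% interval [{lo:.1f}, {hi:.1f}]")
```

The output is not a single dose but a credible interval for a tail characteristic of the population, which is precisely the kind of nested result the NRC recommendations call for.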
Recently, the National Research Council released a report on mobile source emissions estimation that calls for new efforts to quantify uncertainty in such emissions (NRC, 2000). The Intergovernmental Panel on Climate Change (IPCC) recently issued a good practice document regarding uncertainty analysis for greenhouse gas emission inventories (IPCC, 2000). Thus, the quantification of variability and uncertainty has become widely accepted not only in human health risk assessment but also in supporting or related areas, such as emissions estimation. In addition, there is a growing track record of the demonstrated use of quantitative methods for characterizing variability and uncertainty applied to emission factors, emission inventories, air quality modeling, exposure assessment, risk assessment, policy analysis, decision-making under uncertainty, and economic assessment. Some examples are briefly mentioned here.

There have been a number of projects aimed at quantifying variability and uncertainty in highway vehicle emissions, including uncertainty estimates associated with the Mobile5a emission factor model and with the EMFAC emission factor model used in California (Kini and Frey, 1997; Frey, 1997; Frey, Bharvirkar, and Zheng, 1999; Pollack et al., 1999; Frey and Zheng, 2002). Frey and Eichenberger (1997) and Frey et al. (2001) have quantified uncertainty in highway vehicle emission factors estimated from measured data collected using remote sensing and on-board instrumentation, respectively. Frey et al. (2002) have recommended modeling methods for the New Generation Model (NGM) that will succeed the Mobile6 emission factor model. These methods include quantification of unexplained inter-vehicle variability and fleet average uncertainty. There have been a number of efforts aimed at probabilistic analysis of various other emission sources, including power plants, non-road mobile sources, natural gas-fired engines, and specific area sources (Frey and Rhodes, 1996; Frey, Bharvirkar, and Zheng, 1999; Frey and Zheng, 2000; Frey and Bammi, 2002a,b; Frey and Zheng, 2002; Frey and Bharvirkar, 2002; Li and Frey, 2002; Abdel-Aziz and Frey, 2002). Probabilistic analyses have also been applied to air quality models; Hanna et al. (1998) used Monte Carlo simulation to quantify uncertainty in ozone predictions for New York City using the Urban Airshed Model (UAM-IV). Hanna et al. (2001) applied the same approach to the Ozone Transport Assessment Group (OTAG) domain. In the area of exposure and risk assessment, there have been a number of analyses in which uncertainty analysis was performed.
These include, for example, Bogen and Spear (1987; 1990), Frey (1992), Hoffman and Hammonds (1996), Cohen et al. (1996), Evans et al. (1994), Greenland et al. (1999), Hattis and Burmaster (1994), Krewski et al. (1999), Whitfield et al. (1991), and others. For policy and decision-making, there have been a number of analyses in which quantitative uncertainty analysis was used, including, for example, Morgan et al. (1984), Morgan and Henrion (1990), Harrison (2002), Chao and Hobbs (1997), Venkatesh and Hobbs (1999), and others. Quantitative uncertainty analysis has also been used in the field of economic analysis, for example by Clarke and Low (1993), Belli et al.'s Economic Analysis of Investment Operations (World Bank Institute, 2001), Holland et al. (1999), and others.

Risk, Cost, Uncertainty and Decisions

This document is concerned not only with the development of tools for uncertainty analysis, but also with determining which tools are appropriate for which applications in decision-making. There are several distinct ways in which uncertainty enters a decision problem:

Sound Science. As the economic impacts (costs) of environmental decisions have increased, there has been greater scrutiny of the scientific basis for risk estimates. The concern has been that some risk estimates may be so uncertain as to be an unreliable guide for decisions as to whether a risk exists in the first place, and the magnitude of that risk if it does exist. This has led to a call for the use of "sound science" as a basis for risk estimates. While the term is contentious, and can often be related to attempts simply to impede regulatory action, it does have a footing in the philosophical concept of minimal epistemic status (Crawford-Brown, Chapter 1, Risk-Based Environmental Decisions: Methods and Culture, Kluwer Academic Publishers, 1999). Epistemic status refers to the rational basis for belief in a statement, such as a statement about risk (or cost). As the evidence underlying a belief builds up, the degree of support for that belief increases, and it becomes more rational to adopt that belief in reaching decisions. This rationality might be because the estimate of risk is more "true," in the sense that it corresponds better to the real risk (e.g., the real probability of effect), because it "works" better in some sense (leading to decisions that produce the results we desired), or some combination of these two criteria.
Regardless, increasing evidence, and better use of that evidence in reasoning, increases the epistemic status of a belief. At some point as the evidence increases, the belief becomes a rational basis for decisions. It should be clear that perfect evidence and reasoning produces a rational basis for belief, and that the existence of zero evidence makes the belief an irrational basis for decisions. As the strength of the evidence and reasoning increases, the epistemic status rises from zero and moves towards perfection. At some threshold level, called the level of minimal epistemic status, the belief is justified as a basis for decisions. Where that level lies is open to debate, as is the issue of how the epistemic status itself is to be characterized, but the concept of epistemic status and its role in decisions is clear. This issue is discussed in greater detail in Sections and 4.3, and in Chapter 5.

A key characteristic of this aspect of uncertainty is that it is not necessarily quantitative. Rational analysis of epistemic status focuses on the evidence and reasoning process underlying a belief, asking whether that combination is strong, how it compares to other combinations of evidence and reasoning, and so on. A judgment that the results of a risk assessment represent sound science, therefore, often involves a judgment of the scientific process, and often rests on some combination of objective and subjective statements. A full assessment of the epistemic status of a statement about risk (or cost) cannot avoid the quantitative aspects of uncertainty analysis, but it also cannot be fully reduced to those quantitative aspects. A full assessment would produce probability density functions (PDFs) describing confidence in risk estimates, but it would set these PDFs inside a larger, and often qualitative, assessment of the quality of the science that went into producing them (the PDFs are necessarily conditional on the state of existing science). Similar thoughts seem to be expressed by the NAS in the recommendations mentioned earlier, where they call for both a quantitative assessment of uncertainty and a discussion of other causes of uncertainty that are not included quantitatively. Finally, following the NAS, we note that significant uncertainties do not necessarily paralyze decisions.
A quantitative PDF, however broad it might be, can still be used in decision-theoretic approaches to produce rational decisions. And even a judgment of low epistemic status might allow a decision to proceed, although the kind of decision allowed might be tailored to the degree of epistemic status (e.g., a low epistemic status might require adoption of "no regrets" policies, while a higher epistemic status might allow policies requiring greater costs). That is why it is so important to determine how a statement of risk (or cost) will be used in decisions before the criterion of minimal epistemic status is established.

Precaution. The earliest rulings on EPA decisions, such as the cases of benzene and vinyl chloride, established the basic framework within which agency decisions on protection of public health are interpreted. This framework is based on three key principles: (1) the agency need not protect individuals against ALL risk; there is some level of risk that is unacceptable, but this level is not equal to zero (even if it might be contentious where it does lie); (2) in the face of inter-subject variability of risk, it is not necessary to protect ALL individuals through environmental policies (some individuals might need to be protected by other forms of mitigation), but rather to protect some "reasonable fraction" of individuals; and (3) in the face of uncertainty, it is not necessary that a decision be based on complete confidence in the results of an assessment, but rather that there be some standard of reasonable confidence. These latter two principles are related to the general idea of precaution, and have been formalized in various concepts such as adequate protection against unreasonable risk, margin of safety, and so on. Common to these concepts is recognition that risk varies throughout a population and that there is uncertainty in any estimate of risk, as well as less than perfect epistemic status in the science underlying the risk estimates. To deal with this, procedures have been put in place to ensure that decisions are based on risk estimates that are likely to reflect individuals at the upper end of any distribution of risk in a population, and (in the face of uncertainty) are more likely to overestimate risk to these individuals than to underestimate it. By using these conservative assumptions, the resulting risk estimates are thought of as having a margin of safety, or some analogous concept, built into them.
The margin is not necessarily known in any detailed, quantitative, or probabilistic sense, but it provides risk estimates the decision-maker can reasonably claim are protective of health (and, hence, a rational basis for protection of health). Such an approach does not require a fully quantitative uncertainty analysis. It requires, instead, identification of exposure scenarios, models, parameter values, etc., that produce confidence in the claim that the resulting risk estimate is significantly more likely to be an overestimate of risk than an underestimate. Uncertainty analysis is then treated as a process of discourse in which the basis for this claim is judged. An advantage of the approach is that it is not very resource intensive, and it has formed the basis of many decisions that have withstood court challenges. The major disadvantage is that it usually is not possible to relate the resulting risk estimate to any specific point in the inter-subject variability distribution or the uncertainty distribution, although work by Dourson and others (Dourson, Felter, and Robinson, Evolution of Science-Based Uncertainty Factors in Noncancer Risk Assessment, Regulatory Toxicology and Pharmacology, 24, 1996) has helped move at least the concept of uncertainty factors applied in developing RfDs closer to a formal, probabilistic interpretation. Since one cannot specify the point in the variability and uncertainty distributions represented by the risk estimate, it is not possible to specify the magnitude of the margin of safety or analogous concept. This leaves the assessment open to the charge that the actual margin is not reasonable, since there is the possibility that it is either too small or too large (bearing in mind that a margin that is too small may threaten public health, while a margin that is too large carries economic consequences).

Probabilistic arguments. The third realm of uncertainty analysis is in the development of decision processes rooted in probabilistic arguments. Following the discussion of margins of safety above, imagine a decision problem stated as follows: locate a policy that will protect at least X% of the exposed population against unacceptable risk Y, and that does this with Z% confidence. For example, this might be to locate an air pollution policy for which one can state with at least 90% confidence that it will protect at least 99% of the population against an excess risk of cancer of 1E-5. Such a probabilistic decision criterion requires development of a fully nested variability and uncertainty analysis. The strength of such an approach is that it allows the decision-maker to know precisely where a given estimate falls in the variability and uncertainty distributions. This removes one of the key arguments against approaches rooted in concepts like margin of safety, which is that those concepts do not allow the decision-maker to know the magnitude of any degree of conservatism.
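A probabilistic decision criterion of this kind can be checked directly from a nested simulation. The sketch below tests whether, with at least 90% confidence, at least 99% of a simulated population falls below an excess risk of 1E-5; the simple risk model (risk = potency times exposure) and every distribution and parameter value are hypothetical placeholders chosen only to make the criterion concrete:

```python
import numpy as np

rng = np.random.default_rng(11)

# Nested criterion: with >= 90% confidence, >= 99% of the population
# must be below an excess risk of 1e-5. All model details are hypothetical.
RISK_LIMIT = 1.0e-5
POP_FRACTION = 0.99
CONFIDENCE = 0.90

n_unc, n_var = 2000, 5000
protected = np.empty(n_unc)
for k in range(n_unc):
    # Uncertainty loop: one realization of the uncertain potency factor.
    slope = rng.lognormal(np.log(2e-6), 0.4)
    # Variability loop: exposures across individuals for this realization.
    exposure = rng.lognormal(np.log(1.0), 0.8, n_var)
    individual_risk = slope * exposure
    protected[k] = np.mean(individual_risk <= RISK_LIMIT)

confidence = np.mean(protected >= POP_FRACTION)
print(f"P(at least {POP_FRACTION:.0%} of population protected) = {confidence:.2f}")
print("criterion met" if confidence >= CONFIDENCE else "criterion not met")
```

Each outer iteration yields the protected fraction for one realization of the uncertain parameters; the confidence level is simply the fraction of realizations meeting the population target, which is then compared against the Z% threshold in the decision rule.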
The weakness of the fully probabilistic approach is that it is significantly more resource intensive, and it requires that all sources of uncertainty be described in probabilistic terms. Part of this weakness can be reduced by noting the context within which probabilistic assessments of variability and uncertainty are set. If the scenario being simulated is well specified, and if it is stated clearly that the assessment is contingent on adopting a specific set of models (e.g., a specific set of dispersion models or models of exposure-response), then one can perform a contingent uncertainty analysis. The results do not imply that the absolute uncertainty, as it would be described in an analysis of epistemic status, has been assessed. Instead they imply that, given the sets of assumptions considered in the assessment, the relative confidence in any estimate of risk is reflected in the resulting PDF describing uncertainty. This then breaks the decision process into several steps: a step bounding the decision problem by specifying the scenarios, models, data, etc. to be considered; a step performing a probabilistic analysis of uncertainty conditional on these specified components; and a final judgment as to whether the original bounding of the problem was justifiable, through use of a concept such as bounded rationality, which states that decision-makers cannot possibly take into account all factors that affect the outcome of a decision, but must instead focus on an approximation of the decision problem that is reasonably complete and can allow the assessment to proceed with the available resources.

Questions asked from different policy motivations

How do these three motivations for consideration of uncertainty in decisions differ in the questions they address? The specific questions that must be answered in the assessment (qualitative or quantitative) of uncertainty are:

Sound science: Is the science sufficient to support any estimate of risk? If yes, which estimate has the highest epistemic quality, and what range includes all risk estimates that pass some standard of being reasonably sound (i.e., are scientifically plausible)?

Precaution: What is the scientifically plausible upper value the risk might have in the exposed population, taking into account both inter-subject variability and uncertainty?

Probabilistic: What is the probability density function (or cumulative distribution function) describing (i) the best estimate of the variability of risk in an exposed population and (ii) the uncertainty in any specific characteristic (mean, 95th percentile, etc.) of this variability distribution?

1.3 Sources of Variability and Uncertainty

As indicated previously, variability and uncertainty are two distinct concepts within a decision problem, even though they have often been lumped together in environmental analyses.
Variability results from natural stochastic behavior, periodicity, or variance in a trait across a population. Uncertainty, in contrast, is related to lack of knowledge and originates from sources such as measurement error, sampling bias, lack of data, spatial and temporal averaging, and model formulation (Bogan & Spear, 1987; Morgan & Henrion, 1990; Frey, 1992; NRC, 1994; Cullen & Frey, 1999). If unaccounted for, both variability and uncertainty have the potential to affect the quality of an environmental assessment. For example, in a human health risk assessment context, individuals in a population will have differing levels of exposure because of variability in physiology, activity patterns, and temporal and spatial variability in pollutant concentrations. Failure to account for this variability produces results that do not address the wide range of possible exposure values, and may lead to an overstatement or understatement of risk. Accounting for variability helps analysts and decision makers to determine the portion of the population that is subject to high levels of exposure and, hence, to identify priorities for mitigation. Similarly, failure to account for uncertainty may lead to assumptions of precision that do not convey the true state of knowledge. For example, an air quality modeling prediction of an 8-hour average ground-level ozone concentration of 80 parts per billion (ppb) suggests that a region may be in compliance with the 85 ppb standard. Accounting for uncertainty may provide a confidence bound of plus-or-minus 20 ppb, however, suggesting that there is a considerable likelihood of exceeding the standard. Thus, both variability and uncertainty may have ramifications for policy assessments and the ultimate success of the resulting policies. Disassociating and determining the relative effects of different sources of variability and uncertainty may be advantageous in determining how to address these factors most effectively within a decision-making context.
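The ozone example above can be made concrete with a short calculation. This is a minimal sketch, assuming (as the text does not state) that the plus-or-minus 20 ppb confidence bound corresponds to a 95 percent interval of a normally distributed uncertainty about the 80 ppb prediction; under that assumption, the probability of exceeding the 85 ppb standard follows directly from the normal cumulative distribution function.

```python
import math

def normal_cdf(x, mean, sd):
    """Normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Illustrative values from the text: a predicted 8-hour ozone
# concentration of 80 ppb, an 85 ppb standard, and a +/-20 ppb bound.
prediction = 80.0
standard = 85.0

# Assumption (not stated in the text): the +/-20 ppb bound is a 95%
# interval of a normal distribution, so sd = 20 / 1.96.
sd = 20.0 / 1.96

p_exceed = 1.0 - normal_cdf(standard, prediction, sd)
print(f"P(concentration > {standard} ppb) = {p_exceed:.2f}")
```

Under these assumptions the exceedance probability is roughly 0.31, illustrating the point in the text: a prediction nominally below the standard can still carry a considerable likelihood of noncompliance once uncertainty is accounted for.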
Further, understanding which sources of variability and uncertainty are reducible provides insight into how to most appropriately allocate resources to improve the certainty in the results of an analysis. These are both common goals of an uncertainty analysis. Understanding the sources of variability and uncertainty thus is critical in identifying their existence and characterizing their impact.

1.3.1 Sources of Variability

Variability is present in any data set that is not static in space, time, or across members of a population. Common sources of variability include:

Stochasticity. Stochasticity is random, non-predictable behavior that is common in many physical phenomena. Stochasticity is often caused by the random motions of particles at the molecular and submolecular levels. This random motion affects processes such as dispersion, which, in turn, produce variability in pollutant concentrations across space. Stochasticity is irreducible, although it can typically be represented over time or space with a frequency distribution. Use of averaging periods can also reduce the effect of stochasticity. For example, stochastic fluctuations have limited influence on annual average particulate matter (PM) concentrations.

Periodicity. Periodicity implies cyclical behavior. For example, ground-level temperature is cyclical, tending to rise during the day and decrease at night. Daily mean temperatures are also cyclical, tending to be higher in the summer and lower in the winter. Other examples of cyclical phenomena include tides, weather patterns, global climate patterns, and emissions. Periodicity is sometimes addressed by using averaging periods or assessing maximum values. Time series approaches can also be used to represent periodicity more explicitly, often by fitting a sinusoidal curve or by representing the value of a parameter as a function of prior values of the same parameter.

Population variance. Many traits vary across a population of individuals. Examples include human physiology (e.g., height), attitudes (e.g., risk-avoidance), and activities (e.g., commute distance to work). These traits often can be represented with frequency distributions.

Existence of subpopulations. In some instances, a population of data may include distinct (or not-so-distinct) subpopulations that have significantly different distributions for a given trait. If the subpopulations are aggregated together, the variance of the trait in the total population may be greater than that in the individual subpopulations.
If distinct subpopulations can be identified, it can be advantageous to assess each subpopulation separately. A simple example is that of height in an adult human population. The variability in height among men or among women typically would be less than the variability in height across the entire adult population. Similarly, in a risk exposure setting, an analysis directed toward at-risk subpopulations (e.g., the elderly or children) may provide more precise information than an analysis of the entire population.
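The height example can be quantified with the standard decomposition of a mixture's variance into the average within-group variance plus the variance of the group means. The sketch below uses hypothetical male and female height parameters (the specific numbers are illustrative assumptions, not data from this report) to show that the aggregated population is more variable than either subpopulation.

```python
# Hypothetical subpopulation parameters (illustrative only):
# adult heights in cm for equal-sized male and female subpopulations.
mean_m, sd_m = 175.0, 7.0
mean_f, sd_f = 162.0, 6.5

# Variance of the 50/50 mixture = average within-group variance
#                               + variance of the group means.
within = 0.5 * sd_m**2 + 0.5 * sd_f**2
grand_mean = 0.5 * mean_m + 0.5 * mean_f
between = 0.5 * (mean_m - grand_mean)**2 + 0.5 * (mean_f - grand_mean)**2
total_var = within + between

print(f"within-group variance:    {within:.2f}")
print(f"between-group variance:   {between:.2f}")
print(f"total (mixture) variance: {total_var:.2f}")
```

Because the between-group term is strictly positive whenever the subpopulation means differ, the mixture variance always exceeds the average within-group variance, which is why analyzing identifiable subpopulations separately can yield more precise results.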
1.3.2 Sources of Uncertainty

Sources of uncertainty include the following:

Random error. Random error, like stochasticity, represents random, non-predictable behavior. For example, in a time series analysis, the behavior of a time-dependent trait is characterized with a function. This function typically does not capture all of the variability, however. An error term, often having a normal probability distribution, is used to capture or represent the unexplained behavior.

Lack of precision. The equipment used to measure a trait in a data set may lack precision. This, in turn, may add uncertainty to measured values. In some instances, this uncertainty may be represented using a probability distribution. To compensate for limited precision, data are sometimes censored. For example, some meteorological monitoring sites have a threshold wind speed below which only 0.0 meters per second is reported. Censoring may introduce biases that should be corrected to more adequately reflect variability.

Systematic error. Systematic error involves the introduction of bias into a data set. Bias may originate from sources such as imperfect calibration of equipment, simplifying or incorrect assumptions, and any other errors introduced in the selection and implementation of methodologies for collecting and utilizing data.

Lack of data. Characterizing uncertainty in an input to an assessment can often be accomplished by analyzing sampled data. A consequence of the Central Limit Theorem is that estimates of the statistics of a distribution (e.g., mean, median, percentiles) converge as the sample size increases. If the sample size is not sufficient, however, sampling error is introduced into the estimates. This is a factor in many environmental and human health assessments where data are limited. For example, emission factors used to generate emissions estimates are often calculated from a limited number of emissions tests. Thus, there is uncertainty in the true values of the statistics that describe the data. Parametric bootstrapping approaches are useful in characterizing uncertainty in the distribution parameters.

Lack of representativeness. In a health or environmental assessment, data used as input to the assessment may not be completely representative of the study objectives. For example, use of
mortality data for the general population in a study involving seniors is a source of uncertainty. Similarly, emission factor data used to estimate emissions may have been generated using different kinds of equipment or equipment operated under different operating or maintenance conditions. Use of surrogate data is a related source of uncertainty. When data describing an input to an assessment are unavailable, limited, or not practical to collect, surrogate data are often used. For example, toxicity studies in rodents are often used to predict toxicity in humans, even though the exact relationship between the effects in rodents and those in humans is unknown. Similarly, county-level population data are sometimes used as a surrogate to estimate area source emissions, neglecting the differences in emissions not accounted for by population data.

Disagreement of experts. Expert opinion is often used to select appropriate values or distributions for inputs to an assessment. For example, experts may suggest the most appropriate reaction rate, or, in a Bayesian analysis, experts may supply a subjective prior distribution. Different experts' opinions on these values and distributions may differ, however. This disagreement results in uncertainty regarding the most appropriate values or distributions to use.

Problem and scenario specification. Specification of the goals and scope of an environmental assessment includes determination of the assessment boundaries (e.g., the time period and region to be studied), the pollutant transport media (e.g., water or air), the pathways by which the pollutants reach their end points (e.g., inhalation or ingestion), and various other decisions about which components to include. In some instances, problem and scenario specification decisions are made such that important factors are omitted. For example, the computational intensity of air quality modeling often limits the number of meteorological episodes that can be evaluated; episodes that would have an impact on the assessment may thus be omitted. Similarly, a study of the health effects associated with air pollutants might neglect the impacts associated with water pollution resulting from air pollution control equipment.

Modeling. The computer models used in assessments are only caricatures of reality, and thus do not capture all aspects of the problem. To meet various practical constraints (e.g., understanding of the problem; availability of data for model calibration; and time, financial, and
computational limitations), tradeoffs involving model formulation and scope must be made. Specific sources of uncertainty in modeling include the following (Frey, 1992; Cullen & Frey, 1999; Isukapalli, 1999):

Conceptual model. The analyst's conceptual model of the problem may differ from reality, with the analyst inadvertently omitting important factors, disproportionately emphasizing specific factors, or omitting or misrepresenting dependence.

Model structure. There may be competing model formulations, each with advantages and disadvantages relative to the others. The alternative formulations may also represent different schools of thought or assumptions.

Mathematical implementation. Model implementation and parameterization decisions, particularly in finite difference and finite element models, may introduce uncertainty through numerical instability (Hornbeck, 1975). There is also the possibility that coding errors introduce additional uncertainty.

Detail. Model detail refers to the degree to which physical and chemical processes are described within the model. There is often an assumption that the greater the level of detail, the less the uncertainty in the model predictions. In some cases this is true. However, more detailed models may require the introduction of new parameters, the values of which may be uncertain. Thus, increasing detail may lead to greater uncertainty.

Resolution. Model resolution refers to factors such as the time steps and grid sizes used. Decreasing these parameters can yield more detailed results. For example, a model with a grid size of 5 km would be expected to replicate observed values more closely than a model with a 20 km grid size. The tradeoff is the additional computational cost.

Boundary conditions. The selection of boundary conditions can have a significant impact upon model results. Boundary conditions, such as background concentrations and initial conditions, may be based on expert judgment, assumed behavior, or the outputs of other models. Each source potentially contributes to uncertainty.

Aggregation. Spatial and temporal averaging in modeling (i.e., determining the average
hourly concentration within a grid cell) does not facilitate estimation of sub-grid-cell or instantaneous conditions. While techniques such as kriging exist to estimate sub-grid-cell conditions by considering concentrations in surrounding cells, these techniques themselves introduce some uncertainty. Further, models are not capable of capturing within-cell variability due to random effects.

Calibration. When applying a model for a specific application, an initial step is often to calibrate the model by comparing its performance to observed data and adjusting various parameters to improve that performance. Calibration is a complex process that most often requires a large number of model runs. Further, multiple calibrations could potentially produce similar results, raising the question of whether the selected calibration is the most appropriate.

Extrapolation. Extrapolation involves the use of a model under conditions different from those for which it was calibrated or developed. For example, in cases where data are limited, it may be necessary to use an uncalibrated model or a model that was calibrated for a different location or meteorological episode. Another example is the use of a statistical model (e.g., a regression model) to extrapolate behavior outside of the range for which the model was generated.

1.4 General Classification of Methods

This section provides the basic rationale for a general classification of the methods discussed further in later sections of this report. The major classes of methods are: statistical methods based upon empirical data; statistical methods based upon judgment; other quantitative methods that either are approximation methods or are non-statistical; qualitative methods; and methods for sensitivity analysis. Each of these is briefly discussed below.

1.4.1 Statistical Methods Based Upon Empirical Data

Statistical methods that are based upon analysis of empirical data are typically termed frequentist methods, although sometimes the term classical is used (e.g., Warren-Hicks and Butler, 1996; Morgan and Henrion, 1990; Cullen and Frey, 1999). However, the term classical is sometimes associated with thought experiments (e.g., what happens with a roll of a die) as opposed to inference from empirical data (DeGroot, 1986). Therefore, we use the term
frequentist. Frequentist methods are fundamentally predicated upon statistical inference based upon long-run frequencies. For example, suppose that one wishes to estimate the mean emission factor for a specific pollutant emitted from a specific source category under specific conditions. Because of the cost of collecting measurements, it is not practical to measure each and every such emission source, which would amount to a census of the actual population distribution of emissions. With limited resources, one instead would prefer to randomly select a representative sample of such sources. Suppose ten sources were selected. The mean emission rate is calculated based upon these ten sources, and a probability distribution model could be fit to the random sample of data. If this process is repeated many times, with a different set of ten random samples each time, the results will vary. The variation in estimates of a given statistic, such as the mean, based upon random sampling is quantified using a sampling distribution. From sampling distributions, confidence intervals are obtained. Thus, the commonly used 95 percent confidence interval for the mean is a frequentist inference based upon how estimates of the mean vary because of random sampling for a finite sample of data.

Statistical inference can be used to develop compact representations of data sets (Box and Tiao, 1973). For example, a probability distribution model, such as a normal or lognormal distribution, can be fit to a random sample of empirical data. The use of probability distribution models is a convenient way to summarize information. The parameters of the distributions are subject to random sampling error, and statistical methods can be applied to evaluate goodness-of-fit of the model to the data (Hahn and Shapiro, 1967). Goodness-of-fit methods are typically based upon comparison of a test statistic with a critical value, taking into account the sample size and desired level of significance (Cullen and Frey, 1999).

Frequentist statistical methods are powerful tools for working with empirical data. Although there appears to be a common misperception that one must have a lot of data in order to use frequentist statistics, in fact the fundamental starting point for a frequentist analysis is a random, representative sample. As long as this assumption is valid, it is possible to make statistical inferences even for very small data sets. The trade-off with regard to sample size is that the sampling distributions for estimates of statistics, such as the mean and the distribution parameters, become narrower as the sample size increases. Thus, inferences based upon small sample sizes will typically have wider confidence intervals than those based upon larger samples.
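The sampling-distribution idea described above can be sketched in a few lines. This is an illustration under assumed conditions, not an analysis of any real emission factor data: a hypothetical lognormal population of emission factors is sampled repeatedly, and the spread of the resulting sample means (the sampling distribution of the mean) is shown to narrow as the sample size grows.

```python
import random
import statistics

random.seed(1)

def draw_source():
    """Draw one emission factor from a hypothetical lognormal population.
    The parameters (median 1.0, log-scale sd 0.5) are assumptions."""
    return random.lognormvariate(0.0, 0.5)

def sampling_sd_of_mean(n, reps=2000):
    """Repeatedly draw a random sample of n sources, record each sample
    mean, and return the spread of those means, i.e., an estimate of the
    standard deviation of the sampling distribution of the mean."""
    means = [statistics.mean(draw_source() for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

sd_10 = sampling_sd_of_mean(10)
sd_40 = sampling_sd_of_mean(40)
print(f"sd of mean estimates, n = 10: {sd_10:.3f}")
print(f"sd of mean estimates, n = 40: {sd_40:.3f}")
```

Quadrupling the sample size roughly halves the spread of the mean estimates, consistent with the standard error scaling as one over the square root of the sample size; this narrowing is exactly why small samples yield wide confidence intervals.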
More informationConfirmation Bias. this entry appeared in pp of in M. Kattan (Ed.), The Encyclopedia of Medical Decision Making.
Confirmation Bias Jonathan D Nelson^ and Craig R M McKenzie + this entry appeared in pp. 167-171 of in M. Kattan (Ed.), The Encyclopedia of Medical Decision Making. London, UK: Sage the full Encyclopedia
More informationBayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions
Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions J. Harvey a,b, & A.J. van der Merwe b a Centre for Statistical Consultation Department of Statistics
More informationCHAPTER 2: RISK ANALYSIS
Update Project Chapter : Risk Analysis Draft May 00 0 0 0 0 0 PRINCIPLES AND METHODS FOR THE RISK ASSESSMENT OF CHEMICALS IN FOOD CHAPTER : RISK ANALYSIS Contents CHAPTER : RISK ANALYSIS.... INTRODUCTION....
More informationISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology
ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY Chapter 2- Methods of Psychology OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology -observation, case study, surveys, psychological tests, experimentation
More informationJanuary 2, Overview
American Statistical Association Position on Statistical Statements for Forensic Evidence Presented under the guidance of the ASA Forensic Science Advisory Committee * January 2, 2019 Overview The American
More informationA Decision-Theoretic Approach to Evaluating Posterior Probabilities of Mental Models
A Decision-Theoretic Approach to Evaluating Posterior Probabilities of Mental Models Jonathan Y. Ito and David V. Pynadath and Stacy C. Marsella Information Sciences Institute, University of Southern California
More informationHandling Partial Preferences in the Belief AHP Method: Application to Life Cycle Assessment
Handling Partial Preferences in the Belief AHP Method: Application to Life Cycle Assessment Amel Ennaceur 1, Zied Elouedi 1, and Eric Lefevre 2 1 University of Tunis, Institut Supérieur de Gestion de Tunis,
More informationAnalysis A step in the research process that involves describing and then making inferences based on a set of data.
1 Appendix 1:. Definitions of important terms. Additionality The difference between the value of an outcome after the implementation of a policy, and its value in a counterfactual scenario in which the
More informationKahneman, Daniel. Thinking Fast and Slow. New York: Farrar, Straus & Giroux, 2011.
The accumulating research indicates that individuals cognitive and behavioral orientations to objects (their thoughts and actions) are frequently based on rapid shortcuts or heuristics. The past few decades
More informationPooling Subjective Confidence Intervals
Spring, 1999 1 Administrative Things Pooling Subjective Confidence Intervals Assignment 7 due Friday You should consider only two indices, the S&P and the Nikkei. Sorry for causing the confusion. Reading
More informationRunning head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick.
Running head: INDIVIDUAL DIFFERENCES 1 Why to treat subjects as fixed effects James S. Adelman University of Warwick Zachary Estes Bocconi University Corresponding Author: James S. Adelman Department of
More informationRecent developments for combining evidence within evidence streams: bias-adjusted meta-analysis
EFSA/EBTC Colloquium, 25 October 2017 Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis Julian Higgins University of Bristol 1 Introduction to concepts Standard
More informationScientific Working Group on Digital Evidence
Disclaimer: As a condition to the use of this document and the information contained therein, the SWGDE requests notification by e-mail before or contemporaneous to the introduction of this document, or
More informationCompetency Rubric Bank for the Sciences (CRBS)
Competency Rubric Bank for the Sciences (CRBS) Content Knowledge 1 Content Knowledge: Accuracy of scientific understanding Higher Order Cognitive Skills (HOCS) 3 Analysis: Clarity of Research Question
More informationAddendum to the 12th Report on Carcinogens
Addendum to the 12th Report on Carcinogens Published by the U.S. Department of Health and Human Services, National Toxicology Program The twelfth edition of the National Toxicology Program (NTP) Report
More informationPutting it together: The potential role of modeling to explore the impact of FMD
Putting it together: The potential role of modeling to explore the impact of FMD Mo Salman Animal Population Health Institute Colorado State University m.d.salman@colostate.edu And Melissa McLaws European
More informationAUG Expert responses to issues raised by British Gas during the query period for the first draft 2017/18 AUG Statement, 14 March 2017.
AUG Expert responses to issues raised by British Gas during the query period for the first draft 2017/18 AUG Statement, 14 March 2017. Below are the issues raised by British Gas. Following each issue is
More informationCochrane Pregnancy and Childbirth Group Methodological Guidelines
Cochrane Pregnancy and Childbirth Group Methodological Guidelines [Prepared by Simon Gates: July 2009, updated July 2012] These guidelines are intended to aid quality and consistency across the reviews
More informationBy the end of the activities described in Section 5, consensus has
6 Selecting Indicators 81 By the end of the activities described in Section 5, consensus has been reached on the goals and objectives of the project, the information which needs to be collected and analyzed
More informationSources of uncertainty in intuitive physics
Sources of uncertainty in intuitive physics Kevin A Smith (k2smith@ucsd.edu) and Edward Vul (evul@ucsd.edu) University of California, San Diego Department of Psychology, 9500 Gilman Dr. La Jolla, CA 92093
More informationThe Logotherapy Evidence Base: A Practitioner s Review. Marshall H. Lewis
Why A Practitioner s Review? The Logotherapy Evidence Base: A Practitioner s Review Marshall H. Lewis Why do we need a practitioner s review of the logotherapy evidence base? Isn t this something we should
More informationAn Emerging New Risk Analysis Science: Foundations and Implications
Risk Analysis, Vol. 38, No. 5, 2018 DOI: 10.1111/risa.12899 An Emerging New Risk Analysis Science: Foundations and Implications Terje Aven To solve real-life problems such as those related to technology,
More informationWelcome to next lecture in the class. During this week we will introduce the concepts of risk and hazard analysis and go over the processes that
Welcome to next lecture in the class. During this week we will introduce the concepts of risk and hazard analysis and go over the processes that these analysts use and how this can relate to fire and fuels
More informationInternational Framework for Assurance Engagements
FRAMEWORK March 2015 Framework International Framework for Assurance Engagements Explanatory Foreword The Council of the Malaysian Institute of Accountants has approved this International Framework for
More informationHigh-level Vision. Bernd Neumann Slides for the course in WS 2004/05. Faculty of Informatics Hamburg University Germany
High-level Vision Bernd Neumann Slides for the course in WS 2004/05 Faculty of Informatics Hamburg University Germany neumann@informatik.uni-hamburg.de http://kogs-www.informatik.uni-hamburg.de 1 Contents
More informationMacroeconometric Analysis. Chapter 1. Introduction
Macroeconometric Analysis Chapter 1. Introduction Chetan Dave David N. DeJong 1 Background The seminal contribution of Kydland and Prescott (1982) marked the crest of a sea change in the way macroeconomists
More informationA Case Study: Two-sample categorical data
A Case Study: Two-sample categorical data Patrick Breheny January 31 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/43 Introduction Model specification Continuous vs. mixture priors Choice
More informationE11(R1) Addendum to E11: Clinical Investigation of Medicinal Products in the Pediatric Population Step4
E11(R1) Addendum to E11: Clinical Investigation of Medicinal Products in the Step4 October 2017 International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use 1 Legal
More informationAPPENDIX IV.I.1. Delivery Truck HRA
APPENDIX IV.I.1 Delivery Truck HRA Diesel Particulate Matter (DPM) Health Risk Assessment (HRA) of Delivery Trucks from the Project at Pico and Sepulveda, Los Angeles, California Introduction This health
More information2012 Course: The Statistician Brain: the Bayesian Revolution in Cognitive Sciences
2012 Course: The Statistician Brain: the Bayesian Revolution in Cognitive Sciences Stanislas Dehaene Chair of Experimental Cognitive Psychology Lecture n 5 Bayesian Decision-Making Lecture material translated
More informationEuropean Federation of Statisticians in the Pharmaceutical Industry (EFSPI)
Page 1 of 14 European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) COMMENTS ON DRAFT FDA Guidance for Industry - Non-Inferiority Clinical Trials Rapporteur: Bernhard Huitfeldt (bernhard.huitfeldt@astrazeneca.com)
More informationUNIT 4 ALGEBRA II TEMPLATE CREATED BY REGION 1 ESA UNIT 4
UNIT 4 ALGEBRA II TEMPLATE CREATED BY REGION 1 ESA UNIT 4 Algebra II Unit 4 Overview: Inferences and Conclusions from Data In this unit, students see how the visual displays and summary statistics they
More informationApproaches to Integrating Evidence From Animal and Human Studies in Chemical Assessments
Approaches to Integrating Evidence From Animal and Human Studies in Chemical Assessments Kris Thayer, National Center for Environmental Assessment (NCEA) Integrated Risk Information System (IRIS) Division
More informationPolitical Science 15, Winter 2014 Final Review
Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically
More informationMethods of eliciting time preferences for health A pilot study
Methods of eliciting time preferences for health A pilot study Dorte Gyrd-Hansen Health Economics Papers 2000:1 Methods of eliciting time preferences for health A pilot study Dorte Gyrd-Hansen Contents
More informationInconsistent Inference in Qualitative Risk Assessment
Inconsistent Inference in Qualitative Risk Assessment November 10, 2013 Prepared by Kailan Shang 1 1 Kailan Shang, FSA, CFA, PRM, SCJP, of Manulife Financial, can be reached at klshang81@gmail.com. Page
More informationShould Cochrane apply error-adjustment methods when conducting repeated meta-analyses?
Cochrane Scientific Committee Should Cochrane apply error-adjustment methods when conducting repeated meta-analyses? Document initially prepared by Christopher Schmid and Jackie Chandler Edits by Panel
More informationEXPERIMENTAL RESEARCH DESIGNS
ARTHUR PSYC 204 (EXPERIMENTAL PSYCHOLOGY) 14A LECTURE NOTES [02/28/14] EXPERIMENTAL RESEARCH DESIGNS PAGE 1 Topic #5 EXPERIMENTAL RESEARCH DESIGNS As a strict technical definition, an experiment is a study
More informationContinuous/Discrete Non Parametric Bayesian Belief Nets with UNICORN and UNINET
Continuous/Discrete Non Parametric Bayesian Belief Nets with UNICORN and UNINET R.M. Cooke 1, D. Kurowicka, A. M. Hanea, O. Morales, D. A. Ababei Department of Mathematics Delft University of Technology
More information