Model calibration and Bayesian methods for probabilistic projections

Model calibration and Bayesian methods for probabilistic projections. Reto Knutti, IAC ETH Zurich.

Toy model. Model: obs = linear trend + noise(variance, spectrum). Questions: 1) short-term predictability, 2) separation of trend and noise, 3) structure of the model / model evaluation, 4) calibration/probability.
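A minimal sketch of such a toy model in Python (the trend, noise level, and autocorrelation are illustrative assumptions, not values from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy model: observations = linear trend + red noise.
years = np.arange(1900, 2021)
true_trend = 0.01   # K per year (assumed for illustration)
noise_sd = 0.15     # K, interannual variability (assumed)
phi = 0.5           # lag-1 autocorrelation of the noise (the "spectrum")

# Generate AR(1) noise to mimic internal variability.
noise = np.zeros(years.size)
for t in range(1, years.size):
    noise[t] = phi * noise[t - 1] + rng.normal(0.0, noise_sd)

obs = true_trend * (years - years[0]) + noise

# Separate trend from noise by least squares and compare with the truth.
slope, intercept = np.polyfit(years - years[0], obs, 1)
print(f"estimated trend: {slope:.4f} K/yr (true: {true_trend} K/yr)")
```

The red noise makes the trend estimate uncertain over short windows, which is the point of items 1) and 2): separating a forced trend from internal variability needs enough data relative to the noise timescale.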

Questions: What is probabilistic modeling? What is it useful for? Why is it difficult? What is model calibration in the climate model context? Why is it helpful? Why is it hard? Is it OK to tune a model? Why, or why not?

The EU, UNFCCC and 2 °C. "[...] the Council believes that global average temperatures should not exceed 2 degrees above pre-industrial level and that therefore concentration levels lower than 550 ppm CO2 should guide global limitation and reduction efforts [...]" (1939th Council meeting, Luxembourg, 25 June 1996). "The European Council calls upon all Parties to embrace the 2 °C objective and to agree to global emission reductions of at least 50%, and aggregate developed country emission reductions of at least 80-95%, as part of such global emission reductions, by 2050 compared to 1990 levels." (EU Council, 2009). The Paris Agreement aims at "holding the increase in the global average temperature to well below 2 °C above pre-industrial levels" and "to pursue efforts to limit the temperature increase to 1.5 °C above pre-industrial levels" (UNFCCC Paris, 2015).

Why 2 °C? 2 °C has been agreed formally as a climate target. Science alone cannot defend 2 °C. 2 °C may be the worst we can tolerate and the best we can hope for. Even 2 °C will have significant adverse impacts that require adaptation. Warming without intervention is likely to have serious negative and potentially irreversible impacts. 2 °C serves here as an illustration of a mitigation scenario; the following ideas apply equally to any other target.

How do we get there (to 2 °C)? Similar to our toy model idea, to answer this we need: a model to make predictions, and calibration to observations to take uncertainties into account.

Definitions.
Verification/Validation: any model is a simplification of the target system and cannot strictly be verified, i.e. proven to be true, nor can it be validated in the sense of being shown to accurately represent all the relevant processes.
Evaluation: testing a model and comparing model results with observations.
Calibration: estimating values for unknown or uncertain parameters.
Tuning: calibration, but with a negative undertone of being dishonest, of adjusting parameters to get the right effect for the wrong reason.
Agreement between model and data does not imply that the modelling assumptions accurately describe the processes producing the observed climate system behaviour; it merely indicates that the model is one (of maybe several) that is plausible, meaning it is empirically adequate for a particular purpose.

Frequentist vs Bayes. The probability of throwing a 6 with a die is 1/6: a frequency of occurrence in a repeated experiment. Global temperature change in 2100 will be in the range of 2 to 3 °C with 66% probability: a single outcome, a true but unknown value. Here probability is a degree of belief; it measures the degree to which different outcomes are supported by current evidence (data, models) and quantifies my judgement of what will happen.

Bayesian inference. Thomas Bayes (1702-1761, Tunbridge Wells, Kent).
P(A, B) = P(A) P(B|A) = P(B) P(A|B), hence P(B|A) = P(B) P(A|B) / P(A).
Applied to parameters and data: P(par|data) = P(par) P(data|par) / P(data), where P(par|data) is what we want to know (the posterior), P(par) is the prior, P(data|par) is the likelihood (often from a model), and P(data) is the normalization.

Example. Prior: P(sun) = 0.8, P(rain) = 0.2. Likelihoods: P(swim|sun) = 0.5, P(swim|rain) = 0.1. Thomas went swimming. What is the probability that the sun was shining, given that additional information? P(sun|swim) = P(swim|sun) P(sun) / P(swim) = ?
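The missing normalization P(swim) follows from the law of total probability, P(swim) = P(swim|sun) P(sun) + P(swim|rain) P(rain); a quick check of the slide's numbers:

```python
# Worked version of the slide's example using Bayes' theorem.
p_sun, p_rain = 0.8, 0.2
p_swim_given_sun, p_swim_given_rain = 0.5, 0.1

# Law of total probability gives the normalization P(swim).
p_swim = p_swim_given_sun * p_sun + p_swim_given_rain * p_rain  # 0.42

p_sun_given_swim = p_swim_given_sun * p_sun / p_swim
print(f"P(sun | swim) = {p_sun_given_swim:.3f}")  # 0.952
```

The swim observation thus raises the probability of sun from the prior 0.8 to a posterior of about 0.95.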

Bayesian inference Bayesian inference uses a numerical estimate of the degree of belief in a hypothesis before evidence has been observed and calculates a numerical estimate of the degree of belief in the hypothesis after evidence has been observed. Bayesian inference usually relies on degrees of belief, or subjective probabilities, in the induction process and does not necessarily claim to provide an objective method of induction (Source: Wikipedia)

Energy (im)balance: Q = F - λT, where Q is the heat uptake, F the radiative forcing, and λ the feedback parameter. [Figure: schematic contrasting the equilibrium CO2-only case (Q = 0, forcing F, warming T2) with the transient GHG-plus-aerosol case (heat uptake Q, warming T1).] Equilibrium: Q = 0, so F = λT2. Climate sensitivity: the global equilibrium temperature change for a doubling of CO2 forcing, S = 1/λ = T2/F. Commitment warming: T2 - T1.
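With illustrative numbers (3.7 W/m^2 is a standard value for the doubled-CO2 forcing; the feedback parameter here is an assumption), the equilibrium relations can be evaluated directly:

```python
# Equilibrium energy balance: Q = F - lambda * T, with Q = 0 at equilibrium.
F_2x = 3.7   # W/m^2, forcing for doubled CO2 (standard value)
lam = 1.2    # W/m^2/K, feedback parameter (illustrative assumption)

dT_eq = F_2x / lam    # equilibrium warming T2 for doubled CO2
S = 1.0 / lam         # = T2/F, warming per unit forcing
print(f"equilibrium warming: {dT_eq:.2f} K, sensitivity 1/lambda: {S:.2f} K/(W/m^2)")
```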

How much warming and CO2 is dangerous? This involves value judgments. Equilibrium warming depends only on climate sensitivity. 450 ppm CO2-equivalent gives a ~50% probability of keeping warming below 2 °C. (Knutti and Hegerl 2008, based on IPCC 2007)

Compensating effects of forcing and feedbacks. In Q = F - λT, a high sensitivity S (small λ) combined with a small forcing F produces much the same warming as a low sensitivity S (large λ) combined with a large forcing F, so forcing and feedback uncertainties can compensate each other.

Climate sensitivity. IPCC AR4: climate sensitivity is most likely near 3 °C, likely (>66%) in the range 2-4.5 °C, very likely (>90%) larger than 1.5 °C, with a disturbingly long tail at the upper end. IPCC AR5: likely 1.5-4.5 °C, with a less fat tail. (Knutti and Hegerl 2008)

Why the long tail? (1) Equilibrium climate sensitivity is not well constrained by the transient climate response. (Knutti et al. 2005)

Why the long tail? (2) Sensitivity S = 1/(1-f), where f is the feedback/gain. This explains much of the shape of most PDFs if the distribution of the feedback f is normal. Shown by Roe and Baker 2007, but the concept is 25 years old. A 30% reduction in feedback uncertainty would bring the 95% level from 8.5 °C down to 6 °C. (Knutti and Hegerl 2008, modified from Roe and Baker 2007)
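The skewness follows mechanically from the 1/(1-f) mapping; a Monte Carlo sketch (the mean and spread of f are illustrative assumptions, not the values behind the published PDFs):

```python
import numpy as np

rng = np.random.default_rng(1)

# Roe-and-Baker-style argument: a symmetric (normal) uncertainty in the
# feedback f maps through S = S0 / (1 - f) into a skewed, fat-tailed
# distribution for climate sensitivity S.
S0 = 1.2                                   # K, no-feedback sensitivity (assumed)
f = rng.normal(0.6, 0.13, size=100_000)    # feedback strength (assumed values)
f = f[f < 1]                               # exclude the unphysical runaway regime

S = S0 / (1.0 - f)
print(f"median S: {np.median(S):.1f} K")
print(f"95th percentile: {np.percentile(S, 95):.1f} K")  # long upper tail
```

With these assumed numbers the median sits near 3 K while the 95th percentile reaches roughly 6-7 K, reproducing the qualitative shape of the published PDFs.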

Does the fat tail of climate sensitivity matter? There is a small probability that climate change may be very large. Allen and Frame 2007: the upper bound of sensitivity is inherently hard to find but does not matter; we simply have to adapt our stabilization CO2 target as we go along and observe. For discount rates of a few percent, anything beyond 50 or 100 years is irrelevant, so we should not care about sensitivity and stabilization. Martin Weitzman: the tail of the sensitivity is not only long but fat (polynomial); if the damage function is exponential, the expected damage is infinite. The somewhat strange conclusion is that we should allocate all resources to preventing the tiny probability of a true catastrophe.
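Weitzman's argument can be written schematically; with α, β > 0 as assumed shape parameters (not values from the lecture), a polynomial tail multiplied by exponential damages gives a divergent expectation:

```latex
% Fat (polynomial) sensitivity tail, exponential damage function:
p(S) \propto S^{-(1+\alpha)} \quad (S \to \infty), \qquad D(S) \propto e^{\beta S}
\;\Longrightarrow\;
\mathbb{E}[D] = \int_{S_0}^{\infty} D(S)\, p(S)\, dS
\;\propto\; \int_{S_0}^{\infty} e^{\beta S}\, S^{-(1+\alpha)}\, dS = \infty .
```

The integral diverges for every α because the exponential growth of the damages eventually dominates any polynomial decay of the tail.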

Probabilistic projections with simple models. Economic uncertainties are reflected in scenarios; Bayesian methods use observations to yield probabilistic projections.

Probabilistic attribution of warming. Assume priors on the model parameters; use a Bayesian method to constrain the parameters with observations; run single-forcing experiments with the constrained parameter distributions. (Huber and Knutti, 2011)

Probabilistic attribution of warming. Observed warming and ocean heat uptake, combined with prior knowledge about the forcing and a model, allow the observed warming to be attributed to its causes. Very likely more than 75% of the warming is externally forced. The warming due to greenhouse gases is very likely larger than the observed total, being partly compensated by aerosol cooling. (Huber and Knutti, 2011)

The hard bits. For Bayesian studies, PDFs depend on the assumed prior distribution. Uniform priors in climate sensitivity have been used as uninformative priors but have been criticized and may be overly pessimistic; having no prior knowledge at all is impossible. Expert priors are unlikely to be independent of the data. Alternatives include fuzzy PDFs, classes of priors, etc. In theory, combining independent constraints should lead to a narrower constraint; in practice, there is no formal way to combine different PDFs. We don't know how to properly account for structural uncertainty.

Bayesian methods. Decide on a model that describes the qualitative behavior of your system. Decide on a prior distribution for the uncertain parameters that reflects your belief before using the data. Calculate the posterior PDF of the parameters given the data. Use the posterior PDFs to make predictions, understand the system, etc. (See the sketch below.)
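A minimal end-to-end sketch of these four steps on a one-parameter toy problem (the model, synthetic data, and prior are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Step 1: model -- y = a * x plus Gaussian noise of known standard deviation.
x = np.linspace(0, 10, 20)
y_obs = 2.0 * x + rng.normal(0, 1.0, x.size)   # synthetic "observations"

# Step 2: prior for the uncertain parameter a, before seeing the data.
a_grid = np.linspace(0, 4, 401)
prior = np.exp(-0.5 * ((a_grid - 1.5) / 1.0) ** 2)   # N(1.5, 1) belief (assumed)

# Step 3: posterior = prior x likelihood, normalized on the grid.
log_like = np.array([-0.5 * np.sum((y_obs - a * x) ** 2) for a in a_grid])
post = prior * np.exp(log_like - log_like.max())
post /= np.trapz(post, a_grid)

# Step 4: use the posterior, e.g. for a prediction at x = 12.
a_mean = np.trapz(a_grid * post, a_grid)
print(f"posterior mean of a: {a_mean:.2f}; prediction at x=12: {12 * a_mean:.1f}")
```

For one parameter a grid is enough; the computational issues discussed later arise because this brute-force approach does not scale to many parameters.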

Open issues. Results depend strongly on the prior distribution if the data constraint is weak. (Frame et al. 2005)

Thoughts on prior distributions. There is usually no correct prior; every prior is subjective. There is no uninformative prior. It is MY PDF rather than THE PDF, because it is conditional on the prior, data, model, assumptions, etc. Avoid double counting: the prior knowledge and the model must not be influenced by the data (or the amount of data). Avoid priors that give zero probability to things that may be possible. Priors are particularly problematic if the data constraint is weak.

Expert prior distributions of climate sensitivity. A uniform prior on the 1-10 K (1-100 K) range assumes a >50% (>95%) probability that sensitivity is higher than in any of the CMIP models; for example, if the highest model sensitivity is about 4.5 K, a uniform prior on 1-10 K gives P(S > 4.5 K) = 5.5/9 ≈ 61%. Expert priors may be an alternative to flat priors. But which expert should we ask? And how do we ensure the data have not been used in generating the prior? (Morgan and Keith, 1995)

Constructing a likelihood. Weight models, for example, by exp(-(model - observations)^2), i.e. by their RMS error relative to internal variability.
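A sketch of that weighting scheme (the function name is hypothetical, and sigma, standing in for internal variability, is an assumed value):

```python
import numpy as np

# Weight ensemble members by their RMS error relative to observations,
# scaled by an estimate of internal variability (sigma, assumed here).
def likelihood_weights(models, obs, sigma=0.1):
    # models: (n_members, n_points); obs: (n_points,)
    rms = np.sqrt(np.mean((models - obs) ** 2, axis=1))
    w = np.exp(-0.5 * (rms / sigma) ** 2)   # exp(-(model-obs)^2)-style weighting
    return w / w.sum()                      # normalize weights to sum to one

obs = np.linspace(0, 1, 50)
models = obs + np.random.default_rng(3).normal(0, 0.2, (5, 50))
print(likelihood_weights(models, obs))
```

Members close to the observations (relative to sigma) dominate; a poor choice of sigma changes the weights, which connects directly to the structural-error issue below.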

Open issues. All models are wrong, and structural error is usually not taken into account. Adding data might show that the model is inadequate. Sometimes all models have near-zero likelihood, but we just scale them up so that the PDF integrates to one. A remedy is to add a discrepancy term. (Sanderson et al. 2008)

Computational aspects. Analytical solutions are rarely possible. Simple brute-force Monte Carlo sampling is often infeasible: imagine a climate model with 20 parameters and testing 10 values for each parameter; how long would this take? Importance sampling, Markov chain Monte Carlo sampling, etc. are powerful methods to approximate the posterior, but they are still expensive, and some are sequential and therefore slow. The acceptable parameter space may also be very small: imagine that two of the ten values for each of the 20 parameters give a good fit; how big is the hit rate for a good fit with random sampling? (See the sketch below.)
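The arithmetic behind the two questions, followed by a minimal random-walk Metropolis sampler of the kind alluded to (the target density and proposal scale are illustrative):

```python
import numpy as np

# Brute force: 10 values for each of 20 parameters is 10**20 model runs.
print(f"grid size: {10**20:.0e} model runs")
# If 2 of 10 values per parameter give a good fit, random sampling hits
# the good region with probability (2/10)**20, about 1e-14.
print(f"hit rate: {0.2**20:.1e}")

# Minimal Metropolis MCMC sketch for a 1-D posterior (illustrative target).
rng = np.random.default_rng(4)

def log_post(theta):
    return -0.5 * ((theta - 3.0) / 0.5) ** 2   # stand-in log-posterior

chain = [0.0]
for _ in range(10_000):
    prop = chain[-1] + rng.normal(0, 0.5)       # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(chain[-1]):
        chain.append(prop)                      # accept the proposal
    else:
        chain.append(chain[-1])                 # reject: repeat current state
print(f"posterior mean ~ {np.mean(chain[1000:]):.2f}")  # after burn-in
```

Because each step only needs the posterior ratio at two points, the sampler concentrates model evaluations in the acceptable region instead of wasting them on the vast implausible parameter space.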

Do people understand probabilities?

Do we need PDFs, and when should we stop? [Figure: PDF of the change in summer mean daily maximum temperature (°C) over a particular 25 km square by the 2080s under the high-emissions scenario (UKCP09).] "Given the acknowledged systematic errors in all current climate models, treating model outputs as decision-relevant probabilistic forecasts can be seriously misleading. This casts doubt on our ability, today, to make trustworthy, high-resolution predictions out to the end of this century." (Frigg et al. 2013)

Do we need PDFs, and when should we stop? PDFs in principle convey the maximum information. In an ideal, quantifiable world, with shared values and attitudes towards risk and an agreed goal, they allow for an optimal decision. Probabilistic methods are not fully objective, but they help formalize uncertainty assessments by being explicit about priors, data, and methods. But they are technically very hard, and they are hard to communicate. In the presence of deep uncertainties, they may depend sensitively on subjective choices (structure, prior, model discrepancy) and imply an accuracy that is not justified, in particular in the tails. Robust decisions are not optimal but perform well under a broad range of outcomes.