MTAT.05.113 Bayesian Networks. Introductory Lecture. Sven Laur, University of Tartu


Motivation

Probability calculus can be viewed as an extension of classical logic. We use many imprecise and heuristic rules in everyday life, and often even experts find it difficult to formalise their knowledge. One often wants to infer comprehensible rules directly from data.

Bayesian networks provide one possible solution:
- The network structure reveals causal relations between attributes.
- Each individual table of conditional probabilities is comprehensible.
- The inference can be completely automated.
- Bayesian networks can be used to support decision making.

MTAT.05.113 Bayesian Networks, Introductory Lecture, 11 February 2008

A simple example

[Diagram: a network whose nodes are the events Trojan in a computer, Successful phishing, Security breach, Attack against server, Attack is successful, Attack is detected, Money transfer to attacker's account, Money transfer to mule's account, Mule is caught.]

Arrows indicate direct causal relations between events or indicators. For each node v, we have to specify Pr[v | parents of v]. For nodes u without parents, we have to specify the prior probabilities Pr[u].
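The factorisation on this slide — one table Pr[v | parents of v] per node, priors Pr[u] for root nodes — can be sketched in code. Below is a minimal two-parent fragment of the network (trojan and phishing as causes of a breach); all probability values are hypothetical, chosen only to illustrate the mechanics.

```python
# Minimal Bayesian-network fragment with hypothetical probabilities.

# Priors Pr[u] for parentless nodes.
p_trojan = 0.01
p_phishing = 0.05

# Conditional table Pr[breach | trojan, phishing].
p_breach = {
    (True, True): 0.95,
    (True, False): 0.80,
    (False, True): 0.60,
    (False, False): 0.001,
}

def joint(trojan, phishing, breach):
    """Joint probability = product of Pr[v | parents of v] over all nodes."""
    p = p_trojan if trojan else 1 - p_trojan
    p *= p_phishing if phishing else 1 - p_phishing
    pb = p_breach[(trojan, phishing)]
    p *= pb if breach else 1 - pb
    return p

# Marginal Pr[breach]: sum the joint over the remaining variables.
p_b = sum(joint(t, f, True) for t in (True, False) for f in (True, False))
```

Summing `joint` over all eight assignments returns 1, which is a quick sanity check that the tables are consistent; exact marginalisation like this is what belief-propagation algorithms speed up on larger networks.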

Three main tasks

Knowledge representation:
- Encoding of prior beliefs (expert knowledge).
- Parameter estimation from experimental data.
- Structural inference from experimental data.

Inference and belief propagation:
- Fast computation of marginal probabilities.
- Coherence and sensitivity analysis.

Decision theory:
- Optimal and near-optimal strategies.
- Relevance of observations.
- Sensitivity analysis.

Presentation topics

1. Interpretations of Probabilities (1)
2. Construction of Bayesian Networks (1)
3. Inference and Belief Propagation (2)
4. Analysis of Bayesian Networks (1)
5. Parameter Estimation (1)
6. Network Structure Estimation (1)
7. Bayesian Networks as Classifiers (1)
8. Elements of Decision Theory (2)
9. Optimal Strategies and Ways to Find Them (2)
10. Some Properties of Decision Problems (1)

What is a probability?

Five main interpretations

The Stanford Encyclopedia of Philosophy lists five interpretations:
- Classical probability is the ratio between favourable and all events.
- Logical probability assigns plausibility over a set of formal statements.
- The frequentist interpretation states that probability is the relative frequency in a finite or infinite trial sequence, i.e., it is a property of the sequence.
- The propensity interpretation states that probability is a property of physical objects, which manifests itself in experiments.
- Subjective probability is a normalised degree of belief that a rational entity assigns to plausible events based on observations.

Kolmogorov's calculus of probabilities is interpretation agnostic. It provides a universal, consistent axiomatisation, and all rules for manipulating probabilities can be derived within this theory.

Three dominant schools of thought

[Diagram: the three notions Knowledge of the Long Run (Frequentism), Fair Price (Bayesianism), and Probability (Mathematical Statistics) connected in a cycle.]

The three notions in the graph form a vicious cycle. Depending on the starting point we get different interpretations. Each of them has its own application area and weaknesses.

Approximate time-line

1713 J. Bernoulli: Ars Conjectandi
1764 T. Bayes: Bayes' Theorem
1774 P. Laplace: Bayes' Theorem
1810 P. Laplace: Central Limit Theorem
1834 A. Cournot: finite frequentism
1900 K. Pearson: χ²-test
1919 R. von Mises: kollektivs
1921 M. Keynes: logical probability
1931 F. Ramsey: subjective probability
1933 A. Kolmogorov: Grundbegriffe
1937 B. de Finetti: coherence principle
1954 L. Savage: subjective utility
1969 P. Martin-Löf: random sequences

Kolmogorov's neat axiomatisation of probability as a measure set off the balance, and mathematical statistics quickly became the dominant school. It took decades for other interpretations to return. The resurrection of von Mises' theory of kollektivs is particularly interesting.

Main points of controversy

Is Bayes' Theorem really a theorem?
- Bayesianists have to make a big effort to prove it.
- In von Mises' and Kolmogorov's axiomatisations it is just a tautology.

Is there any probability left when the coin has landed?
- For a Bayesianist it depends on the observed data.
- Frequentists do not consider individual observations; the measurement is outside the realm of mathematical statistics.

What about the average behaviour of an inference algorithm?
- An orthodox Bayesianist considers only individual events.
- Analysis of average-case behaviour is the core of mathematical statistics.

Strict Frequentism

Main ideas in one slide

Von Mises: probability is a property of an infinite sequence. A sequence x ∈ {0,1}^∞ is a kollektiv if it satisfies the following conditions:
- The relative frequency has a limiting value P(x).
- For any admissible sub-sequence x′, the corresponding relative frequency must converge to P(x).

A sub-sequence is admissible if it is chosen by a method that uses only the values x_1, ..., x_i to decide whether to take x_{i+1} or not. Additionally, there is a construction for creating conditional events so that Bayes' theorem holds. Most results of classical probability theory can be proved in this theory.
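The admissibility condition can be illustrated with a small simulation (a sketch, not part of the original slides): along a selection rule that looks only at values seen so far, the relative frequency of a random binary sequence stays close to the overall frequency, exactly as the kollektiv axioms demand.

```python
import random

random.seed(0)
# A long i.i.d. fair-coin sequence stands in for a kollektiv with P(x) = 1/2.
x = [random.random() < 0.5 for _ in range(200_000)]

def select_after_zero(seq):
    """Admissible rule: take x[i+1] whenever x[i] == 0.
    The decision about position i+1 uses only values x[0..i]."""
    return [seq[i + 1] for i in range(len(seq) - 1) if not seq[i]]

sub = select_after_zero(x)
freq_all = sum(x) / len(x)
freq_sub = sum(sub) / len(sub)
# Both relative frequencies come out near 1/2.
```

A rule that peeked ahead (e.g. "take x[i] whenever x[i] == 1") would trivially shift the frequency to 1 — which is precisely why von Mises restricts selections to non-anticipating methods.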

Main reasons why the theory was rejected

- A kollektiv is a non-constructible object: there are no kollektivs without restrictions on admissible selections.
- It is impossible to derive the law of the iterated logarithm.
- Even if we restrict the set of admissible selections, there are kollektivs with weird properties. Namely, there are kollektivs for which the relative frequency approaches the limit from above: on all finite sub-sequences a gambler wins more than he loses.
- There are kollektivs such that a gambler can win an infinite amount of money if he or she varies the bet prices.

Bayesianism

Main ideas in one slide

Objective Bayesianism:
- Internal consistency.
- Quantitative correspondence with common sense.
- Inference results should be acceptable for most of us.
- Tries to minimise the amount of personal prior information.

Subjective Bayesianism:
- Probabilities are formalised as prices of bets.
- Dutch Book argument: a rational person does not give money away.
- Betting prices are continuous, i.e., Pr[A_n] → Pr[lim_{n→∞} A_n].
- The outcome of the inference procedure is inherently individual.

Dutch Book argument

Let p(X) denote the price a rational entity is willing to pay for a lottery ticket that pays a prize of 1 if the event X happens.

First note that p(A) + p(¬A) = 1, for otherwise our entity is willing to buy or sell the ticket pair A, ¬A for a total price slightly above or below 1. Analogously, the prices of mutually exclusive events A and B must satisfy p(A) + p(B) = p(A ∪ B), or we can trick the entity into buying or selling A, B and A ∪ B at prices that guarantee it a loss, which is again irrational.
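The first coherence constraint can be made concrete with made-up prices (a sketch, not from the slides): if an entity quotes p(A) + p(¬A) > 1, a bookie sells it both tickets and keeps the difference no matter which event occurs.

```python
def dutch_book_profit(price_a, price_not_a):
    """Sell the entity one ticket on A and one on not-A, each paying 1.
    Exactly one of the two tickets wins, so the bookie pays out exactly 1
    in every outcome; the guaranteed profit is the total price collected
    minus that payout."""
    return price_a + price_not_a - 1.0

# Incoherent prices: p(A) + p(not-A) = 0.6 + 0.5 > 1.
profit = dutch_book_profit(0.6, 0.5)
# profit is 0.1 whether or not A happens: a sure loss for the entity.
```

With coherent prices (p(A) + p(¬A) = 1) the guaranteed profit is zero; the symmetric case p(A) + p(¬A) < 1 is exploited by buying both tickets from the entity instead.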

Mathematical Statistics

Main ideas in one slide

We are interested in the average-case behaviour of inference algorithms:
- What is the probability over the data that the true value lies in the interval?
- Does the expected value of an estimate coincide with the true value?
- What is the maximal false negative ratio for a fixed false positive ratio?
- What is the probability of getting a sample from distribution H_0?

Since measurement is outside the scope of the theory, we have to connect average-case guarantees with real-world measurements.

Cournot principle. Events with a sufficiently small probability do not happen.

P-values: If the probability of getting a sample x from a distribution H_0 is below 10^-6, then the sample is not from the distribution H_0.

Confidence intervals: If an inference method returns an interval [a, b] that contains the true value with probability 95% over the assumed data distribution, then the interval returned by the algorithm contains the true value.

More precisely, the algorithm works on typical data samples; our losses are tolerable in the long run.
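The "95% over the assumed data distribution" clause is a statement about the procedure, not about any single interval, and it can be checked by simulation. A minimal sketch, assuming a normal model with known variance (all constants below are illustrative):

```python
import math
import random

random.seed(1)
TRUE_MEAN, SIGMA, N, RUNS = 10.0, 2.0, 50, 2000
z = 1.96  # two-sided 95% standard-normal quantile
half_width = z * SIGMA / math.sqrt(N)

hits = 0
for _ in range(RUNS):
    sample_mean = sum(random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)) / N
    # The interval [sample_mean - h, sample_mean + h] either contains
    # TRUE_MEAN or it does not; coverage is a long-run property.
    if sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width:
        hits += 1

coverage = hits / RUNS
# coverage comes out close to 0.95 over repeated samples.
```

Each realised interval either covers the true mean or it does not; only the long-run fraction of covering intervals is pinned at 95%, which is exactly the average-case guarantee the slide describes.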