Sequential Decision Making


1 Sequential Decision Making

2 Sequential decisions. Many (most) real-world problems cannot be solved with a single action; we need to plan over a longer horizon.

3 Ex: Sequential decision problems. We start at START and want to reach the goal (+1) while avoiding the pit (-1). How do we get to the goal (and collect the +1 reward)? Assume a fully observable environment. Plan: Up, Up, Right, Right, Right.

4 Ex: Sequential decision problems, cont'd. What if actions are not deterministic? The agent moves in the desired direction with probability 0.8 and slips to each side (left or right of that direction) with probability 0.1. We cannot commit to a plan ahead of time! What is the probability that the old plan succeeds?

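5 Execute predefined plan. Old plan: Up, Up, Right, Right, Right. Probability to succeed: the plan follows its intended path only if all five moves go as intended, which happens with probability 0.8^5 = 0.32768.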
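The success probability of a fixed plan can be checked by pushing the state distribution through the plan one action at a time. The sketch below does this under two assumptions that the transcription does not state explicitly: the agent starts in cell (1, 1), and the single blocked cell of the 4x3 grid is (2, 2), as in the usual textbook layout. The function names are illustrative only.

```python
# Sketch: exact success probability of a fixed plan in the 4x3 world.
# Assumptions (not stated on the slide): start at (1, 1), blocked cell at (2, 2).

COLS, ROWS = 4, 3
WALL = (2, 2)                      # assumed blocked cell
GOAL, PIT = (4, 3), (4, 2)         # terminal states from the slides
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
# The two directions perpendicular to each intended move.
SIDES = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
         "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def step(state, direction):
    """Move one cell; bumping into the wall or the border leaves the agent in place."""
    x = state[0] + MOVES[direction][0]
    y = state[1] + MOVES[direction][1]
    if (x, y) == WALL or not (1 <= x <= COLS and 1 <= y <= ROWS):
        return state
    return (x, y)


def transition(state, action):
    """Return {next_state: probability}, i.e. T(s, a, s') for fixed s and a."""
    if state in (GOAL, PIT):       # terminal states absorb
        return {state: 1.0}
    dist = {}
    outcomes = [(action, 0.8), (SIDES[action][0], 0.1), (SIDES[action][1], 0.1)]
    for direction, prob in outcomes:
        nxt = step(state, direction)
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist


def plan_success_probability(start, plan):
    """Propagate the state distribution through the plan; return the mass on the goal."""
    dist = {start: 1.0}
    for action in plan:
        new_dist = {}
        for s, p in dist.items():
            for s2, q in transition(s, action).items():
                new_dist[s2] = new_dist.get(s2, 0.0) + p * q
        dist = new_dist
    return dist.get(GOAL, 0.0)


p = plan_success_probability((1, 1), ["Up", "Up", "Right", "Right", "Right"])
print(round(p, 5))  # 0.32776
```

Under these assumptions the result is slightly above 0.8^5 = 0.32768, because the agent can also reach the goal by accidentally going around the other side of the obstacle (probability 0.1^4 x 0.8 = 0.00008).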

6 Markov Model. In a first-order Markov model, the distribution p(x_t) depends only on the distribution p(x_{t-1}). The present (current state) can be predicted using local knowledge of the most recent past (the state at the previous step).

7 Transition model. The transition function defines the probability of moving from one state to another. In decision making, each action is associated with its own transition function → we select actions to control the system so that it behaves in some desired way or maximizes the chance of achieving some goal. T(s, a, s') (first-order Markov assumption*): the probability of reaching s' from s given action a. * T depends only on the previous state s and not on the rest of the history.

8 Reward function. Defines the reward of a certain state. In this problem someone has defined the two terminal states (where a mission or episode stops) to have reward +1 (goal) and -1 (trap). Coming up with a reward function that produces a certain behavior is typically part of our job as engineers.

9 We need to make it up! Assume the agent gets reward R(s) for being in state s. R([4,3]) = +1 (go to the goal). R([4,2]) = -1 (avoid the trap). R(rest) = small negative value (get to the goal quickly). Let the utility of a path be the sum of the rewards of the states along that path.

10 Markov Decision Process. This defines a Markov Decision Process (MDP). Assumes a fully observable environment. Defined by: initial state S_0; transition model T(s, a, s'); reward R(s). What does a solution look like?

11 Solution to MDP. A solution to an MDP cannot be a fixed plan (the world is non-deterministic, so we need to sense the state). It is a policy π, which maps states to actions: a = π(s).

12 How good is a policy? To measure the quality of a policy, measure its expected utility over the possible histories (a stochastic environment means that we need to use expectations). Optimal policy π*: the policy with the highest possible expected utility.

13 Ex: Optimal policy π*. Optimal policy for the previous problem. It tells us what to do in each state. The actual path is only known once the agent moves, because actions are non-deterministic.

14 Optimal policy depends on T and R! Different R give different behaviors. What if R > 0 always? How about changing T?

15 Utility of sequences. U_h([s_0, s_1, ..., s_n]) = R(s_0) + R(s_1) + ... + R(s_n). What about infinite sequences? Without a terminal state the sum might become infinite. How do we compare two sequences whose utilities are both infinite?

16 Discounted rewards. Idea: give less weight to future rewards. Captures that a reward tomorrow is less certain than a reward today. Use a discount factor γ, 0 < γ < 1. Gives bounded utility.
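The boundedness claim can be made concrete with a short derivation. The block below is a sketch in the notation of the slides; R_max is an assumed symbol (not used in the slides) for an upper bound on the per-state reward R(s).

```latex
U_h([s_0, s_1, s_2, \dots])
  = \sum_{t=0}^{\infty} \gamma^{t} R(s_t)
  \;\le\; \sum_{t=0}^{\infty} \gamma^{t} R_{\max}
  = \frac{R_{\max}}{1-\gamma},
  \qquad 0 < \gamma < 1 .
```

The last step is the geometric series, which converges only because γ is strictly below 1; this is why infinite sequences become comparable once rewards are discounted.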

17 Selecting the best policy. How do we select the best policy? There are many state sequences to compare. As before, maximize expected utility.

18 Value iteration. Key insight: the utility of a state is its immediate reward plus the discounted expected utility of the next states (assuming that we choose the optimal policy). Idea: iterate. Calculate the utility of each state, then use the utilities to select the optimal decision in each state.

19 Algorithm: Value iteration. Initialize U(s) arbitrarily for all s. Loop until the policy has converged: for every state s, loop over all actions a to compute Q(s, a), the value of taking a in s, then update U(s) from the best action. NOTE: the Q-function comes back in reinforcement learning.

20 Bellman equation. U(s) = R(s) + γ max_a Σ_{s'} T(s, a, s') U(s'). Value iteration converges to a unique optimal solution. Stop iterating when the largest change in utility for any state is small enough. Can show that
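A minimal value-iteration sketch for the 4x3 world follows. It repeatedly sets each state's utility to the immediate reward plus the discounted expected utility of the best action, and stops when the largest change is small. Several values are assumptions rather than facts from the slides: the blocked cell at (2, 2), a step reward of -0.04, γ = 0.99, and the tolerance 1e-6. This is not the course's value_iteration.m, only an illustration of the same idea.

```python
# Sketch of value iteration on the 4x3 world.
# Assumptions: blocked cell (2, 2), step reward -0.04, gamma 0.99, tolerance 1e-6.

GAMMA, TOL = 0.99, 1e-6
WALL, GOAL, PIT = (2, 2), (4, 3), (4, 2)
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != WALL]
R = {s: -0.04 for s in STATES}     # assumed small step penalty
R[GOAL], R[PIT] = 1.0, -1.0        # terminal rewards from the slides
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDES = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
         "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def step(s, direction):
    """Move one cell; the wall and the borders leave the agent in place."""
    nxt = (s[0] + MOVES[direction][0], s[1] + MOVES[direction][1])
    return nxt if nxt in R else s


def T(s, a):
    """Return (probability, next_state) pairs for action a in state s."""
    return [(0.8, step(s, a)),
            (0.1, step(s, SIDES[a][0])),
            (0.1, step(s, SIDES[a][1]))]


def q_value(s, a, U):
    """Q(s, a): immediate reward plus discounted expected utility of the successors."""
    return R[s] + GAMMA * sum(p * U[s2] for p, s2 in T(s, a))


def value_iteration():
    U = {s: 0.0 for s in STATES}   # arbitrary initialization
    while True:
        new_U = {}
        for s in STATES:
            if s in (GOAL, PIT):
                new_U[s] = R[s]    # terminal utility is just the reward
            else:
                new_U[s] = max(q_value(s, a, U) for a in MOVES)
        delta = max(abs(new_U[s] - U[s]) for s in STATES)
        U = new_U
        if delta < TOL:            # largest change is small enough
            return U


U = value_iteration()
policy = {s: max(MOVES, key=lambda a: q_value(s, a, U))
          for s in STATES if s not in (GOAL, PIT)}
print(policy[(1, 1)], policy[(3, 3)])
```

The policy is read off at the end by picking, in each non-terminal state, the action with the highest Q-value under the converged utilities.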

21 Value iteration. Run value_iteration.m.

22 Policy iteration. An alternative to value iteration for finding a policy. Choose an arbitrary policy. Loop until the policy no longer changes: compute the value function V(s) given the policy (known as policy evaluation); given this value function, improve the policy for each state.
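The evaluate-then-improve loop can be sketched on a deliberately tiny example. The two-state MDP below (states A and B, actions stay and go, γ = 0.9) is hypothetical and not taken from the slides; policy evaluation is done by repeated sweeps rather than by solving the linear system, which is one of several reasonable choices.

```python
# Sketch of policy iteration on a hypothetical two-state MDP (not from the slides).

GAMMA = 0.9
STATES = ["A", "B"]
ACTIONS = ["stay", "go"]
R = {"A": 0.0, "B": 1.0}
# T[(s, a)] maps next states to probabilities; deterministic here for clarity.
T = {("A", "stay"): {"A": 1.0}, ("A", "go"): {"B": 1.0},
     ("B", "stay"): {"B": 1.0}, ("B", "go"): {"A": 1.0}}


def expected_value(s, a, V):
    """Expected value of the successor state when taking a in s."""
    return sum(p * V[s2] for s2, p in T[(s, a)].items())


def evaluate(policy, sweeps=500):
    """Policy evaluation: iterate V(s) = R(s) + gamma * E[V(s')] under the policy."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: R[s] + GAMMA * expected_value(s, policy[s], V) for s in STATES}
    return V


def policy_iteration():
    policy = {s: "stay" for s in STATES}   # arbitrary initial policy
    while True:
        V = evaluate(policy)
        improved = {s: max(ACTIONS, key=lambda a: expected_value(s, a, V))
                    for s in STATES}
        if improved == policy:             # policy no longer changes
            return policy, V
        policy = improved


policy, V = policy_iteration()
print(policy)  # {'A': 'go', 'B': 'stay'}
```

In this toy problem the loop stops after two evaluations: the first improvement switches A from stay to go, and the second confirms that nothing else changes.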

23 Do not always have a model. What if we do not have, e.g., the model T(s, a, s')? Use learning, for example: identify T from lots of data, or use Q-learning to estimate Q(s, a) directly. Q(s, a) is the total reward obtained by starting from s, applying action a, and then acting optimally after that. Ex: DeepMind's Atari games (Q was modeled with a deep net where s was a sequence of images, i.e. the action was learned from an image sequence).
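A tabular version of Q-learning fits in a few lines. The sketch below uses a hypothetical five-state corridor (not from the slides) in which the agent only ever sees sampled transitions and rewards, never T itself. The learning rate, discount factor, episode count and the purely random exploration policy are all illustrative assumptions; note also that the reward here is attached to the transition rather than to the state.

```python
import random

# Sketch of tabular Q-learning on a hypothetical 5-state corridor (not from the slides).
N_STATES, GOAL = 5, 4
ACTIONS = ["left", "right"]
ALPHA, GAMMA, EPISODES = 0.5, 0.9, 500   # assumed hyperparameters


def env_step(s, a):
    """Environment the learner samples from; reward 1 only on entering the goal."""
    s2 = max(0, s - 1) if a == "left" else min(GOAL, s + 1)
    done = s2 == GOAL
    return s2, (1.0 if done else 0.0), done


def q_learning(seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(EPISODES):
        s, done = 0, False
        while not done:
            # Random exploration is enough here because Q-learning is off-policy.
            a = rng.choice(ACTIONS)
            s2, r, done = env_step(s, a)
            best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
            # Move Q(s, a) toward the sampled target r + gamma * max_a' Q(s', a').
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            s = s2
    return Q


Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

After training, the greedy policy derived from Q moves right in every non-terminal state, which is the shortest route to the goal in this corridor.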

24 Partially observable environment. What if the environment is not fully observable? Remember: there are no sensors that let us see everything, so this is the normal case!

25 Partially observable environment. What if the environment is not fully observable? We cannot execute the policy since the state is unknown! This results in a Partially Observable MDP, or POMDP. Sensors provide observations of the environment. The observation model O(s, o) gives the probability of making observation o in state s.

26 First attempt at POMDP. No measurements. Initial position unknown. First attempt at a plan: move left 5 times (now likely that you are on the left side), move up 5 times (now likely that you are in the upper left corner), move right 5 times. 77.5% success rate. Expected utility only 0.08.

27 Belief state. The belief state can be used in a partially observable world (conditional planning, vacuum cleaner). Definition: the belief state b is a probability distribution over the possible states; b(s) is the probability of being in state s. Example: what is the initial belief state, assuming that the agent can be anywhere except +1 or -1?

28 Belief state update. We need to update the belief state as we go along. Assume belief state b and action a. Update? b'(s') = some function of b(s), a, s' and o.

29 Belief state update. We need to update the belief state as we go along. Assume belief state b and action a. Update: b'(s') = α O(s', o) Σ_s T(s, a, s') b(s), where α is a normalization factor such that Σ_{s'} b'(s') = 1. This is Bayes' rule with: b(s) = p(s) (prior); b'(s') = p(s' | s, a, o) (posterior); Σ_s T(s, a, s') b(s) = p(s' | a, s) (prediction); O(s', o) = p(o | s') = p(o | s', s, a) {o is independent of s and a, given s'}.
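The prior, prediction, observation and normalization pieces listed above combine into a short function. The sketch below assumes the standard form b'(s') = α O(s', o) Σ_s T(s, a, s') b(s), with T and O stored as nested dictionaries; the two-state example and its observation names are hypothetical.

```python
# Sketch of a discrete belief-state update (Bayes filter step).

def belief_update(b, a, o, T, O):
    """Return b'(s') = alpha * O(s', o) * sum_s T(s, a, s') * b(s)."""
    # Prediction: push the prior belief through the transition model.
    predicted = {}
    for s, p in b.items():
        for s2, q in T[(s, a)].items():
            predicted[s2] = predicted.get(s2, 0.0) + p * q
    # Correction: weight each predicted state by the observation likelihood.
    unnormalized = {s2: O[s2].get(o, 0.0) * p for s2, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalization: alpha = 1 / total makes the posterior sum to 1.
    return {s2: p / total for s2, p in unnormalized.items()}


# Hypothetical two-state example: the agent is in the left (L) or right (R) room,
# the action "stay" never moves it, and the sensor is correct 80% of the time.
T = {("L", "stay"): {"L": 1.0}, ("R", "stay"): {"R": 1.0}}
O = {"L": {"see_L": 0.8, "see_R": 0.2},
     "R": {"see_L": 0.2, "see_R": 0.8}}

b0 = {"L": 0.5, "R": 0.5}                        # uniform prior
b1 = belief_update(b0, "stay", "see_L", T, O)
print(b1)
```

Starting from a uniform prior, one "see_L" observation moves the belief to about 0.8 for L and 0.2 for R; applying the update again with the same observation would sharpen it further.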

30 Solving POMDP. Key insight: the optimal action depends on the belief state and not on the actual state! Optimal policy π*(b). Decision cycle: 1. Execute action a = π*(b). 2. Receive observation o. 3. Update the belief state.

31 Turn a POMDP into an MDP. Introduce τ(b, a, b'), the probability of reaching belief state b' from b given action a, and ρ(b) = Σ_s b(s) R(s). τ(b, a, b') and ρ(b) define an observable MDP over belief states. The optimal MDP policy π*(b) is also optimal for the original POMDP.

32 Second attempt at POMDP. No observations → the problem is deterministic in belief space, so the policy is a fixed sequence. Optimal sequence: Left, Up, Up, Right, Up, Up, Right, Up, Up, Right, Up, Right, Up, Right, Up, Right, ... Expected utility 0.38 (was 0.08 before).

33 So is it simple? It sounds simple at first, BUT the belief state is a probability distribution! Compare MDP and POMDP in the 4x3 world. MDP state: the position of the agent, i.e. 1 discrete variable with 11 possible values. POMDP belief state: an 11-dimensional vector of continuous variables!


More information

Equilibrium Selection In Coordination Games

Equilibrium Selection In Coordination Games Equilibrium Selection In Coordination Games Presenter: Yijia Zhao (yz4k@virginia.edu) September 7, 2005 Overview of Coordination Games A class of symmetric, simultaneous move, complete information games

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 2. Rational Agents Nature and Structure of Rational Agents and Their Environments Wolfram Burgard, Bernhard Nebel and Martin Riedmiller Albert-Ludwigs-Universität

More information

CS 4365: Artificial Intelligence Recap. Vibhav Gogate

CS 4365: Artificial Intelligence Recap. Vibhav Gogate CS 4365: Artificial Intelligence Recap Vibhav Gogate Exam Topics Search BFS, DFS, UCS, A* (tree and graph) Completeness and Optimality Heuristics: admissibility and consistency CSPs Constraint graphs,

More information

PART - A 1. Define Artificial Intelligence formulated by Haugeland. The exciting new effort to make computers think machines with minds in the full and literal sense. 2. Define Artificial Intelligence

More information

How do you design an intelligent agent?

How do you design an intelligent agent? Intelligent Agents How do you design an intelligent agent? Definition: An intelligent agent perceives its environment via sensors and acts rationally upon that environment with its effectors. A discrete

More information

Web-Mining Agents Cooperating Agents for Information Retrieval

Web-Mining Agents Cooperating Agents for Information Retrieval Web-Mining Agents Cooperating Agents for Information Retrieval Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme Karsten Martiny (Übungen) Literature Chapters 2, 6, 13, 15-17

More information

Intelligent Autonomous Agents. Ralf Möller, Rainer Marrone Hamburg University of Technology

Intelligent Autonomous Agents. Ralf Möller, Rainer Marrone Hamburg University of Technology Intelligent Autonomous Agents Ralf Möller, Rainer Marrone Hamburg University of Technology Lab class Tutor: Rainer Marrone Time: Monday 12:15-13:00 Locaton: SBS93 A0.13.1/2 w Starting in Week 3 Literature

More information

Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes

Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes Utah State University DigitalCommons@USU All Graduate Theses and Dissertations Graduate Studies 5-2011 Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes

More information

Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning

Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning Arthur Guez, Robert D. Vincent School of Computer Science McGill University Montreal, Quebec Canada Massimo Avoli Montreal Neurological

More information

Contents. Foundations of Artificial Intelligence. Agents. Rational Agents

Contents. Foundations of Artificial Intelligence. Agents. Rational Agents Contents Foundations of Artificial Intelligence 2. Rational s Nature and Structure of Rational s and Their s Wolfram Burgard, Bernhard Nebel, and Martin Riedmiller Albert-Ludwigs-Universität Freiburg May

More information

MAC Sleep Mode Control Considering Downlink Traffic Pattern and Mobility

MAC Sleep Mode Control Considering Downlink Traffic Pattern and Mobility 1 MAC Sleep Mode Control Considering Downlink Traffic Pattern and Mobility Neung-Hyung Lee and Saewoong Bahk School of Electrical Engineering & Computer Science, Seoul National University, Seoul Korea

More information

arxiv: v1 [cs.lg] 25 Nov 2018

arxiv: v1 [cs.lg] 25 Nov 2018 A MODEL-BASED REINFORCEMENT LEARNING APPROACH FOR A RARE DISEASE DIAGNOSTIC TASK. A PREPRINT arxiv:1811.10112v1 [cs.lg] 25 Nov 2018 Rémi Besson CMAP École Polytechnique Route de Saclay, 91128 Palaiseau

More information

Learning to Identify Irrelevant State Variables

Learning to Identify Irrelevant State Variables Learning to Identify Irrelevant State Variables Nicholas K. Jong Department of Computer Sciences University of Texas at Austin Austin, Texas 78712 nkj@cs.utexas.edu Peter Stone Department of Computer Sciences

More information

The Next 32 Days. By James FitzGerald

The Next 32 Days. By James FitzGerald The Next 32 Days By James FitzGerald The CrossFit OPENS are here again. It seems like a few months ago we were all making the statements - THIS will be the year of training Only to see those months FLY

More information

Challenges in Developing Learning Algorithms to Personalize mhealth Treatments

Challenges in Developing Learning Algorithms to Personalize mhealth Treatments Challenges in Developing Learning Algorithms to Personalize mhealth Treatments JOOLHEALTH Bar-Fit Susan A Murphy 01.16.18 HeartSteps SARA Sense 2 Stop Continually Learning Mobile Health Intervention 1)

More information

MINTO PREVENTION & REHABILITATION CENTRE CENTRE DE PREVENTION ET DE READAPTATION MINTO. Preparing to Quit. About This Kit

MINTO PREVENTION & REHABILITATION CENTRE CENTRE DE PREVENTION ET DE READAPTATION MINTO. Preparing to Quit. About This Kit MINTO PREVENTION & REHABILITATION CENTRE CENTRE DE PREVENTION ET DE READAPTATION MINTO Preparing to Quit About This Kit There is no one best way to quit smoking. Your Heart Institute Prevention and Rehabilitation

More information

Agents & Environments Chapter 2. Mausam (Based on slides of Dan Weld, Dieter Fox, Stuart Russell)

Agents & Environments Chapter 2. Mausam (Based on slides of Dan Weld, Dieter Fox, Stuart Russell) Agents & Environments Chapter 2 Mausam (Based on slides of Dan Weld, Dieter Fox, Stuart Russell) Outline Agents and environments Rationality PEAS specification Environment types Agent types D. Weld, D.

More information

Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task

Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit tas Anonymous Author(s) Affiliation Address email Abstract 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

More information

Interaction as an emergent property of a Partially Observable Markov Decision Process

Interaction as an emergent property of a Partially Observable Markov Decision Process Interaction as an emergent property of a Partially Observable Markov Decision Process Andrew Howes, Xiuli Chen, Aditya Acharya School of Computer Science, University of Birmingham Richard L. Lewis Department

More information

Dynamic Control Models as State Abstractions

Dynamic Control Models as State Abstractions University of Massachusetts Amherst From the SelectedWorks of Roderic Grupen 998 Dynamic Control Models as State Abstractions Jefferson A. Coelho Roderic Grupen, University of Massachusetts - Amherst Available

More information

Approximate Inference in Bayes Nets Sampling based methods. Mausam (Based on slides by Jack Breese and Daphne Koller)

Approximate Inference in Bayes Nets Sampling based methods. Mausam (Based on slides by Jack Breese and Daphne Koller) Approximate Inference in Bayes Nets Sampling based methods Mausam (Based on slides by Jack Breese and Daphne Koller) 1 Bayes Nets is a generative model We can easily generate samples from the distribution

More information

Intelligent Agents. Chapter 2 ICS 171, Fall 2009

Intelligent Agents. Chapter 2 ICS 171, Fall 2009 Intelligent Agents Chapter 2 ICS 171, Fall 2009 Discussion \\Why is the Chinese room argument impractical and how would we have to change the Turing test so that it is not subject to this criticism? Godel

More information

You can use this app to build a causal Bayesian network and experiment with inferences. We hope you ll find it interesting and helpful.

You can use this app to build a causal Bayesian network and experiment with inferences. We hope you ll find it interesting and helpful. icausalbayes USER MANUAL INTRODUCTION You can use this app to build a causal Bayesian network and experiment with inferences. We hope you ll find it interesting and helpful. We expect most of our users

More information

Abstract. 2. Metacognition Architecture. 1. Introduction

Abstract. 2. Metacognition Architecture. 1. Introduction Design of Metacontrol and Metacognition mechanisms in SMCA Using Norms and Affect rules Venkatamuni Vijaya Kumar, Darryl. N. Davis, and K.R. Shylaja New Horizon college of Engineering, DR.AIT, Banagalore,

More information