Learning to play Mario


29th Soar Workshop
Learning to play Mario
Shiwali Mohan, University of Michigan, Ann Arbor

Outline
- Why games?
- Domain
- Approaches and results
- Issues with the current design
- What's next

Why computer games? Why Mario?

Computer games:
- Complex: they can have really large, continuous state and action spaces.
- Reasoning and learning are required at many levels: sensory-motor primitives, path planning, strategy.
- Knowledge learned in one level should be applicable in other levels of the game.
- Exciting to watch Soar play!

Infinite Mario:
- Parameterized domain from the RL Competition 2009
- Easy interface (RL-Glue)

Domain

Visual space:
- 16 x 22 tiles in the visual scene; each tile can have 13 different values.
- 9 kinds of monsters, which can be anywhere; the agent knows their exact location, speed, and type.

Reasonable action space:
- Left, Right, Stay, Jump, and Speed Toggle (12 combinations).

Reward:
- 100 for reaching the finish line
- -1 for every step in the environment
- +1 for killing a monster, taking a coin, etc.
- -10 for dying

Not real time. Sample agent with memory of the last episode; levels 0-3.
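The reward structure above can be sketched as a small function. This is a minimal illustration, not the RL-Glue API; the event names (finish, die, kill_monster, take_coin) are assumptions made here for clarity.

```python
def step_reward(events):
    """Sum the reward for one environment step, given a set of event flags."""
    reward = -1.0                      # -1 for every step in the environment
    if "finish" in events:
        reward += 100.0                # reaching the finish line
    if "die" in events:
        reward -= 10.0                 # dying
    # +1 for each bonus event (killing a monster, taking a coin)
    reward += 1.0 * len(events & {"kill_monster", "take_coin"})
    return reward
```

For example, a step that only advances time yields `step_reward(set()) == -1.0`, while a step that reaches the finish line yields `99.0`.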

How should states be represented?

Simple RL agent:
- State: the [5 x 5] tiles around Mario
- Action space: primitive actions - left, right, still, jump, speed
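The [5 x 5] state above can be sketched as a window extraction over the 16 x 22 tile scene. The grid layout and the out-of-bounds fill value (0) are assumptions for illustration.

```python
def local_window(scene, mario_xy, radius=2):
    """Return the (2*radius+1)^2 tiles centred on Mario, row-major."""
    mx, my = mario_xy
    window = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = mx + dx, my + dy
            if 0 <= x < len(scene) and 0 <= y < len(scene[0]):
                window.append(scene[x][y])
            else:
                window.append(0)  # pad tiles that fall outside the scene
    return tuple(window)  # hashable, so it can key a tabular value function
```

With 25 tiles of 13 values each, the table this state indexes is astronomically large, which is exactly the problem the next slide reports.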

Simple RL agent - Results

Reasons:
- Huge state space!
- Rewards are too far in the future; a large number of steps is required to complete one episode.
- Also, a policy, if ever learned, might not help in other instances of the game.

A different way to look at the problem

Object-oriented representation* of state:
- From a low-level tile-by-tile representation to a view composed of objects and their relationships with the agent.
- From "the tile at position (x, y) is of type t" or "the agent is at tile (x, y)" to "pit P is at a distance of 3.5 tiles on the x-axis."
- Features like pits, pipes, platforms, etc. are extracted from the lower-level data using simple heuristics. This can be accounted for as background knowledge.

pit <p>
<p> ^distx 3.5
<p> ^disty -5
<p> ^width 2

*Diuk, Cohen, and Littman. An object-oriented representation for efficient reinforcement learning. In Proceedings of ICML 2008.
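The heuristic feature extraction mentioned above can be sketched for pits: scan the ground row of the tile grid and emit an object with ^distx / ^width attributes relative to Mario. The tile coding (0 = empty, non-zero = ground) is an assumption for illustration.

```python
def extract_pits(ground_row, mario_x):
    """Find runs of empty ground tiles and describe them relative to Mario."""
    pits, x = [], 0
    while x < len(ground_row):
        if ground_row[x] == 0:              # start of a gap in the ground
            start = x
            while x < len(ground_row) and ground_row[x] == 0:
                x += 1
            width = x - start
            # signed x-distance from Mario to the pit's centre
            distx = (start + width / 2.0) - mario_x
            pits.append({"distx": distx, "width": width})
        else:
            x += 1
    return pits
```

Similar one-pass heuristics would recover pipes and platforms, which is why this knowledge can plausibly be treated as background knowledge rather than learned.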

Operator/Action abstraction

Make a distinction between keystroke-level operators (KLOs) and functional-level operators (FLOs):
- KLOs: primitive actions (left, right, jump, etc. in the present case)
- FLOs: collections of KLOs executed in a sequence to perform a specific task

Key observations from a GOMS* analysis of HI-Soar:
- Games have a limited number of FLOs (depending on the different categories of objects present).
- Human experts decide between different FLOs using very local and immediate conditions/features (like distance from monsters, coins, etc.).

Example FLOs: tackle_monster, grab_coin, search_question

How does this help?
- Provides structure to the problem: when the agent executes an FLO, it has an immediate goal to attend to, like getting rid of the monster.
- Learning is easier, similar to HRL.

*John, B. E. Extensions of GOMS analyses to expert performance requiring perception of dynamic visual and auditory information. In Proceedings of CHI 1990.
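The KLO/FLO distinction can be sketched as an FLO expanding into a short sequence of keystroke-level actions. The particular expansions below are invented for illustration; they are not the agent's actual Soar rules.

```python
# Illustrative FLO -> KLO expansions (assumed, not from the actual agent).
KLO_SEQUENCES = {
    "grab_coin":      ["move_right", "jump", "move_right"],
    "tackle_monster": ["jump", "move_right"],   # e.g., stomp by landing on it
    "avoid_pit":      ["speed_high", "jump"],
}

def execute_flo(flo, act):
    """Run one FLO by issuing its KLOs through the primitive-action callback."""
    for klo in KLO_SEQUENCES[flo]:
        act(klo)
```

In the real agent the expansion is conditional on the sub-goal's state rather than a fixed list, but the structural point is the same: each FLO bounds a short, locally-evaluable stretch of primitive actions.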

Operator hierarchy

- Top level (Complete Game): state defined by the nearness of an object ("Is monster A close enough to be a threat?", etc.). FLOs: Grab Coin, Search Question, Tackle Monster, Avoid Pit, Move to Goal; an FLO is instantiated per object (e.g., Grab Coin C1, Grab Coin C2).
- Bottom level: state defined by the exact distance of an object ("Monster A is at a distance of 3.5 tiles on the x-axis", discrete). KLOs: Move Left/Right, Jump Yes/No, Speed High/Low.

How it works

1. Objects from the sensory input are extracted and elaborated to have a relational (with Mario) representation.
2. All objects in close vicinity cause FLO proposals.
3. The agent makes a selection between these operators based on hard-coded/learned preferences.
4. The agent goes into a sub-goal to execute the FLO; the FLO proposes lower, keystroke-level operators.
5. Primitive actions are executed; the environment steps and produces a reward and the next set of sensory data.
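Steps 2 and 3 of the cycle above can be sketched as proposal and preference-based selection. The nearness threshold, object types, and numeric preferences are all illustrative assumptions, standing in for the agent's hard-coded or learned preferences.

```python
# Assumed numeric stand-ins for the agent's FLO preferences.
PREFERENCE = {"tackle_monster": 3, "avoid_pit": 2, "grab_coin": 1,
              "move_to_goal": 0}

def propose_flos(objects, near=4.0):
    """Every object in close vicinity proposes its matching FLO."""
    proposals = {"move_to_goal"}           # always applicable
    for obj in objects:
        if abs(obj["distx"]) <= near:
            proposals.add({"monster": "tackle_monster",
                           "pit": "avoid_pit",
                           "coin": "grab_coin"}[obj["type"]])
    return proposals

def select_flo(objects):
    """Pick the proposed FLO with the highest preference."""
    return max(propose_flos(objects), key=PREFERENCE.__getitem__)
```

Note that selection here considers one winner at a time; the slides below show where this single-object view breaks down.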

Agent with hard-coded FLO preferences
[Operator hierarchy diagram]


Agent with learned FLO preferences
[Operator hierarchy diagram]


[Plots: hard-coded FLO preferences vs. learned FLO preferences]

How good is the current design?
- Good at simple scenarios where decisions are affected by only one object in the vicinity.
- Where there is more than one object but the preference is clear, it also works: dealing with a monster has a higher preference than collecting coins.
- Where multiple objects have to be considered together, the agent fails to learn a good policy.

Trouble
[Operator hierarchy diagram with two instantiations, Tackle Monster 1 and Tackle Monster 2, both proposing Move Left/Right, Jump Yes/No, Speed High/Low]

Example scenarios

Next questions to ask
- When operators (KLOs) have conflicting preferences due to different objects, what constitutes a good policy? How can it be learned?
- What is a conflict? How can it be detected?

Some ideas
- Learn more specific rules. For objects A and B, the agent has learned independent policies for A and B. If both A and B are encountered, the previously learned policies are split into policies for:
  - A ^ !B
  - !A ^ B
  - A ^ B (to be learned from more experience)
- Move beyond reinforcement learning. The situation causes an impasse; the agent uses other tools like episodic memory or mental imagery and spatial reasoning to deal with the current situation and chunks the information.
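The rule-splitting idea can be sketched as follows: the two single-object policies are specialised into three conditions, with the joint case left empty, to be learned from further experience. The data structures are illustrative only.

```python
def split_policies(policy_a, policy_b):
    """Split independent policies for A and B on their co-occurrence."""
    return {
        ("A", "not B"): dict(policy_a),   # inherited from experience with A alone
        ("not A", "B"): dict(policy_b),   # inherited from experience with B alone
        ("A", "B"):     {},               # to be learned from more experience
    }
```

The appeal of the split is that prior learning is preserved exactly where it is still valid, and learning effort is focused on the genuinely novel joint condition.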

Finally

Nuggets:
- Relational, object-oriented representations used in reinforcement learning.
- Imposing hierarchy facilitates better learning and better organization.
- Generalized learning.

Coal:
- Multiple objects affecting a decision is still a problem.
- Using templates makes it slow (cannot be used in real-time games).
- Using gp causes it to produce a large number (~a million) of rules, quite a few of which are never used.
- Continuous values!