Learning and Adaptive Behavior, Part II


April 12, 2007

"The man who sets out to carry a cat by its tail learns something that will always be useful and which will never grow dim or doubtful." -- Mark Twain

A Challenge: Getting RL to Work on Real Robots

When is learning appropriate?
- When the task is originally under-specified or difficult to code exactly by hand
- When the task has parameters that are likely to change over time in unpredictable ways
- When the time taken to learn a control policy is less than that for hand-coding a comparable policy
- When the learned policy can be executed more efficiently than a hand-coded one

Problems with RL on Robots

- Huge number of states to explore, with a large number of possible actions in each state.
  E.g., 24 sonar sensors, quantized into 3 range bands => ~282 billion possible states.
  If the possible actions in each state are "go forwards" or "go backwards" => ~560 billion state-action combinations to try (see the quick check below).
- The robot is physical, so it takes time to perform an action: at 1 second per action, ~20,000 years to try each combination.
- During early learning, the robot's actions may be dangerous ("Let's try rolling down the stairwell to see what next state I end up in").
  One possible safeguard: give the robot reflexes to stop dangerous actions.
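A quick back-of-the-envelope check of these figures (the numbers come straight from the slide; at exactly 1 second per action the result is roughly 18,000 years, which the slide rounds up to 20,000):

```python
# Back-of-the-envelope state-space size for the sonar example above.
num_sensors = 24          # sonar sensors
bands_per_sensor = 3      # each reading quantized into 3 range bands
num_actions = 2           # go forwards or go backwards

num_states = bands_per_sensor ** num_sensors        # ~2.8e11 states
num_state_actions = num_states * num_actions        # ~5.6e11 combinations

seconds_per_action = 1
years_to_try_all = num_state_actions * seconds_per_action / (3600 * 24 * 365)

print(f"states: {num_states:.3g}")
print(f"state-action pairs: {num_state_actions:.3g}")
print(f"years at 1 action/sec: {years_to_try_all:,.0f}")
```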

One Possible Framework for RL on Robots

Three components:
- Provided system, with an initial control policy and an immediate reward function
- Policy learning process
- Value learning process

[Block diagram: policy learner + value learner maintaining Q(s,a), built on the initial control policy; actions go to the environment, which returns the next state and a reward]
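The slides do not spell out the value learner's update rule. One common instantiation of a Q(s,a) value learner is tabular Q-learning, sketched below with hypothetical parameter values; the state and action encodings would come from the provided system:

```python
import random
from collections import defaultdict

# Hypothetical tabular value learner for the framework above: maintains Q(s, a)
# and nudges it toward the observed reward plus discounted future value.
class QValueLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)          # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_action(self, state):
        if random.random() < self.epsilon:   # occasional exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (target - self.Q[(state, action)])
```

The policy learner would then bias action selection toward state-action pairs whose learned value is high.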

Fundamental Issues

- Quick learning: learn in real time
- Generalization: take what is known and hypothesize about unknown/unvisited states
- Shifting functions: adapt over time

Other Examples of Reinforcement Learning: Model-Based Reinforcement Learning

[Videos]

Another Related Example: Imitation Learning

[Video]

Two New Topics: Objectives

- To understand neural network learning applied to robotics
- To understand genetic algorithms applied to robotics

First: Quick Background in Neural Nets

- Some of the earliest work in neural networks (or connectionist systems) was the McCulloch-Pitts model of neurons (1943)
- McCulloch-Pitts: a simple linear threshold unit
  - Synaptic weights are associated with each synaptic input
  - If the threshold is exceeded, the neuron fires, carrying its output to the next neuron
- Later, Rosenblatt (1958) introduced the Perceptron

[Diagram: input vector x_1, x_2, ..., x_n; synaptic weights w_1, w_2, ..., w_n; summation Σ and threshold θ in the neuron; output]
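A minimal sketch of such a linear threshold unit; the AND wiring in the example is an illustrative choice, not something from the slide:

```python
import numpy as np

def linear_threshold_unit(x, w, theta):
    """McCulloch-Pitts-style unit: fire (output 1) if the weighted
    sum of the inputs meets or exceeds the threshold theta, else 0."""
    return 1 if np.dot(w, x) >= theta else 0

# Example: a unit wired to behave like a logical AND of two binary inputs
w = np.array([1.0, 1.0])
theta = 2.0
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", linear_threshold_unit(np.array(x), w, theta))
```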

Neural Net Quick Background (cont'd)

- 1960s-1970s: neural net research in decline, largely due to Minsky and Papert's book (1969), which proved limitations of single-layer perceptron networks
- 1980s: resurgence, due to multi-layer neural networks and the use of backpropagation as a means for training these systems
- Much work in connectionism since, with significant progress
- Keep in mind: neural nets are only abstract computational models of biological neurons

Methods for Encoding Behavior-Based Robotic Control in Neural Networks

1. Hebbian Learning
2. Perceptron Learning
3. Classical Conditioning
4. Adaptive Heuristic Critic Learning

1. Hebbian Learning

- Hebb (1949) developed one of the earliest training algorithms for neural networks
- Hebbian learning increases synaptic strength along neural pathways associated with a stimulus and a correct response
- Specifically:

  w_ij(t+1) = w_ij(t) + η * o_i * o_j

  where:
  - w_ij(t) and w_ij(t+1) are the synaptic weights connecting neurons i and j before and after updating
  - η is the learning rate coefficient
  - o_i and o_j are the outputs of neurons i and j, respectively

[Photo: Donald Hebb, 1904-1985]
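A minimal numeric sketch of the rule above; the matrix layout and activity values are illustrative:

```python
import numpy as np

def hebbian_update(W, o_pre, o_post, eta=0.1):
    """One Hebbian step: w_ij(t+1) = w_ij(t) + eta * o_i * o_j.
    W[i, j] connects presynaptic neuron i to postsynaptic neuron j."""
    return W + eta * np.outer(o_pre, o_post)

# Correlated pre/post activity strengthens the corresponding weights
W = np.zeros((3, 2))
o_pre = np.array([1.0, 0.0, 1.0])   # outputs of presynaptic neurons
o_post = np.array([1.0, 0.0])       # outputs of postsynaptic neurons
W = hebbian_update(W, o_pre, o_post)
print(W)
```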

2. Perceptron Learning (has been used for robotic learning)

Overall training procedure:

Repeat:
  1. Present an example from a set of positive and negative learning experiences.
  2. Verify whether the output of the network is correct or incorrect.
  3. If it is incorrect, supply the correct output at the output unit.
  4. Adjust the synaptic weights of the perceptrons in a manner that reduces the error between the observed output and the correct output.
Until satisfactory performance (as manifested by convergence) is achieved or some other stopping condition is met.

How to Update Synaptic Weights?

Delta rule: used for perceptrons without hidden layers. Modify the synaptic weights according to the formula:

  Δw_ij = η * o_i * (t_j - o_j)

where:
- Δw_ij is the synaptic adjustment applied to the connection between neurons i and j
- η is the learning rate coefficient
- o_i is the output of neuron i (the input carried by the connection)
- t_j and o_j are the target (correct) and observed outputs, respectively

The Delta rule strives to minimize the error term using a gradient descent approach.
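A minimal sketch of perceptron-style training with the delta-rule update above, using a hard-threshold output unit; the data set and parameters are illustrative:

```python
import numpy as np

def train_delta_rule(X, targets, eta=0.1, epochs=20):
    """Single-layer training: delta_w_i = eta * o_i * (t - o), with o_i = input x_i."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, targets):
            o = 1.0 if np.dot(w, x) + b >= 0 else 0.0   # observed output
            w += eta * x * (t - o)                      # adjust weights toward target
            b += eta * (t - o)                          # bias treated as a weight on a constant input
    return w, b

# Example: learn logical OR (linearly separable, so a single layer suffices)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 1, 1, 1], dtype=float)
w, b = train_delta_rule(X, targets)
print(w, b)
```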

Gradient Descent Approach

Gradient descent refers to learning methods that seek to minimize an objective function by which system performance is measured:
- At each point in time, the policy is to choose the next step that yields the minimal objective function value.
- The learning rate parameters refer to the step size taken at each point in time.
- Each step is computed only on the basis of local information, which is extremely efficient, but introduces the possibility of traps in local minima.

Hill climbing: the analogous process whereby the objective function is maximized.

[Figures: gradient descent; hill climbing]
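A toy one-dimensional illustration of the idea; the function and step size are my own, not from the slide:

```python
# Minimize f(x) = (x - 3)^2 + 1 by repeatedly stepping opposite the local gradient.
# eta is the step size (learning rate); each step uses only local information.
def grad_descent(f_grad, x0, eta=0.1, steps=50):
    x = x0
    for _ in range(steps):
        x -= eta * f_grad(x)
    return x

f_grad = lambda x: 2 * (x - 3)          # derivative of (x - 3)^2 + 1
print(grad_descent(f_grad, x0=0.0))     # converges near the minimum at x = 3
```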

Another Method for Updating Weights: Back-Propagation

- Back-propagation is the most commonly used method for updating synaptic weights
- It employs a generalized version of the delta rule for use in multilayer perceptron networks (which are commonly used in robotic control and vision)
- Usually, the synaptic weights are initialized to random values
- Weights are adjusted by the following update rule as training instances are provided:

  w_ij(t+1) = w_ij(t) + η * δ_j * o_i

  where:
  - δ_j = o_j(1 - o_j)(t_j - o_j) for an output node, and
  - δ_j = o_j(1 - o_j) Σ_k δ_k w_jk for a hidden-layer node
  - t_j and o_j are the target (correct) and observed outputs, respectively

- The errors are propagated backward from the output layer.
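A compact sketch of these update equations for a one-hidden-layer sigmoid network; the XOR data set, network size, and learning rate are illustrative choices (XOR is the classic case a single-layer perceptron cannot represent), not details from the slide:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 3))   # (2 inputs + bias) -> 3 hidden units
W2 = rng.normal(scale=0.5, size=(4, 1))   # (3 hidden + bias) -> 1 output
eta = 0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

for _ in range(10000):
    for x, t in zip(X, T):
        xb = np.append(x, 1.0)                          # input plus bias unit
        h = sigmoid(xb @ W1)                            # hidden outputs o_j
        hb = np.append(h, 1.0)                          # hidden plus bias unit
        o = sigmoid(hb @ W2)                            # output o_k
        delta_out = o * (1 - o) * (t - o)               # output-node delta
        delta_hid = h * (1 - h) * (W2[:3] @ delta_out)  # hidden-node delta (errors propagated back)
        W2 += eta * np.outer(hb, delta_out)             # w_jk += eta * delta_k * o_j
        W1 += eta * np.outer(xb, delta_hid)             # w_ij += eta * delta_j * o_i

preds = [sigmoid(np.append(sigmoid(np.append(x, 1.0) @ W1), 1.0) @ W2)[0] for x in X]
print(np.round(preds, 2))   # should approach [0, 1, 1, 0] if training converged
```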

3. Classical Conditioning

- Studied by Pavlov (1927); assumes that an unconditioned stimulus (US) automatically generates an unconditioned response (UR)
- Pavlov determined that the US-UR pair is defined genetically and is appropriate to ensure survival in the agent's environment
- In Pavlov's studies, the sight of food (US) results in the dog's salivation (UR)
- Associations can be developed between a conditioned stimulus (CS), which has no intrinsic survival value, and the UR
- Further studies: a bell rings repeatedly with the sight of food, which leads to the bell ringing alone generating salivation
- NOTE: Hebbian learning can also produce classical conditioning

[Photo: Ivan Pavlov]

Classical Conditioning for Self-Organization of a Behavior-Based Robotic System

Instead of hard-wiring relationships between stimuli and responses, the learning architecture permits associations to develop over time.

[Diagram: SENSORS (collision detectors, range finder, target detector); aversive unconditioned stimulus, appetitive unconditioned stimulus, conditioned stimulus, inhibition, unconditioned response, robot motors]

Robot Example of Classical Conditioning

- Studies of Verschure et al. (1992) for collision avoidance and target acquisition
- Divide the positive unconditioned stimulus (US) into 4 discrete areas in which an attractive (appetitive) target may appear: ahead, behind, left, right
- The unconditioned response (UR) set consists of 6 possible commands: advance, reverse, turn right 9 degrees, turn right 1 degree, turn left 9 degrees, turn left 1 degree
- Two additional collision sensors serve as a negative US, producing a response consisting of reversing and turning 9 degrees away from the direction of collision
- The negative US can inhibit the positive US, to ensure that managing collisions is the first priority

Robot Example (cont'd)

CSs: use a range sensor that produces a distance profile over 180 degrees in the direction the robot is heading. Readings are divided into varying, discrete levels of resolution:
- Forward (-30 to +30 degrees): 20 units covering 3 degrees each
- Area to the right (+30 to +60 degrees): 5 units covering 6 degrees each
- Area to the far right (+60 to +90 degrees): 3 units covering 10 degrees each
- Left side: analogous to the right
- Total range readings: 36 units
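To make the discretization concrete, here is a hypothetical mapping from a bearing to one of the 36 CS units; the band boundaries follow the slide, but the indexing scheme is my own:

```python
# Map a bearing in [-90, +90] degrees to one of the 36 range-profile units.
def cs_unit_index(bearing_deg):
    if not -90 <= bearing_deg <= 90:
        raise ValueError("bearing outside the 180-degree field of view")
    if bearing_deg < -60:                                 # far left: 3 units, 10 deg each
        return int((bearing_deg + 90) // 10)              # indices 0..2
    if bearing_deg < -30:                                 # left: 5 units, 6 deg each
        return 3 + int((bearing_deg + 60) // 6)           # indices 3..7
    if bearing_deg < 30:                                  # forward: 20 units, 3 deg each
        return 8 + int((bearing_deg + 30) // 3)           # indices 8..27
    if bearing_deg < 60:                                  # right: 5 units, 6 deg each
        return 28 + int((bearing_deg - 30) // 6)          # indices 28..32
    return 33 + min(int((bearing_deg - 60) // 10), 2)     # far right: indices 33..35

print(cs_unit_index(0), cs_unit_index(-89), cs_unit_index(89))   # 18 0 35
```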

Robot Example (cont'd)

For the neural network implementation, perceptron-like linear threshold units with binary activation values are used. Synaptic weights are updated according to the following rule:

  Δw_ij = (1/N) * (η * o_i * o_j - ε * ō * w_ij)

where:
- η is the learning rate
- ε is the decay rate
- N is the number of units in the CS field
- o_i and o_j are the binary output values of units i and j, respectively
- ō is the average activity of the US field
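A sketch of this update rule; the field sizes, the mapping from CS units to response units, and the activity patterns are all illustrative assumptions:

```python
import numpy as np

def conditioning_update(W, o_cs, o_us, eta=0.5, eps=0.1):
    """delta_w_ij = (1/N) * (eta * o_i * o_j - eps * us_mean * w_ij).
    W[i, j] links CS unit i to US/response unit j."""
    N = len(o_cs)                       # number of units in the CS field
    us_mean = o_us.mean()               # average activity of the US field
    return W + (1.0 / N) * (eta * np.outer(o_cs, o_us) - eps * us_mean * W)

W = np.zeros((36, 6))                   # 36 CS range units -> 6 response units
o_cs = np.zeros(36); o_cs[18] = 1.0     # obstacle detected straight ahead
o_us = np.zeros(6);  o_us[1] = 1.0      # unconditioned "reverse" response active
W = conditioning_update(W, o_cs, o_us)
```

Repeated co-activation of a CS unit with a US-driven response strengthens that link (the Hebbian term), while the decay term keeps unused weights from growing without bound.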

Robot Example (cont'd)

Robot's task: learn useful behaviors by associating perceptual stimuli with environmental feedback. Behaviors include:
- avoidance, where the robot learns not to bump into things
- approach to a desired target

Note that the robot has no a priori understanding of how to use range data to prevent collisions from occurring in the US set; this must be learned from the CS.

Simulation studies indicate that successful behavior does occur, in a manner consistent with the agent's goals.

Other Examples of Robot Learning Using Classical Conditioning

Others have shown a similar technique useful for learning:
- Sorting colored blocks that conduct electricity (either strongly or weakly), based on feedback from an aversive or appetitive response
- Teaching a Braitenberg-like robot, consisting of a neural net of 5 neurons, to seek light
- Teaching a robot to develop a topological map using a neural network, by learning suitable responses at different locations in the world
- Robot learning of encodings similar to potential fields for representing learned landmark positions

Example of RL (+ Neural Nets) (Movies)

[Videos: Khepera robot]

4. Connectionist Adaptive Heuristic Critic (AHC) Learning

- With connectionist adaptive heuristic critic (AHC) reinforcement learning methods, a critic learns the utility or value of particular states through reinforcement
- The learned values are then used locally to reinforce the selection of particular actions for a given state
- Gachet et al. (1994) use this approach as the basis for learning the relative strengths (i.e., the gain matrix G) of each behavior's response within an active assemblage
- Specific goal of this study: learn how to effectively coordinate a robot equipped with the following behaviors:
  - goal attraction
  - two perimeter-following behaviors (left and right)
  - free-space attraction
  - avoiding objects
  - following a path
- The output of each behavior is a vector that the robot sums before execution (a la the schema methods we have studied)

Connectionist AHC Learning System

[Block diagram: sensor data -> classification system (input layer) -> adaptive search element (weight matrix W) -> output behavioral gains; an adaptive critic element (weight matrix V) receives the reinforcement signal and sends local reinforcement signals to the search element]

AHC Network (cont'd)

- The AHC network starts with a classification system that maps incoming sonar data onto a set of situations (either 32 or 64, depending on the task) that reflect the sensed environment
- The output layer, containing a weight matrix W and called the associative search element (ASE), computes the individual behavioral gains for each behavior:

  W_ki(t+1) = W_ki(t) + α * b_i(t) * e_ki(t)

  where α is the learning rate and e_ki(t) is the eligibility of weight W_ki for reinforcement
- The adaptive critic element (ACE) determines the reinforcement signal to be applied to the ASE. The ACE's weights V are updated independently:

  V_ki(t+1) = V_ki(t) + β * b_i(t) * x_k(t)

  where β is a positive constant and x_k is the eligibility for the ACE
- This partitioning of the reinforcement-updating rules from the action-element updating is a characteristic of AHC methods in general
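A sketch of the two update rules with illustrative shapes; how the reinforcement signal b and the eligibilities e and x are computed is not specified on this slide, so they are simply supplied as inputs here:

```python
import numpy as np

def ase_update(W, b, e, alpha=0.1):
    """Associative search element: W_ki <- W_ki + alpha * b_i * e_ki."""
    return W + alpha * e * b[np.newaxis, :]

def ace_update(V, b, x, beta=0.05):
    """Adaptive critic element: V_ki <- V_ki + beta * b_i * x_k."""
    return V + beta * np.outer(x, b)

n_situations, n_behaviors = 32, 6
W = np.zeros((n_situations, n_behaviors))   # situations -> behavioral gains
V = np.zeros((n_situations, n_behaviors))   # critic weights
b = np.zeros(n_behaviors); b[2] = -1.0      # e.g. negative reinforcement for behavior 2
e = np.zeros((n_situations, n_behaviors)); e[5, 2] = 1.0   # eligible ASE weight
x = np.zeros(n_situations); x[5] = 1.0      # ACE eligibility for situation 5
W = ase_update(W, b, e)
V = ace_update(V, b, x)
```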

AHC Example (cont'd)

Task for the robot: learn a set of gain multipliers (G) for a particular task-environment. Three different missions:
- learning to explore the environment safely
- learning how to move back and forth between alternating goal points
- safely following a predetermined path without collisions

During exploration:
- The robot moves randomly
- When a collision occurs, negative reinforcement is applied and the robot is moved back to a position it occupied N steps earlier (N = 30 in simulation, 10 on the physical robot)

For the goal-oriented missions, negative reinforcement occurs when a collision is imminent, and when the robot is pointing away from the goal (or path) and no obstacles are blocking its way to the goal.

Experimentation showed the learning approach successful for these tasks.

Summary of Neural Network Learning

- Neural networks, one computational substrate for robot learning (including reinforcement-based learning), use specialized, multi-node architectures.
- Learning in neural nets occurs through the adjustment of synaptic weights by an update procedure such as:
  - Hebb's rule
  - back-propagation (error minimization)
- Classical conditioning, in which a conditioned stimulus is eventually associated with an unconditioned response, can be manifested in robotic systems.