Sample Efficient Policy Search. Emma Brunskill. 15-889e Fall 2015 1

Sample Efficient RL Objectives: Probably Approximately Correct (PAC), minimizing regret, Bayes-optimal RL. Focus today: empirical performance. 2

Last Time: Policy Search Using Gradients. Gradient approaches are only guaranteed to find a local optimum. Finite-difference methods scale with the number of parameters needed to represent the policy, but don't require a differentiable policy. Likelihood ratio gradient approaches require us to be able to compute the gradient analytically, have cost independent of the number of parameters, don't need to know the dynamics model, and benefit from using a value function/baseline to reduce variance. 3
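For reference, the likelihood-ratio gradient with a baseline takes the standard form below (this is the textbook policy-gradient identity, not an equation copied from the slide):

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\big(R(\tau) - b(s_t)\big) \right]
```

Because the expectation is over trajectories sampled from the current policy, it can be estimated without knowing the dynamics model, and subtracting the baseline b reduces variance without changing the expectation.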

Question from Last Class Figure from David Silver 4

How to Choose a Compatible Value Function Approximator? Ex. policy parameterization: softmax (Gibbs) policy π_ϴ(a|s) ∝ exp(ϴ^T φ(s,a)). Compatibility requires that ∇_w Q_w(s,a) = ∇_ϴ log π_ϴ(a|s). Therefore a reasonable choice for the value function is Q_w(s,a) = w^T ∇_ϴ log π_ϴ(a|s) = w^T (φ(s,a) − Σ_b π_ϴ(b|s) φ(s,b)). Example from Sutton, McAllester, Singh & Mansour NIPS 2000 5

Today: Sample Efficient Policy Search. Powerful function approximators to represent policy value; it may not be easy to take derivatives. Like last time, we may benefit from exploiting structure (e.g., not treating this as completely black-box optimization). Figure from David Silver 6

Recall: Gaussian Process to Represent MDP Dynamics/Reward Models. Figure adjusted from Wilson et al. JMLR 2014 7

Today: Gaussian Process to Represent Value of Parameterized Policy (figure: V(ϴ) as a function of ϴ). Figure adjusted from Wilson et al. JMLR 2014 8

Now Generalizing Policy Value (Rather than Model Dynamics/Rewards) (figure: V(ϴ) as a function of ϴ). Figure adjusted from Wilson et al. JMLR 2014 9

Why Use GPs? Used frequently in Bayesian optimization. Over the last ~5 years Bayesian optimization has become very influential & useful. Brief (relevant) digression: two big motivations for Bayesian optimization. 10

Motivation 1: ML Parameter Tuning. ML methods are getting more powerful and complex, with sophisticated fitting methods that often involve tuning many parameters, which makes them hard for non-experts to use. Performance is often substantially impacted by the choice of parameters. Material from Ryan Adams 11

Deep Learning Big investments by Google, Facebook, Microsoft, etc. Many choices: number of layers, weight regularization, layer size, which nonlinearity, batch size, learning rate schedule, stopping conditions Material from Ryan Adams 12

Classification of DNA Sequences. Predict which DNA sequences will bind with which proteins (Miller et al. 2012). Choices: margin parameter, entropy parameter, convergence criterion. Material from Ryan Adams 13

Motivation 2: A Growing Number of Settings That Blur Experimentation & Optimization 14

Website Design Ex. What Photo to Put Here? Material from Peter Frazier 15

6 Photos on Website: 51*50*49*48*47*46= 12,966,811,200 Options! Material from Peter Frazier 16

Design Choices Matter Material from Peter Frazier 17

Standard A/B Testing or Experimentation Doesn t Scale Material from Peter Frazier 18

Standard A/B Testing or Experimentation Doesn t Scale Material from Peter Frazier 19

Standard A/B Testing or Experimentation Doesn t Scale Material from Peter Frazier 20

Don t Want to Have Worse Revenue on 1.25 million Users Material from Peter Frazier 21

Why Use GPs? Used frequently in Bayesian optimization. Over the last ~5 years Bayesian optimization has become very influential & useful. Brief (relevant) digression: two big motivations for Bayesian optimization are ML algorithm parameter tuning, and large online experimental settings where we care about performance (e.g. revenue) while testing. 22

Bayesian Optimization. Build a probabilistic model for the objective; include hierarchical structure about units, etc. Compute the posterior predictive distribution, integrating out all the possible true functions; Gaussian process regression is popular. Optimize a cheap proxy function instead: the model is much cheaper than the true objective. Two key ideas: use the model to guide how to search the space (the model is an approximation, but when sampling a point in the real world is more costly than computation, this is very useful), and use the proxy (acquisition) function to guide how to balance exploration and exploitation. 23
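To make the "probabilistic model + posterior predictive" idea concrete, here is a minimal sketch (mine, not from the lecture) of zero-mean Gaussian process regression with a squared-exponential kernel; all function names and default values are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between rows of A (n, d) and B (m, d)."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-4):
    """Posterior predictive mean and std of a zero-mean GP at the test inputs."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    mu = K_s.T @ alpha                                          # predictive mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)                  # predictive variance
    return mu, np.sqrt(np.maximum(var, 0.0))
```

In the policy-search setting discussed below, each training input would be a policy parameter vector ϴ and each target the (noisy) return observed when that policy was run.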

Historical Background of Bayesian Optimization. Closely related to statistical ideas of optimal design of experiments, dating back to Kirstine Smith in 1918. As response surface methods, it dates back to Box and Wilson in 1951. As Bayesian optimization, it was studied first by Kushner in 1964 and then Mockus in 1978. Methodologically, it touches on several important machine learning areas: active learning, contextual bandits, Bayesian nonparametrics. It started receiving serious attention in ML in 2007; interest exploded when it was realized that Bayesian optimization provides an excellent tool for finding good ML hyperparameters. Material from Ryan Adams 24

Bayesian Optimization for Policy Search (figure: GP over V(ϴ) as a function of ϴ). Material from Ryan Adams 25

Bayesian Optimization for Policy Search. Dark points are the policies we've already tried in the world, the blue line is the estimated mean of the function (the value V(ϴ) of each policy), and grey areas represent uncertainty over the function value. Figure modified from Ryan Adams 26

Exercise: Where Would You Sample Next (What Policy Would You Evaluate) & Why? What Algorithm Would You Use for Sampling? Same figure: dark points are policies already tried, the blue line is the estimated mean of V(ϴ), grey areas show uncertainty. Figure modified from Ryan Adams 27

Probability of Improvement. Exercise: Where Would You Sample Next (What Policy Would You Evaluate) & Why? What Algorithm Would You Use for Sampling? Figure modified from Ryan Adams 28

Expected Improvement. Exercise: Where Would You Sample Next (What Policy Would You Evaluate) & Why? What Algorithm Would You Use for Sampling? Figure modified from Ryan Adams 29

Upper/Lower Confidence Bound. Exercise: Where Would You Sample Next (What Policy Would You Evaluate) & Why? What Algorithm Would You Use for Sampling? Figure modified from Ryan Adams 30

Probability of Improvement. Utility function relative to f′, the best value observed so far: u(ϴ) = 1 if V(ϴ) > f′, else 0. Acquisition function: a_PI(ϴ) = Φ((μ(ϴ) − f′) / σ(ϴ)), where μ and σ are the GP posterior mean and standard deviation and Φ is the standard normal CDF. Figure modified from Ryan Adams; equations from http://www.cse.wustl.edu/~garnett/cse515t/files/lecture_notes/12.pdf 31

Expected Improvement. Utility function relative to f′, the best value observed so far: u(ϴ) = max(0, V(ϴ) − f′). Acquisition function: a_EI(ϴ) = (μ(ϴ) − f′) Φ(z) + σ(ϴ) φ(z), with z = (μ(ϴ) − f′) / σ(ϴ). Figure modified from Ryan Adams; equations from http://www.cse.wustl.edu/~garnett/cse515t/files/lecture_notes/12.pdf 32

Upper/Lower Confidence Bound. Acquisition function: a_UCB(ϴ) = μ(ϴ) + β σ(ϴ) (or μ(ϴ) − β σ(ϴ) for a lower bound when minimizing), where β controls how much exploration is rewarded. Figure modified from Ryan Adams; equations from http://www.cse.wustl.edu/~garnett/cse515t/files/lecture_notes/12.pdf 33

Acquisition Function Probability of improvement Expected Improvement Upper confidence bound Other ideas? What are the limitations of these? 34
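The three acquisition functions above are simple formulas over the GP posterior mean μ(ϴ) and standard deviation σ(ϴ); below is a minimal sketch of the standard maximization forms (function names are mine), with f_best denoting the best value observed so far:

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best):
    """PI: posterior probability that a candidate beats the best value seen so far."""
    z = (mu - f_best) / np.maximum(sigma, 1e-12)
    return norm.cdf(z)

def expected_improvement(mu, sigma, f_best):
    """EI: expected amount by which a candidate improves on the best value seen so far."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

def upper_confidence_bound(mu, sigma, beta=2.0):
    """UCB: optimistic score; larger beta rewards exploring uncertain regions more."""
    return mu + beta * sigma
```

One limitation these share: each candidate is scored in isolation, one step ahead (they are myopic), which is one answer to the question on the slide.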

Policy Search as Black-Box Bayesian Optimization. The loop: Bayesian optimization with a Gaussian process proposes parameters ϴi; the policy π = f(ϴi) is executed in the environment for T steps to get total reward R; (ϴi, R) becomes a new training point for the GP. 35
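The loop just described can be written down almost verbatim; a minimal sketch that reuses the gp_posterior and expected_improvement helpers sketched earlier and assumes a hypothetical run_policy(theta, T) that executes the parameterized policy in the environment and returns the total reward R:

```python
import numpy as np

def bayes_opt_policy_search(run_policy, theta_candidates, n_iters=50, T=200):
    """Black-box Bayesian optimization over policy parameters.
    theta_candidates: (m, d) array of candidate policy parameter vectors."""
    rng = np.random.default_rng()
    thetas = [theta_candidates[rng.integers(len(theta_candidates))]]  # seed with one random policy
    returns = [run_policy(thetas[0], T)]
    for _ in range(n_iters):
        X, y = np.array(thetas), np.array(returns)
        mu, sigma = gp_posterior(X, y, theta_candidates)    # GP posterior over V(theta)
        acq = expected_improvement(mu, sigma, y.max())      # acquisition function
        theta_next = theta_candidates[int(np.argmax(acq))]  # propose the next policy to try
        R = run_policy(theta_next, T)                       # execute for T steps, get total reward
        thetas.append(theta_next)                           # new training point (theta, R)
        returns.append(R)
    return thetas[int(np.argmax(returns))]
```

For simplicity this searches over a fixed grid of candidate parameter vectors; in practice the acquisition function is usually optimized over the continuous parameter space instead.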

Policy Search as Bayesian Optimization. Figure modified from Calandra, Seyfarth, Peters & Deisenroth 2015 36

Gait Optimization. GP policy search reduced the samples needed to find a fast walk by about 3x. Lizotte et al. IJCAI 2007 37

More Gait Optimization. Gait parameters: 4 threshold values of the FSM (two for each leg) & 4 control signals applied during extension and flexion (separately for knees and hips). Figures modified from Calandra, Seyfarth, Peters & Deisenroth 2015 38

videos 39

Why is this Suboptimal? Same loop: Bayesian optimization with a Gaussian process proposes ϴi; execute the policy π = f(ϴi) in the environment for T steps, get total reward R; create a new training point (ϴi, R). 40

Throwing Away All Information But Policy Parameter & Total Reward from Trajectory. Also Ignores Structure of Relationship Between Policies Figure from David Silver 41

Covariance function: Key choice for GP When should 2 policies be considered close? Figures from Ryan Adams 42
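As a small illustration of why this choice matters, the same pair of policy parameter vectors can look "close" or "unrelated" depending on the kernel and its lengthscale; the values below are made up, and the snippet reuses the rbf_kernel helper sketched earlier:

```python
import numpy as np

theta_a = np.array([[0.5, 1.0, -0.3]])   # two illustrative policy parameter vectors
theta_b = np.array([[0.7, 0.9, -0.1]])

for lengthscale in (0.1, 1.0, 10.0):
    k = rbf_kernel(theta_a, theta_b, lengthscale=lengthscale)[0, 0]
    # A short lengthscale treats these policies as nearly unrelated; a long one as nearly identical.
    print(f"lengthscale={lengthscale}: similarity={k:.3f}")
```

A parameter-space kernel like this also ignores the fact that two very different parameter vectors can induce nearly the same behavior, which motivates the behavior-based kernel on the next slides.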

Behavior Based Kernel (Wilson et al. JMLR 2014): roughly, k(ϴi, ϴj) = exp(−α D(ϴi, ϴj)), where D is a KL divergence between the probability of trajectories under policy ϴi and under policy ϴj. 43

Behavior Based Kernel (Wilson et al. JMLR 2014): k(ϴi, ϴj) = exp(−α D(ϴi, ϴj)), with D a KL divergence between the trajectory distributions p(τ | ϴi) and p(τ | ϴj). 44

Behavior Based Kernel (BBK) (Wilson et al. JMLR 2014): k(ϴi, ϴj) = exp(−α D(ϴi, ϴj)), with D a KL divergence between trajectory distributions p(τ | ϴi) and p(τ | ϴj). Do we need to know the dynamics model? 45
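A minimal sketch of the behavior-based idea, assuming the kernel has the form exp(-alpha * D) with D a symmetrized KL divergence estimated from sampled trajectories; sample_trajs and log_prob are hypothetical helpers, and this is not a line-by-line reproduction of Wilson et al.'s implementation:

```python
import numpy as np

def trajectory_kl_estimate(trajectories, log_prob_p, log_prob_q):
    """Monte Carlo estimate of KL(p || q) over trajectories sampled from p.
    The dynamics and initial-state terms cancel in the log ratio, so only the
    policies' action log-probabilities along each trajectory are needed."""
    diffs = [sum(log_prob_p(s, a) - log_prob_q(s, a) for (s, a) in tau)
             for tau in trajectories]
    return float(np.mean(diffs))

def behavior_based_kernel(theta_i, theta_j, sample_trajs, log_prob, alpha=1.0):
    """Similarity of two policies decays with a symmetrized KL divergence
    between the trajectory distributions they induce."""
    tau_i = sample_trajs(theta_i)   # trajectories collected under policy theta_i
    tau_j = sample_trajs(theta_j)   # trajectories collected under policy theta_j
    d_ij = trajectory_kl_estimate(tau_i,
                                  lambda s, a: log_prob(theta_i, s, a),
                                  lambda s, a: log_prob(theta_j, s, a))
    d_ji = trajectory_kl_estimate(tau_j,
                                  lambda s, a: log_prob(theta_j, s, a),
                                  lambda s, a: log_prob(theta_i, s, a))
    return float(np.exp(-alpha * (d_ij + d_ji)))
```

The cancellation in the KL estimate speaks to the question on the slide: the divergence can be estimated from sampled trajectories and policy log-probabilities alone, without knowing the dynamics model.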

Throwing Away All Information But Policy Parameter & Total Reward from Trajectory. Also Ignores Structure of Relationship Between Policies Figure from David Silver 46

Model-Based Bayesian Optimization Algorithm (MBOA) (Wilson et al. JMLR 2014). Same Bayesian optimization loop with a Gaussian process: propose ϴi, execute the policy π = f(ϴi) in the environment for T steps to get total reward R, and create a new training point (ϴi, R). The difference: use the trajectory data to build a model, and use that model to accelerate learning. 47

Figure from Wilson, Fern & Tadepalli JMLR 2014 48

Figure from Wilson, Fern & Tadepalli JMLR 2014 49

Figure from Wilson, Fern & Tadepalli JMLR 2014 50

Figure from Wilson, Fern & Tadepalli JMLR 2014 51

New Kernel vs. Model-Based Information? Using model information often greatly improves performance if it's a good model; if it's not good, the method learns to ignore it. The new kernel for relating policies (BBK) has much less of an impact. 52

Other Ways to Go Beyond Black-Box Bayesian Optimization. Current work in my group (Rika, Christoph, Dexter Lee, Joe Runde). Same Bayesian optimization loop with a Gaussian process: propose ϴi, execute π = f(ϴi) in the environment for T steps to get total reward R, create a new training point (ϴi, R). 53

Subtleties of Bayesian Optimization. Still have to choose the policy class (this determines the input space). Have to choose the kernel function. Have to choose the hyperparameters of the kernel function (can optimize these). Have to choose the acquisition function. 54

Impact of Acquisition Function & Hyperparameters. Manually fixed hyperparameters led to sub-optimal solutions for all the acquisition functions. Figures modified from Calandra, Seyfarth, Peters & Deisenroth 2015 55

Summary: Bayesian Optimization for Efficient Policy Search. Benefits: direct policy search; finds global optima; uses a sophisticated function class (a GP) to model the mapping from input policy parameters to output policy value; uses a smart (but typically myopic) acquisition function to balance exploration/exploitation in searching the policy space; can be very sample efficient. Not a silver bullet: still have to decide on the policy space; have to choose the kernel function (though the squared exponential often works well); should optimize hyperparameters. 56

Stuff to Know: Bayesian Optimization for Efficient Policy Search. Properties (finds global optima, no gradient information used). Define and know the benefits/drawbacks of different acquisition functions. Understand how to take a policy search problem and formulate it as a black-box Bayesian optimization problem. Be able to list some things you have to do in practice (optimize hyperparameters, etc.). 57