Exploiting Similarity to Optimize Recommendations from User Feedback

Similar documents

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle, Ryan Adams. NIPS 2012

arxiv: v1 [stat.ap] 2 Feb 2016

Reinforcement Learning : Theory and Practice - Programming Assignment 1

EXPLORATION FLOW 4/18/10

Challenges in Developing Learning Algorithms to Personalize mhealth Treatments

BayesOpt: Extensions and applications

Two-sided Bandits and the Dating Market

A Decision-Theoretic Approach to Evaluating Posterior Probabilities of Mental Models

Inferring Cognitive Models from Data using Approximate Bayesian Computation

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

K Armed Bandit. Goal. Introduction. Methodology. Approach. and optimal action rest of the time. Fig 1 shows the pseudo code

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Data Mining in Bioinformatics Day 4: Text Mining

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections

Ch.20 Dynamic Cue Combination in Distributional Population Code Networks. Ka Yeon Kim Biopsychology

CS 4365: Artificial Intelligence Recap. Vibhav Gogate

Bayesian methods in system identification: equivalences, differences, and misunderstandings

Data Analysis Using Regression and Multilevel/Hierarchical Models

ST440/550: Applied Bayesian Statistics. (10) Frequentist Properties of Bayesian Methods

Using Heuristic Models to Understand Human and Optimal Decision-Making on Bandit Problems

Human and Optimal Exploration and Exploitation in Bandit Problems

Bayesian integration in sensorimotor learning

A Vision-based Affective Computing System. Jieyu Zhao Ningbo University, China

Introduction to Computational Neuroscience

Introduction to Computational Neuroscience

Shu Kong. Department of Computer Science, UC Irvine

Reach and grasp by people with tetraplegia using a neurally controlled robotic arm

MS&E 226: Small Data

SLA learning from past failures, a Multi-Armed Bandit approach

Bayesian Tolerance Intervals for Sparse Data Margin Assessment

For general queries, contact

BayesRandomForest: An R

Using Bayesian Networks to Analyze Expression Data Λ

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Risk Mediation in Association Rules:

Data Availability and Function Extrapolation

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

Bayesian Nonparametric Methods for Precision Medicine

RoBO: A Flexible and Robust Bayesian Optimization Framework in Python

Bayes Linear Statistics. Theory and Methods

Emotional Evaluation of Bandit Problems

Individual Differences in Attention During Category Learning

Bayesian Models for Combining Data Across Domains and Domain Types in Predictive fmri Data Analysis (Thesis Proposal)

Automatic Medical Coding of Patient Records via Weighted Ridge Regression

Between-word regressions as part of rational reading

Reward-Modulated Hebbian Learning of Decision Making

PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science. Homework 5

The weak side of informal social control Paper prepared for Conference Game Theory and Society. ETH Zürich, July 27-30, 2011

Statistical Audit. Summary. Conceptual and. framework. MICHAELA SAISANA and ANDREA SALTELLI European Commission Joint Research Centre (Ispra, Italy)

Toward Comparison-based Adaptive Operator Selection

Sensory Cue Integration

Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task

Bayesian (Belief) Network Models,

Bayesian Joint Modelling of Benefit and Risk in Drug Development

Introduction to Survival Analysis Procedures (Chapter)

Exploring Experiential Learning: Simulations and Experiential Exercises, Volume 5, 1978 THE USE OF PROGRAM BAYAUD IN THE TEACHING OF AUDIT SAMPLING

10-1 MMSE Estimation S. Lall, Stanford

Modeling Nonresponse Bias Likelihood and Response Propensity

Neurons and neural networks II. Hopfield network

Artificial Intelligence Programming Probability

Gender Based Emotion Recognition using Speech Signals: A Review

Is Motion Planning Overrated? Jeannette Bohg - Interactive Perception and Robot Learning Lab - Stanford

Learning from data when all models are wrong

Sawtooth Software. A Parameter Recovery Experiment for Two Methods of MaxDiff with Many Items RESEARCH PAPER SERIES

S Imputation of Categorical Missing Data: A comparison of Multivariate Normal and. Multinomial Methods. Holmes Finch.

A Bayesian Approach to Tackling Hard Computational Challenges

Georgetown University ECON-616, Fall Macroeconometrics. URL: Office Hours: by appointment

Shu Kong. Department of Computer Science, UC Irvine

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

Blood Glucose Monitoring System. Copyright 2016 Ascensia Diabetes Care Holdings AG diabetes.ascensia.com

Using historical data for Bayesian sample size determination

Policy Gradients. CS : Deep Reinforcement Learning Sergey Levine

Statistical Analysis Using Machine Learning Approach for Multiple Imputation of Missing Data

Rational Learning and Information Sampling: On the Naivety Assumption in Sampling Explanations of Judgment Biases

A Hierarchical Adaptive Approach to the Optimal Design of Experiments

Remarks on Bayesian Control Charts

Feedback-Controlled Parallel Point Process Filter for Estimation of Goal-Directed Movements From Neural Signals

Nonlinear, Nongaussian Ensemble Data Assimilation with Rank Regression and a Rank Histogram Filter

The Simulacrum. What is it, how is it created, how does it work? Michael Eden on behalf of Sally Vernon & Cong Chen NAACCR 21 st June 2017

The Outlier Approach How To Triumph In Your Career As A Nonconformist

Sample size calculation a quick guide. Ronán Conroy

Information-theoretic stimulus design for neurophysiology & psychophysics

Hybrid HMM and HCRF model for sequence classification

Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data

Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting

You must answer question 1.

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s

Introduction. Chapter 1

Institutional Ranking. VHA Study

Introduction to Bayesian Analysis 1

Using AUC and Accuracy in Evaluating Learning Algorithms

Empirical game theory of pedestrian interaction for autonomous vehicles

Analysis of acgh data: statistical models and computational challenges

Dopamine enables dynamic regulation of exploration

Outline. What s inside this paper? My expectation. Software Defect Prediction. Traditional Method. What s inside this paper?

Evidence-Based Filters for Signal Detection: Application to Evoked Brain Responses


Exploiting Similarity to Optimize Recommendations from User Feedback. Hasta Vanchinathan, Andreas Krause (Learning and Adaptive Systems Group, D-INF, ETHZ). Collaborators: Isidor Nikolic (Microsoft, Zurich), Fabio De Bona (Google, Zurich)

A Recommendation Example

Many real world instances. Disclaimer: All trademarks belong to their respective owners.

Common Thread: To do well, we need a model. Popular techniques include content-based filtering, collaborative filtering, and hybrid recommender systems. All aim to predict reward given a fixed data set.
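The techniques above all reduce to predicting a reward from previously observed feedback via similarity. A minimal item-based collaborative-filtering sketch (toy ratings and hypothetical helper names, purely illustrative):

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = unrated.
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 5.0, 4.0],
])

def cosine_sim(a, b):
    """Cosine similarity between two item column vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict(user, item):
    """Predict R[user, item] as a similarity-weighted average of the
    user's ratings on the other items (item-based CF)."""
    num, den = 0.0, 0.0
    for j in range(R.shape[1]):
        if j == item or R[user, j] == 0:
            continue
        s = cosine_sim(R[:, item], R[:, j])
        num += s * R[user, j]
        den += abs(s)
    return num / den if den > 0 else 0.0

print(round(predict(0, 2), 2))
```

Note that this predicts from a fixed data set only; the talk's point is that it does not decide what feedback to gather next.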

Challenges: Many, and dynamic! Preferences change. Estimating all combinations is both hard and wasteful; we only need to identify the high-reward items!

Multi-Armed Bandits: Early approaches require k << T. Strong guarantees are available for a finite set of actions: Gittins indices, ε-greedy, UCB1 (Auer et al. 2002). As the number of arms increases, performance degrades. For dynamic, web-scale recommendations, k >> T.
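For the finite-arm setting, UCB1 is easy to sketch. A minimal implementation, with made-up Bernoulli arms for illustration:

```python
import math
import random

def ucb1(pull, k, T):
    """UCB1 (Auer et al. 2002): play each arm once, then pick the arm
    maximizing empirical mean + sqrt(2 ln t / n_i)."""
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1  # initialization: one pull per arm
        else:
            arm = max(range(k), key=lambda i:
                      sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts

random.seed(0)
means = [0.3, 0.7]  # hypothetical Bernoulli arms
counts = ucb1(lambda i: float(random.random() < means[i]), k=2, T=2000)
print(counts)  # the 0.7 arm should receive most pulls
```

Every arm must be tried at least once, which is exactly why these methods break down when k >> T.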

Learning meets bandits: Exploit similarity information to predict rewards for new items. Must make assumptions on the reward function, e.g.: linear (LinUCB, Li et al. 2010), Lipschitz (Bubeck et al. 2008), or low RKHS norm (GP-UCB, Srinivas et al. 2012). This is the approach we pursue in this work! [Figure: a reward function f(x) over choices x.]

Problem Setup: = user attributes (the remaining definitions were shown as equations on the slides). We want to maximize the cumulative reward. Equivalently, minimize the regret.
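The maximize/minimize objectives appeared only as slide images; in standard bandit notation (the exact symbols here are assumptions, since the slide formulas did not survive extraction) they would read:

```latex
\max_{x_1,\dots,x_T} \sum_{t=1}^{T} f(x_t)
\quad\Longleftrightarrow\quad
\min \; R_T = \sum_{t=1}^{T}\bigl(f(x^\star) - f(x_t)\bigr),
\qquad x^\star = \arg\max_{x} f(x)
```

Here f(x_t) is the reward of the item chosen in round t, and R_T is the cumulative regret against always choosing the best item x*.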

Our Approach: We propose CGPRank, which uses a Bayesian model for the rewards. CGPRank efficiently shares reward feedback across items, users, and positions.

Demux'ing Feedback: We still need to predict the reward of each displayed item. Assume: items do not influence the reward of other items. Observed feedback then factors into item relevance and position CTR.

CGPRank: Sharing across positions. Position weights are independent of the items and are estimated from logs. On the slide's example (true CTR / CGPRank estimate / position weight, per position):

position 1: 0.30 / 0.30 (observed) / 1.00
position 2: 0.17 / 0.19 / 0.80
position 3: 0.16 / 0.13 / 0.65
position 4: 0.08 / 0.08 (observed) / 0.47
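Under this factorization (CTR = relevance × position weight), an observation at one position transfers to another. A minimal single-observation sketch (hypothetical helper name; CGPRank itself pools all observations rather than just one):

```python
# Position weights (from the slide): estimated from logs, independent of items.
weights = [1.0, 0.8, 0.65, 0.47]

def transfer_ctr(ctr_observed, pos_observed, pos_target):
    """Assuming CTR = relevance * position_weight, recover the item's
    relevance from one observation and re-weight it for a new position."""
    relevance = ctr_observed / weights[pos_observed]
    return relevance * weights[pos_target]

# An item observed with CTR 0.3 at the top position, estimated at position 2:
print(round(transfer_ctr(0.3, 0, 1), 3))  # 0.3 / 1.0 * 0.8 = 0.24
```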

CGPRank: Sharing across items/users

Sharing across items/users with Gaussian processes. GPs are Bayesian models for functions: start with a prior P(f) over reward functions f(x) (some functions likely, some unlikely), combine it with the likelihood P(data | f) of the observed (choice, reward) pairs, and obtain the posterior P(f | data) by conditioning. Closed-form Bayesian posterior inference is possible, and the posterior represents the uncertainty in each prediction. [Figure: prior samples of f(x) over choices x; observations marked '+'; posterior mean with confidence band.]
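The closed-form posterior mentioned above is standard GP regression. A minimal NumPy sketch with made-up 1-D observations (the kernel and data are illustrative, not the paper's):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential covariance K(x, x') = exp(-|x - x'|^2 / (2 ell^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

# Observed (choice, reward) pairs -- toy data.
X = np.array([-2.0, 0.0, 1.5])
y = np.array([0.2, 0.9, 0.4])
noise = 1e-2

# Closed-form GP posterior at test points Xs:
Xs = np.linspace(-3, 3, 7)
K = rbf(X, X) + noise * np.eye(len(X))
Ks = rbf(Xs, X)
Kss = rbf(Xs, Xs)
mu = Ks @ np.linalg.solve(K, y)                    # posterior mean
cov = Kss - Ks @ np.linalg.solve(K, Ks.T)          # posterior covariance
sigma = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # predictive std dev
print(np.round(mu, 2))
```

The predictive std dev sigma is small near observations and large far from them, which is exactly the uncertainty the explore-exploit rule will use.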

Predictive confidence in GPs: typically we only care about the marginals P(f(x*)) at a candidate choice x*. The GP is parameterized by its covariance function K(x, x') = Cov(f(x), f(x')). Many recommendation tasks can be captured with an appropriate covariance function.

Intuition: Explore-Exploit using GPs. Selection rule:
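The selection rule itself was a slide image; in the GP-UCB style this deck builds on (assumed notation), it picks

```latex
x_t \;=\; \arg\max_{x}\; \mu_{t-1}(x) \;+\; \beta_t^{1/2}\,\sigma_{t-1}(x)
```

where μ and σ are the GP posterior mean and standard deviation and β_t trades off exploitation (high mean) against exploration (high uncertainty).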

CGPRank Selection Rule: At t = 0, with no prior observations.

At t = 0, with some prior observations.

Uncertainty shrinks not just at the observation, but also at other locations, based on similarity!

If the list size is 2, the first item is selected according to the selection rule.

Secret sauce? A time-varying tradeoff parameter.

Hallucinate the mean and shrink the uncertainties; then update the model and pick the next item using the same rule.
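The hallucination trick can be sketched as follows: pick the UCB maximizer, feed the posterior mean back as a fake observation (which shrinks the variance near similar items), and pick again. A simplified 1-D illustration, not the paper's implementation:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_post(X, y, Xs, noise=1e-2):
    """GP posterior mean and std dev at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 0.0, None))

# Candidate items (1-D features for illustration) and one past observation.
items = np.linspace(-3, 3, 13)
X_obs, y_obs = np.array([0.0]), np.array([1.0])

beta = 2.0          # exploration tradeoff (time-varying in CGPRank)
chosen = []
for _ in range(2):  # build a list of size 2
    mu, sigma = gp_post(X_obs, y_obs, items)
    ucb = mu + beta * sigma
    ucb[np.isin(items, chosen)] = -np.inf   # don't pick the same item twice
    best = items[int(np.argmax(ucb))]
    chosen.append(best)
    # Hallucinate: append the posterior mean at `best` as a fake observation,
    # shrinking uncertainty around it (and around similar items) before the
    # next pick -- no real feedback is consumed.
    mu_best, _ = gp_post(X_obs, y_obs, np.array([best]))
    X_obs = np.append(X_obs, best)
    y_obs = np.append(y_obs, mu_best[0])

print(chosen)
```

Because only the variance shrinks (the hallucinated mean adds no information), the second pick is pushed toward items dissimilar to the first, diversifying the list.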

CGPRank [algorithm animation]

Theorem 1 (CGPRank guarantees): With an appropriate choice of the tradeoff parameter, running CGPRank for T rounds incurs regret sublinear in T. Specifically, the bound grows strongly sublinearly for typical kernels.
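The bound itself was a slide image. A GP-UCB-style bound of the general form such results take (per Srinivas et al.; this is an assumption about the exact statement, not a quote of it) is

```latex
R_T \;=\; \mathcal{O}\!\left(\sqrt{T\,\beta_T\,\gamma_T}\right)
```

where γ_T is the maximum information gain of the kernel after T rounds; for typical kernels (e.g., squared-exponential) γ_T grows only polylogarithmically in T, so R_T / T → 0.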

Experiments - Datasets. Google book store logs: 42 days of user logs; given a key book, suggest a list of related books; kernel computed from the "related" graph on books. Yahoo! Webscope R6B: 10 days of user logs on the Yahoo! front page; an unbiased method to test bandit algorithms; 45 million user interactions with 271 articles; feedback is available for a single selection, so we simulated list selection.

Experiments - Questions: How much does principled sharing of feedback help, across items/contexts and across positions? Can CGPRank outperform an existing, tuned recommendation system?

Sharing across items

Sharing across contexts

Effect of increasing list size

Boost over the existing approach [figure: CGPRank vs. the existing algorithm]

Conclusions: CGPRank is an efficient algorithm with strong theoretical guarantees. It can generalize from sparse feedback across items, contexts, and positions. Experiments suggest statistical and computational efficiency.