Exploiting Similarity to Optimize Recommendations from User Feedback
Hasta Vanchinathan, Andreas Krause (Learning and Adaptive Systems Group, D-INF, ETHZ)
Collaborators: Isidor Nikolic (Microsoft, Zurich), Fabio De Bona (Google, Zurich)
A Recommendation Example
Many real world instances (Disclaimer: all trademarks belong to their respective owners)
Common Thread
To do well, we need a model. Popular techniques include:
Content-based filtering
Collaborative filtering
Hybrid recommender systems
All aim to predict reward given a fixed data set.
Challenges
Items are many and dynamic, and preferences change. Estimating all combinations is both hard and wasteful! We only need to identify high-reward items.
Multi-Armed Bandits
We can get strong guarantees for a finite set of k actions: Gittins indices, ε-greedy, UCB1 (Auer et al. '02). But early approaches require k << T, and performance degrades as the number of arms increases. For dynamic, web-scale recommendations, k >> T.
Learning meets bandits
Exploit similarity information to predict rewards for new items. This requires assumptions on the reward function f(x), e.g.:
Linear (LinUCB, Li et al. '10)
Lipschitz (Bubeck et al. '08)
Low RKHS norm (GP-UCB, Srinivas et al. '12)
This is the approach we pursue in this work!
Problem Setup
In each round, we observe a context (user attributes) and recommend items; the user's feedback (e.g., clicks) is the reward. We want to maximize the cumulative reward over T rounds. Equivalently, we minimize the regret: the cumulative gap to the best recommendations in hindsight.
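The maximize/minimize objectives on this slide did not survive extraction; a standard bandit formulation consistent with the rest of the talk (our notation, not necessarily the slide's exact symbols) is:

```latex
% Cumulative reward over T rounds, with f the unknown reward function
% and x_t the item recommended in round t:
%   maximize  \sum_{t=1}^{T} f(x_t)
% Equivalently, minimize the regret relative to the best choice in hindsight:
R_T \;=\; \sum_{t=1}^{T} \bigl( f(x^\star) - f(x_t) \bigr),
\qquad x^\star = \arg\max_{x} f(x).
```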
Our Approach
We propose CGPRank, which uses a Bayesian model for the rewards. CGPRank efficiently shares reward feedback across:
Items
Users
Positions
Demuxing Feedback
We still need to predict per-item rewards from list-level feedback. Assume: items do not influence the reward of other items. Then the observed click probability factors into the item's relevance and the position's CTR.
CGPRank: Sharing across positions
Position weights are independent of items and are estimated from logs. Example with position weights (1, 0.8, 0.65, 0.47) for positions 1-4 and observed CTRs (0.3, 0.17, 0.16, 0.08): an item observed with CTR 0.17 at position 2 is predicted to get roughly 0.17 / 0.8 × 0.65 ≈ 0.13 at position 3, and an item with CTR 0.16 at position 3 is predicted to get roughly 0.16 / 0.65 × 0.8 ≈ 0.19 at position 2.
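The position-weight idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function names are ours, and the weights are the example numbers from the slide.

```python
# Demultiplexing list feedback with position weights: a CTR observed at one
# position is converted into item relevance, which then predicts the CTR the
# same item would get at any other position.

position_weight = [1.0, 0.8, 0.65, 0.47]  # example weights, estimated from logs

def relevance(observed_ctr, position):
    """Infer item relevance from a CTR observed at a given position (0-indexed)."""
    return observed_ctr / position_weight[position]

def predicted_ctr(observed_ctr, old_pos, new_pos):
    """Predict an item's CTR if it were shown at a new position."""
    return relevance(observed_ctr, old_pos) * position_weight[new_pos]

# Item observed with CTR 0.17 at position 2 (index 1), moved to position 3 (index 2):
print(round(predicted_ctr(0.17, 1, 2), 2))
```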
CGPRank: Sharing across items/users
Sharing across items/users with Gaussian processes
Gaussian processes are Bayesian models for functions: a prior P(f) makes some reward functions likely and others unlikely; observed (choice, reward) pairs enter through the likelihood P(data | f); Bayes' rule yields the posterior P(f | data). Closed-form Bayesian posterior inference is possible, and the posterior represents the remaining uncertainty in each prediction.
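The closed-form posterior inference mentioned above can be sketched with numpy. This is a textbook GP regression snippet under an assumed squared-exponential kernel, not code from the talk; all names are ours.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential covariance K(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2 * lengthscale ** 2))

def gp_posterior(X_obs, y_obs, X_star, noise=0.1, lengthscale=1.0):
    """Closed-form GP posterior mean and variance at test points X_star."""
    K = rbf_kernel(X_obs, X_obs, lengthscale) + noise ** 2 * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_star, lengthscale)
    K_ss = rbf_kernel(X_star, X_star, lengthscale)
    alpha = np.linalg.solve(K, y_obs)       # weights for the posterior mean
    mu = K_s.T @ alpha
    v = np.linalg.solve(K, K_s)
    var = np.diag(K_ss - K_s.T @ v)         # marginal posterior variances
    return mu, var
```

Note how the variance shrinks near observed points and stays near the prior variance far from them: that is exactly the "sharing via similarity" the slide describes.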
Predictive confidence in GPs
Typically, we only care about the marginals P(f(x*)) at candidate points x*. The GP is parameterized by a covariance (kernel) function K(x, x') = Cov(f(x), f(x')). With an appropriate covariance function, this captures many recommendation tasks.
Intuition: Explore-Exploit using GPs
Selection rule (GP-UCB style): pick the item x maximizing μ(x) + β^{1/2} σ(x), the posterior mean plus a scaled posterior standard deviation.
CGPRank Selection Rule
At t = 0, with no prior observations, the posterior equals the prior; with some prior observations, uncertainty shrinks not just at the observed points but also at other locations, based on similarity. Suppose the list size is 2. The first item is selected according to the UCB rule. The secret sauce: a time-varying tradeoff parameter β_t. We then hallucinate the posterior mean as the observed reward for the selected item and shrink the uncertainties accordingly. Now we update the model and again pick the next item using the same rule.
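The select-hallucinate-update loop described above can be sketched as follows. This is our reconstruction of the idea, not the paper's code: β is passed in as a constant (CGPRank uses a time-varying schedule), and the helper names are ours.

```python
import numpy as np

def select_list(mu, cov, beta, list_size, noise=0.1):
    """Greedy list selection with hallucinated feedback.

    mu:   posterior mean over the n candidate items
    cov:  n x n posterior covariance over items (from the GP)
    beta: exploration-exploitation tradeoff (time-varying in CGPRank)
    """
    mu, cov = mu.copy(), cov.copy()
    chosen = []
    for _ in range(list_size):
        ucb = mu + np.sqrt(beta * np.clip(np.diag(cov), 0, None))
        ucb[chosen] = -np.inf            # no repeats within one list
        i = int(np.argmax(ucb))
        chosen.append(i)
        # Hallucinate the current mean as the observation at item i: the
        # posterior mean is unchanged (zero residual), but a rank-one update
        # shrinks the covariance at i and, via similarity, at correlated items.
        s = cov[:, i] / (cov[i, i] + noise ** 2)
        cov = cov - np.outer(s, cov[i, :])
    return chosen
```

Because hallucination shrinks uncertainty at items similar to those already picked, the list naturally diversifies: a near-duplicate of the first pick loses its exploration bonus before the second pick is made.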
CGPRank
Theorem 1 (CGPRank guarantees)
With an appropriate choice of β_t, running CGPRank for T rounds incurs regret sublinear in T. The bound grows strongly sublinearly for typical kernels.
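The exact bound on this slide is not legible in this extraction. For reference, GP-UCB-style analyses (Srinivas et al. '12), which this work builds on, give bounds of the following form, where γ_T is the maximum information gain of the kernel; this is our reconstruction, not the slide's exact statement:

```latex
R_T \;=\; \mathcal{O}\!\left( \sqrt{\, T \,\beta_T\, \gamma_T \,} \right),
\qquad
\gamma_T \;=\; \max_{A \subseteq \mathcal{X},\, |A| \le T} I\!\left(f;\, y_A\right).
```

For typical kernels (e.g., the squared-exponential), γ_T grows only polylogarithmically in T, making R_T strongly sublinear.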
Experiments - Datasets
Google books store logs: 42 days of user logs; given a key book, suggest a list of related books; kernel computed from the 'related' graph on books.
Yahoo! Webscope R6B: 10 days of user logs on the Yahoo! front page; an unbiased method to test bandit algorithms; 45 million user interactions with 271 articles. Feedback is available for a single selection, so we simulated list selection.
Experiments - Questions
How much does principled sharing of feedback help, across items/contexts and across positions? Can CGPRank outperform an existing, tuned recommendation system?
Sharing across items
Sharing across contexts
Effect of increasing list size
Boost over existing approach
(Results shown as plots in the original slides.)
Conclusions
CGPRank: an efficient algorithm with strong theoretical guarantees. It can generalize from sparse feedback across items, contexts, and positions. Experiments suggest both statistical and computational efficiency.