Cocktail Preference Prediction


Linus Meyer-Teruel, Michael Parrott
Department of Computer Science, Stanford University

In this paper we approach the problem of rating prediction on cocktail data from a number of perspectives. First, we consider predicting personal user preferences using a multi-feature linear model. Second, we consider generating new recipes given constraints and preferences, and then predicting a personal rating for each new cocktail mix based on its features.

Introduction

On campus, we have a well-known problem with students drinking excessive amounts of hard liquor, which we believe stems from a lack of knowledge of, and appreciation for, alcohol in moderation. In response, we wanted to create a customized Cocktail Recommendation System to engender that appreciation: it takes into account the ingredients you have available, your desired alcohol content, and your personal preferences, and recommends a cocktail recipe for you to make. Having tried the recipe, the user rates the output, and the system learns from this feedback for future predictions. We separate the problem into three parts: dataset collection, user rating prediction, and custom recommendation based on rating prediction.

Data Collection - Datasets, datasets, datasets...

The biggest issue that we faced at the beginning of our project was obtaining reliable data. Unlike movie or product recommendations, there are few cocktail recipe and rating sites, and even fewer with cohesive data that includes quantities, reliable names, and ratings. We originally wanted to approach the problem of customized prediction using ratings scraped from websites, but the reviewers we found tended to have fewer than two reviews each, typically covering only their favorite drinks, which made the data unusable. Additionally, the most reliable website that we found (1001cocktails.com) had blockers in place to prevent scraping.

We decided to break the problem into two parts: personal ratings and generalized ratings. To tackle personalized ratings, we organized several sampling sessions, each of approximately 10 people, who each sampled and rated 15 different drinks. For generalized ratings, we spent several weeks collecting datasets from several different websites, both manually and using BeautifulSoup, and then combined and collated them into one central cocktail database.

We began with three separate recipe databases, with a total of 9,800 recipes and about 600 ingredients. This database was full of repeated drinks under different names, one-off recipes, branded ingredients, and custom instructions. We eliminated renamed drinks and drinks with fewer than 10 reviews, mapped branded ingredients to their alcohol types, homogenized the serving quantities, and found ABV values for all alcoholic drinks. We also eliminated drinks that contained ingredients that appeared fewer than four times, to ensure some reliability in the ratings. Our final dataset consists of 1,273 drinks, each with ingredients, quantities, ratings, preparation and serving instructions, and ABV. There are a total of 178 ingredients included in these recipes. An example is shown below.
{"mint julep": {"glasstype": ..., "ingredients": {"ingredient": "bourbon", "quantity": "45mL", ...}, "method": ..., "rating": {"numratings": 21, "rating": 6.1}}}
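The two filtering steps described above (dropping drinks with fewer than 10 reviews, then dropping drinks that use ingredients appearing fewer than four times) could be sketched roughly as follows. This is only an illustration under assumptions, not the authors' code: the function name `filter_drinks`, the two constants, and the choice to store each drink's ingredients as a list of `{"ingredient": ..., "quantity": ...}` entries are hypothetical, chosen to resemble the example record.

```python
from collections import Counter

MIN_REVIEWS = 10          # drinks with fewer reviews are dropped
MIN_INGREDIENT_COUNT = 4  # ingredients used by fewer drinks count as too rare


def filter_drinks(drinks):
    """Return the drinks that have enough reviews and use no rare ingredient.

    `drinks` maps a drink name to a record with (assumed) keys:
      "ingredients": list of {"ingredient": str, "quantity": str}
      "rating": {"numratings": int, "rating": float}
    """
    # Step 1: keep only drinks with enough reviews.
    rated = {
        name: rec for name, rec in drinks.items()
        if rec["rating"]["numratings"] >= MIN_REVIEWS
    }

    # Step 2: count how many of the remaining drinks use each ingredient.
    counts = Counter()
    for rec in rated.values():
        counts.update({item["ingredient"] for item in rec["ingredients"]})

    # Step 3: drop any drink that uses an ingredient seen too rarely.
    # This is a single pass; it does not re-count after removals.
    return {
        name: rec for name, rec in rated.items()
        if all(counts[item["ingredient"]] >= MIN_INGREDIENT_COUNT
               for item in rec["ingredients"])
    }
```

Whether the original pipeline repeated the ingredient count after each removal is not stated, so the single pass here is a simplification.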

User Rating Prediction

1. Summary

Our primary goal in this project was to identify ways to predict a user's drink ratings based on their past drink ratings. To do so we compiled a data set of user drink ratings, and then split the data for each user into a training set and a test set. As each user had an average of 13 ratings, these sets were relatively small and ended up leaving any ingredients the user had not yet tried as 0s. To ensure comprehensive coverage of the ingredients, and to handle the case of a new user with no prior knowledge, we began to consider warm starting our predictor with weights trained on the 1001cocktails database.

2. Baseline

For our baseline we simply return the average rating over all the drinks in the training set.

Cross-validation baseline, K=5: MSE

3. Initial Results

All three of our predictors performed worse than the baseline and clearly over-fitted the data. We noticed that none of these feature sets had nearly enough data to create accurate predictions.

CV, single-ingredient features, K=5: MSE, % worse
CV, ingredient features with ABV, K=5: MSE, % worse
CV, ingredient-pair features, K=5: MSE, % worse

4. Results
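As a rough illustration of the warm start mentioned in the summary, the sketch below trains a linear predictor on ingredient and ingredient-pair indicator features using stochastic gradient descent on squared loss, optionally starting from weights learned on a general data set. The function names, the bias feature, the step size, and the epoch counts are all assumptions made for the example; the paper does not specify the optimizer or its settings.

```python
from collections import defaultdict
from itertools import combinations


def extract_features(ingredients):
    """Indicator features: a bias, each ingredient, and each unordered pair."""
    items = sorted(set(ingredients))
    feats = {("bias",): 1.0}
    feats.update({("single", i): 1.0 for i in items})
    feats.update({("pair", a, b): 1.0 for a, b in combinations(items, 2)})
    return feats


def predict(weights, feats):
    """Dot product of the weight vector with a sparse feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train(examples, init_weights=None, step=0.05, epochs=50):
    """SGD on squared loss over (ingredients, rating) examples.

    Passing `init_weights` gives the warm start; leaving it as None
    starts from all-zero weights (a cold start).
    """
    weights = defaultdict(float, init_weights or {})
    for _ in range(epochs):
        for ingredients, rating in examples:
            feats = extract_features(ingredients)
            residual = predict(weights, feats) - rating
            for f, v in feats.items():
                # Gradient of 0.5 * residual**2 with respect to weights[f].
                weights[f] -= step * residual * v
    return dict(weights)
```

With a warm start, features a user has never rated keep the values learned from the general data instead of staying at zero, which is the behaviour the summary describes.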

By giving the weights a warm start we managed to prevent the over-fitting that occurred previously.

CV, single-ingredient features, K=5: MSE, % improvement
CV, ingredient features with ABV, K=5: MSE, % improvement
CV, ingredient-pair features, K=5: MSE, % improvement
CV, ingredient-triple and ingredient-pair features, K=5: MSE, % improvement

Drink Rating Prediction for User Feature Learning

1. Summary

We needed the average-rating data set to give our model a starting point. Here we try to predict the average rating of a new drink given only its ingredients.

2. Baseline

For our baseline we simply return the average rating over all of the previously rated drinks in the current data set. Using the database of cleaned user ratings, we found that

3. Results

We noted that the improvements over the baseline were small in all cases. However, the weights from these linear models were used to warm start the User Rating Predictions, which improved their performance significantly more.

Cross-validation baseline, K=5: MSE
Cross-validation, ingredient features, K=5: MSE, % improvement

Cross-validation, pair features, K=5: MSE, % improvement
Cross-validation, triple-ingredient features, K=5: MSE, % improvement

Custom Recipe Generation

Weighted CSP for Recipe Generation

Modeling

For custom recipe generation, we decided to model the problem as a weighted CSP. The variables X_0, ..., X_4 are a set of 5 ingredients, each with a domain of {0, i ∈ all ingredients}. The constraints and factors were as follows:

1. A factor on every pair of variables capturing the user's preference for pairs of ingredients, given by the weight w(pair, p) φ(x), where w(pair, p) is the personalized weight of that ingredient pair.
2. A potential on the total quantity of alcohol in the assignment. We found typical serving sizes for each kind of alcohol, and from these calculated ABVs.
3. X_2, X_3, X_4 are constrained to be non-alcohols.
4. X_0, X_1 are constrained to be alcohols.

The ingredient pair potentials were trained on our recipe dataset using our predictor. If a pair did not exist in the dataset, a penalty is applied to the weight of that recipe.

Algorithm

Because of the large number of ingredients and ingredient combinations, we decided to use iterated conditional modes (ICM), both for ease of processing and to provide variety in the output, since ICM may find local maxima as opposed to the absolute best possible

recipe. As given in class, the algorithm for ICM is as follows:

Initialize x to a random assignment
Loop until the assignment no longer changes:
    For each variable X_i:
        Iterate through the domain of X_i
        Compute the weight of the assignment with X_i = v for each value v
        Set X_i to the value with the highest weight

Results

We evaluated our output based on the ABV deviation from the desired amount, and used our linear predictor to estimate a rating for the produced drink. Our baseline consists of a random selection of ingredients that fit the constraints (two alcohols and three add-ins). The MSE of the baseline was , with an average rating across the drinks of 5.43, while our ICM MSE was with an average rating of

Error Analysis

1. Over-fitting due to lack of user data

At first our predictions from linear models were worse than the baseline MSE of , as linear regression tended to over-fit due to the sparsity of the feature vector. We therefore chose to extract general user preferences from the average drink rating data set that we scraped from the 1001cocktails.com website. We trained a linear predictor with each set of features on those ratings, and then carried those weights over to the linear regression. This helped to prevent the model from

over-fitting, as it increased the chance that it would converge to a more representative local optimum. This led to the

2. Feature Selection

We initially tried to choose our features just by looking at the user ratings data set. However, we found that, given the sparsity of the features and the size of our data set, there was little we could do to choose features there, as doing so would likely amount to data snooping and might not carry over to the more generalized data sets. To guide our feature selection, we ended up using the 1001cocktails average rating data set, as we believed this would provide a better way to see which features appeared relevant in a larger data set. This let us identify that certain features, such as ABV, total volume, and total alcohol content, did not have linear correlations with the rating, as individuals can like both strong drinks like Martinis and sweeter drinks like Gin Rickeys. Although individual ingredients, once warm started, improved upon the baseline by 9.9%, the best features we found were the ingredient pairings, which provided a 27% improvement over the baseline.

This problem as a whole was made very difficult by the quality and availability of the data, as well as the apparent difference between generalized ratings and true personal preferences. We were originally attempting to learn and predict ratings based only on our collected data, but this was nearly impossible due to the large variability from person to person. Our predictor performance was very poor, the learned weights gave no useful information for the drink outputs, and the outputs seemed to be just random collections, although the ABV accuracy was high. Once we were able to scrape enough drink recipes together, the performance increased significantly because of the warm start on the linear predictor, and these improved weights

led to significantly more cohesive new drink outputs that follow drink recipe trends much more closely.

Conclusion

From our results, we found that ingredient pairing features had the strongest influence on predicting a user's ratings and on providing a good recommendation for a new drink. Ingredient pairing also led to the most seemingly cohesive new recipe generation.
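For reference, the iterated conditional modes procedure from the Algorithm section can be sketched as below for the five-slot recipe CSP. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `assignment_weight` and `icm`, the penalty and scale constants, the use of `None` for an empty slot (allowed here only in the three non-alcohol slots), and the simplification that every ingredient is poured in equal volume (so the drink ABV is the mean ABV of the chosen ingredients) are all hypothetical.

```python
import random
from itertools import combinations


def assignment_weight(assign, pair_w, abv, target_abv,
                      missing_penalty=-1.0, abv_scale=10.0):
    """Weight of a full assignment: pair preferences plus an ABV potential."""
    chosen = [x for x in assign if x is not None]
    score = 0.0
    for a, b in combinations(chosen, 2):
        # Pairs absent from pair_w (including a repeated ingredient)
        # receive the assumed penalty.
        score += pair_w.get(frozenset((a, b)), missing_penalty)
    # Assumed equal serving volumes: drink ABV is the mean ingredient ABV.
    drink_abv = sum(abv[x] for x in chosen) / len(chosen) if chosen else 0.0
    return score - abv_scale * abs(drink_abv - target_abv)


def icm(alcohols, mixers, pair_w, abv, target_abv, rng=None, max_sweeps=1000):
    """Iterated conditional modes over slots X_0..X_4.

    X_0 and X_1 range over alcohols; X_2..X_4 range over mixers or None.
    """
    rng = rng or random.Random()
    domains = [list(alcohols)] * 2 + [[None] + list(mixers)] * 3
    assign = [rng.choice(d) for d in domains]  # random initial assignment

    for _ in range(max_sweeps):
        changed = False
        for i, domain in enumerate(domains):
            def score(v):
                trial = assign[:i] + [v] + assign[i + 1:]
                return assignment_weight(trial, pair_w, abv, target_abv)

            best = max(domain, key=score)
            # Only move on a strict improvement, so the loop cannot cycle.
            if score(best) > score(assign[i]):
                assign[i] = best
                changed = True
        if not changed:  # a full sweep with no change: local optimum reached
            break
    return assign
```

Because the starting assignment is random, repeated calls can settle on different local optima, which matches the variety in output that the Algorithm section describes.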
