Collaborative Filtering with Multi-component Rating for Recommender Systems

Similar documents
Towards More Confident Recommendations: Improving Recommender Systems Using Filtering Approach Based on Rating Variance

Overcoming Accuracy-Diversity Tradeoff in Recommender Systems: A Variance-Based Approach

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

On the Combination of Collaborative and Item-based Filtering

E-MRS: Emotion-based Movie Recommender System. Ai Thanh Ho, Ilusca L. L. Menezes, and Yousra Tagmouti

Using Personality Information in Collaborative Filtering for New Users

Emotion Recognition using a Cauchy Naive Bayes Classifier

Technical Specifications

The QUOROM Statement: revised recommendations for improving the quality of reports of systematic reviews

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods

The Long Tail of Recommender Systems and How to Leverage It

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University

Recommender Systems in Computer Science and Information Systems - a Landscape of Research

De-Biasing User Preference Ratings in Recommender Systems

Stability of Collaborative Filtering Recommendation Algorithms 1

Performance Indicator INFORMATION FLUENCY - CONNECT

Predicting Breast Cancer Survivability Rates

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis

An Improved Algorithm To Predict Recurrence Of Breast Cancer

A Social Curiosity Inspired Recommendation Model to Improve Precision, Coverage and Diversity

Description of components in tailored testing

Gender Based Emotion Recognition using Speech Signals: A Review

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model

Exploiting Implicit Item Relationships for Recommender Systems

Data Mining in Bioinformatics Day 4: Text Mining

Counteracting Novelty Decay in First Story Detection

Positive and Unlabeled Relational Classification through Label Frequency Estimation

Positive and Unlabeled Relational Classification through Label Frequency Estimation

A Context-Aware Music Recommendation System Using Fuzzy Bayesian Networks with Utility Theory

Collaborating across Cultures: Cultural Metacognition and Affect- Based Trust in Creative Collaboration

Towards Effective Structure Learning for Large Bayesian Networks

Extending Rungie et al. s model of brand image stability to account for heterogeneity

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.

Correcting Popularity Bias by Enhancing Recommendation Neutrality

Scaling TOWES and Linking to IALS

Validity and reliability of measurements

RECALL OF PAIRED-ASSOCIATES AS A FUNCTION OF OVERT AND COVERT REHEARSAL PROCEDURES TECHNICAL REPORT NO. 114 PSYCHOLOGY SERIES

Social Structure and Personality Enhanced Group Recommendation

Building Evaluation Scales for NLP using Item Response Theory

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

Reliability of feedback fechanism based on root cause defect analysis - case study

PSYCHOLOGICAL STRESS EXPERIENCES

Probabilistically Estimating Backbones and Variable Bias: Experimental Overview

Adjustments for Rater Effects in

Advanced Bayesian Models for the Social Sciences

Psychometric properties of the PsychoSomatic Problems scale an examination using the Rasch model

Pushing the Right Buttons: Design Characteristics of Touch Screen Buttons

Chapter 1. Introduction

Missing data in clinical trials: making the best of what we haven t got.

Theme I: Introduction and Research Methods. Topic 1: Introduction. Topic 2: Research Methods

Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions?

Score Tests of Normality in Bivariate Probit Models

Jitter-aware time-frequency resource allocation and packing algorithm

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

Improved Intelligent Classification Technique Based On Support Vector Machines

Challenges of Fingerprint Biometrics for Forensics

Neuroinformatics. Ilmari Kurki, Urs Köster, Jukka Perkiö, (Shohei Shimizu) Interdisciplinary and interdepartmental

Advanced Bayesian Models for the Social Sciences. TA: Elizabeth Menninga (University of North Carolina, Chapel Hill)

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian

A Trade-off Between Number of Impressions and Number of Interaction Attempts

How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Clustering mass spectrometry data using order statistics

Towards Collaborative Filtering Recommender Systems for Tailored Health Communications

The Predictive Power of Bias + Likeness Combining the True Mean Bias and the Bipartite User Similarity Experiment to Enhance Predictions of a Rating

Using Association Rule Mining to Discover Temporal Relations of Daily Activities

Suggest or Sway? Effects of Online Recommendations on Consumers Willingness to Pay

Design and Analysis Plan Quantitative Synthesis of Federally-Funded Teen Pregnancy Prevention Programs HHS Contract #HHSP I 5/2/2016

EER Assurance Criteria & Assertions IAASB Main Agenda (June 2018)

January 2, Overview

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study

ITEM-LEVEL TURST-BASED COLLABORATIVE FILTERING FOR RECOMMENDER SYSTEMS

Pythia WEB ENABLED TIMED INFLUENCE NET MODELING TOOL SAL. Lee W. Wagenhals Alexander H. Levis

Comparative Study of K-means, Gaussian Mixture Model, Fuzzy C-means algorithms for Brain Tumor Segmentation

Evaluation of Linguistic Labels Used in Applications

Bayesian Networks Representation. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Evaluating Quality in Creative Systems. Graeme Ritchie University of Aberdeen

Accuracy of Range Restriction Correction with Multiple Imputation in Small and Moderate Samples: A Simulation Study

Generating Artificial EEG Signals To Reduce BCI Calibration Time

of subjective evaluations.

Part II II Evaluating performance in recommender systems

Modeling Uncertainty Driven Curiosity for Social Recommendation

The Cognitive Systems Paradigm

Vital Responder: Real-time Health Monitoring of First- Responders

AN INFORMATION VISUALIZATION APPROACH TO CLASSIFICATION AND ASSESSMENT OF DIABETES RISK IN PRIMARY CARE

Improving Construction of Conditional Probability Tables for Ranked Nodes in Bayesian Networks Pekka Laitila, Kai Virtanen

A Brief Introduction to Bayesian Statistics

Ecological Statistics

Handling Partial Preferences in the Belief AHP Method: Application to Life Cycle Assessment

Alleviating the Sparsity in Collaborative Filtering using Crowdsourcing

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education.

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections

Validity and reliability of measurements

Artificial Intelligence For Homeopathic Remedy Selection

Framework for Comparative Research on Relational Information Displays

The Role of Emotions in Context-aware Recommendation

Transcription:

Collaborative Filtering with Multi-component Rating for Recommender Systems Nachiketa Sahoo Ramayya Krishnan George Duncan James P. Callan Heinz School Heinz School Heinz School Language Technology Institute Carnegie Mellon University Carnegie Mellon University Carnegie Mellon University Carnegie Mellon University Abstract The dependency structure among the component ratings is discovered and incorporated into a mixture model. The parameters of the model were estimated using the Expectation Maximization algorithm. The algorithm was evaluated using data collected from Yahoo Movies. It was found that using multiple components leads to improved recommendations over using only one component when small amount of training data is used. However, when we have enough training data no gain from using additional components of the ratings was observed. Introduction Collaborative filtering based recommender systems are being used in an increasing number of online shopping sites and meeting places to help users better navigate the large choice space they face (Table ). They generate recommendations from the collective user experiences and preferences, yet the recommendations are personalized for each user hence, the name collaborative filtering [9]. Item type Commercial Non Commercial Music itunes, Last.fm, GenieLab.com irateradio.com, mystrands.com Movies Netflix.com, blockbuster.com movielens.umn.edu, filmaffinity.com Books StoryCode.com gnooks.com Dating reciprodate.com Webpages GiveALink.org, StumbleUpon.com Aggregated Amazon.com, half.ebay.com Table. Examples of Collaborative Filtering based recommender systems [3] Used by a retailer collaborative filtering is a useful tool to target-advertise items to its customers: items that a customer is likely to like, yet was unlikely to have discovered on his own because of the large number of items available. Thus, a merchant can induce demand for its less known items. For example, Netflix, a movie-rental company, uses collaborative filtering to create demand for older and less known movies by advertising them to users who might like those movies [8]. The goal of a collaborative filtering based recommender system is: given a user s ratings on a subset of items and his peers ratings on possibly different subsets of items, to predict among the items that the user has not yet rated which ones he would rate highly. The algorithm treats two users as similar if they rate same items similarly. Two items are considered similar if they have received similar ratings from users who have rated both. The idea is to make a recommendation from the items liked by a user s similar users. This can be thought of as automating the spread of information through word-of-mouth [0]. One way to do this is to learn a model that explains how the ratings are generated and then use this model to make predictions about the users. Notable among the model based collaborative filtering works is the Flexible Mixture Model (FMM) where two latent variables are used to

characterize the user behavior and item characteristics separately []. This leads to improved performance over other models, such as the Aspect Model, where a single latent variable is used to characterize the user and item distribution[6]. Often the model based algorithms are designed using Bayesian Networks. This framework allows us to systematically incorporate expert judgment and our intuitions in the model. As we shall show in Section 2, we use such a BayesNet to incorporate the discovered dependency structure among the rating components into our model of rating generation process. There is a large research literature on collaborative filtering from a number of perspectives []. But, most of the current literature discusses methods to use ratings with only one component the overall rating. However, when we have ratings with multiple components spanning more than one aspect of an item, we have more information about the user s preferences and an opportunity to generate more accurate recommendations. Recently there has been a growing interest in the research community in effective use of such multi-component ratings []. Yahoo Movies, for example, has started collecting ratings along multiple aspects of a movie, such as, story, acting, visuals, direction. Such developments and the limited amount of work that has been done in integrating multiple components of ratings to generate improved recommendations has been the motivation behind this work. In this work we have described a model based approach that uses information in multiple components of the ratings to generate improved recommendation. 2 Multi-component rating collaborative filtering Data This work has been facilitated by the availability of component rating data from Yahoo Movies web-site. Each record of the rating data consists of seven variables: item or movie id (I), user id (U), ratings on story (S), acting (A), visuals (V ), direction(d) and overall (O) quality of the movie. The ratings were converted to 0 4 integers (A 4, B 3, C 2, D, F 0). Only the ratings from users who have rated more than 20 movies were used in order to have enough training and test data for each user a practice commonly seen in the literature []. After this filtering there were 45892 records, 058 unique users and 3430 unique movies. The intuition We expect that by rating multiple aspects of an item users provide richer information about their preferences. Consider the following example, where two users have given similar low Overall ratings to a movie. User Movie story acting visuals direction overall u m 4 0 0 u 2 m 0 0 4 Table 2. An example of multi-component rating (all ratings take value in 0 4) The component ratings suggest that the user u might like other movies that have a story similar to that of m, while user u 2 might like a movie that has been directed by the same director or a director with similar style, as the one in m. Hence, if we can effectively use the information in the component ratings provided by the users, we should be able to make more accurate recommendations. The problem When a user likes a movie, he, in general, rates the components of the movie higher. Therefore, the components will be correlated (see Figure ). From the correlation matrix it seems that the components vary together and do not give much independent units of information. In fact, a principal component analysis on the correlation matrix shows that the first principal component explains 84.5% of the total variance. This phenomenon of observing a higher

than expected correlation between the components of the ratings is known as the Halo effect in the psychometric literature. One important reason for Halo is that the users rate components based on their overall impression [2]. S A V D O S.00 9 2 4 7 O A.00 3 3 V.00 9 8 D.00 0 O.00 S A V D Figure. Correlation among rating variables. The solution There are two intuitions outlined in last two paragraphs: Figure 2. The BayesNet encoding conditional independence. There is distinguishing information in the variation of components of ratings even when Overall ratings agree, 2. The Overall rating might be the cause of high correlation among components. This lead us to compute the partial correlation among the component ratings while controlling for the effect of Overall rating. We found that the average inter-component correlation among variables S, A, V, D reduces from 8 to 0.26. As all correlations are positive we should expect some reduction in correlation when computing partial correlations. But, the average partial correlation among the variables is the least when we control for the variable O among the five possible variables. The average partial correlations when we controlled for S,A,V,D were 0.47, 3, 0.35 and 0 respectively. This confirms our intuition that the Overall rating is the highest correlation inducing variable. A similar approach is taken by Holzbach[7]. He asserts that global impression rating is the cause of high correlation among the components and ameliorates this by partialing out the effect of global impression rating. The application The observation that controlling for the Overall rating leads to smallest partial correlation among the component variables leads to the BayesNet shown in Figure 2. It states that the component variables are independent conditional on Overall variable. Note that we do not make an independence assertion among the components. Instead we assert that the components are dependent with each other via only the Overall rating. Empirically we shall observe some residual dependence among the rating components (partial correlations are not all zero). This is a limitation of assuming that a single variable can explain the correlation between pairs of components. A model where we allow multiple nodes to explain dependency among pairs of variables would explain more of these dependencies, but, will be more complex. Embedding this BayesNet in the Flexible Mixture Model (FMM) we get the model shown in Figure 3(b). S S A A Zu Zi U Zu O Zi I U Zu O Zi I V V U R I D D Figure 3. (a) FMM with one rating component by Luo Si and Rong Jin[], (b) FMM with multiple component with dependency structure, (c) Naive FMM with multiple component without dependency structure.

In a BayesNet diagram a variable is independent of its non-descendants conditional on its parent(s). The original FMM asserts that rating variable R, user variable U, and item variable I are conditionally independent of each other given latent variables Z u and Z i. Latent variable Z u is used to characterize the distribution of U and the latent variable Z i is used to characterize the distribution of I. Each value of Z u specify a conditional distribution of U, each value of Z i specify a conditional distribution of I and each combination of Z u, Z i specify a conditional distribution of R. When hidden variables are marginalized away this leads to a mixture model specification of distribution of observed variables. We embed the five rating components in the FMM BayesNet (Figure 3(b)), while making use of the observed conditional independence among them (Figure 2). This modified model additionally asserts that the component ratings are conditionally independent of each other given the latent variables Z u, Z i and the Overall rating. In Figure 3(c) we present the naive model which assumes the components to offer independent units of information. Learning and prediction The conditional independence assumptions allow us to factorize the joint distribution over all variables as a product of conditional probability tables(cpt). To estimate these CPTs we need to follow some iterative approximation algorithm such as Expectation Maximization [4] because we have two latent variables. After estimating the CPTs to make a prediction about the Overall rating (O) for a user-item pair (U, I), we marginalize away all other variables from the joint distribution and get the joint distribution of these three. Then we can find conditional distribution over O given values of U and I, and make a prediction using this conditional distribution. In the original FMM, mean of this distribution was used to make a prediction []. But, we believe that the mode is more appropriate since the ratings are treated as multinomials. 3 Results and discussion We use a randomly selected fraction of each user s ratings for training and the remaining for testing. Mean Absolute Error (MAE) of the predictions were computed to evaluate the suitability of the methods when the task is to predict the future rating accurately. The MAE was plotted against the fraction of data that was used for training. Error by amount of training data MAE 0.9.0 In (Fig 3 (c)) Only Overall (Fig 3 (a)), predict mode Dependent subrating, (Fig 3 (b)) Only overall, predict expectation (orginal FMM formulation, Fig 3(a)) 0.0 0.2 0.4 Training Fraction Figure 4. Plot of errors by fraction of data used for training.

For each point in the plot, 30 random train-test splits were made, keeping the ratio at the given value, and average was taken. The same train-test set was used for all the models. Hence, we got paired readings. We did a pairwise t-test and found that all differences between algorithms are significant. We can see from the Figure 4 that naively embedding component ratings as independent variables in FMM gives the highest error. This is not surprising since the component ratings are highly correlated. Treating them as independent variables leads to over-counting of evidence. Using the mode of the distribution to make a prediction leads to lower error than using the expectation (original FMM formulation). Comparing the model that uses only Overall rating with the model that uses five components with the dependence structure among them we find that when we use less data for training, using multiple components leads to lower error. But, when we use more data using only Overall component leads to lower error. These algorithms were also evaluated for their effectiveness in retrieving the items that the active user has rated highly (A or 4). This was done using a precision-recall curve [2], where, the fraction of the retrieved items that are rated 4 (precision) is plotted against fraction of the 4- rated items that are retrieved (Figure 5). - curve @ training fraction=0.05 - curve @ training fraction=0. - curve @ training fraction= 0 0.2 0.4 0 0.2 0.4 0 0.2 0.4 - curve @ training fraction=0.3 - curve @ training fraction= - curve @ training fraction= 0 0.2 0.4 0 0.2 0.4 0 0.2 0.4 Figure 5. Retrieval performance of algorithm using dependency structure among the components, using only overall rating and algorithm using components as if they were independent As we can see when the training fraction is low the difference between the the three algorithms is the most pronounced. The algorithm with dependency structure among components gives the highest precision at each recall level followed by the method using only the Overall component. The algorithm that assumes independence among the component ratings leads to lowest precision. As we use more and more training data the difference between these algorithms diminishes. The interesting point to note here is that although when using only Overall rating as we use more training data we get a lower MAE, it does not perform better in retrieving top-n items. As pointed out in [5] these metrics measure two different aspects of the performance of the algorithms and are often not correlated. One must use the appropriate evaluation metric to measure the suitability of an algorithm for the task at hand.

4 Conclusion We have presented an approach to integrate multiple components of rating into a collaborative filtering algorithm. We started by observing that the Overall rating, an indicator of rater s general impression, is the highest correlation inducing variable; controlling for Overall rating makes the remaining components most independent. This conditional independence was incorporated into a mixture model describing rating generation process. Parameters of the models were then estimated using the EM algorithm. Then the models were tested using different amounts of training data for two task scenarios: (i) predict the Overall rating on an unseen user-item pair (ii) retrieve the highest rated items by a user. We find that when we use little training data using multiple components helps us in making better prediction and retrieval. But, when we have sufficient training data using multiple components leads to lower prediction accuracy and little gain in retrieval precision. In future research this behavior should be investigated to better understand the settings in which using multiple components is helpful. In the current model after controlling for Overall rating the correlations are vastly reduced, but, are not zero. As we pointed out in Section 2, this could be a limitation of searching for single components to explain the dependencies among the components. In a future work combinations of variables can be explored to explain the dependencies among the components. Bibliography [] Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng, 7(6):734 749, 2005. [2] Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. Modern Information Retrieval. ACM Press / Addison-Wesley, 999. [3] Wikipedia contributors. Collaborative filtering. Wikipedia, The Free Encyclopedia, 2006. [Online; accessed 5-August-2006]. [4] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39: 38, 977. [5] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22():5 53, 2004. [6] Thomas Hofmann and Jan Puzicha. Latent class models for collaborative filtering. In Dean Thomas, editor, Proceedings of the 6th International Joint Conference on Artificial Intelligence (IJCAI-99-Vol2), pages 688 693, S.F., July 3 August 6 999. Morgan Kaufmann Publishers. [7] R. L. Holzbach. Rater bias in performance ratings: Superior, self, and peer ratings. Journal of Applied Psychology, 63:579 588, 978. [8] Inc. Netflix. Form 0-k annual report pursuant to section 3 or 5(d) of the securities exchange act of 934. UNITED STATES SECURITIES AND EXCHANGE COMMISSION, Washington, D.C. 20549, 2006. [9] Paul Resnick and Hal R. Varian. Recommender systems. Commun. ACM, 40(3):56 58, 997. [0] Upendra Shardanand and Pattie Maes. Social information filtering: Algorithms for automating ẅord of mouth. In CHI, pages 20 27, 995. [] Luo Si and Rong Jin. Flexible mixture model for collaborative filtering. In ICML, pages 704 7. AAAI Press, 2003. [2] F. L. Wells. A statistical study of literary merit. Archives of psychology,, 907.