Group recommender systems: exploring underlying information of the user space

2013 BRICS Congress on Computational Intelligence & 11th Brazilian Congress on Computational Intelligence
1st BRICS Countries Congress on Computational Intelligence

Pedro Rougemont, pedro.rougemont@ufrj.br
Filipe Braida do Carmo, filipebraida@cos.ufrj.br
Marden Braga Pasinato, marden@cos.ufrj.br
Carlos Eduardo Mello, UFRRJ, carlos.mello@ufrrj.br
Geraldo Zimbrão, zimbrao@cos.ufrj.br

Abstract

This work proposes a new methodology for the Group Recommendation problem. In this approach we choose the Most Representative User (MRU) as the group medoid in a projection of the user space, and then generate the recommendation list based on that user's preferences. We evaluate our proposal on the well-known MovieLens dataset, using two different measures to compare the group recommender strategies. The results seem promising, and our strategy has shown empirical robustness compared with the baselines in the literature.

Keywords: group recommender systems; space transformation; singular value decomposition; social choice theory

I. INTRODUCTION

Recommender Systems have arisen as a powerful tool on the Web, aimed primarily at e-commerce applications. They have been attracting attention from both academia and industry due to their ability to leverage cross-sales and to improve customer satisfaction and loyalty [1]. The social aspect of recommendation has been disregarded by the mainstream algorithms, which rely heavily on Machine Learning (ML) and Data Mining (DM) techniques [2]. In response, Social Recommendation and Group Recommendation emerged as new areas of academic investigation. Social Recommendation tries to incorporate the users' social network into single-user recommendation [3]. The goal of Group Recommendation, on the other hand, is to recommend items to groups of users instead of a single individual [4].
Some activities have an inherently plural context, such as watching a movie, travelling, or choosing a restaurant. These kinds of activities bring up the challenge of reconciling different personal preferences into a common choice. Traditional recommender algorithms cannot address this type of recommendation because they are designed around a single individual.

Suppose that a group of people are in a room listening to the same songs. In order to make this experience more pleasing, one could propose a playlist that best suits all users simultaneously. Suppose also that the preferences of each user are known beforehand in the form of ratings. In this scenario, one question emerges: is it possible to choose a playlist that meets all the users' preferences?

Group Recommender Systems try to answer this question by decomposing the problem into four major steps: Preference Elicitation, Preference Aggregation, Recommendation and Negotiation.

The first step, Preference Elicitation, consists of acquiring the individual users' ratings of the items they have consumed, e.g., songs, movies, places. This can be performed explicitly, when users manually rate or review items, or implicitly, by inferring the ratings from the time a user spent on a webpage, how many times he or she listened to a song, and so forth.

The second step, Preference Aggregation, transforms individual preferences into group preferences. It estimates a Group Utility Function (GUF) that measures each item's utility value to the group, which is used as input to the Recommendation step. This can be done in several ways, as described in [5].

The Recommendation step takes the GUF output for every item and decides which items should be included in the limited recommendation list. The TOP-K criterion is the simplest and most widely used implementation of this step [6].
The last step, Negotiation, is optional for most Group Recommender Systems. After the recommended items are presented to the group, users may disagree about the result and interact with each other in order to reach a common solution. This step gives them the means to interact, haggle and decide on the acceptance or rejection of some or all items. More extensive research on this theme can be found in the literature on Decision Support Systems (DSS) [7].

This work proposes a new methodology to tackle the Preference Aggregation step, based on the Most Representative User (MRU). All users are mapped into a low-dimensional feature space; then the user closest to all others (the medoid of the user cluster) is computed and elected as the MRU. The MRU's preferences are used as the group preference in order to compute the recommendation.

The remainder of this paper is organized as follows. In the second section we describe the related work and the baseline strategies. In the third section we explain our proposal in detail and outline its advantages. In the fourth section we describe our experiments, and we present the results in the fifth section. Finally, we present our conclusions and outline future work.

978-1-4799-3194-1/13 $31.00 © 2013 IEEE. DOI 10.1109/BRICS-CCI.&.CBIC.2013.89 10.1109/BRICS-CCI-CBIC.2013.95

II. RELATED WORK

A. Individual Recommender Systems

Collaborative Filtering (CF) algorithms have emerged as an effective way of implementing traditional, i.e., individual, Recommender Systems, in contrast with Content-based algorithms. CF algorithms rely solely on the similarities between users to compute the recommendation, disregarding contextual information about the user and the item. They can be classified into two major categories: memory-based and model-based. The former can also be called heuristic-based, since it computes the rating predictions using some heuristic. The latter tries to create a model that simulates the user's rating behavior. Matrix Factorization (MF) and Machine Learning (ML) algorithms have been extensively explored for this task, using the known ratings as the training set [2].

The ratings are stored in a matrix called the user-item matrix, where rows represent users and columns represent items. Thus, the element r_ij is the rating assigned by user i to item j, as in the example in TABLE 1. In many applications, it is common to have a large number of users and items but a small number of ratings. Therefore, the user-item matrix is usually very sparse, i.e., it has many unfilled positions. In spite of that, CF methods have shown good performance in predicting those unfilled positions [2].

TABLE 1
WAITING ROOM EXAMPLE: GROUP PREFERENCES

       i_1  i_2  i_3  i_4  i_5  i_6  i_7  i_8
u_1     2    5    4    2    5    5    5    4
u_2     3    3    1    2    3    5    2    5
u_3     2    2    1    4    3    2    5    1
u_4     3    4    1    3    2    2    5    4

The Singular Value Decomposition (SVD) is a particular MF technique that has excelled among CF algorithms. Besides their good prediction results, SVD-based algorithms offer a way to map users and items into a customized feature space. The number of dimensions in this feature space can be arbitrarily low, and each dimension is a latent variable that can assume several interpretations. This can be very useful for clustering users and items, because it provides a straightforward way to compute the distance between them, namely the distance between the corresponding points in the low-dimensional feature space [8].

B.
Group Recommender Systems

Group Recommender Systems (GRS) have their foundations in Social Choice Theory (SCT) [9], in contrast with traditional Recommender Systems (RS), since group behavior towards a recommendation has social implications that cannot be taken into account by traditional RS, due to their individualistic framework. SCT doesn't address the whole set of matters discussed in GRS, but it provides valuable results and approaches to support GRS research.

In Masthoff's first studies [10], an experiment was conducted exposing volunteers to a group of three fictitious users whose preferences were defined for ten movie clips. The participants were asked which movie clips should be recommended to this group, with the constraint that only a subset of them would be watched, starting from one and going up to seven. Her conclusion was that human decisions follow some common patterns or strategies to solve this task, the three most frequent being:

Average: The group preference is computed, for each item, by taking the simple mean of the users' preferences. Afterwards, the RS sorts the set of items in descending order and recommends the TOP-K items. For example, considering TABLE 1, the first item in the group recommendation list would be i_7, and i_8 would appear above i_6.

Least Misery: The group preference for each item is its lowest individual score, and items are ranked from the highest to the lowest of these minimum scores. The motivation for this criterion is to penalize items that leave one or more users in a situation of misery. In our example, i_4 would precede i_8 even though the latter has a better average, since the lowest score of i_4 is higher than that of i_8.

Average without Misery: Same as Average, except that items for which any user's score is at or below a certain threshold are disregarded. For example, if the threshold in TABLE 1 equals 1, i_8 would not be eligible for recommendation.
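The three strategies above can be sketched in Python on the TABLE 1 ratings. This is an illustrative sketch, not code from the original study: the function names are hypothetical, and Average without Misery is assumed to drop any item on which some user's score is at or below the threshold, which is the reading under which i_8 is excluded at threshold 1.

```python
import numpy as np

# Ratings from TABLE 1: rows are users u_1..u_4, columns are items i_1..i_8.
R = np.array([
    [2, 5, 4, 2, 5, 5, 5, 4],
    [3, 3, 1, 2, 3, 5, 2, 5],
    [2, 2, 1, 4, 3, 2, 5, 1],
    [3, 4, 1, 3, 2, 2, 5, 4],
], dtype=float)


def average(ratings):
    """Group score of each item is the mean of the users' ratings."""
    return ratings.mean(axis=0)


def least_misery(ratings):
    """Group score of each item is its lowest individual rating."""
    return ratings.min(axis=0)


def average_without_misery(ratings, threshold):
    """Average, with items excluded when any rating is <= threshold (assumed reading)."""
    scores = ratings.mean(axis=0)
    scores[ratings.min(axis=0) <= threshold] = -np.inf
    return scores


def top_k(scores, k):
    """1-based item numbers of the k best eligible items, ties kept in item order."""
    order = np.argsort(-scores, kind="stable")
    return [int(i) + 1 for i in order[:k] if np.isfinite(scores[i])]
```

With these definitions, `top_k(average(R), 1)` returns item 7, and `least_misery(R)` scores i_4 at 2 and i_8 at 1, which matches the examples given in the text.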
Besides these three strategies, many other preference aggregation strategies have been suggested in the literature, such as Borda Count, Most Pleasure, Spearman Footrule, and others found in [10]. Aggregating user preferences is not the only approach to this step; there are other documented alternatives, such as the ones described in [5]:

  o To create a user profile representing the group, for which a recommendation will be made;
  o To aggregate individual recommendations.

C. Measuring Recommendation Quality

After recommending the items to the group, one needs to measure the recommendation performance, i.e., whether the recommended items correspond to the group's preferences. There are many measures that can be applied to this task. However, they are often based on how well a recommendation minimizes a mutual distance across all the users' preference lists. Several of these measures and distance definitions are explained in [11]. In this section, we describe two of them: the Kendall Tau and the nDCG.

The Kendall Tau coefficient (τ) computes a measure of correlation between two lists. It indicates how distant the lists are from each other in terms of position shifts. A high Kendall Tau score means that few position shifts are required in order to match the two lists. To find the optimal solution for a set of lists, known as the Kemeny optimal aggregation, one would have to maximize τ simultaneously between each user's preference list (l) and the recommendation list (r) over a set of items (I), as follows:

\tau = \frac{n_c - n_d}{\frac{1}{2} n (n - 1)}    (1)

where n is the number of items in I and:

n_c = |\{(i, j) \in I : i < j,\ l_i < l_j \text{ and } r_i < r_j\}|
n_d = |\{(i, j) \in I : i < j,\ l_i < l_j \text{ but } r_i > r_j\}|

Computing the Kemeny optimal is proven to be NP-hard [5], and several heuristics try to obtain a good approximation of it. A cheaper alternative is to minimize the Spearman Footrule distance or one of its variations. In [12], this approach is chosen as a strategy for Group Recommendation.

However, this classical measure lacks some properties pertinent to a good recommender. It is incapable of distinguishing ordering disagreements at the top of the recommendation list from those at the bottom, punishing both with the same emphasis. The main interest of a GRS is clearly to guarantee good matches between individual preferences and the first K items in the recommendation list, since the remaining items will be discarded. A metric capable of addressing this issue is the Normalized Discounted Cumulative Gain (nDCG), which uses a scoring function that decays with the position of the item in the recommendation list [13]. Many works in GRS make use of nDCG to measure the quality of strategies [12] [6]. Besides GRS, it is also widely used in search engine research. In this paper, both Kendall Tau and nDCG were considered.

III. PROPOSED GROUP RECOMMENDATION

The main assumption of the proposed group recommendation strategy lies in searching for patterns that could represent the group as a whole. The idea consists in creating a sort of stereotype for the entire group: the Most Representative User (MRU), or medoid user. One can then recommend items to this user, who represents the complete group.

In order to find the MRU, it is plausible to assume that this user should be the one most similar to all others, in the sense that his or her preferences should be the most general for the entire group. Thus, one should look for the user who is, according to some distance measure, in the densest local region.

In order to implement this proposal, the user-item matrix A, containing the ratings, is decomposed by the SVD factorization, from which one obtains the matrices U, S, and V^T. The matrix U represents the users in a feature space of latent variables [14]. This linear algebra operation allows us to build a new feature space where the user projection can be analyzed.
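The idea described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the parameter `f` (number of latent dimensions kept) are hypothetical, user coordinates are taken as the first `f` columns of U without scaling by the singular values, the SVD is computed on the whole rating matrix before the group is selected, and the matrix is assumed to be already filled (no missing entries).

```python
import numpy as np


def latent_user_space(A, f):
    """Coordinates of every user in the first f latent dimensions (columns of U)."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    return U[:, :f]


def medoid_index(points):
    """Index of the point with the smallest total Euclidean distance to all others."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    return int(dist.sum(axis=1).argmin())


def most_representative_user(A, group, f):
    """Row index (taken from `group`) of the group's medoid in the latent space."""
    coords = latent_user_space(A, f)[group]
    return group[medoid_index(coords)]


def mru_recommendation(A, group, f):
    """Item indices ranked by the MRU's own ratings, best first (assumed tie rule: item order)."""
    A = np.asarray(A, dtype=float)
    mru = most_representative_user(A, group, f)
    return [int(j) for j in np.argsort(-A[mru], kind="stable")]
```

Note that keeping all columns of a square U would make every pair of users equidistant, since the rows of an orthogonal matrix are orthonormal, so `f` has to be smaller than the number of users for the medoid to be informative.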
A_{N \times M} = U_{N \times N} \, S_{N \times M} \, V^T_{M \times M} = \sum_i \sigma_i U_i V_i^T    (4)

In equation (4), matrix A is factorized by the SVD algorithm into a product of two orthogonal matrices, U and V^T, with a diagonal matrix S; U_i and V_i denote the i-th columns of U and V, and σ_i the i-th singular value. The matrix U carries information about the rows of A, and therefore the most significant columns of U, namely U_f, work as a latent user space [15]. Over U_f, it is possible to find the user that minimizes the total Euclidean distance (or some other distance) to all others and use that user's preferences as the recommendation list.

To illustrate a scenario where this Preference Aggregation technique could improve the recommender performance, consider the Kendall Tau coefficient. The performance of the proposed approach is compared to the Average strategy in the toy example of TABLE 2. The reader may note that all column means are equal to 3.5, so the Average strategy will behave like the Random strategy. An example of a recommendation list given by this strategy could be l_avg = {i_1, i_2, i_3, i_4, i_5, i_6, i_7, i_8}. The MRU computed for this example is user u_2, whose preferences are represented by l_mru = {i_5, i_8, i_6, i_1, i_2, i_4, i_7, i_3}. Thus, the average Kendall Tau coefficients for the two strategies are τ(l_avg) ≈ 0.1607 and τ(l_mru) ≈ 0.3036, where τ equal to 1 would be the perfect hypothetical recommendation. This limit is unreachable given that the users' preferences are not the same. In fact, among the 40,320 possible permutations of items, l_mru is a Kemeny optimal, scoring the maximum value, along with four other solutions for this example.

TABLE 2
SECOND SCENARIO USER PREFERENCES

       i_1  i_2  i_3  i_4  i_5  i_6  i_7  i_8
u_1     3    5    5    3    1    2    3    4
u_2     3    3    2    3    5    4    3    5
u_3     5    3    5    5    3    4    5    1
u_4     3    3    2    3    5    4    3    4

IV. METHODOLOGY PROTOCOL

The experiments were conducted as closely as possible to the methodology described in [12] and summarized in Fig. 1. We implemented only three of the five group strategies covered in [12], in addition to the MRU described in section III.
It was assumed that items already consumed by any user in the group should not be eligible for recommendation. The input data consists of a sparse user-item matrix, the public MovieLens dataset containing 100,000 ratings. The missing values in this matrix are filled with the rating predictions given by a traditional RS, the Regularized SVD [14].

The work proposed in [12] applies the TOP-K step to all recommendation lists before computing the nDCG. Due to the lack of ratings in common between the truncated recommendations and the test set, this step may lead to improperly high nDCG results and insufficient data for computing the Kendall Tau coefficient. In our experiments, the TOP-K step was performed separately for each user, by intersecting the recommendation list with that user's test data.

Input
  U = {u_1, u_2, ..., u_n}   user set
  I = {i_1, i_2, ..., i_m}   item set
  R_{n×m}                    sparse preferences matrix
Preprocess
  Split R into test set Q and training set P
  Fill P using Collaborative Filtering
  Split the user base into K groups U_k of the same size
For each U_k
  Separate P_k and Q_k, containing only ratings from u ∈ U_k
  For each aggregation strategy S_t, generate L_t = S_t(U_k), the list of items recommended by S_t to U_k
  Generate the ideal list L_I from Q_k for each u ∈ U_k
  Compute nDCG and Kendall Tau of each L_t against L_I separately for each u ∈ U_k

Figure 1. Experiment pseudocode.
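The evaluation step at the end of Figure 1 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function names `kendall_tau` and `ndcg` are hypothetical, Kendall Tau counts concordant and discordant pairs in the usual symmetric way and skips tied pairs, and the nDCG discount of 1/log2(position + 1) is an assumed, commonly used form, since the text does not state the exact scoring function. Both functions only look at items that appear in the user's known ratings, mirroring the intersection with each user's test data.

```python
import math
from itertools import combinations


def kendall_tau(user_scores, rec_scores):
    """Kendall Tau between a user's scores and a recommendation's scores.

    Both arguments map item ids to scores; only shared items are compared.
    """
    items = sorted(set(user_scores) & set(rec_scores))
    n = len(items)
    if n < 2:
        raise ValueError("need at least two items in common")
    n_c = n_d = 0
    for a, b in combinations(items, 2):
        agreement = (user_scores[a] - user_scores[b]) * (rec_scores[a] - rec_scores[b])
        if agreement > 0:
            n_c += 1  # pair ordered the same way in both lists
        elif agreement < 0:
            n_d += 1  # pair ordered in opposite ways
        # tied pairs contribute to neither count
    return (n_c - n_d) / (0.5 * n * (n - 1))


def ndcg(ranked_items, user_ratings, k=None):
    """nDCG of a ranked item list against one user's known ratings."""
    gains = [user_ratings[item] for item in ranked_items if item in user_ratings]
    ideal = sorted(gains, reverse=True)
    if k is not None:
        gains, ideal = gains[:k], ideal[:k]

    def dcg(values):
        # Assumed discount: gain at position p is divided by log2(p + 1).
        return sum(v / math.log2(pos + 1) for pos, v in enumerate(values, start=1))

    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0
```

A recommendation that orders the shared items exactly as the user does scores 1 on both measures, while a fully reversed list scores -1 on Kendall Tau and strictly less than 1 on nDCG.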

An individual Recommender System dataset was chosen because there are few datasets available on the web for the purpose of testing Group Recommender Systems (GRS). As seen in works such as [5], [16] or [17], many authors choose to build their own system in order to test new theories, and those datasets are not made available afterwards. Another problem faced by researchers is that most available datasets for Group Recommendation are too domain-specific and cannot be reused in other experiments.

V. RESULTS

The experiments performed for this article were conducted considering the following variations in the training data:

Group inner similarity
  o Random groups
  o High inner similarity groups
  o ¼ Outliers groups
  o Low inner similarity groups
Group size
  o 4 members
  o 8 members
  o 20 members (random groups only)
Item set
  o All items
  o High variance only (75 percentile)

There were numerous variations that could be taken into account, but after some experiments we selected the most promising attributes to evaluate. The group recommender strategies tested were Average, Most Representative User, Least Misery and Random. The nDCG and Kendall Tau measures were compared in terms of distribution shape, using the quartiles, mean and standard deviation of their values over several iterations of the same scenario.

For the random groups scenarios, Fig. 2 and TABLE 3 reveal that, in the long run, the Average and Most Representative User strategies break even in all aspects of nDCG: a mean of approximately 0.87, and the same standard deviation and quartiles. Least Misery seems unable to approach them, with a mean around 0.7, and the Random strategy keeps its mean around 0.5. The shapes and proportions are maintained for group sizes of 4, 8 and 20.

The Kendall Tau coefficient for random groups, shown in TABLE 4, demonstrates that the MRU strategy obtained the best mean in all cases. The possible reasons for that are discussed at the end of this section.

Figure 2. Quartiles representation of nDCG for random groups of 8 users

TABLE 3
RANDOM GROUPS NDCG STATISTICS (mean ± standard deviation)

Strategy        4 users          8 users          20 users
Average         0.872 ± 0.126    0.869 ± 0.128    0.864 ± 0.132
MRU             0.871 ± 0.127    0.868 ± 0.128    0.864 ± 0.132
Least Misery    0.720 ± 0.169    0.697 ± 0.167    0.705 ± 0.170
Random          0.508 ± 0.287    0.502 ± 0.289    0.493 ± 0.296

TABLE 4
RANDOM GROUPS KENDALL TAU STATISTICS (mean ± standard deviation)

Strategy        4 users          8 users          20 users
Average         0.122 ± 0.130    0.108 ± 0.145    0.088 ± 0.189
MRU             0.161 ± 0.138    0.161 ± 0.151    0.155 ± 0.199
Least Misery    0.144 ± 0.135    0.139 ± 0.144    0.136 ± 0.152
Random          0.006 ± 0.121    0.002 ± 0.137   -0.004 ± 0.183

In the next experiment, the group users were selected according to the group inner similarity, so as to obtain groups of high and low similarity. This selection also considers groups composed of a 75% high-similarity majority and a 25% outlier minority. The group sizes were four and eight users. Items were also divided into two sets, one containing the entire set of items and the other composed only of items with a high variance in their ratings, i.e., the most controversial items in the set.

TABLE 5
SECOND EXPERIMENT SCENARIOS

Scenario   Inner Similarity         Size   Item Set
1          High Inner Similarity    4      All Items
2          High Inner Similarity    4      High Variance
3          High Inner Similarity    8      All Items
4          High Inner Similarity    8      High Variance
5          25% Outlier              4      All Items
6          25% Outlier              4      High Variance
7          25% Outlier              8      All Items
8          25% Outlier              8      High Variance
9          Low Inner Similarity     4      All Items
10         Low Inner Similarity     4      High Variance
11         Low Inner Similarity     8      All Items
12         Low Inner Similarity     8      High Variance
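The "High Variance" item set used in these scenarios can be built with a simple percentile filter. This is a hedged sketch: the text only states that high-variance items at the 75th percentile are kept, so computing the variance per item over observed ratings, marking missing ratings as NaN, and keeping items at or above the cutoff are all assumptions, as is the function name `high_variance_items`.

```python
import numpy as np


def high_variance_items(R, percentile=75.0):
    """Column indices of items whose rating variance reaches the given percentile.

    R is a user-item matrix in which missing ratings are stored as NaN.
    """
    R = np.asarray(R, dtype=float)
    variances = np.nanvar(R, axis=0)                  # per-item variance, ignoring NaN
    cutoff = np.nanpercentile(variances, percentile)  # assumed 75th percentile cutoff
    return np.flatnonzero(variances >= cutoff)
```

For a matrix whose first two items receive identical ratings from everyone and whose last two items split the users, only the last two column indices are returned.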

The results obtained for nDCG are summarized in TABLE 6 for the High Inner Similarity, 25% Outlier and Low Inner Similarity group formations, respectively. For the Average and MRU strategies, the impact of the high-variance items (even rows of TABLE 6) is notable compared to the set of all items (the preceding odd rows in the same table). However, Least Misery seems more resilient in this respect, since high-variance items do not significantly change its performance. Average and MRU perform almost identically on this measure, surpassing the others with a higher mean and lower variance in almost all cases.

TABLE 6
NDCG FOR GROUPS OF HIGH INNER SIMILARITY (mean ± standard deviation)

Scenario   Average        MRU            Least Misery   Random
1          0.88 ± 0.14    0.87 ± 0.14    0.74 ± 0.18    0.53 ± 0.26
2          0.74 ± 0.21    0.74 ± 0.21    0.73 ± 0.20    0.49 ± 0.27
3          0.87 ± 0.14    0.87 ± 0.14    0.74 ± 0.18    0.51 ± 0.25
4          0.73 ± 0.21    0.73 ± 0.21    0.74 ± 0.20    0.49 ± 0.26
5          0.87 ± 0.14    0.87 ± 0.14    0.74 ± 0.19    0.52 ± 0.25
6          0.74 ± 0.22    0.74 ± 0.22    0.73 ± 0.20    0.51 ± 0.27
7          0.86 ± 0.16    0.86 ± 0.16    0.74 ± 0.19    0.52 ± 0.26
8          0.73 ± 0.22    0.72 ± 0.22    0.75 ± 0.20    0.51 ± 0.27
9          0.86 ± 0.17    0.86 ± 0.17    0.75 ± 0.20    0.57 ± 0.27
10         0.78 ± 0.24    0.78 ± 0.24    0.72 ± 0.23    0.53 ± 0.28
11         0.86 ± 0.17    0.85 ± 0.17    0.74 ± 0.21    0.55 ± 0.27
12         0.76 ± 0.24    0.76 ± 0.24    0.72 ± 0.24    0.55 ± 0.30

The results for the Kendall Tau coefficient are presented in TABLE 7. We note here a constant advantage in MRU's mean compared to the Average strategy under this measure. Both strategies carry the same variance and are overcome by Least Misery when subject to the High Variance item set. Least Misery also proves to be very stable in terms of variance in all scenarios, in contrast with the others, whose results degenerate under the High Variance set.
TABLE 7
KENDALL TAU FOR GROUPS OF HIGH INNER SIMILARITY (mean ± standard deviation)

Scenario   Average        MRU            Least Misery   Random
1          0.13 ± 0.16    0.16 ± 0.16    0.14 ± 0.14     0.00 ± 0.15
2          0.07 ± 0.51    0.09 ± 0.51    0.11 ± 0.19     0.02 ± 0.48
3          0.11 ± 0.18    0.16 ± 0.18    0.13 ± 0.14     0.00 ± 0.17
4          0.06 ± 0.53    0.09 ± 0.53    0.12 ± 0.18     0.00 ± 0.52
5          0.12 ± 0.17    0.15 ± 0.18    0.13 ± 0.14     0.00 ± 0.16
6          0.06 ± 0.54    0.09 ± 0.54    0.10 ± 0.20     0.01 ± 0.52
7          0.10 ± 0.20    0.15 ± 0.20    0.12 ± 0.14     0.00 ± 0.19
8          0.05 ± 0.56    0.08 ± 0.56    0.12 ± 0.20    -0.02 ± 0.55
9          0.11 ± 0.17    0.14 ± 0.18    0.11 ± 0.15    -0.01 ± 0.16
10         0.05 ± 0.63    0.08 ± 0.63    0.11 ± 0.21    -0.02 ± 0.62
11         0.10 ± 0.20    0.15 ± 0.20    0.11 ± 0.16    -0.01 ± 0.18
12         0.05 ± 0.64    0.10 ± 0.64    0.12 ± 0.21    -0.01 ± 0.64

As previously mentioned in section II, the Kendall Tau coefficient and the nDCG differ significantly in what they measure. With this in mind, the results presented in TABLE 6 and TABLE 7 demonstrate that the MRU strategy, compared to the baselines, is capable of delivering mean results as good as the Average strategy, but with reduced disagreement. Moreover, MRU shows an improvement in the relevance of recommended items compared to Least Misery, as the nDCG points out.

However, both MRU and the Average strategy behave similarly on the High Variance item set under the Kendall Tau coefficient. This can be explained by the nature of those heuristics, which could be classified as Consensus-based. In other words, both strategies are designed to find consensus among the users, and they suffer when the set contains controversial items that cause disagreement. The Least Misery strategy is immune to this situation, since it belongs to another class of heuristic, Borderline-based. Those heuristics establish a threshold value, according to the ratings, and choose the items that are furthest away from this value. Yet there is one major problem with this kind of strategy, since users may learn how to manipulate the threshold in their favor. There are cases where even the Average strategy can be manipulated by users.
In this sense, the MRU strategy has shown itself to be more robust than all the others considered in the experiments, but a formal proof of this property is yet to be investigated.

VI. CONCLUSION AND FUTURE WORK

This work presents a new approach to the Group Recommendation problem. The Most Representative User (MRU) strategy consists in exploiting underlying information in the user space so that distances between users can be measured, e.g., Euclidean distance or Cosine distance. By doing so, it is possible to compute the user who has the least distance to all the others (the medoid user) and use that user as the stereotype of the group itself. The group recommendations can therefore be computed as the recommendations to the most representative user.

This approach has shown good results compared with the baseline approaches in the literature. Beyond that, the MRU strategy has presented an empirical robustness to the problem of recommendation manipulation that was not verified in the other strategies. This justifies further investigation of the properties of this strategy.

REFERENCES

[1] J. B. Schafer, J. Konstan and J. Riedl, "Recommender systems in e-commerce," in Proceedings of the 1st ACM conference on Electronic commerce, 1999, pp. 158-166.
[2] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," Knowledge and Data Engineering, IEEE Transactions on, vol. 17, no. 6, pp. 734-749, 2005.
[3] I. Guy and D. Carmel, "Social recommender systems," in Proceedings of the 20th international conference companion on World wide web, 2011, pp. 283-284.
[4] G. Popescu and P. Pu, "Group recommender systems as a voting problem," EPFL Technical report, 2010.

[5] A. Jameson, "More than the sum of its members: challenges for group recommender systems," in Proceedings of the working conference on Advanced visual interfaces, 2004, pp. 48-54.
[6] S. Amer-Yahia, S. B. Roy, A. Chawla, G. Das and C. Yu, "Group recommendation: Semantics and efficiency," Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 754-765, 2009.
[7] E. Bellucci and J. Zeleznikow, "A comparative study of negotiation decision support systems," in System Sciences, 1998., Proceedings of the Thirty-First Hawaii International Conference on, 1998, vol. 1, pp. 254-262.
[8] Y. Koren, R. Bell and C. Volinsky, "Matrix factorization techniques for recommender systems."
[9] K. J. Arrow, Social Choice and Individual Values, Yale University, 1951.
[10] J. Masthoff, "Modeling a group of television viewers," in Proceedings of the Workshop Future TV, in Intelligent Tutoring Systems Conference, 2002, pp. 34-42.
[11] "Rank Aggregation Methods for the Web." [Online]. Available at: http://www10.org/cdrom/papers/577/. [Last access: 17 Apr. 2013].
[12] L. Baltrunas, T. Makcinskas and F. Ricci, "Group recommendations with rank aggregation and collaborative filtering," in Proceedings of the fourth ACM conference on Recommender systems, 2010, pp. 119-126.
[13] R. Kumar and S. Vassilvitskii, "Generalized distances between rankings," WWW '10 Proceedings of the 19th international conference on World wide web, pp. 571-580, 2010.
[14] A. Paterek, "Improving regularized singular value decomposition for collaborative filtering," in Proceedings of KDD cup and workshop, 2007, vol. 2007, pp. 5-8.
[15] D. J. Bartholomew, M. Knott and I. Moustaki, Latent Variable Models and Factor Analysis: A Unified Approach, 3rd ed. Wiley, 2011.
[16] K. McCarthy, L. McGinty, B. Smyth and M. Salamo, "Social interaction in the cats group recommender," in Workshop on the social navigation and community based adaptation technologies, 2006.
[17] Z. Yu, X. Zhou, Y. Hao and J. Gu, "TV Program Recommendation for Multiple Viewers Based on user Profile Merging," User Modeling and User-Adapted Interaction, vol. 16, no. 1, pp. 63-82, Jun. 2006.