Beyond Parity: Fairness Objectives for Collaborative Filtering

Sirui Yao, Department of Computer Science, Virginia Tech, Blacksburg, VA
Bert Huang, Department of Computer Science, Virginia Tech, Blacksburg, VA

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Abstract

We study fairness in collaborative-filtering recommender systems, which are sensitive to discrimination that exists in historical data. Biased data can lead collaborative-filtering methods to make unfair predictions for users from minority groups. We identify the insufficiency of existing fairness metrics and propose four new metrics that address different forms of unfairness. These fairness metrics can be optimized by adding fairness terms to the learning objective. Experiments on synthetic and real data show that our new metrics can better measure fairness than the baseline, and that the fairness objectives effectively help reduce unfairness.

1 Introduction

This paper introduces new measures of unfairness in algorithmic recommendation and demonstrates how to optimize these metrics to reduce different forms of unfairness. Recommender systems study user behavior and make recommendations to support decision making. They have been widely applied in various fields to recommend items such as movies, products, jobs, and courses. However, since recommender systems make predictions based on observed data, they can easily inherit bias that may already exist. To address this issue, we first formalize the problem of unfairness in recommender systems and identify the insufficiency of demographic parity for this setting. We then propose four new unfairness metrics that address different forms of unfairness. We compare our fairness measures with non-parity on biased, synthetic training data and show that our metrics can better measure unfairness. To improve model fairness, we provide five fairness objectives that can be optimized, each adding unfairness penalties as regularizers. Experimenting on real and synthetic data, we demonstrate that each fairness metric can be optimized without much degradation in prediction accuracy, but that trade-offs exist among the different forms of unfairness.

We focus on a frequently practiced approach for recommendation called collaborative filtering, which makes recommendations based on the ratings or behavior of other users in the system. The fundamental assumption behind collaborative filtering is that other users' opinions can be selected and aggregated in such a way as to provide a reasonable prediction of the active user's preference [7]. For example, if a user likes item A, and many other users who like item A also like item B, then it is reasonable to expect that the user will also like item B. Collaborative-filtering methods would predict that the user will give item B a high rating. With this approach, predictions are made based on co-occurrence statistics, and most methods assume that the missing ratings are missing at random. Unfortunately, researchers have shown that sampled ratings have markedly different properties from the users' true preferences [21, 22]. Sampling is heavily influenced by social bias, which results in more missing ratings in some cases than others. This non-random pattern of missing and observed rating data is a potential source of unfairness.

For the purpose of improving recommendation accuracy, there are collaborative-filtering models [2, 21, 25] that use side information to address the problem of imbalanced data, but in this work, to test the properties and effectiveness of our metrics, we focus on the basic matrix-factorization algorithm first. Investigating how these other models could reduce unfairness is one direction for future research.

Throughout the paper, we consider a running example of unfair recommendation. We consider recommendation in education, and unfairness that may occur in areas with current gender imbalance, such as science, technology, engineering, and mathematics (STEM) topics. Due to societal and cultural influences, fewer female students currently choose careers in STEM. For example, in 2010, women accounted for only 18% of the bachelor's degrees awarded in computer science [3]. The underrepresentation of women causes historical rating data of computer-science courses to be dominated by men. Consequently, the learned model may underestimate women's preferences and be biased toward men. We consider the setting in which, even if the ratings provided by students accurately reflect their true preferences, the bias in which ratings are reported leads to unfairness.

The remainder of the paper is organized as follows. First, we review relevant previous work in Section 2. In Section 3, we formalize the recommendation problem and introduce four new unfairness metrics, with justifications and examples. In Section 4, we show that unfairness grows as data becomes more imbalanced, and we present results that successfully minimize each form of unfairness. Finally, Section 5 concludes the paper and proposes possible future work.

2 Related Work

As machine learning is more widely applied in modern society, researchers have begun identifying the criticality of algorithmic fairness. Various studies have considered algorithmic fairness in problems such as supervised classification [20, 23, 28]. When aiming to protect algorithms from treating people differently for prejudicial reasons, removing sensitive features (e.g., gender, race, or age) can help alleviate unfairness but is often insufficient. Features are often correlated, so other unprotected attributes can be related to the sensitive features and therefore still cause the model to be biased [17, 29]. Moreover, in problems such as collaborative filtering, algorithms do not directly consider measured features and instead infer latent user attributes from their behavior.

Another frequently practiced strategy for encouraging fairness is to enforce demographic parity, which is to achieve statistical parity among groups. The goal is to ensure that the overall proportion of members in the protected group receiving positive (or negative) classifications is identical to the proportion of the population as a whole [29]. For example, in the case of a binary decision Ŷ ∈ {0, 1} and a binary protected attribute A ∈ {0, 1}, this constraint can be formalized as [9]

Pr{Ŷ = 1 | A = 0} = Pr{Ŷ = 1 | A = 1}.   (1)

Kamishima et al. [13–17] evaluate model fairness based on this non-parity unfairness concept, or try to solve the unfairness issue in recommender systems by adding a regularization term that enforces demographic parity. The objective penalizes the differences among the average predicted ratings of user groups. However, demographic parity is only appropriate when preferences are unrelated to the sensitive features. In tasks such as recommendation, user preferences are indeed influenced by sensitive features such as gender, race, and age [4, 6].
Therefore, enforcing demographic parity may significantly damage the quality of recommendations. To address this shortcoming of demographic parity, Hardt et al. [9] propose to measure unfairness with the true positive rate and true negative rate. This idea encourages what they refer to as equal opportunity and no longer relies on the implicit assumption of demographic parity that the target variable is independent of the sensitive features. In a binary setting, given a decision Ŷ ∈ {0, 1}, a protected attribute A ∈ {0, 1}, and the true label Y ∈ {0, 1}, the constraints are [9]

Pr{Ŷ = 1 | A = 0, Y = y} = Pr{Ŷ = 1 | A = 1, Y = y},   y ∈ {0, 1}.   (2)

This constraint upholds fairness while respecting group differences. It penalizes models that only perform well on the majority groups. This idea is also the basis of the unfairness metrics we propose for recommendation.

Our running example of recommendation in education is inspired by the recent interest in using algorithms in this domain [5, 24, 27].
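To make these two criteria concrete, the following minimal sketch (an illustration rather than code from the paper; it assumes NumPy arrays of binary predictions, protected attributes, and labels, with hypothetical function names) computes the demographic-parity gap of Eq. (1) and the per-label gaps of Eq. (2):

```python
import numpy as np

def parity_gap(y_hat, a):
    """Demographic-parity gap |Pr{Y_hat = 1 | A = 0} - Pr{Y_hat = 1 | A = 1}| (Eq. 1)."""
    return abs(y_hat[a == 0].mean() - y_hat[a == 1].mean())

def equalized_odds_gaps(y_hat, a, y):
    """Gap in Pr{Y_hat = 1 | A, Y = y} between the two groups, for each y in {0, 1} (Eq. 2)."""
    gaps = {}
    for label in (0, 1):
        rate_a0 = y_hat[(a == 0) & (y == label)].mean()
        rate_a1 = y_hat[(a == 1) & (y == label)].mean()
        gaps[label] = abs(rate_a0 - rate_a1)
    return gaps

# Toy usage on random binary data.
rng = np.random.default_rng(0)
y_hat = rng.integers(0, 2, size=1000)   # predicted decisions
a = rng.integers(0, 2, size=1000)       # protected attribute
y = rng.integers(0, 2, size=1000)       # true labels
print(parity_gap(y_hat, a), equalized_odds_gaps(y_hat, a, y))
```

A model satisfies the respective criterion when the corresponding gap is approximately zero; the metrics proposed in Section 3.3 follow the same error-conditioned spirit as Eq. (2), but for real-valued rating prediction rather than binary classification.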

Student decisions about which courses to study can have significant impacts on their lives, so the usage of algorithmic recommendation in this setting has consequences that will affect society for generations. Coupling the importance of this application with the issue of gender imbalance in STEM [1] and challenges in the retention of students with backgrounds underrepresented in STEM [8, 26], we find this setting a serious motivation to advance scientific understanding of unfairness, and of methods to reduce unfairness, in recommendation.

3 Fairness Objectives for Collaborative Filtering

This section introduces fairness objectives for collaborative filtering. We begin by reviewing the matrix-factorization method. We then describe the various fairness objectives we consider, providing formal definitions and discussion of their motivations.

3.1 Matrix Factorization for Recommendation

We consider the task of collaborative filtering using matrix factorization [19]. We have a set of users indexed from 1 to m and a set of items indexed from 1 to n. For the ith user, let g_i be a variable indicating which group the ith user belongs to. For example, it may indicate whether user i identifies as a woman, a man, or with a non-binary gender identity. For the jth item, let h_j indicate the item group that it belongs to. For example, h_j may represent the genre of a movie or the topic of a course. Let r_ij be the preference score of the ith user for the jth item. The ratings can be viewed as entries in a rating matrix R.

The matrix-factorization formulation builds on the assumption that each rating can be represented as the product of vectors representing the user and item. With additional bias terms for users and items, this assumption can be summarized as

r_ij ≈ p_i⊤ q_j + u_i + v_j,   (3)

where p_i is a d-dimensional vector representing the ith user, q_j is a d-dimensional vector representing the jth item, and u_i and v_j are scalar bias terms for the user and item, respectively. The matrix-factorization learning algorithm seeks to learn these parameters from observed ratings X, typically by minimizing a regularized, squared reconstruction error:

J(P, Q, u, v) = (λ/2) (||P||_F² + ||Q||_F²) + (1/|X|) Σ_{(i,j)∈X} (y_ij − r_ij)²,   (4)

where u and v are the vectors of bias terms, ||·||_F denotes the Frobenius norm, and

y_ij = p_i⊤ q_j + u_i + v_j.   (5)

Strategies for minimizing this non-convex objective are well studied, and a general approach is to compute the gradient and use a gradient-based optimizer. In our experiments, we use the Adam algorithm [18], which combines adaptive learning rates with momentum.
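For reference, here is a minimal NumPy sketch of this model and training procedure (an illustration under stated assumptions: plain full-batch gradient descent instead of the Adam optimizer used in the paper, hypothetical function names, and small default hyperparameters):

```python
import numpy as np

def train_mf(ratings, m, n, d=2, lam=1e-3, lr=0.1, iters=250, seed=0):
    """Matrix factorization with user/item biases (Eqs. 3-5), trained by
    full-batch gradient descent on the objective of Eq. (4).

    ratings: list of observed entries (i, j, r_ij), i.e., the set X.
    Returns P (m x d), Q (n x d), and bias vectors u (m,), v (n,).
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((m, d))
    Q = 0.1 * rng.standard_normal((n, d))
    u, v = np.zeros(m), np.zeros(n)
    I = np.array([i for i, _, _ in ratings])
    J = np.array([j for _, j, _ in ratings])
    R = np.array([r for _, _, r in ratings], dtype=float)
    for _ in range(iters):
        err = (P[I] * Q[J]).sum(axis=1) + u[I] + v[J] - R       # y_ij - r_ij
        gP, gQ = lam * P, lam * Q                               # regularization gradients
        gu, gv = np.zeros(m), np.zeros(n)
        np.add.at(gP, I, (2.0 / len(R)) * err[:, None] * Q[J])  # dJ / dp_i
        np.add.at(gQ, J, (2.0 / len(R)) * err[:, None] * P[I])  # dJ / dq_j
        np.add.at(gu, I, (2.0 / len(R)) * err)                  # dJ / du_i
        np.add.at(gv, J, (2.0 / len(R)) * err)                  # dJ / dv_j
        P -= lr * gP
        Q -= lr * gQ
        u -= lr * gu
        v -= lr * gv
    return P, Q, u, v

def predict(P, Q, u, v):
    """Full matrix of predictions y_ij = p_i^T q_j + u_i + v_j (Eq. 5)."""
    return P @ Q.T + u[:, None] + v[None, :]
```

The fairness penalties described in Section 3.3 can be added to this objective; in the paper, the augmented objectives are optimized with Adam rather than the fixed-step updates sketched above.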
3.2 Unfair Recommendations from Underrepresentation

In this section, we describe a process through which matrix factorization leads to unfair recommendations, even when rating data accurately reflects users' true preferences. Such unfairness can occur with imbalanced data. We identify two forms of underrepresentation: population imbalance and observation bias. We later demonstrate that either leads to unfair recommendation, and that both forms together lead to worse unfairness. In our discussion, we use a running example of course recommendation, highlighting the effects of underrepresentation in STEM education.

Population imbalance occurs when different types of users occur in the dataset with varied frequencies. For example, we consider four types of users defined by two aspects. First, each individual identifies with a gender. For simplicity, we only consider binary gender identities, though in this example it would also be appropriate to consider men as one gender group and women and all non-binary gender identities as the second group. Second, each individual is either someone who enjoys and would excel in STEM topics or someone who does not. Population imbalance occurs in STEM education when, because of systemic bias or other societal problems, there may be significantly fewer women who succeed in STEM (WS) than women who do not (W), and, because of converse societal unfairness, more men who succeed in STEM (MS) than men who do not (M). This four-way separation of user groups is not available to the recommender system, which instead may only know the gender group of each user, but not their proclivity for STEM.

Observation bias is a related but distinct form of data imbalance, in which certain types of users may have different tendencies to rate different types of items. This bias is often part of a feedback loop involving existing methods of recommendation, whether by algorithms or by humans. If an individual is never recommended a particular item, they will likely never provide rating data for that item. Therefore, algorithms will never be able to directly learn about this preference relationship. In the education example, if women are rarely recommended to take STEM courses, there may be significantly less training data about women in STEM courses.

We simulate these two types of data bias with two stochastic block models [11]. We create one block model that determines the probability that an individual in a particular user group likes an item in a particular item group. The group ratios may be non-uniform, leading to population imbalance. We then use a second block model to determine the probability that an individual in a user group rates an item in an item group. Non-uniformity in the second block model will lead to observation bias. Formally, let matrix L ∈ [0, 1]^{g×h} be the block-model parameters for rating probability. For the ith user and the jth item, the probability of r_ij = +1 is L_{(g_i, h_j)}, and otherwise r_ij = −1. Moreover, let O ∈ [0, 1]^{g×h} be such that the probability of observing r_ij is O_{(g_i, h_j)}.
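This two-block-model sampling process can be sketched as follows (an illustration only: the helper and the example parameter values are hypothetical, not the block-model entries used in the paper's experiments):

```python
import numpy as np

def sample_block_model(L, O, user_groups, item_groups, seed=0):
    """Sample a full rating matrix and an observation mask from the two block models.

    L[g, h]: probability that a user in group g rates an item in group h as +1
             (otherwise the rating is -1).
    O[g, h]: probability that the rating r_ij is observed at all.
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(user_groups)                    # group index of each user
    h = np.asarray(item_groups)                    # group index of each item
    p_like = L[g][:, h]                            # P(r_ij = +1) = L[g_i, h_j]
    p_obs = O[g][:, h]                             # P((i, j) observed) = O[g_i, h_j]
    R = np.where(rng.random(p_like.shape) < p_like, 1.0, -1.0)
    observed = rng.random(p_obs.shape) < p_obs
    return R, observed

# Illustrative parameters (two user groups, two item groups), not the paper's values.
L = np.array([[0.8, 0.2],
              [0.2, 0.8]])
O = np.array([[0.4, 0.1],
              [0.4, 0.4]])
user_groups = np.repeat([0, 1], [300, 100])        # population imbalance
item_groups = np.repeat([0, 1], [150, 150])
R, observed = sample_block_model(L, O, user_groups, item_groups)
```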
3.3 Fairness Metrics

In this section, we present four new unfairness metrics for preference prediction, all measuring a discrepancy between the prediction behavior for disadvantaged users and advantaged users. Each metric captures a different type of unfairness that may have different consequences. We describe the mathematical formulation of each metric, its justification, and examples of consequences the metric may indicate. We consider a binary group feature and refer to disadvantaged and advantaged groups, which may represent women and men in our education example.

The first metric is value unfairness, which measures inconsistency in signed estimation error across the user types, computed as

U_val = (1/n) Σ_{j=1}^{n} | (E_g[y]_j − E_g[r]_j) − (E_{¬g}[y]_j − E_{¬g}[r]_j) |,   (6)

where E_g[y]_j is the average predicted score for the jth item from disadvantaged users, E_{¬g}[y]_j is the average predicted score for advantaged users, and E_g[r]_j and E_{¬g}[r]_j are the average ratings for the disadvantaged and advantaged users, respectively. Precisely, the quantity E_g[y]_j is computed as

E_g[y]_j := (1 / |{i : ((i, j) ∈ X) ∧ g_i}|) Σ_{i : ((i, j) ∈ X) ∧ g_i} y_ij,   (7)

and the other averages are computed analogously.

Value unfairness occurs when one class of user is consistently given higher or lower predictions than their true preferences. If the errors in prediction are evenly balanced between overestimation and underestimation, or if both classes of users have the same direction and magnitude of error, the value unfairness becomes small. Value unfairness becomes large when predictions for one class are consistently overestimated and predictions for the other class are consistently underestimated. For example, in a course recommender, value unfairness may manifest in male students being recommended STEM courses even when they are not interested in STEM topics, while female students are not recommended STEM courses even if they are interested in them.

The second metric is absolute unfairness, which measures inconsistency in absolute estimation error across user types, computed as

U_abs = (1/n) Σ_{j=1}^{n} | |E_g[y]_j − E_g[r]_j| − |E_{¬g}[y]_j − E_{¬g}[r]_j| |.   (8)

Absolute unfairness is unsigned, so it captures a single statistic representing the quality of prediction for each user type. If one user type has small reconstruction error and the other user type has large reconstruction error, one type of user has the unfair advantage of good recommendation, while the other user type has poor recommendation.

In contrast to value unfairness, absolute unfairness does not consider the direction of error. For example, if female students are given predictions 0.5 points below their true preferences and male students are given predictions 0.5 points above their true preferences, there is no absolute unfairness. Conversely, if female students are given ratings that are off by 2 points in either direction while male students are rated within 1 point of their true preferences, absolute unfairness is high, while value unfairness may be low.

The third metric is underestimation unfairness, which measures inconsistency in how much the predictions underestimate the true ratings:

U_under = (1/n) Σ_{j=1}^{n} | max{0, E_g[r]_j − E_g[y]_j} − max{0, E_{¬g}[r]_j − E_{¬g}[y]_j} |.   (9)

Underestimation unfairness is important in settings where missing recommendations are more critical than extra recommendations. For example, underestimation could lead to a top student not being recommended to explore a topic they would excel in.

Conversely, the fourth new metric is overestimation unfairness, which measures inconsistency in how much the predictions overestimate the true ratings:

U_over = (1/n) Σ_{j=1}^{n} | max{0, E_g[y]_j − E_g[r]_j} − max{0, E_{¬g}[y]_j − E_{¬g}[r]_j} |.   (10)

Overestimation unfairness may be important in settings where users may be overwhelmed by recommendations, so providing too many recommendations would be especially detrimental. For example, if users must invest large amounts of time to evaluate each recommended item, overestimation essentially costs the user time. Thus, uneven amounts of overestimation could cost one type of user more time than the other.

Finally, a non-parity unfairness measure based on the regularization term introduced by Kamishima et al. [17] can be computed as the absolute difference between the overall average predicted ratings of disadvantaged users and those of advantaged users:

U_par = | E_g[y] − E_{¬g}[y] |.

Each of these metrics has a straightforward subgradient and can be optimized by various subgradient optimization techniques. We augment the learning objective by adding a smoothed variation of a fairness metric based on the Huber loss [12], where the outer absolute value is replaced with the squared difference when it is less than 1. We solve for a local minimum of

min_{P, Q, u, v}  J(P, Q, u, v) + U.   (11)

The smoothed penalty helps reduce discontinuities in the objective, making optimization more efficient. It is also straightforward to add a scalar trade-off term to weight the fairness penalty against the loss. In our experiments, we use equal weighting, so we omit the term from Eq. (11).
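A direct NumPy sketch of these metrics (illustrative; it assumes a full predicted matrix Y, a matrix R of true or expected ratings, a boolean mask of the entries over which averages are taken, and a boolean vector marking disadvantaged users, with the per-item averages following Eq. (7)):

```python
import numpy as np

def group_item_means(M, mask, disadvantaged):
    """Per-item averages E_g[.]_j and E_{~g}[.]_j over the masked entries (Eq. 7)."""
    def masked_mean(rows):
        num = (M[rows] * mask[rows]).sum(axis=0)
        den = mask[rows].sum(axis=0)
        return np.where(den > 0, num / np.maximum(den, 1), np.nan)
    return masked_mean(disadvantaged), masked_mean(~disadvantaged)

def unfairness_metrics(Y, R, mask, disadvantaged):
    """Value, absolute, under-, and overestimation unfairness (Eqs. 6, 8-10), plus non-parity."""
    Ey_g, Ey_a = group_item_means(Y, mask, disadvantaged)   # predicted scores
    Er_g, Er_a = group_item_means(R, mask, disadvantaged)   # true ratings
    return {
        "value": np.nanmean(np.abs((Ey_g - Er_g) - (Ey_a - Er_a))),
        "absolute": np.nanmean(np.abs(np.abs(Ey_g - Er_g) - np.abs(Ey_a - Er_a))),
        "under": np.nanmean(np.abs(np.maximum(0, Er_g - Ey_g) - np.maximum(0, Er_a - Ey_a))),
        "over": np.nanmean(np.abs(np.maximum(0, Ey_g - Er_g) - np.maximum(0, Ey_a - Er_a))),
        "non-parity": abs(Y[disadvantaged].mean() - Y[~disadvantaged].mean()),
    }
```

For training, the outer absolute values would be replaced by the Huber-smoothed version described above before adding the chosen metric U to Eq. (11); the sketch computes the unsmoothed evaluation metrics only.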

4 Experiments

We run experiments on synthetic data based on the simulated course-recommendation scenario and on real movie-rating data [10]. For each experiment, we investigate whether the learning objectives augmented with unfairness penalties successfully reduce unfairness.

4.1 Synthetic Data

In our synthetic experiments, we generate simulated course-recommendation data from a block model as described in Section 3.2. We consider four user groups g ∈ {W, WS, M, MS} and three item groups h ∈ {Fem, STEM, Masc}. The user groups can be thought of as women who do not enjoy STEM topics (W), women who do enjoy STEM topics (WS), men who do not enjoy STEM topics (M), and men who do (MS). The item groups can be thought of as courses that tend to appeal to most women (Fem), STEM courses, and courses that tend to appeal to most men (Masc). Based on these groups, we consider a 4 × 3 rating block model L whose rows are indexed by the user groups W, WS, MS, and M and whose columns are indexed by the item groups Fem, STEM, and Masc. We also consider two observation block models: one with uniform observation probability across all groups, O_uni = [0.4]^{4×3}, and one with unbalanced observation probabilities, O_bias, inspired by how students are often encouraged to take certain courses.

We define two different user group distributions: one in which each of the four groups is exactly a quarter of the population, and an imbalanced setting where 0.4 of the population is in W, 0.1 in WS, 0.4 in MS, and 0.1 in M. This heavy imbalance is inspired by some of the severe gender imbalances in certain STEM areas today. For each experiment, we select an observation matrix and user group distribution, generate 400 users and 300 items, and sample preferences and observations of those preferences from the block models. Training on these ratings, we evaluate on the remaining entries of the rating matrix, comparing the predicted rating against the true expected rating, 2 L_{(g_i, h_j)} − 1.

4.1.1 Unfairness from different types of underrepresentation

Using standard matrix factorization, we measure the various unfairness metrics under the different sampling conditions. We average over five random trials and plot the average scores in Fig. 1. We label the settings as follows: uniform user groups and uniform observation probabilities (U), uniform groups and biased observation probabilities (O), biased user group populations and uniform observations (P), and biased populations and biased observations (O+P).

[Figure 1: Average unfairness scores for standard matrix factorization on synthetic data generated from different underrepresentation schemes. For each metric (reconstruction error, value, absolute, underestimation, overestimation, and non-parity), the four sampling schemes are uniform (U), biased observations (O), biased populations (P), and both biases (O+P). The reconstruction error and the first four unfairness metrics follow the same trend, while non-parity exhibits different behavior.]

The statistics demonstrate that each type of underrepresentation contributes to various forms of unfairness. For all metrics except non-parity, there is a strict order of unfairness: uniform data is the most fair; biased observations is the next most fair; biased populations is worse; and biasing both the populations and the observations causes the most unfairness. The squared rating error also follows this same trend. In contrast, non-parity behaves differently, in that it is heavily amplified by biased observations but seems unaffected by biased populations. Note that although non-parity is high when the observations are imbalanced, one should actually expect non-parity in the labeled ratings because of that imbalance, so a high non-parity score does not necessarily indicate an unfair situation. The other unfairness metrics, on the other hand, describe examples of unfair behavior by the rating predictor. These tests verify that unfairness can occur with imbalanced populations or observations, even when the measured ratings accurately represent user preferences.
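Putting the pieces together, the synthetic pipeline behind this comparison can be sketched with the hypothetical helpers defined earlier (sample_block_model, train_mf, predict, and unfairness_metrics); the evaluation compares held-out predictions against the expected rating 2 L[g_i, h_j] − 1:

```python
import numpy as np

def run_setting(L, O, user_groups, item_groups, disadvantaged, seed=0):
    """Sample data for one setting, train matrix factorization, and evaluate."""
    R, observed = sample_block_model(L, O, user_groups, item_groups, seed=seed)
    obs = [(i, j, R[i, j]) for i, j in zip(*np.nonzero(observed))]
    P, Q, u, v = train_mf(obs, m=len(user_groups), n=len(item_groups), d=2)
    Y = predict(P, Q, u, v)
    expected = 2.0 * L[np.asarray(user_groups)][:, np.asarray(item_groups)] - 1.0
    held_out = ~observed                                  # evaluate on unseen entries
    error = np.mean((Y[held_out] - expected[held_out]) ** 2)
    return error, unfairness_metrics(Y, expected, held_out, disadvantaged)
```

Running this sketch under each of the four sampling schemes (U, O, P, O+P) and averaging over random seeds mirrors the comparison summarized in Fig. 1.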

[Table 1: Average error and unfairness metrics for synthetic data using different fairness objectives. The best scores, and those that are statistically indistinguishable from the best, are printed in bold. Each row represents a different unfairness penalty (none, value, absolute, underestimation, overestimation, and non-parity), and each column is the measured metric (error, value, absolute, underestimation, overestimation, and non-parity) on the expected value of unseen ratings.]

4.1.2 Optimization of unfairness metrics

As before, we generate rating data using the block model under the most imbalanced setting: the user populations are imbalanced, and the sampling rate is skewed. We provide the sampled ratings to the matrix-factorization algorithms and evaluate on the remaining entries of the expected rating matrix. We again use two-dimensional vectors to represent the users and items, a regularization weight of λ = 10⁻³, and optimize for 250 iterations using the full gradient. We generate three datasets each and measure the squared reconstruction error and the six unfairness metrics. The results are listed in Table 1. For each metric, we print in bold the best average score and any scores that are not statistically significantly distinct from it according to paired t-tests.

The results indicate that the learning algorithm successfully minimizes the unfairness penalties, generalizing to unseen, held-out user-item pairs, and that reducing any unfairness metric does not lead to a significant increase in reconstruction error. The complexity of computing the unfairness metrics is similar to that of the error computation, which is linear in the number of ratings, so adding the fairness term approximately doubles the training time. In our implementation, learning with fairness terms takes somewhat longer still, because loops and backpropagation introduce extra overhead; for example, with synthetic data of 400 users and 300 items, training a matrix-factorization model with the value-unfairness term takes measurably longer than training one without any unfairness term.
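The bold-marking rule used for Table 1 can be sketched as follows (illustrative; it assumes SciPy's paired t-test, lower-is-better scores, and a placeholder significance threshold alpha):

```python
import numpy as np
from scipy import stats

def indistinguishable_from_best(trial_scores, alpha=0.05):
    """trial_scores: dict mapping objective name -> array of per-trial scores (lower is better).
    Returns the names whose mean is best, or not significantly worse than the best,
    according to paired t-tests at the placeholder threshold alpha."""
    means = {name: np.mean(scores) for name, scores in trial_scores.items()}
    best = min(means, key=means.get)
    keep = {best}
    for name, scores in trial_scores.items():
        if name != best:
            _, p_value = stats.ttest_rel(scores, trial_scores[best])
            if p_value > alpha:
                keep.add(name)
    return keep
```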
While optimizing each metric leads to improved performance on itself (see the highlighted entries in Table 1), a few trends are worth noting. Optimizing any of our new unfairness metrics almost always reduces the other forms of unfairness. An exception is that optimizing absolute unfairness leads to an increase in underestimation. Value unfairness is closely related to underestimation and overestimation: optimizing value unfairness is even more effective at reducing underestimation and overestimation than optimizing them directly. Also, optimizing value unfairness and overestimation is more effective at reducing absolute unfairness than optimizing it directly. Finally, optimizing parity unfairness leads to increases in all unfairness metrics except absolute unfairness and parity itself. These relationships among the metrics suggest that practitioners need to decide which types of fairness are most important for their applications.

4.2 Real Data

We use the Movielens Million dataset [10], which contains ratings (from 1 to 5) by 6,040 users of 3,883 movies. The users are annotated with demographic variables, including gender, and the movies are each annotated with a set of genres. We manually selected genres that feature different forms of gender imbalance and only consider movies that list these genres. We then filter the users to only consider those who rated at least 50 of the selected movies. The genres we selected are action, crime, musical, romance, and sci-fi; we selected them because they each have a noticeable gender effect in the data. Women rate musical and romance films higher and more frequently than men. Women and men both score action, crime, and sci-fi films about equally, but men rate these films much more frequently. Table 2 lists these statistics in detail. After filtering by genre and rating frequency, we have 2,953 users and 1,006 movies in the dataset.
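A preprocessing sketch for this filtering step (illustrative, not the paper's code; it assumes the standard MovieLens 1M file layout with "::"-separated ratings.dat, movies.dat, and users.dat, and uses pandas):

```python
import pandas as pd

GENRES = {"Action", "Crime", "Musical", "Romance", "Sci-Fi"}

ratings = pd.read_csv("ml-1m/ratings.dat", sep="::", engine="python",
                      names=["user", "movie", "rating", "timestamp"])
movies = pd.read_csv("ml-1m/movies.dat", sep="::", engine="python",
                     names=["movie", "title", "genres"], encoding="latin-1")
users = pd.read_csv("ml-1m/users.dat", sep="::", engine="python",
                    names=["user", "gender", "age", "occupation", "zip"])

# Keep only movies that list at least one of the selected genres.
selected = movies[movies["genres"].str.split("|").apply(lambda gs: bool(GENRES & set(gs)))]
ratings = ratings[ratings["movie"].isin(selected["movie"])]

# Keep only users with at least 50 ratings among the selected movies.
counts = ratings.groupby("user").size()
ratings = ratings[ratings["user"].isin(counts[counts >= 50].index)]

# Attach the gender attribute, which plays the role of the group variable g_i.
ratings = ratings.merge(users[["user", "gender"]], on="user")
print(ratings["user"].nunique(), "users,", ratings["movie"].nunique(), "movies")
```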

[Table 2: Gender-based statistics of movie genres in the Movielens data. For each of the selected genres (romance, action, sci-fi, musical, and crime), the table lists the movie count, the number of ratings per female user, the number of ratings per male user, the average rating by women, and the average rating by men.]

[Table 3: Average error and unfairness metrics for movie-rating data using different fairness objectives. As in Table 1, each row is an unfairness penalty (none, value, absolute, underestimation, overestimation, and non-parity), and each column is the measured metric (error, value, absolute, underestimation, overestimation, and non-parity).]

We run five trials in which we randomly split the ratings into training and testing sets, train each objective function on the training set, and evaluate each metric on the testing set. The average scores are listed in Table 3, where bold scores again indicate being statistically indistinguishable from the best average score. On real data, the results show that optimizing each unfairness metric leads to the best performance on that metric without a significant change in the reconstruction error. As in the synthetic data, optimizing value unfairness leads to the largest decrease in under- and overestimation. Optimizing non-parity again causes an increase, or no change, in almost all the other unfairness metrics.

5 Conclusion

In this paper, we discussed various types of unfairness that can occur in collaborative filtering. We demonstrated that these forms of unfairness can occur even when the observed rating data is correct, in the sense that it accurately reflects the preferences of the users. We identified two forms of data bias that can lead to such unfairness. We then demonstrated that augmenting matrix-factorization objectives with these unfairness metrics as penalty functions enables a learning algorithm to minimize each of them. Our experiments on synthetic and real data show that minimizing these forms of unfairness is possible with no significant increase in reconstruction error.

We also demonstrated a combined objective that penalizes both overestimation and underestimation; minimizing this objective leads to small unfairness penalties for the other forms of unfairness, so the combined objective may be a good approach for practitioners. However, no single objective was the best for all unfairness metrics, so it remains necessary for practitioners to consider precisely which form of fairness is most important in their application and optimize that specific objective.

Future Work

While our work in this paper focused on improving fairness among users, so that the model treats different groups of users fairly, we did not address fair treatment of different item groups.
The model could be biased toward certain items, e.g., performing better at prediction for some items than for others, in terms of accuracy or of over- and underestimation. Achieving fairness for both users and items may be important when considering that the items may also suffer from discrimination or bias, for example when courses are taught by instructors with different demographics.

Our experiments demonstrate that minimizing empirical unfairness generalizes, but this generalization depends on data density. When ratings are especially sparse, the empirical fairness does not always generalize well to held-out predictions. In future work, we are investigating methods that are more robust to data sparsity.

Moreover, our fairness metrics assume that users rate items according to their true preferences. This assumption is likely to be violated in real data, since ratings can also be influenced by various environmental factors. For example, in education, a student's rating for a course also depends on whether the course has an inclusive and welcoming learning environment. However, addressing this type of bias may require additional information or external interventions beyond the provided rating data.

Finally, we are investigating methods to reduce unfairness by directly modeling the two-stage sampling process we used to generate synthetic, biased data. We hypothesize that by explicitly modeling the rating and observation probabilities as separate variables, we may be able to derive a principled, probabilistic approach to address these forms of data imbalance.

References

[1] D. N. Beede, T. A. Julian, D. Langdon, G. McKittrick, B. Khan, and M. E. Doms. Women in STEM: A gender gap to innovation. U.S. Department of Commerce, Economics and Statistics Administration.
[2] A. Beutel, E. H. Chi, Z. Cheng, H. Pham, and J. Anderson. Beyond globally optimal: Focused learning for improved recommendations. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee.
[3] S. Broad and M. McGee. Recruiting women into computer science and information systems. In Proceedings of the Association Supporting Computer Users in Education Annual Conference, pages 29–40.
[4] O. Chausson. Who watches what? Assessing the impact of gender and personality on film preferences.
[5] M.-I. Dascalu, C.-N. Bodea, M. N. Mihailescu, E. A. Tanase, and P. Ordoñez de Pablos. Educational recommender systems and their application in lifelong learning. Behaviour & Information Technology, 35(4).
[6] T. N. Daymont and P. J. Andrisani. Job preferences, college major, and the gender gap in earnings. Journal of Human Resources.
[7] M. D. Ekstrand, J. T. Riedl, J. A. Konstan, et al. Collaborative filtering recommender systems. Foundations and Trends in Human-Computer Interaction, 4(2):81–173.
[8] A. L. Griffith. Persistence of women and minorities in STEM field majors: Is it the school that matters? Economics of Education Review, 29(6).
[9] M. Hardt, E. Price, N. Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems.
[10] F. M. Harper and J. A. Konstan. The Movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):19.
[11] P. W. Holland and S. Leinhardt. Local structure in social networks. Sociological Methodology, 7:1–45.
[12] P. J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics.
[13] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Enhancement of the neutrality in recommendation. In Decisions@RecSys, pages 8–14.
[14] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Efficiency improvement of neutrality-enhanced recommendation. In Decisions@RecSys, pages 1–8.
[15] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Correcting popularity bias by enhancing recommendation neutrality. In RecSys Posters.

[16] T. Kamishima, S. Akaho, H. Asoh, and I. Sato. Model-based approaches for independence-enhanced recommendation. In Data Mining Workshops (ICDMW), 2016 IEEE 16th International Conference on. IEEE.
[17] T. Kamishima, S. Akaho, and J. Sakuma. Fairness-aware learning through regularization approach. In 11th International Conference on Data Mining Workshops (ICDMW). IEEE.
[18] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint.
[19] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8).
[20] K. Lum and J. Johndrow. A statistical framework for fair predictive algorithms. arXiv preprint.
[21] B. Marlin, R. S. Zemel, S. Roweis, and M. Slaney. Collaborative filtering and the missing at random assumption. arXiv preprint.
[22] B. M. Marlin and R. S. Zemel. Collaborative prediction and ranking with non-random missing data. In Proceedings of the Third ACM Conference on Recommender Systems. ACM.
[23] D. Pedreshi, S. Ruggieri, and F. Turini. Discrimination-aware data mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.
[24] C. V. Sacin, J. B. Agapito, L. Shafti, and A. Ortigosa. Recommendation in higher education using data mining techniques. In Educational Data Mining.
[25] S. Sahebi and P. Brusilovsky. It takes two to tango: An exploration of domain pairs for cross-domain collaborative filtering. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM.
[26] E. Smith. Women into science and engineering? Gendered participation in higher education STEM subjects. British Educational Research Journal, 37(6).
[27] N. Thai-Nghe, L. Drumond, A. Krohn-Grimberghe, and L. Schmidt-Thieme. Recommender system for predicting student performance. Procedia Computer Science, 1(2).
[28] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Fairness constraints: Mechanisms for fair classification. arXiv preprint.
[29] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning.


More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

arxiv: v1 [cs.ai] 28 Nov 2017

arxiv: v1 [cs.ai] 28 Nov 2017 : a better way of the parameters of a Deep Neural Network arxiv:1711.10177v1 [cs.ai] 28 Nov 2017 Guglielmo Montone Laboratoire Psychologie de la Perception Université Paris Descartes, Paris montone.guglielmo@gmail.com

More information

Type II Fuzzy Possibilistic C-Mean Clustering

Type II Fuzzy Possibilistic C-Mean Clustering IFSA-EUSFLAT Type II Fuzzy Possibilistic C-Mean Clustering M.H. Fazel Zarandi, M. Zarinbal, I.B. Turksen, Department of Industrial Engineering, Amirkabir University of Technology, P.O. Box -, Tehran, Iran

More information

Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model

Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model Delia North Temesgen Zewotir Michael Murray Abstract In South Africa, the Department of Education allocates

More information

Methods for Addressing Selection Bias in Observational Studies

Methods for Addressing Selection Bias in Observational Studies Methods for Addressing Selection Bias in Observational Studies Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA What is Selection Bias? In the regression

More information

Comparison of volume estimation methods for pancreatic islet cells

Comparison of volume estimation methods for pancreatic islet cells Comparison of volume estimation methods for pancreatic islet cells Jiří Dvořák a,b, Jan Švihlíkb,c, David Habart d, and Jan Kybic b a Department of Probability and Mathematical Statistics, Faculty of Mathematics

More information

Framework for Comparative Research on Relational Information Displays

Framework for Comparative Research on Relational Information Displays Framework for Comparative Research on Relational Information Displays Sung Park and Richard Catrambone 2 School of Psychology & Graphics, Visualization, and Usability Center (GVU) Georgia Institute of

More information

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp The Stata Journal (22) 2, Number 3, pp. 28 289 Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve

More information

Considerations on Fairness-aware Data Mining

Considerations on Fairness-aware Data Mining 2012 IEEE 12th International Conference on Data Mining Workshops Considerations on Fairness-aware Data Mining Toshihiro Kamishima,Shotaro Akaho, Hideki Asoh, and Jun Sakuma National Institute of Advanced

More information

Minority Report: ML Fairness in Criminality Prediction

Minority Report: ML Fairness in Criminality Prediction Minority Report: ML Fairness in Criminality Prediction Dominick Lim djlim@stanford.edu Torin Rudeen torinmr@stanford.edu 1. Introduction 1.1. Motivation Machine learning is used more and more to make decisions

More information

Rank Aggregation and Belief Revision Dynamics

Rank Aggregation and Belief Revision Dynamics Rank Aggregation and Belief Revision Dynamics Igor Volzhanin (ivolzh01@mail.bbk.ac.uk), Ulrike Hahn (u.hahn@bbk.ac.uk), Dell Zhang (dell.z@ieee.org) Birkbeck, University of London London, WC1E 7HX UK Stephan

More information

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015

More information

Cognitive modeling versus game theory: Why cognition matters

Cognitive modeling versus game theory: Why cognition matters Cognitive modeling versus game theory: Why cognition matters Matthew F. Rutledge-Taylor (mrtaylo2@connect.carleton.ca) Institute of Cognitive Science, Carleton University, 1125 Colonel By Drive Ottawa,

More information

Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data. Technical Report

Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data. Technical Report Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200 Union Street

More information

Reviewing Applicants. Research on Bias and Assumptions

Reviewing Applicants. Research on Bias and Assumptions Reviewing Applicants Research on Bias and Assumptions Weall like to think that we are objective scholars who judge people solely on their credentials and achievements, but copious research shows that every

More information

Introduction to Econometrics

Introduction to Econometrics Global edition Introduction to Econometrics Updated Third edition James H. Stock Mark W. Watson MyEconLab of Practice Provides the Power Optimize your study time with MyEconLab, the online assessment and

More information

ITEM-LEVEL TURST-BASED COLLABORATIVE FILTERING FOR RECOMMENDER SYSTEMS

ITEM-LEVEL TURST-BASED COLLABORATIVE FILTERING FOR RECOMMENDER SYSTEMS ITEM-LEVEL TURST-BASED COLLABORATIVE FILTERING FOR RECOMMENDER SYSTEMS Te- Min Chag Department of Information Management, National Sun Yat-sen University temin@mail.nsysu.edu.tw Wen- Feng Hsiao Department

More information

Applied Machine Learning, Lecture 11: Ethical and legal considerations; domain effects and domain adaptation

Applied Machine Learning, Lecture 11: Ethical and legal considerations; domain effects and domain adaptation Applied Machine Learning, Lecture 11: Ethical and legal considerations; domain effects and domain adaptation Richard Johansson including some slides borrowed from Barbara Plank overview introduction bias

More information

Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation Charles Hamesse 1, Paul Ackermann 2, Hedvig Kjellström 1, and Cheng Zhang 3 1 KTH Royal Institute of

More information

Empirical Formula for Creating Error Bars for the Method of Paired Comparison

Empirical Formula for Creating Error Bars for the Method of Paired Comparison Empirical Formula for Creating Error Bars for the Method of Paired Comparison Ethan D. Montag Rochester Institute of Technology Munsell Color Science Laboratory Chester F. Carlson Center for Imaging Science

More information

OHDSI Tutorial: Design and implementation of a comparative cohort study in observational healthcare data

OHDSI Tutorial: Design and implementation of a comparative cohort study in observational healthcare data OHDSI Tutorial: Design and implementation of a comparative cohort study in observational healthcare data Faculty: Martijn Schuemie (Janssen Research and Development) Marc Suchard (UCLA) Patrick Ryan (Janssen

More information

EPSE 594: Meta-Analysis: Quantitative Research Synthesis

EPSE 594: Meta-Analysis: Quantitative Research Synthesis EPSE 594: Meta-Analysis: Quantitative Research Synthesis Ed Kroc University of British Columbia ed.kroc@ubc.ca March 28, 2019 Ed Kroc (UBC) EPSE 594 March 28, 2019 1 / 32 Last Time Publication bias Funnel

More information

De-Biasing User Preference Ratings in Recommender Systems

De-Biasing User Preference Ratings in Recommender Systems Gediminas Adomavicius University of Minnesota Minneapolis, MN gedas@umn.edu De-Biasing User Preference Ratings in Recommender Systems Jesse Bockstedt University of Arizona Tucson, AZ bockstedt@email.arizona.edu

More information

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies PART 1: OVERVIEW Slide 1: Overview Welcome to Qualitative Comparative Analysis in Implementation Studies. This narrated powerpoint

More information

[1] provides a philosophical introduction to the subject. Simon [21] discusses numerous topics in economics; see [2] for a broad economic survey.

[1] provides a philosophical introduction to the subject. Simon [21] discusses numerous topics in economics; see [2] for a broad economic survey. Draft of an article to appear in The MIT Encyclopedia of the Cognitive Sciences (Rob Wilson and Frank Kiel, editors), Cambridge, Massachusetts: MIT Press, 1997. Copyright c 1997 Jon Doyle. All rights reserved

More information

Size Matters: the Structural Effect of Social Context

Size Matters: the Structural Effect of Social Context Size Matters: the Structural Effect of Social Context Siwei Cheng Yu Xie University of Michigan Abstract For more than five decades since the work of Simmel (1955), many social science researchers have

More information

arxiv: v2 [cs.lg] 30 Oct 2013

arxiv: v2 [cs.lg] 30 Oct 2013 Prediction of breast cancer recurrence using Classification Restricted Boltzmann Machine with Dropping arxiv:1308.6324v2 [cs.lg] 30 Oct 2013 Jakub M. Tomczak Wrocław University of Technology Wrocław, Poland

More information

An Improved Algorithm To Predict Recurrence Of Breast Cancer

An Improved Algorithm To Predict Recurrence Of Breast Cancer An Improved Algorithm To Predict Recurrence Of Breast Cancer Umang Agrawal 1, Ass. Prof. Ishan K Rajani 2 1 M.E Computer Engineer, Silver Oak College of Engineering & Technology, Gujarat, India. 2 Assistant

More information

An Overview and Comparative Analysis on Major Generative Models

An Overview and Comparative Analysis on Major Generative Models An Overview and Comparative Analysis on Major Generative Models Zijing Gu zig021@ucsd.edu Abstract The amount of researches on generative models has been grown rapidly after a period of silence due to

More information

Reviewing Applicants

Reviewing Applicants Reviewing Applicants Research on Bias and Assumptions We all like to think that we are objective scholars who judge people solely on their credentials and achievements, but copious research shows that

More information

Mathematical Structure & Dynamics of Aggregate System Dynamics Infectious Disease Models 2. Nathaniel Osgood CMPT 394 February 5, 2013

Mathematical Structure & Dynamics of Aggregate System Dynamics Infectious Disease Models 2. Nathaniel Osgood CMPT 394 February 5, 2013 Mathematical Structure & Dynamics of Aggregate System Dynamics Infectious Disease Models 2 Nathaniel Osgood CMPT 394 February 5, 2013 Recall: Kendrick-McKermack Model Partitioning the population into 3

More information