Chapter 12 (Section 12.4): Recommender Systems Second edition of the book, coming soon Road Map пЃ® пЃ® пЃ® Introduction Content-based recommendation Collaborative filtering based recommendation пЃ± пЃ± пЃ± K-nearest neighbor Association rules Matrix factorization CS583, Bing Liu, UIC 2 Introduction пЃ® пЃ® пЃ® Recommender systems are widely used on the Web for recommending products and services to users. Most e-commerce sites have such systems. These systems serve two important functions. пЃ± пЃ± They help users deal with the information overload by giving them recommendations of products, etc. They help businesses make more profits, i.e., selling more products. CS583, Bing Liu, UIC 3 E.g., movie recommendation пЃ® The most common scenario is the following: пЃ± пЃ± пЃ± A set of users has initially rated some subset of movies (e.g., on the scale of 1 to 5) that they have already seen. These ratings serve as the input. The recommendation system uses these known ratings to predict the ratings that each user would give to those not rated movies by him/her. Recommendations of movies are then made to each user based on the predicted ratings. CS583, Bing Liu, UIC 4 Different variations пЃ® In some applications, there is no rating information while in some others there are also additional attributes пЃ± пЃ± пЃ® about each user (e.g., age, gender, income, marital status, etc), and/or about each movie (e.g., title, genre, director, leading actors or actresses, etc). When no rating information, the system will not predict ratings but predict the likelihood that a user will enjoy watching a movie. CS583, Bing Liu, UIC 5 The Recommendation Problem пЃ® пЃ® We have a set of users U and a set of items S to be recommended to the users. Let p be an utility function that measures the usefulness of item s (пѓЋ S) to user u (пѓЋ U), i.e., пЃ± пЃ® p:UГ—S п‚® R, where R is a totally ordered set (e.g., non-negative integers or real numbers in a range) Objective пЃ± пЃ± Learn p based on the past data Use p to predict the utility value of each item s (пѓЋ S) to each user u (пѓЋ U) CS583, Bing Liu, UIC 6 As Prediction пЃ® Rating prediction, i.e., predict the rating score that a user is likely to give to an item that s/he has not seen or used before. E.g., пЃ± пЃ® rating on an unseen movie. In this case, the utility of item s to user u is the rating given to s by u. Item prediction, i.e., predict a ranked list of items that a user is likely to buy or use. CS583, Bing Liu, UIC 7 Two basic approaches пЃ® Content-based recommendations: пЃ± пЃ® Collaborative filtering (or collaborative recommendations): пЃ± пЃ® The user will be recommended items similar to the ones the user preferred in the past; The user will be recommended items that people with similar tastes and preferences liked in the past. Hybrids: Combine collaborative and contentbased methods. CS583, Bing Liu, UIC 8 Road Map пЃ® пЃ® пЃ® Introduction Content-based recommendation Collaborative filtering based recommendation пЃ± пЃ± пЃ± K-nearest neighbor Association rules Matrix factorization CS583, Bing Liu, UIC 9 Content-Based Recommendation пЃ® Perform item recommendations by predicting the utility of items for a particular user based on how вЂњsimilarвЂќ the items are to those that he/she liked in the past. E.g., пЃ± пЃ± In a movie recommendation application, a movie may be represented by such features as specific actors, director, genre, subject matter, etc. The userвЂ™s interest or preference is also represented by the same set of features, called the user profile. CS583, Bing Liu, UIC 10 Content-based recommendation (contd) пЃ® пЃ® пЃ® Recommendations are made by comparing the user profile with candidate items expressed in the same set of features. The top-k best matched or most similar items are recommended to the user. The simplest approach to content-based recommendation is to compute the similarity of the user profile with each item. CS583, Bing Liu, UIC 11 Road Map пЃ® пЃ® пЃ® Introduction Content-based recommendation Collaborative filtering based recommendations пЃ± пЃ± пЃ± K-nearest neighbor Association rules Matrix factorization CS583, Bing Liu, UIC 12 Collaborative filtering пЃ® Collaborative filtering (CF) is perhaps the most studied and also the most widely-used recommendation approach in practice. пЃ± пЃ± пЃ± пЃ® k-nearest neighbor, association rules based prediction, and matrix factorization Key characteristic of CF: it predicts the utility of items for a user based on the items previously rated by other like-minded users. CS583, Bing Liu, UIC 13 k-nearest neighbor пЃ® пЃ® kNN (which is also called the memory-based approach) utilizes the entire user-item database to generate predictions directly, i.e., there is no model building. This approach includes both пЃ± пЃ± User-based methods Item-based methods CS583, Bing Liu, UIC 14 User-based kNN CF пЃ® A user-based kNN collaborative filtering method consists of two primary phases: пЃ± пЃ± пЃ® the neighborhood formation phase and the recommendation phase. There are many specific methods for both. Here we only introduce one for each phase. CS583, Bing Liu, UIC 15 Neighborhood formation phase пЃ® пЃ® Let the record (or profile) of the target user be u (represented as a vector), and the record of another user be v (v пѓЋ T). The similarity between the target user, u, and a neighbor, v, can be calculated using the PearsonвЂ™s correlation coefficient: sim(u, v) пЂЅ пѓҐ пѓҐ iпѓЋC CS583, Bing Liu, UIC iпѓЋC (ru,i пЂ ru )(rv,i пЂ rv ) (ru,i пЂ ru ) 2 пѓҐ iпѓЋC (rv,i пЂ rv ) 2 , 16 Recommendation Phase пЃ® Use the following formula to compute the rating prediction of item i for target user u p(u, i) пЂЅ ru пѓҐ пЂ« sim(u, v) п‚ґ (rv,i пЂr v ) пѓҐvпѓЋV sim(u, v) vпѓЋV where V is the set of k similar users, rv,i is the rating of user v given to item i, CS583, Bing Liu, UIC 17 Issue with the user-based kNN CF пЃ® The problem with the user-based formulation of collaborative filtering is the lack of scalability: пЃ± пЃ® it requires the real-time comparison of the target user to all user records in order to generate predictions. A variation of this approach that remedies this problem is called item-based CF. CS583, Bing Liu, UIC 18 Item-based CF пЃ® The item-based approach works by comparing items based on their pattern of ratings across users. The similarity of items i and j is computed as follows: sim(i, j) пЂЅ CS583, Bing Liu, UIC пѓҐ uпѓЋU (ru,i пЂ ru )(ru, j пЂ ru ) 2 ( r пЂ r ) пѓҐuпѓЋU u,i u 2 ( r пЂ r ) пѓҐuпѓЋU u, j u 19 Recommendation phase пЃ® After computing the similarity between items we select a set of k most similar items to the target item and generate a predicted value of user uвЂ™s rating r пѓҐ p(u, i) пЂЅ пѓҐ jпѓЋJ u, j п‚ґ sim(i, j) sim ( i , j ) jпѓЋJ where J is the set of k similar items CS583, Bing Liu, UIC 20 Road Map пЃ® пЃ® пЃ® Introduction Content-based recommendation Collaborative filtering based recommendation пЃ± пЃ± пЃ± K-nearest neighbor Association rules Matrix factorization CS583, Bing Liu, UIC 21 Association rule-based CF пЃ® пЃ® пЃ® пЃ® Association rules obviously can be used for recommendation. Each transaction for association rule mining is the set of items bought by a particular user. We can find item association rules, e.g., buy_X, buy_Y -> buy_Z Rank items based on measures such as confidence, etc. пЃ± See Chapter 3 for details CS583, Bing Liu, UIC 22 Road Map пЃ® пЃ® пЃ® Introduction Content-based recommendation Collaborative filtering based recommendation пЃ± пЃ± пЃ± K-nearest neighbor Association rules Matrix factorization CS583, Bing Liu, UIC 23 Matrix factorization пЃ® The idea of matrix factorization is to decompose a matrix M into the product of several factor matrices, i.e., M пЂЅ F1F2 ...Fn where n can be any number, but it is usually 2 or 3. CS583, Bing Liu, UIC 24 CF using matrix factorization пЃ® пЃ® Matrix factorization has gained popularity for CF in recent years due to its superior performance both in terms of recommendation quality and scalability. Part of its success is due to the Netflix Prize contest for movie recommendation, which popularized a Singular Value Decomposition (SVD) based matrix factorization algorithm. пЃ± The prize winning method of the Netflix Prize Contest employed an adapted version of SVD CS583, Bing Liu, UIC 25 The abstract idea пЃ® Matrix factorization a latent factor model. Latent variables (also called features, aspects, or factors) are introduced to account for the underlying reasons of a user purchasing or using a product. пЃ± пЃ± When the connections between the latent variables and observed variables (user, product, rating, etc.) are estimated during the training recommendations can be made to users by computing their possible interactions with each product through the latent variables. CS583, Bing Liu, UIC 26 Netflix Prize Contest CS583, Bing Liu, UIC 27 Netflix Prize Task пЃ® Training data: Quadruples of the form (user, movie, rating, time) пЃ± For our purpose here, we only use triplets, i.e., (user, movie, rating) пЃ± пЃ® For example, (132456, 13546, 4) means that the user with ID 132456 gave the movie with ID 13546 a rating of 4 (out of 5). Testing: predict the rating of each triplet: (user, movie, ?) CS583, Bing Liu, UIC 28 SVD factorization пЃ® The technique discussed here is based on the SVD method given by пЃ± пЃ± пЃ± пЃ® Simon Funk at his blog site, the derivation of FunkвЂ™s method described by Wagman in the Netflix forums. the paper by Takacs et al. The method was later improved by Koren et al., Paterek and several other researchers. CS583, Bing Liu, UIC 29 Intuitive Idea CS583, Bing Liu, UIC 30 Simon FunkвЂ™s SVD method where U = [u1, u2, вЂ¦, uI] and M = [m1, m2, вЂ¦, mJ] CS583, Bing Liu, UIC 31 SVD method (contd) пЃ® пЃ® пЃ® Let us use K = 90 latent aspects (K needs to be set experimentally). Then, each movie will be described by only ninety aspect values indicating how much that movie exemplifies each aspect. Correspondingly, each user is also described by ninety aspect values indicating how much he/she prefers each aspect. CS583, Bing Liu, UIC 32 SVD method (contd) пЃ® To combine these together into a rating, we multiply each user preference by the corresponding movie aspect, and then sum them up to give a rating to indicate how much that user likes that movie: пЃ± пЃ® U = [u1, u2, вЂ¦, uI] and M = [m1, m2, вЂ¦, mJ] Using SVD, we can perform the task K rij п‚» ui m j пЂЅ пѓҐuki п‚ґ mkj T k пЂЅ1 CS583, Bing Liu, UIC 33 SVD method (contd) пЃ® пЃ® SVD is a mathematical way to find these two smaller matrices which minimizes the resulting approximation error, the mean square error (MSE). We can use the resulting matrices U and M to predict the ratings in the test set. CS583, Bing Liu, UIC 34 SVD method (contd) CS583, Bing Liu, UIC 35 SVD method (contd) пЃ® пЃ® To minimize the error, the gradient descent approach is used. For gradient descent, we take the partial derivative of the square error with respect to each parameter, i.e. with respect to each uki and mkj. п‚¶(eij )2 п‚¶eij пЂЅ 2eij п‚¶uki п‚¶uki CS583, Bing Liu, UIC 36 SVD method (contd) CS583, Bing Liu, UIC 37 SVD method (contd) CS583, Bing Liu, UIC 38 The final update rules пЃ® By the same reasoning, we can also compute the update rule for mkj. Finally, we have both rules пЃ® The final prediction uses Eq. (11) пЃ® CS583, Bing Liu, UIC 39 Further improvements пЃ® пЃ® пЃ® пЃ® пЃ® The two basic rules need some improvements to make them work well. There are also some pre-processing. Time was also added later. Etc Note: пЃ± пЃ± Funk used stochastic gradient descent Not the batch (global) gradient descent. CS583, Bing Liu, UIC 40

1/--страниц