
## Item Cold-Start Recommendations: Learning Local Collective Embeddings

### Citations

3222 | Modern Information Retrieval
- Baeza-Yates, Ribeiro-Neto
- 1999
Citation Context ...verting all tokens to lower case and removing numbers, stop-words and infrequent tokens (appearing < 5 times). Evaluation Metrics. The output of the algorithms is a ranking of the past recipients by how likely they are to be recipients of the new email. The feedback from the users is explicit, i.e., we have ground truth of who the recipients of the mail are, as specified by the user. To evaluate the ranking produced by each algorithm we use the state-of-the-art metrics from Information Retrieval: Micro and Macro F1, Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (NDCG) [6]. Evaluation Protocol. As the data is intrinsically influenced by time, we sort the 10 mailboxes chronologically. We divide the messages into 80% training and 20% testing, resulting in 10 independent train/test subsets. Only the recipients that appear in the training period are considered as potential receivers. We tune the hyper-parameters of the methods on an independent validation set, 10% of the training set. Finally, we evaluate the statistical significance of the differences in performance by using a Wilcoxon signed rank test [14]. Results. Figure 1 shows the average performance of each method... |
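The protocol above (chronological split into 80% training and 20% testing, with 10% of the training portion held out for validation, then a Wilcoxon signed rank test over per-mailbox scores) can be sketched as follows. The helper name and the per-mailbox MAP figures are hypothetical illustrations, not the paper's actual numbers:

```python
import numpy as np
from scipy.stats import wilcoxon

def chronological_split(messages, train_frac=0.8, val_frac=0.1):
    """Split an already time-sorted list of messages into train/val/test.

    Mirrors the protocol in the text: 80% train / 20% test, with 10% of
    the training portion held out as a validation set.
    """
    n = len(messages)
    n_train = int(n * train_frac)
    n_val = int(n_train * val_frac)
    train = messages[: n_train - n_val]
    val = messages[n_train - n_val : n_train]
    test = messages[n_train:]
    return train, val, test

# Hypothetical per-mailbox MAP scores for two methods (one value per mailbox).
map_lce = [0.41, 0.38, 0.45, 0.52, 0.36, 0.48, 0.44, 0.39, 0.50, 0.42]
map_base = [0.35, 0.33, 0.40, 0.47, 0.30, 0.41, 0.40, 0.36, 0.44, 0.37]

# Paired test across the 10 mailboxes, as in the evaluation protocol.
stat, p = wilcoxon(map_lce, map_base)
significant = p < 0.05
```

With ten paired samples the test is sensitive enough to detect consistent per-mailbox improvements even when the absolute differences are small.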

1663 |
Learning the parts of objects by nonnegative matrix factorization
- Lee, Seung
- 1999
Citation Context ...rization techniques have been used to discover topics in document collections by decomposing the content, i.e., the document-term matrix. Non-negative Matrix Factorization (NMF) is one such approach that factorizes the document-term matrix into two nonnegative, low-rank matrices, where one matrix corresponds to the topics in the collection and the other represents the extent to which documents belong to these topics. Due to the non-negativity constraints, NMF produces a so-called “additive parts-based” representation of the data that increases the sparsity and interpretability of the hidden factors [22]. In this paper, we propose a new hybrid recommendation approach that exploits both the properties of the items and... |
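The NMF decomposition described above can be sketched with scikit-learn on a toy document-term matrix; the matrix contents and the choice of two topics are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy document-term matrix: 4 documents x 6 terms (raw counts).
X = np.array([
    [3, 2, 0, 0, 1, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 2, 3, 0, 1],
    [0, 1, 3, 2, 0, 2],
], dtype=float)

# Factorize X ~ W H with k = 2 topics: W (documents x topics) gives the
# extent to which each document belongs to each topic, H (topics x terms)
# gives the topics themselves. Non-negativity yields the "additive
# parts-based" representation discussed in the excerpt.
model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)
H = model.components_
```

Because every entry of W and H is non-negative, each document is explained purely additively, which is what makes the factors sparse and interpretable.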

1485 | Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions
- Adomavicius, Tuzhilin
- 2005
Citation Context ...uce recommendations using the description of the items and are the default solution to the item cold-start. However, they tend to achieve lower accuracy and, in practice, they are seldom the only choice. The problem of item cold-start is of great practical importance because of two main reasons. First, modern online platforms publish hundreds of new items every day and effectively recommending them is essential for keeping the users continuously engaged. Second, collaborative filtering methods are at the core of most recommendation engines, as they tend to achieve state-of-the-art accuracy [1]. However, to produce recommendations at the expected accuracy they require that items are rated by a sufficient number of users. Therefore, it is crucial for every collaborative recommender to reach this state as soon as possible. Having methods that produce accurate recommendations for new items will allow enough feedback to be collected in a short amount of time, making effective collaborative recommendations possible. Recently, matrix factorization techniques have been extensively used in the recommendation systems and topic modelling literature. Many collaborative filtering systems approx... |

1223 | Probabilistic latent semantic indexing
- Hofmann
Citation Context ..., we compare this technique against the method we propose. Schein et al. [25] propose a probabilistic model for cold-start recommendations that is very similar to the one proposed by Soboroff. Their approach extends the work of Hofmann and Puzicha [18] which models the joint distribution of users and items through an aspect model that clusters users and items in a latent space. In order to deal with new items, instead of modelling the joint distribution of users and items, the authors propose to model the joint distribution of users and content features. At query time a “folding-in” technique [17] is used to embed new items into the latent space so that items can be recommended. After careful analysis one may notice that the technique essentially boils down to building user profiles and applying pLSA to discover latent factors. Taking into account that previous studies have shown the correspondence between pLSA and NMF [16], one may clearly distinguish between this approach and our proposal. Instead of explicitly building user profiles and finding latent features, we discover a latent space common to both the content and collaborative information that allows us to link one to the other... |

743 | Statistical comparisons of classifiers over multiple data-sets
- Demšar
Citation Context ...recision (MAP), and Normalized Discounted Cumulative Gain (NDCG) [6]. Evaluation Protocol. As the data is intrinsically influenced by time, we sort the 10 mailboxes chronologically. We divide the messages into 80% training and 20% testing, resulting in 10 independent train/test subsets. Only the recipients that appear in the training period are considered as potential receivers. We tune the hyper-parameters of the methods on an independent validation set, 10% of the training set. Finally, we evaluate the statistical significance of the differences in performance by using a Wilcoxon signed rank test [14]. Results. Figure 1 shows the average performance of each method across the 10 mailboxes. LCE performs better than the other methods in all measures, with differences ranging from 5% to 15%. All differences are statistically significant (Wilcoxon signed rank test, p < 0.05). Imposing locality leads to small performance improvements in some measures; however, the differences are not statistically significant. This indicates that the nearest neighbor graph does not bring additional information in the case of emails. One explanation may be that emails are user generated content and as such contain a lot o... |

663 | Laplacian eigenmaps and spectral techniques for embedding and clustering
- Belkin
- 2002
Citation Context ...m both views. We make an implicit assumption that the data from both views is drawn from a common distribution. One may hope that additional knowledge of this distribution can be exploited to discover a better low-dimensional space. A natural assumption could be that: if two data points xi and xj , in any view, are close in the intrinsic geometry of the distribution, then their representations in the low-dimensional space should also be close to each other. This assumption is commonly referred to as the manifold assumption and plays an essential role in algorithms for dimensionality reduction [8] and semi-supervised learning [9]. In reality the geometric structure of the distribution is not known and cannot be directly used. However, recent studies on spectral graph theory [12] and manifold learning [7] have shown that the local geometric structure can be effectively modeled through a nearest neighbor graph on a scatter of data points. Consider a graph with n nodes where each node represents a data point. For each point we find the p nearest neighbors and we connect the corresponding nodes in the graph. The edges may be binary (1 if one of the nearest neighbors, 0 otherwise) or may be... |
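The p-nearest-neighbor graph construction described above (binary or cosine-weighted edges collected into a matrix A) can be sketched as follows; the function name and the symmetrization via element-wise maximum are our own choices:

```python
import numpy as np

def nn_graph(X, p=2, weighted=False):
    """Build a p-nearest-neighbor adjacency matrix A from row vectors X.

    Edges are binary (1 if j is among the p nearest neighbors of i,
    0 otherwise) or weighted by cosine similarity, as in the excerpt.
    """
    n = X.shape[0]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms
    sim = unit @ unit.T                      # cosine similarity matrix
    A = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(-sim[i])          # most similar first
        neighbors = [j for j in order if j != i][:p]
        for j in neighbors:
            A[i, j] = sim[i, j] if weighted else 1.0
    # One common choice: symmetrize so an edge exists if either endpoint
    # selected the other as a neighbor.
    return np.maximum(A, A.T)
```

The resulting A can then serve as the local-closeness measure used by the manifold regularizer.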

593 | Matrix factorization techniques for recommender systems
- Koren, Bell, et al.
Citation Context ...er of users. Therefore, it is crucial for every collaborative recommender to reach this state as soon as possible. Having methods that produce accurate recommendations for new items will allow enough feedback to be collected in a short amount of time, making effective collaborative recommendations possible. Recently, matrix factorization techniques have been extensively used in the recommendation systems and topic modelling literature. Many collaborative filtering systems approximate the collaborative matrix by applying techniques such as Singular Value Decomposition (SVD) or UV decomposition [20]. Similar matrix factorization techniques have been used to discover topics in document collections by decomposing the content, i.e., the document-term matrix. Non-negative Matrix Factorization (NMF) is one such approach that factorizes the document-term matrix into two nonnegative, low-rank matrices, where one matrix corresponds to the topics in the collection and the other represents the extent to which documents belong to these topics. Due to the non-negativity constraints, NMF produces a so-called “additive parts-based” representation of the data that increases the sparsity and interpretability ... |

575 | Manifold regularization: A geometric framework for learning from labeled and unlabeled examples
- Belkin, Niyogi, et al.
- 2006
Citation Context ... assumption that the data from both views is drawn from a common distribution. One may hope that additional knowledge of this distribution can be exploited to discover a better low-dimensional space. A natural assumption could be that: if two data points xi and xj , in any view, are close in the intrinsic geometry of the distribution, then their representations in the low-dimensional space should also be close to each other. This assumption is commonly referred to as the manifold assumption and plays an essential role in algorithms for dimensionality reduction [8] and semi-supervised learning [9]. In reality the geometric structure of the distribution is not known and cannot be directly used. However, recent studies on spectral graph theory [12] and manifold learning [7] have shown that the local geometric structure can be effectively modeled through a nearest neighbor graph on a scatter of data points. Consider a graph with n nodes where each node represents a data point. For each point we find the p nearest neighbors and we connect the corresponding nodes in the graph. The edges may be binary (1 if one of the nearest neighbors, 0 otherwise) or may be weighted (e.g., cosine similarit... |

365 | The Author-Topic Model for Authors and Documents
- Rosen-Zvi, Griffiths, et al.
- 2004
Citation Context ...ion of both the content and collaborative matrix, instead of factorizing only the content matrix. 92 LSI on the User Profiles (UP-LSI). We apply the hybrid recommendation system proposed in [28] (see Section 2). The approach combines the content and collaborative information by building user profiles and applying Latent Semantic Indexing (LSI) to discover latent factors. At test time, the new items are projected in the latent space and compared to the user profiles. Finally, the items are recommended to the users with the most similar profiles. Author-topic Model (ATM). The author-topic model [23] is a generative probabilistic model which extends LDA to include authorship information. It associates each author with a multinomial distribution over topics, and each topic with a multinomial distribution over words. As the authors point out, the model may not only be used to find the topics associated with the authors, but also to predict the authors of unobserved documents. In the email recipient recommendation experiment we model the recipients as authors, while in the news recommendation scenario we model the users as authors. As recommended, we set the parameters as: α = 50/k, where k is... |

329 | Methods and metrics for cold-start recommendations
- Schein, Popescul, et al.
Citation Context ...discover topics in the collection and implicitly learn commonalities among the user profiles. Incoming documents are projected on the LSI space and compared to the user profiles. The documents are recommended to the users with the most similar profiles. The author argues that applying LSI on the user profiles instead of the documents allows one to take into account the collaborative input and consequently improves the recommendation performance. However, the system is not evaluated in the cold-start scenario. In Section 5, we compare this technique against the method we propose. Schein et al. [25] propose a probabilistic model for cold-start recommendations that is very similar to the one proposed by Soboroff. Their approach extends the work of Hofmann and Puzicha [18] which models the joint distribution of users and items through an aspect model that clusters users and items in a latent space. In order to deal with new items, instead of modelling the joint distribution of users and items, the authors propose to model the joint distribution of users and content features. At query time a “folding-in” technique [17] is used to embed new items into the latent space so that items can be re... |

210 | Latent class models for collaborative filtering
- Hofmann, Puzicha
Citation Context ...les. The documents are recommended to the users with the most similar profiles. The author argues that applying LSI on the user profiles instead of the documents allows one to take into account the collaborative input and consequently improves the recommendation performance. However, the system is not evaluated in the cold-start scenario. In Section 5, we compare this technique against the method we propose. Schein et al. [25] propose a probabilistic model for cold-start recommendations that is very similar to the one proposed by Soboroff. Their approach extends the work of Hofmann and Puzicha [18] which models the joint distribution of users and items through an aspect model that clusters users and items in a latent space. In order to deal with new items, instead of modelling the joint distribution of users and items, the authors propose to model the joint distribution of users and content features. At query time a “folding-in” technique [17] is used to embed new items into the latent space so that items can be recommended. After careful analysis one may notice that the technique essentially boils down to building user profiles and applying pLSA to discover latent factors. Taking into... |

196 | Algorithms and applications for approximation nonnegative matrix factorization
- BERRY, BROWNE, et al.
- 2007
Citation Context ...f and only if Hu, Hs and W are at a stationary point of the function. A detailed proof of the above theorem is provided as supplementary material (https://github.com/msaveski/LCE/blob/master/Th1Proof.pdf). 3.6 Inference Once the model has been trained to learn W, Hs and Hu, we can use these factors for prediction. For instance, given the bag-of-words vector of a new news article qs, we can predict the users that are most likely to post a comment, i.e., qu. To do so, we project the document vector qs to the common latent space by solving the overdetermined system qs = wHs using the least squares method (with a projection to 0 of the negative values, see [10]). The vector w, computed online, captures the factors – in the common latent space – that explain the observed news article qs. Then, by using this low dimensional vector w we may infer the missing part of the query: qu ← wHu. Each element of qu represents a score of how likely it is that the user will comment on the new article. Then, given these scores, we may rank the users. 4. EXPLAINING RECOMMENDATIONS Good recommendations are not only accurate but also transparent, i.e., supported with explanations. This allows the end users to und... |
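The inference step above (solve the overdetermined system qs = wHs by least squares, clip negative entries of w to zero, then score users via qu ← wHu) can be sketched in NumPy; the function name is hypothetical:

```python
import numpy as np

def infer_users(q_s, H_s, H_u):
    """Project a new item into the latent space and score users.

    Solves q_s = w H_s in the least-squares sense (q_s = w H_s is
    equivalent to H_s^T w^T = q_s^T), projects negative entries of w
    to 0 as in [10], and returns q_u = w H_u.
    """
    w, *_ = np.linalg.lstsq(H_s.T, q_s, rcond=None)
    w = np.maximum(w, 0.0)        # projection of negative values to 0
    q_u = w @ H_u                 # score per user: likelihood of commenting
    return w, q_u
```

Ranking the entries of q_u in decreasing order then yields the recommended users for the new article.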

193 | Collaborative filtering for implicit feedback datasets
- Hu, Koren, et al.
- 2009
Citation Context ...its, punctuation, short (< 3 characters) and infrequent (appearing < 3 times) tokens are removed. Evaluation Metrics. Similar to the previous experiment, the output of each algorithm is a ranking. In this experiment, however, we do not have explicit feedback of which news articles were undesired by the users. While commenting on an article is evidence of the user's interest in it, the absence of a comment is not an indication that the article was undesired, as not commenting may stem from multiple different reasons. Therefore, we adopt the average percentile ranking, a measure proposed in [19] and widely used to evaluate rankings based on implicit feedback (e.g., [24]). We define rank_{u,i} as the percentile ranking of article i in the ranked list of articles for the user u; if rank_{u,i} = 0%, then article i is predicted to be the most interesting for u, while rank_{u,i} = 100% implies that the article is predicted to be the least interesting. Our quality measure is then the total average percentile ranking of an article: rank = (Σ_{u,i} comment_{u,i} · rank_{u,i}) / (Σ_{u,i} comment_{u,i}), where comment_{u,i} is an indicator function that equals 1 if the user u commented on article i, and 0 otherwise. ... |
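The average percentile ranking measure described above can be sketched as follows; mapping rank positions to percentiles via position/(n−1)·100 is one possible convention, an assumption on our part:

```python
import numpy as np

def avg_percentile_rank(scores, comments):
    """Average percentile ranking for implicit feedback.

    scores[u, i]   : predicted score of article i for user u (higher = better)
    comments[u, i] : 1 if user u commented on article i, else 0

    rank_{u,i} is the percentile position of article i in user u's ranked
    list (0% = predicted most interesting, 100% = least interesting).
    """
    n_users, n_items = scores.shape
    order = np.argsort(-scores, axis=1)               # best first
    pos = np.empty_like(order)
    rows = np.arange(n_users)[:, None]
    pos[rows, order] = np.arange(n_items)[None, :]    # rank position per item
    rank = pos / (n_items - 1) * 100.0                # percentiles 0%..100%
    return (comments * rank).sum() / comments.sum()
```

Lower values are better: a perfect model places every commented article at the top of its user's list, giving an average rank near 0%, while random predictions are expected to score around 50%.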

180 | Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis
- Cichocki, Zdunek, et al.
- 2009
Citation Context ...earning Algorithm The optimization problem defined above is non-convex in terms of all parameters (W, Hs, Hu) together. Thus, it is unrealistic to expect an algorithm to find the global minimum. In what follows, we derive an iterative algorithm based on multiplicative update rules which can achieve a stationary point. The partial derivatives of J w.r.t. W, Hs, and Hu are:

∇_W J = α W Hs Hs^T − α Xs Hs^T + (1−α) W Hu Hu^T − (1−α) Xu Hu^T + βLW + λW, (3)
∇_Hs J = α W^T W Hs − α W^T Xs + λHs, (4)
∇_Hu J = (1−α) W^T W Hu − (1−α) W^T Xu + λHu. (5)

Applying the Karush-Kuhn-Tucker (KKT) first-order optimality conditions to J [13], we derive:

W ≥ 0, Hs ≥ 0, Hu ≥ 0, (6)
∇_W J ≥ 0, ∇_Hs J ≥ 0, ∇_Hu J ≥ 0, (7)
W ⊙ ∇_W J = 0, Hs ⊙ ∇_Hs J = 0, Hu ⊙ ∇_Hu J = 0, (8)

where ⊙ corresponds to the element-wise matrix multiplication operator. Substituting the derivatives of J from Equations (3), (4) and (5) in Equation (8) leads to the following update rules:

W ← W ⊙ [α Xs Hs^T + (1−α) Xu Hu^T + βAW] ⊘ [α W Hs Hs^T + (1−α) W Hu Hu^T + βDW + λW], (9)
Hs ← Hs ⊙ [α W^T Xs] ⊘ [α W^T W Hs + λHs], (10)
Hu ← Hu ⊙ [(1−α) W^T Xu] ⊘ [(1−α) W^T W Hu + λHu], (11)

where ⊘ denotes the element-wise matrix division operator. We define the following theorem: Theorem 1. The objective function J in Eq... |
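The multiplicative update rules (9)-(11) above can be sketched in NumPy. The function name, the default hyper-parameter values, and the small eps added to the denominators for numerical stability are our own assumptions:

```python
import numpy as np

def lce_updates(Xs, Xu, W, Hs, Hu, A,
                alpha=0.5, beta=0.05, lam=0.1, iters=100, eps=1e-9):
    """Sketch of the multiplicative updates (9)-(11).

    Xs: items x terms content matrix; Xu: items x users collaborative
    matrix; A: nearest-neighbor adjacency matrix; D: its degree matrix
    (so the graph Laplacian is L = D - A). Factors stay non-negative
    because each update multiplies by a ratio of non-negative terms.
    """
    D = np.diag(A.sum(axis=1))
    for _ in range(iters):
        W *= (alpha * Xs @ Hs.T + (1 - alpha) * Xu @ Hu.T + beta * A @ W) / \
             (alpha * W @ Hs @ Hs.T + (1 - alpha) * W @ Hu @ Hu.T
              + beta * D @ W + lam * W + eps)
        Hs *= (alpha * W.T @ Xs) / (alpha * W.T @ W @ Hs + lam * Hs + eps)
        Hu *= ((1 - alpha) * W.T @ Xu) / ((1 - alpha) * W.T @ W @ Hu + lam * Hu + eps)
    return W, Hs, Hu
```

Note how the Laplacian term βLW = βDW − βAW is split: its negative part (βAW) moves to the numerator of (9) and its positive part (βDW) to the denominator, which is what keeps the update non-negative.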

129 | Relational learning via collective matrix factorization
- Singh, Gordon
- 2008
Citation Context ...ew items into the latent space so that items can be recommended. After careful analysis one may notice that the technique essentially boils down to building user profiles and applying pLSA to discover latent factors. Taking into account that previous studies have shown the correspondence between pLSA and NMF [16], one may clearly distinguish between this approach and our proposal. Instead of explicitly building user profiles and finding latent features, we discover a latent space common to both the content and collaborative information that allows us to link one to the other. Singh and Gordon [27] propose the idea of collective matrix factorization, a general framework for multi-relational factorization models. They subsume models on any number of relations as long as their loss function is a twice differentiable decomposable loss. In their work, they address both rating prediction and item recommendation. The matrix factorization approach proposed in this work is based on a similar idea of collective factorization. However, we enforce non-negativity constraints on the factorization to obtain sparse and interpretable factors, and we consider the specific scenario of cold-start recommen... |

98 | Regression-based latent factor models
- Agarwal, Chen
- 2009
Citation Context ...tors are used to make predictions. As proposed by the authors, we used Bayesian Personalized Ranking (BPR) to factorize the collaborative matrix. To learn the mappings we used the K-Nearest-Neighbor and BPR optimization; however, the kNN mapping was superior in all cases and thus we report the results only for the kNN mapping. This is consistent with their experimental results. We test with different numbers of latent factors k ∈ {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000}. fLDA. This is a latent factor model proposed in [3], an extension of the Regression-Based Latent Factor model [2] for applications where a “bag-of-words” representation of items is natural. The main idea is regularizing the user and item factors simultaneously through the user features and the words associated with the items. The user ratings are modelled as the user's affinity to the item's topics, where the user's affinity to topics and topic assignments to items are learned jointly in a supervised fashion. In our experiments we only use the item features, as user features are not available. As recommended by the authors, we run 20 EM iterations with 100 samples, drawn after 10 burn-in samples. In the email recom... |

82 | Problems of Learning on Manifolds
- Belkin
- 2003
Citation Context ...ow-dimensional space. A natural assumption could be that: if two data points xi and xj , in any view, are close in the intrinsic geometry of the distribution, then their representations in the low-dimensional space should also be close to each other. This assumption is commonly referred to as the manifold assumption and plays an essential role in algorithms for dimensionality reduction [8] and semi-supervised learning [9]. In reality the geometric structure of the distribution is not known and cannot be directly used. However, recent studies on spectral graph theory [12] and manifold learning [7] have shown that the local geometric structure can be effectively modeled through a nearest neighbor graph on a scatter of data points. Consider a graph with n nodes where each node represents a data point. For each point we find the p nearest neighbors and we connect the corresponding nodes in the graph. The edges may be binary (1 if one of the nearest neighbors, 0 otherwise) or may be weighted (e.g., cosine similarity). This results in a matrix A which can later be used to measure the local closeness of two points xi and xj . Recall that the collective factorization maps each data point xi i... |

70 | Relations between PLSA and NMF and Implications
- Gaussier, Goutte
- 2005
Citation Context ...at clusters users and items in a latent space. In order to deal with new items, instead of modelling the joint distribution of users and items, the authors propose to model the joint distribution of users and content features. At query time a “folding-in” technique [17] is used to embed new items into the latent space so that items can be recommended. After careful analysis one may notice that the technique essentially boils down to building user profiles and applying pLSA to discover latent factors. Taking into account that previous studies have shown the correspondence between pLSA and NMF [16], one may clearly distinguish between this approach and our proposal. Instead of explicitly building user profiles and finding latent features, we discover a latent space common to both the content and collaborative information that allows us to link one to the other. Singh and Gordon [27] propose the idea of collective matrix factorization, a general framework for multi-relational factorization models. They subsume models on any number of relations as long as their loss function is a twice differentiable decomposable loss. In their work, they address both rating prediction and item recommenda... |

59 | Combining content and collaboration in text filtering.
- Soboroff, Nicholas
- 1999
Citation Context ...mmendation, LCE, that combines the content and collaborative information in a unified matrix factorization framework while exploiting the local geometrical structure of the data; • We propose a simple and efficient learning algorithm, based on multiplicative update rules, and prove its convergence; • We conduct an extensive experimental study and we show that the proposed methods outperform six state-of-the-art methods for item cold-start recommendation. 2. RELATED WORK In this section, we briefly describe several hybrid recommender systems that can handle the item cold-start scenario. Soboroff [28] proposed a technique based on Latent Semantic Indexing (LSI) for combining the collaborative filtering input and the document content for recommendation of textual items. The method builds a content profile for each user, as a linear combination of the preferred documents, and applies LSI to discover topics in the collection and implicitly learn commonalities among the user profiles. Incoming documents are projected on the LSI space and compared to the user profiles. The documents are recommended to the users with the most similar profiles. The author argues that applying LSI on the user prof... |

44 | fLDA: matrix factorization through latent dirichlet allocation
- Agarwal, Chen
- 2010
Citation Context ... (2) is used to infer the factors from the attributes, and then the factors are used to make predictions. As proposed by the authors, we used Bayesian Personalized Ranking (BPR) to factorize the collaborative matrix. To learn the mappings we used the K-Nearest-Neighbor and BPR optimization; however, the kNN mapping was superior in all cases and thus we report the results only for the kNN mapping. This is consistent with their experimental results. We test with different numbers of latent factors k ∈ {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000}. fLDA. This is a latent factor model proposed in [3], an extension of the Regression-Based Latent Factor model [2] for applications where a “bag-of-words” representation of items is natural. The main idea is regularizing the user and item factors simultaneously through the user features and the words associated with the items. The user ratings are modelled as the user's affinity to the item's topics, where the user's affinity to topics and topic assignments to items are learned jointly in a supervised fashion. In our experiments we only use the item features, as user features are not available. As recommended by the authors, we run 20 EM iteratio... |

42 | Non-negative matrix factorization on manifold
- Cai, He, et al.
- 2008
Citation Context ...epresented as the sum of the latent factors associated with the textual content (tags–named entities) and the commenters. A modification of the model for real-time scenarios is presented in [4]. The authors show that the recommendation accuracy grows as the number of commenters grows. However, in an item cold-start scenario articles are not yet associated with commenters, thus an article can only be represented with the latent factors of the textual tags. Exploiting the local geometric structure of the data to discover better low-dimensional representations has been investigated by Cai et al. [11]. Inspired by the success of using the nearest neighbor graph for label propagation in semisupervised learning, they propose a clustering technique. The algorithm favours factorizations for which similar instances have similar low-dimensional representations. The authors show that, by imposing this constraint, they outperform classical clustering and factorization techniques. In this work, we impose such geometrical constraints but for collective factorization for which we can handle multiple data sources, i.e., the content and collaborative data matrices. Finally, it is worth noting the 2011 ... |

32 | Learning attribute-to-feature mappings for cold-start recommendations.
- Gantner, Drumond, et al.
- 2010
Citation Context ... over topics, and each topic with a multinomial distribution over words. As the authors point out, the model may not only be used to find the topics associated with the authors, but also to predict the authors of unobserved documents. In the email recipient recommendation experiment we model the recipients as authors, while in the news recommendation scenario we model the users as authors. As recommended, we set the parameters as: α = 50/k, where k is the number of topics, β = 0.01, and we perform 500 iterations of the Gibbs Sampler. Learning Attribute-to-Feature Mappings (BPR-kNN). This method [15] handles the cold-start in two steps: (1) factorizing the collaborative matrix to learn latent factor representations of the users and items, (2) learning a mapping between the user/item attributes and the corresponding latent factors. When a new user/item arrives in the system, the mapping from (2) is used to infer the factors from the attributes, and then the factors are used to make predictions. As proposed by the authors, we used Bayesian Personalized Ranking (BPR) to factorize the collaborative matrix. To learn the mappings we used the K-Nearest-Neighbor and BPR optimization; however, the ... |

32 | Automatic labelling of topic models.
- Lau, Grieser, et al.
- 2011
Citation Context ...orted with explanations. This allows the end users to understand the reasoning behind the recommendations and helps them build trust towards the system. LCE provides a natural way of explaining recommendations in terms of users’ affinity to topics. To obtain the topical interest profile for a user i we can construct a vector xu ∈ R1×u (u is the number of users), where all elements are equal to zero except [xu]i = 1, and solve for w in xu = wHu; every element of w quantifies the user’s affinity to a specific topic. Topics may be presented using the top-k terms or by automatic annotation (e.g., [21]). Furthermore, by computing xs ← wHs (xs ∈ R1×m), we obtain the association between the user and every word in the vocabulary; we may present this information to the user, e.g., using a word cloud. To debug and track down the source of unexpected behaviour of the system one may examine the link between topics and communities (e.g., by looking at the top words and users). This link may also be exploited in other applications, such as advertising, where advertisers can easily identify target users based on their topical interests. 5. EXPERIMENTAL EVALUATION In this section we present a series o... |
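The explanation procedure above (build the one-hot user indicator xu, solve xu = wHu for the user's topic affinities, then compute xs ← wHs for user-word associations) can be sketched as follows; the function name and the least-squares solve with non-negative clipping, mirroring the inference step, are our own assumptions:

```python
import numpy as np

def user_topic_profile(i, Hu, Hs):
    """Topical interest profile for user i, as in the explanation section.

    Hu: topics x users factor; Hs: topics x terms factor.
    Returns w (affinity to each topic) and x_s (association between the
    user and every word in the vocabulary, e.g., for a word cloud).
    """
    n_users = Hu.shape[1]
    x_u = np.zeros(n_users)
    x_u[i] = 1.0                  # one-hot indicator [x_u]_i = 1
    w, *_ = np.linalg.lstsq(Hu.T, x_u, rcond=None)
    w = np.maximum(w, 0.0)        # keep affinities non-negative
    x_s = w @ Hs                  # user-word associations
    return w, x_s
```

The entries of w can be presented alongside the top-k terms of each topic to explain why a given item was recommended to the user.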

13 | Care to comment?: recommendations for commenting on news stories.
- Shmueli, Kagian, et al.
- 2012
Citation Context ...lective matrix factorization, a general framework for multi-relational factorization models. They subsume models on any number of relations as long as their loss function is a twice differentiable decomposable loss. In their work, they address both rating prediction and item recommendation. The matrix factorization approach proposed in this work is based on a similar idea of collective factorization. However, we enforce non-negativity constraints on the factorization to obtain sparse and interpretable factors, and we consider the specific scenario of cold-start recommendations. Shmueli et al. [26] consider a similar scenario of news recommendation (Section 5.3), i.e., predicting the articles a user is most likely to comment on. They combine content-based and collaborative filtering approaches using a latent factor model. The odds that a user will comment on an article are estimated as the inner product of the user and article factors, where the article factors are represented as the sum of the latent factors associated with the textual content (tags–named entities) and the commenters. A modification of the model for real-time scenarios is presented in [4]. The authors show that the recommend... |

6 | Ads and the city: Considering geographic distance goes a long way.
- Saez-Trumper, Quercia, et al.
- 2012
(Show Context)
Citation Context ...s) tokens are removed. Evaluation Metrics. Similar to the previous experiment, the output of each algorithm is a ranking. In this experiment, however, we do not have explicit feedback about which news articles were undesired by the users. While commenting on an article is evidence of the user's interest in it, the absence of a comment is not an indication that the article was undesired, as not commenting may stem from many different reasons. Therefore, we adopt the average percentile ranking, a measure proposed in [19] and widely used to evaluate rankings based on implicit feedback (e.g., [24]). We define rank_{u,i} as the percentile ranking of article i in the ranked list of articles for the user u; if rank_{u,i} = 0%, then the article i is predicted to be the most interesting for u, while rank_{u,i} = 100% implies that the article is predicted to be the least interesting. Our quality measure is then the total average percentile ranking of an article: rank = (∑_{u,i} comment_{u,i} · rank_{u,i}) / (∑_{u,i} comment_{u,i}), where comment_{u,i} is an indicator function that equals 1 if the user u commented on article i, and 0 otherwise. The lower the rank, the better the quality of the ranking. For random predictio...
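A minimal sketch of the average percentile ranking defined above, assuming a dense score matrix and evenly spaced percentile ranks from 0% (top of each user's list) to 100% (bottom):

```python
import numpy as np

def average_percentile_ranking(scores, comments):
    """rank = sum(comment_{u,i} * rank_{u,i}) / sum(comment_{u,i})."""
    n_users, n_items = scores.shape
    # Position of every item in each user's ranked list (0 = best-ranked item),
    # converted to a percentile in [0%, 100%].
    order = (-scores).argsort(axis=1).argsort(axis=1)
    rank_pct = 100.0 * order / (n_items - 1)
    return (comments * rank_pct).sum() / comments.sum()

rng = np.random.default_rng(2)
scores = rng.random((4, 10))                           # predicted relevance per user/article
comments = (rng.random((4, 10)) < 0.3).astype(float)   # 1 if the user commented on the article
print(average_percentile_ranking(scores, comments))
```

Lower is better: if every commented article were ranked first for its user, the measure would be 0%, while random predictions land around 50% in expectation, consistent with the context above.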

3 | A time-based collective factorization for topic discovery and monitoring in news - Vaca, Mantrach, et al. - 2014 |

2 |
Dynamic personalized recommendation of comment-eliciting stories.
- Aharon, Kagian, et al.
- 2012
(Show Context)
Citation Context ...art recommendations. Shmueli et al. [26] consider a similar scenario of news recommendation (Section 5.3), i.e., predicting the articles a user is most likely to comment on. They combine content-based and collaborative filtering approaches using a latent factor model. The odds that a user will comment on an article are estimated as the inner product of the user and article factors, where the article factors are represented as the sum of the latent factors associated with the textual content (tags and named entities) and the commenters. A modification of the model for real-time scenarios is presented in [4]. The authors show that the recommendation accuracy grows as the number of commenters grows. However, in an item cold-start scenario articles are not yet associated with commenters, so an article can only be represented with the latent factors of the textual tags. Exploiting the local geometric structure of the data to discover better low-dimensional representations has been investigated by Cai et al. [11]. Inspired by the success of using the nearest neighbor graph for label propagation in semi-supervised learning, they propose a clustering technique. The algorithm favours factorizations for...

1 |
ECML-PKDD 2011 discovery challenge overview. Discovery Challenge,
- Antulov-Fantulin, Bosnjak, et al.
- 2011
(Show Context)
Citation Context ...upervised learning, they propose a clustering technique. The algorithm favours factorizations for which similar instances have similar low-dimensional representations. The authors show that, by imposing this constraint, they outperform classical clustering and factorization techniques. In this work, we impose such geometrical constraints but for collective factorization, for which we can handle multiple data sources, i.e., the content and collaborative data matrices. Finally, it is worth noting the 2011 ECML-PKDD Discovery Challenge, which focused on cold-start recommendations of video lectures [5]. 3. LOCAL COLLECTIVE EMBEDDINGS In this section we formally define the item cold-start problem, explain the intuition behind learning local collective embeddings, and finally show how such embeddings can be learnt and used for prediction. 3.1 Problem Statement The scenario we consider is item cold-start recommendation, where we would like to suggest new items – for which no interest has been expressed so far – to potentially interested users. Given a new item, its corresponding description and the patterns of past activities of the users, we want to retrieve the users who are most ...

1 |
Spectral Graph Theory.
- Chung
- 1997
(Show Context)
Citation Context ...ited to discover a better low-dimensional space. A natural assumption could be that if two data points xi and xj, in any view, are close in the intrinsic geometry of the distribution, then their representations in the low-dimensional space should also be close to each other. This assumption is commonly referred to as the manifold assumption and plays an essential role in algorithms for dimensionality reduction [8] and semi-supervised learning [9]. In reality the geometric structure of the distribution is not known and cannot be used directly. However, recent studies on spectral graph theory [12] and manifold learning [7] have shown that the local geometric structure can be effectively modeled through a nearest neighbor graph on a scatter of data points. Consider a graph with n nodes where each node represents a data point. For each point we find its p nearest neighbors and connect the corresponding nodes in the graph. The edges may be binary (1 if one point is among the nearest neighbors of the other, 0 otherwise) or weighted (e.g., by cosine similarity). This results in a matrix A which can later be used to measure the local closeness of two points xi and xj. Recall that the collective factorization...
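The nearest neighbor graph construction described above can be sketched as follows; the use of cosine similarity follows the context, while the choice to symmetrize the adjacency matrix (an edge exists if either endpoint selected the other) is an illustrative assumption.

```python
import numpy as np

def knn_graph(X, p=2, weighted=False):
    """Symmetric adjacency matrix A of a p-nearest-neighbor graph over the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                          # pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)           # exclude self-loops from the neighbor search
    A = np.zeros_like(S)
    for i in range(len(X)):
        nn = np.argsort(-S[i])[:p]         # indices of the p nearest neighbors of point i
        A[i, nn] = S[i, nn] if weighted else 1.0
    return np.maximum(A, A.T)              # symmetrize: keep an edge if either side picked it

rng = np.random.default_rng(3)
X = rng.random((6, 5))                     # 6 data points in 5 dimensions
A = knn_graph(X, p=2)
print(A.sum(axis=1))                       # binary degrees; each node has at least p edges
```

The resulting matrix A is exactly the ingredient the context describes for measuring local closeness of two points, which the local constraint in the collective factorization then exploits.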