Item-based k-nearest neighbors in rrecsys

Given a target user and the items she rated positively (i.e., above a predefined threshold), the algorithm uses item-to-item similarities to form a neighborhood of nearest items.

The choice of k, the number of nearest neighbors used to form the neighborhood, involves a tradeoff. A very small k leaves few candidate items to recommend, because few neighbors are available to support the predictions. A very large k hurts precision, because the particularities of a user's preferences are blunted by the large neighborhood. Most related work sets k between 10 and 100; the optimal k also depends on data characteristics such as sparsity.
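The effect of k on a single prediction can be sketched in base R. This is only an illustration, not the rrecsys implementation: the similarity values, the user's ratings, and the helper predict_rating are all hypothetical, and the prediction rule (a similarity-weighted mean over the k most similar rated items) is one common formulation assumed here.

```r
# Hypothetical item-item similarities for five items (illustrative values only).
sim <- matrix(
  c(1.0, 0.9, 0.2, 0.6, 0.1,
    0.9, 1.0, 0.3, 0.5, 0.2,
    0.2, 0.3, 1.0, 0.4, 0.8,
    0.6, 0.5, 0.4, 1.0, 0.3,
    0.1, 0.2, 0.8, 0.3, 1.0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("i", 1:5), paste0("i", 1:5))
)

# One user's ratings; NA marks an item the user has not rated.
user_ratings <- c(i1 = NA, i2 = 5, i3 = 1, i4 = 4, i5 = 2)

# Hypothetical helper: predict the rating of `target` as the
# similarity-weighted mean of the user's ratings on the k rated items
# that are most similar to `target`.
predict_rating <- function(sim, user_ratings, target, k) {
  rated <- setdiff(names(user_ratings)[!is.na(user_ratings)], target)
  s <- sort(setNames(sim[target, rated], rated), decreasing = TRUE)
  neighbors <- names(head(s, k))
  w <- sim[target, neighbors]
  sum(w * user_ratings[neighbors]) / sum(abs(w))
}

predict_rating(sim, user_ratings, "i1", k = 2)  # approximately 4.6
predict_rating(sim, user_ratings, "i1", k = 4)  # approximately 4.06
```

With k = 2 only the two closest items (i2 and i4) contribute. With k = 4 the weakly related items i3 and i5 also enter the average and pull the prediction toward their low ratings.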

Similarity is measured with one of two functions: cosine (simFunct = "cos") or adjusted cosine (simFunct = "adjCos").
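To make the difference between the two measures concrete, here is a minimal base R sketch on a toy rating matrix. It is not the package's internal code: the helper names cosine_sim and adjusted_cosine_sim are hypothetical, and restricting the computation to users who rated both items is an assumption (implementations differ in how they treat missing ratings).

```r
# Toy user-by-item rating matrix (rows = users, columns = items); NA = unrated.
ratings <- matrix(
  c( 5,  3, NA,
     4, NA,  2,
     1,  1,  5,
    NA,  4,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("i", 1:3))
)

# Cosine similarity between two item vectors, computed over the users
# who rated both items (an assumption of this sketch).
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(0)
  denom <- sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2))
  if (denom == 0) return(0)
  sum(x[both] * y[both]) / denom
}

# Adjusted cosine: subtract each user's mean rating before comparing items,
# which compensates for users who rate systematically high or low.
adjusted_cosine_sim <- function(m, i, j) {
  centered <- m - rowMeans(m, na.rm = TRUE)  # recycles one mean per row
  cosine_sim(centered[, i], centered[, j])
}

cosine_sim(ratings[, "i1"], ratings[, "i2"])  # approximately 0.99
adjusted_cosine_sim(ratings, "i1", "i2")      # approximately 0.28
```

On this toy data, plain cosine rates i1 and i2 as nearly identical because all ratings are positive numbers, while adjusted cosine gives a much lower value once each user's rating level is removed.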

For the Rating Prediction task, training a model with this algorithm takes one additional argument: neigh, the neighborhood size.

library(rrecsys)

data("ml100k")
d <- defineData(ml100k)        # build the data set object from the ratings
e <- evalModel(d, folds = 2)   # set up a 2-fold evaluation
ib_model_res <- evalPred(e, "ibknn", simFunct = "cos", neigh = 10)
ib_model_res

For the Item Recommendation task, producing item recommendations takes two additional arguments: positiveThreshold, the threshold for “positive” ratings, and topN, the number of recommended items.

library(rrecsys)

data("ml100k")
d <- defineData(ml100k)        # build the data set object from the ratings
e <- evalModel(d, folds = 2)   # set up a 2-fold evaluation
ib_model_res <- evalRec(e, "ibknn", simFunct = "cos", neigh = 10, positiveThreshold = 3, topN = 3)
ib_model_res
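The roles of positiveThreshold and topN can be illustrated with a small base R sketch. This is an assumption-laden illustration rather than how rrecsys scores items internally: the similarity values and the helper recommend_top_n are hypothetical, candidates are scored by their summed similarity to the user's positively rated items, and the neigh restriction is left out for brevity.

```r
# Hypothetical item-item similarities for five items (illustrative values only).
sim <- matrix(
  c(1.0, 0.9, 0.2, 0.6, 0.1,
    0.9, 1.0, 0.3, 0.5, 0.2,
    0.2, 0.3, 1.0, 0.4, 0.8,
    0.6, 0.5, 0.4, 1.0, 0.3,
    0.1, 0.2, 0.8, 0.3, 1.0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("i", 1:5), paste0("i", 1:5))
)

# One user's ratings; NA marks an item the user has not rated.
user_ratings <- c(i1 = 5, i2 = NA, i3 = 2, i4 = NA, i5 = 4)

# Hypothetical helper: rank unrated items by their summed similarity to the
# items the user rated above positiveThreshold, and keep the best topN.
recommend_top_n <- function(sim, user_ratings, positiveThreshold = 3, topN = 3) {
  liked <- names(user_ratings)[!is.na(user_ratings) & user_ratings > positiveThreshold]
  candidates <- names(user_ratings)[is.na(user_ratings)]
  scores <- vapply(candidates, function(item) sum(sim[item, liked]), numeric(1))
  names(head(sort(scores, decreasing = TRUE), topN))
}

recommend_top_n(sim, user_ratings, positiveThreshold = 3, topN = 1)  # "i2"
```

Here only i1 and i5 exceed the threshold, so they form the positive profile. The unrated item i2 scores 0.9 + 0.2 = 1.1 against them, i4 scores 0.6 + 0.3 = 0.9, and i2 is therefore ranked first.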

The neigh default value is 10. The positiveThreshold default value is 3. The topN default value is 3.

The returned object is of type IBclass.

For more details about the slots, see the reference manual.