Building a Memory-Efficient Meta-Hybrid Recommender Engine: Back to Front (part 2)

Volodymyr Holomb
Towards Data Science
4 min read · May 12, 2022


Image by author

Series overview

In the previous part, we reviewed the mechanics of a memory-based recommender system and built a custom collaborative filtering recommender. Today we will apply “out-of-the-box” recommenders from popular Python modules, evaluate their efficiency, and try some techniques to seamlessly improve the SVD prediction algorithm made famous by the Netflix Prize. At RBC Group we are open to sharing our experience in designing so-called meta-hybrid recommender engines for solving real business problems.

Would you recommend more than that?

It is worth recalling that the magic of a memory-based recommender lies in a few steps:

  • preparing and scaling vectors of user-item interactions (so-called feature vectors);
  • calculating the distances between users’ feature vectors (so-called “similarity” matrix);
  • multiplying the “similarity” matrix by the matrix of available ratings, subject to the complementary rules and restrictions of the chosen predictive algorithm.
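The steps above can be sketched in a few lines of NumPy on a toy ratings matrix (the data and variable names here are illustrative, not taken from the article's actual code):

```python
import numpy as np

# Toy user-item ratings matrix (0 = no rating); rows are users, columns are items.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])

# (1) prepare and scale feature vectors: mean-centre each user's observed ratings
mask = R > 0
user_means = R.sum(axis=1) / mask.sum(axis=1)
R_centred = np.where(mask, R - user_means[:, None], 0.0)

# (2) cosine "similarity" matrix between users' feature vectors
norms = np.linalg.norm(R_centred, axis=1, keepdims=True)
norms[norms == 0] = 1.0  # guard against all-zero rows
S = (R_centred / norms) @ (R_centred / norms).T

# (3) multiply the similarity matrix by the (centred) ratings matrix:
# similarity-weighted average of neighbours' deviations, added back to the user mean
weights = np.abs(S).sum(axis=1, keepdims=True)
pred = user_means[:, None] + (S @ R_centred) / weights
```

Each row of `pred` now holds estimated ratings for every item, including the ones the user never rated.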

This suggests that improving the accuracy of a collaborative filtering recommender involves:

  • adjusting “users’ neighbourhoods”, i.e. considering the ratings of even more alike users;
  • filtering the ratings matrix, i.e. considering items similar to those a user has rated earlier;
  • or both!

These improvements will turn your ‘pure’ memory-based recommender into a hybrid one, which means an inevitable loss of the model’s simplicity and a need for external data sources (beyond the ratings matrix), e.g. data from users’ profiles, online sessions, and/or the textual and visual content of items.

Here at RBC Group, we have developed and successfully adopted a somewhat peculiar technique to keep the balance between recsys complexity, interpretability, and resource efficiency; we call this approach a meta-hybrid recommender.

This technique boils down to (1) preliminary clustering of items based on their descriptive characteristics (aka metadata), and (2) subsequent processing (completion) of the cluster-wise atomized ratings matrices with a user-based collaborative filtering recommender.

In this scenario, the neighbourhood of each user is effectively adjusted, because partial feature vectors are constructed from the ratings of items belonging to the same cluster. Such partial vectors have less variance, since users are expected to rate similar items (within a cluster) in the same manner. Thus partial vectors tend to better reflect users’ actual preferences and antipathies.

Furthermore, the proposed technique substantially reduces memory consumption while fitting the model and making predictions. This is why we describe such a meta-hybrid recommender as memory-efficient.

As earlier in this series of publications, we will continue to demonstrate all these features using the “ml-latest-small” dataset of 5-star ratings and free-text tagging activity from MovieLens. We will use all this data to initially cluster the dataset, i.e. the movies in it.

First, let us convert genre information into dummy variables:
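A minimal sketch of this step, using a tiny made-up slice of the MovieLens movies table (the real file has thousands of rows; `genres` is pipe-separated, as in the actual dataset):

```python
import pandas as pd

# Hypothetical slice of MovieLens movies.csv
movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Heat (1995)'],
    'genres': ['Adventure|Animation|Children',
               'Adventure|Children|Fantasy',
               'Action|Crime|Thriller'],
})

# str.get_dummies splits on the separator and one-hot encodes every genre
genre_dummies = movies['genres'].str.get_dummies(sep='|')
movies = pd.concat([movies, genre_dummies], axis=1)
```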

As another predictor, we have also extracted the year of the film’s release and transformed it with a frequency encoding technique.
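One possible version of that extraction and encoding, again on a hypothetical slice of the movies table (MovieLens titles end with the release year in parentheses):

```python
import pandas as pd

movies = pd.DataFrame({
    'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Heat (1995)',
              'Casino (1995)', 'Sabrina (1954)'],
})

# Pull the 4-digit year out of the title, e.g. "Toy Story (1995)" -> 1995
movies['year'] = movies['title'].str.extract(r'\((\d{4})\)', expand=False).astype(float)

# Frequency encoding: replace each year by its relative frequency in the data
freq = movies['year'].value_counts(normalize=True)
movies['year_freq'] = movies['year'].map(freq)
```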

Let’s further extract and engineer some useful predictors from the tags. After combining the tag list with the text of the title, we can use that combo to build TF-IDF vectors:
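A sketch of that vectorization with scikit-learn; the per-movie tag strings below are invented for illustration (in practice the tags come aggregated from the MovieLens tags file):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical combo of title text and aggregated user tags per movie
movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['Toy Story', 'Jumanji', 'Heat'],
    'tags': ['pixar fun animation', 'fantasy board game', 'crime heist thriller'],
})
movies['text'] = movies['title'].str.lower() + ' ' + movies['tags']

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(movies['text'])  # sparse (n_movies x n_terms)
```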

All the main preprocessing steps, including dimensionality reduction, are wrapped into a pipeline:

Before applying simple k-means clustering, we determined the optimal number of clusters:
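One common way to do this is to score candidate values of k with the silhouette coefficient; a sketch on synthetic blob data standing in for the reduced movie features:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the reduced movie-feature matrix
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher = better-separated clusters

best_k = max(scores, key=scores.get)
```

An elbow plot of inertia over k is an equally valid alternative for the same decision.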

Let’s assign arbitrary labels to clusters and look at the distribution of movies:
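A sketch of that step, fitting k-means with the chosen number of clusters and counting members per (arbitrarily numbered) cluster; the blob data again stands in for the real reduced features:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
clusters = pd.Series(km.labels_, name='cluster')

# Distribution of movies across the clusters
distribution = clusters.value_counts().sort_index()
```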

It seems we are now ready to perform our main experiment. We are going to apply the SVD algorithm (from the Python surprise module) first to (1) the raw data and then to (2) each segment of the clustered data in a loop. Each time we will measure the distribution of precision and recall to compare the results:
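A hedged sketch of that experiment. The `precision_recall_at_k` helper follows the well-known recipe from the surprise documentation FAQ; `evaluate` requires scikit-surprise (imported lazily so the metric stays standalone), and the `ratings`/`movies` DataFrames in the commented loop are assumptions about the working variables, not the article's exact code:

```python
from collections import defaultdict

def precision_recall_at_k(predictions, k=10, threshold=3.5):
    """Per-user precision@k / recall@k over surprise-style prediction tuples."""
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))
    precisions, recalls = {}, {}
    for uid, ratings in user_est_true.items():
        ratings.sort(key=lambda x: x[0], reverse=True)   # best estimates first
        n_rel = sum(true_r >= threshold for _, true_r in ratings)
        n_rec_k = sum(est >= threshold for est, _ in ratings[:k])
        n_rel_and_rec_k = sum((true_r >= threshold and est >= threshold)
                              for est, true_r in ratings[:k])
        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k else 0.0
        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel else 0.0
    return precisions, recalls

def evaluate(df):
    # Requires scikit-surprise
    from surprise import SVD, Dataset, Reader
    from surprise.model_selection import train_test_split
    reader = Reader(rating_scale=(0.5, 5.0))
    data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)
    trainset, testset = train_test_split(data, test_size=0.25, random_state=42)
    algo = SVD(random_state=42)
    algo.fit(trainset)
    return precision_recall_at_k(algo.test(testset))

# (1) baseline on the raw ratings, then (2) one run per cluster
# (hypothetical `ratings` and cluster-labelled `movies` DataFrames):
# p_raw, r_raw = evaluate(ratings)
# for c in movies['cluster'].unique():
#     items = movies.loc[movies['cluster'] == c, 'movieId']
#     p_c, r_c = evaluate(ratings[ratings['movieId'].isin(items)])
```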

The meta-hybrid recommender clearly outperforms the baseline. Moreover, the approach is also memory-efficient, as stated earlier:

To get a deeper understanding of what’s going on inside the recommender engine in this particular example, you should look through the whole code.

Take-away notes

Thus our main message is simple, yet powerful. Once you start thinking of a (collaborative filtering) recommender engine as an algorithm that estimates the interaction of user A with an unknown item X (aka the item’s rating) based on the similarity of user A to other users who have already interacted with item X, you will immediately realise how heavily the accuracy of that estimation depends on the accuracy of measuring the similarity between users. One way to increase the accuracy of measuring such similarity (beyond purely mathematical methods) is to limit the set of users you compare against to a high-level subset (meta-cluster) of users sharing common characteristics. The same holds if you prefer to (meta-)cluster your set of items as well and run your recommender engine iteratively. Such a ‘naive’ approach may significantly improve the overall efficiency of any ‘out-of-the-box’ recommender algorithm.


As an ML Engineer at RBC Group, I transform raw data with passion and creativity to unlock valuable insights and empower businesses to make informed decisions