Building a memory-efficient meta-hybrid recommender engine: back to front (part 1)

How to build a recommender from scratch and boost its accuracy while keeping it simple

Volodymyr Holomb
Towards Data Science


What would you recommend?

The circumstances of 2020–2021 have pushed more and more business owners to move their main customer communications online. You may have noticed how much the number of online activities that anticipate, guide and surround a purchase (even an offline one) has grown recently. It seems that every Internet business does its best to maintain a never-ending dialogue with its clients. In such a dialogue, the client expects to receive at least relevant personal offers from the seller, to make a choice faster.

Personal offers for customers are generated by so-called recommender systems (recsys). A recsys is a subclass of machine learning algorithms that produces ranked lists of items a given user is likely to prefer.

The whole variety of recsys can be classified into several categories (by Rocca, 2019):

Image by Baptiste Rocca

Within this series of publications we will be going through:

  • mechanics of memory-based recommender systems
  • building your own user-based collaborative filtering recommender
  • applying “out-of-box” recommenders from popular Python modules
  • techniques of evaluating recommenders’ efficiency and accuracy

Along the way, we will share our experience in designing so-called meta-hybrid recommender engines for real-business problem-solving.

How does the recommender engine work?

Introducing data

Throughout this publication, we will work with a dataset of users' movie ratings.

In addition to users’ ratings, the dataset contains descriptive information about the movies themselves (such as year of release, genre, content tags), which we will use in the second part to improve the accuracy of the predictions of our recommender.

To understand how a recommender works, let’s create a micro-dataset in a few steps:

  • randomly select several users from the main dataset
  • compile a list of movies, each of which has been seen by 3 or more users from the list above
  • randomly select several movies from the previous list
  • from all user-movie pairs on the intersection of randomly selected users and movies construct a rating matrix.
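The four steps above can be sketched with pandas. This is a minimal sketch: the column names (`userId`, `movieId`, `rating`) follow the MovieLens convention and are an assumption about the source data layout, and the synthetic ratings below merely stand in for the real dataset:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in for the full ratings dataset (long format).
ratings = pd.DataFrame({
    "userId": rng.integers(1, 50, size=2000),
    "movieId": rng.integers(1, 100, size=2000),
    "rating": rng.choice([0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5], size=2000),
}).drop_duplicates(["userId", "movieId"])

# 1. randomly select several users from the main dataset
users = rng.choice(ratings["userId"].unique(), size=7, replace=False)
sub = ratings[ratings["userId"].isin(users)]

# 2. keep movies seen by 3 or more of the selected users
counts = sub.groupby("movieId")["userId"].nunique()
popular = counts[counts >= 3].index.to_numpy()

# 3. randomly select several movies from that list
movies = rng.choice(popular, size=min(5, len(popular)), replace=False)

# 4. build the user-movie rating matrix (NaN = unseen movie)
micro = (sub[sub["movieId"].isin(movies)]
         .pivot_table(index="userId", columns="movieId", values="rating"))
print(micro)
```

Each column of `micro` has at least three actual ratings, so every movie in the micro-library has some collaborative signal to work with.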

Designing custom recommender engine

The resulting ratings’ matrix represents a typical context for recsys performance: each user can receive recommendations for movies from our micro-library that they have not seen before, and those personal recommendations have to be ranked by the recommender so that users engage with the most promising unseen movies first.

The minimum information necessary for solving these tasks is contained in the ratings’ micro-matrix itself: the history of the users’ interactions with the movies they have seen (which is exactly the ‘memory’ of our collaborative filtering recommender).

Memory-based recommenders simply predict the rating of Y-user for an unseen Z-movie based on ratings assigned to that very Z-movie by another {A…X}-set of users, whose movie preferences are similar to those of Y-user.

In the simplest case, the rating’s prediction for the Y-Z pair is calculated as the weighted average of a set of available ratings of Z-movie subject to degrees of “similarity” of Y-user to each of the {A…X} users.
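In symbols (keeping the article’s notation), this simplest prediction can be written as:

```latex
\hat{r}_{Y,Z} \;=\; \frac{\sum_{u \in \{A \ldots X\}} w_{Y,u}\, r_{u,Z}}{\sum_{u \in \{A \ldots X\}} w_{Y,u}},
\qquad w_{Y,u} \propto \frac{1}{d(Y, u)}
```

where d(Y, u) is the distance between the scaled rating vectors of users Y and u, and the sums run only over users who have actually rated Z-movie.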

The degree of “similarity” is the distance between ratings’ vectors of two users, standardized (scaled) by subtracting the mean. Therefore, the mean of a scaled ratings’ vector is zero, which allows us to safely fill all missing values (i.e. rating for unseen movies by a particular user) with zeros.
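A minimal NumPy sketch of this step, on a hypothetical toy matrix (the numbers are illustrative, not taken from the article’s micro-dataset):

```python
import numpy as np

# Toy 4-users x 5-movies rating matrix; np.nan marks unseen movies.
R = np.array([
    [5.0, 3.0, np.nan, 1.0, 4.0],
    [4.0, np.nan, np.nan, 1.0, 5.0],
    [1.0, 1.0, np.nan, 5.0, 2.0],
    [np.nan, 2.0, 4.0, 4.0, np.nan],
])

# Centre each user's ratings around their own mean: every row then has
# zero mean, so missing values can safely be filled with zeros.
S = np.nan_to_num(R - np.nanmean(R, axis=1, keepdims=True))

# Cosine distance = 1 - cosine similarity between the scaled vectors:
# 0 means identical taste, values near 2 mean opposite preferences.
norms = np.linalg.norm(S, axis=1, keepdims=True)
D = 1.0 - (S @ S.T) / (norms @ norms.T)
print(np.round(D, 2))
```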

A matrix of cosine distances between the standardized vectors of user ratings.

As expected, all elements of the matrix main diagonal are equal to zero (the distance from the user to himself).

After removing the main diagonal, we get for each user a vector of distances from him to all other users in the system. Further within the recommender engine, these distances are converted into “inverse” weights to calculate ratings as weighted averages.

In other words, our custom recommender iteratively predicts the rating for each user-movie pair (including pairs where the user has not seen the movie!) as the product of the vector of available ratings of a particular movie and the vector of weights inversely proportional to the distances from a particular user to the other users whose ratings are considered.
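Putting the pieces together, here is a self-contained sketch of the whole prediction step (the toy matrix and the epsilon guard are illustrative choices, not the article’s exact implementation):

```python
import numpy as np

# Toy 4-users x 5-movies rating matrix (np.nan = unseen movie).
R = np.array([
    [5.0, 3.0, np.nan, 1.0, 4.0],
    [4.0, np.nan, np.nan, 1.0, 5.0],
    [1.0, 1.0, np.nan, 5.0, 2.0],
    [np.nan, 2.0, 4.0, 4.0, np.nan],
])

means = np.nanmean(R, axis=1, keepdims=True)
S = np.nan_to_num(R - means)                 # scaled ratings, 0 = unseen

norms = np.linalg.norm(S, axis=1, keepdims=True)
D = 1.0 - (S @ S.T) / (norms @ norms.T)      # cosine distances between users

# Distances become "inverse" weights; the small epsilon is an arbitrary
# guard that keeps near-zero distances from producing infinite weights.
W = 1.0 / (np.abs(D) + 1e-8)
np.fill_diagonal(W, 0.0)                     # a user does not vote for himself

seen = ~np.isnan(R)                          # mask of actually rated cells

# Weighted average of the other users' scaled ratings, normalised by the
# total weight of the users who have actually seen each movie.
num = W @ S
den = W @ seen.astype(float)
pred_scaled = np.divide(num, den, out=np.zeros_like(num), where=den > 0)

# Shift back by each user's own mean to return to the original scale.
pred = pred_scaled + means
print(np.round(pred, 2))
```

Note how the two edge cases from the text are handled: missing cells contribute zero to the numerator, and users with opposite preferences sit far away, so their “inverse” weights are small.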

Despite its simplicity, our custom recommender handles cases of absence of actual users’ ratings (i.e. missing cells in ratings’ matrix) as well as cases of ‘opposite’ movie-preferences of users (i.e. bidirectional scaled ratings’ vectors).

The mean absolute percentage error (MAPE) of our custom recommender is about 15.5%

Production-ready recommenders

We have built a recommender that predicts ratings and can produce a ranked list of recommendations. That is great, but not quite enough.

However, our recommender ‘as-is’ has quite a few shortcomings:

  • it does not allow switching from user-based to item-based collaborative filtering
  • it does not allow fine-tuning the number of users considered while calculating the vectors’ product
  • it is not adaptable for experiments with more advanced prediction algorithms
  • finally, it needs optimization to work with large datasets

None of these drawbacks apply to recommenders built with Python modules such as surprise.

For instance, by applying to the very same ratings’ micro-matrix an “out-of-box” recommender with a slightly more advanced (but generally similar) prediction algorithm from the surprise family (namely KNNWithMeans), you will immediately reduce MAPE from 15.5% to 11.4%:

The resulting predictions of two demo recommenders are on the visualization:

Valuable recommendation: be careful with the evaluation of recommenders

We usually do not use MAPE / MAE / RMSE to compare the accuracy of recommenders, but rather precision and recall for lists of items whose estimated ratings are above a specified threshold.

Note from the visualisation above how the recommendations for the same users change if we set the minimum rating for a movie to be recommended from our movies micro-library to 3.5 or 4.0.
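The threshold effect described above can be sketched in a few lines; the (estimated, true) rating pairs are made-up numbers, and an item counts as “relevant” if its true rating clears the threshold and as “recommended” if its estimated rating does:

```python
pairs = [  # (estimated rating, true rating) -- illustrative values
    (4.5, 5.0), (4.1, 3.0), (3.9, 4.0), (3.2, 4.5),
    (2.8, 2.0), (4.4, 4.0), (3.6, 3.0), (2.5, 4.0),
]

def precision_recall(pairs, threshold=3.5):
    """Precision/recall of the 'recommend if estimate >= threshold' rule."""
    recommended = [p for p in pairs if p[0] >= threshold]
    relevant = [p for p in pairs if p[1] >= threshold]
    hits = [p for p in recommended if p[1] >= threshold]
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

for t in (3.5, 4.0):
    p, r = precision_recall(pairs, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold trims the recommendation list, which typically trades recall for precision, so the choice of cut-off changes which recommender looks “better”.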

In the next part of this publication, we will show how to use proper metrics and how to improve the efficiency of the “out-of-box” SVD prediction algorithm (from the surprise module), which rose to fame during the Netflix Prize competition.


As an ML Engineer at RBC Group, I transform raw data with passion and creativity to unlock valuable insights and empower businesses to make informed decisions