RECOMMENDER SYSTEMS

A Bond of Trust Formed at a Coffee Shop
You’re sitting in a coffee shop, savoring your favorite coffee variation (a cappuccino, of course) and engrossed in conversation with a friend. As the conversation flows, the topic shifts to the latest gripping TV series that you both have been hooked on. The shared excitement creates a bond of trust, to the extent that your friend eagerly turns to you and asks, "What should I watch next? Do you have any recommendations?"
At that moment, you become the curator of their entertainment experience. You feel a sense of responsibility to preserve their trust and provide suggestions that are guaranteed to captivate them. Additionally, you’re excited at the opportunity to perhaps introduce them to a new genre or storyline they hadn’t explored before.
But what factors influence your decision-making process as you consider the perfect recommendations for your friend?
What makes a good recommendation?

First, you tap into your understanding of your friend’s tastes and interests. You recall their fondness for intricate plot twists and dark humor; furthermore, you know they enjoyed crime dramas like "Sherlock" and psychological thrillers like "Black Mirror." Armed with this knowledge, you navigate your mental library of TV shows.
To play it safe?
You’re tempted to suggest a list of shows that are almost identical, with slight variations, to the one you had just been raving about, blending both crime and thriller elements. You also think about how others with similar tastes have enjoyed these shows, which narrows your choices. After all, your friend is practically guaranteed to enjoy this set; it’s the safe and easy choice. However, you realize that relying solely on their past favorites may limit their exposure to new and diverse content, and you don’t particularly want to settle for the safe and easy choice.
You recall a recent sci-fi series that ingeniously blends mystery, adventure, and supernatural intrigue. Although it deviates from their typical genre, you feel confident it will provide a refreshing and captivating change of narrative.
The Long Tail Problem, Feedback Loop & Filter Bubbles
Recommendation systems aim to replicate this process on a larger scale. By analyzing vast amounts of data about individuals’ preferences, behaviors, and past experiences, these systems strive to generate personalized recommendations that encompass the complexity of human decision-making.
However, traditionally, recommendation systems have focused primarily, if not solely, on playing it safe and relying on the recommendations that are guaranteed to satisfy (at least in the short term).
One way they do this is by prioritizing popular or mainstream content. As a result, this popular content receives more exposure and interactions (popularity bias), creating a feedback loop that reinforces its prominence. Unfortunately, this often leaves lesser-known or niche content struggling to gain visibility and reach the intended audience (the long tail problem).

In fact, a growing body of literature in the last few years examines "fairness" in recommendation systems. For example, in "Fairness in Music Recommender Systems: A Stakeholder-Centered Mini Review", Karlijn Dinnissen and Christine Bauer explore the issue of fairness in music recommender systems; they analyze gender fairness and popularity bias from the perspective of multiple stakeholders (e.g., the impact of popularity bias on the representation of artists).
In the article, "Fairness in Question: Do Music Recommendation Algorithms Value Diversity?", Julie Knibbe shares:
As a former product director at a streaming platform, I often receive questions like "do streaming services choose to promote popular artists over indies and niche music?" Intuitively, people assume that on these big platforms "the rich get richer."
Later on in the article, Knibbe also echoes the sentiment of Dinnissen and Bauer:
"In the context of music recommendation […] fairness is often defined in terms of exposure or attention. Streaming services are also a two-sided marketplace, which means that "impartial and just treatment" must apply to both streaming services’ users and artists."
Both sources highlight the dual nature of fairness in recommender systems, underscoring the importance of considering "impartial and just treatment" for users and content creators.
What does the ideal outcome look like?
Naturally, there exists an inherent imbalance in the distribution of content. Part of what makes the human experience rich lies in its network intricacy: some content resonates with a broad audience, while other content forges connections within niche groups, developing a sense of richness and personalization. The objective is not to artificially promote less popular content for its own sake, striving for a uniform distribution. Rather, our aim is to surface niche content to individuals who genuinely relate to it and can appreciate the content creator’s work, thereby minimizing missed opportunities for meaningful connections.
What does the industry say about this?
In 2020, the research team at Spotify released an article titled, "Algorithmic Effects on the Diversity of Consumption on Spotify." In their research, they examined the relationship between listening diversity and user outcomes.

They aimed to answer the questions: "How does diversity relate to important user outcomes? Are users who listen diversely more or less satisfied than those who listen narrowly?"
The researchers discovered that "users with diverse listening are between 10–20 percentage points less likely to churn than those with less diverse listening […] listening diversity is associated with user conversion and retention."
Furthermore, according to Julie Knibbe:
"TikTok’s recommendation algorithm was recently mentioned among the top 10 […] by MIT technology review. What’s innovative in their approach isn’t the algorithm itself – it’s the metrics they’re optimizing for, weighing in more on diversity than other factors."
Therefore, there is a connection between a platform’s discoverability and user retention. In other words, when recommendations become predictable, users might seek alternative platforms that offer a greater sense of "freshness" in content, allowing them to escape the confines of filter bubbles.
So how can recommendation systems emulate the thoughtfulness and intuition that you employed in curating the perfect suggestion for your friend?
The Shift to Diversity Metrics
Well, in the article, "Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems", authors Marius Kaminskas and Derek Bridge highlight:
"Research into recommender systems has traditionally focused on accuracy […] however, it has been recognized that other recommendation qualities – such as whether the list of recommendations is diverse and whether it contains novel items – may have a significant impact on the overall quality of a recommender system. Consequently […] the focus of recommender systems research has shifted to include a wider range of ‘beyond accuracy’ objectives"
What are these "beyond accuracy" objectives?
Diversity
Sifting through the literature to understand what ‘diversity’ means in recommender systems was brutal, as each article presented its own unique definition. Diversity can be measured both at the **individual level** and at the **global level**. We’ll go over three ways to conceptualize diversity, in the context of giving show recommendations to a friend.
Prediction Diversity
Prediction diversity refers to the measure of how varied the recommendations are within a given set. When you suggest a set of shows to your friend, prediction diversity assesses the extent to which the recommendations differ from one another in terms of genres, themes, or other relevant factors.
A higher prediction diversity indicates a wider range of options within the recommended set, offering your friend a more diverse and potentially enriching viewing experience.
One way this is measured is by using intra-list diversity (ILD), the average pairwise dissimilarity among the recommended items. Given a recommended item list R, the ILD is defined as follows:

ILD(R) = (1 / (|R| · (|R| − 1))) · Σ_{i ∈ R} Σ_{j ∈ R, j ≠ i} d(i, j)

where d(i, j) is a dissimilarity measure between items i and j (for example, one minus the cosine similarity of their embeddings).
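As a concrete sketch, here is ILD computed over hypothetical item embeddings, using one minus cosine similarity as the dissimilarity (the embeddings and vectors below are made up for illustration):

```python
import numpy as np

def intra_list_diversity(embeddings: np.ndarray) -> float:
    """Average pairwise cosine dissimilarity among recommended items.

    embeddings: (n_items, dim) matrix, one row per recommended item.
    """
    # Normalize rows so dot products become cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T  # pairwise cosine similarities
    n = len(embeddings)
    # Sum (1 - similarity) over the n*(n-1) ordered pairs, dropping the diagonal.
    dissim = (1.0 - sim).sum() - np.trace(1.0 - sim)
    return dissim / (n * (n - 1))

# Two near-identical crime shows plus one very different sci-fi show.
items = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
print(round(intra_list_diversity(items), 3))
```

Swapping the sci-fi row for a third crime show would drive the score toward zero, reflecting a less diverse list.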
User Diversity
User diversity, in the context of providing show recommendations to a friend, examines the average diversity of all the recommendations you have ever given to that specific friend. It considers the breadth and variety of content suggested to them over time, capturing the range of genres, themes, or other relevant factors covered.
You can also assess user diversity by analyzing the average dissimilarity between the item embeddings within each set of recommendations per friend.
Global Diversity
On the other hand, global diversity looks beyond a specific friend and assesses the average diversity of all the recommendations you have given to any friend.
Sometimes, this is referred to as congestion – a reflection of recommendation uniformity or the crowding of recommendations.
A couple of metrics that you can use to analyze global diversity include the Gini index and entropy.
The Gini index, adapted from the field of income inequality measurement, can be used to assess the fairness and balance of recommendation distributions in recommendation systems. A lower Gini index indicates a more equitable distribution, where recommendations are spread evenly, promoting greater diversity and exposure to a wider range of content. On the other hand, a higher Gini index suggests a concentration of recommendations on a few popular items, potentially limiting the visibility of niche content and reducing diversity in the recommendations.
Entropy quantifies the level of uncertainty or randomness in the distribution of recommendations. Analogous to a minimal Gini index, entropy is maximized when the recommendation distribution is uniform, meaning that each item has an equal probability of being recommended; this indicates a balanced and diverse set of recommendations. Higher entropy suggests a more varied and unpredictable recommender, while lower entropy indicates a more concentrated and predictable one.
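A minimal sketch of both global-diversity metrics, computed from per-item recommendation counts (the counts below are hypothetical):

```python
import numpy as np

def gini_index(rec_counts) -> float:
    """Gini index of how recommendation exposure is spread across items.
    0 = perfectly even exposure; values near 1 = a few items dominate."""
    counts = np.sort(np.asarray(rec_counts, dtype=float))
    n = counts.size
    ranks = np.arange(1, n + 1)
    return float((2 * ranks - n - 1) @ counts / (n * counts.sum()))

def entropy(rec_counts) -> float:
    """Shannon entropy (in bits) of the recommendation distribution.
    Maximal, log2(n), when every item is recommended equally often."""
    p = np.asarray(rec_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log2(p)).sum())

uniform = [25, 25, 25, 25]  # every item gets equal exposure
skewed = [97, 1, 1, 1]      # one blockbuster dominates
print(gini_index(uniform), entropy(uniform))  # 0.0, 2.0
print(gini_index(skewed), entropy(skewed))    # high Gini, low entropy
```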

Coverage
Coverage is defined as the proportion of possible recommendations the algorithm can produce. In other words, how well the recommendations cover the catalog of available items.
For example, let’s consider a music streaming platform with a vast library of songs spanning various genres, artists, and decades. The coverage of the recommendation algorithm would indicate how effectively it can cover the entirety of this music catalog when suggesting songs to users.
Disadvantage: this metric treats an item recommended once the same as an item recommended thousands of times.
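A minimal sketch of catalog coverage, assuming we have the recommendation lists served to each user and the total catalog size (the song IDs below are hypothetical):

```python
def catalog_coverage(recommendation_lists, catalog_size: int) -> float:
    """Fraction of the catalog that appears in at least one recommendation list.
    Note the caveat above: an item counts once no matter how often it is shown."""
    recommended = set()
    for rec_list in recommendation_lists:
        recommended.update(rec_list)
    return len(recommended) / catalog_size

# Lists served to three users, out of a 10-song catalog.
lists = [["s1", "s2", "s3"], ["s2", "s3", "s4"], ["s1", "s4", "s5"]]
print(catalog_coverage(lists, catalog_size=10))  # 5 distinct songs -> 0.5
```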
Novelty
Novelty is a metric used to gauge the level of newness or originality in recommended items. It encompasses two aspects: user-dependent and user-independent novelty. User-dependent novelty measures how different or unfamiliar the recommendations are to the user, indicating the presence of fresh and unexplored content. However, it has become increasingly common to refer to the novelty of an item in a user-independent way.
To estimate novelty, one common approach is to consider an item’s popularity, measured as Item Rarity. This approach inversely relates an item’s novelty to its popularity, recognizing that less popular items are often perceived as more novel due to their deviation from mainstream or widely-known choices. By integrating this perspective, novelty metrics provide insights into the level of innovation and diversity present in the recommended items, contributing to a more enriching and exploratory recommendation experience.
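One common way to operationalize this inverse relationship is self-information: an item's novelty is the negative log of its popularity, so rare items score high and mainstream hits score low. A sketch, assuming popularity is measured as the share of users who interacted with the item (the counts are hypothetical):

```python
import math

def item_novelty(item_interactions: int, total_users: int) -> float:
    """Self-information novelty: novelty(i) = -log2(popularity(i)),
    where popularity is the fraction of users who interacted with item i.
    Rarer items yield higher novelty scores."""
    popularity = item_interactions / total_users
    return -math.log2(popularity)

# Out of 1,000 users: a mainstream hit vs. a niche show.
print(item_novelty(500, 1000))  # -log2(0.5) = 1.0 bit
print(item_novelty(4, 1000))    # far rarer, so much more novel
```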
Unexpectedness (Surprise)
Surprise in recommendation systems measures the level of unexpectedness in the recommended items based on a user’s historical interactions. One way to quantify surprise is by calculating the cosine similarity between the recommended items and the user’s past interactions. A higher similarity indicates less surprise, while a lower similarity indicates greater surprise in the recommendations.
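As a sketch of that idea, here surprise is one minus the maximum cosine similarity between a candidate item and anything in the user's history; taking the maximum (rather than, say, the mean) is an assumption, and the embeddings are hypothetical:

```python
import numpy as np

def surprise(recommended_emb: np.ndarray, history_embs) -> float:
    """1 - max cosine similarity between a recommended item and the user's
    past interactions. Near 0 = very familiar; near 1 = very surprising."""
    def unit(v):
        return v / np.linalg.norm(v)
    sims = [float(unit(recommended_emb) @ unit(h)) for h in history_embs]
    return 1.0 - max(sims)

history = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]  # crime-heavy history
crime_show = np.array([1.0, 0.05])
scifi_show = np.array([0.1, 1.0])
print(surprise(crime_show, history))  # close to 0: little surprise
print(surprise(scifi_show, history))  # much larger: more surprising
```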

Discoverability
Discoverability in recommendation systems can be understood as the user’s ability to easily come across and find the recommendations suggested by the model. It is akin to how visible and accessible the recommendations are within the user interface or platform.
It is quantified using a decreasing rank discount function, which assigns higher importance to recommendations at the top ranks of the recommendation list and gradually decreases their weight as the rank position goes down.
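One common choice of decreasing rank discount, borrowed from DCG in information retrieval, is 1 / log2(rank + 1). A sketch that scores an item's discoverability by summing this discount over every rank position at which it is surfaced (the choice of discount and the example ranks are assumptions):

```python
import math

def discoverability(ranks) -> float:
    """Sum of the rank discount 1 / log2(rank + 1) over the positions at
    which an item is shown. Top ranks contribute the most; items buried
    deep in the list contribute almost nothing."""
    return sum(1.0 / math.log2(r + 1) for r in ranks)

# Shown at rank 1 to one user and rank 3 to another vs. buried at rank 50 twice.
print(discoverability([1, 3]))    # 1.0 + 0.5 = 1.5
print(discoverability([50, 50]))  # far smaller despite equal show count
```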

Serendipity
Serendipity in recommendation systems encompasses two key aspects: unexpectedness and relevance.
Serendipity refers to the occurrence of pleasant surprises: the discovery of interesting and unexpected recommendations. It is calculated on a per-user, per-item basis using the formula:

serendipity(u, i) = unexpectedness(u, i) × relevance(u, i)
By multiplying unexpectedness and relevance, the serendipity metric combines the elements of pleasant surprise and suitability. It quantifies the degree to which a recommendation is both unexpected and relevant, providing a measure of serendipitous experiences in the recommendation process.
Overall serendipity averaged across users and recommended items can be computed as:
Serendipity = (1 / |U|) · Σ_{u ∈ U} (1 / |R_u|) · Σ_{i ∈ R_u} serendipity(u, i)

where U is the set of users and R_u is the list of items recommended to user u.
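A minimal sketch of this two-level average, assuming per-item unexpectedness and relevance scores are already available and scaled to [0, 1] (the scores below are made up):

```python
def serendipity(unexpectedness: float, relevance: float) -> float:
    """Per-user, per-item serendipity: the product of unexpectedness and
    relevance, both assumed to lie in [0, 1]."""
    return unexpectedness * relevance

def overall_serendipity(scores_per_user) -> float:
    """Average per-item serendipity within each user's list, then average
    those per-user means across all users."""
    per_user = [sum(scores) / len(scores) for scores in scores_per_user]
    return sum(per_user) / len(per_user)

# Hypothetical (unexpectedness, relevance) pairs for two users' lists.
user_a = [serendipity(0.9, 0.8), serendipity(0.1, 0.9)]
user_b = [serendipity(0.5, 0.5)]
print(overall_serendipity([user_a, user_b]))
```

Note how a highly unexpected but irrelevant item (or vice versa) scores low: both factors must be present for a recommendation to count as serendipitous.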
Concluding Remarks
As the industry evolves, there is a growing emphasis on refining recommendation algorithms to deliver recommendations that encompass the entirety of user preferences, including richer personalization, serendipity, and novelty. Moreover, recommendation systems that optimize the balance between these dimensions have also been associated with improved user retention metrics and user experience. Ultimately, the goal is to create recommendation systems that not only cater to users’ known preferences but also surprise and delight them with fresh, diverse, and personally relevant recommendations, fostering long-term engagement and satisfaction.
References
- Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems
- Post Processing Recommender Systems for Diversity
- Diversity in recommender systems – A survey
- Avoiding congestion in recommender systems
- The Definition of Novelty in Recommendation System
- Novelty and Diversity in Recommender Systems: an Information Retrieval approach for evaluation and improvement
- Quantifying Availability and Discovery in Recommender Systems via Stochastic Reachability
- A new system-wide diversity measure for recommendations with efficient algorithms
- Automatic Evaluation of Recommendation Systems: Coverage, Novelty and Diversity
- Serendipity: Accuracy’s Unpopular Best Friend in Recommenders