
Say Hello To Recommendation Systems

A preview into one of the most prominent data science applications

Photo by Andrea Piacquadio: https://www.pexels.com/photo/pondering-female-secretary-picking-folder-in-workplace-3791242/

Recommendation systems are everywhere. And when I say everywhere, I mean everywhere.

Most digital service providers include a feature that suggests additional content to their users. In fact, if you’re reading this right now, you can probably see a selection of articles recommended for you to read next.

The recommendation system is a feature that has pervaded every sector, from e-commerce to banking. This might not seem like a big deal, but the fact that the tool is ubiquitous is a testament to its usefulness.

Recommendation systems seem simple and straightforward, but they offer a plethora of benefits to businesses and have transformed the way we think and behave as consumers.

Here, we give a brief rundown of recommendation systems and discuss how businesses use their data to generate recommendations for their customers.


Why Recommendation Systems?

Entities like banks, news media firms, and retailers all have their own unique business models tailored to meet their needs. However, all of them incorporate recommendation engines in some way.

So, why are these systems so universally coveted?

In short, recommendation systems provide the following benefits:

1. They maximize customer retention

Many businesses (e.g., social media platforms) have a strong incentive to keep users engaged with their services for as long as possible. Providing recommendations alongside a piece of content encourages users to keep using the platform instead of hopping off.

2. They minimize customer burden

Many businesses now provide a variety of options to increase customer satisfaction and beat out their competitors.

Unfortunately, this practice has led to a phenomenon known as information overload. Simply put, choosing one option out of many is a mentally taxing effort that takes time and energy. Users who must constantly make decisions as they use a service are more likely to disengage from the service provider.

To avoid this, businesses rely on recommendations to reduce the friction of transitioning from one piece of content to the next.

3. They identify "hidden gems"

We, as people, are pretty straightforward when we make suggestions. To our friends who enjoy the fantasy genre, we recommend fantasy movies; to our friends who enjoy baking, we recommend baking cookbooks. Unfortunately, we are unable to effectively recommend items that people don’t show any prior interest in.

Recommendation systems, on the other hand, are able to look beyond the surface level to identify items that can satisfy the user. They can unearth underlying patterns in their users’ behavior and use those findings to put forth content that the users wouldn’t consider on their own.


Data for recommendation systems

The types of data used to build recommendation systems fall into two categories:

1. Explicit data

Explicit data refers to the information the customers directly provide through their feedback.

Many businesses collect this data by enabling customers to express their sentiments towards their products and services, often in the form of likes, ratings, and reviews.

2. Implicit data

Implicit data refers to the information that customers provide through their behavior. This type of data can be collected by observing the users’ interactions.

For example, websites can track user behavior by recording the pages they visit and the items that they purchase and then use this information to understand their preferences.
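To make the two categories concrete, the sketch below turns each kind of data into a user-item matrix, a structure many recommenders start from. The users, items, ratings, and event log are made up, and counting only purchases as positive implicit signals is an assumption for illustration; real systems often weight views, clicks, and purchases differently.

```python
import numpy as np

users = ["alice", "bob", "carol"]
items = ["book", "lamp", "mug"]

# Explicit data: ratings (1-5) that users provided directly (made-up values).
explicit_ratings = {("alice", "book"): 5, ("bob", "lamp"): 2, ("carol", "mug"): 4}

# Implicit data: a log of observed behavior (hypothetical event names).
event_log = [("alice", "view", "lamp"), ("alice", "purchase", "book"),
             ("bob", "purchase", "mug"), ("carol", "view", "book")]

def build_explicit_matrix(ratings):
    """Rating matrix; 0 means 'no rating given', not 'disliked'."""
    m = np.zeros((len(users), len(items)))
    for (user, item), score in ratings.items():
        m[users.index(user), items.index(item)] = score
    return m

def build_implicit_matrix(log, positive_event="purchase"):
    """Binary matrix: 1 if the user performed positive_event on the item."""
    m = np.zeros((len(users), len(items)), dtype=int)
    for user, event, item in log:
        if event == positive_event:
            m[users.index(user), items.index(item)] = 1
    return m

print(build_explicit_matrix(explicit_ratings))
print(build_implicit_matrix(event_log))
```

Note that a 0 in either matrix means "unknown" rather than "disliked", which is one reason implicit data is harder to interpret than explicit feedback.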


Types of recommendation systems

All in all, there are a handful of strategies one can implement when generating recommendations for customers.

The main approaches used in recommendation systems are content-based filtering, collaborative filtering, and hybrid filtering.

1. Content-based filtering

Content-based filtering, in layman’s terms, is the strategy of making recommendations based on the contents’ features. By examining the content that users consume, recommendation systems can make inferences about what other content the users will most likely enjoy.

Content-based filtering adheres to the following logic:

  • Person A likes item X
  • Item Y is similar to item X
  • Therefore, person A is likely to like item Y
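Once each item is described by a feature vector, the three-step logic above fits in a few lines of code. This is a minimal sketch: the genre features and movie titles are hypothetical, and cosine similarity is one common choice of similarity measure (the same one the case study later in this article uses).

```python
import numpy as np

# Hypothetical item features: [fantasy, baking, crime]
items = {
    "Dragon Quest Saga": np.array([1.0, 0.0, 0.0]),
    "Elven Kingdoms":    np.array([0.9, 0.0, 0.1]),
    "Bread at Home":     np.array([0.0, 1.0, 0.0]),
    "Night Patrol":      np.array([0.0, 0.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity: 1 means same direction, 0 means no shared features."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(liked_item, k=1):
    """Rank every other item by its similarity to an item the user liked."""
    target = items[liked_item]
    scores = {name: cosine(target, vec)
              for name, vec in items.items() if name != liked_item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Person A likes "Dragon Quest Saga" (item X); the most similar item plays item Y.
print(most_similar("Dragon Quest Saga"))  # ['Elven Kingdoms']
```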

2. Collaborative filtering

Collaborative filtering is the strategy of making recommendations based on the users’ interactions. As the word "collaborative" suggests, this approach entails using the interactions of multiple users to find the best items for the user of interest.

Collaborative filtering adheres to the following logic:

  • Person A likes item X
  • Person B, who also likes item X, likes item Y
  • Therefore, person A is likely to like item Y
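The same logic can be expressed with nothing but a user-item interaction matrix; no item features are needed. The sketch below is a simple user-based variant (one of several collaborative filtering techniques, alongside item-based and matrix factorization approaches), with made-up users, items, and likes.

```python
import numpy as np

items = ["X", "Y", "Z"]
# Rows are users, columns are items; 1 = liked, 0 = no interaction (made-up data).
likes = np.array([
    [1, 0, 0],  # person A likes X
    [1, 1, 0],  # person B likes X and Y
    [0, 0, 1],  # person C likes Z
])

def recommend_for(user_idx, k=1):
    """Score unseen items by how much similar users liked them."""
    norms = np.linalg.norm(likes, axis=1)
    # Cosine similarity between the target user and every user.
    sims = likes @ likes[user_idx] / (norms * norms[user_idx])
    sims[user_idx] = 0.0                    # ignore the user's own row
    scores = sims @ likes                   # similarity-weighted votes per item
    scores[likes[user_idx] > 0] = -np.inf   # drop items the user already likes
    ranked = np.argsort(scores)[::-1]
    return [items[i] for i in ranked[:k]]

print(recommend_for(0))  # ['Y']
```

Person A and person B overlap on item X, so B's like of item Y becomes a vote for Y in A's recommendations, while person C, who shares nothing with A, contributes nothing.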

3. Hybrid filtering

Businesses can also opt to harness multiple strategies simultaneously, creating a hybrid recommendation system that combines the strengths of each approach while offsetting their weaknesses.
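One simple way to build a hybrid is a weighted blend of the scores each strategy produces. The sketch below assumes both scorers return scores on the same 0-1 scale; the item names, scores, and the weight `alpha` are hypothetical and would normally be tuned on real data. Other hybrid designs exist, such as switching between strategies or feeding one model's output into another.

```python
def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Blend two score dictionaries; alpha weights the content-based side."""
    items = set(content_scores) | set(collab_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collab_scores.get(item, 0.0)
        for item in items
    }

content = {"Y": 0.9, "Z": 0.2}   # e.g. from item-feature similarity
collab = {"Y": 0.4, "W": 0.8}    # e.g. from similar users' behavior

blended = hybrid_scores(content, collab, alpha=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)  # ['Y', 'W', 'Z']
```

Item W, which the content-based side never scored, can still surface through the collaborative side, which is the kind of gap a hybrid is meant to cover.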


Challenges

Recommendation systems are sophisticated by nature and pose many challenges and obstacles to those looking to build them.

1. They require initial data

Needless to say, recommendation systems require a sufficient amount of data before they can make proper suggestions to users. This becomes an issue with new users about whom the system has no information, a situation often referred to as the cold start problem.
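A common fallback for the cold start problem is to show new users the most popular items until enough of their own data accumulates. The sketch below is hypothetical: the interaction threshold, the click data, and the `personalized` callback are placeholders rather than part of any particular system.

```python
from collections import Counter

MIN_INTERACTIONS = 3  # hypothetical threshold before personalizing

def recommend(user_history, all_interactions, personalized, k=2):
    """Fall back to the most popular items when a user has too little history."""
    if len(user_history) < MIN_INTERACTIONS:
        popularity = Counter(all_interactions)
        return [item for item, _ in popularity.most_common()
                if item not in user_history][:k]
    return personalized(user_history, k)

interactions = ["A", "B", "A", "C", "A", "B"]  # items clicked by all users
print(recommend([], interactions, personalized=lambda h, k: []))  # ['A', 'B']
```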

2. They need to adapt to change

Recommendation systems need to account for the fickle nature of customers.

What a person wants today may not be what they want tomorrow. Since customers’ interests undergo constant change, recommendation systems have to be able to adapt to their changes in preferences.

They also have to account for external factors. In many cases, customer behavior is shaped by recent trends. Clothing industries, for instance, have to keep up with what is "in style" when making suggestions.
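One simple way to let a system follow shifting preferences is to weight each past interaction by how recent it is, so that older behavior gradually fades. The exponential decay and the 30-day half-life below are assumptions chosen for illustration, not a standard setting, and the viewing history is made up.

```python
HALF_LIFE_DAYS = 30.0  # hypothetical: an interaction loses half its weight every 30 days

def decayed_preferences(interactions, now_day):
    """Sum exponentially decayed weights per genre from (day, genre) pairs."""
    scores = {}
    for day, genre in interactions:
        age = now_day - day
        weight = 0.5 ** (age / HALF_LIFE_DAYS)
        scores[genre] = scores.get(genre, 0.0) + weight
    return scores

# Lots of comedy long ago, a little horror recently (made-up history).
history = [(0, "comedy"), (5, "comedy"), (10, "comedy"), (170, "horror"), (175, "horror")]
prefs = decayed_preferences(history, now_day=180)
print(max(prefs, key=prefs.get))  # horror
```

Even though the user watched more comedies overall, the two recent horror movies dominate the decayed scores.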

3. They contribute to the "rich get richer" phenomenon

Recommendation systems are prone to showing bias toward the popular pieces of content since they boast a greater number of reviews and ratings. This can lead to the systems pushing those products to others, which further increases their popularity. On the other hand, less popular products can be neglected by the system and will receive little traffic as a result.
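The feedback loop can be seen in a toy, deterministic simulation: each round, the system recommends the currently most-clicked item, and the recommendation itself earns that item more clicks. The starting counts, the number of rounds, and the size of the boost are made up; real systems are noisier, but the sketch shows how a small early lead can compound.

```python
def simulate(clicks, rounds=5, boost=10):
    """Each round, the most-clicked item is recommended and gains `boost` clicks."""
    clicks = dict(clicks)
    for _ in range(rounds):
        top = max(clicks, key=clicks.get)
        clicks[top] += boost
    return clicks

start = {"popular": 12, "niche_a": 10, "niche_b": 10}
end = simulate(start)
print(end)  # {'popular': 62, 'niche_a': 10, 'niche_b': 10}
```

A two-click head start turns into a 52-click gap, because the recommended item is the only one that receives any new exposure.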

This occurrence is commonplace on many social media platforms, which have come under scrutiny for recommendation algorithms that neglect new content creators regardless of the quality of their material.

4. They can be difficult to evaluate

Recommendation systems, unlike many other machine learning products, can be difficult to evaluate.

While there are metrics for gauging the performance of recommendation systems, such as precision and mean squared error, they aren’t strong indicators of how the tool will perform when actually put to use.
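For reference, here is how the two offline metrics mentioned above are typically computed. The recommendations, held-out items, and ratings are made-up numbers; the point is that both scores are measured against past data, which is why they may not reflect how users respond to the live system.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that the user actually engaged with."""
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / k

def mean_squared_error(predicted, actual):
    """Average squared gap between predicted and true ratings."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

print(precision_at_k(["A", "B", "C", "D"], relevant={"A", "D"}, k=3))  # about 0.333
print(mean_squared_error([4.0, 3.5, 2.0], [5.0, 3.0, 2.0]))           # about 0.417
```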

Since the effectiveness of these systems can only truly be reflected by the behavior of the customers that use them, the best way to evaluate them would be through experimentation (e.g., A/B testing).
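As a rough illustration of such an experiment, the sketch below compares click-through rates between a control group (served by the old recommender) and a treatment group (served by a new one) using a two-proportion z-test. All counts are hypothetical, and a real experiment would also need a pre-planned sample size and significance level.

```python
import math

def two_proportion_z_test(clicks_a, users_a, clicks_b, users_b):
    """Return (z, two-sided p-value) for the difference in click-through rates."""
    p_a, p_b = clicks_a / users_a, clicks_b / users_b
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, computed via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = two_proportion_z_test(clicks_a=200, users_a=5000, clicks_b=260, users_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in click-through rate is unlikely to be due to chance alone, which is the kind of evidence offline metrics cannot provide.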


Case Study

Real recommendation systems are convoluted and account for countless factors. However, as a start, it is helpful to use a simple example to showcase how data can be used to build a recommendation system.

Here, we use Python to build a system that recommends movies to users based on the movies that they have already watched (content-based filtering).

The system will accomplish this by doing the following:

  1. Create a profile of the user based on the movies they’ve watched
  2. Compare every movie to the user profile
  3. Identify the movies that have the closest affinity to the user profile

The data in this case study is collected with the New York Times Movies API.

Code Output (Created By Author)

Each movie will be represented by the vectorized form of its corresponding summary. Prior to any vectorization, the text for each movie needs to undergo preprocessing, which includes stop word removal and lemmatization.
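A minimal preprocessing sketch is below. It assumes scikit-learn is installed and uses its built-in English stop-word list; the tiny `LEMMAS` lookup table is a hypothetical stand-in for a real lemmatizer (such as NLTK's `WordNetLemmatizer` or spaCy), which would normally handle this step. The summary sentence is made up.

```python
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Hypothetical stand-in for a real lemmatizer (e.g. NLTK's WordNetLemmatizer).
LEMMAS = {"officers": "officer", "chasing": "chase", "thieves": "thief"}

def preprocess(summary):
    """Lowercase, tokenize, drop stop words, and map words to their lemmas."""
    tokens = re.findall(r"[a-z]+", summary.lower())
    kept = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
    return " ".join(LEMMAS.get(t, t) for t in kept)

print(preprocess("Police officers are chasing thieves through the city."))
# police officer chase thief city
```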

Code Output (Created By Author)

For this case, we will vectorize the processed text with the TF-IDF algorithm and store the vectors in a data frame.

Preview of the Code Output (Created By Author)

After the text is vectorized, we can establish a user profile, which is determined by the movies that the user has already watched. Mathematically, the user profile is represented by the average of all of these movies’ vectors.

Next, we can compare this user profile to the vectors representing the other movies using the cosine similarity metric, with a higher score corresponding to a greater similarity. We will recommend the 3 movies with the greatest cosine similarity scores.

The following function carries out all of these steps. It takes the list of movies the user has seen as input and returns a list of recommended movies.

As an example, suppose that a user enjoys police movies and has watched "Hold Your Fire", "The Guilty", and "Body Cam".

The function, with the list of movies as the input, will first compute the average of the vectors that represent these movies, resulting in a vector that represents the user’s profile.

Next, the function will compare this vector to all of the vectors representing the remaining movies with the cosine similarity metric.

Finally, the function will return the 3 movies with the highest cosine similarity.

Code Output (Created By Author)

Naturally, this recommendation system is oversimplified and rife with flaws. Movies can’t be evaluated based on their summaries alone. Furthermore, a simple word vectorization method like TF-IDF only matches exact terms, so it will yield many false negatives (i.e., movies that should be recommended but aren’t).

That being said, this case study should give a glimpse into how data can be utilized to generate recommendations for customers.


Conclusion

Photo by Prateek Katyal on Unsplash

Recommendation systems remain a major topic of research to this day, and rightly so. The prospect of effectively putting forth items that are of interest to consumers will be enticing for any business.

If reading this has piqued your interest in recommendation systems, I invite you to further explore the various tools and techniques that are used in building these systems.

You can even take things to the next level by building your own recommendation system. This might be a painstaking effort, but harnessing your data science skill set to create such a tool will undeniably be a gratifying and rewarding experience.

I wish you the best of luck in your data science endeavors!

