
Two examples of a content-based recommendation system with the most efficient array functions

Content-based, weighted content, array functions

Two examples of a content-based recommendation system

Photo from Michal Matlon on Unsplash

Today I would like to discuss two examples of content-based recommendation systems and some efficient array functions I learned from them. The two examples are:

1: Recommendation based on item content

2: Recommendation based on weighted content

I use a simple movie dataset as an example and focus on the main process, leaving out secondary steps and special cases. Let’s get started.

Preparing the datasets:

The examples use two datasets: movie_df, which holds the movies and their genres, and review_df, which holds the users' ratings of those movies.
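The original tables are not reproduced here, so the following is only a minimal sketch of what the two datasets could look like. All values are hypothetical: the column names, the genre assignments, the ratings, and the placeholder titles "Movie D" and "Movie E" are assumptions made for illustration, chosen so that the later steps have something to work on.

```python
import pandas as pd

# Hypothetical toy data, not the author's original tables.
movie_df = pd.DataFrame({
    "movie_id": [1, 2, 3, 4, 5],
    "title": ["Toy Story", "Superman", "Minari", "Movie D", "Movie E"],
    # Genres are stored as one comma-separated string per movie.
    "genres": ["Animation,Children", "Children,Drama", "Drama",
               "Children,Drama", "Horror"],
})

# One row per (user, movie) rating.
review_df = pd.DataFrame({
    "user_id": [100, 100, 200],
    "movie_id": [1, 2, 5],
    "rating": [4, 5, 3],
})

print(movie_df)
print(review_df)
```

The other sketches in this story reuse these same hypothetical values, so their outputs will not match the numbers from the original run exactly.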

Method 1: Based on the movie content, recommend a movie to a user whenever its content similarity to a movie the user has reviewed is greater than 0.

Here, the only content we can use for the movies is their genres. Keeping the genres in a list format isn’t optimal for a content-based recommendation system, so we will use one-hot encoding to convert the list of genres into a vector in which each column corresponds to one possible genre.

Every genre is separated by a ",", so splitting on that delimiter recovers the individual genres.
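One way to do the split and the one-hot encoding in a single step is pandas' `Series.str.get_dummies`, which splits each string on a separator and returns one 0/1 column per distinct value. This is a sketch on hypothetical data; the original code may have used a different approach.

```python
import pandas as pd

# Hypothetical toy data (genre assignments are assumptions).
movie_df = pd.DataFrame({
    "movie_id": [1, 2, 3, 4, 5],
    "genres": ["Animation,Children", "Children,Drama", "Drama",
               "Children,Drama", "Horror"],
})

# Split each genre string on "," and one-hot encode the pieces.
# Indexing by movie_id keeps the rows easy to look up later.
genre_df = movie_df.set_index("movie_id")["genres"].str.get_dummies(sep=",")

# Plain 0/1 matrix: one row per movie, one column per genre.
movie_content = genre_df.to_numpy()

print(genre_df)
```

The columns come out in alphabetical order (Animation, Children, Drama, Horror), so each row of `movie_content` is the genre vector of one movie.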

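Step 1: Calculate the movie-movie similarity matrix:

The dot product of the one-hot genre matrix with its transpose gives, for every pair of movies, the number of genres they share. This count is used as the similarity between the movies.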
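As a small illustration, the similarity matrix can be computed with `np.dot` and `np.transpose`. The genre matrix below is hypothetical (columns: Animation, Children, Drama, Horror), and the variable names are assumptions.

```python
import numpy as np

# Hypothetical one-hot genre matrix: one row per movie (ids 1..5),
# columns are Animation, Children, Drama, Horror.
movie_content = np.array([
    [1, 1, 0, 0],  # movie 1
    [0, 1, 1, 0],  # movie 2
    [0, 0, 1, 0],  # movie 3
    [0, 1, 1, 0],  # movie 4
    [0, 0, 0, 1],  # movie 5
])

# Entry (i, j) is the number of genres shared by movie i and movie j.
# The diagonal holds each movie's own genre count.
movie_sim = np.dot(movie_content, np.transpose(movie_content))

print(movie_sim)
```

Because the entries are raw counts of shared genres, movies with many genres tend to score higher. A normalized measure such as cosine similarity is a common alternative, but it is not what this story uses.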

Step 2: Find similar movies. Because the dataset is small, the rule here is simply to recommend any movie that has some relationship to the target movie.

The criterion I use is that two movies are related if their dot product is greater than 0, regardless of the value. Another possible criterion is to keep only the movies with the highest similarity value.

np.where() returns the positions (indices) in the table of the items that satisfy the condition.

Test with movie_id = 1:

A movie similar to Toy Story is Superman, whose genres include Children. The result is as expected.
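Step 2 and this test can be sketched as follows. The function name `find_similar_movies`, the `top_only` flag, and the data are all hypothetical. The hypothetical data contains a second Children movie (id 4), so the output includes it alongside Superman (id 2).

```python
import numpy as np

# Hypothetical ids and similarity matrix (shared-genre counts).
movie_ids = np.array([1, 2, 3, 4, 5])
movie_sim = np.array([
    [2, 1, 0, 1, 0],
    [1, 2, 1, 2, 0],
    [0, 1, 1, 1, 0],
    [1, 2, 1, 2, 0],
    [0, 0, 0, 0, 1],
])

def find_similar_movies(movie_id, top_only=False):
    """Return ids of movies related to movie_id (hypothetical helper)."""
    # Position of the movie in the similarity matrix.
    row = np.where(movie_ids == movie_id)[0][0]
    sims = movie_sim[row].copy()
    sims[row] = 0  # a movie should not be recommended for itself
    if top_only:
        # Alternative criterion: keep only the highest similarity value.
        similar_idx = np.where((sims > 0) & (sims == sims.max()))[0]
    else:
        # Main criterion: any similarity greater than 0.
        similar_idx = np.where(sims > 0)[0]
    return movie_ids[similar_idx]

print(find_similar_movies(1))                 # [2 4]
print(find_similar_movies(2))                 # [1 3 4]
print(find_similar_movies(2, top_only=True))  # [4]
```

`np.where` returns a tuple of index arrays, one per dimension, which is why the code takes element `[0]`.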

Then map a list of movie ids to the corresponding list of movie names:
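A minimal sketch of this lookup, assuming movie_df has `movie_id` and `title` columns (the column names, the helper name `get_movie_names`, and the data are hypothetical):

```python
import pandas as pd

# Hypothetical toy data.
movie_df = pd.DataFrame({
    "movie_id": [1, 2, 3, 4, 5],
    "title": ["Toy Story", "Superman", "Minari", "Movie D", "Movie E"],
})

def get_movie_names(ids):
    """Return the titles for a list or array of movie ids (hypothetical helper)."""
    # isin() accepts lists and NumPy arrays alike; titles come back
    # in the row order of movie_df, not the order of ids.
    return movie_df.loc[movie_df["movie_id"].isin(ids), "title"].tolist()

print(get_movie_names([2, 4]))  # ['Superman', 'Movie D']
```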

Step 3: Make recommendations for a specific user:

Test with user_id = 100:

Both recommended movies are dramas, and user 100 has also reviewed the drama ‘Superman’, so the recommendation makes sense.
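Step 3 can be sketched like this: collect every movie related to something the user has reviewed, then drop the reviewed ones with `np.setdiff1d`. The function name `recommend_for_user` and all data are hypothetical.

```python
import numpy as np

# Hypothetical ids, similarity matrix, and reviews.
movie_ids = np.array([1, 2, 3, 4, 5])
movie_sim = np.array([
    [2, 1, 0, 1, 0],
    [1, 2, 1, 2, 0],
    [0, 1, 1, 1, 0],
    [1, 2, 1, 2, 0],
    [0, 0, 0, 0, 1],
])
review_user_ids = np.array([100, 100, 200])
review_movie_ids = np.array([1, 2, 5])

def recommend_for_user(user_id):
    """Return ids of unreviewed movies related to the user's reviewed movies."""
    watched = review_movie_ids[review_user_ids == user_id]
    # Rows of the similarity matrix that belong to reviewed movies.
    watched_rows = np.isin(movie_ids, watched)
    # A movie is a candidate if it is related to at least one reviewed movie.
    related = (movie_sim[watched_rows] > 0).any(axis=0)
    candidates = movie_ids[np.where(related)[0]]
    # Remove the movies the user has already reviewed.
    return np.setdiff1d(candidates, watched)

print(recommend_for_user(100))  # [3 4]
```

With this hypothetical data, user 100 has reviewed movies 1 and 2, and the two recommended movies (3 and 4) both carry the Drama genre.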

Method 2: Based on weighted content

Method 1 is easy to understand, but it does not use the rating information. Now I want to bring the ratings in and calculate weighted genres.

For example, I want to construct weighted genres based on a user’s ratings. Let’s choose user_id = 100.

Step 1: Select the movies rated by user 100 and keep only their genres:
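A possible version of this filtering step, on hypothetical data with assumed column names:

```python
import pandas as pd

# Hypothetical toy data.
movie_df = pd.DataFrame({
    "movie_id": [1, 2, 3, 4, 5],
    "genres": ["Animation,Children", "Children,Drama", "Drama",
               "Children,Drama", "Horror"],
})
review_df = pd.DataFrame({
    "user_id": [100, 100, 200],
    "movie_id": [1, 2, 5],
    "rating": [4, 5, 3],
})

# One-hot genre table indexed by movie_id.
genre_df = movie_df.set_index("movie_id")["genres"].str.get_dummies(sep=",")

# Reviews written by user 100.
user_reviews = review_df[review_df["user_id"] == 100]

# Genre rows of the movies this user rated, in the order of the reviews.
user_genres = genre_df.loc[user_reviews["movie_id"]]

print(user_genres)
```

Keeping `user_genres` in the same order as `user_reviews` matters for the next step, where each genre row is multiplied by the matching rating.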

Step 2: Get weighted genres for this user:
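The exact weighting formula is not shown here, so the sketch below makes an assumption: each genre's weight is the sum of the ratings of the reviewed movies that carry that genre, divided by the total so that the weights add up to 1. The data is hypothetical.

```python
import numpy as np

# Hypothetical genre rows of the two movies user 100 rated
# (columns: Animation, Children, Drama, Horror) and the ratings.
user_genres = np.array([
    [1, 1, 0, 0],  # movie 1
    [0, 1, 1, 0],  # movie 2
])
ratings = np.array([4, 5])

# Rating-weighted genre totals: transpose so genres become rows.
raw_weights = np.dot(np.transpose(user_genres), ratings)

# Assumed normalization: scale so the weights sum to 1.
genre_weights = raw_weights / raw_weights.sum()

print(raw_weights)    # [4 9 5 0]
print(genre_weights)
```

Here Children gets the largest weight (9/18 = 0.5) because both rated movies carry that genre, while Horror gets 0 because the user never rated a Horror movie.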

Step 3: Get the recommendations (movies the user has reviewed are still included)

rec_movies contains every movie whose weighted genre score is ≥ 0.5, including the movies the user has already reviewed.
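As a sketch, a movie's score can be taken as the dot product of its one-hot genre vector with the user's genre weights, and `np.where` then picks the movies at or above the threshold. The scoring rule, the weights, and the data are assumptions; only the 0.5 threshold and the name rec_movies come from the text.

```python
import numpy as np

# Hypothetical ids and one-hot genre matrix
# (columns: Animation, Children, Drama, Horror).
movie_ids = np.array([1, 2, 3, 4, 5])
movie_content = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
])

# Hypothetical genre weights for user 100 (they sum to 1).
genre_weights = np.array([4, 9, 5, 0]) / 18

# Score of a movie = sum of the user's weights for the genres it carries.
scores = np.dot(movie_content, genre_weights)

# Keep every movie whose score reaches the threshold.
rec_movies = movie_ids[np.where(scores >= 0.5)[0]]

print(scores.round(3))
print(rec_movies)  # [1 2 4]
```

At this point movies 1 and 2, which the user has already reviewed, are still in the list.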

Step 4: Filter out the reviewed movies

In this case, Minari’s weighted genre score for user 100 is 0.2, which is below the threshold, so it is excluded from the recommendation.
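The filtering step is a direct use of `np.setdiff1d`. In the hypothetical data below, Minari is movie 3 and its score comes out at about 0.28 rather than 0.2, because the values are made up, but the outcome is the same: it falls below 0.5 and is left out.

```python
import numpy as np

# Hypothetical ids, scores for user 100, and the user's reviewed movies.
movie_ids = np.array([1, 2, 3, 4, 5])
scores = np.array([13, 14, 5, 14, 0]) / 18
watched = np.array([1, 2])

# Movies at or above the threshold, reviewed ones included.
rec_movies = movie_ids[np.where(scores >= 0.5)[0]]

# Drop everything the user has already reviewed.
final_recs = np.setdiff1d(rec_movies, watched)

# Look up the score of movie 3 (Minari in this toy data).
minari_score = scores[np.where(movie_ids == 3)[0][0]]

print(final_recs)              # [4]
print(round(minari_score, 3))  # 0.278
```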

Summary:

  • From the two cases, we can see that genres the user has not reviewed will not be recommended, which is characteristic of a content-based recommendation system. It is highly personalized to the user.
  • To some degree, building a recommendation system is an art: you can define your own criteria to suit your goal. In this story I have shown several criteria even for the same method.
  • Some array functions are, I believe, very efficient when building a recommendation system, and I would like to summarize them again here:
  • 1) np.setdiff1d(): this function finds the values in one array that are not in another. It also works if you start from a list, which is converted to an array.
  • 2) np.dot(np.transpose()): this dot product can be used not only for the item-item similarity but also for computing the weighted genres.
  • 3) np.where(): here I use it to find the location (index) of an item.
  • Of course, you can also apply them to other situations, not only recommendation systems. Because of space limits, I have not demonstrated how efficient these functions are; this story only shows how I use them.
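A short note on np.setdiff1d() and lists: the function accepts any array-like input, so plain Python lists can be passed directly. The values below are made up for illustration.

```python
import numpy as np

# Plain Python lists work; NumPy converts them internally.
recommended = [4, 2, 1, 2]
watched = [1, 2]

# Values in the first input that are not in the second.
remaining = np.setdiff1d(recommended, watched)
print(remaining)  # [4]

# The result is always sorted and free of duplicates.
print(np.setdiff1d([5, 3, 3, 1], [1]))  # [3 5]
```

Because the result is sorted and deduplicated, the original order of the recommendations is not preserved. If a ranking matters, it has to be restored afterwards.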

Thank you for reading.

