Customers Who Viewed This Item Also Viewed …

Kemalcan Jimmerson
Towards Data Science
7 min read · Jun 19, 2020

Every time you finish a movie on Netflix, you see a recommendation for a new one. Every time you open YouTube, you see suggested videos because you watched something similar. My favorite is Amazon’s recommendation algorithm, because you do not even need to purchase or view something to see it as a recommendation. If you have talked about a book near your phone, you may see it in Amazon’s suggestions. It’s that simple!

Thanks to Charles Deluvio for the photo

These recommendation features, also known as recommender systems, have become useful and popular machine learning applications over the last decade. The algorithm behind them is not actually complicated. The system can, for instance, recommend a product that a person may be interested in based on past purchases or reviews. To understand the details, we should be familiar with the concept of cosine similarity.

Cosine Similarity

Cosine similarity shows how well one person’s taste or preferences can be described by another person’s choices. Mathematically, it is the cosine of the angle between two vectors. In this case, the vectors represent people’s choices. To find the cosine similarity, we calculate the dot product of the unit vectors of the two vectors.

Let’s say person A’s preferences are represented with vector A, and person B’s preferences are represented with vector B.

Cosine similarity is a number between -1 and 1. If the choices of person A and person B are exactly the same, their cosine similarity is 1. If they are completely opposite, their cosine similarity is -1. This is an important detail, because a cosine similarity of -1 still tells us a lot about these two people’s preferences: they like the opposite of each other. Person A likes a certain movie while person B dislikes that same movie, so there is still information in a cosine similarity of -1. If the cosine similarity is 0, however, person A’s choices give no information about person B’s preferences.
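The calculation can be sketched in a few lines of plain Python. The vectors below are made up purely to illustrate the three cases (1, -1, and 0):

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    # Product of the two vector magnitudes
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([3, 4], [6, 8]))    # 1.0  -> same direction, identical tastes
print(cosine_similarity([3, 4], [-3, -4]))  # -1.0 -> opposite direction, opposite tastes
print(cosine_similarity([3, 4], [4, -3]))   # 0.0  -> orthogonal, no information
```

Notice that only the direction of the vectors matters, not their length: a person who rates everything a little higher than another, but in the same pattern, still gets a similarity near 1.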

Once we understand the cosine similarity concept, recommender systems make more sense. There are typically two common types of recommender systems: content-based recommender systems and collaborative recommender systems.

Photo by Nordwood Themes on Unsplash

Content-Based Recommender Systems

In content-based recommender systems, the goal is to group similar instances (movies, books, etc.) into clusters based on their product features. The clustering is built on the actual characteristics of the product.

In the above example, each movie has distinct characteristics regardless of what users think about them. In content-based recommendation systems, users (customers) do not play an active role. Here our vectors are the rows of the table, and we find the cosine similarity of each vector with every other vector.

In practice, we start with feature engineering to turn each record into numerical values, and then we calculate the cosine similarity. The cosine similarity then tells us which vectors are most similar. With this method, when you plug in a new movie title, it returns the movies with the most similar product features.
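A minimal sketch of this idea, with hypothetical titles and made-up genre-flag features standing in for the result of real feature engineering:

```python
import math

# Toy feature vectors after feature engineering (hypothetical):
# columns = [action, drama, comedy]
movies = {
    "Movie A": [1, 0, 0],
    "Movie B": [1, 1, 0],
    "Movie C": [0, 0, 1],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar(title):
    # Rank every other movie by cosine similarity to the query title
    scores = {
        other: cosine_similarity(movies[title], vec)
        for other, vec in movies.items()
        if other != title
    }
    return max(scores, key=scores.get)

print(most_similar("Movie A"))  # Movie B (shares the action feature)
```

No user information appears anywhere in this computation; the recommendation comes entirely from the product features.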

Collaborative Recommender Systems

Collaborative recommender systems take the user into consideration. The users’ behavior is a very important input for generating recommendations; the algorithm utilizes user-product interactions. Collaborative recommender systems also split into two major models:

  1. User based Collaborative Recommender Systems
  2. Item based Collaborative Recommender Systems

User based collaborative recommender systems cluster users with the same cosine similarity math, based on their purchases or ratings.

In the above example, we see the user ratings for 4 movies. There is a similarity between User 1 and User 2, because they both liked and disliked the same movies. Based on this, we can recommend “12 Angry Men” to User 1 because User 2 liked it. Also, we should not recommend “The Godfather” to User 5, because User 4 did not like it and they have similar preferences.

One handicap of this kind of recommender system is that it relies entirely on customer ratings. Not every customer likes rating products. Also, people with a negative experience are more likely to submit a rating than those with a positive one. That is why a lot of companies moved from rating-based recommender systems to purchase-based recommender systems. Purchase-based systems are more reliable because they take users’ actual behavior into consideration, and the data is much easier to collect. Let’s look at the example below:

Based on the purchase decisions, we can recommend “Forrest Gump” to User A, since his purchasing behavior is similar to User B’s, and User B watched it. This particular model is the same as the “Customers who bought this product also bought …” concept.
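The purchase-based variant can be sketched with a toy binary matrix (the users and purchase histories here are invented for illustration — 1 means purchased or watched, 0 means no interaction):

```python
import math

# Hypothetical binary purchase history
purchases = {
    "User A": {"The Godfather": 1, "Fight Club": 1, "Forrest Gump": 0},
    "User B": {"The Godfather": 1, "Fight Club": 1, "Forrest Gump": 1},
    "User C": {"The Godfather": 0, "Fight Club": 0, "Forrest Gump": 1},
}
titles = ["The Godfather", "Fight Club", "Forrest Gump"]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def recommend(user):
    vec = [purchases[user][t] for t in titles]
    # Find the most similar other user...
    best = max(
        (u for u in purchases if u != user),
        key=lambda u: cosine_similarity(vec, [purchases[u][t] for t in titles]),
    )
    # ...and recommend what they bought that this user has not
    return [t for t in titles if purchases[best][t] and not purchases[user][t]]

print(recommend("User A"))  # ['Forrest Gump']
```

User A’s vector [1, 1, 0] points in almost the same direction as User B’s [1, 1, 1], so User B’s extra purchase becomes User A’s recommendation.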

Item based collaborative recommender systems are very similar to user based ones. The difference is that the matrix is transposed: in item based collaborative recommender systems, we look from the product side and find the cosine similarity of the movies.

In the above example, users rated the movies on a 1–5 scale. The movie “12 Angry Men” can be presented to User 1 because it received the same reactions as “Fight Club” from the same people. If User 1 liked “Fight Club”, he may like “12 Angry Men” as well.

Excluding User 1, “Fight Club”’s vector = [4, 3, 4, 4] and “12 Angry Men”’s vector = [5, 3, 4, 4] are very similar to each other.
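We can check this with the cosine similarity formula applied directly to those two rating vectors:

```python
import math

fight_club = [4, 3, 4, 4]       # ratings from the other users, excluding User 1
twelve_angry_men = [5, 3, 4, 4]

dot = sum(x * y for x, y in zip(fight_club, twelve_angry_men))
norm = math.sqrt(sum(x * x for x in fight_club)) * \
       math.sqrt(sum(y * y for y in twelve_angry_men))

print(round(dot / norm, 2))  # ~0.99, very close to 1: nearly identical reactions
```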

Recommender system implementation in Python

In practice, when we create a recommender system, we first create a pivot table. In Python, we use the .pivot_table( ) function to do that. For an item-based recommender system, the title of the product (which can be a movie, book, or music name, etc.) will be the index of our dataframe, and user information (names or IDs, depending on your dataset) will be in the columns. The .pivot_table( ) function returns a giant table that shows each user’s behavior for each movie.

Of course, not everybody has a behavior for every product (this behavior being a rating or purchase information), which is why we will see many NaN or “0” values in the pivot table. This is where a sparse matrix comes in handy. A quick reminder: a sparse matrix takes a dataframe and squeezes it into a lighter format by not storing the “0” values. The information in our dataframe does not change, but it takes much less space in memory because a sparse matrix does not carry the burden of zeros. Here we can treat “0” and NaN values as carrying equal information, because when the data is binary, “0” means no behavior occurred or no purchase happened. So, convert the NaN values to “0” before you build the sparse matrix.
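A minimal sketch of these two steps with pandas and SciPy. The column names and ratings below are hypothetical — yours will depend on your dataset:

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical long-format ratings data
ratings = pd.DataFrame({
    "title":   ["Fight Club", "Fight Club", "12 Angry Men", "12 Angry Men"],
    "user_id": [1, 2, 1, 3],
    "rating":  [5, 4, 5, 3],
})

# Item-based layout: one row per title, one column per user
pivot = ratings.pivot_table(index="title", columns="user_id", values="rating")

# Replace the NaNs (no interaction) with 0, then compress to a sparse matrix
sparse_matrix = csr_matrix(pivot.fillna(0).values)

print(sparse_matrix.shape)  # (2, 3): 2 titles x 3 users
```

For a user-based system you would simply swap `index` and `columns` in the .pivot_table( ) call so that users become the rows.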

Once you have created the sparse matrix, all you need to do is calculate the cosine similarity. The Scikit-Learn library has a pairwise_distances function that calculates cosine distances for us. This function returns a square matrix comparing every product with every other product.

Notice that our recommender_matrix no longer has any user information. While our sparse_matrix carries the user input, the pairwise_distances function transforms it into the recommender_matrix: a square matrix that holds only the cosine distance of each product to every other product.
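A sketch of this step, using a toy item-user matrix with made-up ratings (3 items rated by 4 users):

```python
from scipy.sparse import csr_matrix
from sklearn.metrics import pairwise_distances

# Toy item-user matrix: rows are items, columns are users (values invented)
sparse_matrix = csr_matrix([
    [4, 3, 4, 4],
    [5, 3, 4, 4],
    [1, 5, 2, 1],
])

# Cosine distance between every pair of items (rows)
recommender_matrix = pairwise_distances(sparse_matrix, metric="cosine")

print(recommender_matrix.shape)  # (3, 3): one row and one column per item
```

The diagonal is all zeros (every item has zero distance to itself), and the off-diagonal entries tell us how dissimilar each pair of items is.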

This recommender_matrix does not have the item titles either. We can bring those titles over from the pivot_table and combine them with the recommender_matrix in one big happy dataframe.
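One way to do that is to reuse the pivot table’s index as both the row and column labels of the distance matrix. The titles and ratings below are made up; “Some Comedy” is a hypothetical third item:

```python
import pandas as pd
from sklearn.metrics import pairwise_distances

# Toy item-user pivot table (NaNs already filled with 0)
pivot = pd.DataFrame(
    [[4, 3, 4, 4], [5, 3, 4, 4], [1, 5, 2, 1]],
    index=["Fight Club", "12 Angry Men", "Some Comedy"],
)

recommender_matrix = pairwise_distances(pivot.values, metric="cosine")

# Reuse the pivot table's index for both rows and columns
recommender_df = pd.DataFrame(
    recommender_matrix, index=pivot.index, columns=pivot.index
)

# Smallest distance (after dropping the movie itself) = most similar title
most_similar = recommender_df["Fight Club"].drop("Fight Club").idxmin()
print(most_similar)  # 12 Angry Men
```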

Another important note: unlike raw cosine similarity values, Scikit-Learn’s pairwise_distances( ) function returns numbers on a different scale. Normally, the cosine similarity scale is between -1 and 1: as discussed earlier, cosine similarity = 1 means the items have very similar acceptance and cosine similarity = -1 means the items have exactly opposite acceptance. The result of pairwise_distances( ) is quite different, because it returns the cosine distance, which is 1 minus the cosine similarity. When we see “0” in our recommender matrix, we understand that the items are very similar; the larger the number, the more dissimilar the items. For non-negative data like ratings or purchase counts, this distance scale runs between 0 and 1.

I hope you found this post useful for understanding the algorithm behind recommender systems. If you have any questions or comments, please leave them below.
