MOVIE RECOMMENDATION SYSTEM

Ashis Kumar Panda
Towards Data Science
10 min read · Oct 24, 2018


Welcome to the fifth episode of Fastdotai, where we will build a movie recommendation system. Before we start, I would like to thank Jeremy Howard and Rachel Thomas for their efforts to democratize AI.

To make the best out of this blog post series, feel free to explore the earlier parts of this series in the following order:-

  1. Dog Vs Cat Image Classification
  2. Dog Breed Image Classification
  3. Multi-label Image Classification
  4. Time Series Analysis using Neural Network
  5. NLP- Sentiment Analysis on IMDB Movie Dataset
  6. Basic of Movie Recommendation System
  7. Collaborative Filtering from Scratch
  8. Collaborative Filtering using Neural Network
  9. Writing Philosophy like Nietzsche
  10. Performance of Different Neural Network on Cifar-10 dataset
  11. ML Model to detect the biggest object in an image Part-1
  12. ML Model to detect the biggest object in an image Part-2

Grab some popcorn and let's get started.

First of all, let's import all the required packages.

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.learner import *
from fastai.column_data import *

Set the paths where:

  • Input data is stored.
  • Temporary files will be stored. (Optional- To be used in kaggle kernels)
  • Model weights will be stored. (Optional- To be used in kaggle kernels)
path='../input/'
tmp_path='/kaggle/working/tmp/'
models_path='/kaggle/working/models/'
Next, read in the data.
ratings = pd.read_csv(path+'ratings.csv')
ratings.head()
# Each row contains a userId, the movieId that user watched, the rating
# the user gave, and the timestamp of when the rating was made.
movies = pd.read_csv(path+'movies.csv')
movies.head()
# This table maps movieId to title. It is for reference only and is not
# used for modelling.

CREATING A CROSSTAB OF TOP USERS AND TOP MOVIES:-

g = ratings.groupby('userId')['rating'].count()
topUsers = g.sort_values(ascending=False)[:15]
# Users who have given the most ratings are considered top users.
g = ratings.groupby('movieId')['rating'].count()
topMovies = g.sort_values(ascending=False)[:15]
# Movies that have received the most ratings are the top movies.
top_r = ratings.join(topUsers, rsuffix='_r', how='inner', on='userId')
top_r = top_r.join(topMovies, rsuffix='_r', how='inner', on='movieId')
pd.crosstab(top_r.userId, top_r.movieId, top_r.rating, aggfunc=np.sum)

So we will go through three ways of building a movie recommendation system:-

  1. MATRIX FACTORIZATION.
  2. COLLABORATIVE FILTERING FROM SCRATCH.
  3. NEURAL NETWORK APPROACH.

First of all, we will dive into the matrix factorization approach:-

  1. MATRIX FACTORIZATION:-

The table in the left box has the actual ratings; it's our actual data.

Let me discuss in detail how the right table is constructed and what the relation between the left table and the right table is.

  • HOW THE PREDICTED RATINGS TABLE (RIGHT TABLE) IS BUILT
  • The right table has user Ids (users) as rows and movie Ids (movies) as columns. Each user Id and movie Id is described in terms of an embedding matrix. Remember the embedding matrix we discussed in the last blog post: an embedding matrix is made up of embedding vectors, which are, at the beginning, just random numbers (shown in purple in the diagram above).
  • For example, user Id 14 is represented by four random numbers, and movie Id 27 is likewise represented by four random numbers. The sum of the element-wise products of these numbers gives the predicted rating; in other words, every predicted rating is the dot product of two embedding vectors, all of whose entries are initialized randomly at the beginning.
  • Our objective is to minimize the RMSE between the predicted ratings and the actual ratings. Look at the formula below.
  • The SUMXMY2 spreadsheet function calculates the sum of the squares of the differences between corresponding items in two arrays and returns that sum. To break the formula down: it sums the squared errors between predicted and actual values into a single number, divides that number by the count of ratings, and then takes the square root, which is exactly the RMSE. In the figure above this number is shown in blue, and it is the objective function to be minimized. A small NumPy sketch of this objective follows.
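
Here is a minimal NumPy sketch of that objective, with toy sizes like the spreadsheet above; all names and numbers are illustrative, not the actual spreadsheet values:

import numpy as np

np.random.seed(42)
n_users, n_movies, n_factors = 15, 15, 4   # toy sizes

# Embedding matrices: one vector of 4 random numbers per user and per movie.
user_emb  = np.random.randn(n_users, n_factors)
movie_emb = np.random.randn(n_movies, n_factors)

# Every predicted rating is the dot product of a user vector and a movie vector.
pred = user_emb @ movie_emb.T            # shape (n_users, n_movies)

# Stand-in for the actual ratings table (dense here; real data is sparse).
actual = np.random.randint(1, 6, size=(n_users, n_movies)).astype(float)

# The objective: sum of squared differences (SUMXMY2) divided by the count
# of ratings, then the square root, i.e. the RMSE.
rmse = np.sqrt(((pred - actual) ** 2).sum() / actual.size)
print(rmse)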

??? ANY QUESTIONS ???

What do these embedding vectors mean?

  • Initially these values are random, but after training they start making sense. Check the predicted rating values after a couple of epochs: the values keep updating themselves, so the predictions move closer to the actual ratings, and the embedding vectors adjust accordingly. Take, for example, movie Id 27 (Lord of the Rings), whose embedding vector consists of 4 numbers as shown below:-
  • Say each cell denotes (% sci-fi, % CGI based, % dialogue driven, % modern, % comedy); it denotes the genre mix of the movie.
  • Similarly, each number in the user Id embedding vector denotes how much user Id 14 likes sci-fi movies, modern CGI movies, dialogue-driven movies, and so on.
  • We will discuss the bias terms later on.

NOTE:- Here we don't have any non-linear activation function or any kind of hidden layer, hence this is considered an example of shallow learning.
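
For intuition, here is a minimal PyTorch sketch of such a shallow dot-product model with bias terms. It is a simplified stand-in, not fastai's exact implementation; the user and movie counts (671 and 9066) are the ones this dataset happens to have:

import torch
import torch.nn as nn

class DotProductBias(nn.Module):
    def __init__(self, n_users, n_movies, n_factors=50):
        super().__init__()
        self.u  = nn.Embedding(n_users, n_factors)   # user embedding vectors
        self.i  = nn.Embedding(n_movies, n_factors)  # movie embedding vectors
        self.ub = nn.Embedding(n_users, 1)           # one bias per user
        self.ib = nn.Embedding(n_movies, 1)          # one bias per movie

    def forward(self, users, movies):
        # Dot product of the two embedding vectors, plus the two biases.
        # Note: no hidden layer and no non-linearity anywhere.
        dot = (self.u(users) * self.i(movies)).sum(dim=1)
        return dot + self.ub(users).squeeze(1) + self.ib(movies).squeeze(1)

model = DotProductBias(n_users=671, n_movies=9066)
model(torch.tensor([14]), torch.tensor([27]))  # one (untrained) predicted rating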

Qs:- How is collaborative filtering the same as probabilistic matrix factorization?

Here we are getting each predicted rating as the dot product of two vectors. The problem is that we don't have proper information about each user or movie, so we assume this is a reasonable way of characterizing the system and use SGD to find the optimized numbers that make it work.

Qs:- How do we decide the length of these embedding vectors?

We should choose an embedding dimensionality that is big enough to represent the true complexity of the problem at hand, but not so big that it has too many parameters, takes too long to run, or produces overfitted results even with regularization.

Qs:- What does a negative number in an embedding vector denote?

A negative number in a movie Id embedding denotes that the particular movie doesn't belong to that particular genre. A negative number in a user Id embedding denotes that the particular user doesn't like that particular genre of movie.
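
For example, with made-up numbers for user Id 14 and movie Id 27:

user_14  = [ 0.9, -0.2, 0.3, 0.1]   # loves sci-fi, mildly dislikes CGI
movie_27 = [-0.8,  0.5, 0.6, 0.4]   # not a sci-fi movie

pred = sum(u * m for u, m in zip(user_14, movie_27))
# -0.72 - 0.10 + 0.18 + 0.04 ≈ -0.6: the sci-fi mismatch drags the
# predicted rating down.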

Qs:- What happens when we have a new movie or a new user?

When we sign up to Netflix as a new user, it asks which movies we like, and it retrains its model so as to give good recommendations.

  • TIME FOR SOME HANDS-ON COLLABORATIVE FILTERING:-

The collaborative filtering approach to recommendation is built on the concept of users and items. Suppose user Id 14 likes movie Id 24. Collaborative filtering asks which other users liked movie 24 the way user 14 did, then goes through the movies those like-minded users enjoyed and recommends them to user 14. A toy sketch of this intuition follows.
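The fastai model below learns embeddings rather than searching for neighbours directly, but as a rough sketch of the "find users like user 14" intuition, reusing the ratings DataFrame from earlier:

import numpy as np
import pandas as pd

# User-by-movie rating matrix (as in the crosstab earlier), with missing
# ratings filled with 0 just for this toy illustration.
mat = pd.crosstab(ratings.userId, ratings.movieId, ratings.rating,
                  aggfunc=np.sum).fillna(0)

def similar_users(user_id, k=5):
    # Cosine similarity between this user's rating row and everyone else's.
    v = mat.loc[user_id].values
    dots = mat.values @ v
    norms = np.linalg.norm(mat.values, axis=1) * np.linalg.norm(v)
    sims = dots / np.where(norms == 0, 1, norms)
    return (pd.Series(sims, index=mat.index)
              .drop(user_id)
              .sort_values(ascending=False)
              .head(k))

similar_users(14)  # users who share user 14's tastes; their favourite
                   # movies are the candidate recommendations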

So it has two parts :-

  1. UserId and MovieId
  2. Rating values (the dependent variable)
val_idxs = get_cv_idxs(len(ratings))  # random indices for the validation set
wd = 2e-4       # weight decay (L2 regularization), helps prevent overfitting
n_factors = 50  # embedding dimensionality
cf = CollabFilterDataset.from_csv(path, 'ratings.csv', 'userId', 'movieId', 'rating')
# 1. path          - where the file is stored.
# 2. 'ratings.csv' - the CSV file which contains the data to be read.
# 3. 'userId'      - what should be the rows.
# 4. 'movieId'     - what should be the columns.
# 5. 'rating'      - the values to predict.

learn = cf.get_learner(n_factors, val_idxs, 64, opt_fn=optim.Adam, tmp_name=tmp_path, models_name=models_path)
# 64 is the batch size; Adam is the optimizer.
# Finally, we train our model: 1e-2 is the learning rate, and cycle_len
# and cycle_mult control the SGDR restart schedule.
learn.fit(1e-2, 2, wds=wd, cycle_len=1, cycle_mult=2)
math.sqrt(0.766)
# 0.8752142594816426
# Let's compare to some benchmarks on the same dataset for the popular
# Librec system for collaborative filtering. Their best results show an
# RMSE of 0.91. Since we train on plain MSE, we take the square root of
# our final validation loss (0.766) to get a comparable RMSE.
preds = learn.predict()
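
To sanity-check those predictions against the held-out ratings, something like the following should work; I am assuming the validation targets are reachable as learn.data.val_y (the fastai 0.7 column-data convention), so the attribute name may differ:

import math
from sklearn import metrics

y = learn.data.val_y  # actual validation ratings (assumed attribute name)
math.sqrt(metrics.mean_squared_error(y, preds))  # validation RMSE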

LET'S ANALYZE THE RESULTS:-

movie_names = movies.set_index('movieId')['title'].to_dict()
# Dictionary mapping each movieId to its title.
g = ratings.groupby('movieId')['rating'].count()
# How many ratings each movie received.
topMovies = g.sort_values(ascending=False).index.values[:3000]
# The movieIds of the 3000 movies with the most ratings.
topMovieIdx = np.array([cf.item2idx[o] for o in topMovies])
# Replace each movieId with its contiguous internal id.
m = learn.model
# Check out our model: it has a 50-dimensional embedding vector for each
# movie and each user, plus a bias for each movie and each user.
# First, we'll look at the movie bias term. The input is the movie id
# (a single id) and the output is the movie bias (a single float).
movie_bias = to_np(m.ib(V(topMovieIdx)))
  • The code to_np(m.ib(V(topMovieIdx))) looks up each of the movieIds in the bias embedding layer and returns its bias.
  • m.ib refers to the embedding layer for an item/movie, which is the bias layer. As we know, there are 9066 movies, each with an associated bias; m.ib returns the values of that layer.
  • Models/layers require Variables to keep track of gradients, hence the V(…).
  • To convert a tensor into a NumPy array, use to_np().
  • To move a model from GPU to CPU for inference, use m.cpu(); to move it back to the GPU, use m.cuda().
movie_ratings = [(b[0], movie_names[i]) for i,b in zip(topMovies,movie_bias)]
# A list comprehension pairing each movie's bias (b[0]) with its title
# (movie_names[i]). The output below is a list of (bias, title) tuples.

Sort the movies by bias (i.e. the 0th element of each tuple, using a lambda function). On inspection we find that the bias denotes the quality of the movie: good movies have positive bias and bad movies have negative bias. This is how to interpret the bias terms.

sorted(movie_ratings, key=lambda o: o[0], reverse=True)[:15]
# Sort the movies by bias (the 0th element of each tuple, via the lambda).
# reverse=True means descending order of bias values.

LET’S INTERPRET THE EMBEDDING VECTORS:-

movie_emb = to_np(m.i(V(topMovieIdx)))
# m.i(...) gives the item (movie) embeddings.
movie_emb.shape
# Because it's hard to interpret 50 embedding dimensions, we use PCA to
# reduce them down to just 3 components.
from operator import itemgetter
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
movie_pca = pca.fit(movie_emb.T).components_
movie_pca.shape
# Here's the 1st component. It seems to capture 'easy watching' vs
# 'serious'. It's up to us to decide what these embeddings mean; check
# the output below.
fac0 = movie_pca[0]
movie_comp = [(f, movie_names[i]) for f,i in zip(fac0, topMovies)]
sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]
# Let's interpret the 2nd component.
fac1 = movie_pca[1]
movie_comp = [(f, movie_names[i]) for f,i in zip(fac1, topMovies)]
# It seems to capture 'CGI' vs 'dialog driven'.
sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]
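
To read the other end of this axis (the most 'dialog driven' titles under that interpretation), sort in ascending order instead:

sorted(movie_comp, key=itemgetter(0))[:10]
# Movies with the most negative values on the 2nd component.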

This is how we analyze a movie recommendation system. The evaluation criterion is RMSE, which we want to minimize. The earlier benchmark was an RMSE of 0.91; using fastai we arrive at about 0.87.

Hence this model is performing well.

In the next part, we will build collaborative filtering from scratch.

If you liked it, then ABC (Always Be Clapping). 👏👏👏👏👏😃😃😃😃😃😃😃😃😃👏👏👏👏👏👏

If you have any questions, feel free to reach out on the fast.ai forums or on Twitter: @ashiskumarpanda.

P.S.- This blog post will be updated and improved as I continue with the other lessons. For more interesting stuff, feel free to check out my GitHub account.


Edit 1:- TFW Jeremy Howard approves of your post . 💖💖 🙌🙌🙌 💖💖 .
