
Neural Network Collaborative Filtering with Amazon Book Reviews

In my project, I built a book recommendation system with Amazon Review data.

The aim of this project is to create a Collaborative Filtering Book Recommendation System by analyzing Amazon reviews and developing a Neural Network model using TensorFlow and Keras.

Image by Shiromani Kant on Unsplash

TL;DR: Using the UCSD Amazon Review dataset, I scored book reviews with Flair sentiment analysis, built user and book embeddings, and compared a matrix factorization baseline (RMSE 0.131) against a TensorFlow/Keras neural network (validation MSE 0.07). The neural network produced markedly better predictions and sensible book recommendations.


I. Background

Recommendation systems are useful tools that businesses employ to match customers with products they are likely to engage with. These systems, when developed properly, are extremely powerful and directly improve a company’s ability to engage users. Collaborative Filtering is a method of predicting a user’s preferences by analyzing the preferences of other users with similar tastes. There are different types of collaborative filtering systems, which this reference explains quite well.

Traditional Collaborative Filtering models are prone to underfitting, which leaves them with relatively high error when making recommendations. Neural Networks, which are becoming increasingly popular, are well suited to building highly adaptive recommendation models. There are not many guides to building a Neural Network Collaborative Filtering model, so I set out to demonstrate one in this article.

II. The Dataset

Our dataset is loaded from a repository collected at UCSD [1]. The repository was created by scraping millions of Amazon pages, so it is a very comprehensive collection of Amazon reviews and product ratings. This data is useful for many things and is very well suited to building a recommendation system. The raw files are extremely large, so I kept only books with 5 or more reviews (still an extremely large file: 27,164,983 reviews).

We could theoretically fit a collaborative filtering model to all 27,164,983 reviews, but training would take a long time and the embeddings would be of low quality because the user-book matrix would be very sparse. To overcome this, only books and users with over 100 reviews were selected for the model (see the loading sketch below).

Review Data from Amazon Review Dataset

The dataset contains the above information in a .json format. Pandas can be used to load and structure the data, but the file needs to be read in chunks, as shown below:
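Here is a minimal sketch of the chunked load and the activity filter described above, assuming the raw file is line-delimited JSON; the filename, chunk size, and kept columns are illustrative rather than the exact original code:

    import pandas as pd

    # Read the line-delimited review JSON in chunks to keep memory bounded.
    chunks = []
    for chunk in pd.read_json("Books_5.json", lines=True, chunksize=100_000):
        # Keep only the fields the recommender needs.
        chunks.append(chunk[["reviewerID", "asin", "overall", "reviewText"]])
    reviews_df = pd.concat(chunks, ignore_index=True)

    # Keep only books and users with over 100 reviews each.
    book_counts = reviews_df["asin"].value_counts()
    user_counts = reviews_df["reviewerID"].value_counts()
    reviews_df = reviews_df[
        reviews_df["asin"].isin(book_counts[book_counts > 100].index)
        & reviews_df["reviewerID"].isin(user_counts[user_counts > 100].index)
    ]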

III. Sentiment Analysis

Sentiment analysis is a tricky process. Many packages, such as NLTK (VADER) and TextBlob, are rules-based sentiment analyzers, which are fast and straightforward but don’t capture deeper patterns in the data. They struggled to consistently analyze the sentiment of the reviews in our dataset.

Flair is a powerful natural language processing toolkit that uses pre-trained models to predict text sentiment. It worked very well on Amazon review data, so Flair was the clear choice to obtain Sentiment values for all User-Book reviews. The only downside was runtime, since Flair takes significantly longer to run.
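As a rough sketch, scoring a single review with Flair’s pre-trained sentiment classifier looks like this; folding the POSITIVE/NEGATIVE label and its confidence into one signed score is my assumption, not necessarily the exact scoring used in the project:

    from flair.data import Sentence
    from flair.models import TextClassifier

    # Load Flair's pre-trained English sentiment model.
    classifier = TextClassifier.load("en-sentiment")

    def flair_sentiment(text):
        # Flair predicts a POSITIVE/NEGATIVE label with a confidence score;
        # fold the two into a single signed value in [-1, 1].
        sentence = Sentence(text)
        classifier.predict(sentence)
        label = sentence.labels[0]
        return label.score if label.value == "POSITIVE" else -label.score

    reviews_df["sentiment"] = reviews_df["reviewText"].apply(flair_sentiment)

Note that the .apply call runs the model one review at a time, which is where the long runtime comes from.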

Because Flair behaves more like a logistic classifier, its outputs cluster near the extremes, so I transformed the variable using a few QuantileTransformers to properly scale the target variable for our model.
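A sketch of one such rescaling step with scikit-learn’s QuantileTransformer; the uniform output distribution and the column names are assumptions:

    from sklearn.preprocessing import QuantileTransformer

    # Spread the clustered Flair scores out into a smoother target variable.
    qt = QuantileTransformer(output_distribution="uniform", random_state=42)
    scores = reviews_df[["sentiment"]].to_numpy()
    reviews_df["sentiment_scaled"] = qt.fit_transform(scores).ravel()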

IV. The Model

The starting point of our model is the User-Book Matrix. It is a pivot table of our original dataset, with each row representing a unique user and each column representing a unique book.
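Building that pivot table with pandas could look like the following, reusing the hypothetical column names from the earlier sketches and filling missing user-book pairs with 0:

    user_book_matrix = reviews_df.pivot_table(
        index="reviewerID", columns="asin",
        values="sentiment_scaled", fill_value=0,
    )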

User-Book Matrix to Embeddings (Image by Author)

From the User-Book Matrix, embeddings can be created, and these are the basis of our modeling. Embeddings are vectors that represent features in your dataset. In our case, for example, the embeddings might represent book genres, themes, lengths, styles, or some combination of these. Embeddings are not explicitly defined, so we can never know exactly what they encode, but this is a good way to think about them.

Learn more about embeddings here: Structured Deep Learning by Kerem Turgutlu

I. Matrix Factorization Approach

A simple matrix factorization involves two steps:

  • Decomposition of the initial User-Book Matrix into embeddings (in our case, 64-unit embeddings)
  • Weighted dot product of the User embedding vector and the Book embedding vector to obtain the predicted Sentiment Score

Matrix Factorization Approach (Image by Author)

The Matrix Factorization approach was implemented as shown below. I used scipy to break the user_book_matrix down into two embedding matrices, user_embed_df and book_embed_df.
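A sketch of that decomposition using truncated SVD from scipy; the variable names follow the article, but details such as splitting the singular values across both factors are my assumptions:

    import numpy as np
    import pandas as pd
    from scipy.sparse.linalg import svds

    # Truncated SVD of the User-Book Matrix into rank-64 factors.
    U, sigma, Vt = svds(user_book_matrix.to_numpy(), k=64)

    # Split the singular values across both factors so a plain dot product
    # of a user vector and a book vector reconstructs the sentiment score.
    sqrt_sigma = np.sqrt(sigma)
    user_embed_df = pd.DataFrame(U * sqrt_sigma, index=user_book_matrix.index)
    book_embed_df = pd.DataFrame(Vt.T * sqrt_sigma, index=user_book_matrix.columns)

    # Predicted sentiment for every user-book pair:
    predicted = user_embed_df.to_numpy() @ book_embed_df.to_numpy().T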

While this approach is fast and straightforward, it is limited in how well the user and book embedding vectors can predict review sentiment. The RMSE we obtained was 0.131, which leaves room for improvement.

II. Neural Network Approach

To improve the fit of our model, additional modeling is needed to overcome the limitations of simple Matrix Factorization. Matrix Factorization is fast, but it is not adaptable enough to account for the complex ways in which the user and book embeddings can interact with one another.

This is a perfect problem for neural networks, and I built a model using TensorFlow and Keras to accomplish this. A neural network such as the one below has 3,050,337 trainable weights, and each of these weights can be trained so that the network can adequately account for any complex interactions between our embeddings.

Neural Network Approach (Image by Author)
Neural Network Approach (Image by Author)

Our neural network structure contains Dense layers of 2048, 1024, 512, 256, 64, and 16 units. We trained the model with epochs = 100 and batch_size = 64. Training took only around 5 minutes on an RTX 2060 GPU; most CPUs should be able to handle it fairly easily.

The model uses Dropout and BatchNormalization layers to prevent overfitting. I also used optimizer = 'adam' and loss = 'mean_squared_error', and I tracked 'mean_absolute_error' as an additional metric. Our MSE on the validation dataset was 0.07, a marked improvement over the Matrix Factorization model.
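A sketch of an architecture matching that description; the exact layer ordering, activations, and dropout rates are assumptions, so the parameter count will differ slightly from the article’s:

    from tensorflow.keras import Model, layers

    # User and book embedding vectors go in; a predicted sentiment comes out.
    user_in = layers.Input(shape=(64,), name="user_embedding")
    book_in = layers.Input(shape=(64,), name="book_embedding")
    x = layers.Concatenate()([user_in, book_in])

    # Dense stack from the article, regularized with BatchNorm and Dropout.
    for units in (2048, 1024, 512, 256, 64, 16):
        x = layers.Dense(units, activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(0.2)(x)  # rate is an assumption

    output = layers.Dense(1, name="predicted_sentiment")(x)

    model = Model(inputs=[user_in, book_in], outputs=output)
    model.compile(
        optimizer="adam",
        loss="mean_squared_error",
        metrics=["mean_absolute_error"],
    )
    # model.fit([user_vecs, book_vecs], sentiment_targets,
    #           epochs=100, batch_size=64, validation_split=0.1)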

III. Comparison

Let’s try our Neural Network model on a user to see how it performs:

John Smith (Read 53 books in the dataset)

Image by Author

For John Smith, his top books pertain to topics such as history, war, and politics; these are the books where John has left the most positive reviews, according to our sentiment analysis. The recommended books, which fall into the genres of history, missionary stories, and immigration, seem to fit his taste. These are the books with the highest prediction scores from our neural network model.
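As a rough sketch, pulling a user’s top recommendations from the trained model could look like this; the user ID is hypothetical, and books the user has already reviewed would normally be filtered out first:

    import numpy as np
    import pandas as pd

    # Score every book in the catalog for one user.
    user_vec = user_embed_df.loc["A1EXAMPLEUSERID"].to_numpy()  # hypothetical ID
    book_vecs = book_embed_df.to_numpy()
    user_batch = np.tile(user_vec, (len(book_vecs), 1))

    scores = model.predict([user_batch, book_vecs]).ravel()
    top_10 = pd.Series(scores, index=book_embed_df.index).nlargest(10)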

The recommender suggested ‘The Black Widow’ by Daniel Silva, which is more of a thriller novel. This seems to be a bit outside the scope of what John is used to, but hey, maybe he’ll like it!

Seems like our model works reasonably well! There’s still room for improvement, such as further development of the model or adjustments to the data preparation process.


All code and notebooks can be found on my GitHub page here. Thank you for reading along; I hope it was helpful! Any feedback is greatly appreciated as well.

Citations:

[1] Jianmo Ni, Jiacheng Li, Julian McAuley, Justifying Recommendations Using Distantly-Labeled Reviews and Fine-Grained Aspects (2019), Empirical Methods in Natural Language Processing (EMNLP)

