
Ad2Vec: Similar Listings Recommender for Marketplaces

How can Word2Vec be used in the field of Recommender Systems?

Photo by Ross Joyner on Unsplash

INTRODUCTION

Nowadays, product recommendations are arguably the most important component of an e-commerce website or mobile application. By improving their recommendations, companies can directly lift the crucial business metric of Click-Through Rate (CTR) on their platform. Therefore, I believe that recommendation is the field where Data Science teams have the best chance to increase their visibility in a business by making a profound impact on the product.

This is the story of the impact that our new recommender approach has had on the marketplace platforms of eCG (eBay Classifieds Group).

Table of Contents

I. Problem Statement:

  • Definitions of the page, recommender, and ad types that exist on our marketplace platforms
  • How does our current recommender system work?
  • Why do we need a new recommender?

II. Our Approach and Methodology:

  • What is the use of Word2Vec?
  • How is it possible to use a Word2Vec algorithm within the Recommender context?
  • High Level Approach
  • Our Methodology in Detail
  • How to optimise the parameters of Word2Vec for a recommendation problem?

III. Model Evaluation and Spot-Check Results:

  • How to compare the performance of a new recommender system with an existing one?

IV. Other Future Use-Cases:

  • User2Vec: Richer User Recommendations
  • Neat Product Categorisation
  • Personalised Search Ranking

PROBLEM STATEMENT

Page Types

In our marketplace platforms, there are two different page types on which recommendations are shown to the users:

  • HomePage: This is the page where the "For You" feed is located. In this feed, we show our users products related to what they have recently viewed on the platform. In simple terms, we take the last 5 ads (listings) viewed by a user and then retrieve ads similar to each of these 5 from our Recommendation System.
  • View Item Page (VIP): This is the page where all the details of a particular ad are listed. Towards the bottom-right of this page, there is an ad-list component named "Others Viewed", in which we recommend the top 5 ads most similar to that particular ad.

Recommender Types

Two different algorithms run in production to generate those ad recommendations.

  • Content-Based Recommender (CBR): Based on the attributes of ads
  • Behavioural-Based Recommender (BBR): Based on the clicking behaviour of users

Ad-Listing Types

Each of these algorithms has been designed to address the recommendation problem for a different ad type on our platforms:

  • Newly Posted (Fresh) Ads:

Since marketplaces are highly dynamic platforms to which users add many new ads daily, there are always plenty of fresh ads that have not been viewed at all yet. This is one of the best-known challenges in Recommendation Systems, the "Cold Start Problem". In our case, the Content-Based Recommender overcomes it: the algorithm takes the attributes of a newly posted ad, such as images, size, location, and price, and then fetches the most similar ads by comparing them with the attributes of existing ads.

  • Existing Ads:

The other ad type on our platforms is one that was added a while ago and has therefore been viewed by several users. To come up with recommendations for this type of ad, we make use of behavioural browsing data rather than ad attributes as in the Content-Based Recommender. Broadly, we take the user-ad matrix and feed it into our Behavioural-Based Recommender, a Collaborative Filtering algorithm, to end up with similar-item recommendations.
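To make this concrete, here is a minimal sketch of a matrix-factorisation flavour of collaborative filtering, not our production implementation; `user_ad_matrix` and `ad_ids` are made-up placeholders:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy user-ad matrix: rows are users, columns are ads, values are view counts.
user_ad_matrix = csr_matrix(np.random.poisson(0.05, size=(1000, 200)))
ad_ids = [f"ad{i}" for i in range(user_ad_matrix.shape[1])]

# Factorise the matrix; each column of `components_` becomes an ad's latent factors.
svd = TruncatedSVD(n_components=32, random_state=42).fit(user_ad_matrix)
ad_factors = svd.components_.T  # shape: (n_ads, 32)

# Similar-item recommendations for the first ad: nearest ads in the latent space.
sims = cosine_similarity(ad_factors[[0]], ad_factors)[0]
top5 = [ad_ids[j] for j in np.argsort(-sims)[1:6]]
```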

How does our current recommender system work?

For a particular ad, we first try to fetch all recommendations from the BBR. If this recommender returns enough ads, we show only those to our users; otherwise, we top them up by fetching more recommendations from the CBR. It is worth mentioning here that we are on a quest to improve the BBR only, so we will still need the CBR to come up with recommendations for fresh ads.
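In simplified Python, the top-up logic looks roughly like this; `fetch_bbr` and `fetch_cbr` are hypothetical stand-ins for the two recommender services:

```python
def fetch_bbr(ad_id, n):  # stub standing in for the behavioural recommender
    return []

def fetch_cbr(ad_id, n):  # stub standing in for the content-based recommender
    return [f"cbr_rec_{i}" for i in range(n)]

def recommend(ad_id, n=5):
    recs = fetch_bbr(ad_id, n)  # behavioural recommendations take priority
    if len(recs) < n:           # not enough? top up from the content-based side
        seen = set(recs)
        recs += [a for a in fetch_cbr(ad_id, n) if a not in seen][: n - len(recs)]
    return recs

print(recommend("ad42"))
```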

This article is about our journey to find a better approach than our current Behavioural-Based Recommender – Collaborative Filtering.

Why do we need a new Recommender?

In our current BBR, the Collaborative Filtering algorithm only takes a user's ad views into account as a whole and ignores their sequential, temporal aspect. We believe that our users tend to view similar ads in close succession during their journey on our marketplace platforms, so the timestamp of an ad view matters a lot in our case. We wanted to focus on addressing this while discovering a new approach, moving from co-occurrence of ad views to proximity of ad views.

OUR APPROACH AND METHODOLOGY

What is the use of Word2Vec?

Many of you might have heard about Word2Vec, one of the most popular NLP algorithms. The logic behind it is simple yet powerful and rests on the assumption that words appearing in similar contexts are likely to have similar meanings. The training data is generated by fetching pairs of words from a text according to the size of a context window. A neural network with a single hidden layer is then trained on this data, and finally the hidden-layer weights are used as the word embeddings.

How is it possible to use a Word2Vec algorithm as a Recommender?

It is also popular to use Word2Vec in a recommender context, especially on e-commerce platforms where users visit different items/products during a session. This is because the item-visit sequence of a user on an e-commerce platform is structurally similar to the word sequence of a sentence. In other words, the algorithm assumes that if two ads are co-visited alongside similar sets of other ads (contextual similarity in the case of words), those two will be deemed 'similar ads'. In the end, similar ads will have embeddings (numerical vectors) in close proximity, and ads from distinct sub-categories will end up with embeddings further apart in cosine-similarity space.
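To make the analogy concrete, here is a minimal sketch using the gensim library (we actually used H2O's implementation, as described later); the sequences and ad IDs are made up:

```python
from gensim.models import Word2Vec

# Each "sentence" is one user's sequence of viewed ad IDs.
sequences = [
    ["ad12", "ad7", "ad33", "ad90"],
    ["ad7", "ad33", "ad51"],
    ["ad90", "ad12", "ad33"],
]

# Skip-gram model: the "words" are ad IDs, the "context" is the surrounding views.
model = Word2Vec(sequences, vector_size=32, window=2, min_count=1, sg=1, epochs=50)

# Ads co-visited in similar contexts end up close in cosine space.
print(model.wv.most_similar("ad33", topn=2))
```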

High Level Approach

Below, you will see a high-level overview of our approach, explained for the Computers & Software category on our Dutch platform, Marktplaats.

I: User Viewing History: A sequence of recent ad views of a user on the platform

II: Neural Network Model Fitting: Training a Neural Network with a single hidden layer using ad pairs generated from the user viewing histories.

III: Ad Embeddings Space: Embed each live ad on the platform into a representative numerical vector using the hidden-layer weights of the Neural Network

High Level Overview of Our Approach

Our Methodology in Detail

Now, it's time to dive deeper into the details of the implementation journey. It started with generating a sequence of recently viewed ads for each user. While generating those sequences, we only considered ad views that happened during the last 45 days. This time frame has been used by our current Collaborative Filtering algorithm for a while, which is why we stick with the exact same interval for our new approach as well. We split the rest of the implementation into 5 steps.
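In PySpark, the sequence generation could look roughly like this, assuming a `views` DataFrame with `user_id`, `ad_id` and `view_time` columns (the names are illustrative):

```python
from pyspark.sql import functions as F

sequences = (
    views
    .where(F.col("view_time") >= F.date_sub(F.current_date(), 45))  # last 45 days only
    .groupBy("user_id")
    .agg(F.sort_array(F.collect_list(F.struct("view_time", "ad_id"))).alias("s"))
    .select("user_id", F.col("s.ad_id").alias("ad_sequence"))  # time-ordered ad IDs
)
```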

Step I: First, we filtered out successive repetitions of the same ad from these sequences. This is an important step because our users tend to view the same ads repeatedly during their journey on the platform, which would create many pairs consisting of the same ad twice when generating the training data for our model in the next stage.

Step II: In the second data cleaning step, we also removed non-live ads from the users' ad-view sequences. This is required to end up with better embeddings for live ads. Non-live ads distort the embedding generation: live ads posted a while ago were co-visited with them, whereas relatively new live ads never get the chance to be, since those non-live ads have already been removed from the platform. The algorithm would interpret this as a discrepancy between the two ad groups although that is not necessarily true, which might prevent the model from producing quality embeddings for currently live ads.
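A compact sketch of both cleaning steps (the `live_ads` set is hypothetical):

```python
from itertools import groupby

live_ads = {"ad7", "ad12", "ad33", "ad90"}  # hypothetical set of currently live ads

def clean(sequence, live):
    """Step I: collapse successive repetitions; Step II: drop non-live ads."""
    deduped = [ad for ad, _ in groupby(sequence)]  # ad, ad, ad -> ad
    return [ad for ad in deduped if ad in live]

print(clean(["ad12", "ad12", "ad99", "ad33", "ad33", "ad7"], live_ads))
# -> ['ad12', 'ad33', 'ad7']
```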

Step III: Next, we generate pairs of ads from those cleaned user journeys. Recall that the ad pairs are generated by sliding a 'context window' along a user journey, given the window-size parameter.
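In plain Python, the pair generation boils down to something like this sketch:

```python
def ad_pairs(sequence, window_size):
    """Slide a context window along a journey and emit (centre, context) pairs."""
    pairs = []
    for i, centre in enumerate(sequence):
        lo, hi = max(0, i - window_size), min(len(sequence), i + window_size + 1)
        pairs += [(centre, sequence[j]) for j in range(lo, hi) if j != i]
    return pairs

print(ad_pairs(["ad12", "ad33", "ad7"], window_size=1))
# -> [('ad12', 'ad33'), ('ad33', 'ad12'), ('ad33', 'ad7'), ('ad7', 'ad33')]
```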

In the picture below, you will see a summary of the first 3 steps.

Methodology – First 3 Steps

Step IV: In step 4, we split the entire dataset of the last 45 days of user journeys in two: a training and a validation dataset. The first 44 days were used to generate the training data for the model, and the last day was used to generate the ad pairs of the validation dataset. Keep in mind that it is vital to pick a time period for your validation set that is later than your training set to get the best results out of this process.
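Continuing the earlier PySpark sketch, the time-based split is a simple date filter:

```python
from pyspark.sql import functions as F

cutoff = F.date_sub(F.current_date(), 1)  # hold out the most recent day
train_views = views.where(F.col("view_time") < cutoff)
valid_views = views.where(F.col("view_time") >= cutoff)
```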

At this stage, a question may arise in your mind:

Q: "Why do we need a validation set in the first place?"

A: It is because we have several parameters, listed below, that need to be tuned to find the optimum model:

  • Window Size: Size of the contextual window around an input ad in which we generate ad pairs
  • Vector Size: Dimension of ad embeddings (numerical vectors)
  • Sampling Ratio: A threshold to down-sample ads that appear very often in those sequences. In other words, balancing out the popular and unpopular ads.
  • Number of epochs: Number of passes over the entire training data

It is worth mentioning here that we used H2O's Word2Vec algorithm, integrating it into our Spark pipeline via the Sparkling Water package. Since H2O's Word2Vec algorithm has more parameters to tune and is way faster than its SparkML counterpart, we decided to go with it. In the previous steps, however, we made use of SparkSQL and the Spark DataFrame API to generate the ad-view sequences.
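Here is a minimal sketch of H2O's Word2Vec on ad-ID sequences (the Sparkling Water wiring is omitted, and the parameter values are illustrative rather than our tuned ones):

```python
import h2o
from h2o.estimators.word2vec import H2OWord2vecEstimator

h2o.init()

# One space-separated journey per row; tokenize() yields one token per row,
# with NA rows separating the individual journeys.
journeys = h2o.H2OFrame({"seq": ["ad12 ad33 ad7", "ad7 ad90 ad12"]})
tokens = journeys["seq"].tokenize(" ")

ad2vec = H2OWord2vecEstimator(
    vec_size=100,            # dimension of the ad embeddings
    window_size=5,           # context window used to generate ad pairs
    sent_sample_rate=0.001,  # down-sample very frequent ads
    epochs=10,               # passes over the training data
    min_word_freq=1,         # keep even rarely viewed ads in this toy example
)
ad2vec.train(training_frame=tokens)
```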

Step V: In the 5th and final step, we ran a grid search over the parameter combinations to select the best model for each category. We measured each model's performance on the validation pairs using an accuracy metric: we feed the model the first ad of each pair and fetch the top 5 recommendations for that ad. If the second ad of the pair is among those top 5 recommended ads, we call it a 'success' (1), otherwise a 'failure' (0). The model fetches the top 5 recommendations for a given ad via brute-force kNN in cosine-similarity space.
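A sketch of this metric with scikit-learn's brute-force kNN (all names are illustrative):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hit_at_5(embeddings, ad_ids, validation_pairs):
    """Share of validation pairs whose second ad appears among the top-5
    cosine neighbours of the first ad."""
    index = {ad: i for i, ad in enumerate(ad_ids)}
    knn = NearestNeighbors(n_neighbors=6, metric="cosine", algorithm="brute").fit(embeddings)
    hits = 0
    for query, target in validation_pairs:
        _, nbrs = knn.kneighbors(embeddings[[index[query]]])
        hits += target in [ad_ids[j] for j in nbrs[0][1:]]  # [1:] skips the query itself
    return hits / len(validation_pairs)
```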

Steps 4 and 5 are described in the picture below.

Methodology – Last 3 Steps

MODEL EVALUATION AND SPOT-CHECK RESULTS

The ultimate objective of this project was to end up with an algorithm superior to our current Behavioural-Based Recommender, Collaborative Filtering (CF). Therefore, at this evaluation step, we conducted spot-checks to compare the performance of the current recommender with that of our new approach, Ad2Vec. To keep the experiment fair, we tried to carry out those spot-checks as systematically as possible.

How did we compare the performance of the new recommender system (Ad2Vec) with the current one (Collaborative Filtering)?

For each category:

  1. Randomly pick 10 ads to compare the recommendations retrieved from the current and new recommenders.
  2. Out of those 10 ads, find 3 on which we can easily spot the difference in performance between the two recommenders. (Note: for some ads, both algorithms perform equally well.)
  3. For each of those 3 ads, fetch the top 3 recommendations from both the current and the new algorithm.
  4. Visualise the results in a way that makes them easy to compare.

In the pictures below, you will see some sample cases from a bunch of different categories. During this experiment, surprisingly, we did not come across even a single case in which the current recommender (CF) gives more relevant recommendations than Ad2Vec.

Ad Screenshots taken with permission from www.marktplaats.com

OTHER FUTURE USE-CASES

The main outcome of the Ad2Vec model is a set of embeddings for all live ads on the platform, such that similar items/ads have similar embeddings in cosine space. Given this, we can make use of the exact same outcome in use-cases other than recommendations.

Here is the list of 3 use-cases we will consider in the next steps:

Use-Case 1: User2Vec – Richer User Recommendations

Instead of considering only the last 5 ads viewed by a user and generating user recommendations accordingly, we could generate embeddings for users by simply averaging the embeddings of all ads in their whole journey on the platform. This way, user and ad embeddings share the exact same format, so we could fetch the most relevant ads for a particular user by searching with their embedding in the ad-embedding space. In the picture below, we applied this logic and generated user-specific recommendations for a sample case.

Ad Screenshots taken with permission from www.marktplaats.com
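A minimal sketch of this averaging step; `ad_vectors` is a hypothetical dict mapping ad IDs to their Ad2Vec embeddings:

```python
import numpy as np

def user_embedding(journey, ad_vectors):
    """User2Vec sketch: average the embeddings of every ad in the user's
    journey; ads without a vector (e.g. expired ones) are skipped."""
    vecs = [ad_vectors[ad] for ad in journey if ad in ad_vectors]
    return np.mean(vecs, axis=0) if vecs else None
```

The resulting vector can then be fed to the same kNN index used for ad-to-ad recommendations.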

Use-Case 2: Neat Product Categorisation

On a marketplace platform, all ads are posted and managed by users, and for that reason product categorisation can easily get out of control. Several different reasons lie behind this issue. For instance, our sellers sometimes cannot find a suitable product category for their item and simply have to post it in a wrong one, or they may post their ad in a wrong category accidentally. Eventually, all of this leads to a badly organised platform for our buyers.

However, we can spot outliers in a category by clustering the embeddings of all ads belonging to that category. This gives us a chance to correct their category or to suggest a brand-new category for that product group.
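As a sketch, a density-based clustering such as DBSCAN flags ads that fit no dense cluster of a sub-category as outliers (the `eps` value is illustrative and would need tuning per category):

```python
from sklearn.cluster import DBSCAN

def category_outliers(ad_ids, embeddings, eps=0.3, min_samples=5):
    """Cluster one sub-category's ad embeddings; DBSCAN labels points that
    belong to no dense cluster as -1, i.e. candidates for re-categorisation."""
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(embeddings)
    return [ad for ad, label in zip(ad_ids, labels) if label == -1]
```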

In the picture below, since there is no sub-category specifically for 'Cash Register Systems', our users had to post exactly the same product type in different sub-categories: 'Monitors and Displays', 'Desktop PCs', 'Software | Others' and 'Other Computers and Software'. This case was easily caught by our new Ad2Vec algorithm, giving us a chance to achieve a neater product categorisation and thus improve the buyer experience on our platforms.

Ad Screenshots taken with permission from www.marktplaats.com

Use-Case 3: Personalised Search Ranking

Since we are able to create user embeddings alongside ad embeddings, we could rank any search result simply by comparing the similarity between the embedding of the user and the embeddings of the ads listed in the search result.
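A sketch of this re-ranking; `ad_vectors` is again a hypothetical mapping from ad IDs to embeddings:

```python
from sklearn.metrics.pairwise import cosine_similarity

def rank_search_results(user_vec, result_ids, ad_vectors):
    """Re-rank a search result list by user-to-ad embedding similarity."""
    scored = [
        (ad, cosine_similarity([user_vec], [ad_vectors[ad]])[0, 0])
        for ad in result_ids if ad in ad_vectors
    ]
    return [ad for ad, _ in sorted(scored, key=lambda x: -x[1])]
```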

Photo by Joshua Golde on Unsplash

CONCLUSION

In this article, we wanted to share our approach to converting a popular NLP algorithm, Word2Vec, into a recommender for marketplaces. The results indicate that this approach can be an improvement over earlier recommenders. Moreover, you could use the same model for different purposes to make a profound impact on your business. We hope you found some of the ideas presented in this article valuable for your own projects and business as well.

