
Attention Model for News Recommendation: Cold Start Problem

Typical SVD is less effective when there is not enough data. How can we leverage an attention model to solve the cold start problem?

Photo by Sam Wheeler on Unsplash

Although SVD provides a satisfactory solution for recommendation systems, it is less effective when new items have not yet accumulated enough interaction data. News recommendation is even more challenging, as it poses three additional problems:

  1. News articles are highly time-sensitive
  2. Users are topic-sensitive and have various interests
  3. News language is highly condensed and comprises a large number of new entities created every day

In this article, I will show you how to leverage the Attention Mechanism to solve the cold start problem in recommendation systems.

Attention Mechanism

The attention mechanism has a long history of applications and was recently introduced to solve problems in NLP. It enables a model to impose different weights on its inputs depending on the context. For example, in Neural Machine Translation (NMT), attention overcomes the bottleneck of squeezing the entire source sentence into a single fixed-length vector: with attention, an NMT model can generate each word by "looking" at different positions in the original text.
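To make this concrete, here is a minimal NumPy sketch of dot-product attention, one common formulation (NMT models often use additive scoring instead, so treat the scoring function as an illustrative choice): a query is scored against every source position, a softmax turns the scores into weights, and the output is the weighted sum.

```python
import numpy as np

def attention(query, keys, values):
    """Minimal dot-product attention: weight each value by how well
    its key matches the query, then return the weighted sum."""
    scores = keys @ query                    # similarity of the query to each position
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights         # context vector and its weights

# Example: a "decoder" query attends over 4 "encoder" positions.
rng = np.random.default_rng(0)
keys = values = rng.normal(size=(4, 8))      # 4 positions, dimension 8
query = rng.normal(size=8)
context, weights = attention(query, keys, values)
print(weights)  # one weight per source position, summing to 1
```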

The situation is similar for a news recommendation system: the recommender engine should learn to "look" at the relevant parts of a user's reading history and ignore the irrelevant ones.

Deep Knowledge-Aware Network

At HK01, our data team has put tremendous effort into the news recommendation system. We employ state-of-the-art algorithms to improve on the original SVD collaborative filtering algorithm. In particular, we solve problem 3 by learning article embeddings with an autoencoder trained with triplet loss, and with StarSpace.

Compute Document Similarity Using Autoencoder With Triplet Loss

Learn document embeddings using Facebook StarSpace
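For a flavor of the triplet-loss part, here is a minimal sketch (not our production training code; the margin value and batch shapes are illustrative): each article embedding is pulled toward a related article and pushed away from an unrelated one.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull the anchor article toward a related article (positive) and
    push it away from an unrelated one (negative) by at least `margin`."""
    pos_dist = F.pairwise_distance(anchor, positive)
    neg_dist = F.pairwise_distance(anchor, negative)
    return F.relu(pos_dist - neg_dist + margin).mean()

# Example: a batch of 4 embedding triplets of dimension 32.
a, p, n = (torch.randn(4, 32) for _ in range(3))
print(triplet_loss(a, p, n))
```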

For problems 1 & 2, we leverage the Deep Knowledge-Aware Network (DKN) proposed by Microsoft [1] to solve the cold start problem. We replace the knowledge embeddings with our own article embeddings and keep the attention network, which learns the interaction between user interests and article embeddings.


Architecture of DKN

DKN consists of two component networks; the overall architecture is shown in the figure above. To predict the click probability of a candidate news item, the algorithm learns to aggregate the sequence of articles in the user's reading history into a user embedding, computed as a weighted sum of the article embeddings in that history. So the problem becomes: how do we find the weights?

The weight of each article embedding is produced by an attention network, which models the interaction between the candidate news (the article whose click probability we want to predict) and each of the user's clicked news items. Since a user can have various interests and no single news item matches all of them, the role of the attention network is to score how well the candidate matches each clicked article individually.

After obtaining the weights, the model forms the user embedding as the weighted sum and passes both the user embedding and the candidate news embedding to a feed-forward neural network that outputs the click probability.
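Here is a minimal PyTorch sketch of this attend-and-aggregate step. The `AttentionAggregator` name and the layer sizes are my own illustrative choices rather than the exact networks in the DKN paper, but the flow is the same: score each clicked article against the candidate, softmax the scores, take the weighted sum as the user embedding, and feed the concatenation into a feed-forward head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAggregator(nn.Module):
    """DKN-style user modeling: score each clicked article against the
    candidate, softmax the scores, and take the weighted sum of
    clicked-article embeddings as the user embedding."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.att = nn.Sequential(                 # attention network
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.dnn = nn.Sequential(                 # click-probability head
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, clicked, candidate):
        # clicked: (batch, history, dim); candidate: (batch, dim)
        cand = candidate.unsqueeze(1).expand_as(clicked)
        scores = self.att(torch.cat([clicked, cand], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=1)        # one weight per clicked article
        user = (weights.unsqueeze(-1) * clicked).sum(dim=1)  # user embedding
        logit = self.dnn(torch.cat([user, candidate], dim=-1)).squeeze(-1)
        return torch.sigmoid(logit)               # click probability

# Example: a batch of 2 users, history of 5 articles, embedding dim 16.
model = AttentionAggregator(dim=16)
clicked = torch.randn(2, 5, 16)
candidate = torch.randn(2, 16)
print(model(clicked, candidate))                  # tensor of 2 probabilities
```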

DKN is basically an item-based algorithm: it doesn't need user-item interaction data to recommend a new item, which is exactly what the cold start problem demands. Moreover, it can handle multiple user interests without manual intervention.

Results

We compared the original model, a regularized version of SVD, with an ensemble of SVD and DKN. With careful tuning, the ensemble achieves a 10% improvement over the original model.
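For illustration, here is one simple way such an ensemble can be blended (a sketch, not our exact production formula; the weight `alpha` is a hypothetical parameter tuned on validation data): a convex combination of rank-normalized scores, so the two models contribute on a comparable scale.

```python
import numpy as np

def ensemble_score(svd_scores, dkn_probs, alpha=0.5):
    """Blend two rankers with a convex combination; alpha is a
    hypothetical weight tuned on validation data."""
    def rank_normalize(s):
        order = s.argsort().argsort()       # rank of each item
        return order / max(len(s) - 1, 1)   # map ranks to [0, 1]
    return alpha * rank_normalize(svd_scores) + (1 - alpha) * rank_normalize(dkn_probs)

# Example: blend scores for 5 candidate articles and rank them.
svd = np.array([0.9, 0.1, 0.4, 0.7, 0.2])
dkn = np.array([0.2, 0.8, 0.5, 0.6, 0.9])
print(ensemble_score(svd, dkn, alpha=0.6).argsort()[::-1])  # best first
```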


Reference

[1]: Wang, Hongwei, et al. "DKN: Deep knowledge-aware network for news recommendation." Proceedings of the 2018 World Wide Web Conference. 2018.

