What Is Learning to Rank: A Beginner’s Guide to Learning to Rank Methods

A guide on how to approach LTR problems in Machine Learning

Ransaka Ravihara
Towards Data Science



Introduction

This article will discuss what exactly Learning to Rank is. Before diving deep into its inner workings, let's quickly cover the basic concepts required to understand it.

First, let's build the central intuition behind Learning to Rank. Learning to Rank (LTR) belongs to supervised machine learning, so we need a historical dataset to train the model. But when I started learning about LTR, my first point of confusion was how to distinguish it from traditional machine learning. In a classification or regression model, the dependent and independent variables are straightforward: if we need to predict loan default for a given customer, we plug that customer's feature vector into the learned function f(x), and it returns a single value or a class probability for the customer defaulting. In a Learning to Rank model, things are quite different, and at first a bit confusing.

Let’s take a simple example.

User A navigates to a website and inputs a query q. In this scenario, our system returns some documents (in other words, search results) like the ones below.

Image by Author

If we have a good ranking model, the relevance r of these results should satisfy r(d1) > r(d2) > r(d3). Behind the scenes, our model should return a relevance score for each document related to this query. Hence, our model should learn a function that takes a query and a document as parameters and produces a relevance score for that particular query-document pair. Then we can sort the documents so that highly relevant documents receive a higher rank.
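As a sketch, suppose we already had such a scoring function and it produced the scores below (the numbers are hypothetical, made up for illustration). Ranking then reduces to sorting the documents by score in descending order:

```python
# Hypothetical relevance scores our model might return
# for documents d1, d2, d3 given the query q.
scores = {"d1": 0.92, "d2": 0.57, "d3": 0.13}

# Rank documents so the most relevant comes first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']
```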

Pointwise Ranking

Let's discuss one way we can achieve this. First things first, we need data. For simplicity, assume a hypothetical scenario where we have two queries, q1 and q2, with associated document lists [d1, d2, d3] and [d4, d5, d6], respectively.

Image by Author

Generally, we know that documents d1, d2, and d3 are relevant to q1 but not to q2, and vice versa. So we can populate the samples as follows.

sample 1: d1, q1; label: 1
sample 2: d2, q1; label: 1
sample 3: d3, q1; label: 1
sample 4: d4, q2; label: 1
sample 5: d5, q2; label: 1
sample 6: d6, q2; label: 1

If you look at the above data, the problem is now simplified into a traditional classification/regression problem where we have a one-to-one mapping between inputs (query-document pairs) and labels. (Symmetrically, cross pairs such as d1 with q2 would receive label 0.)

Additional Fact

To solve this problem with machine learning, you may perform feature engineering on the query and document data, feed the features into the model, and finally get predictions. A simple yet illustrative feature would be the number of times terms such as "john" and "sushi" appear in the query and in the document.
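A minimal sketch of that feature idea in Python (the query and document text below are made up for illustration):

```python
def term_count_features(query, document, terms=("john", "sushi")):
    """For each term, count its occurrences in the query and in the document."""
    q_words = query.lower().split()
    d_words = document.lower().split()
    features = []
    for term in terms:
        features.append(q_words.count(term))  # term frequency in the query
        features.append(d_words.count(term))  # term frequency in the document
    return features

query = "john sushi recipe"
doc = "sushi recipe by john how john makes sushi at home"
print(term_count_features(query, doc))  # [1, 2, 1, 2]
```

These counts form the feature vector for one query-document pair; in practice you would add many more signals (BM25 scores, click statistics, and so on).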

Believe it or not, you have just learned one of the three types of Learning to Rank methods: pointwise ranking.

Pointwise ranking finds a function that returns each document's relevance given a query. It is named pointwise because each document is scored independently against its ground-truth target value, just like in traditional regression and classification tasks.

Let's discuss the pros and cons of the pointwise ranking method. Its main advantage is simplicity, but that simplicity comes with significant drawbacks:

  1. Each instance is treated as an isolated point.
  2. Explicit pointwise labels are required to create the training dataset.

To overcome these challenges, we can use the Pairwise Ranking method.

Pairwise Ranking

Here, the goal is to define a ranking function to score each document based on a given query. The documents are then ranked in descending order of these scores, representing the relative relevance of the documents to the query.

In the learning process, several queries are provided, each with pairs of documents whose relative relevance is known. A ranking function is learned from this training data so that the model can predict the relevant documents for future queries.

Let's return to the earlier sushi recipe example. Instead of considering one document per query, as in pointwise ranking, we now consider two documents per query. For example, we know d1, d2, and d3 are relevant to q1. In this case, all possible document pairs for q1 are as follows.

sample 1: q1, (d1,d2)
sample 2: q1, (d1,d3)
sample 3: q1, (d2,d3)

Here (di, dj) denotes the order of documents i and j. If we have explicit labels for how documents i and j should rank, we can derive labels for (di, dj). Assume document i has a relevance score of 3 (highly relevant) and document j has a relevance score of 0 (less relevant) with respect to the query q. In the optimal ranking, document i should rank higher than document j. Again, this simplifies to a traditional classification task, but unlike pointwise ranking, this method takes relative ranking position into account.
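The pair generation and label derivation above can be sketched in a few lines of Python (the graded relevance labels here are made up for illustration):

```python
from itertools import combinations

# Hypothetical graded relevance labels for q1's documents.
relevance = {"d1": 3, "d2": 1, "d3": 0}

# Generate every document pair for the query and label each pair:
# 1 if the first document should rank above the second, else 0.
pairs = []
for di, dj in combinations(relevance, 2):
    label = 1 if relevance[di] > relevance[dj] else 0
    pairs.append(((di, dj), label))

print(pairs)
# [(('d1', 'd2'), 1), (('d1', 'd3'), 1), (('d2', 'd3'), 1)]
```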

Image by Author

A pairwise loss function of this kind is used in the lambdarank objective in LightGBM. Using the LightGBM Python library, we can train this state-of-the-art LTR method with a few lines of code.

Since we can simplify this into a classification task, we can reuse its well-known methodologies, and the model now takes document order into account. However, this approach has drawbacks as well. Even though learning considers the order of document pairs, the objective is not explicitly set to order the documents; instead, it tries to reduce the classification error on document pairs. Furthermore, training can be very costly when the dataset contains many document-query pairs. And since the document count varies from query to query, the model can become biased toward queries with many document pairs.

Listwise Ranking

Researchers have introduced the listwise approach to overcome some of the significant drawbacks observed in the methods above.

In this approach, instead of considering pairs of documents, the list of ranked documents is taken into account, along with their relevance labels.

To get a relevance label for each document under a query, we can use human annotators or the number of clicks a particular document received.
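As an illustration of the click-based option, one simple (hypothetical) scheme maps raw click counts to graded relevance labels via thresholds:

```python
def clicks_to_grade(clicks, thresholds=(1, 5, 20)):
    """Map a raw click count to a graded relevance label 0-3.

    The thresholds are illustrative: 0 clicks -> 0, 1-4 -> 1,
    5-19 -> 2, 20 or more -> 3.
    """
    grade = 0
    for t in thresholds:
        if clicks >= t:
            grade += 1
    return grade

print([clicks_to_grade(c) for c in [0, 3, 7, 50]])  # [0, 1, 2, 3]
```

Real systems usually correct clicks for position bias before using them as labels, but the thresholding idea is the same.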

Image by Author

In the learning phase, we must build features from query-document pairs. If i represents the query's index and j represents the document's index, we can define the feature vector as follows.

Image by Author

With that, we can define the features for each document-query pair along with the ground truth labels.

Image by Author

Finally, we can denote a training instance as,

Image by Author

where m is the number of queries in the dataset. Finally, we create a ranking function f that outputs a score for each feature vector x_ij, and obtain a list of scores z_i,

Image by Author

The goal of learning is to minimize the total loss with respect to the training data. At ranking time, we can use the trained function to assign scores to new documents based on their feature vectors; the documents are then ranked in descending order of those scores.
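One common listwise loss, used in ListNet (named here as one concrete choice, not necessarily the only one), compares the softmax distribution of the predicted scores against that of the ground-truth labels with cross-entropy. A minimal sketch:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(predicted_scores, true_labels):
    """ListNet-style top-one cross-entropy between predicted and true lists."""
    p_true = softmax(true_labels)
    p_pred = softmax(predicted_scores)
    return -sum(pt * math.log(pp) for pt, pp in zip(p_true, p_pred))

# A score list whose order matches the labels incurs a lower loss
# than one that reverses it.
good = listwise_loss([2.0, 1.0, 0.0], [3, 1, 0])
bad = listwise_loss([0.0, 1.0, 2.0], [3, 1, 0])
print(good < bad)  # True
```

Note that the loss is computed over the whole list at once, which is exactly what distinguishes listwise methods from the pointwise and pairwise approaches above.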

Thanks to this learning process, the model can capture relationships between items in a list, such as co-occurrence or dependencies. On the downside, it requires a large amount of labeled data to learn those relationships and can be computationally expensive to train and optimize. Furthermore, the listwise approach can be problematic for new or niche domains where little or no labeled data is available.

Let’s summarize what we have learned in this article.

comparison of LTR methods | Image by Author

Next in the Series: How to evaluate a Learning to Rank Model

Thanks for Reading!
