
How to Objectively Compare Two Ranked Lists in Python

A simplified explanation and implementation of Rank Biased Overlap

Imagine you and your friend have both watched all 8 Harry Potter movies.

But there’s a catch – you have watched each movie the day it was released, without missing a single premiere.

Your friend, however, watched the 2nd movie first, then the 4th and 5th, and then binge-watched the rest when it was available on Netflix.

Theoretically, you and your friend are on an equal footing – both have watched all the movies of the series.

Is it really equal though?

Image created by the author using a commercially usable typeface (Harry P) and CC image from Wikimedia

You, being a true fan, can never consider it equal.

How could someone not watch the movies of a book series... In. A. Sequence?!

You consider it a sacrilege!

The good news is – the math is on your side. You can boil this problem down to a ranked list comparison.

There are several methods to compare lists.

In this post, you will:

  • Understand why we actually need a ranked list comparison
  • See the requirements a comparison measure should satisfy
  • Get an overview of a method called Rank Biased Overlap
  • Understand the mathematical approach behind it
  • Create a simple implementation in Python

At the end of it, you will finally settle the disagreement with your friend once and for all:

You != Your Friend


Why do you need a ranked list comparison?

Apart from the above movie viewing order disagreement, we are surrounded by examples of comparing lists!

In fact, we all do such comparisons all the time.

  • Which results are better for the same query on Google vs. Bing?
  • How similar is the New York Times bestseller list to that from USA Today?
  • How different are the top 10 movie rankings on Rotten Tomatoes as compared to IMDB?

In the field of natural language processing and machine learning:

  • How similar are two paragraphs once they are converted to tokenized word lists?
  • How to compare the ranked outputs generated by two different learning-to-rank / machine-learned ranking (MLR)¹ models?

In manufacturing/production and logistics:

  • Comparing the sequence of incoming parts to the outgoing parts (FIFO). An ideal FIFO sequence looks like the one below. A 1 goes in first and a 1 comes out first.
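For instance, an ideal in/out order might look like this (a plain-text stand-in for the original figure):

In:  1, 2, 3, 4, 5  →  Out:  1, 2, 3, 4, 5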

But this may not always happen in the real world. Maybe a 1 goes in first but a 3 comes out first?

Yeah, total chaos can ensue due to this.

Then, how are you supposed to measure the quality of the sequences?

You need some kind of measure.

Are there any requirements such a measure must satisfy?


Requirements for a comparison measure

Ideally, a comparison measure should be in the form of a score that indicates how similar the rankings are.

Think of the above examples again.

You can already get clues about what such a measure should tell us:

  1. The ranked lists may be indefinite and possibly of different lengths – hence the measure should be able to handle two different list sizes
  2. There should be an ability to compare at a chosen intersecting length – which means the measure should handle the intersection length of the lists
  3. The top elements have more importance (like in search results, movie order, etc.) than the bottom ones – hence there should be a possibility to give weights whenever needed

Now let’s go look at something that satisfies these conditions.


Rank Biased Overlap

As the name itself implies, this method is rank-biased.

It means you can prioritize the top few elements in the list with a higher weight.

It was proposed in a paper titled A Similarity Measure for Indefinite Rankings, by William Webber et al.²

You don't have to go through all the complexity of the research paper; I simplify most of it below so we can try a quick implementation in Python.

However, fasten your seatbelts and put on your math glasses, as we navigate through a few labyrinthine formulae!

Consider S and T as the names of the 2 ranked lists from our title image:
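Concretely, these are the same two lists we will feed to the Python implementation later in the post:

S = [1,2,3,4,5,6,7]
T = [1,3,2,4,5,7,6,8]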

Let’s now define the intersection of these two lists, at a depth (d).

What does this actually mean?

It is the list of common elements from S and T, at a given length (called depth d). It will be clear from this:
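In notation (my paraphrase of the paper's definition, where $S_{1:d}$ means the first $d$ elements of $S$):

$$ I_{S,T,d} = S_{1:d} \cap T_{1:d} $$

For our lists at depth $d = 7$, the common elements are $\{1, 2, 3, 4, 5, 6, 7\}$.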

The depth is taken as at most the length of the smaller list; otherwise, the intersection does not make sense.

The length of this new intersecting list is called the Overlap, which in this case is 7.

Next, we define something called the Agreement of the lists. It is simply the overlap divided by the depth.
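As a formula, with $X_{S,T,d}$ denoting the overlap (the size of the intersection) at depth $d$:

$$ A_{S,T,d} = \frac{X_{S,T,d}}{d} $$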

The Average Overlap (AO) is the average agreement for all the depths of the lists, ranging from 1 to an integer k.

When k = 1, the depth ranges from 1 to 1, which means comparing only the first element.

Since the elements match, the intersection X is 1 and so is Agreement A.

Let’s look at k = 2.

In this case, d ranges from 1 to 2, that is we compare the first 2 elements from the lists. The overlap, in this case, is 1 as only the first element is common. Hence, the Agreement is 1/2 or 0.5.

Similarly, we can continue for k = 3.

Well, now that you understand how it works, you can continue the process until k = 7.

You will get the Average Overlap (AO) as:
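Carrying the same calculation to every depth from 1 to 7, the agreements come out to 1, 1/2, 1, 1, 1, 5/6, and 1, so:

$$ AO = \frac{1}{7}\left(1 + \frac{1}{2} + 1 + 1 + 1 + \frac{5}{6} + 1\right) \approx 0.90 $$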

This can work for any number of elements in the two lists.

We can generalize the above summation as:
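In the notation defined earlier:

$$ AO(S, T, k) = \frac{1}{k} \sum_{d=1}^{k} A_{S,T,d} = \frac{1}{k} \sum_{d=1}^{k} \frac{X_{S,T,d}}{d} $$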

This is the foundation for the similarity measures.

The Average Overlap is a similarity measure, comparing the two lists at a given depth.

Take a family of such similarity measures (SIM):
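Each member of this family weights the agreement at every depth by some weight $w_d$ (my paraphrase of the paper's setup):

$$ SIM(S, T, w) = \sum_{d=1}^{\infty} w_d \cdot A_{S,T,d} $$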

It is a geometric series in which the $d$'th weight has the form $(1-p)\,p^{d-1}$, where $0 < p < 1$.

Plugging this into the formula above and rearranging the terms, we get the Rank Biased Overlap for the lists S and T as:
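In the notation used above, this gives the infinite-sum form from the paper:

$$ RBO(S, T, p) = (1 - p) \sum_{d=1}^{\infty} p^{\,d-1} \cdot A_{S,T,d} $$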

The variable p lies in the range (0, 1) and can be used to determine the influence of the first d elements on the final similarity score.

To get a single RBO score, the above formula can be extrapolated. Assuming that the Agreement seen up to depth k continues indefinitely beyond it, the formula becomes:
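Written out, this is the expression the Python function below computes:

$$ RBO_{ext}(S, T, p, k) = \frac{X_{S,T,k}}{k}\, p^{k} + \frac{1-p}{p} \sum_{d=1}^{k} \frac{X_{S,T,d}}{d}\, p^{d} $$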

The RBO score lies between 0 and 1: 0 means the lists are completely disjoint, and 1 means they are exactly the same ranked lists.

We will use this formula for the Python implementation of an RBO function!

The paper also shows how we can choose p for giving a weightage (Wrbo) to the top d elements under comparison (I will not get into the derivation here):
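As given in the paper (and matching the weight calculator implemented below):

$$ W_{RBO}(1:d) = 1 - p^{\,d-1} + \frac{1-p}{p} \cdot d \cdot \left( \ln\frac{1}{1-p} - \sum_{i=1}^{d-1} \frac{p^{i}}{i} \right) $$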

We will use this formula to implement the weight calculator function.

Phew, that was a lot to process!

But don't worry, the formulae look a lot more complicated than they actually are; you will see how simple they are in Python.

So I hope you are still here because the fun part begins now!


A Simple Implementation of Rank Biased Overlap in Python

I use the above formulae to write a simple implementation of Rank Biased Overlap and a weightage calculator function in Python.

Now that you understand how RBO is derived, it is easy to implement.

import math

def rbo(S, T, p=0.9):
    """ Takes two lists S and T of any lengths and gives out the RBO Score
    Parameters
    ----------
    S, T : Lists (str, integers)
    p : Weight parameter, giving the influence of the first d
        elements on the final score. 0 < p < 1. Default 0.9 gives the top 10
        elements 86% of the contribution to the final score.

    Returns
    -------
    Float of RBO score
    """

    # Fixed Terms
    k = max(len(S), len(T))
    x_k = len(set(S).intersection(set(T)))

    summation_term = 0

    # Loop for summation
    # k+1 for the loop to reach the last element (at k) in the bigger list    
    for d in range(1, k + 1):
        # Take the first d elements of each list (slicing past the end just returns the whole list)
        set1 = set(S[:d])
        set2 = set(T[:d])

        # Overlap (size of the intersection) at depth d
        x_d = len(set1.intersection(set2))

        # Agreement at depth d
        a_d = x_d / d

        # Weighted term of the summation
        summation_term = summation_term + math.pow(p, d) * a_d

    # Rank Biased Overlap - extrapolated
    rbo_ext = (x_k/k) * math.pow(p, k) + ((1-p)/p * summation_term)

    return rbo_ext

Let’s check on our example lists S and T.

S = [1,2,3,4,5,6,7]
T = [1,3,2,4,5,7,6,8]

print(rbo(S, T))

>> 0.8853713875

The RBO score of 0.88 means that the lists are nearly 90% similar (which we saw also in the average overlap calculation before).

This is by no means a robust Python implementation.

However, it is clearly good enough to get started!

Now let’s also implement the weight calculator function to help us choose p.

import numpy as np
import math

def weightage_calculator(p, d):
    """ Takes values of p and d and gives the weight carried by the top d elements
    Parameters
    ----------
    p : Weight parameter, giving the influence of the first d
        elements on the final score. 0 < p < 1.
    d : Depth at which the weight has to be calculated

    Returns
    -------
    Float of weightage Wrbo at depth d
    """

    summation_term = 0

    for i in range(1, d):  # taking d here loops up to the value d-1
        summation_term = summation_term + math.pow(p, i) / i

    Wrbo_1_d = (1 - math.pow(p, d - 1)
                + ((1 - p) / p) * d * (np.log(1 / (1 - p)) - summation_term))

    return Wrbo_1_d

Let’s check if it works.

As per the research paper, a p of 0.9 gives the top 10 elements a weightage of 86% in the final similarity score².

Using our above function:

weightage_calculator(0.9,10)

>> 0.8555854467473518

Yay, it works well – it gives 86% as indicated in the paper!

We now have our arsenal ready.

Let’s settle the score between you and your friend!


Settle Your Fight!

Let’s test it out by creating lists of Harry Potter movies in the order of how you and your friend watched them.

And then run your RBO function on it!

you = ["Harry Potter and the Philosopher's Stone",
       'Harry Potter and the Chamber of Secrets',
       'Harry Potter and the Prisoner of Azkaban',
       'Harry Potter and the Goblet of Fire',
       'Harry Potter and the Order of Phoenix',
       'Harry Potter and the Half Blood Prince',
       'Harry Potter and the Deathly Hallows - Part_1',
       'Harry Potter and the Deathly Hallows - Part_2']

your_friend = ['Harry Potter and the Chamber of Secrets',
               'Harry Potter and the Goblet of Fire',
               'Harry Potter and the Order of Phoenix',
               "Harry Potter and the Philosopher's Stone",
               'Harry Potter and the Prisoner of Azkaban',
               'Harry Potter and the Half Blood Prince',
               'Harry Potter and the Deathly Hallows - Part_1',
               'Harry Potter and the Deathly Hallows - Part_2']

rbo(you, your_friend)
>> 0.782775

0.78!

So watching the movies out of sequence definitely makes a difference; otherwise, the score would be 1!

But you can go even further.

Usually, the first few movies introduce the characters and build the fictional world – hence it is qualitatively more important that you watch them first and in the correct order.

Let’s use the weight calculator and give the first 4 movies more weightage (86%).

With some trials in the weight calculator function, we get a weight of 86% for p = 0.75 and d = 4.

weightage_calculator(0.75,4)

>> 0.864017481493...

Hence, using p as 0.75 in the RBO function will give the first four movies more influence in the ranking comparison:

rbo(you, your_friend, 0.75)
>> 0.5361328125

The resulting comparison score is 0.53!

This means that skipping the first few movies, or watching them out of sequence, is objectively worse: the similarity score drops sharply when those movies carry more weight.

In fact, your friend’s viewing order scores only about half (53%) of a perfect match against yours!

And now you have the math to prove it!


Advantages of RBO

Rank Biased Overlap is not the only method for comparing ranked lists; two other common methods are:

  • Kendall Tau³
  • Pearson’s Correlation⁴

However, RBO has several advantages over them:

  • RBO can handle lists of different lengths, including lists that are disjoint or only partially overlapping
  • It has a weight measure – you can give more importance to the top or the bottom of the comparison

Due to these benefits, I decided to explain RBO in detail in this post.

However, feel free to look at the other two linked in the sources; they are also useful in different situations!


Conclusion

To summarize, in this post you learned one of the measures for comparing ranked lists – the Rank Biased Overlap.

  • You ventured through the Mathematics of it
  • Did a simple implementation in Python
  • Used the functions to do a practical comparison of the movie viewing order
  • Understood its benefits!

Now anytime there is a disagreement about comparing movie rankings, viewing order, or basically any sequences, you can settle it like a pro!


Fin.


Sources and notes:

[1] Learning to Rank

[2] A similarity measure for indefinite rankings – William Webber, Alistair Moffat, Justin Zobel

[3] Kendall rank correlation coefficient

[4] Pearson correlation coefficient

Copyrights to images, gifs, and scripts belong to the author.

