Searching and sorting encrypted data

Towards secure cloud databases

Mihail-Iulian Pleșa
Towards Data Science

--

Suppose one day, your boss walks into the office with a job for you:

Boss: I want you to sort an array of data

You: Ok! Give me the input and I’ll deal with it

Boss: I don’t trust you and my data are sensible for me

You: If you don’t want to give me the data how am I supposed to sort it?

Boss: I don’t know! Find a solution! That’s why I pay you 😤

How can you sort an array of data if you can’t have the data?

As with most problems where privacy matters, the answer lays in the secrets of cryptography.

Photo by Stefan Steinbauer on Unsplash

One word about common encryption

I think all of us, as programmers, have used cryptography at least once in the software we wrote. Some may just enable TLS on a server, others may have written special encryption software. In all these situations you have some data you want to encrypt.

Most common encryption schemes like AES, RSA, etc. have one crucial property: they are semantically secure. What does this mean? Well, it’s about two player-game.

Photo by JESHOOTS.COM on Unsplash

In this game (called IND-CPA game), one player is the attacker and the other is a challenger. The game is as follows:

  1. The attacker chooses some messages and the challenger provides the ciphertexts of those messages
  2. The attacker chooses two messages of the same length which are sent to the challenger
  3. The challenger chooses at random one of the two messages, encrypt it and send it to the attacker
  4. If the attacker correctly guess which message was encrypted, the scheme is broken

The following picture illustrates this game:

Ryan Henry I 538 /B 609 : Introduction to Cryptography

Back to the boss’s problem

So your boss has some data which he wants to sort but which he doesn't want to show you 😵

You’re probably thinking of telling your boss to encrypt his data before he gives it to you. Suppose that the array contains only two elements: m1 and m2. The plan is as follows:

  1. The boss encrypts the two elements using some encryption scheme and obtains two ciphertexts: c1 and c2
  2. You compare c1 with c2. If c1 <c2 you tell your boss that the sorted array is [m1, m2]. Otherwise, you tell your boss that the sorted array is [m2, m1].

To do that you need an encryption scheme that is order-preserving.

That is, the scheme preserves the order of the plaintexts. For example, if c1 is the encryption of m1 and c2 is the encryption of m2 and m1 < m2 then c1 < c2.

If you are considering a common encryption scheme like AES or RSA it will not work. Why? Remember that all these schemes are semantically secure.

If you can compare two ciphertexts and deduce the order of the plaintexts, you can break the IND-CPA game.

Here’s how:

In the first step of the game, you will choose two messages m1 and m2 (m1 < m2). The challenger will return the ciphertexts of the two messages: c1 and c2. Then, in step 2, you will send to the challenger the same messages (m1 and m2) chosen in the first step. The challenger will return a ciphertext c which will be either the encryption of m1 or the encryption of m2. If c is the encryption of m1 then c<c2 and if c is the encryption of m2 then c>c1

So, you cannot use AES, RSA, or any other well-known encryption schemes. What should you use then?

Boldyreva’s encryption scheme is the right answer for you

If you want to know more about you can read the original paper here.

Now, let’s get to work and implement sorting and searching for encrypted data.

Photo by Corinne Kutz on Unsplash

Sorting and searching encrypted data in python

Boldyreva’s scheme is already implemented in pyope library. So let’s import the libraries:

To use the scheme, we must first generate the key and instantiate the cipher:

Let’s take a simple array to sort it:

To encrypt this array, we need to call the encrypt method of the cipher object on each element of the simple_array:

Simply sort the list:

Now that you’ve sorted the array, let’s decrypt it and verify that the original elements are in ascending order. To do that, you need to call the decrypt method of the cipher object on each element of the simple_array_encrypted:

You should see [1, 2, 3, 4, 7].

If your boss gives you an encrypted element and asks if it exists in the encrypted array given by him, you can do this by simply calling searchsorted function from numpy library:

You should see 2.

Conclusions

Congratulations! You just realized the impossible.

Photo by Wil Stewart on Unsplash

You were able to sort and search through a list of elements without knowing the elements themselves. Order-preserving and order-revealing encryption are and active areas of research in cryptography. There are many applications for these encryption methods especially in the field of databases. As the volume of data increases, it becomes increasingly difficult to maintain a database through personal resources. You need to put your data into the cloud. However, there is no need to give your data in clear form. You can use something like order-preserving encryption to first encrypt your data and then send it to the cloud. The encryption scheme allows you to continue to use the database even if the data is encrypted all the time. That’s the beauty of cryptography ❤️.

Thank you for reading 😊

P.S. I’ve prepared a colab notebook for you with all the code. You can access it from here.

--

--

I’m passionate about programming cool stuff like cryptography, machine learning or anything that looks interesting.