Python Lists Are Overrated

Consider these potentially superior alternatives

Aashish Nair
Towards Data Science

--

Photo by Christina Morillo from Pexels

If you have used Python for any length of time, you ought to be very familiar with lists; they are ubiquitous.

Lists are arguably the most popular data structure in Python, and it’s easy to see why. They have tremendous utility and versatility.

Unfortunately, with a heavy reliance on lists comes an omission of other capable data structures that can arguably perform better than lists.

You might be wondering: why does this even matter? If lists get the job done, why bother with other tools?

This line of reasoning is sound if you are content with manipulating a small amount of data. However, if you aim to use your skills to serve other companies, you should understand that you can’t be so lax about how you store and process your data.

Companies, who are often burdened with petabytes of data, stress the importance of developing efficient programs that are optimized in terms of time and memory consumption.

Efficiency is just as important as functionality!

You need to understand that even if there are multiple tools that can get the job done, some are simply superior to others. Learning which data structure is ideal for a specific scenario will serve as an immense step for enhancing the quality of your code.

While lists certainly are a reliable data structure, they are not the most fitting for every case. Here are a few alternatives that are worthy of consideration.

1. Tuples

Tuples share many key similarities with lists. Both data structures store heterogeneous data types and assign a specific order to their elements.

The main difference between the two lies in the fact that tuples, unlike lists, are immutable. Their assigned values can not be altered in any way.

This makes lists seem more appealing, given how often programmers need to modify the data they store. However, omitting tuples altogether for this reason is a mistake.

Although tuples don’t allow you to change it’s values, it still has a major advantage over lists: memory usage. Tuples require much less space than lists to store the same amount of data.

Let’s demonstrate this with a quick example.

Here, we create a list and tuple storing the same values and use the sys module to determine the sizes of both objects.

Created By Author
Code Output (Create By Author)

As shown above, tuples require much less memory to store the same data as lists.

Creating memory efficient programs is a must, which is why it is necessary to keep this data structure in your arsenal.

Since tuples are immutable, they can only be used to store values and look them up. Thus, the ideal scenario for using tuples is when you know the values in the tuple and won’t need to modify them.

2. Numpy Arrays

Unlike lists, numpy arrays are not a built-in data structure. They come from the numpy module, which specializes in conducting mathematical operations.

Unlike lists, numpy arrays only store homogenous values (i.e. the elements of the array must be of the same type).

Fortunately, they make up for this shortcoming by allowing you to carry out a variety of calculations with less memory usage and a quicker run time.

Here is an example that showcases the quick run time of numpy arrays.

Suppose we wish to create 2 lists of 1,000,000 values ranging from 0 to 999,999 and then add the elements of each list together to form a third set of values. Let’s find out the time it takes to achieve this with lists and numpy arrays.

Note: Run times will be derived with the use of the “%%timeit” magic function

Run Time: 148ms (Created By Author)
Run Time: 3.8ms (Created By Author)

The difference in run times attests to the usability of numpy arrays.

In addition to better run times, numpy arrays are also more memory efficient. Lets use the sys module to compare the size of a list with 1,000,000 values to that of a numpy array with 1,000,000 values.

Created By Author
Code Output (Created By Author)

When storing 1,000,000 values, numpy arrays use less than half the memory of lists.

Overall, numpy arrays surpass lists in both run times and memory usage. Although it is completely fine to use lists for simple calculations, when it comes to computationally intensive calculations, numpy arrays are your best best.

3. Sets

Sets are, in my opinion, the most overlooked data structure in Python.

By definition, sets store a mutable collection of distinct values. Given that sets don’t allow duplicates, people may opt to stick with lists as they are already comfortable with using the latter.

However, neglecting sets means casting aside all of the functions that come with them.

Sets are remarkable when it comes to performing searches.

As an example, let’s take these text corpuses (accessible with this link) and split them into a collection of words. These words will be stored in lists and sets.

Created By Author

Before comparing the times it takes to search for the word ‘’believe” in the list and the set of the first corpus, I will remove duplicate values from the list for a fair comparison.

Removing Duplicates (Created By Author)

Here is how the two searches compare:

Run Time: 313ns (Created By Author)
Run Time: 53.5ns (Created By Author)

The difference in run times is night and day.

Additionally, sets provide numerous functions that make searches much easier.

For instance, how would you find all the words that are present in both text corpuses? Here’s how that would be achieved using lists.

Run Time: 37.3µs (Created By Author)

Here’s how that would look like using sets.

Run Time: 586ns (Created By Author)

The set not only accomplishes the task with less time, it also manages this with fewer lines of code. Sets offer many useful functions like ‘intersection’ that enable programmers to extracted needed data from different collections.

Thus, sets easily trump lists when it comes to performing searches. However, there is a speed-memory tradeoff one must consider when using sets. Although sets can yield smaller run times, they also require more memory.

Created By Author
Code Output (Created By Author)

Whether you should utilize sets or not depends on the goals and the limitations of your project. If your priority is to minimize run time, sets are the ideal data structure.

Conclusion

Photo by Prateek Katyal from Pexels

Overall, lists are very versatile, but it would be a mistake to solely rely on them for storing data. To write effective code, it is important to be familiar with the benefits and drawbacks that come with using different data structures.

If you restrict yourself to only using lists (even if they suffice), you are limiting yourself as a programmer and are putting a cap on how efficient your code can be.

Hopefully, you have learnt now to always consider alternative data structures when carrying out data manipulation.

I wish you the best of luck in your coding endeavors!

--

--