
Give wings to Python – Simple tricks towards faster Python Programs

A compilation of strategies to write optimised and faster Python programs

Python isn’t particularly fast, but it is one of the most preferred languages to code in, especially in data analytics. Funnily enough, what makes it inherently slow is also what makes it so widely preferred.

  1. Dynamically typed – The user doesn’t have to declare the data type of a variable. It makes life easier for the user but raises hell for the interpreter.
  2. Interpreted, not compiled – Because Python is interpreted rather than compiled, it can’t look ahead in the program to make high-level optimisations for memory and speed.
  3. Memory issues – Due to Python’s flexibility, each element gets its own memory rather than sitting in a contiguous block, which causes delays during fetch operations.

Despite those issues, I believe most of the slowness that creeps into the system can be attributed to the programmer’s ability to write concise, optimised programs. Over the years, I have made massive mistakes while processing data, building models, and pushing not-so-optimised code into production. But all that experience has given me a few tools that I keep using to make my code cleaner, faster, and more Pythonic (whatever that means).

Let’s see what those are.

Band-aid solution

If you have a lot of time on your hands, you can rewrite your Python code in C/C++ and then write a Python wrapper on top. That gives you the speed of C++ while keeping Python on top for ease of use.

But we don’t want that, as there is an easier solution.

Numba enters the room.

Ok, I’ll grant you that Numba has been around since 2012, but it is still elusive to many coders.

It essentially transforms your Python code into machine code that runs at roughly the speed of C.

Let’s check with a simple example: a simple function, nothing heavy.
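The exact benchmark snippet from the original post isn’t reproduced in this extract; a stand-in of the same spirit, a plain numeric loop with a made-up name slow_hypotenuse_sum, would look something like this, timed with IPython’s %timeit:

import math

def slow_hypotenuse_sum(n):
    # Plain Python loop: accumulate sqrt(i^2 + (i+1)^2) for every i
    total = 0.0
    for i in range(n):
        total += math.sqrt(i * i + (i + 1) * (i + 1))
    return total

# In IPython / Jupyter:
# %timeit slow_hypotenuse_sum(10_000_000)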

1.43 seconds per loop on an i7 machine with 6 cores. Seems slow; it can surely be improved.

Install numba on your machine using pip:

pip install numba

All we have done is add two lines of code: the numba import and the @jit(nopython=True) decorator.
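Continuing the hypothetical function from above, the Numba version differs only in those two lines:

import math
from numba import jit

@jit(nopython=True)  # compile to machine code; fail loudly instead of falling back to object mode
def fast_hypotenuse_sum(n):
    total = 0.0
    for i in range(n):
        total += math.sqrt(i * i + (i + 1) * (i + 1))
    return total

# %timeit fast_hypotenuse_sum(10_000_000)  # note: the very first call also pays a one-off compilation cost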

14.5 milliseconds per loop.

That’s a speed gain of 1430/14.5 ~ 98X.

The gain will of course depend on your machine and what exactly you are trying to do, but it is a good starting point if you want a quick fix without delving too much into the sins of your code.

Numba usually works best:

  • where you have to deal with agonising loops, or
  • where the same operation needs to be executed over a large set of values.

Profilers

You need to figure out which part of your code is making it slow. In most cases, the troublemaker will be some malformed loop or a single function.

Unix timer

On the shell, you just put Unix’s time command in front of the call to python and your script.
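For example (the file name is just a placeholder):

time python name_of_your_file.py
# reports real (wall-clock), user, and sys time once the script finishes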

cProfile

You can run cProfile from the shell by calling

python -m cProfile -s time name_of_your_file.py

It’ll inundate you with information about every single call in your program, so you’ll probably want something more targeted.

You can import cProfile in your code and run it on functions that you suspect are bogging you down.

import cProfile
cProfile.run('complicated_func()')  # profile only complicated_func(), which we suspect has time overheads

The output can either be written to a file or printed on the shell.

cProfile.run('complicated_func()', 'output.txt')
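When a second argument is passed, cProfile writes a binary stats dump to that file rather than plain text; you can load and sort it with the standard pstats module, for example:

import pstats

stats = pstats.Stats('output.txt')               # the dump written by cProfile.run above
stats.sort_stats('cumulative').print_stats(10)   # ten most expensive calls by cumulative time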

Schumacher speed

So far we have covered the band-aid solution and how to time-profile your code; let’s now move to strategies that are not only best practices but also speed things up for you.

Generators to save memory and gain speed

Lists are great, but if you are working with sequences whose values can be generated lazily, generators are the better choice.

The performance improvement from the use of generators is the result of the lazy (on-demand) generation of values, which translates to lower memory usage. Furthermore, we do not need to wait until all the elements have been generated before we start to use them.
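The snippet being referred to below isn’t included in this extract; a sketch along the following lines (exact byte counts vary with Python version) reproduces the comparison using sys.getsizeof:

import sys

numbers_list = [i * 2 for i in range(10_000)]   # every element materialised up front
numbers_gen = (i * 2 for i in range(10_000))    # values produced lazily, one at a time

print(sys.getsizeof(numbers_list))  # tens of kilobytes
print(sys.getsizeof(numbers_gen))   # ~100–200 bytes, independent of the range size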

The above snippet uses 87,616 bytes of memory for the list while the generator uses only 112 bytes.

The memory saved can also make your code run faster, because the data the CPU is working on is more likely to fit in cache; with large lists, there is a higher chance the data spills out of the fast cache levels into slower memory.

Enumerations over ranges

This is more of a clean-code principle than a speed trick. If you need not only the index of your data but also the data itself, then enumerate is a cleaner, better, and slightly faster choice (tested on Python 3.8).
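A minimal illustration:

names = ['alpha', 'beta', 'gamma']

# Index-based loop: works, but noisier
for i in range(len(names)):
    print(i, names[i])

# enumerate: index and value in one go
for i, name in enumerate(names):
    print(i, name)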

Imports

Don’t import an entire library or package if you are going to use only one function.

import math
value = math.sqrt(50)  # attribute lookup on the math module at every call
vs
from math import sqrt
value = sqrt(50)       # sqrt is bound directly to a name, so the lookup is skipped

The module gets loaded either way, but binding sqrt directly skips an attribute lookup on every call, and in production-level code running hot loops such small overheads add up to monstrous proportions.

Use Itertools wisely

Use itertools wherever applicable; it saves a lot of time, especially by replacing hand-written for loops.

from itertools import product

def cartesian_prod(arr1, arr2):
    # All ordered pairs from the two iterables
    return list(product(arr1, arr2))

arr1 = [1, 2, 3]
arr2 = [4, 5, 6]
cartesian_prod(arr1, arr2)
# Output:
[(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]

If you are working on a problem such as a recommender system and need all user-item combinations, then computing the cartesian product through itertools is a sane and safe choice.

One can also use the permutations and combinations functions of itertools to alleviate some pain in life.
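For instance:

from itertools import permutations, combinations

items = ['a', 'b', 'c']
print(list(permutations(items, 2)))  # ordered pairs: ('a', 'b'), ('a', 'c'), ('b', 'a'), ...
print(list(combinations(items, 2)))  # unordered pairs: ('a', 'b'), ('a', 'c'), ('b', 'c')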

Join your strings only with f-Strings

Don’t use older methods such as %-formatting or format() for building strings; the f-string is blazing fast and much easier to use.
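A quick comparison of the three styles:

name, score = 'Ada', 99.5

greeting_old = 'Hello %s, you scored %.1f' % (name, score)        # %-formatting
greeting_mid = 'Hello {}, you scored {:.1f}'.format(name, score)  # str.format()
greeting_new = f'Hello {name}, you scored {score:.1f}'            # f-string: fastest and clearest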

List Comprehensions over conventional for loops

In general, an explicit for loop is slow (though at least better than a while loop), and two nested loops already push the complexity to O(n²). A list comprehension won’t change the asymptotic complexity, but it trims the per-iteration overhead.
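The timed snippets aren’t shown in this extract; a representative comparison (timed with %timeit in IPython) would be:

def squares_loop(n):
    # Conventional for loop with repeated .append() calls
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def squares_comprehension(n):
    # The same work expressed as a list comprehension
    return [i * i for i in range(n)]

# %timeit squares_loop(1_000_000)
# %timeit squares_comprehension(1_000_000)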

For the conventional for loop, the time is 136 msec.

For the list comprehension, it is 118 msec.

Not a huge improvement, but the code looks much cleaner and fits in one line.

Similarly, the map, filter, and reduce functions provide some relief on the time and space complexity of your code.
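A small example of all three:

from functools import reduce

numbers = [1, 2, 3, 4, 5]

doubled = list(map(lambda x: x * 2, numbers))        # [2, 4, 6, 8, 10]
evens = list(filter(lambda x: x % 2 == 0, numbers))  # [2, 4]
total = reduce(lambda acc, x: acc + x, numbers, 0)   # 15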

What’s your hardware?

In all probability, you are running your code on an SSD (Solid State Drive). If you have reached pro-level Python programming, then check out SFrames, short for scalable frames. They work really well on large, SSD-backed datasets, much better than pandas.

One can use turicreate (the open-sourced version of GraphLab Create) to work with SFrame in your code.
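A minimal sketch, assuming turicreate is installed and ratings.csv is a placeholder file name:

import turicreate as tc

sf = tc.SFrame.read_csv('ratings.csv')  # disk-backed frame; scales beyond what fits in RAM
print(sf.head())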

Conclusion

There are many ways in which you can optimise the performance of your code. All the above strategies are tried and tested, but they aren’t a quick fix or a sure-shot solution, as the gains depend on the data you are running on and the hardware you are using.

There are many other methods, such as using local variables, concurrency, and multithreading, that also help in achieving faster speeds.

While coding, follow best practices rather than cutting corners; production environments are quite sensitive, and you don’t want to spend countless hours regressing over something you wrote aeons ago.

There is generally a tradeoff between readability and optimization of your codebase. The decision to prioritise one over the other lies with you.

