Quick Introduction

Overview of Your Journey
- Setting the Stage
- Understanding Numba and Installation
- Using the Jit-Decorator
- Three Pitfalls
- Where Numba Shines
- Wrapping Up
1 – Setting the Stage
In data science and data engineering, many practitioners write code on a daily basis. Producing a working solution to a problem is the most important thing. However, sometimes the execution speed of your code also matters.
This is especially true in real-time analytics and prediction. If the code is too slow, it can become a bottleneck for the whole system. Systems often get slower as time goes on, especially in data disciplines, since the amount of data to process keeps growing. At worst, the real-time systems you build can be too slow to be useful 😮
Many compiled programming languages, such as C++, are generally faster than Python. Does this mean that you should uproot your whole Python pipeline? No. This is generally not worth the enormous effort it requires.
A different approach is to make your Python code faster. This is where Numba steps in:
Numba is a Python library that aims to increase the speed of your Python code. At runtime, it looks through your code and checks whether parts of it can be translated into fast machine code.
Sounds intricate, right? It is. However, for the end-user (namely you), using Numba is ridiculously easy. With a few additional lines of Python code, you can get a significant speed increase in major parts of your codebase. You don’t really need to understand how Numba works under the hood to see results.
In this blog post, I will show you the basics of Numba to get you started. If you need to learn more, then I recommend the Numba Documentation. If you are more of a visual learner, then I have also made a video on the topic.
2 – Understanding Numba and Installation
Let me give you a high-level overview of Numba first 👍
Numba describes itself in the following way:
Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. – Numba Documentation
Let’s unpack the above statement. Numba is an open-source and lightweight Python library that tries to make your code faster. It does this by using the industry-standard LLVM compiler library. You do not need to understand the LLVM compiler library to use Numba.
In practice, you will add certain Python decorators to tell Numba that the decorated function in question should be optimized. Then, during runtime, Numba goes through your function and tries to compile parts of it into fast machine code.
JIT is an abbreviation for just-in-time compilation. So rather than compiling the code beforehand (as with e.g. C++), the compilation step happens during the execution of the code. The practical difference? Rather than generating binary files that are cumbersome to share, you are left with only Python files!
Let me show you a code example from Numba’s homepage to demonstrate how easy Numba is to use. The following code is a Monte Carlo method for approximating the value of pi.
import random

def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
Numba has not been applied to the above code. If the variable nsamples is large, then the function monte_carlo_pi is pretty slow. However, adding the following two lines of code makes it a lot faster:
from numba import jit  # <-- importing jit from numba
import random

@jit(nopython=True)  # <-- The only difference
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
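Once the decorator is in place, you call the function exactly as before. A quick usage sketch (the sample count is just an arbitrary example value) could look like this:
pi_estimate = monte_carlo_pi(10_000_000)  # the first call also triggers the compilation step
print(pi_estimate)                        # prints a value close to 3.14159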
That was not that bad, right? 😃
If you are working in Jupyter Notebooks through Anaconda, then run the following command in the Anaconda Prompt to install Numba:
conda install numba
If you are writing your code in an IDE like Visual Studio Code or PyCharm, then you might prefer to install Numba through pip:
$ pip install numba
More advanced options, like compiling Numba from source, can be found in the Installation Pages.
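If you want to double-check that the installation worked, a quick sanity check is to print the installed version from Python:
import numba
print(numba.__version__)  # prints the installed Numba version, e.g. something like 0.56.4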
3 – Using the Jit-Decorator
Now that Numba is installed, you can try it out. I’ve made a Python function that does a few NumPy operations:
import numpy as np
from numba import jit

def numpy_features(matrix: np.ndarray) -> None:
    """Illustrates some common features of NumPy."""
    cosine_trace = 0.0
    for i in range(matrix.shape[0]):
        cosine_trace += np.cos(matrix[i, i])
    matrix = matrix + cosine_trace
Don’t think too much about the above code. The only aim of the function is to use several different features in NumPy like universal functions and broadcasting. Let me time the code above with the following magic command in Jupyter Notebooks:
x = np.arange(1000000).reshape(1000, 1000)
%time numpy_features(x)
Output:
Wall time: 32.3 ms
If you run the code above, you will get slightly different speeds depending on your hardware and other factors. It’s probably not the slowest Python code you have ever seen. However, code like this throughout the codebase really slows down the whole application.
Now let us add the @jit(nopython=True) decorator to the function and see what happens. The code should now look like this:
import numpy as np
from numba import jit

@jit(nopython=True)
def numpy_features(matrix: np.ndarray) -> None:
    """Illustrates some common features of NumPy."""
    cosine_trace = 0.0
    for i in range(matrix.shape[0]):
        cosine_trace += np.cos(matrix[i, i])
    matrix = matrix + cosine_trace
Not much has changed in how you write the code, but the speed is different. If you time the code again, you get the following result:
x = np.arange(1000000).reshape(1000, 1000)
%time numpy_features(x)
Output:
Wall time: 543 ms
What? The code became over 10 times slower than the original code 😧
Don’t be discouraged. Try to run the code again:
x = np.arange(1000000).reshape(1000, 1000)
%time numpy_features(x)
Output:
Wall time: 3.32 ms
Now the code is almost 10 times faster than the original code.
What is going on? 😵
4 – Three Pitfalls
The Pitfall of Compilation
The weird thing I showed you is not a bug, it’s a feature. When you run the function with the @jit(nopython=True) decorator for the first time, the code slows down. Why?
The first time, Numba has to go through the function and figure out which parts to optimize. This adds extra overhead, so the first call runs slowly. However, every subsequent call will be much quicker.
At first this looks like a tradeoff, but in practice it rarely is. In data analysis and data engineering, functions are run a great number of times.
As an example, consider a function that normalizes new data entering a data pipeline before prediction. In real-time systems, new data arrives all the time, and the function may be called hundreds or even thousands of times per minute. In such systems, the time lost to the initial slower run is won back within seconds.
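If you want to see this overhead explicitly, here is a small sketch that reuses the numpy_features function and the input array from above and times the first and second call separately:
import time
import numpy as np

x = np.arange(1000000).reshape(1000, 1000)

start = time.perf_counter()
numpy_features(x)  # first call: includes the compilation step
print("first call :", time.perf_counter() - start)

start = time.perf_counter()
numpy_features(x)  # subsequent calls: run the already compiled machine code
print("second call:", time.perf_counter() - start)
Numba also offers a cache=True option for @jit, which stores the compiled function on disk so that a new Python session does not have to pay the full compilation cost again.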
Not Setting the Argument nopython to True
The @jit decorator can also be used without the nopython argument. If you do this, then Numba defaults to nopython=False.
This is not a good idea!
If nopython=False, then Numba will not alert you when it cannot optimize your code. In practice, you then just add the Numba overhead to your code without any optimization. This slows down your code 😠
If Numba does not manage to optimize your code, then you want to be told, and in that case it is better to remove the Numba decorator completely. Hence you should always write @jit(nopython=True).
Pro Tip: The decorator @njit is shorthand for @jit(nopython=True), and many people use this instead.
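In code, the two forms below behave the same:
from numba import jit, njit

@jit(nopython=True)
def add_one(x):
    return x + 1

@njit  # shorthand for @jit(nopython=True)
def also_add_one(x):
    return x + 1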
Don’t Over-Optimize Your Code
Over-optimizing your code means spending a lot of time on getting optimal performance when it is not needed.
Don’t over-optimize your code!
In many instances, code speed is not that important (e.g. batch processing of moderate amounts of data). Increasing code speed almost always increases development time. Weigh your options carefully before incorporating Numba into your codebase.
5 – Where Numba Shines

Pretending that Numba is good at optimizing any type of Python code helps no one. Numba is only great at optimizing certain kinds of Python code.
Numba is great at optimizing anything that involves loops or NumPy code. Since many machine learning libraries (like Scikit-Learn) heavily use NumPy, this is a great place to use Numba 🔥
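As a minimal sketch of the "plain loops" case (the function below is made up purely for illustration), this is the kind of pure-Python loop that Numba typically handles very well:
from numba import jit

@jit(nopython=True)
def sum_of_squares(n):
    """A plain Python loop, exactly the kind of code Numba compiles well."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total

sum_of_squares(10)          # first call compiles the function
sum_of_squares(10_000_000)  # later calls run as fast machine code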
However, Numba does not understand e.g. Pandas. Adding the @jit(nopython=True) decorator to a function that purely deals with Pandas dataframes will probably not result in great performance. See the Numba Documentation for examples.
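If your data lives in a Pandas DataFrame, one common workaround (just a sketch, assuming a purely numeric DataFrame) is to pass the underlying NumPy array to the jitted function instead of the DataFrame itself:
import pandas as pd
from numba import jit

@jit(nopython=True)
def first_column_sum(values):
    """Operates on a plain NumPy array, not on the DataFrame itself."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i, 0]
    return total

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
result = first_column_sum(df.to_numpy())  # convert to a NumPy array before calling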
My advice is the following:
Always check whether the added Numba decorator adds value by testing the speed of the code (after the first compilation step). Don’t sprinkle Numba decorators around just for fun; use them to speed up your code when necessary.
6 – Wrapping Up
If you need to learn more about Numba, then check out the Numba Documentation or my YouTube video on Numba.
Like my writing? Check out some of my other posts for more Python content:
- Modernize Your Sinful Python Code with Beautiful Type Hints
- Visualizing Missing Values in Python is Shockingly Easy
- A Quick Guide to Symbolic Mathematics with SymPy
- 5 Awesome NumPy Functions That Can Save You in a Pinch
- 5 Expert Tips to Skyrocket Your Dictionary Skills in Python 🚀
If you are interested in data science, programming, or anything in between, then feel free to add me on LinkedIn and say hi ✋