As most of you already know, Python is a general-purpose programming language optimized for simplicity and ease of use. While it's a great tool for light tasks, code execution speed can quickly become a major bottleneck in your programs.
In this article, we'll discuss why Python is slow compared to other programming languages. Then, we'll see how to write a basic Rust extension for Python and compare its performance to a native Python implementation.
Why Python is slow
Before we start, I would like to point out that programming languages aren't inherently fast or slow: their implementations are. In Python's case, the implementation in question is CPython, the reference interpreter, and that's what the rest of this section refers to.
First of all, Python is dynamically typed, meaning that variable types are only known at runtime, not at compile time. While this design choice allows for more flexible code, the interpreter cannot make assumptions about the type or size of your variables. As a result, it cannot apply the kinds of optimizations a static compiler would.
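Just to make this concrete, here's a tiny illustrative snippet (not needed for the rest of the article): the same name can be bound to objects of completely different types, so every operation has to be resolved at runtime.
# The same variable can hold values of different types at runtime
x = 42           # x is an int
x = "forty-two"  # now x is a str
x = [4, 2]       # now x is a list

# The '+' below could mean integer addition, string concatenation,
# or list concatenation: the interpreter only finds out when it runs
def double(value):
    return value + value

print(double(21))    # 42
print(double("ab"))  # abab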
Another design choice that makes Python slower than many alternatives is the infamous GIL. The Global Interpreter Lock is a mutex that allows only one thread to execute Python bytecode at any point in time. The GIL was originally meant to guarantee thread safety (in particular around reference counting), but it has drawn considerable backlash from developers of multi-threaded applications.
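If you want to see the GIL in action, here is a small, purely illustrative experiment (not part of the original setup): two CPU-bound threads take roughly as long as running the same work twice sequentially, because only one thread can execute Python bytecode at a time.
import threading
import time

def count_down(n):
    # Pure CPU-bound work: no I/O, so threads can't overlap usefully under the GIL
    while n > 0:
        n -= 1

N = 20_000_000

# Run the work twice, one call after the other
start = time.perf_counter()
count_down(N)
count_down(N)
print(f"Sequential: {time.perf_counter() - start:.2f}s")

# Run the same work in two threads "concurrently"
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Two threads: {time.perf_counter() - start:.2f}s")  # roughly the same, not ~2x faster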
On top of that, Python code is compiled to bytecode and executed by a virtual machine instead of running directly on the CPU. This extra layer of abstraction adds a significant execution overhead compared to statically compiled languages.
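You can actually peek at the bytecode the CPython virtual machine executes using the standard dis module; this is just an illustrative aside:
import dis

def add(a, b):
    return a + b

# Disassemble the function into the instructions the virtual machine runs
dis.dis(add)
# Typical output (exact opcodes vary between CPython versions):
#   LOAD_FAST    a
#   LOAD_FAST    b
#   BINARY_OP    + (BINARY_ADD on older versions)
#   RETURN_VALUE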
Furthermore, Python objects internally store their attributes in dictionaries (hash maps): properties and methods, accessed via the dot operator, usually aren't resolved through a fixed memory offset but through a dictionary lookup, which is substantially slower. If you're interested in learning more, I wrote a separate article where I dive deeper into how attribute lookup works in Python.
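As a simplified illustration (the full lookup rules also involve the class and descriptors), instance attributes typically live in a per-object dictionary:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

# Instance attributes are stored in a plain dictionary...
print(p.__dict__)        # {'x': 1, 'y': 2}

# ...so 'p.x' is (roughly) a hash-table lookup by the string key 'x',
# rather than a read from a fixed memory offset as in a C struct
print(p.__dict__["x"])   # 1
print(p.x)               # 1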
Python extensions
Now you might ask why Python is so widely used despite its clear performance flaws. Don’t data science and machine learning require high computational power?
To get around Python's speed issue, developers delegate the heavy lifting in their code to calls to external, optimized functions. These external functions are usually written in statically typed languages such as C and pre-compiled into libraries that can then be imported into your programs. Many of Python's most popular libraries, including NumPy, TensorFlow, and pandas, have their performance-critical parts written in C or C++. They are the perfect tradeoff between the ease of use of a Python interface and the performance of compiled code.
However, C code is notoriously prone to memory-related bugs and isn't as developer-friendly as more modern programming languages. That's why I suggest writing your Python extensions in Rust instead.
Environment setup
To build our Rust extension, we'll be using [maturin](https://github.com/PyO3/maturin), a zero-configuration tool for building and publishing Rust-based Python packages. We will also use [PyO3](https://github.com/PyO3/pyo3) to create native Python modules to be imported in your source code.
In this tutorial, we’ll write a simple Rust library to calculate the prime factors of a given number. First of all, let’s create a new directory for our project:
mkdir prime_fact
cd prime_fact
Now, let's create a new Python virtual environment to install maturin in:
python -m venv .venv
source .venv/bin/activate
pip install maturin
And initialize the maturin project:
maturin init --bindings pyo3
This command will generate a Rust project structure. The files we'll be interested in are Cargo.toml and src/lib.rs. The former contains information about the Rust library, while the latter contains the actual Rust source code.
Writing Rust extensions
Now that we're set up, it's time to write some Rust. I'll assume you already have some basic knowledge of Rust, but you should have no problem following along even if you don't. Open your src/lib.rs file and add the following function:
// Include Python-related symbols
use pyo3::prelude::*;
// The 'pymodule' macro is used to implement Python modules
#[pymodule]
fn prime_fact(_py: Python, m: &PyModule) -> PyResult<()> {
    Ok(())
}
The function name must match the lib.name setting in the Cargo.toml file. In this case, prime_fact is the name of the Rust package, as specified in Cargo.toml.
It's now time to add some functionality to our Rust library. Let's create a function that calculates the prime factors of a given number. Note that calling .sqrt() on an integer requires the Roots trait from the num-integer crate (you can see the same import in the complete listing further down), so remember to add num-integer to the [dependencies] section of your Cargo.toml, since it isn't included by default:
// Bring integer square roots into scope (from the num-integer crate)
use num_integer::Roots;

/// Calculates the prime factors of the given number.
// Use the 'pyfunction' macro to expose the function to the Python interface
#[pyfunction]
// The function takes in an unsigned 128-bit integer (u128)
// The return value is a Python object constructed from a vector of u128
fn factorize(mut n: u128) -> PyResult<Vec<u128>> {
    // Initialize an empty vector (the equivalent of a list) for the factors
    let mut factors = Vec::new();
    // 2 is the only even prime, so try it first
    while n % 2 == 0 {
        factors.push(2);
        n /= 2;
    }
    // Trial division by odd candidates up to the square root of the remaining n
    let mut i = 3;
    while i <= n.sqrt() + 1 {
        if n % i == 0 {
            factors.push(i);
            n /= i;
        } else {
            i += 2;
        }
    }
    // Whatever is left at this point is itself a prime factor
    if n > 2 {
        factors.push(n);
    }
    // Return the vector of factors successfully
    Ok(factors)
}
To add the newly created function to the module, update the prime_fact module function as follows:
#[pymodule]
fn prime_fact(_py: Python, m: &PyModule) -> PyResult<()> {
    // Add the 'factorize' function to the module
    m.add_function(wrap_pyfunction!(factorize, m)?)?;
    Ok(())
}
And we're done writing our Rust library. All that's left to do is build it:
maturin develop
This command will build the Rust package and install it as a Python library in your current virtual environment.
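As a quick sanity check (purely illustrative), you can verify that the module is importable from that same virtual environment:
# Run this inside the virtual environment, after 'maturin develop'
import prime_fact

print(prime_fact.factorize(84))  # [2, 2, 3, 7]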
If you wish to install the library globally, run the following commands instead:
# From within the project root directory
maturin build
pip install .
Remember to exit any virtual environment you may be in if you want to install the package globally.
Importing from Python and benchmarks
To use the newly created library, you just need to import it as a normal Python module:
# Import the module
import prime_fact
n = 17376382193
# Call the function
factors = prime_fact.factorize(n)
print(f'{n} = {" * ".join(map(str, factors))}')
The expected output is:
17376382193 = 17 * 191 * 5351519
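If you want to convince yourself the result is correct, you can multiply the factors back together. This is a small optional check (not part of the original snippet) that reuses the n and factors variables from the code above:
import math

# The product of the prime factors must give back the original number
assert math.prod(factors) == n  # 17 * 191 * 5351519 == 17376382193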
Now let's compare the performance of our Rust implementation with its native Python equivalent. For reference, here is the complete src/lib.rs we just wrote:
use pyo3::prelude::*;
use num_integer::Roots;

/// Calculates the prime factors of the given number.
#[pyfunction]
fn factorize(mut n: u128) -> PyResult<Vec<u128>> {
    let mut factors = Vec::new();
    while n % 2 == 0 {
        factors.push(2);
        n /= 2;
    }
    let mut i = 3;
    while i <= n.sqrt() + 1 {
        if n % i == 0 {
            factors.push(i);
            n /= i;
        } else {
            i += 2;
        }
    }
    if n > 2 {
        factors.push(n);
    }
    Ok(factors)
}

/// A Python module implemented in Rust.
#[pymodule]
fn prime_fact(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(factorize, m)?)?;
    Ok(())
}
And here is the Python benchmark script:
# Import the Rust implementation
from prime_fact import factorize as rust_factorize
import math
import timeit

# Native Python equivalent of the Rust function
def py_factorize(n):
    factors = []
    # Factor out all the 2s first
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    # Trial division by odd candidates up to the square root of the remaining n
    for i in range(3, round(math.sqrt(n)) + 1, 2):
        while n % i == 0:
            factors.append(i)
            n //= i
    # Whatever is left at this point is itself a prime factor
    if n > 2:
        factors.append(n)
    return factors
# Benchmark
COUNT_NUMBER = 10
REPEAT_NUMBER = 5
NUMBER = 123456789061514
py_time = timeit.repeat(lambda: py_factorize(NUMBER), repeat=REPEAT_NUMBER, number=COUNT_NUMBER, globals=globals())
py_time = sum(py_time) / len(py_time)
rust_time = timeit.repeat(lambda: rust_factorize(NUMBER), repeat=REPEAT_NUMBER, number=COUNT_NUMBER, globals=globals())
rust_time = sum(rust_time) / len(rust_time)
print(f"Python: {py_time} seconds")
print(f"Rust: {rust_time} seconds")
And the truncated output is:
Python: 2.531 seconds
Rust: 0.014 seconds
Clearly, the Rust implementation outperforms its Python equivalent by roughly two orders of magnitude (about 180 times faster in this benchmark).
For further documentation and examples, check out [PyO3](https://github.com/PyO3/pyo3) on GitHub.
Conclusion
To wrap it up, Rust extensions are a great tool to boost your Python codebases both in terms of execution speed and type safety. Compared to C extensions, they are safer, easier to implement, and more developer-friendly.
I hope you enjoyed this article. If you have anything to add, please share your thoughts in a comment. Thanks for reading!
If you're interested in learning more about how to speed up Python and how object attributes are accessed, I suggest you check out my story on attribute lookup that I mentioned earlier.