
Probably the Easiest Tutorial for Python Threads, Processes and GIL

Illustrated with Diagrams and Code, Fewer Dry Concepts

Image by Regina from Pixabay

Don’t run away if you are a Python learner, because this article aims to explain the GIL in the easiest way possible. Of course, that has to start with explaining what threads and processes are. Don’t worry, I’ll do my best to keep it easy for everyone, even though that sacrifices some precision in the definitions.

Now we should begin.

1. Multi-Threading in Python

Image by Steen Jepsen from Pixabay

Some Concepts

Multi-threading is one of the most common programming techniques, and it exists in Python as well.

It allows us to run multiple operations concurrently. In general, multi-threading can improve CPU utilisation, and most I/O-bound tasks benefit from concurrently running threads.

Please don’t confuse the concepts of "process" and "thread". A process has its own allocated memory and is completely isolated from other processes by the operating system. That is why one program crashing usually does not affect the others.

Relationships between Processes and Threads

A process may have multiple threads running inside it, and those threads share many of the same resources, such as memory. Therefore, one crashed thread will bring down the entire process. Because the threads share memory with each other, they can also cause trouble within the process, as I’ll demonstrate later.

Code Example

Now, let’s see how to write Python code with the multi-threading technique.

Firstly, let’s import the threading module that is built into Python.

import threading

To test multi-threading, let’s define a function that is simple but takes some time to run.

def compute():
    for _ in range(100000000):
        pass

This function does nothing useful; it just runs a for-loop 100 million times without calculating anything in each iteration.

Then, suppose we want to run the compute() function twice, in two different threads. We can use threading.Thread() to create a new thread in Python, and target=compute tells the thread to execute the compute() function.

threading.Thread(target=compute)

Now, let’s create two threads, t1 and t2, and ask them to execute the compute() function.

t1 = threading.Thread(target=compute)
t2 = threading.Thread(target=compute)
t1.start()
t2.start()
t1.join()
t2.join()

In the above code, t1.start() tells the thread to start doing whatever task is assigned to it, which in our case is executing the compute() function. Similarly, we let t2 start immediately by calling t2.start().

The call t1.join() means we want the main process to wait until thread t1 has finished. In other words, the above code waits for both t1 and t2 to finish their jobs.

2. What is GIL in Python?

Image by Manfred Richter from Pixabay

GIL stands for Global Interpreter Lock. That’s all you need to know about the name itself. To understand what it does, please have a look at the diagram below.

How GIL works in Python

As shown, the GIL allows only one thread to execute Python code at a time. When one thread wants to run, it must acquire the lock, which blocks the other thread from running.

Therefore, although we start Thread1 and Thread2 together, they cannot make use of multiple CPU cores at all, so they won’t help the performance of CPU-bound code even a little. In fact, the result could often be slightly worse because of the overhead of switching between threads.
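
By the way, this limitation mainly hurts CPU-bound code. For I/O-bound work the picture is different, because a thread releases the GIL while it is blocked waiting for I/O, so the waiting time of multiple threads can overlap. Here is a minimal sketch using time.sleep() to stand in for blocking I/O (sleep also releases the GIL):

import threading
import time

def fake_io():
    # time.sleep() releases the GIL while waiting, just like real blocking I/O
    time.sleep(1)

start = time.time()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Roughly 1 second instead of 4, because the threads wait concurrently
print("4 simulated I/O tasks:", round(time.time() - start, 2), "seconds")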

Why does Python have such a mechanism? Some of the reasons are as follows.

Thread Safety

Since multiple threads share the same memory and I/O, it is possible to run into race conditions. A typical example is as follows.

A typical example of thread-unsafe code

At the beginning, x = 1 in both scenarios and the value is already in memory. Then, because the two threads run concurrently, we don’t know which one executes first, so the final value of x is unpredictable. This is a typical example of thread-unsafe code.
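
If you want to reproduce something similar on your own machine, here is a minimal sketch of an unsafe shared counter. The read and the write are deliberately separate steps, which makes it easy for the two threads to interleave; how many updates get lost depends entirely on scheduling:

import threading
import time

counter = 0

def unsafe_increment(n):
    global counter
    for _ in range(n):
        current = counter       # read the shared value
        time.sleep(0)           # give the other thread a chance to run in between
        counter = current + 1   # write back, possibly overwriting the other thread's update

t1 = threading.Thread(target=unsafe_increment, args=(100000,))
t2 = threading.Thread(target=unsafe_increment, args=(100000,))
t1.start()
t2.start()
t1.join()
t2.join()

# Expected 200000, but typically much less because updates get lost
print("counter =", counter)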

Historical Problem

In Python’s early days, CPUs with multiple cores were not common. Therefore, the benefits of having the GIL clearly outweighed its limitations.

For example, memory management in CPython is much simpler with the GIL, which makes garbage collection straightforward. It also avoided many complex thread-safety bugs, such as race conditions.

Limitations Revealed

Nowadays, with the development of CPU technology and the wide use of Python in Data Analytics/Science/Engineering, the GIL has become a major bottleneck for Python in terms of performance and flexibility.

However, some learners may argue that multi-processing could be an alternative to multi-threading. In the next section, I’ll explain what multi-processing is and why it is different.

3. Why Is Multi-Processing Different?

Image by Lucent_Designs_dinoson20 from Pixabay

Again, I don’t like writing lots of text to convey very dry knowledge. Please see the diagram below, which demonstrates the differences between single-thread, multi-thread and multi-process execution.

Differences between Multi-Threading and Multi-Processing

Loosely speaking, multi-threading with the GIL finishes roughly the same amount of work as a single thread. Multi-processing, however, bypasses this limitation: in the same period, it may finish twice the amount of work compared to the other two.

Prove the Theory

Let’s code!

Since it is not very easy to use multi-processing in an interactive environment such as Jupyter Notebook, let’s write the simplest possible Python script to compare the 3 scenarios:

  • Single Thread
  • Multiple Threads
  • Multiple Processes, each with a Single Thread

First of all, we need to import relevant modules. Then, let’s use the same function compute() that we have used above. For your convenience, I’ll paste the code here again.

from threading import Thread
from multiprocessing import Process
import time

def compute():
    for _ in range(100000000):
        pass

The code for the single-threaded case is as simple as calling the function twice; in the other two examples, we will put the calls into two separate threads and processes respectively.

# Single Thread
start = time.time()

compute()
compute()

end = time.time()
print("Time taken with single threads:", end - start)

The code for multi-threading is the same as in the previous example. Again, for your convenience, I’ll paste the code here.

# Multi-Threading
start = time.time()

t1 = Thread(target=compute)
t2 = Thread(target=compute)
t1.start()
t2.start()
t1.join()
t2.join()

end = time.time()
print("Time taken with multi-threads:", end - start)

The code for multi-processing is mostly the same as for multi-threading. The only difference is that we use the Process class instead.

# Multi-Processing
start = time.time()

p1 = Process(target=compute)
p2 = Process(target=compute)
p1.start()
p2.start()
p1.join()
p2.join()

end = time.time()
print("Time taken with multi-process:", end - start)

Then, let’s put all the above code into a script file called my_test.py. After that, we can test these scenarios by running the Python Script file.

$ python my_test.py
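
One caveat before you run it: on platforms where multiprocessing starts child processes with "spawn" (Windows, and macOS in recent Python versions), the process-creating code must sit under the standard if __name__ == "__main__": guard, otherwise the script tries to re-run itself when the children import it. A rough skeleton of my_test.py with the guard would look like this:

from multiprocessing import Process
import time

def compute():
    for _ in range(100000000):
        pass

if __name__ == "__main__":   # required on spawn-based platforms
    start = time.time()
    p1 = Process(target=compute)
    p2 = Process(target=compute)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("Time taken with multi-process:", time.time() - start)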

Here are the results of running the script. They roughly comply with what was demonstrated in the diagram above.

If you still don’t understand what happened, I’ve created another diagram as follows. It illustrates why Multi-Processing can reduce the elapsed time.

Why Multi-Process is faster than GIL-enabled Multi-Thread?

By the way, the timings are not precise, since my computer’s OS had many other processes running while the test was conducted, but they prove the theory at a high level.

Why Is Multi-Processing Not Ideal?

As shown in the earlier diagram, processes don’t share anything like memory allocation, I/O resources, etc. That’s why the GIL has no impact on them. However, that is also why multi-processing is not flexible enough.

In practice, the scenarios can be much more complex. Communication between processes is still possible, but it creates a lot of overhead, and sometimes the performance gained from multi-processing is eaten up by it.
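
To make that overhead concrete, passing data between processes normally means pickling the object, pushing it through an OS-level pipe and unpickling it on the other side, for example with a multiprocessing.Queue. Here is a minimal sketch (the payload size is arbitrary, just to show that the whole object gets copied):

from multiprocessing import Process, Queue

def worker(tasks, results):
    data = tasks.get()        # unpickle the payload sent by the parent
    results.put(sum(data))    # pickle the result and send it back

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    p = Process(target=worker, args=(tasks, results))
    p.start()
    # the whole list is serialised and copied into the child process,
    # which is far more expensive than two threads reading the same memory
    tasks.put(list(range(1000000)))
    print("sum computed in the child:", results.get())
    p.join()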

Back to multi-threading: multiple threads can easily work on the same memory and together finish a big task by using the hardware in parallel. Of course, the responsibility for avoiding complex threading-related bugs then falls on the developers.

4. It is going to be resolved!

Image by Gabriele Lässer from Pixabay

Finally, the reason I came up with the idea of introducing the GIL is that it will hopefully be removed soon. PEP 703, which makes the GIL optional in CPython, has been accepted.

Screenshot from the public website: https://peps.python.org/pep-0703/

PEP 703 – Making the Global Interpreter Lock Optional in CPython
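
If you install a free-threaded ("no-GIL") build of CPython 3.13 or later, you can check at runtime whether the GIL is actually disabled. A minimal sketch, assuming sys._is_gil_enabled() is available (it was added in Python 3.13; older versions don’t have it, hence the fallback):

import sys

# sys._is_gil_enabled() exists only in CPython 3.13+,
# so fall back to assuming the GIL is enabled on older versions
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
print("GIL enabled:", gil_enabled)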

Summary

Image by TImor from Pixabay

In this article, I have introduced one of the most distinctive mechanisms in Python – the Global Interpreter Lock (GIL). It is a double-edged sword. While it simplifies memory management and avoids some thread-unsafe scenarios, it also limits developers’ ability to fully utilise modern hardware, such as multi-core CPUs, to process tasks more effectively.

Hopefully, this article has helped you understand what threads and processes are. The acceptance of PEP 703 also gives us hope that Python’s flexibility will be even better in the future.

Unless otherwise noted, all images are by the author.

