
Parallel web requests in Python

Performing web requests in parallel improves performance dramatically in Python

Performing web requests in parallel improves performance dramatically. The proposed Python implementation uses Queue and Thread to create a simple method that saves a lot of time.

Photo by E Kemmel on Unsplash

I have recently posted several articles using the Open Trip Planner (OTP) as a source for the analysis of public transport. Trip routing was obtained from OTP through its REST API. OTP was running on the local machine, but it still took a lot of time to make all the required requests. For simplicity, the implementation shown in those articles is sequential, but in other cases I use a parallel implementation. This article shows a parallel implementation for performing a large number of web requests.

Though I have some experience, the tutorials I found were quite difficult to master. This article contains my lessons learned and can be used as a basis for performing parallel web requests.

The core of the implementation is a queue holding all requests to perform. In this case, a request is specified by its URL.
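A minimal sketch of filling such a queue, assuming illustrative URLs (the variable names are not from the original listing):

```python
from queue import Queue

# Fill a queue with all requests to perform; each item is a URL string
urls = [f"http://localhost:8080/route?id={i}" for i in range(1000)]  # illustrative
url_queue = Queue()
for url in urls:
    url_queue.put(url)
```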

The Python package queue implements a queue with multiple producers and consumers. This means that the queue can be filled from multiple sources (Queue.put()) and that multiple workers (threads) can obtain items from it (Queue.get()). The default implementation is a First In, First Out (FIFO) queue, meaning that the first element added is the first element extracted. For our purpose, this default behaviour is fine.
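A tiny illustration of the FIFO behaviour:

```python
from queue import Queue

q = Queue()
q.put("first")
q.put("second")
print(q.get())  # prints "first": the first element added is the first one out
```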

The second component is the worker. The worker gets an element from the queue, executes the required logic and repeats this for all elements it can obtain from the queue.

When a worker is constructed, a reference to the queue is passed to it. The Worker extends the Python class Thread and thereby inherits its start() method. When this method is called, a new thread is created and the run() method is executed in that thread. The run() method retrieves an element from the queue (the element is also removed from the queue by this call). When there is no element in the queue, the call blocks until a new element is added. Since we are adding URL strings to the queue, a request can be created from the retrieved string and executed. The result of the request is appended to the list of results of this worker. Finally, the worker notifies the queue that a task has been completed.

Since we want the thread to end when all calls have been made, we have to implement a stopping mechanism. One way is to call the Queue.get() method with a timeout value: when there is no object in the queue, an Empty exception is raised when the timeout expires. Personally, I don’t like this solution. Exceptions are for exceptional situations, not for expected functionality. Instead, this code uses a stop value in the queue, in this case an empty string. When an empty string is retrieved, the while loop is ended, thereby ending the run() method. When the run() method ends, the thread automatically ends as well.
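A sketch of the Worker described above, reconstructed from the description rather than taken from the original listing; it uses the standard library urllib for the actual request, which is an assumption (the original may have used another HTTP client):

```python
import urllib.request
from queue import Queue
from threading import Thread

class Worker(Thread):
    def __init__(self, queue: Queue):
        super().__init__()
        self.queue = queue  # reference to the shared request queue
        self.results = []   # results collected by this worker

    def run(self):
        while True:
            url = self.queue.get()  # blocks until an element is available
            if url == "":           # an empty string is the stop value
                self.queue.task_done()
                break
            request = urllib.request.Request(url)              # create the request
            with urllib.request.urlopen(request) as response:  # execute it
                self.results.append(response.read())           # store the result
            self.queue.task_done()  # notify the queue the task is completed
```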

Putting it all together:
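A sketch of the complete method, building on the Worker sketch above (the function name parallel_requests and the parameter names are assumptions):

```python
from queue import Queue

def parallel_requests(urls, no_workers):
    # Create the queue and fill it with all requests to perform
    queue = Queue()
    for url in urls:
        queue.put(url)

    # Create the workers, attach them to the queue and start them
    workers = [Worker(queue) for _ in range(no_workers)]
    for worker in workers:
        worker.start()

    # Add one stop value (empty string) per worker; in a FIFO queue
    # these are retrieved only after all real requests
    for _ in range(no_workers):
        queue.put("")

    # Wait until all workers have ended
    for worker in workers:
        worker.join()

    # Combine the results of the individual workers before returning them
    results = []
    for worker in workers:
        results.extend(worker.results)
    return results
```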

A method is created that performs the parallel web requests. The parameters are a list of URLs and the number of worker threads to create. After creating and filling the queue, a set of workers is created. Each worker is attached to the queue and started. The threads are stopped when an empty string is retrieved from the queue, so for each worker an empty string is added to the queue. Because we have a FIFO queue, these will be retrieved from the queue last. By joining the workers, the subsequent code is only executed when all workers have ended.

All workers store their results in their own memory space. These results must be combined before they are returned to the caller. With this method in place, making multiple web calls in parallel takes a single line of code, as shown below. The code size could be reduced by combining a couple of loops, but for readability they are kept separate.
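With the illustrative names from the sketches above:

```python
results = parallel_requests(urls, 8)
```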

Now it is time to answer the question of how much this improves performance, so a small experiment was set up. By sending a thousand requests to a local instance of OTP, it is possible to establish the impact of the number of workers. A local server instance is used to eliminate the impact of network traffic and internet speed. To prevent any form of caching, all requests are different. By timing the 1,000 requests, we can determine the total number of calls that can be made in an hour (the throughput).
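Such a timing experiment could be sketched as follows, reusing the illustrative urls list and parallel_requests method from above:

```python
import time

for no_workers in (1, 2, 4, 8, 16, 32):
    start = time.perf_counter()
    parallel_requests(urls, no_workers)
    elapsed = time.perf_counter() - start
    throughput = len(urls) / elapsed * 3600  # calls per hour
    print(f"{no_workers} workers: {throughput:,.0f} calls per hour")
```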

Throughput is calculated and plotted as a function of the number of workers:

Throughput (image by author)

With only one worker, the performance is equal to an implementation without parallelism. It will be slightly worse due to the added overhead of threading but this is not significant compared to the other measurements.

Without threading, the throughput is 11,000 calls per hour. By adding threads, this is improved to 56,000. After reaching this optimum, adding more threads does not improve performance; it even decreases a little. This test was performed on a CPU with 8 cores, so finding the optimum at 8 workers is as expected. OTP is implemented multi-threaded, and with 8 cores it can handle 8 requests in parallel. Depending on the other jobs running on the system, the optimum is equal to the number of cores or slightly less.

In our OTP example we can increase the throughput by a factor of 5. With the number of requests made for building the OTP graphs in my other articles, the runtime decreased from 6 hours to a little under 1. This performance increase can be realised by implementing this relatively simple multi-threaded method. I hope it can save you some precious time too!

I hope you enjoyed this article. For inspiration on using OTP, check some of my other articles.

If you like this story, please hit the Follow button!

Disclaimer: The views and opinions included in this article belong only to the author.

