As I continue to navigate the world of Data Science, ML, AI, or whatever you want to call it, I keep encountering terms I was once afraid to dive into. As I documented in a previous article of mine, I come from a non-technical background and often swept key software terms such as "testing" under the rug. After gaining more experience building applications powered by ML, I came to understand the necessity of a particular form of testing called load testing, also known as performance testing. Load testing is a way to simulate the traffic you expect your application to receive. It's easy to throw an endpoint up on a Flask server and expect it to hold up, but in reality, hitting an endpoint sequentially does not paint a realistic picture of its performance. That's why we will be using an open source Python tool called Locust to simulate many concurrent users for our endpoint and see how it holds up performance-wise. This type of testing is essential for understanding the changes you might need to make to optimize your project, and for saving costs by properly utilizing the servers you are working with.
For this article we'll be building a basic ML-powered Flask app, but we won't focus much on those steps, since the article is centered on load testing the endpoint our application creates.
Table of Contents
- Building Flask App & ML Model
- Creating Locust Script
- Simulating Users
- Entire Code & Conclusion
Building Flask App & ML Model
Before we can get to load testing, we have to create our model and set it up for inference on a simple Flask app. For our model, we will be working with a regression dataset to predict petrol consumption using a Random Forest. The following code trains our model and serializes it into a format that Flask can load and work with for inference.
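Here is a minimal sketch of what that training script might look like. The file name petrol_consumption.csv, the target column Petrol_Consumption, the hyperparameters, and the model.pkl output path are all assumptions you should adapt to your own setup.

```python
# Hypothetical training script: assumes a local petrol_consumption.csv
# with a Petrol_Consumption target column.
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("petrol_consumption.csv")
X = df.drop("Petrol_Consumption", axis=1)  # feature columns
y = df["Petrol_Consumption"]               # regression target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")

# Serialize the trained model so the Flask app can load it for inference
joblib.dump(model, "model.pkl")
```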
Now that we have our model built, we can set up our Flask server to make it accessible for inference.
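A minimal version of that server might look like the sketch below; the file name app.py, the model.pkl path, and the JSON payload format are assumptions carried over from the training step.

```python
# app.py: a minimal Flask inference server (illustrative, not production-ready).
# Run with: flask run
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.pkl")  # model serialized by the training script

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object mapping feature names to values
    payload = request.get_json()
    features = pd.DataFrame([payload])
    prediction = model.predict(features)[0]
    return jsonify({"prediction": float(prediction)})
```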
Our model is now accessible at the /predict path on our server. To ensure that the endpoint is working properly, we create another file and use the requests library to test inference. Using another tool such as Postman would also work here.
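A test script along these lines does the job; the feature names and values below are illustrative and should match whatever columns your model was trained on.

```python
# api_test.py: quick sanity check for the /predict endpoint.
# Feature names/values are illustrative; match them to your training data.
import requests

sample = {
    "Petrol_tax": 9.0,
    "Average_income": 3571,
    "Paved_Highways": 1976,
    "Population_Driver_licence(%)": 0.525,
}

response = requests.post("http://127.0.0.1:5000/predict", json=sample)
print(response.json())
```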
To test this, make sure your Flask server is up with flask run, then execute the API test script to see the model's prediction.

Now we have our API with our model loaded for inference, and we are ready to start load testing with Locust. If you want to learn more about deploying ML models on Flask, check out this great article, which is specifically geared toward that topic.
Creating Locust Script
As described before, Locust helps simulate many simultaneous users; you define the behavior for this simulation as tasks in your locustfile.py. Tasks are the key concept in Locust: every time a load test starts, an instance of the user class you created is spawned for each simulated user, and each user executes the tasks defined in the Python script.
First we import the necessary modules from Locust, then create a class that contains the main functionality for the simulation. Inheriting from HttpUser gives each simulated user an HTTP client for making the requests we want to load test, such as calls to our endpoint.
on_start is called when a simulated user starts, before any Locust task is executed, and on_stop is called when the user stops running tasks. Finally, we create a task to post mock data to our API.
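Putting those pieces together, a minimal locustfile.py might look like the sketch below. The class name, wait times, and mock payload are assumptions; the payload mirrors the API test script from earlier.

```python
# locustfile.py: a minimal load test sketch for the /predict endpoint.
from locust import HttpUser, between, task

class MLUser(HttpUser):
    # Each simulated user waits 1-3 seconds between tasks
    wait_time = between(1, 3)

    def on_start(self):
        print("Simulated user starting")

    def on_stop(self):
        print("Simulated user stopping")

    @task
    def predict(self):
        # Mock payload; match it to your model's feature columns
        sample = {
            "Petrol_tax": 9.0,
            "Average_income": 3571,
            "Paved_Highways": 1976,
            "Population_Driver_licence(%)": 0.525,
        }
        self.client.post("/predict", json=sample)
```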
We now have our simple Locust script ready to run. To learn more about Locust and get started with your own script, check out their documentation here.
Simulating Users
With all of our resources ready, we can finally see Locust in action and view some of the metrics it provides. Make sure your Flask server is up, then run the Locust file by simply typing locust in the terminal. If your Locust script is not named locustfile.py, point to it with locust -f filename.py.

You should arrive at a screen like the following. For Host, you should generally enter the address your server is actually running on; for a local Flask development server this is typically http://127.0.0.1:5000. Click start and you will see Locust in action.



After clicking stop, you should see some general statistics in your terminal, which you can also download as a CSV file for further analysis or visualization.

We see that Locust tracks various metrics such as requests per second, failure counts, and response times, including percentiles for end-to-end latency. As you explore the documentation and Locust's more advanced features, you can export these statistics and run your own functions or services over them to identify your application's limits, or to detect when you are nearing a latency or throughput threshold you do not want to cross.
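For example, if you run Locust with locust --csv results, it writes its aggregated statistics to results_stats.csv, and a short pandas script can flag requests that breach a latency budget. The file name, column names, and the 500 ms threshold below are assumptions and may vary with your Locust version.

```python
# Hypothetical post-processing of Locust's exported stats CSV.
import pandas as pd

stats = pd.read_csv("results_stats.csv")

# Flag any request whose 95th-percentile latency exceeds a 500 ms budget
slow = stats[stats["95%"] > 500]
print(slow[["Name", "Request Count", "95%"]])
```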
Entire Code & Conclusion
To access the entire code for the application, check out the link above. Locust is a very clean, efficient, and simple way to load test your applications. There are numerous other tools and approaches to load/performance testing, but I wanted to highlight the general importance of load testing when leveling your apps up to production. From saving costs to optimizing your resources, load testing is essential for seeing how your endpoint actually holds up against the traffic you would expect in a real-world scenario.
I hope this article was a good introduction to Locust and load testing in general. Feel free to connect with me on LinkedIn or follow me on Medium for more of my writing. Share any thoughts or feedback, and thank you for reading!