
How To Build An AutoML API

Simple guide to building reusable ML classes

Image from Unsplash by Scott Graham

There’s been a lot of interest in [AutoML](https://awesomeopensource.com/projects/automl) recently. From open-source projects to scalable algorithms in the cloud, there has been a surge in projects that make ML more accessible to non-technical users. Examples of AutoML in the cloud include SageMaker Canvas and Azure AutoML, to name a few. In this article we won’t be working in the cloud; instead, I’ll show you how to build a simple Python API that automates solving regression problems. There are numerous open-source projects out there tackling AutoML.

I’ll be walking you through the process of building your own AutoML class. Even if you don’t end up creating your own package, building reusable classes is good programming practice. You can also build APIs that automate many of the repetitive tasks in ML. For example, you could build a preprocessing class for NLP projects and reuse it across text-based projects. To follow along with this article you should have a good understanding of Python and object-oriented programming.

Table of Contents

  1. Setup
  2. Building Your API
  3. Testing API
  4. Next Steps & Conclusion

1. Setup

Before we start building our API, you will need the following libraries installed: pandas, numpy, and scikit-learn (imported as sklearn). These supply all the imports we will be working with.
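As a rough sketch, the imports for an API like this might look as follows. The exact list is an assumption based on the models and metrics described in this article (Linear and Random Forest Regression, MAE and RMSE), not a copy of the original import block.

```python
# Assumed import list for the AutoML class described in this article.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
```

If you install with pip, note that the package name is scikit-learn even though the import name is sklearn.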

Now that you have all the requisite libraries installed, the next step is to set up the class we will be working with. There are four main arguments for our simple API: dataset, target column, model type, and metric type. For the dataset we provide a path to a CSV file. For the target column we give the name of the column we are trying to predict, exactly as it appears in the dataset. For the model type there are two options to choose between: Linear Regression and Random Forest Regression. Lastly, for the metric type there are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).

There are also a few attributes that our class methods will populate.
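A minimal sketch of such a constructor is shown here. The class name `AutoML`, the attribute names, and the file name `data.csv` are all hypothetical; only the four arguments themselves come from the description above.

```python
class AutoML:
    """Hypothetical skeleton; names are illustrative, not the original code."""

    def __init__(self, dataset, target_column, model_type, metric_type):
        # Arguments supplied by the user
        self.dataset = dataset              # path to a CSV file
        self.target_column = target_column  # column we want to predict
        self.model_type = model_type        # "linear_regression" or "random_forest"
        self.metric_type = metric_type      # "MAE" or "RMSE" (assumed spellings)

        # Attributes populated later by the class methods
        self.X_train = None
        self.X_test = None
        self.y_train = None
        self.y_test = None
        self.model = None


# The file is not read at construction time, so a placeholder path is fine here.
api = AutoML("data.csv", "price", "linear_regression", "MAE")
print(api.model_type, api.metric_type)
```

Initialising the later attributes to None keeps it obvious which state is still missing if a method is called out of order.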

Now we’re ready to build our methods for preprocessing, model training, and evaluation.

2. Building Your API

The first step in building our AutoML API is a preprocessing method. We read in the data and split it into train and test sets. We also check that the target column argument is present in the dataset that has been provided.
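The preprocessing step could be sketched as a standalone function like the one below. The function name, the 80/20 split, and the fixed `random_state` are assumptions; in the class this logic would live in a method and store the splits on `self`. An in-memory CSV stands in for a real file path so the example runs on its own.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split


def preprocess(dataset, target_column, test_size=0.2, random_state=42):
    """Read a CSV, validate the target column, and split into train/test sets."""
    df = pd.read_csv(dataset)  # accepts a file path or a file-like object
    if target_column not in df.columns:
        raise ValueError(f"Target column {target_column!r} not found in dataset")
    X = df.drop(columns=[target_column])
    y = df[target_column]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# Tiny in-memory CSV (10 rows) standing in for a real dataset path.
csv_text = "x1,x2,y\n" + "\n".join(f"{i},{i * 2},{i * 3}" for i in range(10))
X_train, X_test, y_train, y_test = preprocess(io.StringIO(csv_text), "y")
print(X_train.shape, X_test.shape)
```

Raising an error as soon as the target column is missing gives the user a clear message instead of a confusing failure later in training.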

Now that we have our train and test data, we can focus on our model building/training method.

At the moment there are only two models offered: linear_regression and random_forest. We check that you’ve entered one of these two models and then, depending on your selection, the chosen model is created and trained.
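One way to sketch this model-selection logic is shown below. The function name `fit` and the use of default hyperparameters are assumptions; the two model-type strings are the ones named above. Synthetic data is used so the block is self-contained.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def fit(model_type, X_train, y_train):
    """Create and train the model matching model_type."""
    if model_type == "linear_regression":
        model = LinearRegression()
    elif model_type == "random_forest":
        model = RandomForestRegressor(random_state=42)
    else:
        raise ValueError(
            f"Unsupported model_type {model_type!r}; "
            "choose 'linear_regression' or 'random_forest'"
        )
    model.fit(X_train, y_train)
    return model


# Synthetic, noise-free data: y = 3 * x0 - 2 * x1
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]

model = fit("linear_regression", X, y)
print(model.coef_)
```

Supporting a third model later only requires another branch, or a dictionary that maps names to estimator classes.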

Now we can return a metric based on the metric_type argument. For that we create an evaluate method that returns either MAE or RMSE depending on the value of that argument.
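A minimal sketch of the evaluation logic follows, written as a standalone function for brevity. The metric strings "MAE" and "RMSE" are assumed spellings, and RMSE is computed as the square root of scikit-learn's mean squared error.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def evaluate(metric_type, y_true, y_pred):
    """Return MAE or RMSE depending on metric_type."""
    if metric_type == "MAE":
        return float(mean_absolute_error(y_true, y_pred))
    if metric_type == "RMSE":
        return float(np.sqrt(mean_squared_error(y_true, y_pred)))
    raise ValueError(f"Unsupported metric_type {metric_type!r}; choose 'MAE' or 'RMSE'")


y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 2.0, 5.0]
print(evaluate("MAE", y_true, y_pred))
print(evaluate("RMSE", y_true, y_pred))
```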

First we check that the input is one of the supported metrics, and then we return the metric the user selected. If we want complete information, we can add a logModelInfo method that reports the model type and both metrics. Generally, as APIs get more sophisticated you will want a logging method that tracks your queries.
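The logModelInfo idea could look something like this sketch. Only the name `logModelInfo` comes from the text above; the signature, the dictionary layout, and printing the result are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def logModelInfo(model_type, y_true, y_pred):
    """Report the model type together with both supported metrics."""
    info = {
        "model_type": model_type,
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }
    print(info)
    return info


info = logModelInfo("linear_regression", [1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```

In a larger package you would likely swap the print call for Python's logging module so that each query is recorded with a timestamp.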

3. Testing API

Now we can test our sample API with a few sample inputs.
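Putting the pieces together, an end-to-end run might look like the sketch below. The class and method names are hypothetical, and a synthetic in-memory CSV with a made-up `price` column replaces a real dataset path, so the numbers will not match the screenshots in this article.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split


class AutoML:
    """Hypothetical assembled class; not the author's original code."""

    MODELS = {"linear_regression": LinearRegression, "random_forest": RandomForestRegressor}

    def __init__(self, dataset, target_column, model_type, metric_type):
        self.dataset = dataset
        self.target_column = target_column
        self.model_type = model_type
        self.metric_type = metric_type
        self.X_train = self.X_test = self.y_train = self.y_test = None
        self.model = None

    def preprocess(self):
        df = pd.read_csv(self.dataset)
        if self.target_column not in df.columns:
            raise ValueError(f"Target column {self.target_column!r} not found")
        X = df.drop(columns=[self.target_column])
        y = df[self.target_column]
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )

    def fit(self):
        if self.model_type not in self.MODELS:
            raise ValueError(f"Unsupported model_type {self.model_type!r}")
        self.model = self.MODELS[self.model_type]()
        self.model.fit(self.X_train, self.y_train)

    def evaluate(self):
        preds = self.model.predict(self.X_test)
        if self.metric_type == "MAE":
            return float(mean_absolute_error(self.y_test, preds))
        if self.metric_type == "RMSE":
            return float(np.sqrt(mean_squared_error(self.y_test, preds)))
        raise ValueError(f"Unsupported metric_type {self.metric_type!r}")


# Synthetic, noise-free data: price = 2 * x1 + 3 * x2 + 1
rows = ["x1,x2,price"] + [f"{i},{i % 7},{2 * i + 3 * (i % 7) + 1}" for i in range(100)]

api = AutoML(io.StringIO("\n".join(rows)), "price", "linear_regression", "MAE")
api.preprocess()
api.fit()
mae = api.evaluate()
print(f"linear_regression MAE: {mae:.4f}")
```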

Here we request a Linear Regression model with an MAE metric.

Sample Output (Screenshot by Author)

We can try another sample with a Random Forest model and RMSE metric.

Sample Output (Screenshot by Author)

4. Next Steps & Conclusion

To access the entire code for the example check out this link. As you can see, it’s a pretty simple process to build a Python class for ML. Building sophisticated, real-world packages takes only a few extra steps and a setup that is very reproducible; check out this resource to get started. There is an extensive number of AutoML open-source projects you can start contributing to. Even as Data Scientists, it’s crucial to get familiar with building classes and packages, and to understand the open-source tools or AutoML services you’re using at a deeper level.



