A step-by-step guide to understanding and using AutoNLP from scratch

Developing an end-to-end Natural Language Processing model is not an easy task. Several factors must be considered, such as model selection, data preprocessing, training, optimization, and the infrastructure where the model will be served, among others. For this reason, interesting alternatives are emerging to streamline and automate this set of tasks. Such is the case with AutoNLP, a tool that allows us to automate the end-to-end life cycle of an NLP model.
In this blog, we will see what AutoNLP is and how to use it, covering the installation process, project creation, training, metrics, cost estimation, and model serving. The blog is organized into the following sections:
- What is AutoNLP?
- AutoNLP in practice
What is AutoNLP?
AutoNLP [1] is a tool developed by the Hugging Face [2] team to automate the process of creating end-to-end NLP models. It was launched in its beta phase in March 2021, and it aims to automate each phase of the life cycle of an NLP model, from training and optimizing the model to deploying it.
"AutoNLP is an automatic way to train and deploy state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem." – AutoNLP team
One of the great virtues of AutoNLP is that it implements state-of-the-art models for binary classification, multi-class classification, and entity recognition, with support for 8 languages: English, German, French, Spanish, Finnish, Swedish, Hindi, and Dutch. Likewise, AutoNLP takes care of optimizing and fine-tuning the models. On the security and privacy side, AutoNLP protects data transfers with SSL, and the data remains private to each user account.
As we can see, AutoNLP emerges as a tool that facilitates and speeds up the process of creating NLP models. In the next section, we will see what the experience was like, from start to finish, when creating a text classification model with AutoNLP.
AutoNLP in practice
For this example, we are going to tackle a binary classification problem. The dataset was taken from the Semantic Analysis at SEPLN (TASS) [3] workshop and consists of tweets in Spanish labeled with 3 classes: negative, positive, and neutral. For the purposes of this example, we remove the samples from the neutral class so that the dataset fits a binary classification problem. You can download the dataset here. In Figure X we can observe the characteristics of the training and validation datasets.

Now, we don’t have to do anything else to the dataset. What follows is to start using AutoNLP, so let’s go for it!
Installing AutoNLP
To make use of the Hugging Face infrastructure through the AutoNLP tool, we need to register and create an account, which will hold our models as well as our datasets. This account provides a token that is used to establish communication between the AutoNLP CLI and the Hugging Face infrastructure.
AutoNLP can be installed directly from the command line via `pip`, as follows:

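A minimal sketch of the installation command (the package was published on PyPI as `autonlp` during the beta; the name may have changed since):

```
# Install the AutoNLP CLI from PyPI (beta-era package name)
pip install autonlp
```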
We are also going to need Git Large File Storage (Git LFS). In my case, since I am working on macOS, I install it as follows:

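On macOS, a typical way to install it is through Homebrew; roughly:

```
# Install Git LFS with Homebrew on macOS
brew install git-lfs
```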
Then, to set up `git-lfs`, you need to type:

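The setup boils down to initializing Git LFS for the current user:

```
# Initialize Git LFS so it hooks into git for the current user
git lfs install
```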
Once `autonlp` and its requirements are installed, we can proceed to create a project, upload the data, and train our model. Let's go for it!
Creating an AutoNLP project
The first step in creating a project is to authenticate. For this, we only need the token found in the settings of our account. Using the `autonlp` CLI, we type:

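A sketch of the login step with the beta-era CLI; the `--api-key` flag and the placeholder token are assumptions, so check `autonlp login --help` for the exact syntax of your version:

```
# Authenticate the CLI against your Hugging Face account
# (replace the placeholder with the token from your account settings)
autonlp login --api-key YOUR_HUGGING_FACE_TOKEN
```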
Once authenticated, we proceed to create our project. For this example, our project will be called `polarity_detection`, it will process data in the `spanish` language, the task will be `binary_classification`, and the maximum number of models we want to train is `5`. The command then looks like the following figure:

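A sketch of the project-creation command, assuming the beta-era flags (`--name`, `--language`, `--task`, `--max_models`) and the two-letter language code `es` for Spanish:

```
# Create a binary classification project in Spanish, training up to 5 models
autonlp create_project \
  --name polarity_detection \
  --language es \
  --task binary_classification \
  --max_models 5
```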
When we create our project, the terminal shows us information about it. The information for our example project is shown in Figure 8.

As we can see, the output shows details such as the identifier of our project (in this example it is `128`) and attributes such as the `name`, the `owner`, etc. A very important piece of information that is also displayed is the cost. At this stage the cost still shows `USD 0.00` because we have not yet trained any model; this value will change only when the training of our models has finished, which we will see in detail later.
Uploading your data
Once our project is created, we proceed to upload our datasets. It is recommended to upload the training and validation datasets separately. To upload a dataset it is enough to provide the name of the project (in our case `polarity_detection`), the type of split (that is, `train` or `valid`), the mapping of the column names (in our case we have `tweet` and `polarity`, which are mapped to `text` and `target`, respectively), and the dataset file. The following figure shows how the commands would look:

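A sketch of the two upload commands, assuming the beta-era `--col_mapping` syntax (`source:destination` pairs); the CSV file names are illustrative:

```
# Upload the training split, mapping tweet -> text and polarity -> target
autonlp upload \
  --project polarity_detection \
  --split train \
  --col_mapping tweet:text,polarity:target \
  --files train.csv

# Upload the validation split with the same column mapping
autonlp upload \
  --project polarity_detection \
  --split valid \
  --col_mapping tweet:text,polarity:target \
  --files valid.csv
```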
As with the creation of the project, when uploading the datasets the terminal shows us information relevant to the process. In this case, the information for our datasets is shown in the following figure:

Once the datasets are uploaded, we are ready to start training. However, an important aspect to consider is the cost. AutoNLP provides a command to estimate the cost of our project based on the number of samples in our training dataset. To obtain the estimate, we use the command shown in the following figure:

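A sketch of the cost-estimation command; the flag names follow the beta-era CLI and may differ in other versions, and the sample count is illustrative:

```
# Estimate the training cost from the number of training samples
autonlp estimate \
  --num_train_samples 1000 \
  --project_name polarity_detection
```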
As we can see, the cost is provided as a range; for this example, it is estimated to be `7.5` to `12.5 USD`. The final cost is provided once training ends, which we will see below.
Training
To start training the models, we only need to use the `train` argument together with the project name, as shown in the following figure:

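A sketch of the training command, assuming the beta-era `--project` flag:

```
# Launch training for all the models in the project
autonlp train --project polarity_detection
```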
Once training has been launched, we are asked whether we agree with the estimated cost (that is, the cost range we saw in the previous section). After accepting it, we can observe the status of each model, as in the following figure:

Depending on the number of models we have launched as well as the characteristics of our dataset, the training time will vary. However, we can monitor the status of our models with the `project_info` argument, as shown in the following figure:

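A sketch of the monitoring command, assuming the beta-era `--name` flag:

```
# Show the current status of the project and its models
autonlp project_info --name polarity_detection
```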
As can be seen in the previous image, each of the models launched has finished satisfactorily (remember that we only launched 5 models). Also, the updated final cost is shown, which is `11.70 USD`.
During the training of each model we can monitor some metrics; for the practical purposes of this example, we only show the metrics obtained at the end of the training of all the models. To visualize the metrics we use the `metrics` argument as well as the name of the project, as shown below:

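A sketch of the metrics command, assuming the beta-era `--project` flag:

```
# Display the evaluation metrics of every trained model in the project
autonlp metrics --project polarity_detection
```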
The reported metrics are `loss`, `accuracy`, `precision`, `recall`, `AUC`, and `f1-score` for each trained model. In our case, we see that on average the `accuracy` is `0.85`, which could be an acceptable value given the characteristics of our dataset. The total cost of our training is also shown again.
Inferring
Once our models are trained, they are ready to make predictions. In order to make predictions, we are going to use the `predict` argument as well as the model identifier, the name of the project, and the phrase to be predicted, as shown in the following figure:

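A sketch of the prediction command; the flag names follow the beta-era CLI, and the model identifier and example sentence are placeholders:

```
# Request a prediction from one of the trained models
# (MODEL_ID comes from the metrics/project_info output; the sentence is illustrative)
autonlp predict \
  --project polarity_detection \
  --model_id MODEL_ID \
  --sentence "Me encanta este producto, es maravilloso"
```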
As we can see in the previous figure, the first sentence is intended to be positive, and indeed the model yielded a score of `0.99` for the positive class. In the second example, the sentence is intended to be negative, and indeed the model yielded a score of `0.99` for the negative class.
Likewise, AutoNLP allows making predictions through a cURL request and through the Python API, as shown in Figures 17 and 18, respectively.


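As a sketch of the cURL route, a trained model living in your Hugging Face account can be queried through the hosted Inference API; `USERNAME/MODEL_NAME` and the token are placeholders for your own account and model:

```
# Query the deployed model through the Hugging Face Inference API
curl -X POST https://api-inference.huggingface.co/models/USERNAME/MODEL_NAME \
  -H "Authorization: Bearer YOUR_HUGGING_FACE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Me encanta este producto, es maravilloso"}'
```

An equivalent request can be made from Python with any HTTP client, such as the `requests` library.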
Conclusion
In this tutorial blog, we saw what AutoNLP is, its components, and how it works.
It is important to mention that once we have used the Hugging Face infrastructure through AutoNLP, we will receive an invoice with the amount shown on the command line.
My experience testing AutoNLP was pleasant. I had some problems when training for the first time; however, the support they provide is efficient.
References
[1] AutoNLP
[2] Hugging Face