
How To Perform an Automatic Hyperparameter Optimization In Python?

An explanation of a nifty hyperparameter tuning technique on a Twitter sentiment analysis problem

Photo by Denisse Leon on Unsplash

Hyperparameter tuning is one of the most important parts of the machine learning life cycle. It is also computationally expensive and time-consuming.

During my Master’s program, I stumbled upon Optuna, an automatic hyperparameter optimization framework. One interesting aspect is that Optuna works with both standard machine learning algorithms and neural networks.

In this article, I will document everything I found useful about Optuna, with examples. Most importantly, I will address the common issues you are likely to run into while trying out the library.


By continuing to read this post, you will discover:

  1. How to model a sentiment analysis problem using XGBoost and LSTM.
  2. How to integrate Optuna with both XGBoost and LSTM models and perform hyperparameter tuning.
  3. How to mitigate some common pitfalls when using Optuna.

Interested? Start reading!


The Problem

Photo by Jeremy Zero on Unsplash

We could pick any problem for this, but I wanted something a bit more interesting, so I chose sentiment analysis of Twitter data.

Sentiment analysis falls within Natural Language Processing, where we use text manipulation methods to make sense of textual data and obtain insights from it. Its most popular use is identifying the polarity, or sentiment, of a tweet.

In simple words, we aim to classify a tweet, phrase or sentence into one of a set of sentiments such as positive, negative or neutral. This is a coarse classification; we can go a step further and classify the sentiment in more detail, such as very happy or moderately happy, with similar gradations for sad, angry, disgusted and so on.

The dataset I used comes from a Kaggle competition.

Data: Under Creative Commons Attribution 4.0 International License

Although the competition is about tweet extraction, I repurposed the data for sentiment analysis.

Sample Tweets from the dataset (Source: Author)

For the experiments, I am going to use the text and sentiment features to build a machine learning model that takes a tweet as input and tells us its sentiment. Since we have three sentiments (positive, negative and neutral), this is a multi-class classification task.


Data Preprocessing

Not every part of a tweet is important for the text processing we do. Some elements, such as numbers, symbols and stopwords, are not very useful for sentiment analysis.

So we simply remove them in the preprocessing step. I used the nltk Python library and regular expressions to remove stopwords, emails, URLs, numbers, extra whitespace, punctuation, special characters and Unicode characters.

The code looks like this:
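A minimal sketch of such a cleaning step could look like the following. It is an illustration, not the exact code used for the experiments: the tiny STOPWORDS set is a stand-in for nltk's English stopword list, and the regular expressions and the clean_tweet name are assumptions.

```python
import re
import string

# Tiny illustrative stopword set (assumption). In practice, swap in
# set(nltk.corpus.stopwords.words("english")) after nltk.download("stopwords").
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "it", "at", "i", "my"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
NUMBER_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, emails, numbers, non-ASCII
    characters, punctuation, stopwords and extra whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # URLs first, while their punctuation is intact
    text = EMAIL_RE.sub(" ", text)    # emails next, for the same reason
    text = NUMBER_RE.sub(" ", text)
    text = text.encode("ascii", errors="ignore").decode("ascii")  # drop non-ASCII characters
    text = text.translate(PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)           # split/join also collapses repeated whitespace


print(clean_tweet("I LOVED the new phone!!! Details at https://example.com, mail me: a.b@mail.com 100%"))
```

The order matters: URLs and emails are removed before punctuation, because stripping punctuation first would break them into ordinary-looking words.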

As I mentioned before, we are going to use two different methods for sentiment analysis, namely an XGBoost classifier and an LSTM neural network.


XGBoost Classifier

Photo by Haithem Ferdi on Unsplash

After "cleaning" the text data, the next step is vectorization. Here, we convert the text into a numerical format so that the machine learning model can ‘understand’ it.

In general, data such as text, images and graphs needs to be converted into a numerical representation before an ML model can be built on it.

Vectorization

To vectorize the text, we can simply use CountVectorizer from scikit-learn. Basically, it transforms the text into a sparse matrix with one column per unique word, where each entry counts how often that word occurs in a given text example.

We divide the data into train, validation and test sets in an 80:10:10 ratio. The split is stratified so that each split has the same proportion of labels (sentiments).

You can use the following code to do this:
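As a rough sketch, assuming scikit-learn is installed, the vectorization and the stratified 80:10:10 split could be written as below. The toy texts and labels are made up for illustration. Fitting the vectorizer on the training split only is a choice made here to avoid leaking validation and test vocabulary, not necessarily what the original experiments did.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical cleaned tweets with integer labels (0=negative, 1=neutral, 2=positive).
texts = ["good day", "bad day", "just day"] * 20
labels = [2, 0, 1] * 20

# First carve out 20% of the data, then halve it into validation and test (80:10:10).
x_train_txt, x_tmp_txt, y_train, y_tmp = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)
x_val_txt, x_test_txt, y_val, y_test = train_test_split(
    x_tmp_txt, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42
)

# Learn the vocabulary on the training texts only, then reuse it for the other splits.
vectorizer = CountVectorizer()
x_train = vectorizer.fit_transform(x_train_txt)  # sparse matrix of word counts
x_val = vectorizer.transform(x_val_txt)
x_test = vectorizer.transform(x_test_txt)

print(x_train.shape, x_val.shape, x_test.shape)
```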


Optuna Integration

Now, we are ready to train the model and tune the hyperparameters. Install Optuna by:

pip install optuna

In the following code, you will notice an objective function that is being optimized by Optuna. Firstly, we define the hyperparameters that we are interested in tuning and request their values from the trial object. Here, I chose to tune learning_rate, max_depth and n_estimators. Depending on the type of hyperparameter, we can use methods such as suggest_float, suggest_int and suggest_categorical.

Inside this objective function, we create an instance of the model and fit it on the training set. After training, we predict the sentiment on the validation set and calculate the accuracy. Optuna then tries to maximize this accuracy score by running trials with different hyperparameter values. Different sampling techniques can be employed during this optimization.

We can also rewrite the objective function to return the model’s loss value. In that case, Optuna should minimize the objective instead.

Optuna implements early stopping in the form of pruning: a trial is stopped early (pruned) if its intermediate results look unpromising.

You might have noticed the set_user_attr method. It is used to save any variable we might find important. Here, we are interested in saving the best model, i.e. the one with the highest validation accuracy, so we store the best XGBoost model in this user attribute.

During the Optuna optimization process, this is what you see:

Hyperparameter tuning by running Trials (Source: Author)

You can increase the number of trials if you want Optuna to explore a wider range of hyperparameter values.

After the trials have finished, we can retrieve a hyperparameter importance plot which is shown below:

XGBoost Hyperparameter Importance (Source: Author)

We observe that learning_rate is more important than the other hyperparameters. This tells us which hyperparameters to focus on.


Predicting on the Test set

So we have finished model training and hyperparameter tuning. We performed 20 trials to find the best hyperparameters. Now we can retrieve our best model and make predictions on the test set.

from sklearn.metrics import accuracy_score

# retrieve the best model from the Optuna study
best_model = study.user_attrs['best_model']
y_pred = best_model.predict(x_test)
print(accuracy_score(y_test, y_pred))

Test Accuracy (XGBoost): 0.683

Not a bad score! Let’s see if we can do better.


LSTM Architecture

Photo by Taylor Vick on Unsplash

The long short-term memory (LSTM) neural network architecture is popular in Natural Language Processing because it can retain sequence information in its "memory".

Just as with XGBoost, we have to vectorize the text data before training the LSTM model. We tokenize the text and then pad the resulting sequences to the same length.

The data is split in the same way as for the XGBoost model so that we can compare the two.


Tokenization and Padding
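To make the idea concrete, here is a dependency-free sketch of tokenization and padding. It imitates what a Keras-style tokenizer followed by sequence padding does, but it is a simplified stand-in written in plain Python, and the function names are assumptions. Index 0 is reserved for padding, words unseen during fitting are dropped, and sequences are padded and truncated at the front.

```python
from collections import Counter


def fit_vocab(texts, num_words=None):
    """Map each word to an integer index, most frequent word first (1-based)."""
    counts = Counter(tok for text in texts for tok in text.split())
    most_common = counts.most_common(num_words)  # None keeps every word
    return {word: idx for idx, (word, _) in enumerate(most_common, start=1)}


def texts_to_sequences(texts, vocab):
    """Replace every known word with its index; unknown words are dropped."""
    return [[vocab[tok] for tok in text.split() if tok in vocab] for text in texts]


def pad_sequences(sequences, max_len):
    """Left-pad with zeros (or keep only the last max_len tokens) to a fixed length."""
    padded = []
    for seq in sequences:
        seq = seq[-max_len:]
        padded.append([0] * (max_len - len(seq)) + seq)
    return padded


train_texts = ["good day", "bad day today", "day"]
vocab = fit_vocab(train_texts)
train_seqs = pad_sequences(texts_to_sequences(train_texts, vocab), max_len=4)
print(vocab)
print(train_seqs)
```

The vocabulary should be built from the training texts only and then reused for the validation and test texts, so that all three splits share the same word indices.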

Now, we define the LSTM model as follows:

I selected optimizer, epochs and batch_size as the tunable hyperparameters.
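A model of this kind might be defined as in the sketch below, assuming TensorFlow's Keras API. The layer sizes, the build_model name and its arguments are illustrative assumptions, not the exact architecture from the experiments. Only optimizer is consumed when the model is built. epochs and batch_size come into play later, when fit is called.

```python
import numpy as np
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding


def build_model(vocab_size, max_len, optimizer="adam"):
    """Embedding -> LSTM -> softmax over the three sentiment classes."""
    model = Sequential([
        Input(shape=(max_len,), dtype="int32"),          # padded word-index sequences
        Embedding(input_dim=vocab_size, output_dim=32),  # sizes are illustrative
        LSTM(32),
        Dense(3, activation="softmax"),                  # negative / neutral / positive
    ])
    model.compile(
        optimizer=optimizer,
        loss="sparse_categorical_crossentropy",          # integer class labels
        metrics=["accuracy"],
    )
    return model


model = build_model(vocab_size=1000, max_len=20)
# Sanity check on a dummy batch of two all-padding sequences.
probs = model.predict(np.zeros((2, 20), dtype="int32"), verbose=0)
print(probs.shape)
```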

This neural network model is now ready to train!

Let’s integrate Optuna to perform hyperparameter tuning while we train the LSTM model.

The code for this Optuna integration looks something like this:

The structure of this Optuna integration is the same as before. We just change the model and hyperparameters inside the objective function.

Similarly, we obtain the hyperparameter importance plot for LSTM:

LSTM Hyperparameter Importance (Source: Author)

We see that optimizer is an important hyperparameter, while batch_size does not contribute much to the improvement in the accuracy score.


Issues that I faced

For XGBoost, we could save the model directly, but Optuna raises errors when you try to save a Keras model in the same way. From my search, I found that this is because the Keras model is not picklable.

A workaround is to save only the weights of the best model and then use these weights to reconstruct the model.

The following code will explain more about this:

You will just create a new instance of the model and set the weights retrieved from Optuna, instead of training it again.

The test accuracy score obtained with LSTM:

Test accuracy (LSTM): 0.72

This score is better than the one we got with XGBoost. Neural network methods often perform better than standard machine learning methods on text tasks like this one. We could likely improve the accuracy further by using Transformer architectures such as BERT, RoBERTa or XLNet.

Overall, I enjoyed using Optuna for hyperparameter tuning. I could easily retrieve the best model from all the different trials and also understand which hyperparameters are important to tune during training (using the hyperparameter importance plot).


If you have reached this part of the post, thank you for reading. I hope you found it informative. If you have any questions, feel free to reach me on LinkedIn, Twitter or GitHub.

