
Genetic Algorithm in R: Hyperparameter Tuning

Tune your model's hyperparameters using a Genetic Algorithm in R

Photo by Markus Spiske on Unsplash

Motivation

Supervised learning is one of the main tasks in the Machine Learning (ML) domain, where the goal is to predict the target/dependent variable given the features/independent variables in the data. One of the subjects in ML that has been actively researched in recent years is hyperparameter tuning.

One of the popular techniques that we can use for hyperparameter tuning is the Grid Search (GS) algorithm. Despite its popularity, GS is very time-consuming and computationally expensive when it comes to large datasets and/or many hyperparameters. For this reason, several alternative techniques have been developed, one of which is the Genetic Algorithm. In this article, I will show you how to tune your hyperparameters using a Genetic Algorithm in R.

So, what is Hyperparameter Tuning?

The hyperparameters of an ML model can largely affect its predictive performance, thus it is important to set suitable hyperparameters for the model. Traditionally, hyperparameter tuning is performed by a trial-and-error process. Depending on how many hyperparameters the model has, this process can be very exhausting, especially when we are working with large data.

The hyperparameter tuning problem is usually treated as an optimization problem, where the objective function that we want to optimize is the predictive performance of the model itself. The challenges that usually arise in hyperparameter tuning [3] are as follows.

  1. A hyperparameter setting that leads to high predictive performance on one dataset may not do so on another dataset.
  2. Hyperparameters are sometimes dependent on each other, for example, ntree (number of trees) and mtry (number of variables randomly selected at each split) in Random Forest.
  3. The time-consuming process of hyperparameter tuning.

Genetic Algorithm for Hyperparameter Tuning

The idea of the Genetic Algorithm is to reach the optimal solution of the objective function by repeatedly selecting the best or fittest solutions alongside rare, random mutations. For those who want to understand how the algorithm works, I wrote an article that explains the algorithm’s concept here.

In this case, since we want to tune the hyperparameters of an ML model, the "population" used in this algorithm is a set of models with different hyperparameter values. The pseudocode of the genetic algorithm for hyperparameter tuning can be seen as follows.

Genetic Algorithm for Hyperparameter Tuning pseudocode (Image by the author)
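
Since the pseudocode above is shown as an image, here is a minimal, illustrative R sketch of the same loop to make the idea concrete. The helper functions random_candidate and mutate are hypothetical placeholders, and crossover is omitted for brevity; the GA package used later handles all of this for us.

#A minimal, illustrative sketch of the GA loop for hyperparameter tuning
#(not the GA package's implementation; random_candidate and mutate are placeholders)
ga_tune_sketch=function(fitness,random_candidate,mutate,pop_size=50,max_iter=30)
{
  population=replicate(pop_size,random_candidate(),simplify=FALSE)
  for(iter in 1:max_iter)
  {
    scores=sapply(population,fitness) #evaluate every candidate hyperparameter setting
    parents=population[order(scores,decreasing=TRUE)][1:(pop_size/2)] #keep the fittest half
    children=lapply(parents,mutate) #offspring via random mutation (crossover omitted here)
    population=c(parents,children) #form the next generation
  }
  population[[which.max(sapply(population,fitness))]] #return the fittest setting found
}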

Case Study

In this article, I implement the genetic algorithm for hyperparameter tuning using the Concrete Compressive Strength Dataset from the UCI Machine Learning Repository. The goal is to predict the concrete compressive strength value, hence it is a regression problem.

The Concrete Compressive Strength Dataset (Image taken from UCI Machine Learning Repository)

The variable information is as follows.

  1. Cement (component 1) – quantitative – kg in a m3 mixture – Input Variable
  2. Blast Furnace Slag (component 2) – quantitative – kg in a m3 mixture – Input Variable
  3. Fly Ash (component 3) – quantitative – kg in a m3 mixture – Input Variable
  4. Water (component 4) – quantitative – kg in a m3 mixture – Input Variable
  5. Superplasticizer (component 5) – quantitative – kg in a m3 mixture – Input Variable
  6. Coarse Aggregate (component 6) – quantitative – kg in a m3 mixture – Input Variable
  7. Fine Aggregate (component 7) – quantitative – kg in a m3 mixture – Input Variable
  8. Age – quantitative – Day (1~365) – Input Variable
  9. Concrete compressive strength – quantitative – MPa – Output Variable

For this case, I use a Random Forest model (for the discrete-value hyperparameter case) and a Gradient Boosting model (for the real-valued hyperparameter case).

Data Preprocessing

First, we import the data and the libraries that we need.

library(car)
library(MASS)
library(tseries)
library(lmtest)
library(tidyverse)
library(GA)
library(mice)
library(caret)
library(caTools)
library(rsample)
library(gbm)
library(glmnet)
library(tictoc)
library(randomForest)
#Data Importing
data=read.csv("data_input/Concrete_Data.csv",sep=';')
summary(data)
str(data)

We check the data summary as follows.

> str(data)
'data.frame': 1030 obs. of  9 variables:
 $ Cement                       : num  540 540 332 332 199 ...
 $ Blast.Furnace.Slag           : num  0 0 142 142 132 ...
 $ Fly.Ash                      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Water                        : num  162 162 228 228 192 228 228 228 228 228 ...
 $ Superplasticizer             : num  2.5 2.5 0 0 0 0 0 0 0 0 ...
 $ Coarse.Aggregate             : num  1040 1055 932 932 978 ...
 $ Fine.Aggregate               : num  676 676 594 594 826 ...
 $ Age                          : int  28 28 270 365 360 90 365 28 28 28 ...
 $ Concrete.compressive.strength: num  80 61.9 40.3 41 44.3 ...

Then, we check whether there are any missing values in the dataset using md.pattern from the mice library as follows.

> md.pattern(data)
No need for mice. This data set is completely observed.
     Cement Blast.Furnace.Slag Fly.Ash Water Superplasticizer Coarse.Aggregate Fine.Aggregate Age Concrete.compressive.strength  
1030      1                  1       1     1                1                1              1   1                             1 0
          0                  0       0     0                0                0              0   0                             0 0
The missing value analysis (Image by the Author)

We can see that there are no missing values, so we are good to go to the next step. Then, we split the data into training and testing sets with a 70:30 proportion as follows.

#Data Splitting
split=initial_split(data,prop=0.7)
data.train=training(split)
data.test=testing(split)
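
Note that initial_split draws a random split, so the RMSE values reported later will vary between runs; if you want the whole experiment to be reproducible, you can fix the seed before splitting (an optional addition on my part, not part of the original run).

#Optional: fix the seed so the train/test split (and hence the GA results) is reproducible
set.seed(123)
split=initial_split(data,prop=0.7)
data.train=training(split)
data.test=testing(split)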

Implementations (discrete-value hyperparameters)

For the discrete-value hyperparameter case, we use the Random Forest model, and the hyperparameters that we want to tune are ntree and mtry. Since ntree and mtry are discrete-value hyperparameters, we use binary encoding for the optimization process. Here I set the ntree range from 1 to 512 and mtry from 1 to 8 (you can set a larger range, but remember that Random Forest is pretty time-consuming, so go ahead as long as you have the patience to wait). To check how many bits we need, we take the base-2 logarithm of the product of the hyperparameters' maximum values and add the number of hyperparameters, as follows.

> log2(512*8)+2
[1] 14
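
Equivalently, each hyperparameter gets a bit budget of ceiling(log2(max)) + 1 bits, which is exactly how the 14 bits are split in the objective function below (10 bits for ntree and 4 bits for mtry).

> ceiling(log2(512))+1 #bits allocated to ntree (a 10-bit block decodes to 0-1023)
[1] 10
> ceiling(log2(8))+1   #bits allocated to mtry (a 4-bit block decodes to 0-15)
[1] 4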

From the calculation above, we need 14 bits. If the decoded value of ntree or mtry is 0, we change it to 1 (since the minimum of the range is 1). Then, if the decoded value of ntree is more than 512, we change it to 512 (since the maximum of the range is 512), and likewise, if the decoded value of mtry is more than 8, we change it to 8 (since the maximum of the range is 8). Next, we create the objective function that we want to optimize. Remember that the genetic algorithm maximizes the objective value and we want the RMSE to be as small as possible, so we return the negative of the RMSE.

fit_rf=function(x)
{
  ntree=binary2decimal(x[1:10]) #ntree from 1 to 2^9
  mtry=binary2decimal(x[11:14]) # mtry from 1 to 2^3
  if(ntree==0)
  {
    ntree=1
  }
  if(mtry==0)
  {
    mtry=1
  }
  if(ntree>512)
  {
    ntree=512
  }
  if(mtry>8)
  {
    mtry=8
  }
  model=randomForest(Concrete.compressive.strength~.,data=data.train,mtry=mtry,
            ntree=ntree)
  predictions=predict(model,data.test)
  rmse=sqrt(mean((data.test$Concrete.compressive.strength-predictions)^2))
  return(-rmse) #since GA maximize the objective function and we want to minimize RMSE
}
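
Before handing fit_rf to ga, we can sanity-check it on a single random 14-bit chromosome (this fits one Random Forest, so it takes a few seconds, and the exact value will differ from run to run).

#Quick check: evaluate the objective on one random chromosome
fit_rf(sample(0:1,14,replace=TRUE)) #should return a negative RMSE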

Now, let’s see how to implement it in the ga function by writing these lines of code.

tic()
GA3=ga(type='binary',fitness=fit_rf,nBits=14,
       maxiter=30,popSize=50,seed=1234,keepBest=TRUE)
summary(GA3)
plot(GA3)
toc()
Genetic algorithm for Random Forest hyperparameter tuning result (Image by the Author)
> tic()
> GA3=ga(type='binary',fitness=fit_rf,nBits=14,
+        maxiter=30,popSize=50,seed=1234,keepBest=TRUE)
GA | iter = 1 | Mean = -5.277882 | Best = -4.946697
GA | iter = 2 | Mean = -5.152451 | Best = -4.946697
GA | iter = 3 | Mean = -5.182143 | Best = -4.946697
GA | iter = 4 | Mean = -5.115866 | Best = -4.921109
GA | iter = 5 | Mean = -5.075498 | Best = -4.921109
GA | iter = 6 | Mean = -5.132925 | Best = -4.921109
GA | iter = 7 | Mean = -5.089994 | Best = -4.921109
GA | iter = 8 | Mean = -5.126774 | Best = -4.921109
GA | iter = 9 | Mean = -5.078846 | Best = -4.921109
GA | iter = 10 | Mean = -5.163979 | Best = -4.919853
GA | iter = 11 | Mean = -5.205034 | Best = -4.919853
GA | iter = 12 | Mean = -5.207537 | Best = -4.919853
GA | iter = 13 | Mean = -5.098879 | Best = -4.919853
GA | iter = 14 | Mean = -5.118728 | Best = -4.919853
GA | iter = 15 | Mean = -5.202860 | Best = -4.919853
GA | iter = 16 | Mean = -5.145285 | Best = -4.919853
GA | iter = 17 | Mean = -5.107588 | Best = -4.919853
GA | iter = 18 | Mean = -5.032939 | Best = -4.919853
GA | iter = 19 | Mean = -5.041192 | Best = -4.885373
GA | iter = 20 | Mean = -5.039374 | Best = -4.885373
GA | iter = 21 | Mean = -5.034047 | Best = -4.885373
GA | iter = 22 | Mean = -5.030971 | Best = -4.885373
GA | iter = 23 | Mean = -5.023164 | Best = -4.885373
GA | iter = 24 | Mean = -5.026200 | Best = -4.829599
GA | iter = 25 | Mean = -5.077859 | Best = -4.829599
GA | iter = 26 | Mean = -5.080206 | Best = -4.829599
GA | iter = 27 | Mean = -5.033013 | Best = -4.829599
GA | iter = 28 | Mean = -5.071353 | Best = -4.809166
GA | iter = 29 | Mean = -5.057733 | Best = -4.809166
GA | iter = 30 | Mean = -5.048505 | Best = -4.809166
> summary(GA3)
-- Genetic Algorithm -------------------
GA settings: 
Type                  =  binary 
Population size       =  50 
Number of generations =  30 
Elitism               =  2 
Crossover probability =  0.8 
Mutation probability  =  0.1
GA results: 
Iterations             = 30 
Fitness function value = -4.809166 
Solution = 
     x1 x2 x3 x4 x5 x6 x7 x8 x9 x10  ...  x13 x14
[1,]  0  0  0  1  0  1  0  1  0   0         0   1
> plot(GA3)
> toc()
1101.77 sec elapsed

We can see the fittest ntree and mtry values as follows.

> ga_rf_fit=as.data.frame(GA3@bestSol)
> ga_ntree=apply(ga_rf_fit[, 1:10],1,binary2decimal)
> ga_mtry=apply(ga_rf_fit[, 11:14],1,binary2decimal)
> ga_ntree
[1] 76
> ga_mtry
[1] 4

From the result above, the RMSE produced by the best individual decreases across generations, where the fittest result in the final generation is achieved with ntree = 76 and mtry = 4, giving an RMSE of 4.809166.
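
As a quick sanity check, we can refit the Random Forest using these fittest values and recompute the test RMSE directly (a sketch on my part; the value may differ slightly from the GA's because of the forest's randomness).

#Refit the Random Forest with the fittest hyperparameters found by the GA
best_rf=randomForest(Concrete.compressive.strength~.,data=data.train,
                     ntree=ga_ntree,mtry=ga_mtry)
best_pred=predict(best_rf,data.test)
sqrt(mean((data.test$Concrete.compressive.strength-best_pred)^2)) #test RMSE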

Implementations (real-valued hyperparameters)

For the real-valued hyperparameter case, we use the Gradient Boosting model, and the hyperparameters that we want to tune are n.trees , shrinkage , and interaction.depth . Note that n.trees and interaction.depth are discrete-value hyperparameters, so the trick is to apply the floor function to the n.trees and interaction.depth values inside the model function as follows.

fit_gbm=function(ntree,shrinkage,interaction)
{
  model=gbm(Concrete.compressive.strength~.,data=data.train,distribution="gaussian",
            n.trees=floor(ntree),shrinkage=shrinkage,interaction.depth=floor(interaction))
  predictions=predict(model,data.test,n.trees=floor(ntree)) #predict with all fitted trees (otherwise predict.gbm picks a default and prints a message)
  rmse=sqrt(mean((data.test$Concrete.compressive.strength-predictions)^2))
  return(rmse)
}
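
As with fit_rf, we can sanity-check this objective function on a single, arbitrary hyperparameter setting before running the GA (illustrative values on my part, not tuned ones).

#Quick check: evaluate the objective on one candidate setting
fit_gbm(ntree=100,shrinkage=0.05,interaction=2) #returns the test RMSE for this setting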

Then, we implement the objective function into the ga function as follows.

tic()
GA2=ga(type='real-valued',fitness=function(x) -fit_gbm(x[1],x[2],x[3]),
      lower=c(1,1e-4,1),upper=c(512,1e-1,3),maxiter=30,popSize=50,seed=1234,keepBest=TRUE)
summary(GA2)
plot(GA2)
toc()
Genetic algorithm for Gradient Boosting hyperparameter tuning result (Image by the Author)
> summary(GA2)
-- Genetic Algorithm -------------------
GA settings: 
Type                  =  real-valued 
Population size       =  50 
Number of generations =  30 
Elitism               =  2 
Crossover probability =  0.8 
Mutation probability  =  0.1 
Search domain = 
       x1    x2 x3
lower   1 1e-04  1
upper 512 1e-01  3
GA results: 
Iterations             = 30 
Fitness function value = -4.939432 
Solution = 
           x1         x2       x3
[1,] 470.8652 0.08298563 2.346486
> plot(GA2)
> toc()
72.26 sec elapsed

From the result above, the RMSE produced by the best individual decreases across generations, where the fittest result in the final generation is achieved with n.trees = 470 (the floor of 470.8652), shrinkage = 0.08298563, and interaction.depth = 2 (the floor of 2.346486), giving an RMSE of 4.939432.
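
To reuse these values, we can pull the fittest solution out of GA2@solution and refit the Gradient Boosting model with it (a sketch; best and best_gbm are just illustrative names).

#Refit Gradient Boosting with the fittest real-valued solution found by the GA
best=GA2@solution[1,] #order: n.trees, shrinkage, interaction.depth
best_gbm=gbm(Concrete.compressive.strength~.,data=data.train,distribution="gaussian",
             n.trees=floor(best[1]),shrinkage=best[2],interaction.depth=floor(best[3]))
best_pred=predict(best_gbm,data.test,n.trees=floor(best[1]))
sqrt(mean((data.test$Concrete.compressive.strength-best_pred)^2)) #test RMSE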

Conclusions

And there it is: you can implement the genetic algorithm for hyperparameter tuning in R. You can try different datasets or implement it on a classification problem. Moreover, you can try different predictive performance metrics to see how the results change. Always remember, ML is a vast domain, so always try to experiment!

As usual, feel free to ask or discuss in my contact below if you have any questions! See you in my next post!

When it comes to hyperparameter tuning…. (Image generated from Imgflip)

Author’s Contact

LinkedIn: Raden Aurelius Andhika Viadinugroho

Medium: https://medium.com/@radenaurelius

References

[1] I-Cheng Yeh, Modeling of the strength of high-performance concrete using artificial neural networks (1998), Cement and Concrete Research.

[2] https://rpubs.com/Argaadya/550805

[3] Rafael G. Mantovani, Tomáš Horváth, Ricardo Cerri, Joaquin Vanschoren, and André C.P.L.F. de Carvalho, Hyper-Parameter Tuning of a Decision Tree Induction Algorithm (2016), 2016 5th Brazilian Conference on Intelligent Systems.

[4] https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

[5] Nikolaos Gorgolis, Ioannis Hatzilygeroudis, Zoltan Istenes, and Lazlo Grad-Gyenge, Hyperparameter Optimization of LSTM Network Models through Genetic Algorithm (2019), 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA).

[6] Luca Scrucca, GA: A Package for Genetic Algorithms in R (2013), Journal of Statistical Software.

[7] https://www.rdocumentation.org/packages/GA/versions/3.2/topics/ga

