
Many people start their data science journey by studying and applying Decision Tree algorithms. That's no surprise, as this algorithm is probably the most explainable one, and one that mimics human decision making quite well.
Understanding Decision Trees has another huge advantage: they are the base for the most famous boosting (Extreme Gradient Boosting) and bagging (Random Forest) algorithms, which have swept Kaggle competitions and solved a myriad of business problems all over the world.
After you grasp the details of how a decision tree is built and how it chooses the features used to perform the key splits in the data, you immediately understand that there are a lot of decisions to be made when we start fitting one to solve problems – namely:
- How deep should the tree go?
- What's the minimum number of examples a node should contain before we consider splitting it?
- How many examples should end up in each terminal node (leaf) of the tree?
- What’s the minimum gini / entropy gain threshold to consider a new split?
All these questions may seem a bit arbitrary. But you might notice that they are all tied to a key feature of decision trees – hyperparameters: the set of values that are not learned by the model but set by the user.
Tweaking these hyperparameters is crucial to achieving the end goal of all machine learning algorithms – generalization power. And in decision trees they are arguably even more important, as tree-based algorithms are highly sensitive to small changes in the hyperparameter space.
Tuning hyperparameters is a fundamental task for data scientists and machine learning engineers all around the world. And understanding the individual impact of each of those parameters will help you trust and explain your model's performance much better.
In this post, we are going to use R and the mlr library to optimize decision tree hyperparameters. I also want to show you how to visualize and evaluate the impact of each parameter on the performance of our algorithms. For our example, we will use the mythical Titanic dataset, available on Kaggle.
Let’s start!
Loading the Data
Before loading the data, let’s call all the dependencies of our code:
library(dplyr)
library(rpart)
library(rpart.plot)
library(Metrics)
library(mlr)
library(ggplot2)
library(plotly)
Describing our dependencies:
- dplyr to perform some data wrangling tasks.
- rpart to fit decision trees without tuning.
- rpart.plot to plot our decision trees.
- Metrics to assess the performance of our models.
- mlr to tune our model's hyperparameters.
- ggplot2 for the general plots we will do.
- plotly for 3-D plots.
The Titanic dataset is a CSV file that we can load using the read.csv function. This dataset contains information about the Titanic passengers, with the following columns:
- Survived – flag indicating if the passenger survived the Titanic crash.
- Pclass – the ticket class of the passenger.
- Sex – the gender of the passenger.
- Age – age in years.
- SibSp – number of siblings/spouses aboard the Titanic.
- Parch – number of parents/children aboard the Titanic.
- Ticket – the ticket number.
- Fare – the passenger fare.
- Cabin – the cabin number of the passenger.
- Embarked – the port of embarkation of the passenger.
I’ve loaded it using:
titanic <- read.csv('train.csv')
To simplify, I'll only use a subset of the original columns from the titanic dataframe – let me select them using dplyr:
titanic <- titanic %>%
select(Fare, Age, Sex, Pclass, Survived, SibSp, Parch)
Let's also split our data into train and test sets – leaving 20% of the data as a holdout group:
# Splitting data into Train and Test
titanic['row_id'] = rownames(titanic)
set.seed(123)
train_data <- titanic %>%
sample_frac(0.8)
test_data <- titanic %>%
anti_join(train_data, by='row_id')
# Drop row_id from both dataframes
train_data[,'row_id'] <- NULL
test_data[,'row_id'] <- NULL
Although we will use cross-validation inside our hyperparameter tuning (I'll come back to this a bit later), the test set will be used to make sure that we are not overfitting our train or cross-validation sets. For decision trees this is extremely important, as they are very prone to high variance.
Before we move on, let’s check a preview of our data:

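That preview can be reproduced in your own session with a quick call to head() – a minimal sketch (here I'm previewing the training split):
# Peek at the first rows of the training data
head(train_data)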
Cool, we have our data in-place, let’s fit our first decision tree!
Fitting First Decision Tree
For a first vanilla version of a decision tree, we'll use the rpart package with default hyperparameters.
d.tree = rpart(Survived ~ .,
data=train_data,
method = 'class')
As we are not specifying hyperparameters, we are using rpart’s default values:
- Our tree can descend up to 30 levels – maxdepth = 30;
- The minimum number of examples in a node required to attempt a split is 20 – minsplit = 20;
- The minimum number of examples in a terminal node is 7 – minbucket = 7;
- Each split must improve the overall fit of the tree by at least a factor of 0.01 – cp = 0.01 (the complexity parameter, which we can loosely read as a minimum "performance" gain required for a split to be kept).
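If you prefer to make those defaults explicit in your own code, here is a minimal sketch that passes the same values through rpart.control (the object name d.tree.defaults is mine; the call is equivalent to the vanilla fit above):
# Equivalent to the vanilla fit, with the default hyperparameters written out
d.tree.defaults = rpart(Survived ~ .,
data=train_data,
method = 'class',
control = rpart.control(maxdepth = 30,
minsplit = 20,
minbucket = 7,
cp = 0.01))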
How do we know that these are the best hyperparameters for our data? These defaults are somewhat arbitrary choices, and relying on them is a risky bet.
Maybe we can split the nodes a bit further. Or maybe we're making decisions based on too small a sample, due to a low minsplit and minbucket. Before moving on, here is our tree:

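The tree plot above was generated with the rpart.plot package we loaded at the start; a minimal sketch to reproduce a similar figure (default styling, which may differ slightly from the image shown):
# Draw the fitted decision tree
rpart.plot(d.tree)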
We see that this is a relatively shallow tree with 4 levels. Let’s check the accuracy on the test set:
# Predict Values
predicted_values <- predict(d.tree, test_data, type = 'class')
# Getting Accuracy
accuracy(test_data$Survived, predicted_values)
Our accuracy is ~ 79.21%. Can we still improve this value? Maybe! And tweaking hyperparameters is one of the first ideas we can explore!
Let's first set them manually – we can use rpart.control in the rpart function to override the default hyperparameters:
d.tree.custom = rpart(Survived ~ .,
data=train_data,
method = 'class',
control = rpart.control(maxdepth = 5, cp = 0.001))
On this tree, I'm setting maxdepth to 5, forcing my tree to go a bit deeper than what we've seen above. Additionally, I'm also tweaking cp – let's see the result:

The new tree is a bit deeper and contains more rules – in terms of performance, it has an accuracy of ~79.78%, a bit better than our vanilla version!
Our metric is moving – accuracy went up by about half a percentage point. From the entire landscape of hyperparameters we can tune, there must be a combination that produces the optimum performance on the test set – right? Do we have to try those parameters manually?
Luckily, no! Although rpart won't enable us to do this search automatically, we have a library called mlr that will come to the rescue!
Hyperparameter Tuning using MLR – Tweaking One Parameter
One cool thing is that what we will learn here extends to other models. The mlr library uses exactly the same method we will learn to tweak the parameters of random forests, xgboost models, SVMs, etc.
In the past, you may have heard about caret, a famous R data science library. Although caret also has some built-in hyperparameter search, mlr lets us view the impact of those hyperparameters much better, being less "black-boxy" – that's the main reason why I'm using mlr in this post.
So, mlr, the Machine Learning in R library, is a cool package that gives us the tools to train several different models and perform hyperparameter tuning. As we've discussed, one of its advantages is that it lets us view each hyperparameter's impact on the performance of the model.
One handy function in mlr is getParamSet, which returns all the tweakable parameters available for a specific model – for a classification rpart, we can call getParamSet("classif.rpart"), which yields:

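If you want to reproduce that table in your own session, here is a minimal snippet (the output is mlr's default console print of the parameter set):
# List every tunable hyperparameter of the rpart classification learner,
# with its type, default value and constraints (the constr column)
rpart_params <- getParamSet("classif.rpart")
print(rpart_params)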
All these parameters are tweakable using mlr. Let's focus on 3 of them – minsplit, maxdepth and cp – starting with maxdepth only.
In the constr column we can see the range of values we can tweak – for maxdepth we can go from a depth of 1 to 30.
Wouldn't it be interesting to have an easy way to fit these 30 different versions of the decision tree and evaluate the accuracy of each of those models? That's what mlr does!
mlr requires a bit more code than plain rpart or even caret. First, we need to define a task – in this case, I'm defining a classification task with train_data and target = 'Survived':
d.tree.mlr <- makeClassifTask(
data=train_data,
target="Survived"
)
Then, we need to create the grid of parameters to iterate on – let's start slow with a single parameter, as we've discussed. We need makeParamSet and a makeDiscreteParam:
param_grid <- makeParamSet(
makeDiscreteParam("maxdepth", values=1:30))
What I'm stating in the code above is that my search will iterate over 30 different values of maxdepth, a vector (1:30) that contains 1, 2, 3, …, 30 as the values to feed into the hyperparameter.
Then, we need three things: to initialize the grid-search control, choose the cross-validation method, and choose a measure that will be used to evaluate our results:
# Define Grid
control_grid = makeTuneControlGrid()
# Define Cross Validation
resample = makeResampleDesc("CV", iters = 3L)
# Define Measure
measure = acc
Cross-validation is a way to obtain a more robust estimate of the decision tree's performance. We'll use 3-fold cross-validation in our example. As the measure, we will use accuracy (acc).
All set! Time to feed everything into the magical tuneParams function that will kickstart our hyperparameter tuning!
set.seed(123)
dt_tuneparam <- tuneParams(learner='classif.rpart',
task=d.tree.mlr,
resampling = resample,
measures = measure,
par.set=param_grid,
control=control_grid,
show.info = TRUE)
As you run the code above, our hyperparameter search will start to execute! show.info = TRUE will output the execution's feedback:
[Tune-x] 1: maxdepth=1
[Tune-y] 1: acc.test.mean=0.7895909; time: 0.0 min
[Tune-x] 2: maxdepth=2
[Tune-y] 2: acc.test.mean=0.7881845; time: 0.0 min
[Tune-x] 3: maxdepth=3
[Tune-y] 3: acc.test.mean=0.8008132; time: 0.0 min
...
Each maxdepth generates an acc.test.mean, the mean of the acc measure across the cross-validation folds. mlr also lets us extract the results with generateHyperParsEffectData:
result_hyperparam <- generateHyperParsEffectData(dt_tuneparam, partial.dep = TRUE)
And we can plot the evolution of our accuracy using ggplot:
ggplot(
data = result_hyperparam$data,
aes(x = maxdepth, y=acc.test.mean)
) + geom_line(color = 'darkblue')

Looking at our plot, we see that after a depth of 5 the effect on accuracy is marginal, with only tiny differences. Let's confirm which model the tuneParams function chose as the best – we can check that by printing the dt_tuneparam object:
Tune result:
Op. pars: maxdepth=11
f1.test.mean=0.9985403
The tuning result chose a maxdepth of 11 as the best parameter, even though the differences from nearby depths were tiny – nevertheless, let's fit our best parameters by using the object dt_tuneparam$x to pick up the saved hyperparameters and store them with setHyperPars:
best_parameters = setHyperPars(
makeLearner("classif.rpart"),
par.vals = dt_tuneparam$x
)
best_model = train(best_parameters, d.tree.mlr)
train will fit a decision tree with the saved hyperparameters in the best_parameters object.
After running the code above, best_model holds a fitted tree with the best hyperparameters returned from the grid search. To evaluate this model on our test set, we need to create a new makeClassifTask pointing to the test data:
d.tree.mlr.test <- makeClassifTask(
data=test_data,
target="Survived"
)
Predicting and checking accuracy on the test_data:
results <- predict(best_model, task = d.tree.mlr.test)$data
accuracy(results$truth, results$response)
Our accuracy is around 79.21%, the same as our vanilla version. So, in our earlier manual experiment, it was probably the tweak to the cp parameter that improved model performance.
The question is: in this example we've kept the other parameters constant, so does that mean we can only tweak our hyperparameters one by one? Nope!
With mlr, we can tweak the entire landscape of parameters at the same time, with just a small change to our code! Let's do that.
Tweaking Multiple Parameters
Tweaking multiple hyperparameters is easy! Remember the param_grid object we've created for our grid search? Let's recall it:
param_grid <- makeParamSet(
makeDiscreteParam("maxdepth", values=1:30))
If I add new arguments inside the makeParamSet function, I'll be adding new parameters that will be combined in the search – for instance, let's add cp and minsplit to our landscape:
param_grid_multi <- makeParamSet(
makeDiscreteParam("maxdepth", values=1:30),
makeNumericParam("cp", lower = 0.001, upper = 0.01),
makeDiscreteParam("minsplit", values=1:30)
)
makeNumericParam creates numeric parameters (such as cp, which takes decimal values) – we can check which hyperparameters are discrete or numeric in the getParamSet output (keep in mind that integer-valued parameters can also be passed to makeDiscreteParam as a list of values).
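As a side note, mlr's parameter helpers also include makeIntegerParam, which is arguably a more natural fit for whole-number hyperparameters such as maxdepth and minsplit; here is a sketch of an equivalent grid (the name param_grid_int is mine – keep in mind that, with grid control, numeric and integer ranges are discretized according to the control's resolution rather than taking every single value):
# Alternative grid definition using integer parameters instead of discrete value lists
param_grid_int <- makeParamSet(
makeIntegerParam("maxdepth", lower = 1, upper = 30),
makeNumericParam("cp", lower = 0.001, upper = 0.01),
makeIntegerParam("minsplit", lower = 1, upper = 30)
)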
And how can we train this multi-parameter search? By feeding our param_grid_multi to the tuneParams function!
dt_tuneparam_multi <- tuneParams(learner='classif.rpart',
task=d.tree.mlr,
resampling = resample,
measures = measure,
par.set=param_grid_multi,
control=control_grid,
show.info = TRUE)
There is a computational cost to tuning a larger number of hyperparameters. You'll notice that the dt_tuneparam_multi search takes more time than the dt_tuneparam one, because we will be fitting several thousand trees to our data.
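As a side note, if the full grid becomes too expensive, mlr also supports random search over the same parameter set – a sketch, where the number of iterations (maxit) is an arbitrary choice of mine:
# Random search: evaluate only 100 randomly sampled hyperparameter combinations
control_random = makeTuneControlRandom(maxit = 100L)
dt_tuneparam_random <- tuneParams(learner='classif.rpart',
task=d.tree.mlr,
resampling = resample,
measures = measure,
par.set=param_grid_multi,
control=control_random,
show.info = TRUE)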
At the end of the grid search, you'll see something like the following output:

In the [Tune] output, we have the best parameters for our search:
- a maxdepth of 15;
- a cp of 0.003;
- a minsplit of 5.
This combination of hyperparameters yielded an accuracy of around 82% on the cross-validation, not bad!
Let’s extract the best parameters, train a new tree with them and see the result on our test set:
# Extracting best Parameters from Multi Search
best_parameters_multi = setHyperPars(
makeLearner("classif.rpart", predict.type = "prob"),
par.vals = dt_tuneparam_multi$x
)
best_model_multi = train(best_parameters_multi, d.tree.mlr)
# Predicting the best Model
results <- predict(best_model_multi, task = d.tree.mlr.test)$data
accuracy(results$truth, results$response)
Our accuracy on the test set was 81.46%! Just by tweaking these parameters, we were able to improve our baseline accuracy by about 2 percentage points – an excellent result!
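Beyond the single accuracy number, a quick confusion matrix shows where the model errs – a minimal sketch using base R's table():
# Cross-tabulate true vs. predicted survival on the test set
table(truth = results$truth, predicted = results$response)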
As a final note and to help you visualize what we’ve done, let’s plot the results of the accuracy for a sample of our grid search results:
# Extracting results from multigrid
result_hyperparam.multi <- generateHyperParsEffectData(dt_tuneparam_multi, partial.dep = TRUE)
# Sampling just for visualization
result_sample <- result_hyperparam.multi$data %>%
sample_n(300)
hyperparam.plot <- plot_ly(result_sample,
x = ~cp,
y = ~maxdepth,
z = ~minsplit,
marker = list(color = ~acc.test.mean, colorscale = list(c(0, 1), c("darkred", "darkgreen")), showscale = TRUE))
hyperparam.plot <- hyperparam.plot %>% add_markers()
hyperparam.plot

On the x-axis we have cp, on the y-axis we have maxdepth, and on the z-axis we have minsplit.
Each dot is an experiment (combination of hyperparameters) and colors are tied to the accuracy result of that experiment. Red dots mean lower accuracy. Green dots mean better performance.
There's a clearly red area in the 3-D plot where certain values of cp don't perform well – let me rotate it for a better view:

Notice that a really low cp yields worse performance, particularly when combined with a low minsplit!
Visualizing our hyperparameter search results gives us a good bird's-eye view of how our training process is behaving.
If you want to check an interactive version of the plot above, follow this link!
Thank you for taking the time to read this post! I hope you've enjoyed it and that you now understand how to tune hyperparameters using R.
Hyperparameters can make or break a model, and we, as data scientists, need to know how to tweak them efficiently with just a few lines of code. If you use R, mlr may be an excellent choice for this common machine learning task!
I’ve set up an introduction to R and a Bootcamp on learning Data Science on Udemy. Both courses are tailored for beginners and I would love to have you around!

Here is a small gist with the code from this post:
Dataset License: The dataset used in this post is publicly available for usage at https://www.openml.org/d/40945