Last time we covered some of the most important evaluation techniques for Machine Learning models. We could easily say that someone who blindly uses only accuracy as a measurement really "can’t see the Forest for the Trees". 🙂 Joking aside, we saw that when the data is imbalanced, accuracy alone is often not enough to properly evaluate a model.
As the title suggests, we are continuing our journey through popular Machine Learning techniques. Today's topic is Decision Trees and Random Forests. As the latter is an advanced version, or rather, a combination of multiple Decision Trees, I’ve decided to cover these two algorithms together.
In this article I will try to explain Decision Tree and Random Forest classifiers at a beginner level, along with some interesting examples on real data. After reading this, I hope you will be ready to deploy your own classifier.

Structure of the article:
- Introduction
- Dataset loading and description
- Model creation and description
- Model tuning with GridSearchCV
- Conclusion
Introduction
Decision Trees
Let’s start with the definition of a Decision Tree. When we think of a Decision Tree, it is best to think of a real tree, where nodes represent questions the model asks about the object or value we want to classify. In other words, based on the features of the data (either continuous integers or floats, or classes and labels), the model creates rules which are used to correctly classify each object in the dataset. In this example I will present a simple Decision Tree used to classify precipitation according to its type. Each node has two outcomes (branches), for example the precipitation is either SOLID or LIQUID, or the precipitation is either FALLING from clouds or formed at GROUND LEVEL.

Decision Trees have some important characteristics: root, depth, node, leaf, splits (branches), etc.
Root: Type of Precipitation
Depth: the number of splits before we get a result, in our case, 3
Decision Node: each "rectangle" in the image that is split further
Leaf: each "rectangle" that is NOT split further (the ones in the last row)
Splits: the "questions" we ask to get a smaller portion of the data, e.g. liquid or solid
Decision Trees can also be used for regression, in which case they are often called Decision Tree Regressors. The difference from a classifier tree is only in the outcome: the classifier predicts a class or label, while the regressor predicts a continuous value, a number, e.g. how much rain will fall on the 26th of November.
Random Forests
We get the point of a Tree now, but can there be a Forest of Trees? Surely there can! A Random Forest is an ensemble of many Decision Trees. Every Tree in it gets trained only on a RANDOM part (subset) of the features and a RANDOM part (subset) of the training data. Because of this, each individual Tree tends to overfit to its training data, but as an ensemble the results of several dozen Trees are averaged (or put to a majority vote), so we get a result largely spared of overfitting. Let’s see how our Precipitation Classifier would look if we create it as a Random Forest.

So in this example three Trees are presented, each with a depth of 2. Now, let’s say the first and last predict that the resulting precipitation is "Hoar", while the second predicts "Mist". The "Majority Voting" would result in the final prediction being "Hoar". As denoted in the image, we can have any number N of individual Trees. It is important to note that each individual Tree would use different features ("questions"). For Classifiers the majority vote is used, while for Regression the results are averaged.
In Machine Learning, the process of combining multiple individual models (in this case Decision Trees) is referred to as "Ensemble Learning".
In the next few sections, we will implement Scikit-learn Decision Tree and Random Forest classifiers on two popular datasets.
Dataset loading and description
Before we load our datasets, we first import the dependencies.
The Decision Tree classifier class is loaded from the tree module, while the Random Forest class comes from the ensemble module of Scikit-learn. Additionally, we imported train_test_split to split our data into training and test sets, confusion_matrix to evaluate the results, tree to visualize a Decision Tree and preprocessing in order to use the LabelEncoder to convert descriptive (non-numerical) features into numerical labels.
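A minimal import block covering these dependencies could look as follows (a sketch, not necessarily the exact code from the original listing):

```python
from sklearn.tree import DecisionTreeClassifier       # single Decision Tree
from sklearn.ensemble import RandomForestClassifier   # ensemble of Decision Trees
from sklearn.model_selection import train_test_split  # train/test splitting
from sklearn.metrics import confusion_matrix          # result evaluation
from sklearn import tree                              # tree.plot_tree for visualization
from sklearn import preprocessing                     # LabelEncoder for descriptive features
```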
Afterwards we can import the data. This time we use the Iris dataset from Kaggle and the fuel economy dataset from Udacity. Our goal is to predict the flower species (Class) of the Iris flowers and the vehicle class (VClass) of the cars.
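Loading the two CSV files with pandas might look roughly like this; the file names are placeholders, assuming the datasets were downloaded locally:

```python
import pandas as pd

# File names are placeholders; point them to wherever the Kaggle/Udacity CSVs are stored.
iris = pd.read_csv("Iris.csv")
cars = pd.read_csv("fuel_economy.csv")

print(iris.head())
print(cars.head())
```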
A sneak peek into the data.


As we plan to use transmission type and drive type as predictors, we need to apply the LabelEncoder to the cars dataset to convert descriptive (non-numerical) features into integers.
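A sketch of how the LabelEncoder can be applied; the column names below are assumptions standing in for the actual transmission and drive-type columns:

```python
# Hypothetical column names, used for illustration only.
trans_encoder = preprocessing.LabelEncoder()
drive_encoder = preprocessing.LabelEncoder()

cars["trans_encoded"] = trans_encoder.fit_transform(cars["trans"])
cars["drive_encoded"] = drive_encoder.fit_transform(cars["drive"])
```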
In the flower dataset, all the features are numerical and of a similar order of magnitude, so no preprocessing was applied. If the orders of magnitude of the features differ widely or outliers are present, normalization is highly advisable.
As the last step before creating the model, we split the data into training and test datasets.
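A minimal sketch of the split; the target column names follow the description above, while the test size and random state are assumptions:

```python
# Iris: numerical features, target is the flower species ("Class").
X_iris = iris.drop(columns=["Class"])
y_iris = iris["Class"]
X_train_i, X_test_i, y_train_i, y_test_i = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42)

# Cars: encoded transmission/drive type (plus other numerical predictors), target is "VClass".
X_cars = cars[["trans_encoded", "drive_encoded"]]
y_cars = cars["VClass"]
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X_cars, y_cars, test_size=0.2, random_state=42)
```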
Model creation and description
Iris Dataset
When creating the Decision Tree classifier, several parameters can be chosen. Some of them were already mentioned when we described the Tree model. Scikit-learn allows us to select the max_depth of the tree, criterion, random_state, max_leaf_nodes, max_features, etc. With this dataset I went with the default settings, since they gave a satisfactory result.
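With default settings, the whole model creation boils down to a few lines; a sketch using the variable names from the split above:

```python
dt_iris = DecisionTreeClassifier()   # all hyperparameters left at their defaults
dt_iris.fit(X_train_i, y_train_i)

print("Train accuracy:", dt_iris.score(X_train_i, y_train_i))
print("Test accuracy: ", dt_iris.score(X_test_i, y_test_i))

# Optionally visualize the fitted tree:
# tree.plot_tree(dt_iris, feature_names=list(X_iris.columns), filled=True)
```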

Next, we create a Random Forest classifier for the Iris dataset. Here, Scikit-learn gives us the option to change the overall number of Decision Trees in the ensemble (n_estimators), criterion, max_depth (of the trees), max_features, etc.; all in all, similar parameters as when creating a single Decision Tree. Again, the default settings were chosen due to the good results.
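Analogously, a default Random Forest (again just a sketch):

```python
rf_iris = RandomForestClassifier()   # defaults, e.g. n_estimators=100
rf_iris.fit(X_train_i, y_train_i)

print("Test accuracy:", rf_iris.score(X_test_i, y_test_i))
```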

With both classifiers, identical results were obtained. Only one sample of Iris-Versicolor was misclassified as Iris-Virginica.
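The misclassification is easy to spot with the confusion_matrix we imported earlier, for example:

```python
y_pred = dt_iris.predict(X_test_i)
print(confusion_matrix(y_test_i, y_pred))
# A single off-diagonal entry corresponds to the one Iris-Versicolor
# sample predicted as Iris-Virginica.
```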
Cars dataset
Let’s see how the classifiers perform on a bigger, more complicated dataset.
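The procedure is the same as for the Iris data; a sketch fitting both classifiers on the cars dataset and comparing train and test accuracy:

```python
for name, model in [("Decision Tree", DecisionTreeClassifier()),
                    ("Random Forest", RandomForestClassifier())]:
    model.fit(X_train_c, y_train_c)
    print(name,
          "| train:", round(model.score(X_train_c, y_train_c), 3),
          "| test:", round(model.score(X_test_c, y_test_c), 3))
```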

The training accuracy for the cars is satisfactory at 93%, while the test accuracy drops to 73%.
A slightly better result is achieved with a Random Forest classifier.

Again, the training accuracy remains at 93%, while the test accuracy is a bit better at 76%. Next, we will try to "tune" our models to achieve better results.
Model tuning with GridSearchCV
Scikit-learn provides a method called GridSearchCV, which makes it very easy to find the best classifier over several different hyperparameter values. As input, GridSearchCV takes the desired classifier, a dictionary (or list) with the selected hyperparameters and the evaluation metric. The default evaluation metric is accuracy, but other metrics are possible, e.g. the area under the ROC curve or the F1-score.
Decision Tree
For the Decision Tree model, we’ve selected only max_depth. We plot the best hyperparameter value for this case, the training accuracy and the test accuracy.
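A sketch of the search over max_depth; the searched depth range below is an assumption:

```python
from sklearn.model_selection import GridSearchCV

param_grid = {"max_depth": list(range(1, 31))}   # the exact range is an assumption

grid_dt = GridSearchCV(DecisionTreeClassifier(random_state=42),
                       param_grid,
                       scoring="accuracy")       # accuracy is the default scorer
grid_dt.fit(X_train_c, y_train_c)

print("Best max_depth: ", grid_dt.best_params_["max_depth"])
print("Train accuracy:", grid_dt.score(X_train_c, y_train_c))
print("Test accuracy: ", grid_dt.score(X_test_c, y_test_c))
```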

In the case of the car fuel economy dataset, GridSearchCV found that the best depth of the Decision Tree model is 19, yielding a test accuracy of 75.1%, a 2.1% increase compared to the default hyperparameters.
Random Forest
In the case of the Random Forest classifier we’ve selected 3 hyperparameters to tune: n_estimators (the number of trees in the ensemble), max_depth of the trees and max_features (the maximum number of features considered at each split).
I want to point out that the computational requirements of GridSearchCV in this case are pretty high, so this step will take some time, depending on the type of CPU you have. For example, in this case the model has to build and score 6x28x7 = 1176 Random Forest configurations (each one additionally refitted for every cross-validation fold). With the n_jobs parameter one can select the number of processor threads used during the computation (-1 means all available threads are used, while a positive integer selects that specific number of threads).
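A sketch of such a grid; the concrete value ranges are assumptions, chosen only so that the three lists contain 6, 28 and 7 values respectively, matching the combination count above:

```python
param_grid = {
    "n_estimators": [100, 200, 300, 400, 500, 600],   # 6 values
    "max_depth": list(range(3, 31)),                  # 28 values
    "max_features": list(range(1, 8)),                # 7 values (the dataset has 7 features)
}

grid_rf = GridSearchCV(RandomForestClassifier(random_state=42),
                       param_grid,
                       n_jobs=-1)   # use all available CPU threads
grid_rf.fit(X_train_c, y_train_c)

print("Best parameters:", grid_rf.best_params_)
print("Test accuracy:  ", grid_rf.score(X_test_c, y_test_c))
```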

Compared to the default hyperparameter values of the Random Forest classifier, we managed to get a slightly better accuracy. GridSearchCV found that 500 Decision Trees in the ensemble gave better results than the default 100 trees. Also, instead of considering all seven features, the model here considered only 2 features at each split. Running the algorithm over different and wider hyperparameter ranges would probably yield a further increase in accuracy, but whether the additional time is worth it is questionable.
The gain here is 0.5%.
Conclusion
With this article I’ve tried to demystify the terms Decision Tree and Random Forest. Both algorithms can be used for both Classification and Regression; in this example, I’ve covered Classification.
Also, a simple approach to DT and RF classification on two datasets has been presented, with a very good and a satisfactory result, respectively. For this, the Scikit-learn DecisionTreeClassifier and RandomForestClassifier classes were used.
Scikit-learn provides a great tool for model hyperparameter tuning called GridSearchCV. The implementation is straightforward, but it requires significant computational power, especially in the case of Random Forests.
For any questions or suggestions regarding this article or my work, feel free to contact me via LinkedIn.
I hope you found the article helpful and understandable.
Also, you can find my other articles on Medium.
Thank you for taking the time, Cheers! 🙂