
Configure a CNN Model using Traditional Machine Learning Algorithms

Applying ensemble learning algorithms to image features extracted by convolutional layers, with a Python implementation

Table of Contents
1. Introduction
2. Layers
2.1. Convolutional Layer
2.2. Pooling Layer
2.3. Dropout Layer
2.4. Flatten Layer
3. Tutorial
3.1. Dense Layer Approach
3.2. Ensemble Learning Approach
4. Results
5. Discussion

1. Introduction

In traditional machine learning applications, after the necessary data preprocessing steps are applied, the dataset is usually converted into the shape (n_samples, n_features) and the algorithm is applied. To avoid situations such as overfitting and underfitting, the number of samples should be large. To prevent such situations, either the image dataset is expanded with various image augmentation methods, or techniques such as feature extraction, feature selection, or dimensionality reduction are applied.

Convolutional Neural Network (CNN) is a deep learning method mostly used for image datasets. It performs classification by training the model with artificial neural networks. In a CNN, the features of the given image dataset are extracted by the layers in its structure, and these extracted features are then trained by further layers such as a Dense layer.

This article shows how to classify an image dataset, whose features are extracted with convolutional layers, using ensemble learning algorithms. The work is enriched with a Python implementation.

Photo by Yuri Vasconcelos on Unsplash

2. Layers

First of all, let’s take a brief look at the layers used to build the model in this tutorial:

2.1. Convolutional Layer

Convolutional layers are the main building blocks of Convolutional Neural Networks (CNNs). They take the pixel values as input and perform feature extraction. A convolutional layer consists of filters (kernels). Learning takes place by sliding a filter over subsets of the input and processing them. These operations are performed with linear multiplication, and characteristic features such as edges are extracted from the image. The application of the gradient filter provided by the cv2 library with different kernels is shown in Figure 1.

Figure 1. Effect of Kernels in Gradient Filter, Image by author

If we go one step deeper, how does this process take place? As mentioned above, it is a simple mathematical operation named convolution. To illustrate mathematically:

The input = [ 5 10 15 20 25 30]

The filter = [0 1 0]

The output is calculated as follows:

[5 10 15]  . [0 1 0] = 10
[10 15 20] . [0 1 0] = 15
[15 20 25] . [0 1 0] = 20
[20 25 30] . [0 1 0] = 25

The filtered matrix, that is, the output, is [10 15 20 25]. This process can also be extended to a 2D convolutional layer.
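The same sliding dot product can be reproduced in a few lines of NumPy (a minimal sketch, not the tutorial’s code):

import numpy as np

# 1-D convolution with "valid" padding: slide the kernel over the input
# and take the dot product at each position.
signal = np.array([5, 10, 15, 20, 25, 30])
kernel = np.array([0, 1, 0])
output = np.array([
    np.dot(signal[i:i + len(kernel)], kernel)
    for i in range(len(signal) - len(kernel) + 1)
])
print(output)  # [10 15 20 25]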

2.2. Pooling Layer

Pooling layers perform various operations on the matrix values obtained after convolutional layers. Average pooling takes the average of the pixel values in the determined window size, while max pooling, as the name suggests, takes the maximum value. As seen in Figure 2, a 4×4 matrix is reduced to 2×2 by using 2×2 pooling. Pooling summarizes the features in a specific region and reduces the dimensionality of the features. In this way, the model becomes more generalized and overfitting is avoided.
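A minimal NumPy sketch of 2×2 max and average pooling on a 4×4 matrix (the values are illustrative, not those in Figure 2):

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# Split the 4x4 matrix into non-overlapping 2x2 windows, then reduce each window.
blocks = x.reshape(2, 2, 2, 2).swapaxes(1, 2)
print(blocks.max(axis=(2, 3)))    # max pooling -> 2x2 output
print(blocks.mean(axis=(2, 3)))   # average pooling -> 2x2 output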

Figure 2. What do Max-Pooling(left) and Average Pooling(right) do?, source

2.3. Dropout Layer

In the simplest terms, the dropout layer drops some of the obtained features, as seen in Figure 3; that is, it causes controlled information loss. If we consider a 128x128x3 RGB image and treat pixels as features, we have roughly 50,000 features (128 × 128 × 3 = 49,152). Some of these features can be discarded by applying dropout. In this way, overfitting is also prevented.
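A minimal sketch of what dropout does during training, assuming a flat feature vector and a drop rate of 0.5 (illustrative values):

import numpy as np

rng = np.random.default_rng(0)
features = np.arange(1, 11, dtype=float)        # 10 example features
keep_mask = rng.random(features.shape) >= 0.5   # each feature kept with probability 0.5
dropped = features * keep_mask / 0.5            # surviving features rescaled (inverted dropout)
print(dropped)                                  # roughly half of the features are zeroed out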

Figure 3. What does the dropout layer do?, source

2.4. Flatten Layer

The Flatten layer is used at the end of the first part of a CNN model; it converts the matrix shape, for example from 3×3 to 9×1 as seen in Figure 4, in order to prepare the dataset for classification by fully connected layers.
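A minimal sketch of the reshaping a Flatten layer performs:

import numpy as np

feature_map = np.arange(9).reshape(3, 3)          # a 3x3 feature map
flattened = feature_map.reshape(-1)               # becomes a 9-element vector
print(feature_map.shape, "->", flattened.shape)   # (3, 3) -> (9,)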

Figure 4. What does Flatten layer do?, source

3. Tutorial

After the features of the image dataset are extracted using the convolutional layers, these features are trained by Dense layers and by ensemble learning algorithms individually. The implementation and results are discussed.

The dataset can be accessed from the link.

3.1. Dense Layer Approach

The dataset, consisting of 233 samples containing cups, dishes, and plates, was trained for classification. First of all, the dataset is expanded using an image augmentation process. After the data preprocessing steps are applied, it is split into a training set and a test set. The training dataset is trained with the model built as follows:

  • [1] – Image augmentation, resize, and scaling

The predefined image augmentation process is repeated 15 times and the number of samples is increased to approximately 3500. All images in the dataset are resized to 128 × 128 pixels, and each pixel is divided by 255 for scaling.
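A hedged sketch of this step using Keras’ ImageDataGenerator; the augmentation parameters and the directory path are illustrative assumptions rather than the exact settings used in the tutorial:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,       # illustrative augmentation settings
    zoom_range=0.1,
    horizontal_flip=True,
    rescale=1.0 / 255,       # divide each pixel by 255 for scaling
)

# Assuming the images are stored in class subfolders under "dataset/";
# every image is resized to 128x128 as described above.
train_flow = augmenter.flow_from_directory(
    "dataset/",              # hypothetical path
    target_size=(128, 128),
    batch_size=32,
    class_mode="categorical",
)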

  • [2] – Model

The CNN model consists of two main components. The features are extracted using the 'model', which is built sequentially as the first component. Since the pixels are the features for image datasets and the shape of the images is 128x128x3, the total number of input features is 49152. The input shape is defined in the first convolutional layer. In the subsequent convolutional layers, L1 & L2 regularization is applied to prevent overfitting; it penalizes large weights, so the regularization parameters act as additional hyperparameters controlling how strongly each layer's weights are constrained. In addition, dropout is applied at various rates to prevent overfitting, and finally the output of the model is converted to the shape (n_samples, n_features) using a Flatten layer. The summary of the model is shown in Figure 5.

Figure 5. Model Summary, Image by Author

Looking at the output shape of the Flatten layer, it is seen that the number of features, which was 49152 at the beginning, is reduced to 1728 by the layers used.
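A hedged sketch of a feature-extraction component of this kind; the layer counts, filter sizes, dropout rates, and regularization factors are illustrative assumptions and do not reproduce the exact architecture summarized in Figure 5:

from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu",
                  kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.3),
    layers.Conv2D(64, (3, 3), activation="relu",
                  kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.3),
    layers.Flatten(),        # output shape: (n_samples, n_features)
])
model.summary()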

In the second part of the model, after the features are extracted, Dense layers are applied to perform the classification. Finally, ‘cnn_model‘ is created by combining the input (the feature-extraction part) and the output (the classification part).
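A hedged sketch of how the Dense classification head could be stacked on the feature extractor and combined into ‘cnn_model‘; the unit counts and compile settings are illustrative assumptions:

from tensorflow.keras import layers, models

x = layers.Dense(128, activation="relu")(model.output)
x = layers.Dropout(0.3)(x)
output = layers.Dense(3, activation="softmax")(x)   # 3 classes: cups, dishes, plates

cnn_model = models.Model(inputs=model.input, outputs=output)
cnn_model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])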

  • [3] – Model Evaluation

The confusion matrix and the classification report of the test dataset are prepared to evaluate the model.
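A minimal evaluation sketch with scikit-learn, assuming X_test and y_test hold the test images and their integer labels:

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_pred = np.argmax(cnn_model.predict(X_test), axis=1)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))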

  • [4] – Model Generalization Performance

15 images of cups, dishes, and plates are downloaded randomly from Google Images and predicted by the trained model in order to test its generalization performance.

3.2. Ensemble Learning Approach

In the introduction, it was mentioned that datasets in machine learning are trained in the (n_samples, n_features) format. In this part, ensemble learning algorithms are applied, but any classification algorithm could be used instead. Looking at the Flatten layer in Figure 5, it is seen that 1728 features are extracted. Here, the classification is done with ensemble learning algorithms instead of Dense layers, as follows:

  • [1]- The train set, test set, and external test set are passed through the first part ('model') of the CNN model, and the derived feature datasets are created.
  • [2]- A function containing confusion matrix results is created in order to apply all of the algorithms.
  • [3]- XGBoost, LGBM, Histogram-Based Gradient Boosting, ExtraTrees, AdaBoost, and Bagging algorithms are used for classification in their base versions, as sketched below.
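A hedged sketch of these steps; variable names such as X_train and y_train are assumptions, and each classifier is used with its default settings:

from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import (HistGradientBoostingClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, BaggingClassifier)
from sklearn.metrics import accuracy_score

# [1] Derive (n_samples, n_features) datasets from the convolutional feature extractor.
X_train_feat = model.predict(X_train)
X_test_feat = model.predict(X_test)

# [2]-[3] Fit each algorithm in its base version and report test accuracy.
classifiers = {
    "XGBoost": XGBClassifier(),
    "LGBM": LGBMClassifier(),
    "Histogram-Based GB": HistGradientBoostingClassifier(),
    "ExtraTrees": ExtraTreesClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Bagging": BaggingClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(X_train_feat, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test_feat)))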

4. Results

Confusion Matrices of the Dense Layer Approach are illustrated in Figure 6.

Figure 6. Confusion Matrix of test set(left), Confusion Matrix of external test set(right), Image by Author

As can be seen, 91% accuracy is achieved on the test dataset, while 13/15 accuracy is obtained on the external test dataset, which was randomly collected from a different source to assess model generalization performance.

The results of the Ensemble Learning Approach are tabulated in Figure 7.

Figure 7. Results of Ensemble Learning Algorithms, Image by Author

It is seen that although the test dataset accuracy is high, low accuracy results are obtained in the external test dataset. This indicates overfitting.

5. Discussion

Since this study is mainly about developing a CNN model, it does not focus on ensemble learning methods. However, it is seen that even without hyperparameter tuning, some of these methods, such as XGBoost, give satisfactory results.

It is essential to understand how a Convolutional Neural Network works in order to take its applications further. This article aims to provide different alternatives and flexibility. The accuracy results obtained with ensemble learning algorithms can be improved, for example by detecting the best combination of hyperparameters using GridSearch or by expanding the dataset. The effect of the CNN model on image datasets in deep learning is indisputably good, but this method can also be evaluated for different purposes when studied as an alternative.

