
When starting out with artificial intelligence algorithms, it is worth asking yourself two fundamental questions:
- What problem are we trying to solve?
- How do we build a model that helps solve it?
For the first question, the only limiting factors are our imagination and the availability of data we can use to train the model. For the second, the choice of model depends largely on three things: the problem we are solving, the resources available and the time we have at our disposal.
This is best shown with an example. Let's assume we want to create an algorithm that helps sort waste correctly based on a photo. The choice of algorithm depends on how the problem is posed. We could use random forests or support vector machines after extracting appropriate features from the image. We could also use convolutional neural networks, which operate directly on the image data, and design the network architecture ourselves. It is worth considering a pre-trained model, however, because it can save a lot of time. To solve this problem, I decided to adapt pre-trained artificial neural networks for image classification. How does this look in practice? Let's go through the process step by step.
Data analysis and preparation
The data set I used can be found here: Garbage Classification. It contains photos of waste divided into six categories: cardboard, glass, metal, paper, plastic and mixed waste. Most items in the last category could just as well be assigned to one of the other five groups, so we exclude it from further analysis. Below is a graph showing the number of photos available for each class.
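The per-class counts behind such a graph could be gathered with a short script. This is a minimal sketch, assuming the data set is unpacked with one sub-folder per class; the folder name "trash" for the mixed-waste category and the function name are hypothetical.

```python
from pathlib import Path

# Assumption: one sub-folder per class, e.g. <root>/cardboard/*.jpg.
EXCLUDED_CLASSES = {"trash"}  # hypothetical folder name for the mixed-waste class
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Return {class_name: number_of_images} for each class folder under root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        # Skip stray files and the class we decided to exclude.
        if not class_dir.is_dir() or class_dir.name in EXCLUDED_CLASSES:
            continue
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_class("path/to/dataset")` returns a dictionary that can be passed straight to a plotting library to draw the bar chart.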

A very important stage in preparing the data set is dividing it into at least two subsets: training and validation. An even better practice is to create three separate sets: training, validation and test. The test set plays no part in training or model selection, so the results obtained on it are representative and show how well the system really performs on new, previously unseen photos. In my case, 60% of the photos were used for training, 20% for validation and the remaining 20% for testing.
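A 60/20/20 split can be sketched in a few lines. This is a minimal example, assuming a simple random shuffle with a fixed seed; the function name and file names are made up for illustration, and the split is not stratified, so class proportions may differ slightly between subsets.

```python
import random


def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle items and split them into (train, val, test) lists.

    The test set receives whatever remains after train and validation.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names, for illustration only.
photos = [f"photo_{i:03d}.jpg" for i in range(100)]
train, val, test = split_dataset(photos)
print(len(train), len(val), len(test))  # 60 20 20
```

Splitting once and saving the three file lists keeps the test set untouched across experiments.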
Below are sample photos for each class. Each photo is 512 x 384 pixels. When using a pre-trained neural network, it is very important to match the size of the images to the input size the network accepts. For the Xception network the input layer is 299 x 299, while for the VGG16 network it is 224 x 224. Therefore, before training the model, we need to rescale our images.
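To make the rescaling step concrete, here is a minimal sketch of nearest-neighbour resizing written in plain Python. It is an illustration only: in practice an image library would do this, the function name is made up, and squeezing a 512 x 384 photo into a square input changes its aspect ratio.

```python
# Input sizes (width, height) stated in the text for each network.
INPUT_SIZES = {"xception": (299, 299), "vgg16": (224, 224)}


def resize_nearest(image, new_width, new_height):
    """Resize a 2-D image (list of rows) with nearest-neighbour sampling."""
    old_height = len(image)
    old_width = len(image[0])
    resized = []
    for y in range(new_height):
        src_y = y * old_height // new_height  # source row for this output row
        row = [image[src_y][x * old_width // new_width] for x in range(new_width)]
        resized.append(row)
    return resized


# A synthetic 512 x 384 "photo": each pixel stores its own (x, y) position.
photo = [[(x, y) for x in range(512)] for y in range(384)]
width, height = INPUT_SIZES["xception"]
scaled = resize_nearest(photo, width, height)
print(len(scaled[0]), len(scaled))  # 299 299
```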

Preparation and training of models
To solve this problem, I used two popular network architectures: VGG16 and Xception. Both models were pre-trained on the ImageNet data set, which contains pictures of objects belonging to 1000 classes, so the output layer responsible for classifying the input image has 1000 outputs. For the problem we are analyzing, the output layer should have 5 outputs. Below is the code that allows for adapting the pre-trained model to our data set.
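One common way to make this adaptation is sketched here with Keras. It is not necessarily the exact code used for this article: the frozen backbone, the global average pooling layer, the Adam optimizer and the function name `build_classifier` are all assumptions. Input preprocessing for Xception is also left out.

```python
from tensorflow import keras

NUM_CLASSES = 5  # cardboard, glass, metal, paper, plastic


def build_classifier(weights="imagenet", num_classes=NUM_CLASSES):
    """Xception backbone with a new softmax head for num_classes classes."""
    base = keras.applications.Xception(
        weights=weights,            # "imagenet" downloads the pre-trained weights
        include_top=False,          # drop the original 1000-class output layer
        input_shape=(299, 299, 3),
    )
    base.trainable = False          # assumption: keep the backbone frozen at first

    inputs = keras.Input(shape=(299, 299, 3))
    x = base(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # expects one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

The same pattern should work for VGG16 by swapping in `keras.applications.VGG16` and an input shape of (224, 224, 3).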
Because the data set available for training is small, I decided to expand it using data augmentation, that is, by generating modified copies of the training photos. Below is a fragment of the code responsible for training the selected model.
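The idea behind data augmentation can be shown with a small plain-Python sketch. Which transformations were actually applied is not stated here, so the random flips below are an assumption chosen for illustration, and the function names are made up.

```python
import random


def horizontal_flip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror a 2-D image (list of rows) top to bottom."""
    return image[::-1]


def augment(image, rng):
    """Return a randomly flipped copy; each flip is applied with probability 0.5."""
    result = [list(row) for row in image]  # copy, so the original stays untouched
    if rng.random() < 0.5:
        result = horizontal_flip(result)
    if rng.random() < 0.5:
        result = vertical_flip(result)
    return result


rng = random.Random(0)
tiny = [[1, 2, 3],
        [4, 5, 6]]
variants = [augment(tiny, rng) for _ in range(4)]
```

Each variant keeps the label of its source photo, which is what lets the training set grow without collecting new data.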
While training the selected models, I used early stopping: if the accuracy on the validation set does not improve for a set number of epochs, training is interrupted. This approach reduces the risk of the model overfitting to the training data. Below are the learning curves for the training and validation sets. It is evident that the Xception network did much better in this case, achieving more than 80% accuracy on photos from the validation set.
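The early stopping rule itself is simple enough to sketch in plain Python. The patience value, the function name and the accuracy figures below are made up for illustration and are not taken from the actual training runs.

```python
def early_stopping_epoch(val_accuracies, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation accuracy has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    the number of the last epoch is returned.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, accuracy in enumerate(val_accuracies, start=1):
        if accuracy > best:
            best = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_accuracies)


# Made-up validation accuracies, for illustration only.
history = [0.55, 0.63, 0.70, 0.69, 0.70, 0.68, 0.71]
print(early_stopping_epoch(history, patience=3))  # 6
```

Training frameworks usually provide this as a ready-made callback, often with an option to restore the weights from the best epoch.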


Effectiveness of the created solutions
As I mentioned earlier, the actual effectiveness of a model is best measured on a new data set that took no part in training. Below, therefore, are the results the models achieved on the test set, which contained 480 photos. The results confirm that the model based on the pre-trained Xception network did much better in this case. It achieved an accuracy of 83%, almost 10 percentage points more than the model based on the VGG16 architecture.
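Test-set accuracy, together with a confusion matrix showing which classes get mixed up, can be computed as sketched here. The labels in the example are invented for illustration and are not the results reported above; the function names are likewise made up.

```python
CLASSES = ["cardboard", "glass", "metal", "paper", "plastic"]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """matrix[i][j] = number of items of class i predicted as class j."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy labels, invented for illustration; these are not the article's results.
y_true = ["glass", "glass", "metal", "paper", "plastic", "cardboard"]
y_pred = ["glass", "plastic", "metal", "paper", "plastic", "paper"]
print(accuracy(y_true, y_pred))  # 4 of the 6 toy predictions are correct
for name, row in zip(CLASSES, confusion_matrix(y_true, y_pred)):
    print(name, row)
```

Rows of the matrix correspond to true classes and columns to predicted ones, so off-diagonal entries point to the pairs of materials the model confuses.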


Summary
The article shows how to use pre-trained network architectures and apply them to a specific problem that we want to solve. The process of data analysis and preparation, adaptation of pre-trained models to one’s needs and methods of assessing the effectiveness of the created solution are briefly discussed. The source code is available here.
This is the first in a series of articles intended for people who want to start their adventure with artificial intelligence algorithms. I invite you to follow our entries on the Isolution blog and our company profiles on LinkedIn and Facebook, as well as to subscribe to our newsletter.