
Classification of Traffic Signs with LeNet-5 CNN

Build and use a simple CNN with the Keras library

Photo by Rebecca Zaal from Pexels

The purpose of this project is to train and test an implementation of the LeNet-5 Convolutional Neural Network for a classification task. The model will be used in an application where the user can upload a photo of a traffic sign and get a prediction of its class.

1. Dataset

The dataset was taken from The German Traffic Sign Recognition Benchmark (GTSRB) [1] and was first presented at the single-image classification challenge at the International Joint Conference on Neural Networks (IJCNN) 2011 [2]. It was created from about 10 hours of video recorded while driving on different roads in Germany during the daytime. The data consists of about 40,000 real-world color photos of traffic signs. The images have a .ppm extension, and their size varies from 15×15 to 250×250 pixels. For Notebooks, I prefer using Google Colab. I saved the dataset on my Google Drive and accessed it simply with the drive.mount command:

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

2. Libraries

For our project, we need the following libraries: some standard ones such as NumPy, OS, and Matplotlib; cv2, a powerful library developed for solving computer vision tasks; sklearn.model_selection.train_test_split for splitting the dataset into train and test subsets; and some components from tf.keras.models and tf.keras.layers for building the model.

import numpy as np
import random
import os
import cv2 as cv
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Rescaling, AveragePooling2D, Dropout

3. Reading and pre-processing image files

We start by reading the images from the dataset. They are distributed over 43 folders representing the 43 classes. We loop through the folders and images, open each image, resize it to 32×32 pixels, convert it from BGR (OpenCV’s default channel order) to grayscale, and store it as a NumPy array.

images = []
labels = []
classes = 43

current_path = '/content/gdrive/My Drive/GTSRB/Images/'

for i in range(classes):
    # Class folders are named 00000 ... 00042
    path = os.path.join(current_path, str(i).zfill(5))
    for j in os.listdir(path):
        try:
            image = cv.imread(os.path.join(path, j))
            image = cv.resize(image, (32, 32))
            # cv.imread returns BGR channel order, so convert BGR -> grayscale
            image = cv.cvtColor(image, cv.COLOR_BGR2GRAY)
            images.append(image)
            # One-hot encode the class label
            label = np.zeros(classes)
            label[i] = 1.0
            labels.append(label)
        except cv.error:
            # Skip files OpenCV cannot read (e.g. the CSV annotation files)
            pass

Now we normalize the data: we divide the pixel values by 255 to map them into the range 0.0 to 1.0. In total, we end up with 39,209 images assigned to 43 classes.

images = np.array(images)
images = images/255
labels = np.array(labels)

# Images shape: (39209, 32, 32)
# Labels shape: (39209, 43)

When working with images, it is always worth looking at some samples. Let’s take 25 random images from the dataset and show them with their labels.
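A minimal sketch of how such a grid could be drawn (the layout and figure size are my choices, not the author’s):

fig, axes = plt.subplots(5, 5, figsize=(10, 10))
for ax in axes.flat:
    idx = random.randrange(len(images))
    ax.imshow(images[idx], cmap='gray')
    # Recover the class index from the one-hot label
    ax.set_title(str(np.argmax(labels[idx])))
    ax.axis('off')
plt.show()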

25 samples of pre-processed pictures. Image from author’s Notebook

In this set of images, we can already see some potential misclassification problems. For example, the number on image 7 is hard to recognize, and images 13 and 14 are too dark. Image 8 ("Traffic signals") could probably be misclassified as "General caution".

4. Building the model

Before creating the model, we have to split our dataset into train and test subsets. For the test subset, we set aside the standard 20% of the data. In addition, it is important to make sure that our data is in np.float32 format.

X = images.astype(np.float32)
y = labels.astype(np.float32)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=123)

# X_train shape: (31367, 32, 32)
# y_train shape: (31367, 43)
# X_test shape:  (7842, 32, 32)
# y_test shape:  (7842, 43)

For our classification task, we will use an implementation of the LeNet-5 Convolutional Neural Network. LeNet-5 was designed by Yann LeCun et al. [3] in 1998 and was one of the earliest convolutional neural networks. Its architecture is extremely simple but very efficient.

There are three Convolutional Layers based on 5×5 filters, each followed by average pooling with 2×2 patches. We use the ReLU activation function, as it leads to faster training. Then we add a Dropout Layer with a rate of 0.2 to deal with overfitting: 20% of the inputs are randomly zeroed out during training, which prevents units from co-adapting too strongly. We end up with Flattening and two Dense Layers. In the last Dense Layer, we have to set the number of neurons equal to the number of classes and use the Softmax activation function to get probabilities between 0.0 and 1.0. The resulting number of parameters in this network is 70,415.
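A minimal sketch of this architecture in Keras, reconstructed from the summary below (the Rescaling factor of 1.0 is an assumption, since the pixels were already scaled to [0, 1] earlier; note that the inputs also need an explicit channel axis, e.g. X = np.expand_dims(X, -1)):

model = Sequential([
    # Identity rescaling, kept for parity with the summary below; the pixel
    # values were already divided by 255 during pre-processing
    Rescaling(1.0, input_shape=(32, 32, 1)),
    Conv2D(6, kernel_size=5, activation='relu'),    # -> (28, 28, 6)
    AveragePooling2D(pool_size=2),                  # -> (14, 14, 6)
    Conv2D(16, kernel_size=5, activation='relu'),   # -> (10, 10, 16)
    AveragePooling2D(pool_size=2),                  # -> (5, 5, 16)
    Conv2D(120, kernel_size=5, activation='relu'),  # -> (1, 1, 120)
    Dropout(0.2),
    Flatten(),
    # The hidden Dense activation is an assumption (ReLU, per the text above)
    Dense(120, activation='relu'),
    Dense(43, activation='softmax'),
])
model.summary()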

Model: "sequential_6"
_________________________________________________________________________
 Layer (type)                              Output Shape        Param #
=========================================================================
 rescaling_7 (Rescaling)                   (None, 32, 32, 1)   0
 conv2d_19 (Conv2D)                        (None, 28, 28, 6)   156
 average_pooling2d_12 (AveragePooling2D)   (None, 14, 14, 6)   0
 conv2d_20 (Conv2D)                        (None, 10, 10, 16)  2416
 average_pooling2d_13 (AveragePooling2D)   (None, 5, 5, 16)    0
 conv2d_21 (Conv2D)                        (None, 1, 1, 120)   48120
 dropout_6 (Dropout)                       (None, 1, 1, 120)   0
 flatten_6 (Flatten)                       (None, 120)         0
 dense_12 (Dense)                          (None, 120)         14520
 dense_13 (Dense)                          (None, 43)          5203
=========================================================================
 Total params: 70,415
 Trainable params: 70,415
 Non-trainable params: 0

Now, with the model.compile method, we configure the model. The Adam optimizer is an extension of Stochastic Gradient Descent and is a good option in terms of training speed. The Categorical Cross-Entropy loss function fits here, as it handles the multiclass probability distribution our classification problem requires. For performance evaluation of the model, we track the accuracy metric.

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

5. Training the model

Now it is time to train the model. We pass the input data (X_train) and the target data (y_train) to the model.fit method, set 50 training epochs, and add X_test and y_test as validation data to evaluate the loss and other model metrics at the end of each epoch.

history = model.fit(X_train, y_train, epochs=50,
                    validation_data=(X_test, y_test))

6. Evaluation of training results

How do we know if our model is good or bad? Let’s take a look at the learning curves. The Train Accuracy and Validation Accuracy curves converge in the end, and after 50 epochs we reach an accuracy of 98.9%, which is quite good. The Validation Loss curve jumps up and down a bit, which suggests it would be nice to have more validation data. After about 25 epochs, Validation Loss exceeds Train Loss, which means we have a bit of overfitting. But the curve doesn’t keep rising over the epochs, and the gap between Validation and Train Loss is not that big, so this can be accepted. Thus we stop here.
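The curves come from the history object returned by model.fit; a minimal sketch of how they could be plotted (the styling choices are mine):

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(history.history['accuracy'], label='Train Accuracy')
ax1.plot(history.history['val_accuracy'], label='Validation Accuracy')
ax1.set_xlabel('Epoch')
ax1.legend()
ax2.plot(history.history['loss'], label='Train Loss')
ax2.plot(history.history['val_loss'], label='Validation Loss')
ax2.set_xlabel('Epoch')
ax2.legend()
plt.show()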

Learning curves. Image by author

Let’s take a look at some samples and find the wrongly classified pictures. We label the images with the predicted and ground truth classes. If the prediction equals the ground truth, we color the label green; otherwise, we make it red:
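A minimal sketch of such a check on the test subset (the sampling and layout are my choices; as before, the inputs are assumed to carry the trailing channel axis the model expects):

preds = model.predict(X_test)
fig, axes = plt.subplots(5, 5, figsize=(10, 10))
for ax in axes.flat:
    idx = random.randrange(len(X_test))
    pred, truth = np.argmax(preds[idx]), np.argmax(y_test[idx])
    ax.imshow(X_test[idx].squeeze(), cmap='gray')
    # Green label for a correct prediction, red for a misclassification
    ax.set_title(f'{pred} / {truth}', color='green' if pred == truth else 'red')
    ax.axis('off')
plt.show()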

Prediction and Ground Truth for 25 random samples. Image from author’s Notebook

For image number 5955, we can see that the "Speed limit (50km/h)" sign was misclassified as "Speed limit (80km/h)". Obviously, the digits on the sign are very difficult to recognize here.

7. Saving the model. Using the model in an application

In the end, we save the model to a separate folder on Google Drive using the model.save method. The folder contains the graph definition and weights of the model and will be used for further predictions in the traffic sign recognition application.
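A one-line sketch (the target path is illustrative, not the author’s exact folder):

# Save graph definition and weights in TensorFlow SavedModel format
model.save('/content/gdrive/My Drive/GTSRB/keras_model/')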

It is obvious that the customer, or anyone outside of data science, will never dive deep into your Notebook; they will not be interested in graphs, models, accuracy, and all this machine learning stuff. They need an application where they can upload an image and get a result. Let’s build it.

For our Streamlit app, we have to prepare a GitHub repository: we place the keras_model folder, streamlit_app.py, and requirements.txt there. In the Python file, we build the app itself: we define its appearance, markdowns, and buttons, load the model, and make the prediction.

def sign_predict(image):
    # Load the saved model (in a real app this should be cached
    # rather than reloaded on every call)
    model = load_model('./keras_model/')
    # Apply the same pre-processing as in training
    image = np.array(image, dtype=np.float32)
    image = image/255
    image = np.reshape(image, (1, 32, 32))
    x = image.astype(np.float32)
    prediction = model.predict(x)
    # labels_dict maps class indices to human-readable sign names
    prediction_max = np.argmax(prediction)
    prediction_label = labels_dict[prediction_max]
    confidence = np.max(prediction)
    return prediction_label, confidence

We let the user upload a picture and get a prediction for the traffic sign on it. Before running the prediction, we have to pre-process the user’s image: convert it to grayscale, resize it to 32×32 pixels, and store it as np.float32. I think it is nice to show the uploaded image as well as its pre-processed 32×32 grayscale version. We also show an expandable list of the 43 classes available in the model. After uploading a picture, you get the prediction and the confidence rate. Here you can try the app by yourself.
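A hypothetical sketch of that upload-and-predict flow in streamlit_app.py (widget labels and layout are my choices; sign_predict is the function defined above):

import cv2 as cv
import numpy as np
import streamlit as st

uploaded = st.file_uploader('Upload a photo of a traffic sign',
                            type=['png', 'jpg', 'jpeg'])
if uploaded is not None:
    # Decode the uploaded bytes into an OpenCV image (BGR order)
    file_bytes = np.frombuffer(uploaded.read(), dtype=np.uint8)
    image = cv.imdecode(file_bytes, cv.IMREAD_COLOR)
    st.image(image, channels='BGR', caption='Uploaded image')
    # Same pre-processing as in training: 32x32, grayscale
    gray = cv.cvtColor(cv.resize(image, (32, 32)), cv.COLOR_BGR2GRAY)
    st.image(gray, caption='Pre-processed 32x32 grayscale version', width=128)
    label, confidence = sign_predict(gray)
    st.write(f'Prediction: {label} (confidence: {confidence:.1%})')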

Application on Streamlit. Image by author

References

[1] The German Traffic Sign Recognition Benchmark (GTSRB): https://benchmark.ini.rub.de/gtsrb_dataset.html

[2] J. Stallkamp, M. Schlipsing, J. Salmen, C. Igel, Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition: https://www.sciencedirect.com/science/article/pii/S0893608012000457

[3] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. (1998). Gradient-based learning applied to document recognition: http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf

[4] Suhyun Kim, A Beginner’s Guide to Convolutional Neural Networks (CNNs): https://towardsdatascience.com/a-beginners-guide-to-convolutional-neural-networks-cnns-14649dbddce8

[5] Keras, the Python deep learning API: https://keras.io/

[6] OpenCV, a Computer Vision library: https://opencv.org/
