Traffic Sign Detection using Convolutional Neural Network

We will be building a CNN model in order to detect traffic signs.

Sanket Doshi
Towards Data Science

--

CNN Model

Convolutional neural networks, also called ConvNets or CNNs, are essential to learn if you want to pursue a career in computer vision. CNNs run neural networks directly on images and are more efficient and accurate than many deep neural networks that operate on flattened pixel inputs. ConvNet models are also easier and faster to train on images compared to other model types.

If you’re not familiar with the basics of ConvNets, you can learn them here.

We will be using the Keras package to build the CNN model.

Get Dataset

The German traffic sign dataset is provided here. The dataset consists of 39209 images spread across 43 different classes. The images are distributed unevenly between those classes, so the model may predict some classes more accurately than others.

We could augment the dataset with various image-modifying techniques such as rotation, colour distortion, or blurring. We will first train the model on the original dataset and measure its accuracy; later we can add augmented data to even out the classes and check the model’s accuracy again.
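As a taste of what that augmentation could look like, here is a minimal sketch using OpenCV and NumPy. The helper names are illustrative, not part of the original code:

import cv2
import numpy as np

def rotate(img, angle=15):
    # Rotate around the image centre; illustrative augmentation only
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h))

def blur(img, k=3):
    # Gaussian blur with a k x k kernel (k must be odd)
    return cv2.GaussianBlur(img, (k, k), 0)

def distort_colour(img, shift=20):
    # Shift pixel intensities to simulate colour/brightness changes
    return np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)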

Data Pre-Processing

One limitation of CNN models is that they cannot be trained on images of varying dimensions, so it is mandatory that all images in the dataset share the same dimensions.

We’ll check the dimensions of all the images in the dataset so that we can process them to a common size. In this dataset the image dimensions vary widely, from 16*16*3 to 128*128*3, so the images cannot be passed directly to the ConvNet model.
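A quick sketch of how you might verify that spread, assuming the folder layout used in the loading step below and .ppm image files as in the GTSRB download (both assumptions):

import os
from skimage.io import imread  # assumed reader; any RGB image reader works

def dimension_range(data_dir):
    # Collect the (height, width) of every image to see how sizes vary
    shapes = []
    for folder in os.listdir(data_dir):
        inner_dir = os.path.join(data_dir, folder)
        if not os.path.isdir(inner_dir):
            continue
        for name in os.listdir(inner_dir):
            if name.endswith('.ppm'):  # GTSRB images are .ppm (assumption)
                shapes.append(imread(os.path.join(inner_dir, name)).shape[:2])
    return min(shapes), max(shapes)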

We need to shrink or interpolate the images to a single dimension. To avoid compressing away too much data or stretching images beyond recognition, we should pick a dimension that sits in between and keeps the image data mostly intact. I’ve decided to use 64*64*3.

We will transform the images to the chosen dimension using the OpenCV package.

import cv2

def resize_cv(img):
    return cv2.resize(img, (64, 64), interpolation=cv2.INTER_AREA)

cv2 is OpenCV’s Python module. The resize method transforms an image to the given dimensions; here, we’re resizing to 64*64. The interpolation argument defines the technique used for stretching or shrinking the image. OpenCV provides five interpolation techniques, which differ in how they compute the pixel values of the resulting image: INTER_AREA, INTER_NEAREST, INTER_LINEAR, INTER_CUBIC, and INTER_LANCZOS4. We’ll use INTER_AREA, which is preferred for image decimation (shrinking); when enlarging, it behaves like INTER_NEAREST. We could have used INTER_CUBIC, but it requires more computation, so we will not use it.
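To make the choice of flag concrete, only the interpolation argument changes between techniques (img is assumed to be an image already loaded as a NumPy array):

import cv2

img_area = cv2.resize(img, (64, 64), interpolation=cv2.INTER_AREA)        # preferred for shrinking
img_cubic = cv2.resize(img, (64, 64), interpolation=cv2.INTER_CUBIC)      # sharper but slower
img_nearest = cv2.resize(img, (64, 64), interpolation=cv2.INTER_NEAREST)  # fastest, blocky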

Data Loading

Above, we saw how we’ll pre-process the images. Now we’ll load the dataset, converting each image to the chosen dimensions as we go.

The dataset consists of 43 classes in total. In other words, 43 different types of traffic signs are present in the dataset, and each sign has its own folder of images of varying size and clarity. In total, 39209 images are present in the dataset.

We can plot a histogram of the number of images present for each traffic sign.

import seaborn as sns

fig = sns.distplot(output, kde=False, bins=43, hist=True,
                   hist_kws=dict(edgecolor="black", linewidth=2))
fig.set(title="Traffic signs frequency graph",
        xlabel="ClassId",
        ylabel="Frequency")
[Figure: Traffic signs frequency graph]

ClassId is the unique id given to each distinct traffic sign.

As we can see from the graph, the dataset does not contain an equal number of images for each class, so the model may be biased towards detecting some traffic signs more accurately than others.

We could balance the dataset by generating altered copies of the images using rotation or distortion techniques, but we’ll leave that for another time.

As the dataset is divided into multiple folders and the image naming is not consistent, we’ll load all the images into one list, list_images, converting them to the (64*64*3) dimension, and the traffic sign each one represents into another list, output. We’ll read the images using imread.

import os
import pandas as pd
from skimage.io import imread  # imread assumed to come from scikit-image

list_images = []
output = []
for dir in os.listdir(data_dir):
    if dir == '.DS_Store':
        continue

    inner_dir = os.path.join(data_dir, dir)
    # Each class folder contains a GT-<class>.csv annotation file
    csv_file = pd.read_csv(os.path.join(inner_dir, "GT-" + dir + '.csv'), sep=';')
    for row in csv_file.iterrows():
        img_path = os.path.join(inner_dir, row[1].Filename)
        img = imread(img_path)
        # Crop to the region of interest given in the annotations
        img = img[row[1]['Roi.X1']:row[1]['Roi.X2'], row[1]['Roi.Y1']:row[1]['Roi.Y2'], :]
        img = resize_cv(img)
        list_images.append(img)
        output.append(row[1].ClassId)

data_dir is the path to the directory where the dataset is present.

The dataset is loaded; now we need to divide it into training, validation, and test sets. But if we split it directly, the model will not be trained on all the traffic signs, because the dataset is not randomized. So first we’ll shuffle the dataset.

import numpy as np
import keras

input_array = np.stack(list_images)
train_y = keras.utils.np_utils.to_categorical(output)

# Shuffle images and labels together with a shared permutation
randomize = np.arange(len(input_array))
np.random.shuffle(randomize)
x = input_array[randomize]
y = train_y[randomize]

Note that I’ve converted the output array to categorical (one-hot) form, since that is the format the model’s softmax output returns.
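As a tiny illustration of what to_categorical does:

from keras.utils.np_utils import to_categorical

to_categorical([0, 2], num_classes=3)
# array([[1., 0., 0.],
#        [0., 0., 1.]], dtype=float32)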

Now we split the dataset in a 60:20:20 ratio into training, validation, and test sets respectively.

split_size = int(x.shape[0]*0.6)
train_x, val_x = x[:split_size], x[split_size:]
train1_y, val_y = y[:split_size], y[split_size:]
split_size = int(val_x.shape[0]*0.5)
val_x, test_x = val_x[:split_size], val_x[split_size:]
val_y, test_y = val_y[:split_size], val_y[split_size:]
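A quick sanity check of the resulting split (39209 images give roughly 23525 / 7842 / 7842):

print(train_x.shape, val_x.shape, test_x.shape)
# e.g. (23525, 64, 64, 3) (7842, 64, 64, 3) (7842, 64, 64, 3)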

Training the model

from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import BatchNormalization
from keras.optimizers import Adam
from keras.models import Sequential

hidden_num_units = 2048
hidden_num_units1 = 1024
hidden_num_units2 = 128
output_num_units = 43

epochs = 10
batch_size = 16
pool_size = (2, 2)
# Pixel values can optionally be scaled to [0, 1] first, e.g. x = x / 255.0

model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(64, 64, 3), padding='same'),
    BatchNormalization(),
    Conv2D(16, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.2),

    Conv2D(32, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.2),

    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.2),

    Flatten(),
    Dense(units=hidden_num_units, activation='relu'),
    Dropout(0.3),
    Dense(units=hidden_num_units1, activation='relu'),
    Dropout(0.3),
    Dense(units=hidden_num_units2, activation='relu'),
    Dropout(0.3),
    Dense(units=output_num_units, activation='softmax'),
])

model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=1e-4), metrics=['accuracy'])

trained_model_conv = model.fit(train_x.reshape(-1, 64, 64, 3), train1_y,
                               epochs=epochs, batch_size=batch_size,
                               validation_data=(val_x, val_y))
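At this point you can inspect the architecture; model.summary() prints each layer with its output shape and parameter count:

model.summary()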

We’ve used the Keras package throughout.

To understand the significance of each layer, you can read this blog.

Evaluating the model

model.evaluate(test_x, test_y)

Evaluating the model on the test set yields an accuracy of 99%.
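evaluate returns the loss followed by the metrics passed to compile, so you can print both explicitly:

loss, acc = model.evaluate(test_x, test_y)
print("Test loss: %.4f, test accuracy: %.4f" % (loss, acc))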

Predicting the result

pred = model.predict_classes(test_x)

You can predict the class for each image and verify how well the model works.
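For example, you can recover the true classes from the one-hot labels with argmax and compare them against the predictions:

import numpy as np

true_classes = np.argmax(test_y, axis=1)
matches = pred == true_classes
print("Correctly classified %d of %d test images" % (matches.sum(), len(pred)))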

You can find the whole working code here.
