Traffic Sign Detection using Convolutional Neural Network
We will be building a CNN model in order to detect traffic signs.
Convolutional neural networks (ConvNets or CNNs) are essential to learn if you want to pursue a career in computer vision. CNNs work directly on image pixels, and because a convolutional filter reuses the same weights across the whole image, they need far fewer parameters than fully connected networks applied to raw pixels. This generally makes them faster to train and more accurate on images than other kinds of models.
If you’re not familiar with the basics of ConvNets, you can learn them from here.
We will be using the keras package to build the CNN model.
Get Dataset
The German Traffic Sign Recognition Benchmark (GTSRB) dataset is provided here. The dataset consists of 39,209 images across 43 classes. The images are distributed unevenly among the classes, so the model may predict some classes more accurately than others.
We can augment the dataset with image-modification techniques such as rotation, colour distortion or blurring. We will first train the model on the original dataset and measure its accuracy. Then we’ll add more data so that every class has a similar number of images and check the model’s accuracy again.
Data Pre-Processing
One limitation of this CNN model is that it cannot be trained on images of different dimensions: the Flatten layer that feeds the fully connected layers expects a fixed number of features. So, every image in the dataset must have the same dimensions.
We’ll check the dimensions of all the images in the dataset so that we can resize them to a common size. In this dataset, the image dimensions vary widely, from 16*16*3 to 128*128*3, so the images cannot be passed directly to the ConvNet model.
We need to shrink or enlarge (interpolate) the images to a single dimension. To avoid discarding too much detail from the large images or stretching the small ones too much, we should pick a dimension in between that keeps most of the image information. I’ve decided to use 64*64*3.
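To get a feel for how much each image changes, the resize factor is simply the target side divided by the original side. A minimal sketch, assuming square images as in the sizes quoted above (the helper name resize_factor is illustrative, not from the original code):

```python
def resize_factor(side, target=64):
    """Return how much an image side is scaled when resized to `target` pixels.

    Values above 1 mean the image is enlarged (interpolated);
    values below 1 mean it is shrunk (compressed).
    """
    return target / side


for side in (16, 32, 64, 128):
    print(side, "->", 64, "factor:", resize_factor(side))
```

With 64 pixels, the smallest images are enlarged 4x while the largest are only halved, and each resized image holds 64*64*3 = 12,288 values.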
We will resize the images to this dimension using the opencv package.
import cv2

def resize_cv(img):
    # Resize any image to 64x64; INTER_AREA resamples using pixel area relation
    return cv2.resize(img, (64, 64), interpolation=cv2.INTER_AREA)
cv2 is the Python module of opencv. The resize method transforms the image into the given dimension; here, each image becomes 64*64. The interpolation argument defines which technique is used to stretch or compress the image. OpenCV provides five commonly used interpolation flags, which differ in how they compute the pixel values of the resulting image: INTER_AREA, INTER_NEAREST, INTER_LINEAR, INTER_CUBIC and INTER_LANCZOS4. We’ll use INTER_AREA, which is the preferred technique for image decimation (shrinking); when an image is enlarged, it behaves similarly to INTER_NEAREST. We could have used INTER_CUBIC, but it is slower to compute, so we will not use it here.
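To see why INTER_AREA is preferred for shrinking, here is a NumPy sketch of the idea for an integer downscale factor: area-style resampling averages each block of pixels, while nearest-neighbour decimation keeps one pixel per block and can alias fine patterns. This illustrates the concept only; it is not OpenCV’s implementation, and the helper names are hypothetical.

```python
import numpy as np


def area_downscale(img, factor):
    """Average non-overlapping factor x factor blocks (integer downscale only)."""
    h, w = img.shape[:2]
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    # Split rows and columns into blocks, keeping any channel axis at the end
    blocks = img.reshape(h // factor, factor, w // factor, factor, *img.shape[2:])
    return blocks.mean(axis=(1, 3))


def nearest_downscale(img, factor):
    """Keep every factor-th pixel in each direction (nearest-neighbour style)."""
    return img[::factor, ::factor]


# A 4x4 black/white checkerboard: the finest pattern an image can hold
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float)
print(area_downscale(checker, 2))     # every pixel is 0.5 (mid-grey)
print(nearest_downscale(checker, 2))  # every pixel is 0.0: the pattern is lost
```

Averaging turns the unresolvable pattern into a neutral grey, whereas picking single pixels produces a misleading solid colour, which is the moiré-type artefact area interpolation avoids.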
Data Loading
Above, we saw how the images will be pre-processed. Now, we’ll load the dataset and resize every image to the chosen dimension.
The dataset consists of 43 classes in total. In other words, 43 different types of traffic signs are present, and each sign has its own folder containing images of different sizes and clarity. In total, the dataset contains 39,209 images.
We can plot a histogram of the number of images available for each traffic sign.
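import seaborn as sns

# `output` is the list of ClassIds built in the data-loading step below.
# sns.distplot is deprecated; histplot with discrete=True draws one bar per ClassId.
fig = sns.histplot(x=output, discrete=True, edgecolor="black", linewidth=2)
fig.set(title="Traffic signs frequency graph",
        xlabel="ClassId",
        ylabel="Frequency")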
ClassId is the unique id assigned to each type of traffic sign.
As we can see from the graph, the dataset does not contain an equal number of images for each class; hence, the model may detect some traffic signs more accurately than others.
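The imbalance shown in the histogram can also be quantified directly from the output list. A small sketch using np.bincount, assuming labels are integer ClassIds from 0 to 42 (the helper names are illustrative):

```python
import numpy as np


def class_counts(labels, num_classes=43):
    """Number of images per ClassId (array index = ClassId)."""
    return np.bincount(np.asarray(labels, dtype=int), minlength=num_classes)


def imbalance_ratio(labels, num_classes=43):
    """Size of the largest class divided by the size of the smallest non-empty class."""
    counts = class_counts(labels, num_classes)
    present = counts[counts > 0]
    return present.max() / present.min()


toy_labels = [0, 0, 0, 1, 2, 2]
print(class_counts(toy_labels, num_classes=3))     # [3 1 2]
print(imbalance_ratio(toy_labels, num_classes=3))  # 3.0
```

On the real data, imbalance_ratio(output) tells you how many times larger the biggest class is than the smallest one.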
We can balance the dataset by generating altered images with rotation or distortion techniques, but we’ll do this some other time.
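Although balancing is left for later, one simple approach is to oversample the smaller classes until each matches the largest one. The sketch below is an assumption, not the author’s method: instead of rotation, it applies a random brightness shift (a basic form of the colour distortion mentioned earlier) to the copied images. It expects a NumPy array of shape (N, H, W, C) with pixel values from 0 to 255, and the function name is hypothetical.

```python
import numpy as np


def balance_with_jitter(images, labels, rng=None, max_shift=30):
    """Oversample every class to the size of the largest class.

    Extra copies get a random brightness shift in [-max_shift, max_shift].
    `images` must have shape (N, H, W, C); `labels` are non-negative ints.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    images = np.asarray(images)
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    target = counts.max()

    new_images, new_labels = [images], [labels]
    for cls, count in enumerate(counts):
        if count == 0 or count == target:
            continue
        idx = np.flatnonzero(labels == cls)
        # Pick random existing images of this class to copy
        picks = rng.choice(idx, size=target - count, replace=True)
        extra = images[picks].astype(np.float32)
        # One brightness offset per copied image, broadcast over H, W, C
        shift = rng.uniform(-max_shift, max_shift, size=(len(picks), 1, 1, 1))
        extra = np.clip(extra + shift, 0, 255).astype(images.dtype)
        new_images.append(extra)
        new_labels.append(np.full(len(picks), cls, dtype=labels.dtype))

    return np.concatenate(new_images), np.concatenate(new_labels)


imgs = np.zeros((6, 4, 4, 3), dtype=np.uint8)
labs = np.array([0, 0, 0, 1, 2, 2])
bal_x, bal_y = balance_with_jitter(imgs, labs)
print(bal_x.shape, np.bincount(bal_y))  # (9, 4, 4, 3) [3 3 3]
```

Oversampling keeps every original image and only adds modified copies, so it never throws away data from the larger classes.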
As the dataset is divided into multiple folders and the images are not named consistently, we’ll load all the images, resizing each to (64*64*3), into one list, list_images, and the traffic sign each image represents into another list, output. We’ll read the images using imread.
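import os

import pandas as pd
from skimage.io import imread  # assumption: the original does not say which imread it uses

list_images = []
output = []
for class_dir in sorted(os.listdir(data_dir)):
    inner_dir = os.path.join(data_dir, class_dir)
    if not os.path.isdir(inner_dir):  # skip stray files such as .DS_Store
        continue
    # Each class folder has an annotation file named GT-<folder>.csv
    csv_file = pd.read_csv(os.path.join(inner_dir, "GT-" + class_dir + ".csv"), sep=";")
    for _, row in csv_file.iterrows():
        img = imread(os.path.join(inner_dir, row.Filename))
        # Crop to the sign's region of interest; NumPy indexes [rows (y), cols (x)]
        img = img[row["Roi.Y1"]:row["Roi.Y2"], row["Roi.X1"]:row["Roi.X2"], :]
        img = resize_cv(img)
        list_images.append(img)
        output.append(row.ClassId)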
data_dir is the path to the directory containing the dataset.
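The ROI columns in each CSV describe the sign’s bounding box with x (horizontal) and y (vertical) coordinates, while a NumPy image array is indexed as [row, column], that is [y, x]. A small sketch of the crop, using a hypothetical helper name and a dummy array in place of a real image:

```python
import numpy as np


def crop_roi(img, x1, y1, x2, y2):
    """Crop an (H, W, C) image to columns x1:x2 and rows y1:y2."""
    return img[y1:y2, x1:x2, :]


# A dummy image 10 pixels tall and 20 pixels wide
dummy = np.zeros((10, 20, 3), dtype=np.uint8)
patch = crop_roi(dummy, x1=2, y1=1, x2=12, y2=5)
print(patch.shape)  # (4, 10, 3): 4 rows tall, 10 columns wide
```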
The dataset is now loaded, and we need to divide it into training, validation and test sets. However, the images were loaded class by class, so splitting the lists directly would leave some traffic signs out of the training set entirely. So, we’ll shuffle the dataset first.
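import numpy as np
from keras.utils import to_categorical

input_array = np.stack(list_images)               # shape (N, 64, 64, 3)
train_y = to_categorical(output, num_classes=43)  # one-hot labels, shape (N, 43)

# Shuffle images and labels with the same permutation so they stay aligned
randomize = np.arange(len(input_array))
np.random.shuffle(randomize)
x = input_array[randomize]
y = train_y[randomize]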
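Note that the class labels have been converted to categorical (one-hot) vectors: the model’s final softmax layer outputs one probability per class, and the categorical_crossentropy loss expects targets in the same 43-element form.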
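For intuition, one-hot encoding is just row indexing into an identity matrix, and np.argmax reverses it. A NumPy sketch of what to_categorical produces (the helper name one_hot is illustrative):

```python
import numpy as np


def one_hot(labels, num_classes=43):
    """Turn integer ClassIds into one-hot rows, like keras' to_categorical."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels, dtype=int)]


encoded = one_hot([0, 2], num_classes=3)
print(encoded)
print(np.argmax(encoded, axis=1))  # [0 2]
```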
Now, let’s split the dataset in a 60:20:20 ratio into training, validation and test sets, respectively.
# First 60% for training, the remaining 40% held out
split_size = int(x.shape[0] * 0.6)
train_x, val_x = x[:split_size], x[split_size:]
train1_y, val_y = y[:split_size], y[split_size:]

# Split the held-out 40% in half: validation and test
split_size = int(val_x.shape[0] * 0.5)
val_x, test_x = val_x[:split_size], val_x[split_size:]
val_y, test_y = val_y[:split_size], val_y[split_size:]
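The split sizes follow directly from the two int(...) truncations above. A sketch that reproduces the same arithmetic for the 39,209 images (the helper name split_sizes is illustrative):

```python
def split_sizes(n, train_frac=0.6):
    """Sizes produced by the 60:20:20 slicing for n shuffled samples."""
    train = int(n * train_frac)
    rest = n - train
    val = int(rest * 0.5)
    test = rest - val
    return train, val, test


print(split_sizes(39209))  # (23525, 7842, 7842)
```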
Training the model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import BatchNormalization
from keras.optimizers import Adam
from keras.models import Sequential

hidden_num_units = 2048
hidden_num_units1 = 1024
hidden_num_units2 = 128
output_num_units = 43

epochs = 10
batch_size = 16
pool_size = (2, 2)

model = Sequential([
    # Block 1: 16 filters, 64x64 -> 32x32 after pooling
    Conv2D(16, (3, 3), activation='relu', input_shape=(64, 64, 3), padding='same'),
    BatchNormalization(),
    Conv2D(16, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.2),
    # Block 2: 32 filters, 32x32 -> 16x16
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.2),
    # Block 3: 64 filters, 16x16 -> 8x8
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.2),
    # Classifier head: fully connected layers with dropout
    Flatten(),
    Dense(units=hidden_num_units, activation='relu'),
    Dropout(0.3),
    Dense(units=hidden_num_units1, activation='relu'),
    Dropout(0.3),
    Dense(units=hidden_num_units2, activation='relu'),
    Dropout(0.3),
    # One softmax probability per traffic-sign class
    Dense(units=output_num_units, activation='softmax'),
])

# `learning_rate` replaces the deprecated `lr` argument
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=1e-4),
              metrics=['accuracy'])

trained_model_conv = model.fit(train_x, train1_y,
                               epochs=epochs, batch_size=batch_size,
                               validation_data=(val_x, val_y))
We’ve used the keras package to build the model. To understand the significance of each layer, you can read this blog.
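Because every convolution uses padding='same', only the three 2x2 max-pooling layers change the spatial size, so it is easy to work out what reaches the Flatten layer and how many weights each layer holds. The arithmetic below is a hand calculation, not a Keras API; it counts each BatchNormalization layer as 4 values per channel (scale, offset, moving mean, moving variance), which is how Keras reports them in model.summary().

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights + biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weights + biases of a Dense layer."""
    return n_in * n_out + n_out


side = 64
for _ in range(3):    # three MaxPooling2D((2, 2)) layers
    side //= 2        # 64 -> 32 -> 16 -> 8
flat_features = side * side * 64  # 64 filters in the last block

# (input channels, output channels) of the six 3x3 convolutions
conv_layers = [(3, 16), (16, 16), (16, 32), (32, 32), (32, 64), (64, 64)]
conv_total = sum(conv_params(3, cin, cout) for cin, cout in conv_layers)
bn_total = sum(4 * cout for _, cout in conv_layers)

dense_layers = [(flat_features, 2048), (2048, 1024), (1024, 128), (128, 43)]
dense_total = sum(dense_params(n_in, n_out) for n_in, n_out in dense_layers)

print(flat_features)                        # 4096
print(dense_params(flat_features, 2048))    # 8390656
print(conv_total + bn_total + dense_total)  # 10698555
```

So roughly 8.4 million of the model’s roughly 10.7 million parameters sit in the first Dense layer. This is also why the input size must be fixed: a different image size would change flat_features and therefore the shape of that layer.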
Evaluating the model
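# Returns [loss, accuracy] because the model was compiled with metrics=['accuracy']
test_loss, test_acc = model.evaluate(test_x, test_y)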
The model is evaluated on the held-out test set, where it reaches an accuracy of 99%.
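Because the classes are imbalanced, overall accuracy can hide weaker classes. A NumPy sketch of per-class accuracy, assuming one-hot true labels (such as test_y) and softmax outputs (such as model.predict(test_x)); the helper name is illustrative:

```python
import numpy as np


def per_class_accuracy(y_true_onehot, y_prob):
    """Return {ClassId: accuracy} for every class present in y_true_onehot."""
    true_ids = np.argmax(y_true_onehot, axis=1)
    pred_ids = np.argmax(y_prob, axis=1)
    return {
        int(c): float(np.mean(pred_ids[true_ids == c] == c))
        for c in np.unique(true_ids)
    }


y_true = np.eye(3)[[0, 0, 1, 2]]
y_prob = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8]])
print(per_class_accuracy(y_true, y_prob))  # {0: 0.5, 1: 1.0, 2: 1.0}
```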
Predicting the result
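import numpy as np

# predict_classes was removed in newer Keras versions; take the argmax of the softmax output
pred = np.argmax(model.predict(test_x), axis=1)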
This gives the predicted ClassId for each test image, which you can compare with the true labels to verify how the model performs.
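Beyond the single predicted class, the softmax output also ranks the alternatives, which is handy when inspecting mistakes. A sketch of top-k predictions with np.argsort, where probs stands for the output of model.predict (the helper name top_k is illustrative):

```python
import numpy as np


def top_k(probs, k=3):
    """Return the k most likely ClassIds per row and their probabilities."""
    probs = np.asarray(probs)
    # argsort is ascending, so reverse each row before taking the first k
    ids = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    return ids, np.take_along_axis(probs, ids, axis=1)


ids, scores = top_k([[0.1, 0.7, 0.2]], k=2)
print(ids)     # [[1 2]]
print(scores)  # [[0.7 0.2]]
```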
You can find the whole working code here.