
Sign Language Recognition using Deep Learning

An End-to-End Pipeline including model building, hyperparameter tuning and the deployment

Source: Sign Language MNIST on Kaggle

Take a look at the model that you are going to build.

For more details about the code or models used in this article, refer to this GitHub Repo.

Okay! Now let’s dive into building the Convolutional Neural Network model that converts sign language into letters of the English alphabet.

Understanding the problem

The Problem Definition

Converting sign language to text can be broken down into the task of predicting the English letter that corresponds to each hand sign shown in an image.

Data

We are going to use the Sign Language MNIST dataset on Kaggle which is licensed under CC0: Public Domain.

Some facts about the dataset

  1. No cases for the letters J & Z (Reason: J & Z require motion)
  2. Grayscale Images
  3. Pixel values ranging from 0 to 255
  4. Each image contains 784 pixels (28×28)
  5. Labels are numerically encoded from 0 to 25 for A to Z
  6. The data comes with separate train and test sets; each row contains the label followed by the 784 pixel values of the image
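A small helper makes the label encoding concrete. This is a minimal sketch; the name `label_to_letter` is hypothetical and not taken from the author's repository.

```python
import string

# Sign Language MNIST encodes A..Z as 0..25; J (9) and Z (25) never occur
# because those letters require motion.
LETTERS = string.ascii_uppercase
MISSING_LABELS = {LETTERS.index("J"), LETTERS.index("Z")}  # {9, 25}


def label_to_letter(label: int) -> str:
    """Map a numeric label (0-25) to its English letter."""
    if not 0 <= label <= 25:
        raise ValueError(f"label must be in [0, 25], got {label}")
    if label in MISSING_LABELS:
        raise ValueError(f"label {label} ({LETTERS[label]}) does not occur in this dataset")
    return LETTERS[label]


print(label_to_letter(0), label_to_letter(10), label_to_letter(24))  # A K Y
```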

Evaluation Metric

We are going to use accuracy as the evaluation metric. Accuracy is the ratio of correctly classified samples to the total number of samples.
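As a quick illustration, here is one way to compute accuracy with NumPy, assuming one-hot encoded true labels (as produced by the LabelBinarizer used later) and a model that outputs one probability per class. The function name and toy arrays are illustrative only.

```python
import numpy as np


def accuracy(y_true_onehot: np.ndarray, y_pred_probs: np.ndarray) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    true_classes = np.argmax(y_true_onehot, axis=1)
    pred_classes = np.argmax(y_pred_probs, axis=1)
    return float(np.mean(true_classes == pred_classes))


y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_pred = np.array([[0.8, 0.1, 0.1],   # correct (class 0)
                   [0.3, 0.6, 0.1],   # correct (class 1)
                   [0.5, 0.2, 0.3],   # wrong: predicts class 0, true class 2
                   [0.1, 0.7, 0.2]])  # correct (class 1)
print(accuracy(y_true, y_pred))  # 0.75
```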

Modelling

Convolutional Neural Network (Image by Sathwick)
  1. Convolutional Neural Networks (CNNs) are the go-to choice for image classification problems.
  2. A Convolutional Neural Network is an artificial neural network that contains convolutional layers.
  3. CNNs work well with image data. The main difference between convolutional and dense layers is that in a convolutional layer each neuron connects only to a small local region (its receptive field) of the previous layer, whereas in a dense layer each neuron connects to every neuron of the previous layer.
  4. Each convolutional layer contains a set of filters (kernels); each filter produces a feature map that highlights where a particular pattern occurs in the image.
  5. As the image passes through deeper layers, the network can detect increasingly complex patterns built from the simpler patterns found in earlier layers.
  6. Another advantage of CNNs over a typical dense ANN is that the same filter weights are shared across the whole image, so once the network learns a pattern at one location, it can identify that pattern at any other location.
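The weight-sharing point above can be seen with a toy example: a single hand-made 2×2 filter slid over an image responds most strongly at every location where its pattern appears. This is a plain NumPy sketch (cross-correlation, which is what CNN layers compute), not code from the article.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide one kernel over the image (no padding, stride 1).

    Like deep-learning libraries, this computes cross-correlation, which is
    what CNN layers call "convolution".
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A diagonal-line filter and an image containing the diagonal in two places.
kernel = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])
image = np.zeros((6, 6))
image[0:2, 0:2] = [[1, 0], [0, 1]]   # top-left copy
image[3:5, 3:5] = [[1, 0], [0, 1]]   # bottom-right copy

response = conv2d_valid(image, kernel)
peaks = np.argwhere(response == response.max())
print(peaks.tolist())  # [[0, 0], [3, 3]]
```

The same nine weights (here four) detect the pattern in both places, which is why a CNN does not need to relearn a pattern for every position.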

Initial End-to-End Workflow

Importing the required modules and packages

Preparing the data

Reading the CSV file (sign_mnist_train.csv) using pandas and shuffling the whole training data.

Separating the image pixels and labels allows us to apply feature-specific preprocessing techniques.
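A minimal sketch of these two steps with pandas, assuming the CSV has a `label` column followed by the 784 pixel columns (the layout of the Kaggle file). The function name `shuffle_and_split` and the seed are hypothetical.

```python
import numpy as np
import pandas as pd


def shuffle_and_split(df: pd.DataFrame, seed: int = 42):
    """Shuffle all rows, then separate the label column from the pixel columns."""
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    labels = shuffled["label"].to_numpy()
    pixels = shuffled.drop(columns=["label"]).to_numpy(dtype=np.float32)
    return pixels, labels


# Usage with the Kaggle file (assumed to be in the working directory):
# train_df = pd.read_csv("sign_mnist_train.csv")
# x_train, y_train = shuffle_and_split(train_df)
```

Shuffling whole rows (rather than pixels and labels separately) keeps each image aligned with its label.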

Normalization and Batching

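Normalizing the input data is very important, especially when using gradient descent, which tends to converge faster when the inputs are on a small, consistent scale (for example, pixel values divided by 255 so that they lie in [0, 1]). Grouping the training data into mini-batches decreases the time required to train the model, since the weights are updated after every batch instead of after a full pass over the data, and each batch can be processed in parallel.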
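One way to do this with NumPy, assuming division by 255 for normalization and a simple generator for batching (the function names are hypothetical). In Keras, passing `batch_size` to `model.fit` performs the batching for you; the generator just shows what that means.

```python
import numpy as np


def normalize_and_reshape(pixels: np.ndarray) -> np.ndarray:
    """Scale 0-255 pixel values to [0, 1] and reshape rows to 28x28x1 images."""
    images = pixels.astype(np.float32) / 255.0
    return images.reshape(-1, 28, 28, 1)  # (samples, height, width, channels)


def iterate_batches(x: np.ndarray, y: np.ndarray, batch_size: int = 32):
    """Yield consecutive (x, y) mini-batches; the last one may be smaller."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]
```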

An image from the training data after preprocessing:

Preprocessed Image (Image by Sathwick)

Binarizing the labels

Label Binarizer (Image by Sathwick)

The LabelBinarizer from the Scikit-Learn library binarizes the labels in a one-vs-all fashion and returns the one-hot encoded vectors.
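A small example of `LabelBinarizer` on toy labels (not the real data). The binarizer learns its classes from the labels it is fitted on, so on the real training set it produces 24 columns (J and Z are absent), and the same fitted object should be reused to transform the test labels and to decode predictions.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Toy labels standing in for the real training labels.
train_labels = np.array([0, 2, 1, 2, 24])

label_binarizer = LabelBinarizer()
y_onehot = label_binarizer.fit_transform(train_labels)

print(label_binarizer.classes_.tolist())  # [0, 1, 2, 24]
print(y_onehot)
# [[1 0 0 0]
#  [0 0 1 0]
#  [0 1 0 0]
#  [0 0 1 0]
#  [0 0 0 1]]
```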

Separating the Validation data

The validation data helps us choose the best model. If we used the test data for this, we would select a model whose test score is overly optimistic, because the test set would no longer be unseen data.
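A minimal sketch of holding out validation data with scikit-learn's `train_test_split`. The 20% fraction, the seed and the function name are assumptions; stratifying on the labels keeps the class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_validation(x, y, val_fraction=0.2, seed=42):
    """Hold out a stratified validation set; returns x_train, x_val, y_train, y_val."""
    return train_test_split(x, y, test_size=val_fraction,
                            random_state=seed, stratify=y)
```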

Building the model

The CNN model can be defined as below

Common choices when building a CNN are:

  1. Choose a set of Convolutional-Pooling layers
  2. Increase the number of filters in the deeper convolutional layers
  3. Add a set of dense layers after the Convolutional-Pooling layers

    Architecture of the model (Image by Sathwick)

Now come the crucial choices when compiling the model:

  1. loss – Specifies the loss function that we are trying to minimize. As our labels are one-hot encoded, we can choose categorical cross-entropy as the loss function.
  2. optimizer – The algorithm that updates the weights to minimize the loss function. Adam is one such algorithm and works well in most cases.

We can also specify metrics to evaluate the model during training; we chose accuracy.
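The choices above can be sketched in Keras as follows. The layer sizes are illustrative assumptions, not the author's exact architecture (see the figure and repository for that); `num_classes=24` assumes the LabelBinarizer was fitted on the 24 letters present in the data.

```python
from tensorflow import keras


def build_cnn(num_classes: int = 24) -> keras.Model:
    """Hypothetical Conv-Pool stack followed by dense layers."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        # Conv-Pool pairs, with more filters in deeper layers
        keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        keras.layers.MaxPooling2D((2, 2)),   # 28x28 -> 14x14
        keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        keras.layers.MaxPooling2D((2, 2)),   # 14x14 -> 7x7
        keras.layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        keras.layers.MaxPooling2D((2, 2)),   # 7x7 -> 3x3
        # Dense layers on top of the flattened feature maps
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        loss="categorical_crossentropy",  # labels are one-hot encoded
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model
```

The softmax output gives one probability per class, which is what categorical cross-entropy expects alongside one-hot labels.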

Checkpoints

ModelCheckpoint – At the end of each epoch, evaluates the model on the validation data and saves it whenever it is the best model found so far.

EarlyStopping – Stops training when the monitored metric has not improved for a specified number of epochs.
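A sketch of both callbacks in Keras, monitoring validation accuracy. The file name `best_model.keras` and the patience of 5 epochs are assumptions; the `.keras` extension is the format expected by recent Keras versions.

```python
from tensorflow import keras


def make_callbacks(checkpoint_path: str = "best_model.keras", patience: int = 5):
    """Checkpoint the best model on validation accuracy and stop when it plateaus."""
    checkpoint = keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path,
        monitor="val_accuracy",
        save_best_only=True,  # overwrite the file only when val_accuracy improves
    )
    early_stopping = keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=patience,    # epochs without improvement before stopping
    )
    return [checkpoint, early_stopping]


# Usage (x_train, y_train, x_val, y_val prepared in the previous steps):
# history = model.fit(x_train, y_train, epochs=50, batch_size=32,
#                     validation_data=(x_val, y_val), callbacks=make_callbacks())
```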

Finding patterns

Reviewing the model training

The History object returned by training contains the loss and the specified metrics recorded at each epoch. This information can be used to plot learning curves and to assess the training process and the model’s performance at each epoch.

For more details about the learning curves obtained during training, refer to this Jupyter notebook.
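`history.history` is a plain dictionary mapping each metric name to a list of per-epoch values. The helper below (hypothetical, with made-up numbers) finds the epoch with the best value of a higher-is-better metric; the same lists can be passed to a plotting library to draw learning curves.

```python
def best_epoch(history_dict, metric="val_accuracy"):
    """Return the (1-based) epoch with the highest `metric` value and that value."""
    values = history_dict[metric]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]


# Shape of history.history, with illustrative (not real) numbers:
example = {
    "loss": [1.9, 0.8, 0.4, 0.3],
    "val_loss": [1.2, 0.6, 0.5, 0.55],
    "accuracy": [0.45, 0.75, 0.88, 0.91],
    "val_accuracy": [0.62, 0.80, 0.86, 0.85],
}
print(best_epoch(example))  # (3, 0.86)
```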

The best model

We retrieve the best model saved during training, because the model obtained at the end of training is not necessarily the best one.

Performance on the Test Set

Accuracy: 94%
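Reloading the checkpointed model and scoring it on the test set could look like this, assuming the checkpoint path used with ModelCheckpoint and test data preprocessed the same way as the training data (normalized images, one-hot labels from the same LabelBinarizer).

```python
from tensorflow import keras


def evaluate_best_model(checkpoint_path, x_test, y_test):
    """Reload the checkpointed model (not the final-epoch one) and score it on the test set."""
    best_model = keras.models.load_model(checkpoint_path)
    test_loss, test_accuracy = best_model.evaluate(x_test, y_test, verbose=0)
    return test_loss, test_accuracy
```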


Hyperparameter Tuning

Photo by Denisse Leon on Unsplash

You can find the code and the resulting models from hyperparameter tuning here.

When it comes to hyperparameter tuning, there is a plethora of choices we can tune in a given CNN. Some of the most essential and common hyperparameters to tune include:

Number of Convolution and Max Pooling Pairs

This is the number of Conv2D and MaxPooling2D pairs we stack together to build the CNN.

As we stack more pairs, the network gets deeper and deeper, increasing the model’s ability to identify complex image patterns.

However, stacking too many pairs can hurt the model’s performance, because each pair shrinks the feature maps and a small input image is quickly reduced to just a few pixels in the deeper layers. It also increases the training time, as the number of trainable parameters grows.
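The shrinking can be traced with the standard output-size formula, floor((n + 2p − k) / s) + 1. This sketch assumes 3×3 convolutions without padding and 2×2 max pooling; under those assumptions a 28×28 input supports only three Conv-Pool pairs.

```python
def conv_output_size(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_conv_pool_pairs(input_size=28, num_pairs=4, kernel=3, pool=2):
    """Feature-map size after each 'valid' conv + max-pool pair."""
    sizes = [input_size]
    size = input_size
    for _ in range(num_pairs):
        size = conv_output_size(size, kernel=kernel)              # conv, no padding
        if size < pool:                                           # too small to pool again
            break
        size = conv_output_size(size, kernel=pool, stride=pool)   # max pooling
        sizes.append(size)
    return sizes


print(trace_conv_pool_pairs())  # [28, 13, 5, 1]
```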

Filters

Filters determine the number of output feature maps. A filter acts as a pattern detector: when convolved across an image, it responds strongly wherever a region of the image resembles its pattern. Increasing the number of filters in the successive layers works well in most cases.

Filter Size

By convention, the filter size is an odd number, which gives each filter a central position. One of the main problems with even-sized filters is that they require asymmetric padding to keep the output the same size as the input.

Instead of using a single convolutional layer with large filters (like 7×7 or 9×9), we can use multiple convolutional layers with smaller filters. This is more likely to improve the model’s performance: a stack of small filters covers the same region of the image with fewer parameters, and the deeper network can detect more complex patterns.
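The padding needed to keep a stride-1 convolution's output the same size as its input is k − 1 in total; this small helper (hypothetical) shows why only odd sizes split it evenly.

```python
def same_padding(kernel_size):
    """(before, after) zero padding that keeps a stride-1 convolution's output size unchanged."""
    total = kernel_size - 1
    before = total // 2
    return before, total - before


for k in (3, 4, 5):
    print(k, same_padding(k))
# 3 (1, 1)
# 4 (1, 2)
# 5 (2, 2)
```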
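A quick calculation supports this. Assuming 64 input and 64 output channels, two stacked 3×3 layers see the same 5×5 region as one 5×5 layer but need fewer weights (a Conv2D layer has k·k·C_in weights plus one bias per filter).

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Trainable parameters of a Conv2D layer (weights + one bias per filter)."""
    return (kernel_size ** 2 * in_channels + 1) * filters


def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions: each k x k layer adds k - 1."""
    return 1 + sum(k - 1 for k in kernel_sizes)


channels = 64  # same number of channels in and out, for a fair comparison
single_5x5 = conv2d_params(5, channels, channels)
two_3x3 = 2 * conv2d_params(3, channels, channels)

print(receptive_field([5]), receptive_field([3, 3]))  # 5 5
print(single_5x5, two_3x3)  # 102464 73856
```

The stacked version also applies two non-linearities instead of one, which is part of why deeper stacks can model more complex patterns.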

Dropout

Dropout acts as a regularizer and helps prevent the model from overfitting. During training, the dropout layer randomly nullifies the contribution of some neurons to the next layer and leaves the others unmodified. The dropout rate determines the probability that a particular neuron’s contribution is cancelled.

In the initial epochs, we might notice that the training loss is greater than the validation loss. This happens because some neurons are dropped during training, whereas the complete network with all its neurons is used for validation.
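The train/inference difference is easiest to see in a NumPy sketch of inverted dropout, the variant Keras's `Dropout` layer uses: kept units are scaled by 1 / (1 − rate) during training so that no rescaling is needed at inference. The function is illustrative, not the Keras implementation.

```python
import numpy as np


def dropout(x, rate, training, rng=None):
    """Inverted dropout: zero each unit with probability `rate` during training.

    Surviving units are scaled by 1 / (1 - rate) so the expected activation is
    unchanged; at inference time the full network is used as-is.
    """
    if not training or rate == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)


activations = np.ones(10)
rng = np.random.default_rng(0)
print(dropout(activations, 0.5, training=True, rng=rng))  # each entry is either 0.0 or 2.0
print(dropout(activations, 0.5, training=False))          # all ones
```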

Data Augmentation

Data Augmentation (Image by Sathwick)

With Data Augmentation, we can generate slightly modified copies of the available images and use them to train the model. Training on images with different orientations, positions and scales helps the model identify objects in different orientations.

For example, we might introduce a small rotation, zoom, and translation to the images.
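One way to apply a small rotation, zoom and translation is with Keras preprocessing layers. This is a sketch; the factors are assumptions, and the repository may use a different augmentation API.

```python
import numpy as np
from tensorflow import keras

# Random transformations, active only when called with training=True.
augment = keras.Sequential([
    keras.layers.RandomRotation(0.05),         # up to +/-5% of a full turn (about 18 degrees)
    keras.layers.RandomZoom(0.1),              # zoom in/out by up to 10%
    keras.layers.RandomTranslation(0.1, 0.1),  # shift by up to 10% vertically/horizontally
])

images = np.random.rand(8, 28, 28, 1).astype("float32")
augmented = augment(images, training=True)
print(augmented.shape)  # (8, 28, 28, 1)
```

Placing such layers at the start of the model means each epoch sees freshly transformed images, while evaluation uses the originals.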

Other Hyperparameters to try

  1. Batch Normalization – Normalizes a layer’s inputs over each mini-batch
  2. Deeper networks work well – Replacing a single convolution layer of filter size (5×5) with two consecutive convolution layers of filter size (3×3)
  3. Number of units in the dense layer and number of dense layers
  4. Replacing the MaxPooling layer with a convolution layer that has a stride > 1
  5. Optimizers
  6. Learning rate of the optimizer
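Several of these options can be combined in a short Keras sketch: batch normalization after a convolution, a stride-2 convolution in place of max pooling, and an explicit optimizer learning rate. The block layout, filter counts and learning rate are illustrative assumptions, not the tuned values.

```python
import numpy as np
from tensorflow import keras


def conv_block(filters):
    """3x3 conv + batch norm, then a stride-2 conv that downsamples instead of max pooling."""
    return [
        keras.layers.Conv2D(filters, 3, padding="same", use_bias=False),
        keras.layers.BatchNormalization(),
        keras.layers.Activation("relu"),
        keras.layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu"),
    ]


model = keras.Sequential(
    [keras.Input(shape=(28, 28, 1))]
    + conv_block(32)   # 28x28 -> 14x14
    + conv_block(64)   # 14x14 -> 7x7
    + [keras.layers.Flatten(), keras.layers.Dense(24, activation="softmax")]
)
model.compile(
    loss="categorical_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # tunable learning rate
    metrics=["accuracy"],
)
```

Unlike max pooling, the strided convolution has trainable weights, so the network learns how to downsample.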

Evaluation of the final model

Best Model after Hyperparameter Tuning (Image by Sathwick)

Accuracy: 96%


Deployment of the model to Streamlit

Photo by Praveen kumar Mathivanan on Unsplash

Streamlit is a fantastic platform that removes all the hassle of a manual deployment. With Streamlit, all we need is a GitHub repository containing a Python script that specifies the flow of the app.

Setup

Install the Streamlit library using the following command:

pip install streamlit

To run the Streamlit application, use streamlit run <script_name>.py

Building the app

You can find the complete Streamlit app flow here.

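The app loads the best model obtained after hyperparameter tuning, along with the LabelBinarizer, which is required to convert the model’s output back to the corresponding labels.

The @st.cache decorator runs the function only once and caches its result, preventing unnecessary rework when the page is redisplayed. In newer Streamlit releases, st.cache is deprecated in favour of st.cache_resource (for resources such as models) and st.cache_data (for data).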

Model’s Prediction

We resize the uploaded image to 28×28, since that is our model’s input shape, while preserving the aspect ratio of the uploaded image.

Then we can feed the preprocessed image to the model and get its prediction, which can be transformed back to the label representing the English letter using the label_binarizer.
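One way to resize without distorting the hand, sketched with Pillow: scale the longer side to 28 pixels, then center the result on a square canvas. The padding value and the division by 255 are assumptions; the latter should match whatever normalization was used in training.

```python
import numpy as np
from PIL import Image


def preprocess_upload(image: Image.Image, size: int = 28, fill: int = 0) -> np.ndarray:
    """Fit an uploaded image into a size x size grayscale canvas, keeping its aspect ratio.

    The longer side is scaled to `size`; the image is centered and the gap is
    filled with the gray level `fill`.
    """
    gray = image.convert("L")
    scale = size / max(gray.size)
    new_w = max(1, round(gray.width * scale))
    new_h = max(1, round(gray.height * scale))
    resized = gray.resize((new_w, new_h))
    canvas = Image.new("L", (size, size), color=fill)
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    pixels = np.asarray(canvas, dtype=np.float32) / 255.0  # assumed training normalization
    return pixels.reshape(1, size, size, 1)                 # batch of one image
```

In the app, the uploaded file can be opened with `Image.open(...)` before calling this function.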
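Decoding a prediction might look like this. For multiclass problems, `LabelBinarizer.inverse_transform` picks the class with the highest score; the helper name and the toy binarizer below are illustrative.

```python
import string

import numpy as np
from sklearn.preprocessing import LabelBinarizer


def prediction_to_letter(probabilities: np.ndarray, label_binarizer: LabelBinarizer) -> str:
    """Turn the model's softmax output for one image into its English letter."""
    label = label_binarizer.inverse_transform(probabilities.reshape(1, -1))[0]
    return string.ascii_uppercase[int(label)]


# Toy example: a binarizer fitted on labels for A, B and Y only.
lb = LabelBinarizer().fit([0, 1, 24])
probs = np.array([0.1, 0.2, 0.7])       # highest score -> class 24
print(prediction_to_letter(probs, lb))  # Y
```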

Deploying the app to the Streamlit Cloud

  1. Sign up for a Streamlit account here.
  2. Connect your GitHub account with your Streamlit account by giving all the necessary permissions to access your repositories.
  3. Ensure the repository contains a requirements.txt file specifying all the app’s dependencies.
  4. Click on the New App button available here.
  5. Enter the repository, the branch name, and the name of the Python script that contains the flow of our app.
  6. Click on the Deploy button.

Your app is now deployed to the web and will be updated whenever you update the repository.

Summary

In this tutorial, we covered the following:

  1. The pipeline for developing a solution using Deep Learning for a problem
  2. Preprocessing the image data
  3. Training a Convolutional Neural Network
  4. Evaluation of the model
  5. Hyperparameter Tuning
  6. Deployment

Note that this tutorial only briefly introduces the complete end-to-end pipeline of developing a solution using Deep Learning techniques. Real-world pipelines involve far more extensive training, a much more comprehensive search over hyperparameters, and evaluation metrics specific to each use case. Solving complex problems also requires a deeper understanding of the model and the data, and a wider range of image preprocessing techniques.

Take a look at the model.

For more details about the code or models used in this article, refer to this GitHub Repo.

Thanks for Reading!

I hope you find this tutorial helpful in building your next fantastic Machine Learning project. If you find any details incorrect in the article, please let me know in the comments section. I’d love to have your suggestions and improvements to the repository.

