
Take a look at the model that you are going to build.
For more details about the code or models used in this article, refer to this GitHub Repo.
Okay!!! Now let’s dive into building the Convolutional Neural Network model that converts sign language to the English alphabet.
Understanding the problem
The Problem Definition
Converting the Sign Language to Text can be broken down into the task of predicting the English letter corresponding to the representation in the sign language.
Data
We are going to use the Sign Language MNIST dataset on Kaggle which is licensed under CC0: Public Domain.
Some facts about the dataset
- No cases for the letters J & Z (Reason: J & Z require motion)
- Grayscale Images
- Pixel Values ranging from 0 to 255
- Each image contains 784 Pixels
- Labels are numerically encoded from 0 to 25, corresponding to A to Z
- The data comes as separate train and test sets; each row holds the 784 pixel values of one image along with its label
Evaluation Metric
We are going to use accuracy as the evaluation metric. Accuracy is the ratio of correctly classified samples to the total number of samples.
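As a quick illustration of the metric, here is a minimal sketch that computes accuracy with NumPy. The label values are made up for the example.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Hypothetical labels: 4 of the 5 predictions are correct.
true_labels = [0, 3, 7, 7, 24]
predicted_labels = [0, 3, 7, 8, 24]
score = accuracy(true_labels, predicted_labels)
print(score)  # 0.8
```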
Modelling

- Convolutional Neural Networks(CNN) are the go-to choice for an image classification problem.
- Convolutional Neural Network is an artificial neural network consisting of convolutional layers.
- CNN works well with the image data. The main thing that differentiates the convolutional layer from dense layers is that in the former, every neuron connects only to a particular set of neurons in the previous layer.
- Each convolutional layer contains a set of filters (also called kernels), each producing a feature map, which help identify different patterns in the image.
- Convolutional Neural Networks can find more complicated patterns as the image passes through the deeper layers.
- Another advantage of a CNN over a typical dense-layered ANN is that once it learns a pattern in one location, it can identify that pattern at any other location.
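The last point can be sketched with plain NumPy. The snippet below slides one small filter over two images that contain the same pattern at different positions. The 2×2 pattern and the helper names are hypothetical, and the loop-based filter is only meant to show the idea, not how a deep learning library implements it.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    response = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            response[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return response

# A hypothetical 2x2 "corner" pattern, used as both the feature and the filter.
pattern = np.array([[1.0, 1.0],
                    [1.0, 0.0]])

def find_peak(row, col, size=8):
    """Place the pattern at (row, col) and return where the filter responds most."""
    image = np.zeros((size, size))
    image[row:row + 2, col:col + 2] = pattern
    response = correlate2d_valid(image, pattern)
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(v) for v in peak)

print(find_peak(1, 1))  # (1, 1)
print(find_peak(4, 5))  # (4, 5)
```

The same filter weights find the pattern in both places; a dense layer would need separate weights for each location.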
Initial End-to-End Workflow
Importing the required modules and packages
Preparing the data
Reading the CSV file (sign_mnist_train.csv) using pandas and shuffling the whole training data.
Separating the image pixels and labels allows us to apply feature-specific preprocessing techniques.
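A minimal sketch of these two steps with pandas follows. The function name, the seed, and the assumption that the label column is called `label` are illustrative; adjust them to the actual file.

```python
import numpy as np
import pandas as pd

def load_pixels_and_labels(csv_source, label_column="label", seed=42):
    """Read the CSV, shuffle the rows, and split them into pixels and labels."""
    data = pd.read_csv(csv_source)
    # sample(frac=1) returns every row exactly once, in random order.
    data = data.sample(frac=1, random_state=seed).reset_index(drop=True)
    labels = data[label_column].to_numpy()
    pixels = data.drop(columns=[label_column]).to_numpy(dtype=np.float32)
    return pixels, labels

# Usage (the path is an assumption about where the Kaggle file was saved):
# train_pixels, train_labels = load_pixels_and_labels("sign_mnist_train.csv")
```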
Normalization and Batching
Normalizing the input data is very important, especially when training with gradient descent, which is more likely to converge faster on normalized data. Grouping the training data into batches decreases the time required to train the model.
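Both steps can be sketched in NumPy as below. The random arrays stand in for the real pixel rows, and the batch size of 32 is an arbitrary choice; the original pipeline may well use a framework utility for batching instead.

```python
import numpy as np

def normalize_images(pixels):
    """Scale 0-255 pixel rows to [0, 1] and reshape them to 28x28x1 images."""
    pixels = np.asarray(pixels, dtype=np.float32)
    return (pixels / 255.0).reshape(-1, 28, 28, 1)

def iterate_batches(images, labels, batch_size=32):
    """Yield consecutive (images, labels) mini-batches."""
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size], labels[start:start + batch_size]

# Random stand-in data with the same layout as the dataset (100 rows of 784 pixels).
rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=(100, 784))
fake_labels = rng.integers(0, 25, size=100)

images = normalize_images(fake_pixels)
batch_shapes = [batch_images.shape for batch_images, _ in iterate_batches(images, fake_labels)]
print(images.shape)      # (100, 28, 28, 1)
print(batch_shapes[0])   # (32, 28, 28, 1)
print(batch_shapes[-1])  # (4, 28, 28, 1)
```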
An image from the train data after applying the preprocessing

Binarizing the labels

The LabelBinarizer from the Scikit-Learn library binarizes the labels in a one-vs-all fashion and returns the one-hot encoded vectors.
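Here is a small sketch of how LabelBinarizer behaves, using a handful of made-up labels. Note that it creates one column per class it has actually seen, so on this dataset (no J or Z) one would expect 24 columns rather than 26.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Made-up labels; only four distinct classes appear.
labels = np.array([0, 2, 1, 24, 2])

label_binarizer = LabelBinarizer()
one_hot = label_binarizer.fit_transform(labels)

print(label_binarizer.classes_.tolist())  # [0, 1, 2, 24]
print(one_hot.shape)                      # (5, 4)

# inverse_transform maps one-hot rows back to the original labels.
decoded = label_binarizer.inverse_transform(one_hot)
print(decoded.tolist())                   # [0, 2, 1, 24, 2]
```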
Separating the Validation data
The validation data helps us choose the best model. If we used the test data for this, we would select the model that happens to do best on the test set, and its test accuracy would then be an overly optimistic estimate.
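One simple way to hold out validation data is sketched below, assuming the training rows have already been shuffled. The 20% fraction and the function name are illustrative choices, not necessarily those used in the original notebook.

```python
import numpy as np

def split_train_validation(images, labels, validation_fraction=0.2):
    """Hold out the last part of already-shuffled data for validation."""
    n_validation = int(len(images) * validation_fraction)
    split = len(images) - n_validation
    return (images[:split], labels[:split]), (images[split:], labels[split:])

# Stand-in arrays with the same layout as the preprocessed data.
images = np.zeros((100, 28, 28, 1), dtype=np.float32)
labels = np.arange(100)

(train_x, train_y), (val_x, val_y) = split_train_validation(images, labels)
print(train_x.shape, val_x.shape)  # (80, 28, 28, 1) (20, 28, 28, 1)
```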
Building the model
The CNN model can be defined as below
Usual selections while building the CNN are
- Choose a set of Convolutional-Pooling layers
- Increase the number of filters in the deeper convolutional layers
- Add a set of dense layers after the Convolutional-Pooling layers
Architecture of the model (Image by Sathwick)
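The selections listed above can be sketched in Keras as follows. This is not the exact architecture from the figure: the layer counts, filter sizes, and the 24 output classes (one per letter present in the dataset) are assumptions chosen to keep the example small.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_classes=24, input_shape=(28, 28, 1)):
    """Small CNN: conv/pool pairs with growing filter counts, then dense layers."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # One softmax output per class.
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_model()
model.summary()
```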
Now come the crucial choices in building the model
loss – Specifies the loss function that we are trying to minimize. As our labels are one-hot encoded, we can choose categorical cross-entropy as the loss function.
optimizer – The algorithm that searches for the weights that minimize the loss function. Adam is one such algorithm, and it works well in most cases.
When building the model, we can also specify metrics to evaluate it with; we chose accuracy.
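To make the loss concrete, here is a NumPy sketch of categorical cross-entropy on made-up predictions. In Keras, these choices correspond to passing `loss="categorical_crossentropy"`, `optimizer="adam"`, and `metrics=["accuracy"]` to `model.compile`; the helper below is only an illustration of the formula.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(y_true * log(y_pred)) over the samples in the batch."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two one-hot labels and two sets of made-up predicted probabilities.
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05],
                      [0.80, 0.10, 0.10]])
unsure = np.array([[0.40, 0.30, 0.30],
                   [0.34, 0.33, 0.33]])

confident_loss = categorical_cross_entropy(y_true, confident)
unsure_loss = categorical_cross_entropy(y_true, unsure)
print(confident_loss, unsure_loss)
```

Confident, correct predictions give a lower loss than hesitant ones, which is what the optimizer pushes the model toward.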
Checkpoints
ModelCheckpoint – Evaluates the model on the validation data at each epoch and saves the best model found so far during training.
EarlyStopping – Interrupts training when there has been no progress for a specified number of epochs.
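A sketch of how these two callbacks might be configured in Keras is shown below. The file name, the monitored quantity, and the patience value are assumptions for illustration.

```python
from tensorflow import keras

# File name, monitored metric, and patience are illustrative assumptions.
checkpoint = keras.callbacks.ModelCheckpoint(
    filepath="best_model.keras",
    monitor="val_loss",
    save_best_only=True,  # overwrite the file only when val_loss improves
)
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,  # stop after 5 epochs without improvement
    restore_best_weights=True,
)
callbacks = [checkpoint, early_stopping]

# Usage with a compiled model and prepared data (names are hypothetical):
# history = model.fit(train_x, train_y, epochs=50,
#                     validation_data=(val_x, val_y), callbacks=callbacks)
```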
Finding patterns
Reviewing the model training
The History object contains the loss and the specified metrics recorded during the model’s training. This information can be used to plot the learning curves and assess the training process and the model’s performance at each epoch.
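As a small sketch of reading this information, the helper below finds the epoch with the lowest validation loss. The dictionary is a stand-in for `history.history` with made-up numbers, and `best_epoch` is a hypothetical helper, not part of Keras.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return the 1-based epoch with the lowest value of `metric`."""
    values = history_dict[metric]
    return min(range(len(values)), key=values.__getitem__) + 1

# Stand-in for `history.history`; the numbers are made up for illustration.
example_history = {
    "loss": [1.90, 0.80, 0.40, 0.25, 0.18],
    "val_loss": [1.20, 0.60, 0.35, 0.38, 0.41],
    "accuracy": [0.45, 0.76, 0.88, 0.93, 0.95],
    "val_accuracy": [0.66, 0.82, 0.90, 0.89, 0.89],
}

print(best_epoch(example_history))  # 3
```

Here the training loss keeps falling after epoch 3 while the validation loss rises, which is the usual sign of overfitting in a learning curve.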
For more details about the learning curves obtained during the training, refer to this jupyter notebook.
The best model
We retrieve the best model saved during training, since the model we are left with at the end of training need not be the best one.
Performance on the Test Set
Accuracy: 94%
Hyperparameter Tuning

You can find the code and the models resulting from hyperparameter tuning here.
When it comes to hyperparameter tuning, there are a plethora of choices we can tune in a given CNN. Some of the most essential and common hyperparameters that need to be tuned include
Number of Convolution and Max Pooling Pairs
This represents the number of Conv2D and MaxPooling2D pairs we stack together when building the CNN.
As we stack more pairs, the network gets deeper and deeper, increasing the model’s ability to identify complex image patterns.
But stacking too many layers can hurt the model’s performance (the feature maps shrink rapidly as the image goes deeper into the CNN) and also increases the training time as the number of trainable parameters grows.
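To see how quickly a 28×28 input shrinks, here is a short calculation. It assumes 3×3 convolutions without padding and 2×2 max pooling, which may differ from the settings used in the article’s models.

```python
def sizes_after_pairs(input_size=28, kernel_size=3, pool_size=2, max_pairs=5):
    """Feature-map width after each Conv2D (no padding) + MaxPooling2D pair."""
    sizes = []
    size = input_size
    for _ in range(max_pairs):
        size = size - kernel_size + 1  # stride-1 convolution without padding
        if size < pool_size:
            break                      # nothing left to pool
        size = size // pool_size       # non-overlapping max pooling
        sizes.append(size)
    return sizes

print(sizes_after_pairs())  # [13, 5, 1]
```

Under these assumptions, only three pairs fit before the feature map collapses to a single value.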
Filters
Filters determine the number of output feature maps. A filter acts as a pattern detector and responds wherever it finds a similar pattern when convolved across an image. Increasing the number of filters in successive layers works well in most cases.
Filter Size
It is a convention to take the filter size as an odd number, giving us a central position. One of the main problems with even-sized filters is they would require asymmetric padding.
Instead of using a single convolutional layer with larger filters (such as 7×7 or 9×9), we can stack multiple convolutional layers with smaller filters, which is more likely to improve the model’s performance because deeper networks can detect more complex patterns.
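A quick calculation shows another reason stacking small filters is attractive: the stack covers the same input area with fewer weights. The sketch assumes stride-1 convolutions and the same number of channels in every layer; the helper names are hypothetical.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Input area (width) seen by one output of `num_layers` stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def weights_per_channel_pair(kernel_size, num_layers):
    """Kernel weights per (input channel, output channel) pair, ignoring biases."""
    return num_layers * kernel_size * kernel_size

# Two 3x3 layers vs. one 5x5 layer, and three 3x3 layers vs. one 7x7 layer.
print(stacked_receptive_field(3, 2), weights_per_channel_pair(3, 2), weights_per_channel_pair(5, 1))
# 5 18 25
print(stacked_receptive_field(3, 3), weights_per_channel_pair(3, 3), weights_per_channel_pair(7, 1))
# 7 27 49
```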
Dropout
Dropout acts as a regularizer and helps prevent the model from overfitting. During training, the dropout layer nullifies the contribution of some neurons toward the next layer and keeps the others (Keras scales the kept activations up by 1/(1 − rate) so that their expected sum is unchanged). The dropout rate determines the probability of a particular neuron’s contribution being cancelled.
During the initial epochs, we might observe that the training loss is greater than the validation loss, because some neurons are dropped during training while the complete network with all its neurons is used for validation.
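The difference between training and validation behaviour can be sketched in NumPy. This is a simplified "inverted dropout" written for illustration, not the library’s actual implementation.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a random subset and rescale the rest while training."""
    if not training or rate == 0.0:
        return activations  # the full network is used at validation/inference time
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1, 10))

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)
print(train_out)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
print(eval_out)   # all entries stay 1.0
```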
Data Augmentation

With Data Augmentation, we can generate slightly modified copies of the available images and use them for training the model. Seeing the same sign in slightly different orientations and positions helps the model recognize it under those variations.
For example, we might introduce a small rotation, zoom, and translation to the images.
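As a minimal sketch of one such modification, the NumPy snippet below applies a small random translation. It covers only translation; Keras also offers layers such as `RandomRotation`, `RandomZoom`, and `RandomTranslation` for the full set. The function name, the shift range, and the square "image" are assumptions made for the example.

```python
import numpy as np

def random_translate(image, max_shift, rng):
    """Shift a (height, width) image by up to `max_shift` pixels, filling with zeros."""
    height, width = image.shape
    padded = np.pad(image, max_shift, mode="constant")
    # An offset equal to max_shift means "no shift"; 0 and 2 * max_shift are the extremes.
    top = rng.integers(0, 2 * max_shift + 1)
    left = rng.integers(0, 2 * max_shift + 1)
    return padded[top:top + height, left:left + width]

rng = np.random.default_rng(0)
image = np.zeros((28, 28), dtype=np.float32)
image[10:18, 10:18] = 1.0  # stand-in for a hand sign

augmented = [random_translate(image, max_shift=2, rng=rng) for _ in range(4)]
print([a.shape for a in augmented])
```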
Other Hyperparameters to try
- Batch Normalization – It normalizes the layer inputs
- Deeper networks work well – Replacing a single convolution layer of filter size (5×5) with two consecutive convolution layers of filter size (3×3)
- Number of units in the dense layer and number of dense layers
- Replacing the MaxPooling Layer with a convolution layer having a stride > 1
- Optimizers
- Learning rate of the optimizer
Evaluation of the final model

Accuracy: 96%
Deployment of the model to Streamlit

Streamlit is a fantastic platform that removes all the hassle required in a manual deployment. With Streamlit, all we need is a GitHub repository containing a python script that specifies the flow of the app.
Setup
Install the Streamlit library using the below command
pip install streamlit
To run the Streamlit application use streamlit run <script_name>.py
Building the app
You can find the complete Streamlit app flow here.
Getting the best model obtained after the hyperparameter tuning and the LabelBinarizer, which is required to convert the model’s output back to the corresponding labels.
The @st.cache decorator runs the function only once, preventing unnecessary rework when the page is redisplayed. (Newer Streamlit releases deprecate st.cache in favor of st.cache_data and st.cache_resource.)
Model’s Prediction
We should resize the uploaded image to 28×28, as that is our model’s input shape. We must also preserve the aspect ratio of the uploaded image while doing so.
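One way to do this with Pillow is sketched below. `ImageOps.pad` resizes the image to fit inside the target size without distortion and fills the leftover border. The function name, the black fill color, and the blank test image are assumptions, and the app in the repository may handle this differently.

```python
import numpy as np
from PIL import Image, ImageOps

def preprocess_image(image, size=(28, 28)):
    """Grayscale, fit into `size` without distortion, and scale pixels to [0, 1]."""
    gray = image.convert("L")
    # Pads the shorter side instead of stretching, so the aspect ratio is kept.
    fitted = ImageOps.pad(gray, size, color=0)
    pixels = np.asarray(fitted, dtype=np.float32) / 255.0
    # Add batch and channel dimensions: (1, height, width, 1).
    return pixels.reshape(1, size[1], size[0], 1)

# Stand-in for an uploaded photo that is wider than it is tall.
uploaded = Image.new("RGB", (120, 60), color=(200, 200, 200))
batch = preprocess_image(uploaded)
print(batch.shape)  # (1, 28, 28, 1)
```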
Then we can use the preprocessed image as the model input and get the respective prediction, which can be transformed back to the label representing the English letter using the label_binarizer.
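The decoding step can be sketched without a trained model. The `classes` array mimics what a LabelBinarizer fitted on this dataset would be expected to hold in `classes_` (labels 0 to 25 without J and Z), and the probabilities are made up.

```python
import numpy as np

# Assumed class order: labels 0-25 without 9 (J) and 25 (Z), which have no samples.
classes = np.array([label for label in range(26) if label not in (9, 25)])

def decode_prediction(probabilities, classes):
    """Map the model's softmax output for one image to an English letter."""
    label = int(classes[np.argmax(probabilities)])
    return chr(ord("A") + label)

# Made-up model output: all probability mass on the tenth class (label 10).
fake_probabilities = np.zeros(24)
fake_probabilities[9] = 1.0
print(decode_prediction(fake_probabilities, classes))  # K
```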
Deploying the app to the Streamlit Cloud
- Sign up for a Streamlit account here.
- Connect your GitHub account with your Streamlit account by giving all the necessary permissions to access your repositories.
- Ensure the repository contains a requirements.txt file specifying all the app’s dependencies.
- Click on the New App button available here.
- Give the repository, branch name, and the python script name, which contains the flow of our app.
- Click on the Deploy button.
Now your app will be deployed to the web and will get updated whenever you update the repository.
Summary
In this tutorial, we covered the following:
- The pipeline for developing a solution using Deep Learning for a problem
- Preprocessing the image data
- Training a Convolutional Neural Network
- Evaluation of the model
- Hyperparameter Tuning
- Deployment
Note that this tutorial only briefly introduces the complete end-to-end pipeline of developing a solution using Deep Learning techniques. Real pipelines involve far more extensive training, a search over a much more comprehensive range of hyperparameters, and evaluation metrics specific to the use case. A deeper understanding of the model, the data, and a wider range of image preprocessing techniques is required to build a pipeline for solving complex problems.
Thanks for Reading!
I hope you find this tutorial helpful in building your next fantastic Machine Learning project. If you find any details incorrect in the article, please let me know in the comments section. I’d love to have your suggestions and improvements to the repository.