
Post Training Analysis and Quantization of Machine Learning and Deep Learning Models

An extremely important but sometimes overlooked concept

VIRTUAL ASSISTANT PROJECT

Photo by Scott Graham on Unsplash

Building and exploring machine learning and deep learning models is something I love to do. I pick a project and start working on it. Some are hobbyist-level projects, while others have real use cases and benefits. I have a moderate GPU and a system with decent processing capabilities, so my focus is usually just on training the model and making sure it runs on my PC. However, that is not the end of the story, and I sometimes end up missing an extremely important step: post-training analysis of the model.

What is this post-training analysis and why is it so important?

Post-training analysis, sometimes also referred to as post-mortem analysis, plays a major role in model optimization. Models built and trained for business use need to be optimized so that they work efficiently on lower-end devices and embedded systems, such as the Raspberry Pi. One of the principal components of building and evaluating models is examining their predictive capabilities and performance quality. An even more important concept is understanding the limitations of your machine learning or deep learning model; overcoming these limitations is the key to a successful model.

In this article, we will discuss how to quantize our models effectively for particular use cases, focusing mainly on the limitations, improvements, and quantization of deep learning models. First, we will look at what post-training quantization with TensorFlow actually is. We will learn how to make our models more dynamic and effective across platforms, reaching a wider target audience with the help of various optimization techniques. Then we will examine the limitations of a variety of models and the techniques that can be used to improve them. Our main references for this project are the TensorFlow documentation on post-training quantization and my Virtual Assistant Project series. You can check out both in the links below, though reading them is not mandatory.

Post-training quantization | TensorFlow Lite

Virtual Assistant Project – Towards Data Science

Note: Although not an essential requirement, I would highly recommend that readers check out my previous projects from the virtual assistant series (the smart face lock system, next word prediction, and the innovative chatbot using 1-D convolution layers), as they will help in grasping some of the concepts more intuitively.


Photo by Michael Dziedzic on Unsplash

Post-Training Quantization:

A model that runs effectively on your system might not run the same program effectively on a lower-end device, due to the hardware constraints of the target device. This is where post-training quantization can help optimize the algorithms and models for the target hardware. Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency, with little degradation in model accuracy. You can quantize an already-trained float TensorFlow model when you convert it to the TensorFlow Lite format using the TensorFlow Lite Converter. The TensorFlow Lite Converter is very useful on devices such as the Raspberry Pi for optimizing object detection models, face recognition models, and so on. Object detection projects can also be optimized with TensorFlow Lite to great effect on Android or iOS devices. You can check out a cool project on this topic here.
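To make this concrete, below is a minimal sketch of converting an already trained Keras model to the TensorFlow Lite format. The model path is a hypothetical placeholder; the converter API itself comes from TensorFlow Lite.

```python
import tensorflow as tf

# Load any trained Keras model; "my_model.h5" is a hypothetical path.
model = tf.keras.models.load_model("my_model.h5")

# Create a TensorFlow Lite converter from the trained model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Convert to the TensorFlow Lite flatbuffer format (no quantization yet).
tflite_model = converter.convert()

# Save the converted model for deployment on the target device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```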

Below is the block diagram to help you choose the best post-training quantization and optimization methods for your model:

Source: Image from TensorFlow

In brief, the various post-training quantization methods are as follows:

Dynamic range quantization

This is the simplest form of post-training quantization. It statically quantizes only the weights from floating point to integer, giving 8 bits of precision. A minimal sketch follows below.
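Dynamic range quantization only requires enabling the default optimization flag on the converter before conversion; `model` is a placeholder for any trained Keras model:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)

# The default optimization quantizes the weights from float32 to
# 8-bit integers while activations remain in floating point.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_dynamic_model = converter.convert()
```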

Full integer quantization

You can get further latency improvements, reductions in peak memory usage, and compatibility with integer-only hardware devices or accelerators by making sure all of the model math is integer quantized, as in the sketch below.
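Full integer quantization additionally requires a representative dataset so the converter can calibrate the value ranges of the activations. The sketch below assumes `model` is a trained Keras model and `train_images` is a NumPy array holding a sample of your training data; both are placeholders:

```python
import tensorflow as tf

def representative_data_gen():
    # Yield a small number of samples that are representative of
    # what the model will see at inference time.
    for input_value in train_images[:100]:
        yield [input_value[None, ...].astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

# Require integer-only ops so the model runs on integer-only hardware;
# conversion fails if an operation cannot be integer quantized.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_model = converter.convert()
```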

Float16 quantization

You can reduce the size of a floating point model by quantizing the weights to float16, the IEEE standard for 16-bit floating point numbers.
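Float16 quantization is again a small change to the converter configuration, roughly halving the model size by storing the weights in half precision; `model` is a placeholder as before:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Store the weights as float16 instead of float32.
converter.target_spec.supported_types = [tf.float16]

tflite_fp16_model = converter.convert()
```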

Basically, with these three post-training optimization methods, we are making sure that our model design is as efficient and optimized as possible. Once the necessary optimization changes are made, we have to verify that the model remains efficient while maintaining more or less the same accuracy and loss.
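One way to check this is to run the quantized model with the TensorFlow Lite interpreter and compare its predictions against the original model. Here is a minimal sketch, assuming `tflite_model` is the output of one of the conversions above and `sample` is a placeholder NumPy batch with the model's expected input shape and dtype:

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# `sample` must match input_details[0]["shape"] and ["dtype"].
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])

# Compare `prediction` against the original float model's output
# to confirm accuracy is more or less preserved.
```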


Photo by Dhru J on Unsplash

Other intuitive examples for understanding the limitations of models and the significant changes and improvements that can be made:

1. Face Recognition Models:

So, you have successfully built a face recognition model using deep learning techniques like transfer learning, similar to the one here. The model performs extremely well on the training data, and upon testing on the training and validation data, the results are still very good. The model performs as desired in most scenarios. However, it has some limitations. Let us look at these limitations of the face recognition model, along with the improvements that can be made.

Limitations:

  1. The face recognition model cannot perform very well with a low-quality camera. The camera quality must be at least average to capture the real-time face correctly and grant access.
  2. The surrounding lighting must not be dark; it needs to be at least moderately bright. Otherwise, the model will have trouble detecting and recognizing the face during real-time analysis.
  3. We make use of haarcascade_frontalface_default.xml for face detection. This can be problematic for identifying faces at certain angles, where detection might not work as desired (see the sketch after this list).
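For context, the sketch below shows roughly how the Haar cascade detector is used with OpenCV, and adds the profile-face cascade that ships with OpenCV as one possible complement for side views. The camera index and detection parameters are illustrative assumptions:

```python
import cv2

# Load the frontal-face cascade used by the project, plus the
# profile-face cascade OpenCV ships as a complement for side views.
frontal = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")

capture = cv2.VideoCapture(0)  # camera index 0 is an assumption
ret, frame = capture.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# scaleFactor and minNeighbors are illustrative; tune per camera.
faces = frontal.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) == 0:
    faces = profile.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

capture.release()
print(faces)  # array of (x, y, w, h) bounding boxes
```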

Improvements:

  1. One-shot learning and training methods can be used to reduce the training time for each face. Since the current model recognizes only one face, adding more faces requires re-training the entire model. For this reason, methods like one-shot learning need to be considered for improving the quality and performance of the model.
  2. Alternatives to haarcascade_frontalface_default.xml can be found to improve the accuracy of the detection of the faces at any particular angle. An alternative can be to make a custom XML file for both front and side faces.
  3. To run the model on embedded devices, changes can be made to address memory constraints, such as reducing the precision of the weights (for example, the float16 quantization discussed above), and the model can be converted to TensorFlow Lite (tflite).

2. Next Word Prediction:

The next word prediction model achieves a decent loss. Next word prediction is a predictive-search style approach, like the one used in Google searches. It can also be used to detect patterns in user behavior and to power next-word prediction in email and texting. The model, despite being decent, has flaws, and there are ways to address them. Let us look at the limitations and the changes that can be made to improve the overall model.

Limitations:

  1. The next word prediction is limited to only a few words and cannot perform very accurately on longer sentences. This is not ideal in industry settings, where longer sentences are common.
  2. The model takes longer to train and to perform well compared to other models, due to its use of LSTM layers. This longer training can be a limitation on smaller embedded systems with less RAM and weaker GPUs. However, once trained on a better system, deployment is not a big issue.

Improvements:

  1. Zero-shot and one-shot learning methods exist for natural language processing as well. These methods can be used to train the model more effectively, improving overall performance and avoiding repeated training procedures, which can be a big hindrance in some real-life applications and scenarios. Hence, one-shot learning is a great alternative for deployment and for working on embedded systems with lower training capacities.
  2. Advanced models like GPT-3 could very well be extremely effective for these predictive tasks.
  3. More pre-processing methods and other alternative approaches can be tested to gain better accuracy, improved loss, and overall higher performance.

3. Chatbot Models:

The chatbot model is a text-classification-based chatbot that achieves good overall accuracy and a reduced loss. It performs really well on the entire dataset we used. However, even this chatbot has certain limitations and issues, which we cover in the section below. The overall prediction system built on the conv-1D architecture performs well on the witty.txt dataset and can be adapted to similar datasets. The model works by classifying unique responses to the most frequently repeated questions.

Limitations:

  1. One of the main limitations of the conv-1D chatbot model is that it is a classification-based chatbot rather than a generative one. This classification approach can lead to major setbacks, such as semantic issues when the user is not very proficient in English. The results can be misleading, and the chatbot might not produce the desired response.
  2. Since it is not a perfect model, it can sometimes be error-prone. It can produce results that are not satisfactory to the user, which might be a concern for industrial use.

Improvements:

  1. More pre-processing and natural language processing methods can be used to achieve higher accuracy and reduced loss during training. This can also improve the model's overall predictions.
  2. Advanced models like GPT-3 work very well even for conversational chatbots and are a great alternative for training high-quality chatbots.
  3. Other methods, such as transfer-learning-based classification, sequence-to-sequence models with attention, or even certain one-shot learning methods, can also be used for training.

Photo by William Iven on Unsplash

Conclusion:

Having fun while experimenting with machine learning and deep learning models is the best part of working with them. But if you want to convert these models into real-life use cases that reach and benefit a large number of people, then post-training analysis and post-training quantization become extremely important for achieving the efficiency, quality, and compactness needed to deploy your projects to a wider audience. Post-training quantization also lets the quantized model achieve accuracy nearly identical to that of the original model. This makes our lives a lot easier!

Thank you all for sticking with this until the end. Have a great time exploring and quantizing your machine learning and deep learning models. Feel free to check out all of my previously mentioned projects. I hope you all enjoyed the article, and I wish you a wonderful day!

