Unexplored TensorFlow Libraries for Computer Vision

Explore toolkits and extensions like the TensorFlow Model Optimization Toolkit, TensorFlow Graphics, Federated, Privacy, and more to boost your computer vision workflow.

Vardan Agarwal
Towards Data Science


Photo by Emil Widlund on Unsplash

TensorFlow is an end-to-end open-source machine learning platform capable of performing a range of tasks. It is easy to use for beginners and researchers alike and can be applied to different domains, including but not limited to computer vision, natural language processing, and reinforcement learning.

In the computer vision world, most of us are familiar with core TensorFlow along with TensorFlow Lite, which runs models on mobile and edge devices, and TensorFlow.js, which runs them on the web. However, TensorFlow also offers several lesser-known libraries, which we will be unraveling in this article.

Table of Contents

  • TensorFlow model optimization toolkit
  • TensorFlow Graphics
  • TensorFlow Federated
  • TensorFlow Privacy
  • TensorFlow Hub

TensorFlow Model Optimization Toolkit

Photo by Todd Quackenbush on Unsplash

Real-time models are essential for many commercial operations. MobileNet's inference speed thrust it into the limelight, even if it meant sacrificing a little accuracy. The first thing that comes to mind for optimizing a TensorFlow model is converting it to TensorFlow Lite. However, that does not work very well on desktops, since TensorFlow Lite is optimized for ARM NEON (explained in this issue), and sometimes the model needs to be optimized even further. The Model Optimization Toolkit comes to our rescue for these tasks. According to its homepage, it can be used to:

Reduce latency and inference cost for cloud and edge devices (e.g. mobile, IoT).

Deploy models to edge devices with restrictions on processing, memory, power consumption, network usage, and model storage space.

Enable execution on and optimize for existing hardware or new special-purpose accelerators.

It can be applied to already-trained models as well as during training to further optimize the solution. At the time of writing, it offers three techniques to hone models, with several others in progress.

Pruning

The first method is weight pruning. It works by removing some connections between layers, thereby reducing the number of parameters and operations involved and consequently optimizing the model. It eliminates unnecessary values in the weight tensors and is performed during the training process. It is useful for reducing model size, which can be decreased further with post-training quantization.
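As a rough sketch of the idea (plain NumPy here, not the actual `tfmot` API, and the matrix values are made up for illustration), magnitude-based pruning simply zeroes out the smallest-magnitude weights in a tensor:

```python
import numpy as np

# Toy 4x4 weight matrix from a "trained" layer (made-up values).
weights = np.array([
    [0.8, -0.02, 0.5, 0.01],
    [-0.03, 0.9, -0.04, 0.6],
    [0.02, -0.7, 0.05, -0.9],
    [0.4, 0.03, -0.6, 0.02],
])

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # Threshold at the k-th smallest absolute value.
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

pruned = magnitude_prune(weights, sparsity=0.5)
print(f"zeros after pruning: {np.sum(pruned == 0)} / {pruned.size}")
```

A sparse matrix like `pruned` compresses well on disk, which is where the model-size savings come from; the toolkit additionally schedules this zeroing gradually during training so accuracy can recover.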

I will not go into the details and code of each function as that will make the article too long. You can refer here for further understanding and here for its code.

Quantization

Unlike pruning, which is done only during training, quantization can be performed both during and after training. TensorFlow Lite models are also quantized to use 8-bit integers instead of the 32-bit floating-point numbers used generally. This improves performance and efficiency, as integer operations are much faster than floating-point operations.

However, this comes at a price. Quantization is a lossy technique: information previously represented across roughly -3.4e38 to 3.4e38 has to be represented in the range -127 to 127. During addition and multiplication, the 8-bit integers are accumulated into 32-bit integers, which must then be scaled back down to 8 bits, introducing more error. To counter this, quantization can be applied during training.
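A minimal sketch of what this mapping looks like, assuming simple symmetric 8-bit quantization in plain NumPy (not the TensorFlow Lite implementation, which also handles zero points and per-channel scales):

```python
import numpy as np

x = np.array([-3.1, -0.5, 0.0, 1.7, 2.9], dtype=np.float32)

# Symmetric 8-bit quantization: map the observed range onto [-127, 127].
scale = np.max(np.abs(x)) / 127.0
q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)

# Dequantize to see what information survived the round trip.
x_restored = q.astype(np.float32) * scale

# Quantization is lossy: the round trip introduces a bounded error.
max_error = np.max(np.abs(x - x_restored))
print("int8 values:", q)
print("max round-trip error:", max_error)
```

Products of two int8 values need up to 16 bits, and sums of many such products need 32, which is why accumulation happens in int32 before being rescaled back down.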

Quantization Aware Training

By applying quantization during training, we force the model to learn the differences it would cause and act accordingly. The quantization error is introduced as noise, and the optimizer tries to minimize it. Models trained this way have accuracy comparable to floating-point models. It will be interesting to compare TensorFlow Lite models created this way against the normal ones.
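The core trick can be sketched as a "fake quantization" step in the forward pass: weights are quantized and immediately dequantized, so the loss (and hence the optimizer) sees the quantization error as noise. A toy NumPy illustration of that step, not the `tfmot` API:

```python
import numpy as np

def fake_quant(w, num_bits=8):
    """Quantize then immediately dequantize, so the forward pass 'sees' the error."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale) * scale

w = np.array([0.31, -1.24, 0.07], dtype=np.float32)
w_q = fake_quant(w)

# This perturbation is what the optimizer learns to be robust against.
noise = w_q - w
print("injected quantization noise:", noise)
```

In the real training loop, gradients flow "through" the rounding as if it were the identity (a straight-through estimator), so the float weights keep updating while the loss reflects their quantized behavior.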

To read more about it, refer here; for its code, have a look here.

Post Training Quantization

Although it is preferable to apply quantization during training, sometimes it is not feasible, and we may have pre-trained weights ready to use. In that case, post-training quantization can be applied directly, and it is also much easier to implement.

More information can be found here, along with the code.

Weight Clustering

Weight clustering combines similar weights and replaces them with a single value. It can be likened to JPEG compression, and it is lossy as well, since similar weights are collapsed to the same number. The weight matrix originally stores float values; these are converted to integer indices into a lookup table that holds the cluster centroids. This reduces the space required, as integers need less storage and only a limited number of floats remain.

As seen in the example below, sixteen float-32 values are assigned to four float-32 centroids, and the layer weights are converted to integer values. The larger the weight matrix, the greater the savings.

Weight Clustering

Clustering is applied to fully trained models to find the centroids. Then any compression tool can be used to reduce the size of the model. To read about it in detail, refer here, along with its implementation.
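The lookup-table idea from the example above can be sketched in plain NumPy (simple quantile binning stands in here for the k-means-style centroid search the toolkit actually performs):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)  # sixteen float-32 values

# Pick four centroids from the weight distribution (illustrative initialization).
centroids = np.quantile(weights, [0.125, 0.375, 0.625, 0.875]).astype(np.float32)

# Replace each weight by the index of its nearest centroid.
indices = np.argmin(np.abs(weights[..., None] - centroids), axis=-1).astype(np.uint8)

# The layer now stores small integers plus a 4-entry float lookup table.
restored = centroids[indices]
print("unique values after clustering:", np.unique(restored).size)
```

The loss of precision is visible in `restored` (every weight snaps to one of four values), which is exactly why clustering, like quantization, is a lossy technique.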

Different techniques can be combined to further reduce latency and many more approaches are planned as discussed in their roadmap.

TensorFlow Graphics

Photo by ConvertKit on Unsplash

TensorFlow Graphics aims to combine computer vision and computer graphics for solving complex 3D tasks. A computer graphics workflow requires 3D objects and their absolute positioning in the scene, a description of the materials they are made of, lights, and a camera to generate a synthetic rendering. A computer vision workflow, on the other end, starts from an image and tries to deduce its parameters.

This can be thought of as an autoencoder, where the vision system (encoder) tries to find the scene parameters, while the graphics system (decoder) generates an image from them that can be compared with the original image. Moreover, this system does not require labeled data and trains in a self-supervised manner. Some features of TensorFlow Graphics are:

  • Transformations — Object transformations like rotation and translation can be performed on objects. Neural networks can learn these transformations to accurately find an object’s position, which is useful for robotic arms that require a precise estimate of where objects are.
  • Modeling cameras — Different intrinsic camera parameters can be set, which alter the way the image is perceived. For example, changing the focal length of the camera changes the size of objects.
  • Materials — Different types of materials with different light-reflecting capabilities can be used, so the scenes created accurately mimic how the objects would behave in the real world.
  • 3D convolutions and pooling (point clouds & meshes) — 3D convolutional and pooling layers allow us to perform semantic classification and segmentation on 3D data.
  • TensorBoard 3D — 3D data is becoming more and more ubiquitous and can be used to solve problems like 3D reconstruction from 2D data, point cloud segmentation, morphing 3D objects, etc. Through TensorBoard 3D, these results can be visualized, offering better insight into models.
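To make the transformations point concrete, here is a minimal rotation example in plain NumPy (TensorFlow Graphics provides differentiable versions of operations like this, so they can sit inside a trainable model):

```python
import numpy as np

def rotation_z(angle_rad):
    """3x3 matrix for a rotation about the z-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# A tiny point cloud: two points in 3D.
points = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 2.0]])

# Rotate the cloud 90 degrees about z (row vectors, hence the transpose).
rotated = points @ rotation_z(np.pi / 2).T
print(rotated.round(6))
```

A network that predicts `angle_rad` from an image could be trained end-to-end if this rotation were expressed in differentiable TensorFlow ops, which is exactly the niche the library fills.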

Further reading — here (It also contains links for Colab notebooks)

TensorFlow Federated

Photo by Ricardo Arce on Unsplash

This library can be used for areas outside of computer vision as well. With the number of mobile and edge devices out there, a lot of data is generated. Federated learning aims to perform machine learning on decentralized data, i.e. on the devices themselves! This means there is no need to upload huge amounts of (sensitive) data to servers. It is already being used in Gboard, Google’s keyboard.

The video attached below explains everything about federated learning, from decentralized data to working with TensorFlow Federated.

Refer to the article linked below for a guide on using TensorFlow Federated for image classification.
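The aggregation step at the heart of this approach, federated averaging, can be sketched in a few lines. The client weights and dataset sizes below are made up; TensorFlow Federated wraps this logic (plus secure communication and orchestration) behind its own APIs:

```python
import numpy as np

# Made-up local model weights from three clients after a round of local training.
client_weights = [
    np.array([0.9, -0.2]),
    np.array([1.1, -0.4]),
    np.array([1.0, -0.3]),
]
client_sizes = np.array([100, 50, 50])  # number of local training examples per client

# The server aggregates weights, weighted by local dataset size,
# without ever seeing the raw data on the devices.
total = client_sizes.sum()
global_weights = sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
print("aggregated global weights:", global_weights)
```

The new `global_weights` are then broadcast back to the devices for the next round of local training, and the cycle repeats.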

TensorFlow Privacy

Photo by Jason Dent on Unsplash

Sensitive information can be extracted from trained ML models through privacy attacks. Truex et al. presented a paper on the factors that drive such attacks. Models can even reconstruct the information they were trained on, as shown in this paper.

On the left side is the image reconstructed using only the name of the person and the model. The image on the right side is the original image. Taken from the paper itself.

Again, like TensorFlow Federated, this is not exclusive to computer vision. The most common technique used is differential privacy. From Wikipedia:

Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.

The assumption is that any piece of sensitive information appears only rarely across the dataset, and differential privacy ensures that the model cannot learn such information. For example, suppose there is a dataset of chats between people. The sensitive information exchanged in the chats could be passwords, bank account details, etc. If a model is trained on this dataset, differential privacy ensures that the model cannot learn these details, as they appear in only scant quantities. Read this stellar article on differential privacy; it also contains code for applying it.
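As a minimal sketch of the general idea, here is the classic Laplace mechanism applied to a count query, with made-up numbers. (TensorFlow Privacy itself works differently, by clipping and noising gradients during training with DP-SGD, but the privacy guarantee rests on the same principle of calibrated noise.)

```python
import numpy as np

rng = np.random.default_rng(42)

def private_count(true_count, epsilon):
    """Laplace mechanism: a count query has sensitivity 1, so noise scale is 1/epsilon."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical query: how many chats in the dataset mention a given phrase?
true_count = 42
noisy = private_count(true_count, epsilon=0.5)
print(f"true: {true_count}, released: {noisy:.1f}")
```

A smaller `epsilon` means more noise and stronger privacy; the released answer stays useful for aggregate statistics while masking whether any single individual's record contributed.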

TensorFlow Hub

Photo by Daniel Salcius on Unsplash

Most of you probably already know this library, so I will keep its introduction brief. TensorFlow Hub is a platform to publish, discover, and reuse parts of machine learning modules in TensorFlow. It would not be wrong to call it the GitHub of TensorFlow models. Developers can share their pre-trained models, which can then be reused by others. By reusing a model, a developer can train with a smaller dataset, improve generalization, or simply speed up training. Let’s have a swift look at some of the different computer vision models available.

  • Image Classification — There are more than a hundred models available for this task, from MobileNet to Inception to EfficientNet. Name any model you want, and you will most probably find it there.
  • Object Detection and Segmentation — Again, any model you need can be found here, especially in the collections of TensorFlow model zoo object detectors trained on the COCO dataset. The DeepLab architecture dominates the image segmentation scene. There is also an abundance of TFLite and TensorFlow.js models available.
  • Image stylization — Different backbones for image stylization are available, along with a cartooning GAN.
  • Generative Adversarial Networks — GAN models like BigGAN and Compare-GAN are available, trained on ImageNet and CelebA, so they can generate images of any ImageNet category as well as artificial faces! There is also Boundless-GAN, which can generate areas outside the scene captured by the camera. Moreover, most of them come with a Colab notebook, so there is no fuss about how to use them.

I have just described the tip of the iceberg. There are many more models available for topics like pose estimation, feature matching, super-resolution, etc., and I have not even discussed videos. Hop onto their page to find out more. You can also find tutorials on it here.

TensorFlow offers many more libraries, like TensorFlow Extended (TFX), a library for deploying ML pipelines, and Magenta, a library for generating music. Check out the full list here.
