The world’s leading publication for data science, AI, and ML professionals.

Complete Guide to Data Augmentation for Computer Vision

All the theory you need to know about Image Augmentation. For beginners and experts.

Image by Author

Data Augmentation is one of the most important topics in Deep Computer Vision. When you train a neural network, you should use data augmentation… ALWAYS. Otherwise, you are not using your dataset effectively, and your model does not perform as well as it could.

In this tutorial, I summarize the open-source knowledge about Image Augmentation and add my experience from several commercial Computer Vision projects. I hope you’ll find it useful!

Contents

  • What is Data Augmentation
  • How to Augment Images
  • What Papers Say
  • How to Choose Augmentations for Your Task
  • Image Augmentation in PyTorch and TensorFlow
  • What’s Next

What is Data Augmentation

Data Augmentation is a technique used to artificially increase dataset size. Take a sample from the dataset, modify it somehow, add it back to the original dataset – and now your dataset is one sample larger.

You can do this with every sample in the dataset, and modify each sample several times in different ways, to get a dataset 10, 100, or 1,000 times larger. You can even create a dataset of effectively infinite size, so your model never "sees" identical samples during training.

Image 1. Visualization of Data Augmentation technique. Image by Author

More data = a better model. Data Augmentation helps overcome the "not enough data" issue, prevents overfitting, and makes the model perform better on previously unseen samples. And no additional effort is needed to collect or label data, which can be costly or even unfeasible.

Data augmentation is used in many domains – natural language processing, time series analysis, audio processing… However, today we will focus on data augmentation for computer vision – Image Augmentation.

How to Augment Images

Assume you’d like to train a "Cat versus Dog?" classifier. You’ve already collected 500 cat images and 500 dog images. However, 1,000 images are probably not enough to create a high-performing model, so you want to use Data Augmentation, which you’ve heard could help here. The main question: how do you modify images to increase the dataset?

Well, there are plenty of options here. Below is a list of the most popular image augmentations.

Image 2. Most popular Image Augmentations. Image by Author
  • Crop – takes a part of the image.
  • Rotation – rotates the image around the center (or some other point).
  • Flip – mirrors the image around a horizontal or vertical line.
  • Filters – the most popular are blurring and sharpening. While blurring smooths edges and details, sharpening highlights them.
  • Affine Transformation – any transformation that preserves parallel lines, such as scaling, translation, or shearing.
  • Adding Noise – blackening and whitening random pixels (salt & pepper noise), adding Gaussian noise, or even removing a whole region from the image (cutout).
  • Color change – makes the image darker or brighter, greyscaled or extremely saturated, less or more contrasted.
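To make these concrete, here is a minimal NumPy sketch of a few of the augmentations above, applied to a fake image. This is only an illustration – in practice you would use a library like torchvision or Albumentations rather than writing these by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # fake 64x64 RGB image

# Crop – take a part of the image (48x48 center crop)
crop = img[8:56, 8:56]

# Flip – mirror the image around a vertical line (left-right)
hflip = img[:, ::-1]

# Rotation – 90 degrees around the center
rot90 = np.rot90(img)

# Adding noise – salt & pepper: whiten/blacken random pixels
noisy = img.copy()
mask = rng.random(img.shape[:2])
noisy[mask < 0.01] = 0       # pepper
noisy[mask > 0.99] = 255     # salt

# Color change – make the image brighter, clipping to the valid range
bright = np.clip(img.astype(np.int16) + 40, 0, 255).astype(np.uint8)
```

Each operation returns a new image of the same kind, which is what lets you chain augmentations into a pipeline.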

For those interested in photography, these image modifications may sound very familiar. And they are. If you wish, you can think of image augmentation as hiring thousands of photo editors to process your images, each with their own unique style.

Applications like Photoshop and Lightroom may also give you a clue as to what other data augmentations you can utilize during model training.

Image 3. Basic Panel in Adobe Lightroom Classic gives a clue what other color changes may be used as data augmentations. Image by Author

Imagination is the only limit. First of all, the list in Image 2 is not complete. There are so many ways to change an image that I do not believe a complete list even exists. So feel free to look for more augmentation types on the internet if you feel these are not enough.

Secondly, to each image in the dataset, you may:

  • apply a single augmentation or a sequence;
  • change the order in which augmentations are applied;
  • randomize augmentation parameters, like rotation angle or brightness range;
  • randomize the probability that a particular augmentation is applied.
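These four ideas can be sketched as a tiny randomized pipeline. The function and parameter names below are my own, not from any library – real pipelines would use something like `A.Compose([...])` in Albumentations:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_brightness(img, low=-30, high=30):
    """Shift brightness by a random amount drawn from [low, high]."""
    delta = int(rng.integers(low, high + 1))
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def random_hflip(img):
    """Mirror the image left-right."""
    return img[:, ::-1]

def apply_pipeline(img, steps):
    """steps: ordered list of (augmentation_fn, probability).
    Each augmentation is applied with its own probability."""
    for fn, p in steps:
        if rng.random() < p:
            img = fn(img)
    return img

# A sequence with randomized parameters and per-step probabilities
pipeline = [(random_hflip, 0.5), (random_brightness, 0.8)]
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = apply_pipeline(img, pipeline)
```

Reordering the list changes the order of application, and every call produces a different augmented variant – which is exactly what gives you the "infinite dataset" effect.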

Choosing what augmentations (and in what order) to use is your task, as a Data Scientist.

What about blank borders? Sometimes blank borders appear after augmentation – for instance, after a rotation. By default, augmentation libraries fill the blank pixels with white, black, or grey. You may leave it as is, or go further and:

  • Change the color to some other, for instance, blue.
  • Replicate pixels – duplicate pixels close to the border to fill the blank space.
  • Reflect – mirror the part of the image that is close to the border.
  • Wrap – copy pixels from the opposite side of the image.

Image 4. Ideas on how to deal with blank borders. Image by Author
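All four strategies map directly onto NumPy's padding modes, so a quick sketch (here the "border" is simulated by padding a small image):

```python
import numpy as np

img = np.arange(16, dtype=np.uint8).reshape(4, 4)  # tiny fake image
pad = 2  # width of the blank border to fill

constant  = np.pad(img, pad, mode="constant", constant_values=255)  # solid color
replicate = np.pad(img, pad, mode="edge")      # duplicate pixels near the border
reflect   = np.pad(img, pad, mode="reflect")   # mirror the image edge
wrap      = np.pad(img, pad, mode="wrap")      # copy pixels from the opposite side
```

Augmentation libraries expose the same choices – for instance, OpenCV-based tools call them border modes.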

Pay attention to dataset labels. Depending on the dataset and task, some data augmentations may change not only the images but also their labels.

When training the "Cat versus Dog?" classifier, none of the augmentations in Image 2 would turn a cat into a dog. For instance, when cropping an image, part of the dog is still a dog. However, if you are solving an object detection task, such as "find the dog in the image", a crop will change the location label (bounding box) of the original image.

Image 5. Depending on the task, the same augmentation may affect image labels differently. Image by Author

Dirty labels introduce noise into the model and worsen its performance. So whenever you consider using a particular data augmentation, ask yourself: "Does this augmentation change the image labels?"

If the answer is "Yes", it is up to you what to do next:

  • You may avoid using the augmentation;
  • Or modify the labels accordingly, so they match the augmented images.
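As a minimal sketch of the second option: for a horizontal flip in an object detection task, the bounding box must be mirrored together with the image. The helper name and the bbox format are my own choices for illustration:

```python
import numpy as np

def hflip_with_bbox(img, bbox):
    """Horizontally flip an image and keep its bounding box consistent.
    bbox = (x_min, y_min, x_max, y_max) in pixel coordinates."""
    h, w = img.shape[:2]
    x_min, y_min, x_max, y_max = bbox
    flipped_img = img[:, ::-1]
    # x coordinates are mirrored around the image width; y is untouched
    flipped_bbox = (w - x_max, y_min, w - x_min, y_max)
    return flipped_img, flipped_bbox

img = np.zeros((100, 200, 3), dtype=np.uint8)   # fake 200-pixel-wide image
img, bbox = hflip_with_bbox(img, (10, 20, 60, 80))
# bbox is now (140, 20, 190, 80): the box moved to the mirrored position
```

Libraries like Albumentations automate exactly this bookkeeping (via `bbox_params`), so in practice you rarely write it by hand.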

What Papers Say

In this section, I will show several examples of how image augmentation is used in research projects and what benefits it brings.

LeNet-5. Yann LeCun and co-authors published a paper [1] describing one of the earliest convolutional neural networks – and used… data augmentations. It was 1998, and at that time they called them "data distortions". They concluded that the test set error dropped slightly when the model was trained on the augmented dataset, compared to training without data augmentations.

"…We artificially generated more training examples by randomly distorting the original training images… The distortions were combinations of the following planar affine transformations: horizontal and vertical translations, scaling, squeezing (simultaneous horizontal compression and vertical elongation, or the reverse), and horizontal shearing…" [1]

Image 6. Image Augmentations used to train LeNet-5. Image by Author

AlexNet. Extensive data augmentations were used to train AlexNet, the network from the "ImageNet Classification with Deep Convolutional Neural Networks" paper [2]. AlexNet is a huge neural network with 60 million parameters, so A LOT of data is needed to train it. The authors mention that without data augmentations the model suffered from overfitting.

"…The first form of data augmentation consists of generating image translations and horizontal reflections. We do this by extracting random 224×224 patches (and their horizontal reflections) from the 256×256 images and training our network on these extracted patches. Without this scheme, our network suffers from substantial overfitting, which would have forced us to use much smaller networks… The second form of data augmentation consists of altering the intensities of the RGB channels in training images…" [2]

Image 7. Data Augmentations used to train AlexNet. Image by Author
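The crop-and-flip scheme from the quote can be sketched in a few lines of NumPy. This is a simplification, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_flip(img, size=224):
    """Random size x size crop plus a random horizontal flip,
    in the spirit of the scheme described in [2]."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    patch = img[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]   # horizontal reflection
    return patch

img = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
patch = random_crop_flip(img)   # shape (224, 224, 3)
```

Note how cheap this is: each 256×256 image yields 32×32 crop positions times 2 reflections, which is where the paper's factor-of-2048 increase in training set size comes from.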

U-Net. Paper [3] shows how data augmentation can help solve medical imaging tasks when limited data is available. In one of the experiments, the authors successfully trained a segmentation model using only 30 images of 512×512 pixels. Isn’t that impressive?

"…In case of microscopical images we primarily need shift and rotation invariance as well as robustness to deformations and gray value variations. Especially random elastic deformations of the training samples seem to be the key concept to train a segmentation network with very few annotated images…"[3]

Image 8. Data Augmentations used to train U-Net on medical data (medical data is usually greyscale). Image by Author
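Random elastic deformation can be approximated with a smoothed random displacement field. Below is a crude NumPy-only sketch with nearest-neighbour sampling – the paper itself uses displacements on a coarse grid with bicubic interpolation, and the parameter names here are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

def elastic_deform(img, alpha=8, sigma=4):
    """Crude elastic deformation: shift every pixel by a smoothed
    random displacement field, scaled by alpha."""
    h, w = img.shape[:2]

    def smooth(field, k=2 * sigma + 1):
        # separable box filter as a stand-in for Gaussian smoothing
        kernel = np.ones(k) / k
        field = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 0, field)
        field = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, field)
        return field

    dx = smooth(rng.uniform(-1, 1, (h, w))) * alpha
    dy = smooth(rng.uniform(-1, 1, (h, w))) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip((ys + dy).round().astype(int), 0, h - 1)
    src_x = np.clip((xs + dx).round().astype(int), 0, w - 1)
    return img[src_y, src_x]   # nearest-neighbour sampling

img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # fake greyscale image
warped = elastic_deform(img)   # same shape, locally distorted
```

Production implementations (e.g. `ElasticTransform` in Albumentations) do the same thing with proper Gaussian smoothing and interpolation.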

How to Choose Augmentations for Your Task

Three things: domain expertise, business need, and common sense – that is what you need to create a good data augmentation pipeline.

Domain Expertise. Depending on your project’s domain, some data augmentations make sense and some just don’t.

For instance, when working with satellite images, a good choice of augmentations would be cropping, rotations, reflections, and scaling, because they do not introduce distortion to objects like buildings [4].

Image 9. Examples of data augmentation for satellite images. Image source: [4]

On the other hand, when working with medical images, a better choice would be color transformations, grid distortion, and elastic transform [4].

Image 10. Examples of data augmentation for medical images. Image source: [4]

Business Need. Imagine you are responsible for developing a Computer Vision system for a self-driving car. Should you use Horizontal Flips within your data augmentation pipeline?

Image 11. Are you going to use Horizontal Flips when developing a Computer Vision system for a self-driving car? Image by Author

Well, it depends. So the better question is: is your Computer Vision system expected to see upside-down images, and is it expected to be able to segment them?

What happens if the car is in an accident and the cameras end up upside down? Will the Computer Vision system stop segmenting and turn off? Or should it keep working and continue segmenting objects?

There is no single right answer here, and these are questions you should probably discuss with your Product Managers.

Common sense. There is such a thing as too many augmentations. Stop modifying the image before it becomes visually unrecognizable. If a human cannot understand what is in the image, how can you expect the model to?

Image 12. About half of these examples are augmented too much. Image source: [5]

Hint. If you are still not sure whether using a particular data augmentation is a good idea – do the research. Train several models using different data augmentation pipelines, and compare their accuracy on the same validation set.
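That comparison can be organized as a small harness. The sketch below uses dummy stand-ins for training and scoring – `train_fn`, `score_fn`, and the pipeline names are all hypothetical placeholders for your own code:

```python
def compare_pipelines(pipelines, train_fn, score_fn):
    """Train one model per augmentation pipeline and score each
    on the same validation set. Returns the best pipeline name
    and the full score table."""
    scores = {name: score_fn(train_fn(p)) for name, p in pipelines.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Dummy stand-ins so the sketch runs end to end:
pipelines = {"no_aug": [], "flips_only": ["hflip"], "heavy": ["hflip", "blur", "noise"]}
best, scores = compare_pipelines(
    pipelines,
    train_fn=lambda p: p,                                     # pretend training
    score_fn=lambda m: {0: 0.81, 1: 0.86, 3: 0.79}[len(m)],   # pretend validation accuracy
)
# best == "flips_only"
```

The key discipline is in the last argument: every pipeline must be scored on the same, never-augmented validation set, or the comparison is meaningless.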

Image Augmentation in PyTorch and TensorFlow

We will not go into details here – maybe later I will write a separate post about that. Let’s discuss just a few important points.

  • Augmentation in Deep Learning is usually done online. This means that augmented images are not stored on the hard drive – only the raw dataset is. When a batch of raw data is loaded, it is augmented on the fly, used for training, and then released from memory.
  • Here is how to do Image Augmentation in Pytorch: documentation
  • Here is how to do Image Augmentation in TensorFlow: documentation
  • The default PyTorch and TensorFlow implementations augment only images, not labels. If you need to augment both images and labels, you should write the augmentation functions yourself, or use third-party libraries such as Albumentations.
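The online pattern from the first bullet can be sketched without any framework: a Dataset-like class stores only the raw images and applies the transform at access time. The class mirrors the PyTorch `Dataset` interface (`__len__`/`__getitem__`) but is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

class AugmentedDataset:
    """Online augmentation: only raw images are stored; each access
    applies the transform on the fly, so augmented copies exist
    only transiently in memory."""

    def __init__(self, images, labels, transform=None):
        self.images = images        # raw data stays as-is
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        if self.transform is not None:
            img = self.transform(img)   # augment on the fly
        return img, self.labels[idx]

raw = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(4)]
ds = AugmentedDataset(raw, labels=[0, 1, 0, 1], transform=lambda im: im[:, ::-1])
img0, label0 = ds[0]   # horizontally flipped copy of raw[0], label unchanged
```

With a random transform instead of this fixed flip, every epoch would see a different variant of each raw image – the "infinite dataset" effect in practice.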

What’s Next

I believe that’s all the theory you need to know about Image Augmentation. With time and practice, you’ll gain more experience and will easily be able to tell which data augmentations perform better for a particular task. So, practice, practice, and more practice!

P.S. If you’re interested in tutorials on image augmentation in PyTorch / TensorFlow and on Albumentations – just let me know!


References

[1] "Gradient-Based Learning Applied to Document Recognition". Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Paper

[2] "ImageNet Classification with Deep Convolutional Neural Networks". Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. Paper

[3] "U-Net: Convolutional Networks for Biomedical Image Segmentation". Olaf Ronneberger, Philipp Fischer, and Thomas Brox. Paper

[4] "Albumentations: fast and flexible image augmentations". Alexander Buslaev, Alex Parinov, Eugene Khvedchenya, Vladimir I. Iglovikov, Alexandr A. Kalinin. Paper

[5] imgaug, Python library for image augmentations. Github

