Augmenting Your Training Data for Image Recognition

Expand your machine learning training dataset using the Augmentor library

Littleton Riggins
Towards Data Science

Introduction To Image Recognition

Image recognition has emerged as one of the most popular applications of machine learning in recent years. Image recognition (IR) is often used to detect specific people, places, or things inside a larger image. IR has proven useful for tasks like brand detection, crime prevention, and search result optimization.

IR works through rigorous training of a machine learning model. Every object that needs to be detected is shown to the model in a series of images that act as the training data. The training data consists of many pictures of the same object taken from different angles and in different surroundings, so that the model learns to isolate the object of interest.

Training Data for “Apples” from Open Images

Models get stronger as the training data becomes more varied and more numerous. For common objects, such as apples, a plethora of training images is available, so anyone with a search engine can easily gather enough training data. Obtaining copious amounts of training data is a necessity because it can take thousands of images to properly train an IR model. A problem arises, however, when the objects of interest are more specialized and reference images are scarce: how does one obtain enough training data?

Problem Statement

Suppose we wanted to build an application that could detect a certain Magic: The Gathering card in a user’s photo:

Thieving Otter — Art by Jakub Kasper

We can get the original source image from a Magic: The Gathering database such as Scryfall, but this image alone is not enough for training. We will need several variations of this card image, which can be especially difficult to get if we don’t have a physical copy of this card. Instead of trying to buy the card, take several pictures with our camera, and then upload and organize those images, we can take a more programmatic approach.

Augmenting The Data

Data augmentation is one way to generate a sufficient supply of training images. Starting with a single image, we can apply pre-built image manipulation algorithms to our source and generate new images that can be used in the training process. If you’re familiar with Python, you can use the Augmentor library to generate new images. Augmentor allows you to specify the number and type of manipulations you want for each of your source images and then save them as new local files.

We can use Augmentor to generate some new Thieving Otters with the following Python snippet:
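A minimal sketch of such a script might look like this. The function name, the folder name, and every probability and range below are assumptions chosen for illustration rather than the exact values behind the sample images; only the Augmentor calls themselves (Pipeline, rotate_without_crop, skew, random_brightness, random_contrast, sample) come from the library.

```python
def generate_images(source_dir: str = "input", count: int = 100) -> None:
    """Generate `count` augmented images from the pictures in `source_dir`."""
    # Third-party dependency (pip install Augmentor), imported lazily so this
    # file can still be loaded on a machine where it is not installed.
    import Augmentor

    # By default Augmentor saves results to an "output" folder inside source_dir.
    pipeline = Augmentor.Pipeline(source_dir)

    # Each operation fires with its own probability on every generated image.
    # All probabilities and ranges here are illustrative assumptions.
    pipeline.rotate_without_crop(
        probability=0.5, max_left_rotation=10, max_right_rotation=10
    )
    pipeline.skew(probability=0.5, magnitude=0.2)
    pipeline.random_brightness(probability=0.5, min_factor=0.8, max_factor=1.2)
    pipeline.random_contrast(probability=0.5, min_factor=0.8, max_factor=1.2)

    # Draw `count` samples from the source images and write them to disk.
    pipeline.sample(count)
```

Calling `generate_images()` from the folder that contains “input” would run the pipeline with these assumed defaults.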

With thieving_otter.jpg in the “input” folder, this script will generate 100 randomized and slightly different images that represent the source at various angles and under various lighting conditions.

Processing Pipeline

Augmentor works as a pipeline that processes the images in the input folder a set number of times. In our example, we ask it to generate 100 images. For each generated image, Augmentor walks through the operations assigned to the pipeline and applies each one with the probability configured for it, so a new image may have all, some, or none of the operations applied. Each operation also has a range that limits how strongly it is applied. For example, we configure the “rotate_without_crop” operation so that it can turn the source image no more than 10 degrees in either direction.
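To make the per-image coin flips concrete, here is a small simulation in plain Python. It is a simplified model of the behaviour described above, not Augmentor's internal code, and the operation names, probabilities, and helper functions are assumptions made for illustration.

```python
import random
from math import prod

# Hypothetical operations and the chance that each one fires on a given image.
OPERATION_PROBABILITIES = {
    "rotate_without_crop": 0.5,
    "skew": 0.5,
    "random_brightness": 0.5,
}


def choose_operations(probabilities, rng):
    """Return the names of the operations that fire for one generated image."""
    return [name for name, p in probabilities.items() if rng.random() < p]


def draw_rotation(max_left, max_right, rng):
    """Pick an angle in degrees; negative means left, positive means right."""
    return rng.uniform(-max_left, max_right)


def chance_of_untouched_image(probabilities):
    """Probability that no operation fires, assuming independent draws."""
    return prod(1 - p for p in probabilities.values())


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for image_number in range(1, 6):
    applied = choose_operations(OPERATION_PROBABILITIES, rng)
    line = f"image {image_number}: {applied or 'no operations applied'}"
    if "rotate_without_crop" in applied:
        line += f", rotated {draw_rotation(10, 10, rng):.1f} degrees"
    print(line)

# With three independent operations at 0.5 each: 0.5 * 0.5 * 0.5
print(chance_of_untouched_image(OPERATION_PROBABILITIES))  # 0.125
```

Under these assumed settings, roughly one image in eight would come out identical to the source, which is one reason to look at the probabilities as well as the ranges when tuning a pipeline.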

Choosing Operations

The Augmentor library has a plethora of operations that you can apply to your image, but which operations you need will depend upon your use case. The idea is to choose manipulations that produce images resembling the ones you expect your users to submit.

Suppose the images we’re trying to scan come from people’s phones, by way of a hypothetical card scanning app. Users are told by the app to take their phone, point it at some Magic cards, hold still, and take a picture. If this is the scenario we’ll be operating under, we can be reasonably sure we’ll have relatively clean pictures to scan through. As such, we can mimic these conditions with a few gentle operations, such as small rotations, slight skews, and modest lighting changes.

The combination of these operations at relatively low thresholds can be used to reproduce a “home lighting” type scenario. We’ve left out operations that trim or crop the image because nobody will ever have a “piece of a card” to scan; the whole card will always be in the picture. In the example, we’ve also made the upper and lower bounds of each operation relatively small, since we’re expecting the cards to be scanned in a controlled environment. We don’t expect users to be taking pictures of the cards upside down, with a bright sun-lamp glaring at the cards, or at an extreme angle. If these were possible conditions for our use case, we could change the values on our operations quite easily.
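One way to keep those bounds easy to change is to hold them in named presets. The sketch below is only an illustration: the preset names, the apply_preset helper, and every number are hypothetical, while the method names (rotate_without_crop, skew, random_brightness, random_contrast) are Augmentor pipeline operations.

```python
# Hypothetical presets; every number here is an assumption to be tuned.
PRESETS = {
    # Cards lying on a table, photographed carefully indoors.
    "controlled": {
        "rotate_without_crop": {
            "probability": 0.5, "max_left_rotation": 10, "max_right_rotation": 10,
        },
        "skew": {"probability": 0.3, "magnitude": 0.1},
        "random_brightness": {"probability": 0.5, "min_factor": 0.9, "max_factor": 1.1},
        "random_contrast": {"probability": 0.5, "min_factor": 0.9, "max_factor": 1.1},
    },
    # Rougher conditions: steeper angles and stronger lighting swings.
    "harsh": {
        "rotate_without_crop": {
            "probability": 0.8, "max_left_rotation": 25, "max_right_rotation": 25,
        },
        "skew": {"probability": 0.6, "magnitude": 0.5},
        "random_brightness": {"probability": 0.8, "min_factor": 0.5, "max_factor": 1.5},
        "random_contrast": {"probability": 0.8, "min_factor": 0.5, "max_factor": 1.5},
    },
}


def apply_preset(pipeline, preset_name):
    """Add every operation in the named preset to an Augmentor pipeline."""
    for method_name, kwargs in PRESETS[preset_name].items():
        # Look up the pipeline method by name, e.g. pipeline.skew(**kwargs).
        getattr(pipeline, method_name)(**kwargs)
    return pipeline
```

With a pipeline created from `Augmentor.Pipeline("input")`, switching scenarios would then be a one-word change, for example `apply_preset(pipeline, "harsh").sample(100)`.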

Our New Images

After running our script, the “output” folder will have 100 randomized images that we can feed into any machine learning model.
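Before handing the folder to a model, it can be worth confirming that the expected number of files was written. The helper below is a small sketch using only the standard library; the function name and the set of file extensions are assumptions, and the folder path you pass in depends on how the pipeline was configured (by default Augmentor writes to an “output” folder inside the source folder).

```python
from pathlib import Path

# File extensions treated as images; an assumption, extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_generated_images(output_dir):
    """Return the image files found directly inside `output_dir`, sorted by path."""
    folder = Path(output_dir)
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

For the run described above, `len(list_generated_images("input/output"))` should come to 100 if the default output location was used and the folder was empty beforehand.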

A sampling of output images from augmentor_example.py script

Once these images are fed to the machine learning tool of your choice, you can run the script again and again, comparing the results and tweaking the operations until you get a high match percentage with your own scans. For example, some of the above images are skewed quite heavily, so we might consider lowering the skew values, since we expect our users to be scanning cards that are lying flat on a table. Making this change would likely increase the match rate of our image scanner.
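To compare one round of tweaks against another, it helps to reduce each run to a single number. The sketch below assumes you can collect, for a set of your own scans, the label your model predicted and the label you expected; the match_rate function and the idea of string labels are hypothetical and not tied to any particular machine learning tool.

```python
def match_rate(predicted_labels, expected_labels):
    """Return the fraction of scans whose predicted label equals the expected one."""
    if len(predicted_labels) != len(expected_labels):
        raise ValueError("both lists must describe the same scans")
    if not expected_labels:
        raise ValueError("at least one scan is needed")
    matches = sum(
        predicted == expected
        for predicted, expected in zip(predicted_labels, expected_labels)
    )
    return matches / len(expected_labels)
```

Recording this value after each change to the skew, rotation, or lighting settings gives a simple basis for deciding whether a tweak helped.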

Final Thoughts

There are many different ways to gather training data for machine learning, and data augmentation is only one such method. The Augmentor Python library is easy to use and a great tool for generating a large number of augmented images. Experiment with the operations and build whatever fits your needs. Happy coding!

Magic: The Gathering and any related content or imagery is owned by Wizards of the Coast and was used in accordance with Wizards’ Fan Content Policy.
