
Photos, photos everywhere…
As a child of the 70s, most of my early memories are trapped on old-school photographic media like slides and negatives. While these slides are likely to outlive me physically, they are hard to index and share, and even harder to back up in any meaningful way.
In the early 2010s, I decided to embark on a scanning project to digitise my family’s photographic history, both to preserve the images for the future and to make them easier to share with my family.
While the scanners that I used for the project¹ automatically applied exposure correction and dust removal, the process left me with 16 000 images that needed to be rotated manually.
After rotating a few hundred of the images by hand and almost dying of boredom, I decided to simply archive the scans and wait for a future where either:
- I had the time to rotate them manually, or
- technology had advanced to a point where a computer could do it for me.
Fast forward to 2020, and both futures arrived at once. When the second Australian coronavirus lockdown descended upon Melbourne in late 2020, I suddenly had some indoor time to spare and decided to take an online AI programming course. Practical Deep Learning for Coders is a free massive open online course (MOOC) that teaches you how to use a specific type of machine learning called deep learning in just 8 days.
Coming up with a solution
Deep Learning
Deep learning emulates the way our brains work by simulating neural networks using computer code. As a type of supervised machine learning, deep learning requires you to first train the system with some example data that has been labelled manually, after which you can ask it to recognise similar features in new data.
In my case, I wanted to use a few hand-rotated images to teach a neural network to recognise whether images were rotated left, rotated right, or upside down, and then feed that information to the open-source image manipulation utility ImageMagick to rotate them correctly.
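As a rough sketch of what that downstream rotation step could look like (the filename and angle here are made up, and ImageMagick's mogrify tool must be installed separately):

```python
import subprocess

# Hypothetical downstream step: once an image's orientation is known,
# ask ImageMagick's mogrify tool to rotate the file in place.
subprocess.run(["mogrify", "-rotate", "270", "scan_0001.jpg"], check=True)
```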
Rotating vs renaming
While my initial design called for a fully automated program that could simply be pointed at a directory containing some images in need of rotation, the fact that my images were in .jpg format meant that I needed a slightly different approach.
Unfortunately, modifying a .jpg image is a bit like dubbing an old-school tape: some of the image detail gets lost every time you open, modify, and save the file. As I also wanted to crop the images with a different piece of ML magic later in the process, actually rotating them in this step would mean opening, modifying, and saving each image twice. So I decided a better approach would be to simply identify which way each image pointed at this stage, and then embed that information in the file in a way that did not require opening and saving it again.
Unlike modifying its contents, simply renaming a .jpg file does not have any impact on its image quality. Thus, embedding the rotation information in the filename provided a good way to carry that information downstream to the rest of the workflow, without degrading the image quality. Renaming instead of rotating the files also had the side benefit of pivoting the resulting program from being a single-purpose image rotator to being a generic content-based file renamer that can be reused for other purposes too.
User-interaction design
For the model training step, the program is provided with a path to a folder containing several subfolders, each containing a set of images that have been selected or manipulated to share some common characteristic.
The files in each subfolder are then fed to the neural network training algorithm, which attempts to extract the characteristics that the images have in common. After completing the training phase, the system saves a model file containing the trained neural network as well as a list of the categories that it has learned.
Saving the neural network to disk after training has two benefits (see the sketch after this list):
- it allows running the renaming tool repeatedly without having to repeat the training if the computer is turned off, and
- it allows running the training step on a cloud service like Google Colab if you do not have access to a GPU-enabled machine at home.
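With the fastai library introduced later in this article, this persistence step might be a single call on the trained model object. A minimal sketch, in which learn is the trained fastai Learner and the filename is hypothetical:

```python
from fastai.vision.all import load_learner

# 'learn' is the trained fastai Learner produced later in this article.
learn.export('rotation_model.pkl')  # saves the network and its category labels

# Later, possibly on another machine or after a reboot:
learn_inf = load_learner('rotation_model.pkl')  # no retraining required
```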

For the file renaming step, the program is provided with a path to a folder containing the images to be renamed. The program loads the model file from disk and then proceeds to present each image in the input folder to the neural network. The result of the neural network calculation, called the inference, is a list of probabilities that a specific image belongs to each of the previously trained categories.
In the case of my image rotation model, a given image might score an 83% probability of being upright, a 7% probability of being rotated left, a 6% probability of being rotated right, and a 4% probability of being upside down. If any of the categories scores more than 50%, the program renames the file automatically by prefixing the filename with the name of the most probable category.
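To make this concrete, here is a sketch of what a single prediction could look like with the fastai library used later in this article (learn_inf is a model already loaded from disk, and the filename is made up):

```python
# Sketch: one inference call returning per-category probabilities.
pred_class, pred_idx, probs = learn_inf.predict('scan_0001.jpg')
print(learn_inf.dls.vocab)  # e.g. ['000', '090', '180', '270']
print(probs)                # e.g. tensor([0.83, 0.06, 0.04, 0.07]), in vocab order
print(pred_class)           # '000', the most probable category
```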
Preparing my data
To produce my training data, I initially took 200 images from the scanned image archive and rotated them correctly by hand. I tried to make sure that these would include at least some landscapes and some images of people so that the model would learn how to recognise both.
I then duplicated this folder three times and bulk-rotated the images so that one copy was rotated left, one rotated right, and one turned upside down.
```
.fastai
└── data
    └── familyphotos_small
        ├── 000
        ├── 090
        ├── 180
        └── 270
```
This left me with four folders that I named after the degrees of rotation of the enclosed images, with folder "000" holding the images that were upright, folder "090" holding the same set of images rotated to the right, folder "180" holding the images turned upside down, and folder "270" holding the images rotated to the left.

While I happened to use the names "000", "090", "180" and "270" for my folders, these are just text strings. I could just as easily have called the folders "left", "right", "upright" and "upside down", or A, B, C and D. The program code simply reads the folder names as text strings to label the categories.
Interestingly, the program is also insensitive to what the actual differences between the image files are. I could just as easily have made folders with images in categories like "beach", "snow" and "sports", and the neural network would have learnt to identify those. The codebase on which I based my program comes from the pet breeds example in the FastAi code repository, and can even learn to spot the difference between 37 individual breeds of cats and dogs!
Writing the code
These days, neural networks are incredibly easy to incorporate into programs you write yourself. A number of competing deep learning frameworks do the heavy lifting for you and are optimised to take advantage of the matrix multiplication hardware found in NVIDIA GPUs.
Two of the most popular deep learning frameworks, TensorFlow (originally developed by Google) and PyTorch (originally developed by Facebook), provide a set of basic machine learning subroutines that handle neural network training and inference.
To make these frameworks easier to use, several open-source add-on projects provide function libraries that wrap around the foundational frameworks. Examples include Keras for TensorFlow and FastAi for PyTorch.
As the Practical Deep Learning for Coders course that I followed uses PyTorch wrapped in FastAi, I used this combination for my file renaming project.
Many cloud providers, like AWS, Google Colab and Lambda Labs, offer virtual machines with the PyTorch library pre-installed. If you are unfamiliar with Linux, or do not have an NVIDIA GPU in your computer, these virtual machines offer a fast and cost-effective way to train your deep learning models in the cloud.
In my case, I used an Ubuntu machine with an NVIDIA GTX 1080 Ti graphics card that I bought second-hand for $650 from a gamer who had upgraded his PC. The FastAi documentation explains how to install both PyTorch and the FastAi library in a few easy steps.
Setting up the training data
The first part of the program sets up the environment and loads a list of the images that will be used for training the model.
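A minimal sketch of that setup, assuming the folder layout shown earlier:

```python
from fastai.vision.all import *

# Point at the hand-labelled training folders and collect the image files.
path = Path.home()/'.fastai'/'data'/'familyphotos_small'
fnames = get_image_files(path)
print(f'Found {len(fnames)} training images')
```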
FastAi uses the concept of a datablock to describe where to find the files that make up a dataset, how they are labelled and organised, and how to present them to the learning algorithm.
For most tasks, you wouldn’t want a trained neural network to be overly sensitive to the rotation of the objects that it is tasked to identify. Imagine you wanted to train a neural network to recognise Christmas trees. If, during training, you only showed it Christmas trees in the upright orientation, like photos that show them set up in living rooms, the network would only learn to recognise them in that orientation. If you subsequently asked the network to recognise an image of a tree lying on its side, like a picture featuring a tree strapped to the roof of a car during transportation, the network would likely fail to recognise the tree entirely.
One of the ways to ensure that a neural network learns to identify objects regardless of their orientation, is to show it the images in your training set multiple times while rotating them by random amounts each time. This way, even if you only have pictures of Christmas trees standing upright, your network would be shown some images of trees leaning to the left, some images of trees leaning to the right, and even some images of trees that are rotated entirely upside down.
This technique is so effective that the FastAi library makes it easy to add image rotation to your datablock configuration through its transform options. For this project, however, adding rotation to the training data would be counterproductive. As we specifically want the network to learn to identify orientation regardless of the objects in the picture, we need to leave rotation out of the transforms used during training, as the datablock definition below shows.
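Here is a sketch of what that datablock definition might look like, continuing from the setup sketch above (the validation split and seed are illustrative choices, not necessarily mine):

```python
# Build a datablock that labels each image by its parent folder name
# ('000', '090', '180' or '270') and augments only with random
# resized crops -- deliberately no rotation.
photos = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(224, min_scale=0.5))

dls = photos.dataloaders(path)  # 'path' from the setup sketch above
```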
If you look closely at the datablock code above, you will see the line item_tfms=RandomResizedCrop(224, min_scale=0.5). This tells the datablock constructor to augment the data only with random amounts of resizing and cropping. It is used here instead of the more common batch_tfms=aug_transforms(size=224, min_scale=0.5), which would also add rotation and flipping to the augmentations.
If you use this program for any task other than specifically identifying image rotation, using the aug_transforms option would almost always produce better results. More information about what the aug_transforms option does can be found in the FastAi documentation.
Training the model
Next comes the code to train the neural network. The FastAi library includes an lr_find function that helps you find an appropriate learning rate before kicking off the actual training process.
Known as a hyper-parameter because it controls an aspect of how the learning algorithm behaves, the learning rate tells the algorithm what step size to use when increasing or decreasing network weights during training.
If your program takes steps that are too small, your model will take days to make the progress it could be making in minutes. If your program takes steps that are too large, your model will repeatedly overshoot and never find an optimal solution.
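As a sketch of how this fits together, assuming the dls dataloaders from the datablock above (the resnet34 backbone and the 1e-3 learning rate are placeholders, not necessarily my exact choices):

```python
# Build a transfer-learning model on top of a pretrained backbone
# and plot loss against learning rate to pick a good step size.
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.lr_find()

# Train for 25 epochs using the learning rate suggested by lr_find
# (1e-3 here is a placeholder value).
learn.fine_tune(25, base_lr=1e-3)
```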
The learning rate calculated by lr_find is then passed to the training call, and the model is asked to run through the dataset 25 times. Each of these repetitions, called epochs, took about 2 minutes on my GPU, yielding a total training time of about 45 minutes.
The same library can be instructed to use the CPU if the computer does not have a compatible GPU, at the cost of a longer wait. When I attempted to run the same code on my computer’s CPU, each epoch took about 40 minutes instead of 2 minutes, which would have added up to 16 hours in total for the same training job.
For something that you will only need to do once, an extra half a day is hardly a problem, but being able to run the entire process in 45 minutes instead of 16 hours definitely makes code debugging easier.
After the 25 epochs, the model scored a 2.7% error rate, which meant that it guessed the orientation of the images in the test set correctly 97.3% of the time.
When the error rate plateaus, it is often a good idea to have a look at which images the network still misinterprets, as this may indicate where to look for a problem, or that the network has simply learnt what it could and is now just struggling with examples that even a human would have trouble with.
In FastAi, an overview of the images that are most often misinterpreted can be produced by calling the interp.plot_top_losses function, as sketched below. Looking at the output reveals that the errors are confined to images that even humans would have difficulty judging correctly.
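A sketch of that call, assuming the trained learn object from the training step:

```python
# Build an interpretation object and show the nine images
# the trained model gets most badly wrong.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(12, 9))
```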

The first image, for example, shows a 3-year-old me balancing on my dad’s feet while he lay on his back taking the picture toward the ceiling: not the easiest image to guess the rotation of, even looking at it now…
Using the model
With the model trained and saved to disk, we can move on to using it for inference. This section starts with a repeat of the environment setup, as it assumes that the inference step would happen either on a different day, or on a different system than the training step.
After the neural network is loaded from disk, a quick check can be made of the categories that the model is trained to recognise by printing the learn_inf.dls.vocab variable.
In this case, the loaded model reports that it is trained to recognise categories called "000", "090", "180" and "270". As this matches the directory names of the original image rotation training data, I know I have the right model and I know which labels it will use to prefix my image files.
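A sketch of that check, assuming the model was exported under the hypothetical name rotation_model.pkl:

```python
from fastai.vision.all import *

# Reload the trained Learner from disk; no GPU or training data needed.
learn_inf = load_learner('rotation_model.pkl')  # hypothetical filename
print(learn_inf.dls.vocab)  # expect ['000', '090', '180', '270']
```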
The program is then supplied with the name of the directory that contains the files to be classified and processes them one by one. The file SN_600_26.jpg, for example, is presented to the network, which predicts that there is a 92% chance that the image is upside down (i.e., that it resembles the images that were in the training directory named "180").
```
************************************
Old Filename: /media/streicher/2TB_ExFAT/Streicher_Negatives/SN_600_26.jpg
Predicted Class: 180
Predicted Confidence: 0.9214386343955994
New Filename: /media/streicher/2TB_ExFAT/Streicher_Negatives/180_SN_600_26.jpg
************************************
```
As the prediction confidence is higher than 50%, the program goes ahead and renames the file to 180_SN_600_26.jpg. Had the confidence been 50% or lower, the program would instead have renamed the file to UNK_SN_600_26.jpg, to signal that the rotation was unknown.
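Putting it all together, the renaming loop might look like this sketch (the folder name is made up; learn_inf is the model loaded in the previous sketch, and the threshold and UNK prefix follow the behaviour described above):

```python
from pathlib import Path

# Classify every .jpg in the input folder and prefix its filename with
# the predicted category, or with 'UNK' when confidence is 50% or less.
for img_path in sorted(Path('Streicher_Negatives').glob('*.jpg')):
    pred_class, pred_idx, probs = learn_inf.predict(img_path)
    confidence = float(probs[pred_idx])
    prefix = pred_class if confidence > 0.5 else 'UNK'
    img_path.rename(img_path.with_name(f'{prefix}_{img_path.name}'))
```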
When all was said and done, my program’s entire functional code amounted to just 23 lines. 23 lines to assemble a data set, train a neural network, save it to disk and use it to perform a task that would have taken a human several weeks to complete.
Conclusion
Libraries like Keras and FastAi make the power of deep learning available in easy-to-use functions that even novice programmers can use to solve complex tasks.
The ease of use and low cost of implementation offered by these libraries mean that the break-even point at which it makes sense to automate an otherwise manual task is shifting rapidly in automation's favour.
Relentless advances in GPU technology mean that cheap second-hand gaming PCs can easily be repurposed into powerful data science workstations capable of running the latest AI software.
Free, self-paced, massive open online courses like fast.ai's Practical Deep Learning for Coders and Coursera's [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) offer the opportunity to teach yourself the basics of machine learning and how to apply it to real-world tasks.
The complete source code for this project can be found on my GitHub repository at https://github.com/streicherlouw/DeepRename
