Training Yolo for Object Detection in PyTorch with Your Custom Dataset — The Simple Way

Chris Fotache
Towards Data Science
5 min read · Oct 9, 2019


In a previous story, I showed how to do object detection and tracking using the pre-trained Yolo network. Now I want to show you how to re-train Yolo with a custom dataset made of your own images.

For this story, I’ll use my own example of training an object detector for the DARPA SubT Challenge. The challenge involved detecting 9 different objects inside a tunnel network, and they are very specific objects, not the regular ones included in the standard Yolo model. For this example, I’ll assume there are just 3 object classes.

There are several ways to do it. You can define the locations of the images, configuration, annotations and other data files in the training script itself, according to the official specifications, but here’s a simpler, well-organized way that still follows Yolo’s best practices.

Folder Structure

First of all, you need to get all your training images together, using this folder structure:

Main Folder
--- data
    --- dataset name
        --- images
            --- img1.jpg
            --- img2.jpg
            ..........
        --- labels
            --- img1.txt
            --- img2.txt
            ..........
        --- train.txt
        --- val.txt
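If you’d rather create that layout from a script than by hand, a couple of lines of Python will do it (here I’m using “artifacts” as the dataset name, to match the split script below; substitute your own):

import os

base = os.path.join("data", "artifacts")  # dataset name goes here
for sub in ("images", "labels"):
    # Creates data/artifacts/images and data/artifacts/labels
    os.makedirs(os.path.join(base, sub), exist_ok=True)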

Now let’s see what those files are supposed to look like (besides the image files, which are obvious).

First, the annotation files. You need one .txt file for each image (same name, different extension, separate folder). Each file contains one row per object in the image, in this format:

class x y width height

So, for example, one file (for class 1) could be:

1 0.351466 0.427083 0.367168 0.570486

Note that x and y are the coordinates of the center of the bounding box, and that all four values are expressed as a proportion of the entire image size. For example, if the image is 600x600px, a point at (200,300) would be represented as (0.333333, 0.5).
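If your existing annotations use pixel corner coordinates, converting them is just arithmetic. Here’s a quick sketch (to_yolo is a hypothetical helper, not something from the repo):

def to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    # Convert a pixel-space corner box to Yolo's normalized center/size format
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height

# A 200x200px box centered at (200, 300) in a 600x600 image:
print(to_yolo(100, 200, 300, 400, 600, 600))  # -> (0.3333, 0.5, 0.3333, 0.3333) approx.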

The train.txt and val.txt files contain the lists of training and validation images, one per line, with full paths. For example, the first 2 lines of such a file on my system are:

/datadrive/Alphapilot/data/alpha/images/IMG_6723.JPG
/datadrive/Alphapilot/data/alpha/images/IMG_6682.JPG

I used the following program to generate the 2 files, based on a 90% training / 10% validation split:

import glob
import os

current_dir = "./data/artifacts/images"
split_pct = 10  # 10% validation set

file_train = open("data/artifacts/train.txt", "w")
file_val = open("data/artifacts/val.txt", "w")

counter = 1
index_test = round(100 / split_pct)  # every 10th image goes to validation
for fullpath in glob.iglob(os.path.join(current_dir, "*.JPG")):
    title, ext = os.path.splitext(os.path.basename(fullpath))
    if counter == index_test:
        counter = 1
        file_val.write(current_dir + "/" + title + ".JPG" + "\n")
    else:
        file_train.write(current_dir + "/" + title + ".JPG" + "\n")
        counter = counter + 1

file_train.close()
file_val.close()

Creating Annotation Files

Now you’re probably asking how to get the .txt annotation files. Well, I’m using a modified version of the BBOX tool, which is included in the Github repo. It works like this: you place the training images in a separate folder for each class. Look at the LoadDir function to figure out the folder structure (or modify it for yours). In my example, I have two folders, “forboxing” for images and “newlabels” for the generated annotations, and under “forboxing” there are subfolders for each class (“0”, “1”, etc). You’ll have to modify the self.imgclass attribute at the top of the file and run the tool separately for each class, which makes everything a bit quicker. Using the tool itself is very intuitive: you just draw a box around the object in each frame, then go to the next.
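Whichever tool you use, it’s worth a quick sanity check before training that every image actually has a matching annotation file. A minimal sketch, assuming the data/artifacts layout from earlier:

import glob
import os

img_dir = "./data/artifacts/images"
lbl_dir = "./data/artifacts/labels"
for img in glob.glob(os.path.join(img_dir, "*.JPG")):
    name = os.path.splitext(os.path.basename(img))[0]
    label = os.path.join(lbl_dir, name + ".txt")
    if not os.path.isfile(label):
        print("Missing annotation:", label)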

Config Files

Now for the config files in the config/ folder. First, coco.data would look like this:

classes=3
train=data/alpha/train.txt
valid=data/alpha/val.txt
names=config/coco.names
backup=backup/
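Under the hood, most PyTorch Yolo ports read this file with a tiny key=value parser; here’s a minimal sketch of that idea (illustrative, not necessarily the repo’s exact code):

def parse_data_config(path):
    # Read key=value lines from the .data file into a dict
    options = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options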

The settings themselves are quite self-explanatory. The backup parameter is not used, but seems to be required. The coco.names file is very simple: it lists the class names, one per line. The order matters, because the first line corresponds to class 0 in the annotation files, the second to class 1, and so on. In my case, the file contains the three classes:

DRILL
EXTINGUISHER
RANDY
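Training and detection scripts typically read this file into a plain list, where the line index doubles as the class id. A minimal sketch:

def load_classes(path):
    # One class name per line; position in the list = class id in the labels
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

names = load_classes("config/coco.names")
print(names[0])  # DRILL, i.e. class 0 in the annotation files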

Now, the most important of the configuration files is yolov3.cfg. It’s a big file, but here are the main things you have to change:

In the first [net] section, adjust the batch and subdivisions values to fit your GPU memory. The larger the batch size, the better and faster the training, but the more memory it will take. For an Nvidia GPU with 11 GB of memory, a batch of 16 and 1 subdivision works well. You can also adjust the learning_rate here.
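For reference, the relevant [net] lines would look something like this (the batch and subdivisions values match the 11 GB example above; learning_rate=0.001 is the stock default in yolov3.cfg):

[net]
batch=16
subdivisions=1
learning_rate=0.001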

Now, the most important settings (because if they’re not set correctly, your training program will fail) are the classes and final-layer filters values, and you have to change them in three different places in the file. If you search the file, you’ll find 3 [yolo] sections. Inside each of them, set classes to the number of classes in your model. You also have to change the filters value in the [convolutional] section right above each [yolo] section. That value is equal to:

filters = (classes + 5) x 3

So for my 3 classes, there are 24 filters. Be careful: this is only true for Yolo V3; V2 uses a different formula.
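In the file, each of the three places would end up looking like this (excerpt only; all other keys in those sections stay at their defaults):

[convolutional]
# 24 = (3 classes + 5) x 3
filters=24

[yolo]
classes=3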

Running the Training Script

And now you’re ready for the actual training! The training program (from the Github repo) is the standard Yolo script. In the config section, set your desired number of epochs, make sure the folder paths are correct, and then run. Depending on the number of training images and your hardware, this can take between a couple of hours and more than a day.

The script will save a weights file after each epoch. Grab the last one, put it in your config folder, and you’re ready to do object detection on your custom dataset! Details on how to run the detection functions are in the previous story, Object detection and tracking in PyTorch.

All the code referenced in this story is available in my Github repo.

Chris Fotache is an AI researcher with CYNET.ai based in New Jersey. He covers topics related to artificial intelligence in our life, Python programming, machine learning, computer vision, natural language processing, robotics and more.
