How to Train A Custom Object Detection Model with YOLO v5

Note: We have also published a guide on how to train YOLOv5 here. In this post, we will walk through how you can train the new YOLOv5 model to recognize your custom objects for your custom use case.

Jacob Solawetz
Towards Data Science

--

Our model running inference in a preset setting. Let’s see how to make it identify any object!

We will cover the following material and you can jump in wherever you are in the process of creating your object detection model:

  • An Overview of Object Detection
  • About the YOLOv5 Model
  • Collecting Our Training Images
  • Annotating Our Training Images
  • Installing the YOLOv5 Environment
  • Download Custom YOLOv5 Object Detection Data
  • Define YOLOv5 Model Configuration and Architecture
  • Training Custom YOLOv5 Detector
  • Evaluate Custom YOLOv5 Detector Performance
  • Run YOLOv5 Inference on Test Images
  • Export Saved YOLOv5 Weights for Future Inference

Resources in this tutorial

Our training data ground truth — public BCCD

An Overview of Object Detection

Object detection is one of the most popular computer vision tasks due to its versatility. As I wrote in a previous article breaking down mAP:

Object detection models seek to identify the presence of relevant objects in images and classify those objects into relevant classes. For example, in medical images, we want to be able to count the number of red blood cells (RBC), white blood cells (WBC), and platelets in the bloodstream. In order to do this automatically, we need to train an object detection model to recognize each one of those objects and classify them correctly.

Our object detection model separates bounding box regression from object classification, handling them in different parts of a single connected network.

Object detection first finds boxes around relevant objects and then classifies each object among relevant class types

About the YOLOv5 Model

YOLOv5 is a recent release in the YOLO family of models. YOLO was initially introduced as the first object detection model to combine bounding box prediction and object classification into a single end-to-end differentiable network. It was written, and is maintained, in a framework called Darknet. YOLOv5 is the first of the YOLO models to be written in the PyTorch framework, and it is much more lightweight and easier to use. That said, YOLOv5 did not make major architectural changes to the network in YOLOv4, and it does not outperform YOLOv4 on a common benchmark, the COCO dataset.

I recommend YOLOv5 to you here because I believe it is much easier to get started with and offers you much greater development speed when moving into deployment.

If you want to dive deeper into the YOLO models, please see the following posts:

  • YOLOv5 Updates — YOLOv5 has improved in the short period of time since I originally wrote this article; I recommend reading about the updates here.
  • Comparing YOLOv4 and YOLOv5 (good for comparing performance on creating a custom model detector)
  • Explaining YOLOv4 (explaining model architecture — since not much other than framework changed in YOLOv5)
  • How to Train YOLOv4 (use this if you are willing to invest the time, and you are doing academic research or seeking to build the most accurate realtime detection model that you can.)

Collecting Our Training Images

In order to get your object detector off the ground, you need to first collect training images. You want to think carefully about the task you are trying to achieve and think ahead of time about the aspects of the task your model may find difficult. I recommend narrowing the domain that your model must handle as much as possible to improve your final model’s accuracy.

In this tutorial’s case, we have limited the scope of our object detector to only detect cells in the bloodstream. This is a narrow domain that is tractable with current technology.

To start, I recommend:

  • narrow your task to identifying 10 or fewer classes, and collect 50–100 images.
  • try to make sure the number of objects in each class is evenly distributed.
  • choose objects that are visually distinguishable. A dataset of mostly cars and only a few jeeps, for example, will be difficult for your model to master.

And of course, if you just want to learn the new technology, you can choose a number of free object detection datasets. Choose BCCD if you want to follow along directly in the tutorial.

Annotating Our Training Images

To train our object detector, we need to supervise its learning with bounding box annotations. We draw a box around each object that we want the detector to see and label each box with the object class that we would like the detector to predict.

I am annotating an aerial dataset in CVAT

There are many labeling tools (CVAT, LabelImg, VoTT) and large-scale solutions (Scale, AWS Ground Truth). To get started with a free labeling tool, two useful guides cover getting started with CVAT and getting started with LabelImg.

As you are drawing your bounding boxes, be sure to follow best practices:

  • Label all the way around the object in question
  • Label occluded objects entirely
  • Avoid too much space around the object in question

Ok! Now that we have prepared a dataset, we are ready to head into the YOLOv5 training code. Hold on to your dataset; we will soon import it.

Open Concurrently: Colab Notebook To Train YOLOv5.

In Google Colab, you will receive a free GPU. Be sure to select File → Save a copy in Drive; then you will be able to edit the code.

Installing the YOLOv5 Environment

To start off with YOLOv5, we first clone the YOLOv5 repository and install its dependencies. This sets up our programming environment, making it ready to run object detection training and inference commands.

!git clone https://github.com/ultralytics/yolov5  # clone repo
!pip install -U -r yolov5/requirements.txt # install dependencies
%cd /content/yolov5

Then, we can take a look at our training environment provided to us for free from Google Colab.

import torch
from IPython.display import Image # for displaying images
from utils.google_utils import gdrive_download # for downloading models/datasets
print('torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))

It is likely that you will receive a Tesla P100 GPU from Google Colab. Here is what I received:

torch 1.5.0+cu101 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', major=6, minor=0, total_memory=16280MB, multi_processor_count=56)

The GPU will allow us to accelerate training time. Colab is also nice in that it comes preinstalled with torch and cuda. If you are attempting this tutorial locally, there may be additional steps to take to set up YOLOv5.

Download Custom YOLOv5 Object Detection Data

In this tutorial we will download custom object detection data in YOLOv5 format from Roboflow. You can follow along with the public blood cell dataset or upload your own dataset.

Once you have labeled data, create a free Roboflow account to move your data in, and then drag your dataset in, in any format (VOC XML, COCO JSON, TensorFlow Object Detection CSV, etc.).

Once uploaded you can choose preprocessing and augmentation steps:

The settings chosen for the BCCD example dataset

Then, click Generate and Download and you will be able to choose YOLOv5 PyTorch format.

Select “YOLO v5 PyTorch”

When prompted, be sure to select “Show Code Snippet.” This will output a download curl script so you can easily port your data into Colab in the proper format.

curl -L "https://public.roboflow.ai/ds/YOUR-LINK-HERE" > roboflow.zip; unzip roboflow.zip; rm roboflow.zip

Downloading in Colab…

Download a custom object detection dataset in YOLOv5 format

The export creates a YOLOv5 .yaml file called data.yaml specifying the location of a YOLOv5 images folder, a YOLOv5 labels folder, and information on our custom classes.
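For the public BCCD export, data.yaml looks roughly like the following (a sketch; your paths and class names will match your own export):

train: ../train/images
val: ../valid/images

nc: 3
names: ['Platelets', 'RBC', 'WBC']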

Define YOLOv5 Model Configuration and Architecture

Next we write a model configuration file for our custom object detector. For this tutorial, we chose the smallest, fastest base model of YOLOv5. You have the option to pick from other YOLOv5 models including:

  • YOLOv5s
  • YOLOv5m
  • YOLOv5l
  • YOLOv5x

You can also edit the structure of the network in this step, though rarely will you need to do this. Here is the YOLOv5 model configuration file, which we term custom_yolov5s.yaml:

nc: 3  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],
   [-1, 1, Conv, [128, 3, 2]],
   [-1, 3, Bottleneck, [128]],
   [-1, 1, Conv, [256, 3, 2]],
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 6, BottleneckCSP, [1024]],
  ]
head:
  [[-1, 3, BottleneckCSP, [1024, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]],
   [-2, 1, nn.Upsample, [None, 2, "nearest"]],
   [[-1, 6], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]],
   [-1, 3, BottleneckCSP, [512, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]],
   [-2, 1, nn.Upsample, [None, 2, "nearest"]],
   [[-1, 4], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 3, BottleneckCSP, [256, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]],
   [[], 1, Detect, [nc, anchors]],
  ]
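One thing to keep in sync: nc in this config must match the nc in your data.yaml. A minimal sketch of reading it programmatically in the notebook (assuming the Roboflow export unzipped to ../data.yaml relative to the yolov5 directory):

# read the class count from the dataset config so the two files stay in sync
import yaml
with open("../data.yaml") as f:
    num_classes = yaml.safe_load(f)["nc"]  # 3 for the BCCD dataset
print(num_classes)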

Training Custom YOLOv5 Detector

With our data.yaml and custom_yolov5s.yaml files ready to go we are ready to train!

To kick off training, we run the training command with the following options:

  • img: define the input image size
  • batch: determine the batch size
  • epochs: define the number of training epochs. (Note: 3000+ epochs are not uncommon here!)
  • data: set the path to our yaml file
  • cfg: specify our model configuration
  • weights: specify a custom path to weights. (Note: you can download weights from the Ultralytics Google Drive folder)
  • name: result names
  • nosave: only save the final checkpoint
  • cache: cache images for faster training

And run the training command:
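A representative invocation looks like the following (a sketch; the data.yaml path assumes the Roboflow export unzipped one directory up, and passing an empty --weights string trains from scratch):

!python train.py --img 416 --batch 16 --epochs 100 --data '../data.yaml' --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --cache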

Training a custom YOLOv5 detector. It trains quickly!

During training, you want to watch mAP@0.5 to see how your detector is learning to detect on your validation set (higher is better). See this post on breaking down mAP.

Evaluate Custom YOLOv5 Detector Performance

Now that we have completed training, we can evaluate how well the training procedure performed by looking at the validation metrics. The training script will drop tensorboard logs in runs. We visualize those here:
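In Colab, the standard TensorBoard magics work for this (assuming the logs landed in the default runs directory):

%load_ext tensorboard
%tensorboard --logdir runs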

Visualizing tensorboard results on our custom dataset

And if you can’t visualize Tensorboard for whatever reason, the results can also be plotted with utils.plot_results, which saves a results.png.
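A minimal sketch (assuming the repository version used at the time of writing; in later YOLOv5 releases this helper moved to utils.plots):

from utils.utils import plot_results  # plots metrics logged to results.txt
from IPython.display import Image

plot_results()  # saves results.png in the current directory
Image(filename='./results.png', width=1000)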

Training plots in .png format

I stopped training a little early here. You want to take the trained model weights at the point where the validation mAP reaches its highest.

Run YOLOv5 Inference on Test Images

Now we take our trained model and run inference on test images. After training completes, the model weights are saved in weights/. For inference, we invoke those weights along with a conf specifying the model confidence threshold (requiring higher confidence yields fewer predictions) and an inference source. source can accept a directory of images, individual images, video files, and a device's webcam port. For source, I have moved our test/*jpg to test_infer/.

!python detect.py --weights weights/last_yolov5s_custom.pt --img 416 --conf 0.4 --source ../test_infer

The inference time is extremely fast. On our Tesla P100, the YOLOv5s is hitting 7ms per image. This bodes well for deploying to a smaller GPU like a Jetson Nano (which costs only $100).

Inference on YOLOv5s occurring at 142 FPS (.007s/image)

Finally, we visualize our detector's inferences on test images.
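One way to display them in the notebook (a sketch; inference/output is where this version of detect.py writes annotated images, while newer releases write to runs/detect/):

import glob
from IPython.display import Image, display

# show each annotated test image that detect.py produced
for image_name in glob.glob('/content/yolov5/inference/output/*.jpg'):
    display(Image(filename=image_name))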

YOLOv5 inference on test images. It can also easily infer on video and webcam.

Export Saved YOLOv5 Weights for Future Inference

Now that our custom YOLOv5 object detector has been verified, we might want to take the weights out of Colab for use on a live computer vision task. To do so we import a Google Drive module and send them out.

from google.colab import drive
drive.mount('/content/gdrive')
%cp /content/yolov5/weights/last_yolov5s_custom.pt /content/gdrive/My\ Drive
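Later, on any machine with the YOLOv5 repository set up, those saved weights can be passed straight back to detect.py (a sketch mirroring the inference command above; the weights and source paths are assumptions):

!python detect.py --weights last_yolov5s_custom.pt --img 416 --conf 0.4 --source your_images/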

Conclusion

We hope you enjoyed training your custom YOLOv5 object detector!

YOLOv5 is lightweight and extremely easy to use. YOLOv5 trains quickly, runs inference quickly, and performs well.

Let’s get it out there!

Next Steps: Stay tuned for future tutorials and how to deploy your new model to production.


Machine Learning @ Roboflow — building tools and artifacts like this one to help practitioners solve computer vision.