Illustration Photo by Ivan Babydov from Pexels

How to Fine-Tune YOLOv5 on Multiple GPUs

It is generally known that Deep Learning models tend to be sensitive to proper hyper-parameter selection. At the same time, when you search for the best configuration, you want to use maximal resources…

Jirka Borovec
Towards Data Science
5 min read · Feb 15, 2022


Object detection is one of the advanced Computer Vision (CV) tasks that ML/AI has not yet fully mastered. In short, the task is composed of localization and identification/classification of given objects in an image.

Object Detection is still quite a hot topic in the research space. There is also relatively high demand for using such AI models in production for many practical applications, such as detecting people in a scene or identifying items on a shop’s shelves. This naturally yields a large collection of model architectures and even more implementations shared publicly as open-source projects. One of them is YOLOv5, which claims to have one of the best ratios between performance (accuracy/precision) and inference time.

Besides training and inference, this project also offers hyper-parameter search based on evolutionary algorithm tuning. In a nutshell, the algorithm proceeds in generations: it runs a few short trainings and chooses the best based on their performance. These best candidates are then blended with some minor random changes and trained again.

Simple screen finetuning

The simplest way to search for hyper-parameters is to run the training with evolution enabled via the --evolve <number> argument. But this uses just a single GPU at most, so what about the remaining ones we have?

python train.py \
--weights yolov5s6.pt \
--data /home/user/Data/gbr-yolov5-train-0.1/dataset.yaml \
--hyp data/hyps/hyp.finetune.yaml \
--epochs 10 \
--batch-size 4 \
--imgsz 3000 \
--device 0 \
--workers 8 \
--single-cls \
--optimizer AdamW \
--evolve 60

Eventually, we can run multiple trainings, but how do we push them to collaborate? Luckily, they can share a file with dumped training results from which the new population is drawn. Thanks to the randomness in the next generation, this can be seen as exploring a much larger population, as the author states.
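
The shared dump is a plain results file inside the run directory (recent YOLOv5 versions name it evolve.csv, older ones used evolve.txt), so pointing several runs to the same --project/--name makes them draw from one common population:

# default location when evolving without --project/--name
ls runs/evolve/exp/evolve.csv
# when --project/--name are set, it sits in <project>/<name> instead
ls <project>/<name>/evolve.csv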

Illustration from Ultralytics tutorial with permission of Glenn Jocher.

What are the options?

Running multiple training processes, each using a different GPU, can be set up by specifying it in the --device <gpu-index> argument. But how do you maintain numerous processes and not lose them if you log out or accidentally drop your internet connection?

nohup

It is the first and most trivial way to keep your process running.

nohup python train.py ... > training.log 2>&1 &

Unfortunately, there is no simple way to connect back to the once-dispatched process, so it is mainly paired with redirecting the output stream to a file. Then you can constantly refresh the file, but it still does not always play nicely with progress bars that may be stretched over many lines.
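
For example, to stream the log as it grows instead of re-opening it:

tail -f training.log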

screen

This Unix application is convenient in many situations, and here it serves for spinning each process in its own screen session, so you can later traverse among all of them. Inside each screen, you have full access to control or kill the particular process.

screen -S training
python train.py ...
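
You can detach from the running session with CTRL+a followed by d, and later list and re-attach to it:

screen -ls
screen -r training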

docker

Another way is to use docker containers with a shared volume. Its advantage is that you can prepare custom docker images with a fixed environment that can eventually run anywhere, even on another server/cluster…

docker build -t yolov5 .
docker run --detach --ipc=host --gpus all -v ~:$(pwd) yolov5 \
python train.py ...
docker ps

The commands above first build a docker image from the project folder. Then they spin up a container and immediately detach it, giving it complete visibility of the GPUs and mapping your user home directory into the container next to your local project folder. The last command lists all running containers.
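
If you only want to follow the training output of a detached container without attaching to it, you can stream its logs:

docker logs --follow <container-id>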

Spin multiple collaborative dockers

With screen, you would need to create each session separately and start the particular training process within it. This extra work can easily be spoiled by choosing a wrong or already-used device.

An advantage of docker is that we can quickly write a loop that starts as many containers as there are GPUs. The only limitation may be sufficient RAM, which can also be capped on the docker side with the --memory 20g argument. To properly utilize the shared stack of experiments, you need to fix --project/--name and set the --exist-ok and --resume arguments.

for i in 1 2 3 4 5; do
sudo docker run -d --ipc=host --gpus all \
-v ~:/home/jirka \
-v ~/gbr-yolov5/runs:/usr/src/app/runs \
yolov5 \
python /usr/src/app/train.py \
--weights yolov5s6.pt \
--data /home/jirka/gbr-yolov5-train-0.1-only_annotations/dataset.yaml \
--hyp /usr/src/app/data/hyps/hyp.finetune.yaml \
--epochs 10 \
--batch-size 4 \
--imgsz 3000 \
--workers 8 \
--device $i \
--project gbr \
--name search \
--exist-ok \
--resume \
--evolve 60
done

Later, to re-connect to a running container, for example to monitor the progress, you can attach to it with:

sudo docker ps
sudo docker attach --sig-proxy=false <container-id>

Then use CTRL+c to detach back to your user terminal.

Lastly, in case you do not want to let the training finish, or you need to terminate all the running containers, you can call:

sudo docker kill $(sudo docker ps -q)
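
Once the search is over, the evolved hyper-parameters can feed a final, full-length training. A minimal sketch, assuming the evolved values are written as hyp_evolve.yaml inside the shared run directory gbr/search (the exact file name and the number of epochs are illustrative and may differ in your setup):

python train.py \
--weights yolov5s6.pt \
--data /home/user/Data/gbr-yolov5-train-0.1/dataset.yaml \
--hyp gbr/search/hyp_evolve.yaml \
--epochs 100 \
--batch-size 4 \
--imgsz 3000 \
--device 0 \
--single-cls \
--optimizer AdamW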

Stay tuned and follow me to learn more!

About the Author

Jirka Borovec holds a Ph.D. in Computer Vision from CTU in Prague. He has been working in Machine Learning and Data Science for a few years in several IT startups and companies. He enjoys exploring interesting world problems, solving them with State-of-the-Art techniques, and developing open-source projects.
