
This tutorial explores how you can create synthetic data in order to train a model for object detection.
The training itself is based on Jacob Solawetz's tutorial on training custom objects with YOLOv5,
so I will be using the YOLOv5 repository by Ultralytics.
This tutorial will guide you through the steps needed to create the synthetic data and show how you can then train a model with YOLOv5 so that it works on real images.
If you would like to access the full script or download the dataset you can find it all in this Git repository.
As an example, I will be training the model to detect oranges on a tree.
What is synthetic data and why do we need it?
In a perfect world, datasets would be abundant and we could simply train our models on real images to make better predictions. In reality, however, most of an ML practitioner's time is spent gathering data and annotating it correctly.
Bigger companies such as Google, Facebook, Amazon, and Apple, and even mid-sized companies with the resources, can launch such projects. First, because they have the data itself, in our case the images. And second, because they have the ability to annotate the data and create an error-free dataset. But even these companies cannot be sure the data is correctly annotated, so the normal process is to annotate each image more than once and then look for discrepancies.
For a small company, or for someone like myself trying to build an ML project from the ground up, this is too large a task, so I find myself using one of the datasets available online. There are many great datasets out there, and in some instances creating a new dataset is really not needed. But after a while I started to realize that my projects were not doing exactly what I wanted them to, as they were trained on a different kind of data.
For this project, I wanted to count oranges on a tree and could not find a suitable dataset.
So I started downloading images and attempted to annotate them.
For example, I started with the following image taken from Wikimedia.

This is not even a full tree, and still it quickly became clear to me that I would not be able to annotate this image.
I figured I could choose simpler images, but I was not sure how well they would hold up when training the network.

Moving on to more complex images such as this one (also from Wikimedia), I understood this was no longer an option, and I started playing around with different datasets I could find online.
This was not what I wanted to do, and quickly I was getting discouraged.
This is when I found myself searching for a different solution and watched Adam Kelly's (Immersive Limit) great tutorial, in which he trained a network to recognize weeds using synthetic data. You can watch his video here: AI Weed Detector.
Although I was not overly impressed by the results (he seems to be), as I needed better accuracy, I realized that if I was going to continue the project, this was the way to go.
So, armed with the new keywords I needed, I searched Google and came across the following paper by Maryam Rahnemoonfar and Clay Sheppard: Deep Count: Fruit Counting Based on Deep Simulated Learning.
This paper seemed to be exactly what I was after: they trained their network on data that was generated without the need to get up and start taking images. Even better, they were trying to do something very similar to what I was doing.
They did not provide the dataset, and looking at their images, I thought I could do better.
In general, they proposed the following steps:

But it seemed to me that they were not taking into account the fact that the fruit may be obstructed by leaves, and they also did not compute bounding boxes.
Still, encouraged by the fact that it worked, I set out to create my own synthetic data.
Steps
I realized I needed to do the following in order for the network to be able to count real data:
- Gather information regarding the backgrounds I may encounter
- Create a background image that is constructed from these colors
- Create circles of varying sizes to replace the oranges
- Create a foreground from leaves colors that will obstruct some of the oranges
To do this, I wrote a simple Python program that creates the images for me. (The code was simplified so the reader can easily follow it; if you want to download the full code, check out my Git.)
from PIL import Image, ImageDraw
from PIL import ImageFilter
from PIL import ImageColor
from pascal import PascalVOC, PascalObject, BndBox, size_block
from pathlib import Path
import cv2
import numpy as np
import random
We start with some imports. I am using PIL (Pillow) to create the images and pascal (PascalVOC) to save the information as annotations.
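The simplified snippets below also reference a few module-level values (width, height, img_path, ann_path) that are defined in the full script on Git. Here is a minimal sketch with assumed placeholder values so the snippets run as written:
# Assumed globals used throughout the simplified snippets below.
# The actual values live in the full script; these are placeholders.
width, height = 416, 416    # canvas size, matching the YOLOv5 input size
img_path = 'images'         # where the generated images are saved
ann_path = 'annotations'    # where the PascalVOC XML files are saved
Path(img_path).mkdir(exist_ok=True)
Path(ann_path).mkdir(exist_ok=True)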
I downloaded a few images of orange trees from the web and started to sample pixels. Each color was then saved into one of three arrays: leaves, sky, and ground.
At this point I did not sample the oranges, as I used a different approach for them.
def prepare_colors():
    txt_leaves = ['#608d2a', '#a8b146', '#ccf0bc']
    txt_sky = ['#e9e3c3', '#99949e', '#9bb5cf']
    txt_ground = ['#3d2c15', '#dfcba6']
    bg_colors = []
    fg_colors = []
    # Leaf colors appear both behind and in front of the fruit
    for t in txt_leaves:
        bg_colors.append(ImageColor.getrgb(t))
        fg_colors.append(ImageColor.getrgb(t))
    # Sky and ground colors only appear in the background
    for t in txt_sky:
        bg_colors.append(ImageColor.getrgb(t))
    for t in txt_ground:
        bg_colors.append(ImageColor.getrgb(t))
    return bg_colors, fg_colors
This was simple enough, but it's worth mentioning that I sampled more colors than shown in the code above (you can find all of the colors I sampled in the Git); I cut back here for clarity.
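If you want to sample colors yourself, a minimal sketch of the idea could look like this (the file name and coordinates are hypothetical):
# Hypothetical example: read one pixel from a reference photo and
# convert it to a hex string like the ones in prepare_colors().
ref = Image.open('orange_tree_reference.jpg').convert('RGB')
r, g, b = ref.getpixel((120, 80))   # coordinates of a leaf/sky/ground pixel
print('#{:02x}{:02x}{:02x}'.format(r, g, b))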
The next step was to write a function that randomly places blobs of color on a layer.
def plot_random_color_blobs(draw, colors, count, mins, maxs):
    for i in range(count):
        # Random position and size for each ellipse
        x = random.randint(0, width)
        y = random.randint(0, height)
        w = random.randint(mins, maxs)
        l = random.randint(mins, maxs)
        # Pick a random color from the provided palette
        c = colors[random.randint(0, len(colors) - 1)]
        draw.ellipse((x, y, x + w, y + l), fill=c, outline=None)
This function receives an ImageDraw.Draw object from PIL and adds count ellipses in random spots.
The result for a layer may look something like this, assuming we are using the colors red, green, and blue, and a high count (in this case 1500):

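For reference, a test layer like the one above can be produced with something like this (using the globals assumed earlier):
# Reproduce a test layer: 1500 blobs in pure red, green, and blue.
test_colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
im_test = Image.new('RGBA', (width, height), (255, 255, 255, 255))
plot_random_color_blobs(ImageDraw.Draw(im_test), test_colors, 1500, 10, 25)
im_test.save('test_layer.png')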
So now it's time to construct the background layer:
def create_bg(colors, width, height):
    # Start from a light-blue canvas so no area is left uncovered
    im_bg = Image.new('RGBA', (width, height),
                      ImageColor.getrgb('#7FCBFDFF'))
    draw_bg = ImageDraw.Draw(im_bg)
    plot_random_color_blobs(draw_bg, colors, 1500, 10, 25)
    # Merge neighboring colors while keeping edges reasonably sharp
    im_bg = im_bg.filter(ImageFilter.MedianFilter(size=9))
    return im_bg
As you can see, the image is created with a light blue background in order to eliminate any areas that were not covered by the random ellipses.
After plotting the blobs, I initially blurred the image using a blur filter, so the result was something like this:

This was starting to look like I was heading in the right direction.
But I was worried the network would learn to distinguish between the blurry parts of the image and the non-blurred part, in our case the fruit, so I dialed it down and moved to a MedianFilter, which merges colors while still preserving the overall sharpness of the background.
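For illustration, the two options differ only in the filter applied. Given an unfiltered background layer im_bg, a sketch of the comparison (the blur radius is an assumed value):
# GaussianBlur smears all edges; MedianFilter merges colors
# while keeping edges sharp.
soft_bg = im_bg.filter(ImageFilter.GaussianBlur(radius=4))
sharp_bg = im_bg.filter(ImageFilter.MedianFilter(size=9))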
Next, I created the foreground layer. This layer will be placed over the fruit layer in order to mask out some of the fruit.
def create_fg(colors, width, height):
    # Fully transparent canvas, so only the blobs will be visible
    im_fg = Image.new('RGBA', (width, height), (0, 0, 0, 0))
    draw_fg = ImageDraw.Draw(im_fg)
    plot_random_color_blobs(draw_fg, colors, 40, 10, 25)
    im_fg = im_fg.filter(ImageFilter.MedianFilter(size=9))
    return im_fg
As you can see, this function is almost identical, except that I set the background to transparent and used a much lower number of blobs (40) to make sure most of the fruit can be seen.
Last, it was time to create the fruit layer:
def plot_random_fruit(color_range, count, width, height, mins,
                      maxs):
    im_fruit = Image.new('RGBA', (width, height), (0, 0, 0, 0))
    draw_fruit = ImageDraw.Draw(im_fruit)
    fruit_info = []
    for i in range(count):
        x = random.randint(0, width - 10)
        y = random.randint(0, height - 10)
        w = random.randint(mins, maxs)
        # Pick each RGB channel from the allowed range for the fruit
        c = (random.randint(color_range[0][0], color_range[0][1]),
             random.randint(color_range[1][0], color_range[1][1]),
             random.randint(color_range[2][0], color_range[2][1]))
        # Remember position, size, and color for the annotation step
        fruit_info.append([x, y, w, w, c])
        draw_fruit.ellipse((x, y, x + w, y + w), fill=c, outline=None)
    return im_fruit, fruit_info
Similar to the other layers, this layer plots fruit at random places around the image. However, it differs in four ways:
- The shape plotted is always a circle, as this is the most common shape for the fruit.
- It uses a range of colors (all orange variants in my case) and chooses a color randomly within it.
- No filter is applied to the image.
- The function stores the bounding box of each fruit, as well as its color, in an array that is returned as fruit_info (a quick check follows below).
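As a hypothetical quick check, you can draw a handful of circles and inspect the stored metadata (the color range here is the orange range used later in the post):
# Draw ten orange-ish circles and inspect the first record.
orange_range = [[180, 230], [50, 130], [0, 5]]
im_fruit, fruit_info = plot_random_fruit(orange_range, 10,
                                         width, height, 10, 25)
print(fruit_info[0])   # [x, y, w, w, (r, g, b)]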
def create_layered_image(im_bg, im_fruit, im_fg):
    img = im_bg.copy()
    # Passing the layer as its own mask keeps transparent areas transparent
    img.paste(im_fruit, (0, 0), im_fruit)
    img.paste(im_fg, (0, 0), im_fg)
    return img
This final function simply pastes the layers on top of one another; passing each layer as its own mask in paste means its transparent pixels do not overwrite what is underneath.
And the result would look something like this:

Clearly, this does not look like a fruit tree, but it includes the correct colors and the kinds of situations the network needs to see in order to train correctly.
The next part was to create an annotation file. I decided to use PascalVOC as I was more familiar with it, but any other annotation format could have worked.
def create_annotation(img, fruit_info, obj_name,
                      img_name, ann_name):
    pobjs = []
    for circle in fruit_info:
        color = circle[4]
        # Count how many of the fruit's pixels are still visible,
        # i.e. were not painted over by the foreground layer.
        pct = 0
        for i in range(circle[2]):
            if circle[0] + i >= width:
                continue
            for j in range(circle[3]):
                if circle[1] + j >= height:
                    continue
                r = img.getpixel((circle[0] + i, circle[1] + j))
                # Comparing the red channel is enough to tell
                # this fruit's color from the leaf colors
                if r[0] == color[0]:
                    pct = pct + 1
        visible = pct / (circle[2] * circle[3])
        # Omit fruit that is almost completely hidden
        if visible <= 0.1:
            continue
        # Mark mostly hidden fruit as difficult
        dif = visible <= 0.4
        pobjs.append(
            PascalObject(obj_name, "", truncated=False,
                         difficult=dif,
                         bndbox=BndBox(circle[0], circle[1],
                                       circle[0] + circle[2],
                                       circle[1] + circle[3])))
    pascal_ann = PascalVOC(img_name,
                           size=size_block(width, height, 3),
                           objects=pobjs)
    pascal_ann.save(ann_name)
Before adding the bounding box to the annotation, the function checks how much of the fruit is not obstructed by the foreground. This allows the network to decide whether a mistake in counting a particular fruit is critical.
However, the difficult parameter in PascalObject is a boolean, so I chose the following cutoffs: if less than 10% of the fruit is visible, I simply omit it from the annotation, and any fruit that is less than 40% visible is marked as difficult.
Putting it all together, I was now ready to start generating the images:
def create_training_image(counter, bg_colors, fg_colors,
                          fruit_color_range):
    # Encode the image index and the fruit count in the file name
    fruit_count = random.randint(0, 20)
    ext = '{}_{}'.format(counter, fruit_count)
    img_name = '{}/fruit_{}.png'.format(img_path, ext)
    ann_name = '{}/ann_{}.xml'.format(ann_path, ext)
    im_bg = create_bg(bg_colors, width, height)
    im_fg = create_fg(fg_colors, width, height)
    im_fruit, fruit_info = plot_random_fruit(fruit_color_range,
                                             fruit_count, width,
                                             height, 10, 25)
    img = create_layered_image(im_bg, im_fruit, im_fg)
    # Create the annotation file
    create_annotation(img, fruit_info, 'oranges',
                      img_name, ann_name)
    img.save(img_name)
    return img, img_name, ann_name
This function should now be self-explanatory: it creates a single image and its annotation file.
And for the production of many images, I added:
def create_training_set(num, start_at=0):
    bg_colors, fg_colors = prepare_colors()
    # RGB ranges for the fruit: strong red, medium green, almost no blue
    fruit_color_range = [[180, 230], [50, 130], [0, 5]]
    for i in range(num):
        create_training_image(i + start_at, bg_colors,
                              fg_colors, fruit_color_range)
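Generating the whole set is then a single call; the count here is an assumption (see the Roboflow limit below):
# Generate 900 images and their annotation files.
create_training_set(900)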
And the results were:



Creating the Dataset
Once I had the images and annotations ready, I followed Solawetz's tutorial and used Roboflow to turn them into a dataset readable by YOLOv5. As Roboflow's maximum number of images for free usage was 1,000, I made sure not to create too many; in the future I will try to overcome this by simply creating the dataset in code, but for now it should do.
Following the simple 5 steps in the Roboflow setup, I was able to construct the dataset within minutes.
I chose not to create augmentations at this point, but ended up using them later for zoom purposes, to allow the network to detect bigger objects.
Training the Network
After setting up the environment according to Solawetz's tutorial, training was reduced to a single line of code:
%%time
%cd /content/yolov5/
!python train.py --img 416 --batch 16 --epochs 100 --data '../data.yaml' --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --cache
To train your own model I suggest checking out Roboflow’s notebook.
I originally wrote my own notebook, which can be found in the Git repository, but after reviewing it I found that the above notebook is simply written better, so kudos to them :).
As I am more accustomed to TensorFlow and less familiar with PyTorch, and since the objective of the training was to test the synthetic data, I opted not to change or adjust the training code. In the future, I plan to explore this as well.
After only a few minutes (running on a GPU in Colab) and just 100 epochs, I had training results with a precision of almost 91%. In fact, the network was able to converge to that precision in less than 40 epochs. Amazing.
But of course, this was all on the synthetic data.
It was time to test real data.
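To run the trained weights on real images I used the repository's detect.py, as in Roboflow's notebook. The weights path below assumes the run name from the training command above and may differ between YOLOv5 versions:
!python detect.py --weights runs/train/yolov5s_results/weights/best.pt --img 416 --conf 0.4 --source ../test/images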
And as always, pictures are worth a thousand words.


Caveats
Not everything worked perfectly.
Images with large oranges that took up a large portion of the frame did not work well; I guessed this was because all the oranges in my training set were relatively small. I retrained on the dataset using Roboflow's zoom augmentation and got better results, but in further testing I plan to create images that include a wider range of blob and orange sizes.
The colors picked for the background may be crucial. In my first test I did not add general colors that can appear in images, such as skin tones, and the network picked them up as oranges in some instances. After adding some skin tones to the background colors, this problem seemed to go away.
Conclusion
In conclusion, using synthetic data proved to be both useful and easy.
I was easily able to run the model on a video, and as YOLOv5s is so fast, it can actually run in real time.
If you want to download the full code and dataset, you can check out my Git.
I am now moving on to the next step: tracking these oranges across many different frames to allow stronger validation across different views, and eventually crop estimation.
Kind Regards
Amizorach Gross