In Part 1, we explored how Google’s (human) Data Labelling Service could assist in image labelling.
Google Cloud AI Platform: Human Data labeling-as-a-Service Part 1
In this article, we will be exploring how this service can assist with video labelling. We include a hands-on example for you to follow along.
Google Cloud AI Platform
Google AI Platform is a suite of services on Google Cloud specifically targeted at the building, deploying, and managing of machine learning models in the cloud.
If you are not familiar with Google AI Platform, you may want to read our first article in the series, where we present an overview of what’s available on the platform.
Google Cloud AI Platform: Hyper-Accessible AI & Machine Learning
What do we mean by labelling?
Labelling is a Data Science activity to support the training of supervised machine learning models. The term supervised is a direct reference to how these models rely on accurately labelled training data in order for them to learn.
Labelling can be applied to many different types of training media; from images and video to audio and free-text.
Labelling and unsupervised models
Unsupervised models, such as K-means clustering, do not require labelling because, in a sense, unsupervised models discover (i.e. learn) these labels for themselves.
Why labelling is important
When training supervised Machine Learning models, labelling is one of the first activities we carry out, and therefore, takes place very early on in the ML development lifecycle.
As a data science team, when training ML models we often revert back to this mantra:
"The accuracy of a machine learning model is only as good as its (training) data."
Data Labelling Service on Google Cloud
Google’s Data Labelling Service provides access to human labellers to generate highly accurate labels for training data.
Currently in public-beta, the service formed part of the original AI Platform launch in April 2019.
The labelling service currently supports 3 media types:
- Images (covered in Part 1 of this series)
- Video
- Free-text (we will cover in Part 3 of this series)
We will now explore video labelling, including an easy-to-follow technical walkthrough.
1. Video labelling
Training machine learning models to analyse video is a common use-case we see in the field. Google’s Data Labelling Service supports four of the common labelling tasks we encounter:
Classification
This is probably the simplest and a good place to start out if you want to try the service for the first time. With video classification, you provide one or more labels that you wish the human labellers to apply to your training set of videos.
Here is a simple example that we will use to showcase video classification. We are fed a series of short video-clips from various garden bird feeders. Our task – classify the videos as containing birds feeding, or not.


- First, upload your videos to be labelled to a Google Cloud Storage bucket. Here, you can see two videos uploaded to a bucket called birds-feeding. One video contains birds feeding; in the second, there is just an empty bird feeder.
Tip: Your video format must be MP4, with H.264, H.265, or MPEG4 codec. The current video size limit for the service is 2GB.

Tip: This applies to any video labelling. Make sure your training videos are as close as possible to the videos the model will be predicting on. For example, think about resolution and colour versus black and white, and avoid common backgrounds that can introduce bias into the model. We often analyse CCTV footage, which is typically very low resolution and often has an unusual colour balance, especially if filmed at night.
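Before uploading, it can be worth a quick local sanity check against the format limits mentioned above. Here is a minimal sketch; note it only checks the file extension and the 2GB size limit – verifying the actual codec (H.264, H.265, or MPEG4) would need a media library, which I've left out:

```python
from pathlib import Path

MAX_BYTES = 2 * 1024**3  # the service's 2 GB video size limit


def validate_video(path: str) -> list[str]:
    """Return a list of problems found with a candidate training video."""
    problems = []
    p = Path(path)
    if p.suffix.lower() != ".mp4":
        problems.append(f"{p.name}: expected an MP4 container")
    if p.exists() and p.stat().st_size > MAX_BYTES:
        problems.append(f"{p.name}: exceeds the 2 GB size limit")
    return problems


print(validate_video("BirdsFeeding.mp4"))  # [] if the extension is fine
```

Run this over your training set before uploading and you'll catch the obvious rejects early.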
- Next you need to create a .csv file that contains the URIs to your videos in your GCS bucket. So, in my example, my CSV just contains two lines:
gs://birds-feeding/BirdsFeeding.mp4
gs://birds-feeding/NoBirdsFeeding.mp4
Upload this .csv file to the same storage bucket that contains your video files.
Tip: Make sure to give your csv file a lowercase file extension of .csv (not .CSV), otherwise the next step will fail.
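If you have more than a handful of videos, writing this CSV by hand gets tedious. A small sketch for generating it locally (the bucket and file names here are just my example's; uploading the resulting file to GCS is still done via the console or gsutil):

```python
def build_input_csv(bucket: str, filenames: list[str]) -> str:
    """One gs:// URI per line -- the format the labelling service expects."""
    return "".join(f"gs://{bucket}/{name}\n" for name in filenames)


csv_text = build_input_csv("birds-feeding", ["BirdsFeeding.mp4", "NoBirdsFeeding.mp4"])
print(csv_text)

# Write with a lowercase .csv extension (see the tip above)
with open("videos.csv", "w") as f:
    f.write(csv_text)
```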
- Next we need to create a data labelling dataset. This dataset is simply a resource that the human labellers will refer to for your labelling request. It contains your videos, your video .csv file, and your instructions (I’ll cover instructions shortly).
To create a dataset, first, navigate to the Data Labelling Service UI in the Google Cloud console. You should see a screen like this (enable the API if you are prompted).

Click the CREATE button. Then, enter the details of your new dataset in the panel, shown below. Note that I have chosen the CSV file in my birds-feeding GCS bucket. Finally, click CREATE.

You should now see an entry for your dataset, as shown here. Note the status, which indicates my videos are being uploaded.

Once complete, you should receive an email notification from Google Cloud. Here’s my email:

I now see my dataset with both of my videos imported.

- Next we create a label set, to define the label options we want the human labellers to apply to our videos. In our example, we need two labels: "Birds Feeding" and "No Birds Feeding".
To do this, from the Data Labelling UI click Label Sets from the menu on the left and then click CREATE at the top of the screen. Then, in the panel on the right, simply define your labels. Here are mine:

Tip #1. We always include a clear, unambiguous description with each label. Labellers will see these descriptions, so they supplement your instructions!
Tip #2. For multi-classification use-cases (such as looking for birds or squirrels) we always include a label for "Both" (video has a bird and a squirrel at the feeder) and "None" (video has no birds or squirrel). As well as, of course, "Birds feeding" and "Squirrels feeding" 🙂
- The penultimate step is to create an instruction document. This is a PDF document that helps human labellers to understand what you want them to label. Now, for our classification task this is really simple and just requires a few example videos (positive and negative) under each of our labels. I would highlight in here any possible ambiguities – for my bird feeding example, I’d make it clear that a bird at the feeder (and not necessarily actually feeding) constitutes a positive classification of "Birds Feeding".
I’d include some examples of some edge cases, just to make sure this is clear. For example, a bird flying past in a video, but not landing on the feeder, would be classified as "No Birds Feeding".
Google has a lot of detailed guidance on writing good instruction documents. We would recommend reading this, as if these instructions aren’t clear, your labellers may not do what you want them to do!
Creating instructions for human labelers | Data Labeling Service
Once your PDF document is ready, upload it to your GCS bucket that contains your videos and .CSV file. Next, click Instructions in the UI and in the panel on the right just navigate to your instructions document. Here’s what mine looks like:

Click CREATE INSTRUCTIONS and wait for an email confirmation before continuing (it usually only takes a few seconds).
- We are now ready to submit our labelling task! This is really straightforward – navigate to your dataset (see step 3) and click on the dataset name. You will see your dataset contents. Here are mine:

Click on CREATE LABELLING TASK at the top. In the panel that’s displayed:
- Enter a name for the annotated dataset (this is the dataset that will contain your human labelled videos). Add a description too.
- From the objective dropdown, I choose Classification.
- From the label sets dropdown, I choose my Birds-feeding label set.
- From the instructions dropdown, I choose my instructions PDF from my GCS bucket.
- Next, I choose how many people I want to review the labelling of each video. The default is one, which we typically choose. However, if your task demands extra rigour, you can increase this to three or five; a voting system resolves any disagreements between labellers.
- Lastly, accept the terms and conditions and click Create.
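On that voting system: Google doesn't document the exact resolution mechanism, but with an odd number of labellers (three or five) the intuition is simple majority, which a quick sketch illustrates:

```python
from collections import Counter


def majority_label(votes: list[str]) -> str:
    """Return the label most labellers chose.

    Assumes an odd number of voters, so a strict majority always exists
    for a two-label task -- this is an illustration, not Google's
    documented algorithm.
    """
    return Counter(votes).most_common(1)[0][0]


# Two of three labellers agree, so their label wins
print(majority_label(["Birds Feeding", "Birds Feeding", "No Birds Feeding"]))
```

This is also why the reviewer options are odd numbers: ties can't occur on a binary label set.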
Congratulations! Your videos will now be labelled by skilled human labellers. Pretty cool, hey?
Reviewing your labelled videos
Once the labelling process is complete, you will receive an email confirmation. To review your labelled videos, first, navigate to your dataset. Here is mine:

Click on the LABELLED DATA SETS link and you will see your labelled videos. Exporting labelled videos is a really useful feature, especially if your video training set is large. You can export as a CSV or JSON file. To export your videos:
- From the LABELLED DATA SETS page click EXPORT
- In the dialogue that appears, select a target GCS folder and the format (we usually choose CSV).
Tip: Both CSV and JSON exports can easily be queried in BigQuery (via a federated external table) for further analysis.
And that concludes my walkthrough.
How many training videos do I need?
This is a common question we are asked. Typically we aim for circa 1,000 videos per label, which in our experience achieves a model with an excellent degree of accuracy. Try to balance your training sets, ideally having a similar number of videos for each label.
Note that Google enforces a minimum of 10 videos for classification (we wouldn’t recommend so few unless the use-case is very simplistic).
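A quick way to sanity-check the balance point above, once you know how many videos you have per label. The 0.5 tolerance here is our own rule of thumb, not a Google requirement – tune it to your use-case:

```python
def is_balanced(label_counts: dict[str, int], tolerance: float = 0.5) -> bool:
    """Heuristic balance check: the smallest class should have at least
    `tolerance` x as many videos as the largest class."""
    counts = label_counts.values()
    return min(counts) >= tolerance * max(counts)


print(is_balanced({"Birds Feeding": 1000, "No Birds Feeding": 900}))  # True
print(is_balanced({"Birds Feeding": 1000, "No Birds Feeding": 100}))  # False
```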
Other video labelling tasks supported
As well as classification, Google’s Data Labelling Service covers the following other video labelling tasks:
Object tracking
With this type of labelling, we want the labellers to draw rectangular boxes around the thing(s) we want labelled in the video, and track how these move throughout the video. For example, we might extend our bird feeding example to perform a second round of labelling to track the birds and how they move in the video.

Tip. Pay close attention to your instructions document and make sure it’s really clear to labellers what you want (and don’t want) boxed in your videos.
Object detection

This is quite different from our first two labelling tasks. In classification and object tracking we define the labels, and provide these to the labellers. With object detection, the labellers themselves define the labels, depending on what they see in the videos. Again, labellers apply these labels as bounding boxes to the video content.
Say for example, in our bird feeder videos we want to see if there are any other animals (squirrels, cats, mice…) that appear in the video. This could be an object detection candidate. In our instructions document, we would outline that we want any bird or mammal highlighted in the video.
One further slight variation with object detection: labellers label image frames extracted from your video content. You supply your videos along with a frame rate indicating how many frames should be extracted.
Video event labelling
With video event labelling, you provide human labellers with a label ("Birds Feeding"), and they define a set of start/end time pairs marking, in this case, when birds are feeding in each of my videos.
How much does it cost?
Pricing is documented really well in the Google Cloud documentation. This describes the overall pricing structure:
"Prices for the service are computed based on:
1. The type of labeling task.
2. The number of annotation units generated."
So, let’s look at the pricing for our bird feeding classification example. Here is the definition of annotation units for video classification tasks taken from the pricing documentation:
For a video classification task, units are determined by the video length (every 5 seconds is a price unit), label set size (for labeling quality concern, every 20 labels is a problem set and a price unit) and the number of human labelers. The price for single-label and multi-label classification is the same. For example, if a video is 20 seconds, the label set size is 22, the number of human labelers is 3, this video will count for 4 × 2 × 3 = 24 units.
In our example, our label set size was 2; "Birds Feeding" and "No Birds Feeding".
As a price unit covers up to 20 labels, ours constitutes 1 unit.
Both videos were 60 seconds long – so each video uses 60/5 = 12 units
We left the number of human labellers at the default of 1.
Therefore, our labelling task is priced at 24 units (12 per video).
At the time of writing, video classification is priced at $86 per 1,000 units for the first 50,000 units per month, and $60 for every 1,000 units thereafter.
These costs are shown below.

This is equivalent to 8.6 cents per unit, reducing to 6 cents after the first 50,000 units each month.
My video classification example would therefore cost $2.06:
(8.6¢ × 12) + (8.6¢ × 12) = 206.4¢
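The unit arithmetic above can be captured in a couple of lines, which is handy for estimating larger jobs before submitting them. This assumes all units fall within the first-tier $86 per 1,000 rate:

```python
import math


def classification_units(video_seconds: int, label_set_size: int, labellers: int = 1) -> int:
    """Annotation units for one video: one unit per 5 seconds of video,
    per block of 20 labels in the label set, per human labeller."""
    return (
        math.ceil(video_seconds / 5)
        * math.ceil(label_set_size / 20)
        * labellers
    )


def cost_usd(total_units: int, rate_per_1000: float = 86.0) -> float:
    """Cost at the first-tier rate (first 50,000 units per month)."""
    return total_units * rate_per_1000 / 1000


# My example: two 60-second videos, 2 labels, 1 labeller
units = 2 * classification_units(60, 2, 1)
print(units, cost_usd(units))  # 24 units -> $2.064
```

Plugging in Google's own documentation example (a 20-second video, 22 labels, 3 labellers) gives 4 × 2 × 3 = 24 units, matching the quoted figure.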
Conclusions
The first thing I’d like readers to take away with them is just how important accurate labelling is when training your models. Sometimes it can be tempting to rush this step in the excitement to see your model live, but just to remind you once again of the mantra our team refers back to:
"The accuracy of a machine learning model is only as good as its (training) data."
The second key takeaway is just how many different sub-classes of video labelling there are. Some of these, such as Video Event, I never even knew existed. I hope you found these interesting too.
My third takeaway is how simple the Data Labelling Service from Google Cloud is to use; Google has done a great job at making this really user friendly and accessible. I feel it follows a logical flow, and I hope this was evident as you followed along with my example. The ability to easily export to GCS and therefore into BigQuery we find super useful too.
I feel the service offers good value for money, especially for simpler tasks such as classification. I think $1 per video is reasonable, and accessible for most use-cases. Let’s say for my use case I wanted to have 1,000 videos per label – this should deliver an accurate model when I come to training. Total cost for using this service would be $2,000. This feels reasonable considering the amount of manual work involved, and how many resources this would consume if we did it ourselves!
Lastly, just keep in mind that labelling is not a silver bullet when training video-related models. Be mindful not to let other biases creep into the model. Always try to keep your training videos as close as possible to the ones your model will face "in the field", and you should see great results!
Next steps
- In Part 3, we will explore the Data Labelling service for free text
- Read the Google Cloud Data Labelling documentation
- Learn more about Ancoris Data, Analytics & AI