Managing Annotation Mistakes with FiftyOne and Labelbox

Find annotation mistakes in your image datasets with FiftyOne and fix them in Labelbox

Eric Hofesmann
Towards Data Science


Photo by Johnny McClung on Unsplash

As computer vision datasets grow to contain millions of images, annotation errors inevitably creep in. While annotation errors in training data can harm model performance if they exist in large quantities, it is crucial that gold standard test sets be free of errors.

In academia, prominent datasets like ImageNet and MS-COCO serve as benchmarks allowing researchers to compare computer vision methods that were created years apart. But even popular public datasets are not free of errors.

Slowly increasing performance of models on COCO dataset over 5 years (Image from paperswithcode.com, CC-BY-SA)

Performance on these datasets continues to improve year after year. The problem comes when you need to decide which specific model to use for your task. Comparing overall performance on public datasets is not enough. In production settings especially, where poor performance can have monetary or even ethical repercussions, you need to be certain about how your model performs.

High-quality gold standard datasets are just as valuable as, if not more valuable than, high-quality models. Being able to accurately select the best model and maintain confidence in its performance on production data, especially on difficult edge cases, saves the time and money that would otherwise be spent taking the model down and revising it.

In this blog post, I take a look at two machine learning tools that make it fast and easy to find and fix annotation mistakes in visual datasets. I use the open-source ML developer tool FiftyOne, which I have been working on, to compute the mistakenness of labels. FiftyOne then integrates directly with the annotation tool Labelbox, letting us easily re-annotate problematic samples.

Setup

Both the FiftyOne and Labelbox Python packages can be installed with pip:

pip install fiftyone
pip install labelbox

Using Labelbox first requires you to create a free account. This gives you access to the Labelbox annotation app and lets you upload raw data.

FiftyOne is an open-source tool, so no account is required, just the pip install.

In order to use the Labelbox Python SDK, you will need to generate and authenticate your API key:

export LABELBOX_API_KEY="<your-api-key>"

Note: You need a paid account for model-assisted labeling in Labelbox. This is the only way to upload existing labels to Labelbox.

Load data into FiftyOne

In this blog, I am using the COCO-2017 object detection dataset as an example. This dataset is in the FiftyOne Dataset Zoo and so can be loaded into FiftyOne in one line of code:
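Here is a minimal sketch of that one-liner, loading the validation split and launching the FiftyOne App to browse the samples:

import fiftyone as fo
import fiftyone.zoo as foz

# Download and load the COCO-2017 validation split from the FiftyOne Dataset Zoo
dataset = foz.load_zoo_dataset("coco-2017", split="validation")

# Visualize the dataset in the FiftyOne App
session = fo.launch_app(dataset)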

A sample from the COCO dataset loaded in the FiftyOne App (Image by author)

Of course, in practice you’ll want to use your own data, so let’s cover a few scenarios for where your data lives and how to load it.

Case 1: Dataset is in Labelbox

If you have a dataset you are annotating in Labelbox and just want to use FiftyOne to explore the dataset and find annotation mistakes, then the Labelbox integrations provided in FiftyOne make this fast and easy.

First, you need to export your data from Labelbox. This is as simple as logging into the Labelbox app, selecting a project, going to the Exports tab, and generating a JSON export of your labels.

Next, you can use the Labelbox utilities provided in FiftyOne to import the dataset. You just need to specify the path to the exported JSON as well as a directory that will be used to download the raw data from Labelbox.
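A sketch of that import, assuming the export was saved as a JSON file; the import_from_labelbox utility and its arguments may vary slightly between FiftyOne versions:

import fiftyone as fo
import fiftyone.utils.labelbox as foul

dataset = fo.Dataset("labelbox-import")

# Parse the exported JSON, download the raw media from Labelbox,
# and add the annotated samples to the FiftyOne dataset
foul.import_from_labelbox(
    dataset,
    "/path/to/labelbox-export.json",
    download_dir="/path/to/download/dir",
)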

Case 2: Dataset follows a common format

FiftyOne supports loading over a dozen common computer vision dataset formats including COCO, TFRecords, YOLO, and more. If your dataset follows one of these formats, then you can load it into FiftyOne in a single line of code:
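For example, a detection dataset stored on disk in COCO format could be loaded like this (the directory path is a placeholder):

import fiftyone as fo

# Load a COCO-formatted detection dataset from disk
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/coco-dataset",
    dataset_type=fo.types.COCODetectionDataset,
)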

Case 3: Dataset is stored in a custom format

No matter what format you use, you can always get your data and labels into FiftyOne. All you need to do is parse your labels in Python, add them to FiftyOne Samples, and then add the Samples to a FiftyOne Dataset. For example, if you have detections, the following code will build a dataset:
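This is a minimal sketch of that pattern, assuming your own parsing logic yields (image_path, boxes) pairs, where each box is a class label plus relative [top-left-x, top-left-y, width, height] coordinates:

import fiftyone as fo

samples = []
for image_path, boxes in my_labels:  # my_labels comes from your own parsing logic
    sample = fo.Sample(filepath=image_path)

    detections = []
    for label, bounding_box in boxes:
        # Bounding boxes are [top-left-x, top-left-y, width, height]
        # in relative coordinates in [0, 1]
        detections.append(fo.Detection(label=label, bounding_box=bounding_box))

    sample["ground_truth"] = fo.Detections(detections=detections)
    samples.append(sample)

dataset = fo.Dataset("my-detection-dataset")
dataset.add_samples(samples)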

Find annotation mistakes with FiftyOne

FiftyOne also includes the FiftyOne Brain package, which provides methods to help you analyze your dataset: you can find unique and hard samples, as well as potential annotation mistakes in classification and object detection datasets.

Generate model predictions

The mistakenness method in the FiftyOne Brain uses model predictions to rank possible annotation mistakes based on the confidence of the model. So, let’s add some model predictions to our dataset.

We can use the FiftyOne Model Zoo to load any of a variety of pretrained models, generate detections, and add them to our dataset in just two lines of code.

In this example, I am using EfficientDet-D6 pretrained on COCO-2017.
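A sketch of those two lines; the exact zoo model name can vary between FiftyOne versions, so check foz.list_zoo_models() for the available options:

import fiftyone.zoo as foz

# Load an EfficientDet-D6 detector pretrained on COCO from the FiftyOne Model Zoo
model = foz.load_zoo_model("efficientdet-d6-coco-tf1")

# Generate predictions for every sample and store them in a "predictions" field
dataset.apply_model(model, label_field="predictions")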

Note: If you are using a custom dataset, then you may need to use your own trained model to generate predictions and add them to your FiftyOne dataset manually. This process is straightforward whether you are working with classifications, detections, or other label types. For classifications, you can also add logits for each prediction to calculate mistakenness.

Compute mistakenness

Once we have the ground truth and predicted labels added to FiftyOne, we only need to run one command to compute mistakenness.
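That command looks roughly like this, assuming the fields are named predictions and ground_truth as above:

import fiftyone.brain as fob

# Compare model predictions to ground truth annotations and
# estimate the likelihood that each annotation is mistaken
fob.compute_mistakenness(dataset, "predictions", label_field="ground_truth")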

This populates the following attributes on your labels:

  • mistakenness — [0,1] value on ground truth labels indicating the likelihood that the ground truth object’s label is incorrect
  • mistakenness_loc — [0,1] value on ground truth detections indicating the likelihood that the ground truth’s bounding box is inaccurate
  • possible_spurious — Boolean on ground truth detections indicating that the underlying object likely does not exist
  • possible_missing — Boolean on predicted detections indicating an object was possibly missed during annotation

Explore dataset

Now that mistakenness has been computed for all samples in the dataset, we can use the FiftyOne App to visualize our dataset and look for specific samples and annotations that we need to fix.

Results of mistakenness calculation in the FiftyOne App (Image by author)

Additionally, we can search, sort, and query the results of the mistakenness computation by creating views into the dataset. For example, we can sort by samples with the most possible_spurious detections:
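Assuming compute_mistakenness also stored per-sample counts in a possible_spurious field, as the FiftyOne Brain documentation describes, that view can be built like this:

# Show samples with the most potentially spurious objects first
view = dataset.sort_by("possible_spurious", reverse=True)

# Load the view in the App
session.view = view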

Top samples with spurious objects (Image by author)

Or we can filter all detections to only see ones with a mistakenness greater than 0.6:
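A sketch of that view using FiftyOne's ViewField expressions:

from fiftyone import ViewField as F

# Only keep ground truth detections with mistakenness > 0.6
view = dataset.filter_labels("ground_truth", F("mistakenness") > 0.6)

session.view = view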

Examples of mistaken annotations of the Chair and Couch classes (Image by author)

Just looking through some of the samples from the last query, we can see that a common pattern of errors is confusion between the couch and chair classes. The fact that these two classes frequently contain annotation errors can artificially lower the measured performance of the model on them. Let's select a few of these samples and load them into Labelbox for reannotation.

Fix annotations in Labelbox

FiftyOne has made it fast and easy to find samples with annotation mistakes. Thanks to the integration between FiftyOne and Labelbox, we can now load the data we want to reannotate into Labelbox. In the Labelbox app, simply create a new dataset and choose the files you want to upload.

You can also use the Labelbox utilities in FiftyOne to add media files to a dataset that you created previously in the Labelbox app:
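A hedged sketch of that upload, assuming an upload_media_to_labelbox utility in fiftyone.utils.labelbox; check your FiftyOne version for the exact name and signature:

import labelbox as lb
import fiftyone.utils.labelbox as foul

client = lb.Client()  # reads LABELBOX_API_KEY from the environment

# The ID of a dataset you created in the Labelbox app (placeholder)
labelbox_dataset = client.get_dataset("<dataset-id>")

# Upload the media files for the samples in our mistake view
foul.upload_media_to_labelbox(labelbox_dataset, view)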

Note: If your labeled project already exists in Labelbox, then you just need to collect the IDs of the samples you want to fix.

Using the Labelbox editor

One of the primary benefits of using Labelbox is the ability to access their powerful editors for labeling images. Their online tool is fairly easy to use, making it ideal for this workflow, where we want to fix mistaken annotations as quickly as possible.

Once your raw data is in a dataset in the Labelbox app, you can create a new project and attach the data that you want to reannotate.

Selecting dataset for project in Labelbox (Image by author)

We then need to configure the editor for the classes that we want to annotate. In this example, we only want to reannotate chairs and couches in the samples we selected, so we add those two object classes.

Configuring editor in Labelbox (Image by author)

Once the project is set up, you can begin labeling the samples according to the classes that were defined.

Reannotating mistaken couch label (Image by author)

Now that these samples have been reannotated, we can update the labels in the Labelbox dataset or download the annotations once again and update our dataset manually outside of Labelbox.

(Optional) Model-assisted labeling

Labelbox also provides a paid model-assisted labeling feature for Pro/Enterprise Labelbox users. With this, you can load annotations directly into Labelbox and edit them. This makes it even easier to update mistaken annotations or use model predictions as a starting point for future annotation.

The Labelbox utilities in FiftyOne provide multiple functions to assist in this. You can export your dataset to disk in Labelbox format and then upload it to an existing Labelbox project.
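A hedged sketch of that flow, assuming export_to_labelbox and upload_labels_to_labelbox utilities in fiftyone.utils.labelbox; the names and arguments may differ between FiftyOne releases:

import labelbox as lb
import fiftyone.utils.labelbox as foul

client = lb.Client()
project = client.get_project("<project-id>")  # placeholder project ID

# Write the labels for our view to disk in Labelbox's NDJSON format
foul.export_to_labelbox(view, "/path/to/labels.ndjson", label_field="ground_truth")

# Upload the labels to the project via model-assisted labeling
foul.upload_labels_to_labelbox(project, "/path/to/labels.ndjson")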

Summary

Clean data with correct labels is the key to a gold standard test set. Annotation errors lead to unreliable model performance which can be expensive and even dangerous in production settings. Utilizing the tight integration between FiftyOne and Labelbox to quickly find and fix annotation errors can be the key to producing a successful model.

About Voxel51

High-quality, intentionally-curated data is critical to training great computer vision models. At Voxel51, we have over 25 years of CV/ML experience and care deeply about enabling the community to bring their AI solutions to life. That’s why we developed FiftyOne, an open-source tool that helps engineers and scientists to do better ML, faster.

Want to learn more? Check us out at fiftyone.ai.
