
As computer vision datasets grow to contain millions of images, annotation errors inevitably creep in. While annotation errors in training data primarily harm a model's performance when they exist in large quantities, it is crucial that gold standard test sets be free of errors.
In academia, prominent datasets like ImageNet and MS-COCO serve as benchmarks allowing researchers to compare computer vision methods that were created years apart. But even popular public datasets are not free of errors.

Performance on these datasets continues to improve year after year. The problem comes when you need to decide which specific model to use for your task. Comparing overall performance on public datasets is not enough. Especially in production settings, where poor performance can have monetary or even ethical repercussions, you need to be certain about how your model will perform.
High-quality gold standard datasets are just as valuable as, if not more valuable than, high-quality models. Being able to accurately select the best model and maintain confidence in its performance on production data, especially on difficult edge cases, saves the time and money that would otherwise be spent taking down the model and revising it.
In this blog post, I take a look at two machine learning tools that make it fast and easy to find and fix annotation mistakes in visual datasets. I am using FiftyOne, the open-source ML developer tool I have been working on, to compute the mistakenness of labels. FiftyOne then integrates directly with the annotation tool Labelbox, letting us easily re-annotate problematic samples.
Setup
Both the FiftyOne and Labelbox APIs are installable through pip:

```shell
pip install fiftyone
pip install labelbox
```
Using Labelbox first requires you to create a free account. This gives you access to the Labelbox annotation app and lets you upload raw data.
FiftyOne is an open-source tool so no account is required, just the pip install.
In order to use the Labelbox Python SDK, you will need to generate and authenticate your API key:
```shell
export LABELBOX_API_KEY="<your-api-key>"
```
Note: You need a paid account for model-assisted labeling in Labelbox. This is the only way to upload existing labels to Labelbox.
Load data into FiftyOne
In this blog, I am using the COCO-2017 object detection dataset as an example. This dataset is in the FiftyOne Dataset Zoo and so can be loaded into FiftyOne in one line of code:

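Here is a minimal sketch of that one-liner (I am loading the validation split here; pick whichever split you need), plus launching the FiftyOne App to browse the result:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Download and load the COCO-2017 validation split from the FiftyOne Dataset Zoo
dataset = foz.load_zoo_dataset("coco-2017", split="validation")

# Launch the FiftyOne App to browse the dataset
session = fo.launch_app(dataset)
```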
Of course, in practice you’ll want to use your own data, so let’s cover a few scenarios for where your data lives and how to load it.
Case 1: Dataset is in Labelbox
If you have a dataset you are annotating in Labelbox and just want to use FiftyOne to explore the dataset and find annotation mistakes, then the Labelbox integrations provided in FiftyOne make this fast and easy.
First, you need to export your data from Labelbox. This is as simple as logging into the Labelbox app, selecting a project, going to the Exports tab, and generating a JSON export of your labels.
Next, you can use the Labelbox utilities provided in FiftyOne to import the dataset. You just need to specify the path to the exported JSON as well as a directory that will be used to download the raw data from Labelbox.
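A sketch of what that import might look like (the paths are placeholders, and the exact signature of the utility may vary across FiftyOne versions):

```python
import fiftyone as fo
import fiftyone.utils.labelbox as foul

# Path to the JSON export downloaded from the Labelbox app, and a local
# directory into which the raw media will be downloaded
labelbox_export_path = "/path/to/labelbox-export.json"
download_dir = "/path/to/media"

dataset = fo.Dataset("labelbox-import")
foul.import_from_labelbox(
    dataset, labelbox_export_path, download_dir=download_dir
)
```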
Case 2: Dataset follows a common format
FiftyOne supports loading over a dozen common computer vision dataset formats including COCO, TFRecords, YOLO, and more. If your dataset follows one of these formats, then you can load it into FiftyOne in a single line of code:
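For example, a COCO-format detection dataset on disk can be loaded roughly like this (paths are placeholders; argument names vary slightly across FiftyOne versions):

```python
import fiftyone as fo

# Load a dataset stored on disk in COCO detection format
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/coco-dataset",
    dataset_type=fo.types.COCODetectionDataset,
)
```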
Case 3: Dataset is stored in a custom format
No matter what format you use, you can always get your data and labels into FiftyOne. All you need to do is parse your labels in Python, add them to FiftyOne Samples, and then add the Samples to a FiftyOne Dataset. For example, if you have detections, the following code will build a dataset:
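Here is a minimal sketch of that pattern; the annotation structure below is hypothetical, with bounding boxes in relative [0, 1] coordinates:

```python
import fiftyone as fo

# Hypothetical parsed annotations: filepath -> list of (label, [x, y, w, h]),
# where boxes are [top-left-x, top-left-y, width, height] in relative coordinates
annotations = {
    "/path/to/images/000001.jpg": [("cat", [0.10, 0.20, 0.30, 0.40])],
    "/path/to/images/000002.jpg": [("dog", [0.50, 0.10, 0.25, 0.35])],
}

samples = []
for filepath, objects in annotations.items():
    sample = fo.Sample(filepath=filepath)

    detections = [
        fo.Detection(label=label, bounding_box=bbox) for label, bbox in objects
    ]
    sample["ground_truth"] = fo.Detections(detections=detections)

    samples.append(sample)

dataset = fo.Dataset("my-detection-dataset")
dataset.add_samples(samples)
```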
Find annotation mistakes with FiftyOne
FiftyOne also includes the FiftyOne Brain package, which contains methods that help you analyze your dataset to find unique and hard samples, as well as potential annotation mistakes in classification and object detection datasets.
Generate model predictions
The [mistakenness](https://voxel51.com/docs/fiftyone/user_guide/brain.html#label-mistakes) method in the FiftyOne Brain uses model predictions to rank possible annotation mistakes based on the confidence of the model. So, let's add some model predictions to our dataset.
We can use the FiftyOne Model Zoo to automatically load and generate detections from a variety of pretrained models and add the detections to our dataset in just two lines of code.
In this example, I am using EfficientDet-D6 pretrained on COCO-2017.
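A sketch of those two lines follows; the exact zoo model name is an assumption, so run `foz.list_zoo_models()` to see the models available in your FiftyOne version:

```python
import fiftyone.zoo as foz

# Load a pretrained EfficientDet-D6 model from the FiftyOne Model Zoo
model = foz.load_zoo_model("efficientdet-d6-coco-tf1")

# Generate detections for every sample and store them in a "predictions" field
dataset.apply_model(model, label_field="predictions")
```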
_Note: If you are using a custom dataset, then you may need to use your own trained model to generate predictions and add them to your FiftyOne dataset manually. This process is straightforward whether you are working with classifications, detections, or other label types. For classifications, you can also add logits for each prediction to calculate mistakenness._
Compute mistakenness
Once we have the ground truth and predicted labels added to FiftyOne, we only need to run one command to compute mistakenness.
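That command looks roughly like this, assuming the predictions are stored in a `predictions` field and the ground truth in a `ground_truth` field:

```python
import fiftyone.brain as fob

# Rank potential annotation mistakes using the model predictions
fob.compute_mistakenness(dataset, "predictions", label_field="ground_truth")
```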
This populates the following attributes on your labels:
- `mistakenness` – [0,1] value on ground truth labels indicating the likelihood that the ground truth object's label is incorrect
- `mistakenness_loc` – [0,1] value on ground truth detections indicating the likelihood that the ground truth's bounding box is inaccurate
- `possible_spurious` – Boolean on ground truth detections indicating that the underlying object likely does not exist
- `possible_missing` – Boolean on predicted detections indicating that an object was possibly missed during annotation
Explore dataset
Now that mistakenness has been computed for all samples in the dataset, we can use the FiftyOne App to visualize our dataset and look for specific samples and annotations that we need to fix.

Additionally, we can search, sort, and query the results of the mistakenness computation by creating views into the dataset. For example, we can sort by samples with the most `possible_spurious` detections:
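A sketch of such a view, built by counting the possibly spurious ground truth detections in each sample:

```python
from fiftyone import ViewField as F

# Sort samples by how many of their ground truth detections were
# flagged as possibly spurious
spurious_view = dataset.sort_by(
    F("ground_truth.detections").filter(F("possible_spurious") == True).length(),
    reverse=True,
)
session.view = spurious_view
```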

Or we can filter all detections to only see ones with a `mistakenness` greater than 0.6:
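A sketch of that filter:

```python
from fiftyone import ViewField as F

# Only show ground truth detections whose mistakenness exceeds 0.6
mistake_view = dataset.filter_labels("ground_truth", F("mistakenness") > 0.6)
session.view = mistake_view
```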


Just looking through some of the samples from the last query, we can see that a common pattern of errors appears to be mistaking the `couch` and `chair` classes. The fact that these two classes seem to frequently contain annotation errors can artificially lower the performance of the model on these classes. Let's select a few of these samples and load them into Labelbox for reannotation.
Fix annotations in Labelbox
FiftyOne has made it fast and easy to find samples with annotation mistakes. Thanks to the integration between FiftyOne and Labelbox, we can now load the data we want to reannotate into Labelbox. In the Labelbox app, simply create a new dataset and choose the files you want to upload.
You can also use the Labelbox utilities in FiftyOne to add media files to a dataset that you created previously in the Labelbox app:
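A rough sketch of that workflow, assuming `session.selected` holds the samples picked in the App; the helper name is how I recall the `fiftyone.utils.labelbox` utilities, so check the docs for your version:

```python
import labelbox
import fiftyone.utils.labelbox as foul

# The samples selected in the FiftyOne App for reannotation
selected_view = dataset.select(session.selected)

# Connect to Labelbox (reads LABELBOX_API_KEY) and look up the dataset
# created previously in the Labelbox app
client = labelbox.Client()
labelbox_dataset = client.get_dataset("<labelbox-dataset-id>")

# Upload the raw media for the selected samples
foul.upload_media_to_labelbox(labelbox_dataset, selected_view)
```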
Note: If your labeled project already exists in Labelbox, then you just need to collect the IDs of the samples you want to fix.
Using the Labelbox editor
One of the primary benefits of using Labelbox is the ability to access its powerful editors for labeling images. The online tool is fairly easy to use, making it ideal for this workflow, where we want to fix these mistaken annotations as fast as possible.
Once your raw data is in a dataset in the Labelbox app, you can create a new project and attach the data that you want to reannotate.

We then need to configure the editor for the classes that we want to annotate. In this example, we only want to reannotate chairs and couches in the samples we selected, so we add those object classes.

Once the project is set up, you can begin labeling the samples according to the classes that were defined.

Now that these samples have been reannotated, we can update the labels in the Labelbox dataset or download the annotations once again and update our dataset manually outside of Labelbox.
(Optional) Model-assisted labeling
Labelbox also provides a paid model-assisted labeling feature for Pro/Enterprise Labelbox users. With this, you can load annotations directly into Labelbox and edit them. This makes it even easier to update mistaken annotations or use model predictions as a starting point for future annotation.
The Labelbox utilities in FiftyOne provide multiple functions to assist in this. You can export your dataset to disk in Labelbox format and then upload it to an existing Labelbox project.
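A rough sketch under those assumptions; the function names and signatures in `fiftyone.utils.labelbox` may differ between versions, so treat this as illustrative rather than definitive:

```python
import labelbox
import fiftyone.utils.labelbox as foul

# Export labels for the samples we want to pre-populate (`selected_view` is the
# view of samples selected earlier) to disk in Labelbox NDJSON format
ndjson_path = "/path/to/labels.ndjson"
foul.export_to_labelbox(selected_view, ndjson_path, label_field="ground_truth")

# Upload the exported labels to an existing Labelbox project
# (model-assisted labeling requires a paid Labelbox plan)
client = labelbox.Client()
foul.upload_labels_to_labelbox(client, ndjson_path, "<labelbox-project-id>")
```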
Summary
Clean data with correct labels is the key to a gold standard test set. Annotation errors lead to unreliable model performance, which can be expensive and even dangerous in production settings. Utilizing the tight integration between FiftyOne and Labelbox to quickly find and fix annotation errors can be the key to producing a successful model.
About Voxel51
High-quality, intentionally-curated data is critical to training great Computer Vision models. At Voxel51, we have over 25 years of CV/ML experience and care deeply about enabling the community to bring their AI solutions to life. That’s why we developed FiftyOne, an open-source tool that helps engineers and scientists to do better ML, faster.
Want to learn more? Check us out at fiftyone.ai.