The AI Illustrated Guide

Why is Object Detection so Messy?

TLDR: Neural networks have fixed-sized outputs

Ygor Serpa

Published in Towards Data Science

7 min read · Oct 1, 2020

Those working with Neural Networks know how complicated Object Detection techniques can be. It is no wonder there is no straightforward resource for training them. You are always required to convert your data to a COCO-like JSON or some other unwanted format. It is never a plug-and-play experience. Moreover, no diagram explains Faster R-CNN or YOLO as thoroughly as the ones that exist for U-Net or ResNet. There are just too many details.

While these models are quite messy, the explanation for their lack of simplicity is quite straightforward. It fits in a single sentence:

Neural Networks have fixed-sized outputs

In object detection, you can’t know a priori how many objects there are in a scene. There might be one, two, twelve, or none. The following images all have the same resolution but feature different numbers of objects.

Photo by You X Ventures on Unsplash. Each image has a different number of objects.
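To make the mismatch concrete, here is a minimal sketch in Python with NumPy. It is not taken from the article: `toy_network`, the image size, and the box values are all made up for illustration. A single dense layer stands in for a network, and it returns the same number of values for every image, while the ground-truth box arrays for those images have different lengths.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: every image has the same resolution.
IMAGE_SHAPE = (32, 32, 3)
OUTPUT_SIZE = 4  # room for exactly one box: (x_min, y_min, x_max, y_max)

# A single dense layer standing in for a full network (illustrative only).
weights = rng.normal(size=(int(np.prod(IMAGE_SHAPE)), OUTPUT_SIZE))


def toy_network(image: np.ndarray) -> np.ndarray:
    """Map an image to a vector whose length never changes."""
    return image.reshape(-1) @ weights


# Three same-sized images with made-up annotations: zero, one, and three boxes.
images = [rng.random(IMAGE_SHAPE) for _ in range(3)]
ground_truth = [
    np.zeros((0, 4)),
    np.array([[2.0, 3.0, 10.0, 12.0]]),
    np.array([[1.0, 1.0, 5.0, 5.0],
              [8.0, 8.0, 20.0, 16.0],
              [4.0, 18.0, 12.0, 30.0]]),
]

for image, boxes in zip(images, ground_truth):
    prediction = toy_network(image)
    # The prediction shape is constant; the target shape depends on the scene.
    print("prediction:", prediction.shape, "target:", boxes.shape)
```

The prediction always has room for exactly one box, so it cannot match the empty scene or the three-object scene without some extra convention layered on top of the network.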

The million-dollar question is: How can we build variable-sized outputs out of fixed-sized networks? Plus, how are we supposed to…

Very interesting article! I think what makes true variable sized outputs difficult is really two things:

(1) knowing beforehand the dimension of the output before the forward pass — without a signal there’s no way to know how many objects might be in…...
