
Motion-Based Object Detection and Tracking in MATLAB

Implementing MATLAB's Motion-Based Multiple Object Tracking Algorithm on Hovering Airborne Drone Video Footage

Detecting and tracking objects in full-motion video has important applications such as traffic monitoring, security, and video surveillance, among many others. Despite these uses, many people shy away from computer vision work because of its perceived complexity, without realizing that many libraries and packages are available that make implementation straightforward.

Presented here is a simple, plain-language guide to understanding and implementing MATLAB's Motion-Based Multiple Object Tracking Algorithm so that you can detect and track moving objects in your own videos. The algorithm is tested on a video of a staged scenario recorded from a hovering drone.


Motion-Based Multiple Object Tracking Algorithm

We’ll be using MATLAB code from MathWorks, which is available here.

There are a few terms that are important to know for the discussion that follows; the figure below illustrates each of them.

The terms shown above are also defined here:

  • Object identification number (object id) distinguishes the different objects from each other.
  • Object bounding box is the box that surrounds a detected object.
  • Frame X-axis is the pixel coordinate axis running horizontally across each frame.
  • Frame Y-axis is the pixel coordinate axis running vertically down each frame.
  • Object centroid is the x and y pixel coordinate pair that defines the center of the object in space and time.
  • Frame numbers are the frames in which an object is visible; combined with the camera's frames per second, they measure how long an object is in the scene.
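To make these terms concrete, the sketch below (in Python rather than MATLAB, purely for illustration; the field names are my own) bundles them into a single track record and derives on-screen time from the frame numbers and the camera frame rate:

```python
from dataclasses import dataclass


@dataclass
class Track:
    object_id: int   # distinguishes objects from each other
    bbox: tuple      # (x, y, width, height) bounding box in pixels
    centroid: tuple  # (x, y) pixel coordinates of the object's center
    frames: list     # frame numbers in which the object is visible

    def seconds_on_screen(self, fps: float) -> float:
        """Time the object spends in the scene, given the camera frame rate."""
        return len(self.frames) / fps


# Example: an object visible for 60 frames of 30 fps video is on screen for 2 s.
track = Track(object_id=1, bbox=(100, 50, 40, 80), centroid=(120, 90),
              frames=list(range(60)))
print(track.seconds_on_screen(fps=30))  # 2.0
```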

Unlike popular detection algorithms, which rely on deep learning and extensive amounts of training data to detect objects, MATLAB's motion-based algorithm uses only movement. An object is detected by first subtracting the background between two frames; if the difference contains enough connected pixels, an object is identified. The path of each detected object is predicted with a simple Kalman filter, and if a subsequent object is detected along the predicted track, it is assigned the same object id.
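The detection step can be sketched in a few lines of Python (illustrative only; the MATLAB example actually uses a Gaussian-mixture foreground detector and blob analysis, and the thresholds here are made up): difference two grayscale frames, threshold the result, and keep any 4-connected group of pixels large enough to be an object.

```python
import numpy as np


def detect_moving_blobs(prev_frame, frame, diff_thresh=25, min_pixels=50):
    """Difference two grayscale frames, threshold, then return the (x, y)
    centroid of each 4-connected pixel group large enough to be an object."""
    moving = np.abs(frame.astype(int) - prev_frame.astype(int)) > diff_thresh
    labels = np.zeros(moving.shape, dtype=int)
    blobs, next_label = [], 0
    for start in zip(*np.nonzero(moving)):
        if labels[start]:
            continue  # pixel already belongs to a labeled component
        next_label += 1
        labels[start] = next_label
        stack, pixels = [start], []
        while stack:  # flood fill one connected component
            r, c = stack.pop()
            pixels.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < moving.shape[0] and 0 <= nc < moving.shape[1]
                        and moving[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = next_label
                    stack.append((nr, nc))
        if len(pixels) >= min_pixels:  # enough connected pixels -> an object
            rows, cols = zip(*pixels)
            blobs.append((float(sum(cols)) / len(cols),
                          float(sum(rows)) / len(rows)))  # centroid (x, y)
    return blobs


# Example: a 10x10 patch of newly bright pixels shows up as one blob.
prev = np.zeros((100, 100), dtype=np.uint8)
cur = prev.copy()
cur[20:30, 40:50] = 255
print(detect_moving_blobs(prev, cur))  # [(44.5, 24.5)]
```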

The figure below shows an example of a normal video screenshot next to what the subtracted background space looks like for detecting and tracking objects.
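The Kalman-filter tracking step mentioned above can also be reduced to a toy sketch (again Python for illustration; a real Kalman filter additionally weighs its prediction against measurement noise, and the 30-pixel gate here is an assumed value): extrapolate each track's motion, then assign the nearest detection if it falls close enough to the prediction.

```python
def predict_next_centroid(track_centroids):
    """Constant-velocity prediction: extrapolate the last observed step."""
    (x1, y1), (x2, y2) = track_centroids[-2], track_centroids[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def assign_detection(predicted, detections, max_dist=30.0):
    """Assign the detection closest to the predicted position to the same
    object, if it lies within max_dist pixels; otherwise report no match."""
    def dist(d):
        return ((d[0] - predicted[0]) ** 2 + (d[1] - predicted[1]) ** 2) ** 0.5
    best = min(detections, key=dist, default=None)
    if best is None or dist(best) > max_dist:
        return None
    return best


# Example: a track moving right and slightly up keeps moving the same way.
print(predict_next_centroid([(100, 200), (110, 195)]))  # (120, 190)
```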


Acquiring the Airborne Video Data

I used DJI’s Mavic Mini drone to film myself walking around with siblings and friends. The pictures below show the setup where we filmed and the equipment used. The drone hovered the entire time, as the algorithm would not work if the camera exceeded a certain amount of movement.

We all walked straight along the horizontal or vertical axis of the scene at a normal pace, with the exception of two diagonal tracks walked at a significantly slower or faster pace to later test the tracking algorithm. After acquiring about 12 minutes of video, the data was retrieved from the drone for input to the detection and tracking algorithm.


Detected and Tracked Objects

Despite using shaky footage from a hovering drone, the detection and tracking algorithm worked well, as shown in the busiest 15-second clip below with 5 different objects.

As mentioned beforehand, to show a simple application of the object detection and tracking algorithm, two of the paths deviated from the normal patterns as shown above. Let’s take a look at some of the output data to find the deviant paths without having to manually analyze the 12-minute video. The output data includes the object id, the x and y centroid of the object, and the frames in which it appears.

A total of 192 objects were detected and tracked throughout the 12-minute video, although it is important to note that some of these may have been repeats or incorrect identifications due to occasional drone movement. First, we can plot the tracks onto the background to see where each tracked object traversed; a figure showing a plot of all the tracks is shown below.
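Plotting the tracks amounts to grouping the output rows by object id and drawing each group's centroids as a polyline over the background image. A sketch of that grouping step (the `(object_id, x, y)` row layout is my assumption about the output data, shown in Python for illustration):

```python
from collections import defaultdict


def group_tracks(rows):
    """Group (object_id, x, y) output rows into one centroid path per track,
    preserving row order, ready to draw over the background image."""
    paths = defaultdict(list)
    for object_id, x, y in rows:
        paths[object_id].append((x, y))
    return dict(paths)


# Example: interleaved rows for two tracks become two separate paths.
rows = [(1, 0, 0), (2, 5, 5), (1, 1, 2), (2, 6, 7)]
print(group_tracks(rows))  # {1: [(0, 0), (1, 2)], 2: [(5, 5), (6, 7)]}
```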

Although not very clear, in the image above we can see the two diagonal lines that represent the deviant tracks. If we plot how long each track was present, we should be able to easily detect both the longer and shorter deviant tracks. Below is a figure showing the duration of every track.
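Reading the extremes off the duration plot can also be automated: compute each track's on-screen time from its frame count and flag the longest and shortest. A sketch in Python (the frame lists below are made up for illustration, and 30 fps is an assumed frame rate):

```python
def track_durations(track_frames, fps=30.0):
    """Map each track id to its on-screen duration in seconds."""
    return {tid: len(frames) / fps for tid, frames in track_frames.items()}


def extreme_tracks(durations):
    """Return the (longest, shortest) track ids -- candidates for the
    slow and fast deviant walks, respectively."""
    longest = max(durations, key=durations.get)
    shortest = min(durations, key=durations.get)
    return longest, shortest


# Hypothetical frame lists: track 1 lingers, track 22 crosses quickly.
frames = {1: list(range(600)), 7: list(range(300)), 22: list(range(90))}
print(extreme_tracks(track_durations(frames)))  # (1, 22)
```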

Using the plot above, we note that the deviant track ids are 1 and 22 for the slow and fast deviant tracks, respectively. Now we can crop the video to show just the portions where deviant tracks 1 and 22 are present, as shown below.
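Cropping the video down to a deviant track then reduces to converting its first and last frame numbers into timestamps to feed a video-trimming tool (a Python sketch under an assumed 30 fps; the frame numbers are hypothetical):

```python
def clip_bounds(frames, fps=30.0):
    """Convert a track's frame numbers into (start, end) times in seconds,
    ready to hand to a video-trimming tool."""
    return min(frames) / fps, max(frames) / fps


# A hypothetical track visible from frame 900 to frame 1350 of 30 fps video.
print(clip_bounds(range(900, 1351)))  # (30.0, 45.0)
```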

Using the output track data, we were able to quickly identify the deviant tracks. Filtering to show just the deviant clips is much faster than manually inspecting the 12-minute video for 30 seconds’ worth of deviant footage.


Summary

Implementing object detection and tracking is becoming easier as libraries and algorithms are continually improved and released to the public. It is important to note that there are several different workflows for detecting and tracking objects, each with its advantages and disadvantages. The motion-based approach presented here works well for moving targets like those in this video, but detecting and tracking stationary targets requires different methods.

In the simple application presented here, the output detection and tracking data were used to identify two deviant tracks from a total of 192 objects. Although only a 12-minute video was used to identify features summing to 30 seconds, this approach could be applied to picking out brief deviant behaviors from hours’ worth of video.

