
Introducing PeekingDuck for Computer Vision

Open-source state-of-the-art computer vision models with minimal lines of code

Photo by Vlad Tchompalov on Unsplash.

Introduction

Computer vision projects can be very daunting, involving a wide variety of tools and packages such as OpenCV, TensorFlow and PyTorch, just to name a few. Not only does one have to be familiar with the tools and APIs involved, one also needs to combine the various packages correctly for the entire computer vision pipeline to work properly.

For example, OpenCV handles images in the [H, W, C] format with BGR channels, TensorFlow uses the same layout but with RGB channels, and PyTorch uses the [C, H, W] format with RGB channels. Because of this inconsistency, the image format must be converted repeatedly as the image is passed between the various libraries. Issues like this (in addition to others!) result in plenty of boilerplate code, which we generally want to avoid.
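To make the format mismatch concrete, here is a minimal sketch of the kind of conversion boilerplate involved. It uses plain NumPy on a dummy array, so the helper name `bgr_hwc_to_rgb_chw` is purely illustrative and not part of any of the libraries mentioned.

```python
import numpy as np


def bgr_hwc_to_rgb_chw(image_bgr: np.ndarray) -> np.ndarray:
    """Convert an OpenCV-style [H, W, C] BGR image to a [C, H, W] RGB array."""
    # Reverse the channel axis: BGR -> RGB.
    image_rgb = image_bgr[:, :, ::-1]
    # Move the channel axis to the front: [H, W, C] -> [C, H, W].
    return np.ascontiguousarray(image_rgb.transpose(2, 0, 1))


# A tiny 2x3 dummy "image" whose pixels are pure blue in BGR order.
dummy = np.zeros((2, 3, 3), dtype=np.uint8)
dummy[..., 0] = 255  # channel 0 is blue in BGR

converted = bgr_hwc_to_rgb_chw(dummy)
print(converted.shape)  # (3, 2, 3)
```

After the conversion, blue sits in the last channel plane (`converted[2]`), which is where an RGB, channels-first consumer would expect it.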

What if we could streamline computer vision pipelines with a single unified pipeline which is:

  1. Open source, without restrictive licenses such as GPL-3.0, in order to cut costs.
  2. Modular for applicability to various use cases.
  3. State-of-the-art for maximum performance.
  4. Minimal to minimize pipeline complexity.

It turns out that all of these requirements are met to some extent by PeekingDuck – a computer vision package released recently by AI Singapore!

PeekingDuck

PeekingDuck is a computer vision framework which is:

  1. Open source (Apache 2.0) – no costs or restrictions.
  2. Modular – mix and match various modules to solve different use cases.
  3. State-of-the-art computer vision inference – powerful Deep Learning models.
  4. Minimal – literally no Python code needed!

After installing PeekingDuck as a Python package through a package manager such as pip, the package can be used directly from the command line/terminal, allowing for easy and direct integration with other applications.

Installing PeekingDuck

PeekingDuck is installed as a Python package:

pip install peekingduck

Nodes – PeekingDuck’s Basic Building Blocks

With PeekingDuck, computer vision pipelines are built using basic building blocks called nodes. Each node handles a different set of operations, and by mixing various nodes, different pipelines can be created. At the time of writing, PeekingDuck has 6 different types of nodes:

  1. Input – feed image data into the pipeline from live camera feeds or video/image files.
  2. Augment – preprocess image data.
  3. Model – perform computer vision tasks such as object detection or pose estimation.
  4. Dabble – post-process model outputs.
  5. Draw – visualize model outputs such as bounding boxes.
  6. Output – save model outputs to disk.
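The idea of chaining nodes can be illustrated with a small, self-contained sketch. This is not PeekingDuck's actual implementation: `run_pipeline` and the three `fake_*` nodes are hypothetical stand-ins, and the assumption that nodes exchange a shared dictionary is made only for illustration.

```python
from typing import Any, Callable, Dict, List

# A node is modelled as a function that reads a data dict and returns new keys.
Node = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(nodes: List[Node], data: Dict[str, Any]) -> Dict[str, Any]:
    """Pass a shared data dictionary through each node in order."""
    for node in nodes:
        # Merge whatever the node produced into the shared data.
        data = {**data, **node(data)}
    return data


def fake_input(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical input node: provides a tiny placeholder image.
    return {"img": [[0, 0], [0, 0]]}


def fake_model(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical model node: pretends to detect two objects.
    return {"bboxes": [(0.1, 0.1, 0.4, 0.5), (0.5, 0.2, 0.9, 0.8)]}


def fake_count(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical dabble node: post-processes the model output.
    return {"count": len(data["bboxes"])}


result = run_pipeline([fake_input, fake_model, fake_count], {})
print(result["count"])  # 2
```

Swapping or reordering the functions in the list changes the pipeline without touching the other nodes, which is the appeal of the modular design.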

Person Tracking Pipeline

Using PeekingDuck is easy! In this section we will demonstrate how to use it to create a person tracking pipeline.

Initialize PeekingDuck

The first step is to initialize PeekingDuck within a specified directory (person_tracking/ in this case).

mkdir person_tracking
cd person_tracking
peekingduck init

This will create a configuration file named pipeline_config.yml under person_tracking/ together with some other source code files. In order to get PeekingDuck to do what we want it to do, we have to modify pipeline_config.yml.

In our case, pipeline_config.yml should contain the following lines:

nodes:
- input.visual:
    source: venice-2-train.mp4
- model.jde
- dabble.statistics:
    maximum: obj_attrs["ids"]
- draw.bbox
- draw.tag:
    show: ["ids"]
- draw.legend:
    show: ["cum_max"]
- output.media_writer:
    output_dir: output/

We use the following nodes for this task:

  1. input.visual – specifies the file to load the image data from. We use a video stitched from the Venice-2 images from the MOT15 dataset.
  2. model.jde – specifies the model to use. For person tracking we use the Joint Detection and Embedding (JDE) model.
  3. dabble.statistics – performs statistical calculations based on the model’s output. In this case we calculate the maximum of the detected IDs in each frame.
  4. draw.bbox – draws the detected bounding boxes on each frame.
  5. draw.tag – draws the corresponding tag for each bounding box.
  6. draw.legend – draws the cumulative maximum number of detections.
  7. output.media_writer – outputs the model’s predictions to disk.
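To show what a cumulative maximum over tracking IDs means in practice, here is a rough sketch in plain Python. It does not reproduce PeekingDuck's internals: the function name `cumulative_max_id` and the sample IDs are hypothetical, and it assumes the tracker hands out IDs in increasing order, so the largest ID seen so far approximates how many people have been tracked.

```python
from typing import Iterable, List


def cumulative_max_id(ids_per_frame: Iterable[List[int]]) -> List[int]:
    """Return the running maximum track ID seen after each frame."""
    running_max = 0
    history = []
    for ids in ids_per_frame:
        if ids:  # a frame may contain no detections at all
            running_max = max(running_max, max(ids))
        history.append(running_max)
    return history


# Hypothetical track IDs for four consecutive frames.
frames = [[1, 2], [1, 2, 3], [], [2, 5]]
print(cumulative_max_id(frames))  # [2, 3, 3, 5]
```

Note that the value never decreases, even when a frame has fewer or no detections, which is why it suits a running count shown in a legend.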

By mixing and matching different nodes, we can build different pipelines to solve different computer vision use cases. A detailed list of available nodes is available on PeekingDuck’s website.

Prepare the Data

The next step is to prepare the data. In our case we use OpenCV to stitch together the Venice-2 images from the MOT15 dataset into a video file named venice-2-train.mp4 with a frame rate of 30 and a resolution of [1920, 1080].

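import os

import cv2

FRAME_DIR = "MOT15/train/Venice-2/img1"

# OpenCV expects the frame size as a (width, height) tuple.
writer = cv2.VideoWriter("venice-2-train.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"),
                         30, (1920, 1080))

# Frame file names are zero-padded, so a plain sort keeps them in order.
files = sorted(os.listdir(FRAME_DIR))

for f in files:
    im = cv2.imread(os.path.join(FRAME_DIR, f))
    if im is None:  # skip anything that is not a readable image
        continue
    writer.write(im)

# Finalize the file so that the video is written out completely.
writer.release()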
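One detail that is easy to get wrong here is that OpenCV describes the video size as (width, height), whereas an image array has shape (height, width, channels). Frames whose size does not match the writer may not end up in the video, so a quick check before writing can save some debugging. The helper below is a hypothetical sketch, demonstrated on dummy NumPy arrays rather than the real frames.

```python
import numpy as np


def matches_frame_size(frame, size) -> bool:
    """Return True if frame is a colour image whose (width, height) equals size."""
    if frame is None:  # cv2.imread returns None when a file cannot be read
        return False
    if frame.ndim != 3:
        return False
    height, width = frame.shape[:2]
    return (width, height) == tuple(size)


# Dummy frames standing in for images loaded from disk.
good = np.zeros((1080, 1920, 3), dtype=np.uint8)  # height 1080, width 1920
bad = np.zeros((1920, 1080, 3), dtype=np.uint8)   # axes swapped by mistake

print(matches_frame_size(good, (1920, 1080)))  # True
print(matches_frame_size(bad, (1920, 1080)))   # False
```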

Run PeekingDuck

After initializing PeekingDuck and preparing the data, all that is left is to run the pipeline from the command line:

peekingduck run
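Because the pipeline is started from the command line, it can also be launched from another application when needed. The following is a minimal sketch using only Python's standard library; `run_in_directory` is a hypothetical helper, and it assumes the `peekingduck` executable is on the PATH and that the project lives in person_tracking/.

```python
import shutil
import subprocess
from typing import List

# The same command used above, expressed as an argument list.
PEEKINGDUCK_RUN = ["peekingduck", "run"]


def run_in_directory(command: List[str], project_dir: str) -> int:
    """Run a command inside project_dir and return its exit code."""
    executable = shutil.which(command[0])
    if executable is None:
        raise FileNotFoundError(f"{command[0]} was not found on PATH")
    completed = subprocess.run(
        [executable, *command[1:]], cwd=project_dir, check=False
    )
    return completed.returncode


# Example call, left commented out because it needs PeekingDuck installed:
# exit_code = run_in_directory(PEEKINGDUCK_RUN, "person_tracking")
```

Running the command with `cwd` set to the project directory mirrors the `cd person_tracking` step, so the pipeline picks up the pipeline_config.yml located there.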

The pipeline’s output will be saved under output/, as specified in pipeline_config.yml, and can be viewed either as a video or as a .gif image as shown below. The detected bounding boxes are overlaid on each tracked person together with the corresponding tracking ID. The cumulative maximum number of tracked IDs is also displayed in the lower-left corner of each frame.

PeekingDuck person tracking output. Figure created by the author. Original images are the Venice-2 images from the MOT15 dataset.

Note that aside from preparing the data, we have not written a single line of Python code while using PeekingDuck to do person tracking!

Conclusion

Computer vision has come a long way, and we now have access to many fantastic packages such as PeekingDuck. PeekingDuck offers open-source, modular, state-of-the-art computer vision models with minimal amounts of Python code, allowing anyone to pursue computer vision projects with relative ease and simplicity!

References

  1. https://peekingduck.readthedocs.io/en/stable/master.html
  2. https://motchallenge.net/data/MOT15/
  3. https://github.com/Zhongdao/Towards-Realtime-MOT
