Video Understanding with PyTorch

Learn how to run inference with a custom video understanding model in 3 simple steps using PyTorchVideo and Lightning Flash

Aaron (Ari) Bornstein
Towards Data Science


Walmart recently developed a Video Understanding system that can inspect fresh food for signs of defects and spoilage. Photo by Pixabay from Pexels

Video Understanding enables computers to identify behaviors, objects, and activities in video, which lets it automate a wide range of business use cases, from retail to health care to agriculture.

In its latest release, Lightning Flash provides support for Video Understanding using Facebook AI Research’s new PyTorchVideo library powered by Lightning.

Flash is a library for fast prototyping, baselining, and fine-tuning scalable Deep Learning tasks. Using Flash for Video Understanding enables you to train, fine-tune, and run inference with PyTorchVideo models on your own data without being overwhelmed by all the details.

Once you have a baseline model, you can seamlessly override the default configurations and experiment with the full flexibility of PyTorch Lightning to work toward state-of-the-art results on your dataset.

In this article, you will learn how to run inference with a custom video classification model in 3 simple steps.

Video Understanding in 3 Simple Steps

Prerequisite: Install Flash from the GitHub Main Branch
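Flash's video support is built on PyTorchVideo, so before running the steps below it helps to confirm that both packages are importable in your environment. The sketch below only inspects the environment and installs nothing. It assumes the PyPI distribution names `lightning-flash` (imported as `flash`) and `pytorchvideo`; `install_status` is a hypothetical helper written for this example.

```python
import importlib.util
from importlib import metadata

# (import name, distribution name) pairs. The distribution names are assumptions
# based on the PyPI packages "lightning-flash" and "pytorchvideo".
REQUIRED_PACKAGES = (("flash", "lightning-flash"), ("pytorchvideo", "pytorchvideo"))


def install_status(packages=REQUIRED_PACKAGES):
    """Map each import name to its installed version, or None if it is missing."""
    status = {}
    for import_name, dist_name in packages:
        if importlib.util.find_spec(import_name) is None:
            status[import_name] = None  # not importable in this environment
            continue
        try:
            status[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable (e.g. from a source checkout) but without distribution metadata.
            status[import_name] = "unknown"
    return status


if __name__ == "__main__":
    for name, version in install_status().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A source install from GitHub main may report a development version string rather than a release number; either way, a non-None value means the import will resolve.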

Step 1 Import Flash

First, we simply import the VideoClassifier Task from flash.video.
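In code, Step 1 is a single import. The try/except guard around it is an addition for this write-up, not something Flash requires: it lets the snippet load even where Flash or its video dependencies are missing, so you see a short message instead of a traceback.

```python
# Step 1: import the video classification task from Flash.
try:
    from flash.video import VideoClassifier
except ImportError as err:  # Flash or PyTorchVideo is not installed
    VideoClassifier = None
    IMPORT_ERROR = err
else:
    IMPORT_ERROR = None

if VideoClassifier is None:
    print(f"flash.video is unavailable: {IMPORT_ERROR}")
```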

Step 2 Load Pretrained Model

Next, we load the pretrained model we want to run inference with.
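Loading a trained model goes through `load_from_checkpoint`, the classmethod that every PyTorch Lightning module, and therefore every Flash task, inherits. The wrapper below is a minimal sketch: `load_video_classifier` and its early path check are hypothetical additions, and the checkpoint is whatever file your own training run produced (Lightning's loader also accepts an http(s) URL, which the check lets through). Flash is imported lazily so the path check works on its own.

```python
from pathlib import Path
from urllib.parse import urlparse


def load_video_classifier(checkpoint: str):
    """Hypothetical helper: load a trained VideoClassifier from a local file or URL."""
    is_url = urlparse(checkpoint).scheme in ("http", "https")
    if not is_url and not Path(checkpoint).is_file():
        # Fail early with a readable error instead of a deep Lightning traceback.
        raise FileNotFoundError(f"checkpoint not found: {checkpoint}")

    # Lazy import: Flash is only needed once we actually load weights.
    from flash.video import VideoClassifier

    # load_from_checkpoint is inherited from pytorch_lightning.LightningModule.
    return VideoClassifier.load_from_checkpoint(checkpoint)
```

Usage would look like `model = load_video_classifier("path/to/your_model.ckpt")`, where the path is a placeholder for your own checkpoint.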

Flash also makes it easy to train custom models. To learn more, check out the Lightning Developer tutorial on how to fine-tune a Video Understanding model in 5 simple steps.

Step 3 Infer on your Video Data

We can then run the model on individual videos or on directories of videos just by passing a valid path.
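If you want to control exactly which files reach the model, you can expand the path yourself and pass a list of video files. This sketch assumes the task exposes a `predict` method that accepts a list of file paths, as Flash tasks did in the releases this article targets; later Flash releases route prediction through `flash.Trainer.predict` with a datamodule instead, so check the documentation for your installed version. `collect_video_paths`, `predict_videos`, and the extension list are hypothetical names chosen for this example.

```python
from pathlib import Path
from typing import List, Sequence, Union

# Assumed set of container formats; extend it to match your data.
VIDEO_EXTENSIONS = (".mp4", ".avi", ".mov", ".mkv", ".webm")


def collect_video_paths(path: Union[str, Path],
                        extensions: Sequence[str] = VIDEO_EXTENSIONS) -> List[str]:
    """Return [path] for a single file, or the sorted video files in a directory."""
    path = Path(path)
    if path.is_file():
        return [str(path)]  # a single file is passed through unchanged
    if path.is_dir():
        allowed = {ext.lower() for ext in extensions}
        return sorted(
            str(p) for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in allowed
        )
    raise FileNotFoundError(f"no such file or directory: {path}")


def predict_videos(model, path: Union[str, Path]):
    """Hypothetical helper: call model.predict on every video found under path."""
    videos = collect_video_paths(path)
    if not videos:
        raise ValueError(f"no video files found in {path}")
    return model.predict(videos)
```

Filtering by extension keeps stray files such as notes or thumbnails in a video folder from being handed to the decoder.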

Putting It All Together

And there you have it: all you need to know to run inference with Video Understanding models on your own data.
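Here is a compact sketch of the three steps as one script. It assumes a Flash release whose tasks still expose `predict` and accept a single video or a directory path, as described above; newer releases moved prediction to `flash.Trainer.predict`. The checkpoint and video paths are your own and are read from the command line.

```python
import sys


def main(checkpoint: str, video_path: str) -> None:
    # Step 1: import Flash's video classification task. The import is inside main
    # so running the script without arguments only prints the usage line.
    from flash.video import VideoClassifier

    # Step 2: load the trained model from a checkpoint file or URL.
    model = VideoClassifier.load_from_checkpoint(checkpoint)

    # Step 3: predict on a single video or a directory of videos.
    predictions = model.predict(video_path)
    print(predictions)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"usage: python {sys.argv[0]} CHECKPOINT VIDEO_PATH")
    else:
        main(sys.argv[1], sys.argv[2])
```

For example, `python predict_video.py my_model.ckpt videos/` (both names hypothetical) would print the predictions for every video in the folder.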

Next Steps

Now that you know how to run inference with a Video Understanding model in three lines of code with Flash, you might check out some of the other Computer Vision tasks that Flash makes as simple as Video Understanding:

  1. Multi-label Image Classification
  2. Image Embedding
  3. Object Detection
  4. Semantic Segmentation
  5. Style Transfer

New tasks are being contributed all the time, so keep an eye out for updates. If you have any questions, feel free to comment below or reach out to us on Slack or Twitter.

About the Author

Aaron (Ari) Bornstein is an AI researcher with a passion for history, engaging with new technologies, and computational medicine. As Head of Developer Advocacy at Grid.ai, he collaborates with the Machine Learning Community to solve real-world problems with game-changing technologies that are then documented, open-sourced, and shared with the rest of the world.


Microsoft Open Source Engineer. I am an AI enthusiast with a passion for engaging with new technologies, history, and computational medicine.