Data Science challenge for Alzheimer’s Research

Are you ready for a Video Classification Challenge?

Preparation Guide for Video Classification

Neha Goel
Towards Data Science
6 min read · Apr 25, 2020


Image Source (Stall Catchers)

To help you learn new skills and win some prize money while working from home, we at MathWorks are launching an online data science competition.

Teaser: The datathon will go live in May. Sign up for a DrivenData account to receive the launch announcement. Request complimentary MATLAB licenses here: Advance Alzheimer’s Research with Stall Catchers

The dataset will consist of image stacks (3D images) taken from a live mouse brain, showing blood vessels and blood flow. Each stack will have an outline drawn around a target vessel segment and will be converted to an .mp4 video file. The problem will be to classify the target vessel segment as either flowing or stalled. The challenge will be online, globally accessible and free to participate in. You can use any approach to solve the problem.

In this story, I will talk about the concepts and methods I learned while setting up this problem. I will also point you to documents you can refer to as you start preparing for the challenge.

Working with Data

Video Data

Working with videos is an extension of working with images; in addition to the static nature of an image, we must consider the dynamic nature of a video. A video can be defined as a stack of images, also referred to as frames, arranged in a specific order. Each frame is meaningful on its own, but the order is also very important. Hence both the spatial and the temporal content of the frames need to be captured.

So, the first step is extracting frames from the video. Make sure the extracted frames keep their original order, so that later steps can do both sequence modeling and temporal reasoning.
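As a rough illustration of frame extraction that preserves order, here is a small Python sketch (not MATLAB code). It only chooses which frame indices to keep, evenly spaced across the video and in temporal order; the function name `sample_frame_indices` and the choice of uniform sampling are assumptions made for this example, not something the challenge prescribes.

```python
def sample_frame_indices(num_frames, num_samples):
    """Pick up to num_samples frame indices, evenly spaced and in temporal order."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= num_frames:
        # Short video: keep every frame.
        return list(range(num_frames))
    step = num_frames / num_samples
    # Take the middle frame of each of the num_samples equal-length segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


indices = sample_frame_indices(num_frames=100, num_samples=5)
print(indices)  # [10, 30, 50, 70, 90]
```

Because the indices are always increasing, whatever reads the frames afterwards sees them in their original order.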

Process Data

Another challenge in working with videos is the large size of the dataset. In MATLAB, you can use a datastore to create a repository for collections of data that are too large to fit in memory. A datastore allows you to read and process data stored in multiple files on a disk, a remote location, or a database as a single entity.
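To make the idea concrete, here is a minimal Python sketch of the same pattern (this is not the MATLAB datastore API). The hypothetical `FileCollection` class treats a folder of files as one collection but loads only one file at a time while iterating. The demo writes a few dummy byte files with an .mp4 extension; they are placeholders, not real videos.

```python
import os
import tempfile


class FileCollection:
    """Treats many files in a folder as one collection, reading one file at a time."""

    def __init__(self, folder, extension):
        # Only the paths are kept in memory, not the file contents.
        self.paths = sorted(
            os.path.join(folder, name)
            for name in os.listdir(folder)
            if name.endswith(extension)
        )

    def __len__(self):
        return len(self.paths)

    def __iter__(self):
        for path in self.paths:
            with open(path, "rb") as handle:
                yield path, handle.read()


# Demo with dummy files (placeholder bytes, not real videos).
with tempfile.TemporaryDirectory() as folder:
    for i in range(3):
        with open(os.path.join(folder, f"clip_{i}.mp4"), "wb") as handle:
            handle.write(b"\x00" * (i + 1))
    collection = FileCollection(folder, ".mp4")
    sizes = [len(data) for _, data in collection]

print(sizes)  # [1, 2, 3]
```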

Documents to refer to:

Video Classification Methods

Once the data is ready, you can use any of the five methods below to proceed with classification. I will cover the most commonly used video classification methods, from a basic non-deep-learning approach to more advanced ones. I would encourage you to use the deep learning approaches, though, because of the size of the data and the need to extract features from each frame in a timely manner.

Classical Computer Vision Methods

Method 1: Optical Flow, Object Tracking & Cascade Classifier

Optical flow, activity recognition, motion estimation and tracking are the key techniques you can use to determine the classes and how they move between adjacent frames of the video.
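To give a feel for what optical flow computes, here is a Python/NumPy sketch (not MATLAB code) of a much simplified Lucas-Kanade estimate. It assumes the whole frame moves by one small translation and solves the brightness-constancy equation Ix·u + Iy·v + It = 0 in the least-squares sense. Real optical flow methods estimate a separate vector per pixel or per window; the synthetic blob frames and the function names are assumptions for this example only.

```python
import numpy as np


def lucas_kanade_global(frame1, frame2):
    """Estimate one (u, v) translation between two frames by least squares."""
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)
    # Spatial gradients of the averaged frame; np.gradient returns (d/dy, d/dx).
    grad_y, grad_x = np.gradient(0.5 * (f1 + f2))
    grad_t = f2 - f1
    # Brightness constancy: grad_x * u + grad_y * v = -grad_t at every pixel.
    A = np.stack([grad_x.ravel(), grad_y.ravel()], axis=1)
    b = -grad_t.ravel()
    solution = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(solution[0]), float(solution[1])


def blob(center_x, center_y, size=64, sigma=6.0):
    """Synthetic frame: a smooth Gaussian blob at the given center."""
    y, x = np.mgrid[0:size, 0:size]
    return np.exp(-((x - center_x) ** 2 + (y - center_y) ** 2) / (2 * sigma ** 2))


# The blob moves 0.5 pixels to the right and 0.3 pixels down between frames.
u, v = lucas_kanade_global(blob(32.0, 32.0), blob(32.5, 32.3))
print(round(u, 2), round(v, 2))
```

The estimate should come out close to the true shift of (0.5, 0.3). The linear approximation behind it only holds for small motions.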

Resources to refer to:

Another approach is to use local features of an image, such as blobs, corners and edge pixels. The cascade classifier supports local features such as Haar, local binary patterns (LBP) and histograms of oriented gradients (HOG).
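As an example of one such local feature, here is a Python/NumPy sketch (not MATLAB code) of the basic 8-neighbour local binary pattern. Each interior pixel gets an 8-bit code recording which neighbours are at least as bright as it is, and the histogram of codes can serve as a texture descriptor. The bit ordering and the `>=` comparison are conventions chosen for this sketch; library implementations may differ.

```python
import numpy as np


def lbp_codes(image):
    """Basic 8-neighbour local binary pattern code for every interior pixel."""
    img = np.asarray(image, dtype=np.float64)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes += (neighbour >= center) * (1 << bit)
    return codes


def lbp_histogram(image):
    """256-bin histogram of LBP codes, usable as a simple texture feature."""
    return np.bincount(lbp_codes(image).ravel(), minlength=256)


patch = np.array([[5, 0, 0],
                  [0, 3, 0],
                  [0, 0, 0]])
print(lbp_codes(patch))  # [[1]]: only the top-left neighbour is >= the center
```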

Resources to refer to:

Deep Learning Methods

Method 2: Convolutional Neural Network (CNN) + Long short-term memory network (LSTM)

In this method, you convert the videos to sequences of feature vectors by using a pretrained convolutional neural network to extract features from each frame. You then train a long short-term memory (LSTM) network on the sequences to predict the video labels. As a final step, you combine layers from both networks to assemble a final network that classifies videos directly.

To learn the steps of this complete workflow, see this document: Classify Videos Using Deep Learning
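The data flow of this method can be sketched in a few lines of Python/NumPy (this is not the MATLAB workflow from the linked document). Everything here is a stand-in: `frame_features` is a hypothetical placeholder for a pretrained CNN, the LSTM weights are random and untrained, and the two output classes are just assumed to be flowing and stalled. The sketch only shows how per-frame features become a sequence that an LSTM reduces to one prediction per video.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def frame_features(frame):
    """Hypothetical stand-in for a pretrained CNN: mean and std per channel."""
    return np.concatenate([frame.mean(axis=(0, 1)), frame.std(axis=(0, 1))])


def lstm_last_state(sequence, W, U, b):
    """Run one LSTM layer over a (T, D) sequence; return the final hidden state."""
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in sequence:
        gates = W @ x + U @ h + b            # shape (4 * hidden,)
        i, f, o, g = np.split(gates, 4)      # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def classify_video(frames, params):
    """Frames (T, H, W, C) -> class probabilities via features + LSTM + softmax."""
    features = np.stack([frame_features(frame) for frame in frames])  # (T, D)
    h = lstm_last_state(features, params["W"], params["U"], params["b"])
    logits = params["V"] @ h + params["c"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


# Random, untrained weights: the probabilities below are meaningless.
rng = np.random.default_rng(0)
hidden, feat_dim, num_classes = 8, 6, 2
params = {
    "W": rng.normal(scale=0.1, size=(4 * hidden, feat_dim)),
    "U": rng.normal(scale=0.1, size=(4 * hidden, hidden)),
    "b": np.zeros(4 * hidden),
    "V": rng.normal(scale=0.1, size=(num_classes, hidden)),
    "c": np.zeros(num_classes),
}
video = rng.random((10, 32, 32, 3))          # 10 random RGB frames
probs = classify_video(video, params)
print(probs.shape)  # (2,)
```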

Image Source (MATLAB document)

Method 3: Large-scale video classification with CNN

If video classification is like image classification, why not just use a convolutional neural network?

To answer this, recall the temporal component of a video that I discussed earlier. To capture both the temporal and the spatial aspects, you can use a CNN, but you need to structure the network differently.

This paper from Stanford, Large-scale Video Classification with Convolutional Neural Networks, discusses the challenges of using a basic CNN for videos. It further elaborates on the different CNN models you can use to fuse features from multiple frames.

Image source: Research paper
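Two of the fusion variants compared in that paper, early fusion and late fusion, differ mainly in where information from several frames is combined. The Python/NumPy sketch below shows only that difference in data layout. The `toy_tower` function is a hypothetical stand-in for a CNN tower, and the tiny clip is made-up data.

```python
import numpy as np


def early_fusion_input(clip):
    """Stack T frames along the channel axis: (T, H, W, C) -> (H, W, T * C).

    The first convolutional layer then sees all frames of the clip at once.
    """
    t, h, w, c = clip.shape
    return clip.transpose(1, 2, 0, 3).reshape(h, w, t * c)


def late_fusion_features(clip, tower, gap):
    """Run the same single-frame tower on two frames `gap` apart, then merge."""
    return np.concatenate([tower(clip[0]), tower(clip[gap])])


def toy_tower(frame):
    """Hypothetical stand-in for a CNN tower: per-channel mean of one frame."""
    return frame.mean(axis=(0, 1))


clip = np.arange(4 * 2 * 2 * 3, dtype=float).reshape(4, 2, 2, 3)  # T=4, 2x2 RGB
early = early_fusion_input(clip)
late = late_fusion_features(clip, toy_tower, gap=3)
print(early.shape, late.shape)  # (2, 2, 12) (6,)
```

Early fusion mixes temporal information at the pixel level, while late fusion lets each tower see a single frame and compares the results afterwards.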

Method 4: Two-stream CNN

Another approach, explained by the researchers in the paper Two-Stream Convolutional Networks for Action Recognition in Videos, uses two separate ConvNets: one for the spatial aspect and one for the temporal aspect.

Image Source: Research paper
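In the two-stream design, the spatial stream looks at individual frames, the temporal stream looks at stacked optical flow, and the class scores of the two streams are combined at the end. Here is a Python/NumPy sketch of that last step only, using a weighted average of softmax scores. The logits are made-up numbers, and the class names and the `temporal_weight` parameter are assumptions for this example.

```python
import numpy as np


def softmax(logits):
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


def fuse_two_streams(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Late fusion: weighted average of the two streams' softmax scores."""
    spatial = softmax(np.asarray(spatial_logits, dtype=float))
    temporal = softmax(np.asarray(temporal_logits, dtype=float))
    return (1.0 - temporal_weight) * spatial + temporal_weight * temporal


classes = ["flowing", "stalled"]
# Made-up logits: the spatial stream leans "flowing", the temporal stream "stalled".
fused = fuse_two_streams([2.0, 1.0], [0.0, 3.0])
prediction = classes[int(np.argmax(fused))]
print(prediction)  # stalled
```

Here the temporal stream is the more confident of the two, so it decides the fused prediction.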

Documents to refer to when developing a CNN architecture in MATLAB:

Method 5: Using a 3D convolution network

3D ConvNets are a natural choice for video classification since they inherently apply convolutions and max pooling in 3D space. In the paper Learning Spatiotemporal Features with 3D Convolutional Networks, the researchers propose C3D (Convolutional 3D), whose features are compact and efficient to compute.

Image source: Research paper
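To show what "convolutions and max pooling in 3D space" means in practice, here is a naive Python/NumPy sketch (not MATLAB code and not the C3D implementation). It slides a 3×3×3 kernel, the kernel size used by C3D, over a single-channel clip in time, height and width, and then applies non-overlapping 3D max pooling. The loops are written for clarity, not speed, and the all-ones clip and kernel are made-up data.

```python
import numpy as np


def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation, stride 1, no padding."""
    t, h, w = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                patch = volume[z:z + kt, y:y + kh, x:x + kw]
                out[z, y, x] = np.sum(patch * kernel)
    return out


def max_pool3d(volume, size):
    """Non-overlapping 3D max pooling; leftover voxels at the edges are dropped."""
    st, sh, sw = size
    t, h, w = volume.shape[0] // st, volume.shape[1] // sh, volume.shape[2] // sw
    trimmed = volume[:t * st, :h * sh, :w * sw]
    return trimmed.reshape(t, st, h, sh, w, sw).max(axis=(1, 3, 5))


clip = np.ones((8, 16, 16))            # 8 frames of 16x16 pixels, one channel
kernel = np.ones((3, 3, 3))            # spans 3 frames and a 3x3 spatial window
features = conv3d_valid(clip, kernel)
pooled = max_pool3d(features, (2, 2, 2))
print(features.shape, pooled.shape)    # (6, 14, 14) (3, 7, 7)
```

Unlike a 2D convolution applied frame by frame, each output value here depends on several consecutive frames, which is how the network picks up motion.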

Documents to refer to:

Next Steps

If you do not have a MATLAB license, start your preparation by requesting a complimentary MATLAB license here: Advance Alzheimer’s Research with Stall Catchers.

Stay tuned for further updates in my next blog post, which will go out in May on the competition launch day. It will include the benchmark code for the problem along with all other details.

Feel free to share your feedback or ask any questions in the comments below.
