Understanding the Backbone of Video Classification: The I3D Architecture

Published in Towards Data Science

4 min read · Jun 7, 2020

A key difference between a single image and a video is the temporal dimension: a video carries information in how frames change over time, not only in each individual frame. Deep learning architectures have therefore been extended with 3D processing, so that they handle temporal information alongside spatial information. This article summarizes the architectural changes from image models to video models through the I3D model.
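As a rough illustration of what adding a temporal dimension to a filter can look like, the sketch below "inflates" a 2D kernel into a 3D one by repeating it along a new time axis and rescaling by the number of repeats, which is the initialization idea described in the I3D paper. It is a pure-Python toy for a single patch, not the model's implementation; the function names (`inflate_2d_kernel`, `apply_2d`, `apply_3d`) and the example values are hypothetical.

```python
def inflate_2d_kernel(kernel_2d, time_depth):
    """Repeat a 2D kernel along a new time axis, scaling each copy by 1/time_depth."""
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def apply_2d(kernel_2d, patch_2d):
    """Response of a 2D kernel on one image patch of the same size."""
    return sum(
        w * x
        for k_row, p_row in zip(kernel_2d, patch_2d)
        for w, x in zip(k_row, p_row)
    )


def apply_3d(kernel_3d, patch_3d):
    """Response of a 3D kernel on a clip patch laid out as (time, height, width)."""
    return sum(apply_2d(k, p) for k, p in zip(kernel_3d, patch_3d))


# Hypothetical 2x2 filter and 2x2 image patch.
kernel_2d = [[1.0, 0.0], [0.0, -1.0]]
frame = [[3.0, 5.0], [2.0, 7.0]]

# A "static" clip: the same frame repeated over 4 time steps.
static_clip = [frame] * 4

kernel_3d = inflate_2d_kernel(kernel_2d, time_depth=4)
response_2d = apply_2d(kernel_2d, frame)
response_3d = apply_3d(kernel_3d, static_clip)
```

On a clip made of one repeated frame, the inflated 3D filter gives the same response as the original 2D filter on that frame, which is why the rescaling by `time_depth` is there. A real clip with motion would produce different responses across time, and that difference is the temporal signal a 3D model can learn from.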

Written by Madeline Schiappa