Diagram Share: The Evolution of Commercial Text-to-Video

Development of Text-to-Video in the last 3 years

Selina Li
Towards Data Science


History of Commercial Text-to-Video in the last 3 years (as of March 2024)

In recent years we have witnessed the emergence of commercial text-to-video models and products. I would like to share a comprehensive timeline diagram I created that captures the evolution of commercial text-to-video models and products over the last 3 years (2022, 2023 and 2024 to date).

I created the diagram while preparing a presentation on Sora for my team. It was exciting to see how such great products emerged alongside developments in Computer Vision (CV) research, including but not limited to Generative Adversarial Networks (GANs), the transformer architecture and diffusion models.

As suggested by the Microsoft Research paper “Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models”, we see Sora as a leap because it is not just a tool, but also potentially a “world simulator” that can simulate the physical and contextual dynamics of the depicted scenes in the physical world.

This evolution, of course, will not stop, and I am sure we will see more exciting news to come. As a witness, I am keen to keep this diagram updated.

I would love to hear your thoughts on this evolution and where you see text-to-video technology heading next. Let’s discuss the impacts, the potential applications, and the ethical considerations that come with these advancements.

