Data Versioning: All You Need to Know

Manage your data like you manage code

Bex T.
Towards Data Science
10 min read · Dec 8, 2020


Photo by Tanner Boriack on Unsplash

Introduction

The need for a better system for versioning massive amounts of data has been around for years.

While Git does an excellent job of managing codebases, it sucks at versioning binary files. Even the creator of Git, Linus Torvalds, admits this:

--

--