How to Handle Large Datasets in Python

A Comparison of CSV, Pickle, Parquet, Feather, and HDF5

Leonie Monigatti
Towards Data Science
9 min read · Jul 26, 2022

As usual with large files, reading and writing can take a long time. The featured image shows a loading screen whose estimated time reads: “Go grab a coffee.”
Image by the author.

When Kaggle finally launched a new tabular data competition after all this time, everyone got excited at first. Until they weren’t. When Kagglers found out that the dataset was 50 GB in size, the community started discussing how to handle such large datasets [4].
