How to Handle Large Datasets in Python
A Comparison of CSV, Pickle, Parquet, Feather, and HDF5
Jul 26, 2022
When Kaggle finally launched a new tabular data competition after a long wait, everyone was excited at first. Until they weren't. Once Kagglers found out that the dataset was 50 GB, the community started discussing how to handle such large datasets [4].