Load files faster into BigQuery

Benchmarking CSV, GZIP, AVRO and PARQUET file types for ingestion

Bence Komarniczky
Towards Data Science
Jun 30, 2020

Google Cloud Platform’s BigQuery is a managed, large-scale data warehouse for analytics. It can load tables from JSON, CSV, PARQUET, ORC and AVRO files. Each of these file types has its pros and cons, and I have already written about why I prefer PARQUET for data science workflows in an earlier post. But one question remains:
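
To make the list of formats concrete, here is a minimal sketch of how a file's type decides the load job's source format, using the `google-cloud-bigquery` Python client (`bigquery.LoadJobConfig` and `Client.load_table_from_uri`). The helper names `source_format_for` and `load_from_gcs` are hypothetical. It assumes files live at `gs://` URIs, that CSV files have a header row, and that you have credentials configured. Note that a `.csv.gz` file is still CSV as far as BigQuery is concerned; gzip is only a compression layer, and BigQuery accepts it for the text formats only.

```python
from pathlib import PurePosixPath

# Some of the values BigQuery accepts as a load job's source format.
_FORMATS = {
    ".csv": "CSV",
    ".json": "NEWLINE_DELIMITED_JSON",  # JSON must be newline-delimited
    ".avro": "AVRO",
    ".parquet": "PARQUET",
    ".orc": "ORC",
}
# Text formats: no embedded schema, and the only ones that may be gzipped.
_TEXT_FORMATS = {"CSV", "NEWLINE_DELIMITED_JSON"}


def source_format_for(uri: str) -> str:
    """Hypothetical helper: infer the BigQuery source format from a file name."""
    suffixes = [s.lower() for s in PurePosixPath(uri).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # ".csv.gz" is still CSV, just compressed
    fmt = _FORMATS.get(suffixes[-1]) if suffixes else None
    if fmt is None or (gzipped and fmt not in _TEXT_FORMATS):
        raise ValueError(f"Unsupported file type: {uri}")
    return fmt


def load_from_gcs(uri: str, table_id: str):
    """Hypothetical helper: load one gs:// file into `table_id` and wait for it."""
    # Imported lazily: needs google-cloud-bigquery and GCP credentials.
    from google.cloud import bigquery

    fmt = source_format_for(uri)
    job_config = bigquery.LoadJobConfig(source_format=fmt)
    if fmt in _TEXT_FORMATS:
        # Text formats carry no schema, so ask BigQuery to infer one.
        job_config.autodetect = True
    if fmt == "CSV":
        job_config.skip_leading_rows = 1  # assumes a header row
    client = bigquery.Client()
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    return job.result()  # blocks until the load job completes
```

AVRO, PARQUET and ORC embed their own schema, which is why the sketch only turns on schema autodetection for CSV and JSON.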

Which file type gives us the quickest load times into BigQuery?
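
Answering that question comes down to timing the same load in each format. A minimal sketch of such a harness follows; `benchmark_loads` is a hypothetical name, and each loader is assumed to be a zero-argument function that submits a load job and blocks until it finishes (for example by calling `.result()` on the returned `LoadJob`). It reports the median of several runs, since a single run can be skewed by job queueing.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_loads(
    loaders: Dict[str, Callable[[], object]], repeats: int = 3
) -> Dict[str, float]:
    """Return the median wall-clock seconds of each loader over `repeats` runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results: Dict[str, float] = {}
    for name, load in loaders.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            load()  # assumed to block until the load job has finished
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results
```

Usage would look like `benchmark_loads({"csv": load_csv, "parquet": load_parquet}, repeats=5)`, where `load_csv` and `load_parquet` are your own loader functions. Because BigQuery load jobs append to the destination table by default, recreating or truncating the table between runs keeps the comparison fair.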

Data scientist building ML products in ad-tech. I write tutorials on data science🧑‍🔬, machine learning 🤖, Julia and cloud computing ☁️.