Best Practices for Bucketing in Spark SQL
The ultimate guide to bucketing in Spark.
21 min read · Apr 25, 2021
Bucketing is a feature supported by Spark since version 2.0. It is a way to organize data in the filesystem so that subsequent queries can take advantage of that organization.
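To make the idea concrete, here is a minimal plain-Python sketch of what "organizing data into buckets" means: each row is assigned to one of a fixed number of buckets based on a hash of its key, so all rows with the same key end up together. This is only an illustration, not Spark's implementation. The names `bucket_id` and `write_bucketed`, the bucket count and the use of CRC32 as the hash are assumptions made for this example; Spark uses its own hash function and writes the buckets as files.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count chosen for the illustration


def bucket_id(key, num_buckets=NUM_BUCKETS):
    """Map a key to a bucket number in the range [0, num_buckets).

    CRC32 is only a deterministic stand-in here; Spark uses a different
    hash function internally.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def write_bucketed(rows, key_field, num_buckets=NUM_BUCKETS):
    """Group rows by the bucket of their key (a stand-in for writing files)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_field], num_buckets)].append(row)
    return dict(buckets)


# Toy data: 20 rows spread over 5 distinct user ids.
rows = [{"user_id": i % 5, "amount": i * 10} for i in range(20)]
buckets = write_bucketed(rows, "user_id")

for b in sorted(buckets):
    user_ids = sorted({r["user_id"] for r in buckets[b]})
    print(f"bucket {b}: user_ids={user_ids}, rows={len(buckets[b])}")
```

Because the bucket of a key is deterministic, a later query that looks up or joins on `user_id` knows which bucket holds the matching rows. In Spark itself, a bucketed table is created through the DataFrame writer with `bucketBy(...)` followed by `saveAsTable(...)`.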
Many resources explain the basic idea of bucketing. In this article, we will go one step further and describe bucketing in more detail: we will look at its various aspects and explain how it works under the hood, how it evolved over time and, most importantly, how to efficiently…