About Joins in Spark 3.0
Tips for efficient joins in Spark SQL
One of the most frequent transformations in Spark SQL is joining two DataFrames. The syntax for that is very simple; however, it may not be so clear what is happening under the hood and whether the execution is as efficient as it could be.
Spark provides several algorithms for join execution and chooses one of them according to its internal logic. This choice may not be the best in all cases, and a proper understanding of the internal behavior may allow us to lead Spark towards better…
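For equi-joins, Spark's physical operators include the broadcast hash join, the shuffled hash join and the sort-merge join. As a rough intuition for how the two underlying ideas differ, here is a minimal plain-Python sketch on in-memory lists of dicts. It is not Spark code and leaves out the distributed part entirely (broadcasting or shuffling the data). The functions `hash_join` and `sort_merge_join` and the sample data are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner equi-join: build a hash table on `right` (assumed to be the
    smaller side) and probe it once per row of `left`."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)  # build phase
    out = []
    for l in left:  # probe phase
        for r in table.get(l[key], []):
            out.append({**l, **r})
    return out


def sort_merge_join(left, right, key):
    """Inner equi-join: sort both sides by the key, then walk them in step."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
    orders = [{"id": 1, "amount": 10}, {"id": 1, "amount": 20},
              {"id": 3, "amount": 5}, {"id": 4, "amount": 7}]
    # Both print the same three matched rows (ids 2 and 4 have no partner).
    print(hash_join(orders, users, "id"))
    print(sort_merge_join(orders, users, "id"))
```

The hash variant needs the whole build side in memory, which is roughly why broadcasting is attractive only when one table is small. The sort-merge variant holds neither side in a hash table but pays for sorting both inputs.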