The art of joining in Spark

Practical tips to speed up Spark joins

Andrea Ialenti
Towards Data Science
10 min read · Dec 9, 2019

--

I met Apache Spark a few months ago, and it was love at first sight. My first thought was: “It’s incredible how something this powerful can be so easy to use. I just need to write a bunch of SQL queries!” Indeed, getting started with Spark is very simple: it has very nice APIs in multiple languages (e.g. Scala, Python, Java), it’s virtually possible to…

I’m an engineer, and I usually ride a giant unicorn with a rainbow mane. I love learning by explaining. “Like a bicycle, I need to move to keep my balance.”