Memory Management in Apache Spark: Disk Spill

What it is and how to handle it

Tom Corbin
Towards Data Science
12 min read · Sep 15, 2023

In the world of big data, Apache Spark is loved for its ability to process massive volumes of data extremely quickly. Because it is one of the most widely used big data processing engines in the world, learning to use it is a cornerstone of any big data professional’s skillset. An important step on that path is understanding Spark’s memory…
