Data Science for All

DIY: Apache Spark & Docker

Affectionately known as ‘Dark’ or ‘Spocker’

Shane De Silva
Towards Data Science
16 min read · May 7, 2020

Fully distributed Spark cluster running inside of Docker containers

Introduction

Two technologies that have risen in popularity over the last few years are Apache Spark and Docker.

Apache Spark provides users with a way of performing CPU-intensive tasks in a distributed manner. Its adoption has…

PhD student interested in applying statistical learning, DS, ML, and DL to real-world problems