Introduction to Apache Spark

Saloni Goyal
4 min read · Jul 11, 2019

MapReduce and Spark are both used for large-scale data processing. However, MapReduce has some shortcomings which render Spark the more useful choice in a number of scenarios.

Shortcomings of MapReduce

  1. Every workflow has to pass through a map phase and a reduce phase: it cannot directly express a join, a filter, or more complicated pipelines such as map-reduce-map (a Spark sketch of such a multi-step workflow follows below).
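
To make the contrast concrete, here is a minimal sketch of a multi-step workflow expressed as a single Spark job. The dataset names, values, and the local SparkSession setup are hypothetical and only for illustration; the point is that filter, join, and aggregation chain together in one job graph, where plain MapReduce would need several separate jobs stitched together.

```scala
import org.apache.spark.sql.SparkSession

object WorkflowExample {
  def main(args: Array[String]): Unit = {
    // Local SparkSession for illustration; a real cluster would use a different master URL.
    val spark = SparkSession.builder()
      .appName("workflow-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical datasets: (userId, purchaseAmount) and (userId, country)
    val purchases = sc.parallelize(Seq((1, 20.0), (2, 75.0), (1, 5.0), (3, 40.0)))
    val users     = sc.parallelize(Seq((1, "IN"), (2, "US"), (3, "DE")))

    // filter -> join -> map -> reduce, all in one Spark job:
    // the kind of pipeline that would require chaining multiple MapReduce jobs.
    val result = purchases
      .filter { case (_, amount) => amount >= 10.0 }    // drop small purchases
      .join(users)                                      // (userId, (amount, country))
      .map { case (_, (amount, country)) => (country, amount) }
      .reduceByKey(_ + _)                               // total spend per country

    result.collect().foreach(println)
    spark.stop()
  }
}
```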
