Getting Started with PySpark on AWS EMR
A step-by-step guide to processing data at scale with Spark on AWS
8 min read · Jul 19, 2019
Data Pipelines with PySpark and AWS EMR is a two-part series. This is part 1. Check out part 2 if you’re looking for guidance on how to run a data pipeline as a production job.
- Getting Started with PySpark on AWS EMR (this article)
- Production Data Processing with PySpark on AWS EMR (up next)