How to Test PySpark ETL Data Pipeline
Validate big data pipeline with Great Expectations
6 min read · Dec 6, 2022
Introduction
"Garbage in, garbage out" is a common expression used to emphasize the importance of data quality for tasks such as machine learning, data analytics, and business intelligence. With an increasing amount of data being created and stored, building high-quality data pipelines…