Create Data Science pipelines with Luigi & PySpark and CI/CD

Arnaud Alepee
Towards Data Science
7 min read · Nov 12, 2019

Photo by Victor on Unsplash

This article walks through how to create a robust data pipeline using the following Python packages:

  • Luigi, a package for building pipelines
  • PySpark, a package to use Spark through a Python API
  • Pandas, a package to manipulate data
  • unittest, the standard-library module for implementing unit tests.
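
As a first taste of how the last two items in that list fit together, here is a minimal sketch, assuming a transformation step is written as a plain function so it can be tested without running Luigi or Spark. The function `add_total_column` and the column names `price`, `quantity`, and `total` are hypothetical and chosen only for illustration.

```python
import unittest

import pandas as pd


def add_total_column(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pipeline step: return a copy with total = price * quantity."""
    out = df.copy()  # work on a copy so the caller's DataFrame is left untouched
    out["total"] = out["price"] * out["quantity"]
    return out


class AddTotalColumnTest(unittest.TestCase):
    def test_total_is_price_times_quantity(self):
        df = pd.DataFrame({"price": [2.0, 3.5], "quantity": [3, 2]})
        result = add_total_column(df)
        self.assertEqual(result["total"].tolist(), [6.0, 7.0])

    def test_input_is_not_mutated(self):
        df = pd.DataFrame({"price": [1.0], "quantity": [1]})
        add_total_column(df)
        self.assertNotIn("total", df.columns)
```

Saved in a test module, these cases can be run with `python -m unittest`, which is also the kind of command a CI job would typically invoke.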


Head of Data @Hokodo, navigating through data for the last 10 years in financial services.