From Pandas to PySpark with Koalas

Maria Karanasou
Towards Data Science
4 min read · Oct 28, 2019


Photo by Ozgu Ozden on Unsplash

For those familiar with pandas DataFrames, switching to PySpark can be quite confusing. The API is not the same, and because PySpark works on distributed data, some operations are done quite differently to accommodate the restrictions that distribution imposes.

I recently came across Koalas in a very interesting Databricks presentation on Apache Spark 3.0, Delta Lake, and Koalas, and thought that…


A mom and a Software Engineer who loves to learn new things & all about ML & Big Data. Buy me a coffee to help me keep going: buymeacoffee.com/mkaranasou