From Pandas to PySpark with Koalas
Published in
4 min readOct 28, 2019
For those who are familiar with pandas DataFrames, switching to PySpark can be quite confusing. The API is not the same, and when switching to a distributed nature, some things are being done quite differently because of the restrictions imposed by that nature.
I recently stumbled upon Koalas from a very interesting Databricks presentation about Apache Spark 3.0, Delta Lake and Koalas, and thought that…