Ranking Diamonds with PCA in PySpark

The challenges of running Principal Component Analysis in PySpark

Gustavo Santos
Towards Data Science
8 min read · Dec 22, 2023

Photo by Edgar Soto on Unsplash

Introduction

Here we go with another post about PySpark. I have been enjoying writing about this subject, as it feels to me that we lack good blog posts about PySpark, especially when it comes to Machine Learning in MLlib. By the way, that is…

Data Scientist. I extract insights from data to help people and companies make better, data-driven decisions. | LinkedIn: https://www.linkedin.com/in/gurezende/