Realtime prediction using Spark Structured Streaming, XGBoost and Scala

Bogdan Cojocar
Towards Data Science
5 min read · Jun 24, 2018

In this article we will discuss building a complete machine learning pipeline. The first part focuses on training a binary classifier in standard batch mode, and the second part covers realtime prediction.

We will use data from Titanic: Machine Learning from Disaster, one of the many Kaggle competitions.
