Map-Reduce: Gradient Descent

Using PySpark and vanilla Python

Harsh Darji
Towards Data Science
4 min read · Dec 11, 2019


Some statistical models f(x) are learned by optimizing a loss function L(Θ) that depends on a set of parameters Θ. There are several ways of finding the optimal Θ for the loss function, one of which is to iteratively follow the gradient:

∇L(Θ)

Then compute the update:

Θ := Θ − α ∇L(Θ)

where α is the learning rate.

Because we assume independence between data points, the gradient becomes a summation:

∇L(Θ) = Σᵢ ∇Lᵢ(Θ)

where Lᵢ is the loss function for the i-th data point.
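Because the full gradient is just a sum of per-point gradients, one descent step maps naturally onto map (compute each ∇Lᵢ) and reduce (sum them). Here is a minimal vanilla-Python sketch of that idea, assuming for illustration a simple one-parameter linear model f(x) = θ·x with squared-error loss Lᵢ = (θ·xᵢ − yᵢ)²; the model, data, and function names are my own, not from the article.

```python
from functools import reduce

def grad_i(theta, point):
    """Map step: gradient of L_i = (theta*x - y)^2 at one data point."""
    x, y = point
    return 2 * (theta * x - y) * x

def gradient_descent(data, theta=0.0, alpha=0.01, steps=100):
    for _ in range(steps):
        # map: one gradient per data point
        grads = map(lambda p: grad_i(theta, p), data)
        # reduce: sum the per-point gradients into the full gradient
        total = reduce(lambda a, b: a + b, grads, 0.0)
        # update: step against the gradient
        theta -= alpha * total
    return theta

# Toy data drawn from y = 2x, so theta should converge toward 2
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
theta = gradient_descent(data)
```

The same step translates almost verbatim to PySpark: with the points in an RDD, `rdd.map(lambda p: grad_i(theta, p)).reduce(lambda a, b: a + b)` computes the summed gradient across the cluster, and the driver applies the update.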
