Map-Reduce: Gradient Descent
Using PySpark and vanilla Python
Some statistical models f(x) are learned by optimizing a loss function L(Θ) that depends on a set of parameters Θ. There are several ways of finding the optimal Θ for the loss function, one of which is to iteratively update following the gradient:

∇L(Θ) = ∂L(Θ) / ∂Θ
We then compute the update:

Θ ← Θ - α ∇L(Θ)

where α is the learning rate.
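As a minimal illustration (not the article's code), here is a vanilla-Python sketch of this update rule, assuming a hypothetical one-dimensional loss L(Θ) = (Θ - 3)²:

```python
def gradient_descent(grad_fn, theta0, alpha=0.1, n_iters=100):
    """Repeatedly apply the update: theta <- theta - alpha * grad L(theta)."""
    theta = theta0
    for _ in range(n_iters):
        theta = theta - alpha * grad_fn(theta)
    return theta

# Hypothetical loss L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta_opt = gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=0.0)
print(theta_opt)  # converges towards 3.0
```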
Because we assume independence between data points, the gradient becomes a summation:

∇L(Θ) = Σᵢ ∇Lᵢ(Θ)
where Lᵢ is the loss function for the i-th data point.
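Since each per-point gradient ∇Lᵢ(Θ) depends only on its own data point, the sum can be computed with a map step (one gradient per point) and a reduce step (adding them up). A rough PySpark sketch of that pattern, assuming a squared-error loss on toy (x, y) pairs; the point_gradient helper and the generated data are illustrative placeholders, not the article's code:

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mapreduce-gradient-descent").getOrCreate()
sc = spark.sparkContext

# Toy dataset: features [1, x] and targets y = 2x + 1 (illustrative only)
data = [(np.array([1.0, x]), 2.0 * x + 1.0) for x in np.random.uniform(0.0, 1.0, 100)]
points = sc.parallelize(data)

def point_gradient(theta, point):
    """Gradient of the per-point loss L_i(theta) = 0.5 * (theta . x - y)^2."""
    x, y = point
    return (theta.dot(x) - y) * x

theta = np.zeros(2)
alpha = 0.01  # learning rate

for _ in range(100):
    # map: compute each point's gradient; reduce: sum them into the full gradient
    grad = points.map(lambda p: point_gradient(theta, p)).reduce(lambda a, b: a + b)
    theta = theta - alpha * grad

print(theta)  # should approach [1.0, 2.0]
spark.stop()
```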