Improving Vanilla Gradient Descent

Performance improvements applied to training neural networks

Devin Soni
Towards Data Science
5 min read · Feb 22, 2018

Introduction

When we train neural networks with gradient descent, we risk the network falling into local minima, points where training stalls somewhere along the error surface that is not the lowest point on the overall surface. This happens because error surfaces are not inherently convex, so a surface may contain many local minima distinct from the global minimum. Additionally, while the network may reach a global minimum and…
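As a baseline for the improvements discussed later, here is a minimal sketch of the vanilla gradient descent update, θ ← θ − η∇L(θ). The function names, the toy quadratic loss, and the learning rate are illustrative assumptions rather than code from the original article.

```python
import numpy as np

def vanilla_gradient_descent(grad_fn, theta, lr=0.1, n_steps=100):
    """Repeatedly step against the gradient of the loss.

    grad_fn: function returning the gradient of the loss at theta
    theta:   initial parameter vector (numpy array)
    lr:      fixed learning rate (illustrative value)
    """
    for _ in range(n_steps):
        theta = theta - lr * grad_fn(theta)  # theta <- theta - lr * dL/dtheta
    return theta

# Toy example: minimize L(theta) = ||theta||^2, whose gradient is 2 * theta.
theta0 = np.array([3.0, -2.0])
theta_min = vanilla_gradient_descent(lambda t: 2 * t, theta0)
print(theta_min)  # approaches [0, 0]
```

On a convex loss like this toy quadratic, the plain update converges to the global minimum; the trouble described above only appears on the non-convex surfaces typical of neural networks.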
