Improving Vanilla Gradient Descent
Performance improvements applied to training neural networks
Feb 22, 2018
Introduction
When we train neural networks with gradient descent, we risk the network settling into a local minimum: a point on the error surface that is lower than its immediate surroundings but not the lowest point on the surface overall. This happens because error surfaces are not inherently convex, so a surface may contain many local minima separate from the global minimum. Additionally, while the network may reach a global minimum and…
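To make this failure mode concrete, here is a minimal sketch of vanilla gradient descent on a one-dimensional non-convex function. The function f(x) = x⁴ − 3x² + x and all parameter values are illustrative choices of mine, not from this article: started on the wrong slope, plain descent converges to the shallow local minimum rather than the deeper global one.

```python
def f(x):
    # A simple non-convex curve with a local minimum near x ≈ 1.13
    # and a deeper global minimum near x ≈ -1.30.
    return x**4 - 3 * x**2 + x

def grad(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x, lr=0.01, steps=500):
    # Vanilla update rule: x <- x - lr * f'(x).
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Starting on the right-hand slope, descent settles into the
# shallow local minimum and never reaches the global one.
x_final = gradient_descent(2.0)
print(x_final)                # ends up near 1.13, not -1.30
print(f(x_final), f(-1.30))   # the local minimum sits higher than the global one
```

The same effect occurs in the high-dimensional error surfaces of neural networks: the update rule only ever follows the local slope, so the basin it starts in determines where it stops.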