Improving Vanilla Gradient Descent
Performance improvements applied to training neural networks
Feb 22, 2018
Introduction
When we train neural networks with gradient descent, we risk the network settling into a local minimum: a point on the error surface that is lower than its immediate surroundings but not the lowest point on the surface overall. This happens because error surfaces are not inherently convex, so a surface may contain many local minima separate from the global minimum. Additionally, while the network may reach a global minimum and…
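To make this failure mode concrete, here is a minimal sketch of vanilla gradient descent on a one-dimensional non-convex function. The function f(x) = x⁴ − 3x² + x and all parameter values are illustrative choices of mine, not from this article: started on the wrong slope, plain descent converges to the shallow local minimum rather than the deeper global one.

```python
def f(x):
    # A simple non-convex curve with a local minimum near x ≈ 1.13
    # and a deeper global minimum near x ≈ -1.30.
    return x**4 - 3 * x**2 + x

def grad(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x, lr=0.01, steps=500):
    # Vanilla update rule: x <- x - lr * f'(x).
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Starting on the right-hand slope, descent settles into the
# shallow local minimum and never reaches the global one.
x_final = gradient_descent(2.0)
print(x_final)                # ends up near 1.13, not -1.30
print(f(x_final), f(-1.30))   # the local minimum sits higher than the global one
```

The same effect occurs in the high-dimensional error surfaces of neural networks: the update rule only ever follows the local slope, so the basin it starts in determines where it stops.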