
Stochastic-, Batch-, and Mini-Batch Gradient Descent

Why do we need Stochastic, Batch, and Mini-Batch Gradient Descent when implementing Deep Neural Networks?

Artem Oppermann
Towards Data Science
13 min read · Apr 26, 2020


This is a detailed guide that should answer the questions of why and when we need Stochastic-, Batch-, and Mini-Batch Gradient Descent when implementing Deep Neural Networks.

In Short: We need these different ways of implementing gradient descent to address several issues we will almost certainly encounter when training Neural Networks: local minima and saddle points of the loss function, and noisy gradients.

More on that will be explained in the rest of this article.
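To make the distinction concrete before the detailed discussion, the sketch below shows how the three variants differ in practice on a toy linear-regression problem. The setup and the values of `lr`, `n_epochs`, and `batch_size` are illustrative choices for this sketch, not code or numbers from the article:

```python
import numpy as np

# Toy data: 1000 examples with 5 features and noisy linear targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def gradient(w, X_part, y_part):
    """Gradient of the mean squared error with respect to the weights."""
    error = X_part @ w - y_part
    return 2.0 * X_part.T @ error / len(y_part)

lr, n_epochs, batch_size = 0.05, 20, 32
w = np.zeros(5)

for epoch in range(n_epochs):
    # Batch gradient descent: one update per epoch, computed on the full dataset.
    # w -= lr * gradient(w, X, y)

    # Stochastic gradient descent: one update per single (shuffled) example.
    # for i in rng.permutation(len(y)):
    #     w -= lr * gradient(w, X[i:i + 1], y[i:i + 1])

    # Mini-batch gradient descent: one update per small batch of examples.
    idx = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]
        w -= lr * gradient(w, X[batch], y[batch])

print("learned weights:", np.round(w, 3))
print("true weights:   ", np.round(true_w, 3))
```

The trade-off between the variants is already visible here: batch gradient descent computes the most accurate gradient but performs only one update per epoch, stochastic gradient descent updates after every single example but with very noisy gradients, and mini-batch gradient descent sits in between.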

Table of Contents

1. Introduction: Let’s recap Gradient Descent
2. Common Problems when Training Neural Networks (local minima, saddle points, noisy gradients)
3. Batch Gradient Descent
4. Stochastic Gradient Descent
5. Mini-Batch Gradient Descent
6. Take-Home Message

1. Introduction: Let’s recap Gradient Descent
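As a quick reminder before we go through the problems and the variants: gradient descent updates the model parameters by taking a small step against the gradient of the loss. In a common notation:

$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} J(\theta)$$

where $\theta$ denotes the network’s parameters, $\eta$ the learning rate, and $\nabla_{\theta} J(\theta)$ the gradient of the loss function $J$ with respect to the parameters. All three variants discussed below use exactly this update; they differ only in how much training data is used to estimate the gradient at each step.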

