Forward Mode Automatic Differentiation & Dual Numbers

Robert Lange
Towards Data Science
11 min read · Sep 2, 2019


Automatic Differentiation (AD) is one of the driving forces behind the success story of Deep Learning. It allows us to efficiently evaluate gradients of our favorite composed functions. TensorFlow, PyTorch and all their predecessors make use of AD. Together with stochastic approximation techniques such as SGD (and all its variants), these gradients refine the parameters of our favorite network architectures.
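As a small teaser of the forward-mode flavor of AD that this post builds up to, here is a minimal sketch of dual numbers in Python. The `Dual` class and the `sin` helper are illustrative names I chose for this sketch, not an API from TensorFlow or PyTorch: each operation propagates a derivative alongside the value via the product and chain rules.

```python
import math

class Dual:
    """Minimal dual number: value + eps * derivative, with eps**2 = 0 (illustrative sketch)."""
    def __init__(self, val, grad=0.0):
        self.val = val    # function value
        self.grad = grad  # tangent (derivative) carried alongside

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.grad * other.val + self.val * other.grad)

    __rmul__ = __mul__

def sin(x):
    # chain rule: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.grad)

# f(x) = x * sin(x); one forward pass yields f(2) and f'(2) = sin(2) + 2*cos(2)
x = Dual(2.0, 1.0)  # seed the tangent with 1.0 to differentiate w.r.t. x
y = x * sin(x)
print(y.val, y.grad)
```

A single forward evaluation returns both the function value and its derivative, with no separate backward pass required.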

Many people (until recently, myself included) believe that backpropagation, the chain rule and automatic differentiation are…
