Forward Mode Automatic Differentiation & Dual Numbers

Robert Lange
Towards Data Science
11 min read · Sep 2, 2019


Automatic Differentiation (AD) is one of the driving forces behind the success story of Deep Learning. It allows us to efficiently evaluate gradients of our favorite composed functions. TensorFlow, PyTorch and all their predecessors make use of AD. Together with stochastic approximation techniques such as SGD (and all its variants), these gradients refine the parameters of our favorite network architectures.
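As a small teaser of the forward-mode flavor of AD that this post builds up to, here is a minimal sketch of dual numbers in Python. The `Dual` class and the `sin` helper are illustrative names I chose for this sketch, not an API from TensorFlow or PyTorch: each operation propagates a derivative alongside the value via the product and chain rules.

```python
import math

class Dual:
    """Minimal dual number: value + eps * derivative, with eps**2 = 0 (illustrative sketch)."""
    def __init__(self, val, grad=0.0):
        self.val = val    # function value
        self.grad = grad  # tangent (derivative) carried alongside

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.grad * other.val + self.val * other.grad)

    __rmul__ = __mul__

def sin(x):
    # chain rule: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.grad)

# f(x) = x * sin(x); one forward pass yields f(2) and f'(2) = sin(2) + 2*cos(2)
x = Dual(2.0, 1.0)  # seed the tangent with 1.0 to differentiate w.r.t. x
y = x * sin(x)
print(y.val, y.grad)
```

A single forward evaluation returns both the function value and its derivative, with no separate backward pass required.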

Many people (until recently, myself included) believe that backpropagation, the chain rule and automatic differentiation are…
