Only Numpy: Implementing Simple ResNet (Deep Networks with Stochastic Depth) for MNIST Classification with Interactive Code

Jae Duk Seo
Towards Data Science
4 min read · Feb 9, 2018


Image from Pixabay

So I was reading the article “Stochastic Depth Networks will Become the New Normal”, and there I came across the paper “Deep Networks with Stochastic Depth”. Upon reading that paper, I saw the diagram below.

ResNet Image from Original Paper

And right away I was inspired to build my own ResNet. However, since batch normalization is a bit complicated to implement for back propagation, I will not include it in today’s implementation. But I promise, I will implement it soon!

Network Architecture (Mathematical Form)

As seen above, the network architecture could not be simpler to understand: we have some function f() that transforms the input data, and an additional function id(), also known as the identity function, that allows a direct connection from the previous layer to the current one.
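As a rough sketch of that idea in NumPy (my own illustration, not the code from this article), the whole block boils down to adding the transformed input back onto the untouched input:

```python
import numpy as np

def residual_block(x, W, activation=np.tanh):
    """One residual block: output = f(x) + id(x).

    The single weight matrix W and the tanh activation are illustrative
    assumptions; f() can be any learned transformation.
    """
    f_x = activation(x @ W)  # f(): transforms the input data
    return f_x + x           # id(x): direct connection from the previous layer
```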

Feed Forward Operation / Partial Back Propagation
(Mathematical Equation)

Green Box → Feed Forward Operation for 3 Residual Blocks
Red Box → Partial Back Propagation for Hidden Weights.

Blue Underlined → Back Propagation with respect to W3H
Pink Underlined → Back Propagation with respect to W2H
Purple Underlined → Back Propagation with respect to W1H

Now, I won’t perform back propagation for every single weight; however, back propagation with respect to W3b, W3a, W2b, W2a, W1b and W1a is easy as well.

Feed Forward Operation (Code)

Red Box → Res Block 1
Green Box → Res Block 2
Blue Box → Res Block 3

The feed forward operation on the residual blocks is very simple yet effective. However, the back propagation process is a bit more complicated.
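Since the code itself is shown in the screenshot, here is a minimal sketch of how a feed forward pass over three residual blocks might look in NumPy. The two dense layers per block, the tanh activation, and the weight names are assumptions based on the boxes above; the hidden weights (W1H, W2H, W3H) are omitted because their exact placement isn’t visible from the text alone:

```python
import numpy as np

def feed_forward(x, blocks, activation=np.tanh):
    """Feed forward through stacked residual blocks.

    blocks is a list of (Wa, Wb) weight pairs, e.g.
    [(W1a, W1b), (W2a, W2b), (W3a, W3b)].
    Illustrative sketch only, not the article's exact code.
    """
    out = x
    for Wa, Wb in blocks:
        layer_a = activation(out @ Wa)      # first layer of the block
        layer_b = activation(layer_a @ Wb)  # second layer of the block
        out = layer_b + out                 # residual addition: f(x) + id(x)
    return out
```

With square weight matrices (so the shapes line up for the addition), this stacks exactly like the three boxes above.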

Back Propagation (Code)

The main reason why back propagation is a bit complicated in a ResNet is the addition that happens at the end of each residual block. While performing back propagation, we need to make sure we add up all of the gradients with respect to those weights. The red underlined parts of the code perform these additions.
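To make the gradient addition concrete, here is a hedged sketch for a single block of the form out = tanh(x @ Wa) + x (the one-layer block and the names are my assumptions, not the article’s exact code). The incoming gradient travels down both the f() branch and the identity branch, and the contributions that reach x are summed:

```python
import numpy as np

def tanh_deriv(pre_activation):
    """Derivative of tanh, given the pre-activation value."""
    return 1.0 - np.tanh(pre_activation) ** 2

def residual_block_backward(grad_out, x, Wa, pre_activation):
    """Backward pass through one residual block where out = tanh(x @ Wa) + x.

    grad_out       : gradient flowing in from the layer above
    pre_activation : x @ Wa saved during the forward pass
    Illustrative sketch only.
    """
    # Branch 1: gradient through the learned transformation f()
    grad_f = grad_out * tanh_deriv(pre_activation)
    grad_Wa = x.T @ grad_f          # gradient w.r.t. the block's weight
    grad_x_f = grad_f @ Wa.T        # gradient reaching x through f()

    # Branch 2: gradient through the identity connection id()
    grad_x_id = grad_out            # the skip path passes it through unchanged

    # The addition at the end of the block means the two gradients are summed
    grad_x = grad_x_f + grad_x_id
    return grad_x, grad_Wa
```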

Training and Results (Same Learning Rates)

At first I set the learning rate exactly the same for the hidden weights as well as the other ones, and no matter how hard I tried, I was not able to get a good result with that setting. So I decided to simply set different learning rates for different ‘types’ of weights.
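A minimal sketch of what that looks like as a plain gradient descent update (the hidden/other split follows the weight names above; the specific learning rate values here are made up for illustration, not the ones used in training):

```python
def sgd_update(hidden_weights, hidden_grads, other_weights, other_grads,
               lr_hidden=0.0001, lr_other=0.001):
    """Gradient descent step with a separate learning rate per 'type' of weight.

    Illustrative sketch only; the rates are placeholder values.
    """
    for W, g in zip(hidden_weights, hidden_grads):
        W -= lr_hidden * g   # hidden weights (W1H, W2H, W3H)
    for W, g in zip(other_weights, other_grads):
        W -= lr_other * g    # remaining weights (W1a, W1b, ..., W3a, W3b)
```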

Training and Results (Different Learning Rates)

An accuracy of 72% is nowhere near impressive for a simple 10-class image classification task. I will come back in hopes of increasing the accuracy of this model. But it does seem that setting a different learning rate for different ‘types’ of weights gives better results.

Interactive Code

I moved to Google Colab for interactive code! You will need a Google account to view the code, and since you can’t run read-only scripts in Google Colab, make a copy in your own playground. Finally, I will never ask for permission to access your files on Google Drive, just FYI. Happy coding!

To access the interactive code, please click on this link.

Final Words

The main network presented in the paper “Deep Networks with Stochastic Depth” is not a simple ResNet; rather, the authors introduce a network with stochastic depth. I will try to implement that network, with batch normalization, soon.

If any errors are found, please email me at jae.duk.seo@gmail.com.

Meanwhile, follow me on Twitter here, and visit my website or my YouTube channel for more content. I also did a comparison of Decoupled Neural Networks here, if you are interested.

Reference

  1. Huang, G., Sun, Y., Liu, Z., Sedra, D., & Weinberger, K. Q. (2016, October). Deep networks with stochastic depth. In European Conference on Computer Vision (pp. 646–661). Springer, Cham.
  2. D. (2016, June 05). Stochastic Depth Networks will Become the New Normal. Retrieved February 08, 2018, from http://deliprao.com/archives/134
