Only Numpy: Implementing Mini VGG (VGG 7) and SoftMax Layer with Interactive Code

Jae Duk Seo
Towards Data Science
Feb 7, 2018 · 5 min read


Picture from Pixabay

I wanted to practice my back propagation skills on convolutional neural networks, and I have also wanted to implement my own VGG net (from the original paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”) for some time now, so today I decided to combine those two goals.

If you are not familiar with the back propagation process on convolutional neural networks, please see my tutorial on back propagation on convolutional neural networks, here or here.

Softmax Layer and its Derivative

Softmax function (image from Peter Roelants [2])
Derivative of softmax (image from Peter Roelants [2])

There are tons of good articles covering the softmax function and its derivative, so I won’t go into depth here. However, I will link a few: here, here, here, and here.
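For completeness, here is a minimal NumPy sketch of the softmax function and of the gradient it passes back when paired with a cross-entropy loss. The function names and the toy batch are my own illustration, not the code used in the network below.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def softmax_cross_entropy_grad(probs, labels_one_hot):
    # Combined with cross-entropy loss, the gradient with respect to the
    # pre-softmax logits collapses to (probabilities - one-hot labels).
    return probs - labels_one_hot

# Toy batch: 2 examples, 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
probs = softmax(logits)
grad = softmax_cross_entropy_grad(probs, labels)
print(probs.round(3))
print(grad.round(3))
```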

Network Architecture (Diagram)

So we only have 7 layers, hence the name VGG 7 rather than VGG 16 or 19. Also, there are two major differences from the original implementation.

1. We are going to use average pooling rather than max pooling. If you are wondering why, please check this link (a minimal pooling sketch follows this list).
2. The number of channels in our network is going to be significantly smaller than in the original network. For easy comparison, please see the original network architecture below or here.
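Because the switch to average pooling matters for the back propagation later on, here is a minimal NumPy sketch of what the forward and backward pass of a 2×2 average-pooling layer could look like. The window size and function names are my own assumptions, not necessarily the exact code used in the network.

```python
import numpy as np

def avg_pool_2x2(x):
    # Forward pass: average over non-overlapping 2x2 windows.
    # x has shape (height, width); both dimensions are assumed to be even.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def avg_pool_2x2_backward(grad_out, input_shape):
    # Backward pass: each pixel in a 2x2 window gets an equal 1/4 share of
    # the gradient that flows into that window's pooled output.
    return np.repeat(np.repeat(grad_out, 2, axis=0), 2, axis=1) / 4.0

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = avg_pool_2x2(x)                                      # shape (2, 2)
grad = avg_pool_2x2_backward(np.ones_like(pooled), x.shape)   # shape (4, 4)
print(pooled)
print(grad)
```

Unlike max pooling, there is no need to remember which element was selected in the forward pass, which is what keeps the backward pass so simple.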

Data Preparation and Hyperparameter Declaration

As seen above, we no longer have to filter out images that contain only 0s or 1s. Thanks to the softmax layer, we are able to perform classification on every image from 0 to 9.
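For readers who want a concrete starting point, a minimal data-preparation sketch could look like the following. It uses scikit-learn’s small 8×8 digit set and illustrative hyperparameter values as stand-ins; the post’s own code loads its own digit data and uses its own settings.

```python
import numpy as np
from sklearn.datasets import load_digits

# Load a small 0-9 digit dataset (8x8 grayscale images).
digits = load_digits()
images = digits.images / 16.0               # normalize pixel values to [0, 1]
labels_one_hot = np.eye(10)[digits.target]  # one-hot encode the labels 0-9

# With a softmax output layer there is no need to filter down to two classes;
# every digit from 0 to 9 can be kept.
print(images.shape, labels_one_hot.shape)   # (1797, 8, 8) (1797, 10)

# Hypothetical hyperparameters, for illustration only.
num_epochs, batch_size = 100, 100
```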

Forward Feed Operation

It’s a standard forward feed operation with the ReLU() activation function for the convolution layers and tanh() and arctan() for the fully connected layers.
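As a rough illustration of that flow, here is a self-contained single-image sketch using SciPy’s convolve2d, with ReLU() after the convolution and tanh()/arctan() on the fully connected layers. The layer sizes, kernel shapes, and random weights are made up for the example and do not match the real network.

```python
import numpy as np
from scipy import signal

def relu(x):
    return np.maximum(0, x)

np.random.seed(0)
image = np.random.randn(8, 8)            # toy input image
k1 = np.random.randn(3, 3) * 0.1         # convolution kernel
w_fc1 = np.random.randn(36, 16) * 0.1    # first fully connected layer
w_fc2 = np.random.randn(16, 10) * 0.1    # second fully connected layer

conv1 = signal.convolve2d(image, k1, mode='valid')  # (6, 6)
act1 = relu(conv1)                                   # ReLU on the conv layer
flat = act1.reshape(1, -1)                           # flatten to (1, 36)
fc1 = np.tanh(flat @ w_fc1)                          # tanh on the first FC layer
fc2 = np.arctan(fc1 @ w_fc2)                         # arctan on the second FC layer
probs = np.exp(fc2) / np.exp(fc2).sum()              # softmax over the 10 classes
print(probs.round(3))
```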

Back Propagation

Since we are using average pooling rather than max pooling, back propagation is quite easy and simple. One thing to note is that I have set different learning rates for the convolutional layers and the fully connected layers (green boxed regions).
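The update step itself might look roughly like the snippet below; the variable names, shapes, and learning-rate values are illustrative stand-ins, not the post’s exact hyperparameters.

```python
import numpy as np

lr_conv, lr_fc = 0.0001, 0.001      # separate learning rates (made-up values)

k1 = np.random.randn(3, 3)          # a convolution kernel
w1 = np.random.randn(36, 16)        # a fully connected weight matrix
grad_k1 = np.random.randn(3, 3)     # stand-ins for gradients from back propagation
grad_w1 = np.random.randn(36, 16)

k1 = k1 - lr_conv * grad_k1         # conv layers use one learning rate
w1 = w1 - lr_fc * grad_w1           # fully connected layers use the other
```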

Update: Please note there was a typo in the code. I will call the back propagation above Broken Back Propagation, since it updates w2 with w1 rather than with w2.
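To make the mix-up concrete, here is a small illustration of that kind of typo; the names and shapes are stand-ins, not the actual variables in the network.

```python
import numpy as np

learning_rate = 0.001
w2 = np.random.randn(16, 10)
grad_w1 = np.random.randn(16, 10)   # stand-in for another layer's gradient
grad_w2 = np.random.randn(16, 10)   # w2's own gradient

w2_broken = w2 - learning_rate * grad_w1  # the typo: w2 updated with w1's gradient
w2_fixed = w2 - learning_rate * grad_w2   # the fix: w2 updated with its own gradient
```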

Training and Results (Correct Back Propagation)

Jan Zawadzki pointed out the typo I made, so I fixed it and retrained the network. The cost value over time is shown on the right.

The network also did much better on the test set images, reaching 84% accuracy.

Training and Results (Broken Back Propagation)

Here as well, the cost decreased stably over time; however, the model did not perform as well on the test set images.

It was only able to classify 39 of the 50 test images correctly, which is around 78% accuracy.

Interactive Code

I have moved to Google Colab for interactive code! You will need a Google account to view the code, and you can’t run read-only scripts in Google Colab, so make a copy in your own playground. Finally, I will never ask for permission to access your files on Google Drive, just FYI. Happy coding!

To access the interactive code, please click on this link.

Final Words

VGG networks are very good networks to practice on, for forward feed operations as well as back propagation, since they are straightforward.

If any errors are found, please email me at jae.duk.seo@gmail.com.

Meanwhile, follow me on Twitter here, and visit my website or my YouTube channel for more content. I also did a comparison of Decoupled Neural Networks here, if you are interested.

Reference

  1. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  2. Roelants, P. (n.d.). How to implement a neural network Intermezzo 2. Retrieved February 07, 2018, from http://peterroelants.github.io/posts/neural_network_implementation_intermezzo02/
  3. R. (n.d.). Rasbt/python-machine-learning-book. Retrieved February 07, 2018, from https://github.com/rasbt/python-machine-learning-book/blob/master/faq/softmax_regression.md
  4. A Gentle Introduction to Cross-Entropy Loss Function. (2018, January 07). Retrieved February 07, 2018, from https://sefiks.com/2017/12/17/a-gentle-introduction-to-cross-entropy-loss-function/comment-page-1/#comment-600
  5. Vgg16. (2016, February 26). Retrieved February 07, 2018, from https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/vgg16/
  6. What is the benefit of using average pooling rather than max pooling? (n.d.). Retrieved February 07, 2018, from https://www.quora.com/What-is-the-benefit-of-using-average-pooling-rather-than-max-pooling
