Only Numpy: Vanilla Recurrent Neural Network — Deriving Backpropagation Through Time in Practice, Part 1/2

Jae Duk Seo
Towards Data Science
4 min read · Dec 22, 2017


So this is my first official educational post on Medium, and I have to say Medium is a truly amazing blogging platform. Anyway, here we go. The problem is very simple: we are going to use an RNN to count how many ones there are in the given data. If you just want the video tutorial and the code, please follow the links below.

Github Link: https://github.com/JaeDukSeo/Only_Numpy_Basic/blob/master/rnn/a_rnn_simple.py

As seen above, at (a) the training data is X and Y is the ground truth (the target we test against). (b) We are only going to have two weights: Wx (which we multiply with the input x) and Wrec (which we multiply with the previous state).
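To make (a) and (b) concrete, here is a minimal NumPy sketch of how the data and the two weights might look. The values and shapes are illustrative; only the names Wx and Wrec come from the figure.

```python
import numpy as np

# Illustrative toy data: each row of X is one binary sequence,
# and Y holds the ground-truth count of ones in each row.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)
Y = X.sum(axis=1)  # [1., 2., 3.]

# The only two trainable weights of this RNN, initialized arbitrarily.
Wx = 0.5    # multiplied with the current input x
Wrec = 1.5  # multiplied with the previous state
```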

It is very important to understand (c), since it is the unrolled version of our network architecture. It clearly shows that State 0 (denoted S0) has some relationship with State 1 (denoted S1). That relationship is shown below!

Under (e) we can see one equation that clears up the relationship between states.

Current_State_k = Previous_State_(k-1) * Wrec + Current_Input_X_k * Wx

That one line of math is the relationship between State 0 and State 1. But what about the numbers below each state? Very simple: those are the inputs at each state, so at State 1 the input x is the vector [0,0,1].T.

Same-colored boxes indicate the inputs at each state

Then you might ask: what about State 0? Great question: they are all zeros!

So let's get dirty with the math — the Forward Feed.
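In NumPy, the whole forward feed is just one loop over the state equation above. This is a sketch under my own naming; column 0 of S is State 0, which starts as all zeros.

```python
def forward_states(X, Wx, Wrec):
    # One column per state; column 0 is State 0 (all zeros).
    S = np.zeros((X.shape[0], X.shape[1] + 1))
    for k in range(X.shape[1]):
        # Current_State = Previous_State * Wrec + Current_Input * Wx
        S[:, k + 1] = S[:, k] * Wrec + X[:, k] * Wx
    return S
```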

That's it! We are done with the forward feed propagation. One side note: we are going to use MSE (mean squared error) as the cost function, and from now on we will denote it with the notation beside the star. (Sorry, I am not a math major; I have no idea what that notation is called.)
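For reference, here is a minimal sketch of that MSE cost on the final state, plus its derivative, which kicks off the backward pass (the function names are mine, not from the post):

```python
def cost(S_final, Y):
    # Mean squared error between the last state and the ground truth.
    return np.mean((S_final - Y) ** 2)

def output_gradient(S_final, Y):
    # Derivative of the MSE cost with respect to the final state.
    return 2.0 * (S_final - Y) / Y.shape[0]
```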

Now let's perform backpropagation through time. We have to get the derivatives with respect to Wx and Wrec at each state, and that is exactly what we have done below!
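In the same plain-text notation as the state equation, the derivation pictured here boils down to three lines (my summary, assuming the MSE cost from the previous step):

Grad_State_(k-1) = Grad_State_k * Wrec

dCost/dWx = sum over k of ( Grad_State_k * Current_Input_X_k )

dCost/dWrec = sum over k of ( Grad_State_k * Previous_State_(k-1) )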

So that's it! That is the simple math behind training an RNN. However, try to fill in State 1 by yourself. There is one interesting fact to note: while getting the derivatives with respect to Wx and Wrec, there are lots of mathematical terms that get repeated over time; look at the image below.

It would be more computationally efficient to collect those repeated values in a matrix and use them later by just multiplying with the appropriate X or state. And that is exactly what I do in the video tutorial!

Calculate the repeating terms
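A NumPy sketch of that trick: accumulate the repeating terms in a grad_over_time matrix, one column per state, then reuse each column against the matching input or state (the function and variable names are my own):

```python
def backward_gradient(X, S, grad_out, Wrec):
    # grad_over_time stores the repeated terms, one column per state.
    grad_over_time = np.zeros((X.shape[0], X.shape[1] + 1))
    grad_over_time[:, -1] = grad_out  # gradient at the final state
    Wx_grad, Wrec_grad = 0.0, 0.0
    for k in range(X.shape[1], 0, -1):
        # Reuse the stored term by multiplying with the matching X or state.
        Wx_grad += np.sum(grad_over_time[:, k] * X[:, k - 1])
        Wrec_grad += np.sum(grad_over_time[:, k] * S[:, k - 1])
        # Each step back in time multiplies by Wrec once more.
        grad_over_time[:, k - 1] = grad_over_time[:, k] * Wrec
    return Wx_grad, Wrec_grad
```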

Another important thing to note: when we take the derivatives, we multiply by Wrec multiple times; in other words, that variable plays a key role! This repeated multiplication is exactly what gives rise to the vanishing and exploding gradient problems in RNNs. (I am not going to explain further, but this topic is important!)
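As a quick illustration of why that matters (my addition, not from the post): after ten time steps, the repeated multiplication by Wrec either shrinks the gradient toward zero or blows it up, depending on whether |Wrec| is below or above 1.

```python
# Illustrative only: the effect of multiplying by Wrec over 10 time steps.
for wrec in (0.5, 1.0, 1.5):
    print(wrec, "->", wrec ** 10)  # 0.5 -> ~0.001, 1.0 -> 1.0, 1.5 -> ~57.7
```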

And finally, let's perform the weight update via stochastic gradient descent.

In the top-right corner we are adding up the terms accordingly and performing SGD.
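Tying the sketches above together, one SGD step might look like this (lr is a hypothetical learning rate, not a value from the post):

```python
lr = 0.001  # hypothetical learning rate

S = forward_states(X, Wx, Wrec)
Wx_grad, Wrec_grad = backward_gradient(X, S, output_gradient(S[:, -1], Y), Wrec)

# Stochastic gradient descent: step each weight against its gradient.
Wx = Wx - lr * Wx_grad
Wrec = Wrec - lr * Wrec_grad
```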

Update! Here is part 2: Link

Finally, just as a reminder, I used a lot of material from Peter! Please follow the link below to check out his amazing tutorial! And this is my first post; I am still learning how to make great tutorials, so any feedback would be helpful!
[ plz be nice :’( ] Thanks!

For more tutorials, check out my website and my YouTube channel!

Website: https://jaedukseo.me/

YouTube Channel: https://www.youtube.com/c/JaeDukSeo

