
How to create your own deep learning framework using only Numpy

This article walks you through the components you need and the challenges you have to overcome to create a basic deep learning framework.

Photo by Vlado Paunovic on Unsplash

Let’s start by defining what we want to create and figuring out what components we need: a framework that supports automatic differentiation to compute the gradients of several operations, a standardized way of building neural network layers from those operations, a modular approach for combining layers into larger neural network models, and several tools for training a neural network, such as optimizers, activation functions, and datasets.

We have identified the following components:

  • An autograd system
  • Neural network layers
  • Neural network model
  • Optimizers
  • Activation functions
  • Datasets

Next, we will go through each of these components to see what its purpose is and how we can implement it. For examples and references I will use gradflow, a personal open-source educational autograd system with deep neural network support that follows the PyTorch API.


Autograd system

This is the most important component and the foundation of every Deep Learning framework out there. It will allow us to track the operations we apply to an input tensor and to update the weights of our model using the gradients of the loss function with respect to every parameter. The one condition is that these operations must be differentiable.

The base of our autograd system is the Variable. By implementing the dunder methods for the operations we need, every instance can keep track of its parents and of how to compute the gradients for them. To help with some of the operations, we will use a numpy array to hold the actual data.

Another important part of the Variable is the backward method, which computes the gradient of the current instance with respect to every ancestor in the computation graph. In concrete steps, we use the parent references and the gradient functions recorded by the originating operations to update a grad member field.

The following code snippet contains the main Variable class initialization function, the dunder method for the add operation, and the backward method from before:
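A minimal sketch of such a Variable might look like this; the actual gradflow implementation differs in its details, and the back_grad_fn helpers below are illustrative stand-ins for gradflow's __back_gradfn bookkeeping:

```python
import numpy as np

class Variable:
    def __init__(self, data, parents=None):
        # the actual data is held in a numpy array
        self.data = np.asarray(data, dtype=np.float32)
        self.grad = np.zeros_like(self.data)
        # each parent is stored together with the function that maps
        # this variable's gradient back to that parent's gradient
        self.parents = parents if parents is not None else []

    def __add__(self, other):
        # addition passes the incoming gradient through unchanged
        back_grad_fn = lambda grad: grad
        return Variable(self.data + other.data,
                        parents=[(self, back_grad_fn), (other, back_grad_fn)])

    def backward(self, grad=None):
        # seed with ones when called on the final node (e.g. the loss)
        if grad is None:
            grad = np.ones_like(self.data)
        # accumulate, because several paths in the graph may reach this variable
        self.grad = self.grad + grad
        for parent, back_grad_fn in self.parents:
            parent.backward(back_grad_fn(grad))
```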

Two things to notice in the __back_gradfn: first, we need to add the gradient to the existing value, because gradients have to be accumulated when multiple paths in the computation graph reach the same variable; second, we also need to make use of the child's current gradient. If you want more details about automatic differentiation and matrix calculus, I strongly recommend this article.


Neural network modules

For the actual neural network modules, we want the flexibility to implement new layers and modules and reuse existing ones.

Following the PyTorch API, we will create a base class, Module, that requires every layer to implement both the __init__ and forward methods. Apart from these two, we will also need several utility methods to access the parameters and the sub-modules.
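A stripped-down version of such a base class could look like the following; the parameter and sub-module registration in gradflow is more involved, so treat this only as a sketch:

```python
class Module:
    def __init__(self):
        self._parameters = []
        self._modules = []

    def forward(self, *inputs):
        # every concrete layer has to provide its own forward pass
        raise NotImplementedError

    def __call__(self, *inputs):
        return self.forward(*inputs)

    def parameters(self):
        # own parameters plus the parameters of every registered sub-module
        params = list(self._parameters)
        for module in self._modules:
            params.extend(module.parameters())
        return params
```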


Linear layer

Using the abstract Module from the previous section, we are going to implement a simple linear layer. The mathematical operations that we need to perform are pretty straightforward:

Linear layer transformation: y = xW + b

Because we will use the Variable we previously implemented, which automatically computes both the actual result of the operation and the gradient, the implementation is simple:
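A possible implementation, assuming the Variable also overloads matrix multiplication (__matmul__), could look like this:

```python
class Linear(Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # small random initialization; the Variables track their own gradients
        self.weight = Variable(np.random.randn(in_features, out_features) * 0.1)
        self.bias = Variable(np.zeros(out_features))
        self._parameters = [self.weight, self.bias]

    def forward(self, x):
        # y = xW + b, with the gradient bookkeeping handled by Variable
        return x @ self.weight + self.bias
```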


Activation functions

Most data in the real world has a non-linear relation between the independent and dependent variables, and we want our model to be capable of learning this relation. If we don't add a non-linear activation function on top of a linear layer, then no matter how many linear layers we stack, in the end we can represent them with just one layer (one weight matrix).
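To see why, consider stacking two linear layers with no activation in between:

W₂(W₁x + b₁) + b₂ = (W₂W₁)x + (W₂b₁ + b₂)

The result is again a single linear layer, with weight W₂W₁ and bias W₂b₁ + b₂.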

The most popular activation function is ReLU:

ReLU(x) = max(0, x)

When we implement the relu function, we also need to specify its backpropagation function:
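Using the Variable sketched earlier, a relu that records its own backward function could look like this (again, only a sketch):

```python
def relu(x):
    mask = x.data > 0
    # the gradient flows back only through the entries where the input was positive
    back_grad_fn = lambda grad: grad * mask
    return Variable(x.data * mask, parents=[(x, back_grad_fn)])
```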


Optimizer

After we perform the forward pass through our model and backpropagate the gradients through our layers, we need to actually update the parameters in order to make the loss function smaller, and this is where the optimizer comes in.

One of the simplest optimizers is SGD (stochastic gradient descent), and in our implementation we'll keep everything pretty simple. Using only the gradients and the learning rate, we will clip the change value delta and update the weights:
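A bare-bones SGD class along these lines might look like the following; the clip_value parameter and the default hyperparameters are illustrative:

```python
class SGD:
    def __init__(self, parameters, lr=0.01, clip_value=1.0):
        self.parameters = parameters
        self.lr = lr
        self.clip_value = clip_value

    def zero_grad(self):
        # reset the accumulated gradients before the next backward pass
        for p in self.parameters:
            p.grad = np.zeros_like(p.data)

    def step(self):
        for p in self.parameters:
            # clip the raw update so a single step cannot blow up the weights
            delta = np.clip(self.lr * p.grad, -self.clip_value, self.clip_value)
            p.data = p.data - delta
```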


Datasets

The last piece of the puzzle is a component that allows us to organize a dataset and easily integrate it into a training procedure. For that, we create a Dataset class that implements the dunder methods of an iterator and transforms the features and labels into Variable instances:
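A simple Dataset along these lines could be sketched as:

```python
class Dataset:
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        # wrap each sample so it plugs directly into the autograd system
        return Variable(self.features[index]), Variable(self.labels[index])

    def __iter__(self):
        for index in range(len(self)):
            yield self[index]
```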


Training

Finally, we are going to put everything together and train a simple linear model on an artificially generated dataset from sklearn.datasets:
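Putting the pieces from above together, a toy training loop could look like this; it assumes the Variable also supports subtraction and element-wise multiplication with gradient tracking, and the hyperparameters are arbitrary:

```python
from sklearn.datasets import make_regression

# toy regression problem with 3 features
X, y = make_regression(n_samples=200, n_features=3, noise=5.0)
dataset = Dataset(X.astype(np.float32), y.reshape(-1, 1).astype(np.float32))

model = Linear(3, 1)
optimizer = SGD(model.parameters(), lr=0.001)

for epoch in range(10):
    epoch_loss = 0.0
    for features, label in dataset:
        optimizer.zero_grad()
        prediction = model(features)
        # squared error loss for a single sample
        diff = prediction - label
        loss = diff * diff
        loss.backward()
        optimizer.step()
        epoch_loss += float(loss.data.mean())
    print(f"epoch {epoch}: loss {epoch_loss / len(dataset):.4f}")
```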


Conclusion

The implementations presented here are by no means production-grade and are quite limited, but I think they serve as a good learning tool to better understand some of the operations that take place under the hood of other, more popular frameworks.

While the basic calculus for scalar-based operations is pretty straightforward, when we add multiple dimensions and switch to tensors the story becomes a little more complicated, and we have to pay extra attention.

Thank you for reading! I hope you find this article helpful. If you want to stay up to date with the latest programming and Machine Learning news (and some good quality memes :)), you can follow me on Twitter or connect with me on LinkedIn [here](https://www.linkedin.com/in/tudor-marian-surdoiu/).


References

