Getting Started

Introduction
In my previous post, "Getting started with Tensorflow", I mentioned that both Tensorflow and PyTorch are great choices if you want to build small or large scale deep learning solutions. Both platforms are widely used in academia and industry, are well maintained and open-source, and provide simple APIs and high-level functionality.
Here, we will explore the PyTorch API. In parallel with "Getting started with Tensorflow", we will discuss how PyTorch came about and how to use it for deep learning.
When was PyTorch developed?
Around the time PyTorch version 0.1.1 was released in September 2016¹, there were multiple deep learning frameworks available, providing low- and high-level wrappers for building and training complex models.
Caffe, Chainer, Theano, Tensorflow, CNTK, MXNet and Torch were just a few of the low-level libraries researchers would use to build increasingly complicated networks²³.
On the other hand, Lasagne and Keras focused on creating high-level wrappers that could use one of many low-level libraries. This approach was greatly supported by practitioners, as it allowed them to abstract away the low-level syntax of frameworks such as Tensorflow, CNTK or Theano²³.
In the following years the landscape changed drastically with the abandonment and consolidation of frameworks⁴:
- Keras was incorporated into Tensorflow and greatly changed its direction;
- Caffe2 was incorporated into PyTorch and replaced part of the original Lua implementation inherited from the Torch ancestor;
- Theano, CNTK and Chainer development and support stopped.

By June 2021, more than 99% of Google searches for major deep learning frameworks contained either Tensorflow/Keras or PyTorch/Caffe, with the former still noticeably more popular (see graph).
Learning through examples
In the following sub-sections I am going to introduce the key concepts to build two simple neural networks in PyTorch (one for regression, and one for classification).
To make this tutorial comparable with my previous post "Getting started with Tensorflow" I am going to re-use the same simulated datasets. To make this tutorial self-contained, I am going to re-introduce them here.
Linear Regression
In this example, we will build a simple one-layer neural network in PyTorch to solve a linear regression problem. This will explain how to initialise your weights (a.k.a. the regression coefficients), and how to update them through backpropagation.
The first thing we need is a dataset. Here, we will simulate a noisy linear model as
Y = wX + N
where Y is our target, X is our input, w is the coefficient we want to determine, and N is a Gaussian distributed noise variable. To do so, in the first cell of your notebook paste and run the following snippet.
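A minimal sketch of such a snippet is shown below (the random seed and the noise scale of 0.3 are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate the noisy linear model Y = w*X + N with an ideal coefficient of 2.
# The 1000 points match the tensor shapes shown later.
np.random.seed(42)
x = np.arange(0, 1, 0.001)                           # inputs in [0, 1)
y = 2 * x + np.random.normal(0, 0.3, size=x.shape)   # noisy linear target

plt.scatter(x, y, s=2)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
```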
This will display a scatter plot of the relation between X and Y, clearly indicating a linear correlation under some Gaussian noise. Here, we would expect a reasonable model to estimate 2 as the ideal regression coefficient.

In PyTorch we typically use tensors to represent our inputs, targets and regression coefficients (from here on called weights). A tensor is a multidimensional array of elements represented by a ‘torch.Tensor’ object. A tensor has a single data type and a shape.
Let’s now convert our inputs (x) and target (y) into Torch tensors. To do this, copy and run the following snippet in Colab.
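A possible version of this snippet is sketched below (the ‘x_tensor’ and ‘y_tensor’ names are assumptions):

```python
import torch

# Convert the numpy arrays into PyTorch tensors (float64, matching the output below).
x_tensor = torch.tensor(x)
y_tensor = torch.tensor(y)

print("Describing the features...")
print(type(x_tensor))
print(x_tensor.dtype)
print(x_tensor.shape)
print(x_tensor[:10])

print("Describing the target...")
print(type(y_tensor))
print(y_tensor.dtype)
print(y_tensor.shape)
print(y_tensor[:10])
```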
This will return the class (torch.Tensor), shape, size and values for the input and target tensors.
Describing the features...
<class 'torch.Tensor'>
torch.float64
torch.Size([1000])
tensor([0.0000, 0.0010, 0.0020, 0.0030, 0.0040, 0.0050, 0.0060, 0.0070, 0.0080,
0.0090], dtype=torch.float64)
Describing the target...
<class 'torch.Tensor'>
torch.float64
torch.Size([1000])
tensor([ 0.1587, -0.6984, 0.1692, 0.1368, 0.1386, 0.0854, 0.2807, 0.2895,
0.5358, 0.3550], dtype=torch.float64)
Let’s now initialise our weight with a constant value (0.1). To do this we call ‘torch.tensor’ with the initial value 0.1, the desired data type (float), the device on which we want the tensor to be stored, and ‘requires_grad=True’ so that gradients are tracked for it. In this example, we will be performing all operations on the ‘cpu’. To improve performance for large NN models we could move the tensors to the ‘gpu’.
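A sketch of this initialisation (the ‘w_tensor’ name follows the inference examples later in the post):

```python
# Initialise the weight as a 0-dimensional tensor with value 0.1 on the CPU.
# requires_grad=True tells PyTorch to track gradients for this tensor.
w_tensor = torch.tensor(0.1, dtype=torch.float, device='cpu', requires_grad=True)

print("Describing the weights...")
print(type(w_tensor))
print(w_tensor.dtype, w_tensor.shape)
print(w_tensor)
```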
Running this snippet will output the class, shape, type and value of the weight tensor. Note that this tensor has a ‘[]’ shape, indicating that it is a 0-dimensional tensor (a scalar).
Describing the weights...
<class 'torch.Tensor'>
torch.float32 torch.Size([])
tensor(0.1000, requires_grad=True)
To obtain the predicted Y (Yhat) given the initialised weight tensor and the input X, we can simply call ‘Yhat = x * w_tensor.detach().numpy()’. The ‘.detach().numpy()’ call converts the weight tensor to a numpy array. Copy and run the following snippet to see how the initialised weight fits the data.
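A minimal sketch of this visualisation:

```python
# Predict Y with the untrained weight and compare it with the data.
Yhat = x * w_tensor.detach().numpy()

plt.scatter(x, y, s=2, label="data")
plt.plot(x, Yhat, color="red", label="initial fit")
plt.legend()
plt.show()
```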
As you can observe, the current value of ‘w_tensor’ is far from ideal. The regression line does not fit the data at all.

To find the optimal value for ‘w_tensor’ we need to define a loss metric and an optimiser. Here, we will use the Mean Squared Error (MSE) as our loss metric, and the Stochastic Gradient Descent (SGD) as our optimiser.
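A sketch of these definitions (the learning rate is an assumption):

```python
# MSE loss and an SGD optimiser over the single weight tensor.
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD([w_tensor], lr=0.1)
```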
We now have all the pieces in place to optimise our ‘w_tensor’. The optimisation loop requires the definition of a custom ‘forward’ step, a call to the ‘backward’ method on our loss, and a call to the ‘step’ method of our optimiser.
The custom ‘forward’ step tells the model how to combine the input with the weight tensor and how to calculate the error between our target and the predicted target (the ‘forward’ call and the loss computation in the snippet below). In a more complex example, this would be a set of instructions that define the computational graph from the input X to the target Y.
The ‘backward’ step tells the model to backpropagate the errors to each layer in the network (the ‘loss.backward()’ call in the snippet below).
Finally, ‘optimizer.step()’ tells the model to calculate and apply the weight changes for this iteration. Note that in most cases you need to clear the gradients at the start of each iteration (the ‘optimizer.zero_grad()’ call in the snippet below), otherwise they accumulate across iterations.
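A minimal sketch of such a training loop (the ‘forward’ function, the cast to float32 and the number of epochs are assumptions):

```python
# Cast the data to float32 so it matches the weight's dtype.
x_train = x_tensor.float()
y_train = y_tensor.float()

def forward(x, w):
    # Combine the input with the weight tensor to produce the prediction.
    return x * w

for epoch in range(100):
    yhat = forward(x_train, w_tensor)     # custom forward step
    loss = loss_fn(yhat, y_train)         # error between prediction and target

    optimizer.zero_grad()                 # clear gradients from the previous iteration
    loss.backward()                       # backpropagate the errors
    optimizer.step()                      # apply the weight update

    if epoch % 10 == 0:
        print(f"epoch {epoch}: loss={loss.item():.4f}, w={w_tensor.item():.4f}")
```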
By the end of the training loop your weight should be reasonably close to 2 (the ideal value). To use this model for inference (i.e. to predict the Y variable given an X value) you can simply do ‘Yhat = x * w_tensor.detach().numpy()’.

Classification problem
In this example, we will introduce PyTorch’s ‘nn.Sequential’ model definition to create a more complex neural network. If you are used to Keras, this module will look very familiar. We will apply this model to a linearly separable classification problem.
As before, let’s start by building our dataset. In the snippet below, we create two clusters of points centered at (0.2, 0.2) for the first cluster, and (0.8, 0.8) for the second cluster.
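A sketch of such a snippet (the number of points per cluster, the spread and the seed are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

# Two Gaussian clusters centred at (0.2, 0.2) and (0.8, 0.8).
np.random.seed(42)
c0 = np.random.normal(loc=0.2, scale=0.1, size=(500, 2))
c1 = np.random.normal(loc=0.8, scale=0.1, size=(500, 2))

x = np.vstack([c0, c1]).astype(np.float32)   # inputs, shape (1000, 2)
y = np.array([0] * 500 + [1] * 500)          # class labels (0 or 1)

plt.scatter(x[:, 0], x[:, 1], c=y, s=4)
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.show()
```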
We can quickly observe that a model that linearly separates the two clusters with a line at equal distance to both would be ideal.

Let’s start by defining the custom NN model to solve this problem. We’ll define a five-layer neural network as follows (a sketch of the definition follows the list):
- Linear layer with 10 nodes. This will have a shape of 2 x 10 (input_shape x layer_size).
- Batch Normalisation layer. This layer will normalise the output of the first layer for each batch, avoiding exploding / vanishing gradients.
- ReLU activation layer. This layer provides a non-linear capability to our network. Note that we only use it as an example; a ReLU layer is unnecessary for this problem, as it is linearly separable.
- Linear layer with 2 nodes. This will have a shape of 10 x 2 (layer_size x output_shape).
- Softmax layer. This layer will convert the output of layer 4 into a probability distribution over the two classes.
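A minimal sketch of this model definition:

```python
import torch
from torch import nn

# The five-layer architecture described above.
model = nn.Sequential(
    nn.Linear(2, 10),       # layer 1: 2 inputs -> 10 nodes
    nn.BatchNorm1d(10),     # layer 2: batch normalisation over the 10 activations
    nn.ReLU(),              # layer 3: non-linear activation
    nn.Linear(10, 2),       # layer 4: 10 nodes -> 2 outputs
    nn.Softmax(dim=1),      # layer 5: convert the outputs to class probabilities
)
```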
As before, let’s also convert the x and y numpy arrays to tensors to make them available to PyTorch, and then define our loss metric and optimizer. In this example we should use a classification loss metric such as the Cross Entropy.
For the optimizer we could use SGD as before. However, vanilla SGD can be very slow to converge. Instead, we will use a more recent adaptive gradient descent approach (RMSProp).
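A sketch of these definitions (the tensor names and the learning rate are assumptions):

```python
# Convert the data to tensors and set up the loss and optimiser.
# Note that nn.CrossEntropyLoss applies a log-softmax internally, so it is
# commonly fed raw logits; here we follow the architecture above as described.
x = torch.tensor(x, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)      # integer class labels

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
```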
Like before, let’s first check how our network performs before training. To use the model for inference we can simply type ‘yhat = model(x)’. Now, copy the snippet below to visualise the network output.
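A minimal sketch of this visualisation:

```python
# Visualise the predicted class of the untrained network for every point.
with torch.no_grad():
    yhat = model(x)                          # class probabilities, shape (1000, 2)
    predicted = yhat.argmax(dim=1).numpy()   # predicted class per point

plt.scatter(x[:, 0].numpy(), x[:, 1].numpy(), c=predicted, s=4)
plt.title("Predictions before training")
plt.show()
```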
As you can confirm, the untrained network is not good at all. It clearly needs some training to properly separate the two classes.

To train the Sequential PyTorch model we follow the same steps as in the first example, but replace the custom ‘forward’ step with a call to the model. Add the snippet below to your notebook to train the model.
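A minimal sketch of such a loop (the number of epochs and the print frequency are assumptions):

```python
for epoch in range(100):
    yhat = model(x)                # forward pass through the Sequential model
    loss = loss_fn(yhat, y)        # classification loss against the true labels

    optimizer.zero_grad()          # clear gradients from the previous iteration
    loss.backward()                # backpropagate the errors
    optimizer.step()               # update all model parameters

    if epoch % 10 == 0:
        print(f"epoch {epoch}: loss={loss.item():.4f}")
```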
By the end of the training loop your network should be reasonably good at separating both classes. To use this model for inference (i.e. to predict the class given an X value) you can simply do ‘yhat = model(x)’.

Complete script
For the full script go to my github page by following this link:
Or go directly to the Google Colab notebook by following this link:
Conclusion
PyTorch is currently one of the best deep learning frameworks for developing custom deep learning solutions (the other major one being Tensorflow). In this blog, I introduced the key concepts needed to build two simple NN models in PyTorch.
Warning!!! Just like I mentioned in "Getting started with Tensorflow", your learning has only just started. To get better you will need to keep practicing. The official PyTorch website provides a good source of examples from beginner to expert level, as well as the official documentation for the PyTorch package. Good luck!
[1] Chintala, Soumith. "PyTorch Alpha-1 release" (September 2016) https://github.com/pytorch/pytorch/releases/tag/v0.1.1
[2] Indra den Bakker. "Battle of the Deep Learning frameworks – Part I: 2017, even more frameworks and interfaces" https://towardsdatascience.com/battle-of-the-deep-learning-frameworks-part-i-cff0e3841750
[3] Madison May. "An Overview of Python Deep Learning Frameworks" https://www.kdnuggets.com/2017/02/python-deep-learning-frameworks-overview.html
[4] Eli Stevens, Luca Antiga, Thomas Viehmann. "Deep Learning with PyTorch" https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf