How to code a simple neural network in PyTorch? — for absolute beginners

An easy-to-follow tutorial on building neural networks in PyTorch, using the popular Titanic dataset from Kaggle

Harshanand B A
Towards Data Science



In this tutorial, we will see how to build a simple neural network for a classification problem using the PyTorch framework. This will help us get a command over the fundamentals and the framework’s basic syntax. We will be using Kaggle’s Titanic dataset.

Installing PyTorch

```shell
## For Windows
pip install torch===1.5.0 torchvision===0.6.0 -f https://download.pytorch.org/whl/torch_stable.html
## For Linux
pip install torch torchvision
```

If your setup differs (a Conda-based install, or a Mac system), you can find the appropriate command on the official PyTorch website.

Dataset Preparation

First, download the dataset from Kaggle by joining the competition, or get it from another source (a simple Google search will help). Once you have the dataset and the PyTorch package set up, we are ready to dive in further.

Before designing the architecture, the first and foremost thing to do is prepare your data in the form PyTorch expects, which can be done using the Dataset module provided by PyTorch itself. If that sounds abstract, let me dissect it for you. Most of the data you will be dealing with will be NumPy structures, which cannot be fed into the network directly because they are not tensors. PyTorch requires the data to be fed in as tensors, which behave like NumPy arrays except that they can also be moved to the GPU during training. All the gradients and weights your network deals with use this same tensor data structure. This will become clearer as you read on. So now, let’s load the data.
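As a quick illustration of that tensor conversion (a minimal sketch with toy values, not the Titanic data itself):

```python
import numpy as np
import torch

# A toy NumPy feature matrix: 3 examples with 2 features each
arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Wrap it into a PyTorch tensor
t = torch.tensor(arr, dtype=torch.float32)
print(t.shape)  # torch.Size([3, 2])

# Unlike the NumPy array, this tensor could later be moved to a GPU
# with t.to("cuda"); gradients and weights use this same structure.
```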

Now we import the Dataset class and subclass it, overriding methods such as __getitem__() and __len__() that the library expects. This lets us create our own custom class for initializing the dataset. The code below shows how to create a dataset class.
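The original code gist did not survive here; a minimal sketch consistent with the description (the class name, CSV path, and the omitted preprocessing are placeholders) might look like:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset

class TitanicDataset(Dataset):  # class name is illustrative
    def __init__(self, csv_path, train=True):
        df = pd.read_csv(csv_path)  # any preprocessing would go here
        self.train = train
        # The last column holds the target class; the rest are input features
        self.inp = df.iloc[:, :-1].values
        self.oup = df.iloc[:, -1].values

    def __len__(self):
        # Number of examples in the dataset
        return len(self.inp)

    def __getitem__(self, idx):
        # Examples are converted to tensors only when fetched by index
        x = torch.tensor(self.inp[idx], dtype=torch.float32)
        if self.train:
            y = torch.tensor(self.oup[idx], dtype=torch.float32)
            return x, y
        return x  # at inference time only the inputs are needed
```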

Note: In the above code, the last column of the data frame contains the target class while the rest are input features, so we split them into the self.inp and self.oup variables accordingly. We need both the inputs and the output if we are going to train; otherwise only the input data is needed.

The __init__() function reads the .csv file into a pandas data frame, and we do some preprocessing on it afterwards (which is irrelevant to this tutorial). The __len__() function returns the number of examples, and __getitem__() fetches an example by its index. The important thing to note from the above piece of code is that we convert each training example into a tensor using the torch.tensor function when fetching it by index. So throughout the tutorial, wherever we fetch examples, they will all be in the form of tensors.

Now that the data is ready, let’s load it into batches. This can be done easily using the DataLoader class as below.

You pass the dataset object from the previous step as the argument. Iterating over the resulting loader yields one batch at a time, each a tensor of shape (batch_size, size_of_the_vector); the last batch may be smaller if the dataset size is not a multiple of the batch size. The number of dimensions will vary for other kinds of data, such as images or sequences, based on their nature. But for now, just understand that the loader produces multiple batches and each batch contains a number of examples equal to the batch size, irrespective of whatever data you use.
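A sketch of that call (a stand-in TensorDataset replaces the Titanic dataset object so the snippet runs on its own; the batch size is illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the dataset built in the previous step:
# 10 examples with 4 features each, and binary labels
dataset = TensorDataset(torch.randn(10, 4), torch.randint(0, 2, (10,)))

loader = DataLoader(dataset, batch_size=4, shuffle=True)

for x, y in loader:
    # Each batch is a tensor of shape (batch_size, feature_size);
    # the last one here has only 2 examples (10 is not a multiple of 4)
    print(x.shape)
```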

Take a breath now… You’re halfway through. :)

Neural Network Architecture

Now that our data is ready for training, we have to design the neural network before we can start training it. Any model with conventionally used hyperparameters would be fine (Adam optimizer, MSE loss). To code our neural network, we subclass nn.Module.

Layers such as nn.Linear() and nn.BatchNorm1d() come from the torch.nn module; once you inherit from the nn.Module class and register them as attributes, you can simply use them by calling them. Since we are using simple tabular data, a few dense layers (fully connected layers) are enough to create the model. For activation, I have used swish() via a custom definition. One could go for ReLU as well; it is available in the torch.nn.functional module, so you could simply replace swish() with F.relu(). Since it’s a binary classification, a softmax in the final layer is not really necessary; I have used the sigmoid function to classify my examples. In the above code, __init__() initializes the layers of your model as soon as you call the constructor, and the forward() function controls the data flow through the network, which makes it responsible for the feedforward pass. As we proceed to the training loop you will see how the forward function gets called.
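The gist itself was lost above; a sketch matching the description (layer sizes and the class name are placeholders, swish is the custom activation the text mentions) could be:

```python
import torch
import torch.nn as nn

def swish(x):
    # Custom swish activation: x * sigmoid(x)
    return x * torch.sigmoid(x)

class Network(nn.Module):  # name and layer sizes are illustrative
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 64)
        self.bn1 = nn.BatchNorm1d(64)
        self.fc2 = nn.Linear(64, 32)
        self.bn2 = nn.BatchNorm1d(32)
        self.out = nn.Linear(32, 1)

    def forward(self, x):
        # Feedforward: dense -> batchnorm -> swish, twice, then a
        # sigmoid on the final layer for binary classification
        x = swish(self.bn1(self.fc1(x)))
        x = swish(self.bn2(self.fc2(x)))
        return torch.sigmoid(self.out(x))
```

Calling the model as model(x) invokes forward() under the hood, which is how the training loop later performs the feedforward pass.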

Training the Model

Your training process can be laid out as follows:

  • You define your training parameters such as the number of epochs, the loss function, and the optimizer. All the optimizers are available in torch.optim. Your optimizer takes the weights of the network as its parameters: in the code below, the net variable contains the neural network model we created in the above subsection, and net.parameters() refers to the network’s weights.
  • Every batch of data is fed into the network, and the loss for that particular batch is computed with the loss function. Once the loss is calculated, we compute the gradients by calling the backward() function and then update the existing weights (remembering to clear the previous batch’s gradients with optimizer.zero_grad() first). The same process repeats for every batch in every epoch. As I mentioned earlier, the feedforward pass is done simply by passing the input as an argument to the neural network, like model(x) in the code below.

optimizer.step() is used to update the weights using the calculated gradients.

  • At the end of each epoch, we make predictions on the validation data. Then, finally, we calculate the accuracy from the predictions on both the training and validation data.
  • To give you an overall idea, I have pasted my entire training loop below.
  • Based upon the available computational resources, you can move your data and network to the GPU using the code below. Remember, whichever device you’re using, all the input and output data as well as the network should be on the same device (else, it would throw some silly errors :p).
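Putting the bullet points together, here is a self-contained sketch of such a loop; the synthetic data, hyperparameters, and plain nn.Sequential model are stand-ins for the real dataset and network described above:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Move everything to the GPU if one is available, else stay on CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins for the real data: 64 examples, 4 features, binary labels
X = torch.randn(64, 4)
y = (X.sum(dim=1, keepdim=True) > 0).float()
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

# Stand-in model; network and data must live on the same device
net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(),
                    nn.Linear(16, 1), nn.Sigmoid()).to(device)
criterion = nn.MSELoss()                                # loss function
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2) # takes the weights

for epoch in range(5):                 # number of epochs
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()          # clear the previous gradients
        pred = net(xb)                 # feedforward: model(x) calls forward()
        loss = criterion(pred, yb)     # loss for this batch
        loss.backward()                # compute gradients
        optimizer.step()               # update the weights

# Accuracy from predictions (here on the training data, for brevity)
with torch.no_grad():
    acc = ((net(X.to(device)) > 0.5).float().cpu() == y).float().mean()
```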

You’re done with the tutorial… :) Be proud of yourself.

