
A Simple Maths-Free PyTorch Model Framework

Love Data Science and Deep Learning but get confused by all the maths and formulas and just want a simple explanation and set of examples?

Me too. So let’s correct that now, shall we?

The goal of this article is to build simple, explainable PyTorch Deep Learning models, with one example each of Regression, Classification (Binary and Multi-Class), and Multi-Label Classification.

I found dozens, if not hundreds, of examples of each, but none that presented them all in a similar way, and they often descended into big maths formulas. I flitted and swapped between different styles and approaches, confused myself time after time, and so wanted something simple I could re-use. I also wanted to keep things as bare-bones as possible, using the lovely, widely available libraries and modules, with as little bespoke ‘my interpretation’ as possible.

As such, I did not want to get bogged down in EDA, Feature Engineering, and all that stuff that is hugely important (I would say more so than the model), and wanted to keep this as simple and ‘just a framework for models’ as possible. With that in mind, we will be using the wonderful make_regression, make_classification, and make_multilabel_classification from sklearn.datasets, mimicking the state your data will be in once you have done all your EDA and Feature Engineering and are ready for your first baseline model. That means we will not be doing any Label Encoding, addressing Imbalance, etc.

I also wanted to stay away from maths completely. I will explain why we are doing what we are doing, without symbols, formulas, and algorithms. This is not just some code to cut and paste, but rather a walk through some of the errors I faced along the way, resulting in (I hope) a useful set of functions and information.

I wrote this to give myself a starter notebook I can use for a wide array of purposes, and in doing so I hope it helps others. So here we go.

Prepare the Notebook

First, load the relevant modules. Basically just sklearn, torch, NumPy, and matplotlib.

from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression, make_classification, make_multilabel_classification
from sklearn.preprocessing import LabelEncoder, StandardScaler, MinMaxScaler
import torch
from torch.utils.data import Dataset, DataLoader
import torch.optim as torch_optim
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

Create Reusable PyTorch components

OK, the next section contains the functions we will use repeatedly throughout the article. I will outline what they do here, but any specifics on the loss functions and accuracy functions will be explained in detail as we progress.

Use PyTorch Datasets

As part of this, we will get familiar with, and use throughout, PyTorch Datasets. Why? Well, they offer a great way to put an iterator around our data and give us all the goodness of batching data cleanly, etc., so why not use them?

This is the bare minimum to establish one, and it’s all pretty self-explanatory.

As I’m sure you know, Deep Learning models love numbers, so all Categorical features should have been encoded by the time your data is ready to go. That means np.float32 for all the Features, and np.int64 for the Targets if they are integer class labels, otherwise np.float32 for those too.

What this also does is turn our lovely NumPy arrays into funky PyTorch Tensors in the process.

class MyCustomDataset(Dataset):
    def __init__(self, X, Y, scale=False):
        # Features are always float32; Targets are int64 if they are integer class labels, otherwise float32
        self.X = torch.from_numpy(X.astype(np.float32))
        y_dtype = np.int64 if Y.dtype == np.int64 else np.float32
        # Optionally squash the (regression) targets into the 0-1 range
        if scale: self.y = torch.from_numpy(MinMaxScaler().fit_transform(Y.reshape(-1,1)).astype(y_dtype))
        else:     self.y = torch.from_numpy(Y.astype(y_dtype))

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]
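
If you’re curious what those dtype conversions actually do, here’s a tiny standalone sketch (made-up shapes, not part of the framework) of NumPy arrays becoming PyTorch Tensors:

import numpy as np
import torch

X = np.random.rand(4, 10)    # features typically arrive from sklearn as float64
y = np.array([0, 2, 1, 0])   # integer class labels

X_t = torch.from_numpy(X.astype(np.float32))   # Linear layers expect float32 inputs
y_t = torch.from_numpy(y.astype(np.int64))     # F.cross_entropy expects int64 targets
print(X_t.dtype, y_t.dtype)  # torch.float32 torch.int64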

Create a simple PyTorch Model

Here we will create a fairly simple model, as this is not an article on the best type of models for specific problem types. What this is giving you is the structure of the class to build a PyTorch model, and you can alter/extend/swap out these models with anything as you see fit.

It starts with an input X shape of 10 (which will be the size of our data we will use in these examples), and can have a parameter passed to it to shape the final layer (y shape) which defaults to 1.

class BetterTabularModule(nn.Module):
    def __init__(self, out_features=1):
        super().__init__()
        # Three fully connected layers: 10 input features -> 200 -> 70 -> out_features
        self.lin1 = nn.Linear(10, 200)
        self.lin2 = nn.Linear(200, 70)
        self.lin3 = nn.Linear(70, out_features)
        # BatchNorm and Dropout to help the model generalise
        self.bn1 = nn.BatchNorm1d(10)
        self.bn2 = nn.BatchNorm1d(200)
        self.bn3 = nn.BatchNorm1d(70)
        self.drops = nn.Dropout(0.3)
    def forward(self, x):
        x = self.bn1(x)
        x = F.relu(self.lin1(x))
        x = self.drops(x)
        x = self.bn2(x)
        x = F.relu(self.lin2(x))
        x = self.drops(x)
        x = self.bn3(x)
        x = self.lin3(x)
        return x

Set up a simple Optimizer

We will stick to the Adam optimizer for all of these problems. Seems a pretty good general fit.

def get_optimizer(model, lr=0.001, wd=0.0):
    parameters = filter(lambda p: p.requires_grad, model.parameters())
    optim = torch_optim.Adam(parameters, lr=lr, weight_decay=wd)
    return optim

Now a simple Training Function, Training Loop and Evaluation Function

These follow the standard approach to using PyTorch, so let’s put them in functions and use them throughout. The only thing that will need to change between problem types is, yep you’ve guessed it, the Loss Function, which we will pass in as a parameter, along with the Accuracy Function.

def train_model(model, optim, train_dl, loss_func):
    # Ensure the model is in Training mode
    model.train()
    total = 0
    sum_loss = 0
    for x, y in train_dl:
        batch = y.shape[0]
        # Train the model for this batch worth of data
        logits = model(x)
        # Run the loss function. We will decide what this will be when we call our Training Loop
        loss = loss_func(logits, y)
        # The next 3 lines do all the PyTorch back propagation goodness
        optim.zero_grad()
        loss.backward()
        optim.step()
        # Keep a running check of our total number of samples in this epoch
        total += batch
        # And keep a running total of our loss
        sum_loss += batch*(loss.item())
    return sum_loss/total

def train_loop(model, epochs, loss_func, lr=0.1, wd=0.001):
    optim = get_optimizer(model, lr=lr, wd=wd)
    train_loss_list = []
    val_loss_list = []
    acc_list = []
    for i in range(epochs): 
        loss = train_model(model, optim, train_dl, loss_func)
        # After training this epoch, keep a list of progress of the loss of each epoch 
        train_loss_list.append(loss)
        val, acc = val_loss(model, valid_dl, loss_func)
        # Likewise for the validation loss and accuracy (if applicable)
        val_loss_list.append(val)
        acc_list.append(acc)
        if acc > 0: print("training loss: %.5f     valid loss: %.5f     accuracy: %.5f" % (loss, val, acc))
        else:       print("training loss: %.5f     valid loss: %.5f" % (loss, val))

    return train_loss_list, val_loss_list, acc_list

def val_loss(model, valid_dl, loss_func):
    # Put the model into evaluation mode, not training mode
    model.eval()
    total = 0
    sum_loss = 0
    correct = 0
    for x, y in valid_dl:
        current_batch_size = y.shape[0]
        logits = model(x)
        loss = loss_func(logits, y)
        sum_loss += current_batch_size*(loss.item())
        total += current_batch_size
        # All of the code above is the same, in essence, to Training, so see the comments there
        # However the functions to assess Accuracy change based on the type of problem we are doing.
        # Therefore the following lines will make more sense as we progress through the article.
        # Accuracy for Binary and Multi-Class Classification
        if loss_func == F.cross_entropy:
          # Find out which of the returned predictions is the loudest of them all, and that's our prediction(s)
          preds = logits.argmax(1)
          # Count how many of our predictions are right
          correct += (preds == y).float().sum().item()
        # Accuracy for Multi-Label Classification
        if loss_func == F.binary_cross_entropy_with_logits:
          # Push the outputs through a sigmoid, then whichever are higher than our test threshold (50%) are our predicted labels
          preds = logits.sigmoid()
          # Average the per-label hits for each sample, then add that up across the batch
          correct += ((preds > 0.5) == y.bool()).float().mean(dim=1).sum().item()
    return sum_loss/total, correct/total

Now a little function to view the results.

def view_results(train_loss_list, val_loss_list, acc_list):
  plt.figure()
  epochs = np.arange(0, len(train_loss_list))
  plt.plot(epochs-0.5, train_loss_list) # offset by half an epoch as Training calculated mid epoch but val calculated at end of epoch
  plt.plot(epochs, val_loss_list)
  plt.title('model loss')
  plt.ylabel('loss')
  plt.xlabel('epoch')
  plt.legend(['train', 'val'], loc = 'upper left')
  plt.show()

  if acc_list[0]:
    plt.figure()
    plt.plot(acc_list)
    plt.title('accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['val accuracy'], loc = 'upper left')
    plt.show()

Regression

OK so to start with let’s go for a Regression model. 1000 samples with 10 features, 8 of which are informative (that is, the number that are actually useful in the model/prediction) and let’s do an 80/20 train test split.

X, y = make_regression(n_samples=1000, n_features=10, n_informative=8, random_state=1972)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1972)

Now load them into our PyTorch Dataset and chunk them into 256 size batches in PyTorch DataLoaders.

train_ds = MyCustomDataset(X_train, y_train, scale=True)
valid_ds = MyCustomDataset(X_val, y_val, scale=True)
batch_size = 256
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=batch_size, shuffle=True)

We can take a look at these to see they are playing nicely, if we wish.

train_features, train_labels = next(iter(train_dl))
train_features.shape, train_labels.shape
(torch.Size([256, 10]), torch.Size([256, 1]))
train_features[0], train_labels[0]
(tensor([ 0.8939, -1.0572, -2.1115,  0.9993, -0.4022, -0.7168, -0.1831,  0.3448, -0.6449, -0.4287]), tensor([0.5383]))

Now Train

Regression is our simplest problem type here.

Let’s build our model. We can stick to the default of 1 for the number of output targets.

Our loss function here is MSE loss. (MSE stands for Mean Squared Error, and it is basically just the average of the squared differences between our predictions and the y targets, i.e. how ‘wrong’ we are.) This is a good starting place for Regression problems, so let’s train with that.

We cannot calculate accuracy for a regression model; the performance of a regression model is instead reported as the error in its predictions, which is exactly what the loss is already giving us.
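
Purely as an illustration (this little snippet is standalone, not part of the framework), F.mse_loss really is just the mean of the squared differences:

import torch
import torch.nn.functional as F

preds   = torch.tensor([0.5, 0.8, 0.2])
targets = torch.tensor([0.4, 1.0, 0.0])
# mean of (0.01 + 0.04 + 0.04) = 0.03
print(F.mse_loss(preds, targets))  # tensor(0.0300)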

model = BetterTabularModule()
train_loss_list, val_loss_list, acc_list = train_loop(model, epochs=10, loss_func=F.mse_loss)
view_results(train_loss_list, val_loss_list, acc_list)

Pretty good, eh ?

Classification – Single Class

OK now let’s go for a Classification model. 1000 samples with 10 features, 8 of which are informative (that is, the number that are actually useful in the model/prediction), and again let’s do an 80/20 train test split.

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10, n_informative=8, n_redundant=0, random_state=1972)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1972)

Now load them into our PyTorch Dataset and chunk them into 256 size batches in PyTorch DataLoaders.

train_ds = MyCustomDataset(X_train, y_train)
valid_ds = MyCustomDataset(X_val, y_val)
batch_size = 256
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=batch_size, shuffle=True)

Now Train

Binary Classification (for example, yes/no) is slightly trickier than Regression, but not too much.

Let’s build our model. We need to pass in 2 for the number of output targets. The first output represents (once passed through softmax) the probability the answer is the first option; the second is the probability it’s the second option.

I know what you’re thinking. Why not just have one output and, depending how close to zero or one it is, give our answer? Well yes, we could do that, but this way I think is better, for reasons that will become clear.

Our Loss Function here is Cross Entropy loss (F.cross_entropy), and fundamentally what it does is return a loss based on the probability the model assigned to the correct answer. It does this by internally combining log_softmax (which turns the whole row of outputs into log-probabilities that, once exponentiated, sum to one) and nll_loss (which picks out the loss for the appropriate target label) into a single function.
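
Purely to illustrate that relationship (a standalone snippet with made-up numbers, not part of the framework), you can check it yourself with one row of outputs:

import torch
import torch.nn.functional as F

# One row of raw model outputs (logits) for a 2-class problem, and the true label
logits = torch.tensor([[2.0, 0.5]])
target = torch.tensor([0])

# cross_entropy is just log_softmax followed by nll_loss
print(F.cross_entropy(logits, target))                   # tensor(0.2014)
print(F.nll_loss(F.log_softmax(logits, dim=1), target))  # tensor(0.2014)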

For Classification we really just care about which of the returned predictions is the loudest of them all, and that’s our prediction. We can use logits.argmax(1) to basically say which of the items in each row of predictions from our model is the maximum.

Then, to see if our predictions are right, we compare all the predictions to the actual targets (y) and count how many were correct: (preds == y).float().sum().item(), dividing by the total number of samples at the end. This is a good starting place for Classification problems, so let’s Train with that Loss Function and Validate with that Accuracy Function.
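
Here’s a quick standalone sketch of that accuracy check (made-up numbers, not part of the framework):

import torch

# A small batch of model outputs (logits) for a 2-class problem, plus the true labels
logits = torch.tensor([[ 2.0, 0.5],
                       [-1.0, 1.5],
                       [ 0.2, 0.1]])
y = torch.tensor([0, 1, 1])

preds = logits.argmax(1)                     # tensor([0, 1, 0])
correct = (preds == y).float().sum().item()  # 2.0
print(correct / len(y))                      # 0.666..., i.e. 2 of the 3 predictions were right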

model = BetterTabularModule(2)
train_loss_list, val_loss_list, acc_list = train_loop(model, epochs=100, loss_func=F.cross_entropy, lr=0.001, wd=0.001)
view_results(train_loss_list, val_loss_list, acc_list)

Not bad at all !!

Classification – Multi Class

OK so now let’s look at a Multi Class Classification model. 1000 samples with 10 features, 8 of which are informative (that is, the number that are actually useful in the model/prediction), again with an 80/20 train test split.

X, y = make_classification(n_samples=1000, n_classes=3, n_features=10, n_informative=8, n_redundant=0, random_state=1972)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1972)

Now load them into our PyTorch Dataset and chunk them into 256 size batches in PyTorch DataLoaders.

train_ds = MyCustomDataset(X_train, y_train)
valid_ds = MyCustomDataset(X_val, y_val)
batch_size = 256
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=batch_size, shuffle=True)

Now Train

Multi-Class Classification (for example, dog/cat/horse) is much harder than Binary Classification.

Joke!!! It’s not at all, because of what we did with Binary Classification. Think about it, we built our model with two output targets. The first was the probability the answer is the first option. The second was the probability it was the second option.

So do we really just need to build the model with more output targets, and leave our Loss Function and Accuracy Function exactly the same as for Binary Classification?

Yep!!

Told you it was easy.

model = BetterTabularModule(3) # Now we want 3 outputs
train_loss_list, val_loss_list, acc_list = train_loop(model, epochs=100, loss_func=F.cross_entropy, lr=0.001, wd=0.001)
view_results(train_loss_list, val_loss_list, acc_list)

Classification – Multi-label

OK, finally, let’s look at a Multi-Label Classification model, where each sample can have more than one label at once. Again, 1000 samples with 10 features, this time with 3 possible labels, and an 80/20 train test split.

X, y = make_multilabel_classification(n_samples=1000, n_features=10, n_classes=3, n_labels=1, allow_unlabeled=False, random_state=1972)
# This returns mimicked one-hot (multi-hot) encoded labels, but our Accuracy Function needs them as Floats, not Integers
y = y.astype(np.float32)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1972)

Now load them into our PyTorch Dataset and chunk them into 256 size batches in PyTorch DataLoaders.

train_ds = MyCustomDataset(X_train, y_train)
valid_ds = MyCustomDataset(X_val, y_val)
batch_size = 256
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=batch_size, shuffle=True)

Now Train

Our loss function here is F.binary_cross_entropy_with_logits, which simply applies a sigmoid before the BCE loss. Because more than one activation can be the loudest (it’s multi-label, after all), for accuracy we also push the outputs through a sigmoid ourselves and mark each label as correct or not based on a threshold (50%), i.e.: correct += ((preds > 0.5) == y.bool()).float().mean(dim=1).sum().item()
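
If you want to convince yourself of that, here’s a standalone sketch with made-up numbers (not part of the framework):

import torch
import torch.nn.functional as F

# One sample's raw outputs (logits) for 3 labels, and its multi-hot target
logits = torch.tensor([[1.2, -0.7, 2.3]])
y      = torch.tensor([[1.0,  0.0, 1.0]])

# binary_cross_entropy_with_logits is just sigmoid followed by binary_cross_entropy
print(F.binary_cross_entropy_with_logits(logits, y))
print(F.binary_cross_entropy(torch.sigmoid(logits), y))  # the same value

# For accuracy we threshold the sigmoid-ed outputs at 50%
preds = torch.sigmoid(logits) > 0.5
print(preds)  # tensor([[ True, False,  True]]) -> all 3 labels correct for this sample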

model = BetterTabularModule(3)
train_loss_list, val_loss_list, acc_list = train_loop(model, epochs=50, loss_func=F.binary_cross_entropy_with_logits, lr=0.001, wd=0.001)
view_results(train_loss_list, val_loss_list, acc_list)

Again, I think that trains and validates really well on the sample data.

Lessons along the way

While putting this together I hit various bugs and issues, and I have collated some below. The reason is that when you (hopefully) take this and create your own models, alter your Loss Functions or Accuracy Functions, or simply use your own data, you may hit the same or similar issues.

Target 2 is out of bounds.

  • This was because I was not changing the default of one output target when I created my model, so as soon as I tried a Classification model I was getting this error. By passing in 2 to the function that builds my model I corrected my mistake and made sure my model had two possible outcome targets

My training looked very static and nothing much was happening

  • I had mistakenly not reset my model, so the training was ‘on top’ of the training I had already done, and so it looked like it wasn’t making much difference at all

RuntimeError: The size of tensor a (256) must match the size of tensor b (3) at non-singleton dimension 1

  • I was using argmax instead of sigmoid in the Multi-Label Classification, therefore not getting a nice probability between zero and one for every target, but rather picking the loudest again as I had done for standard Classification

Conclusion

And there you have it: examples of different problem types and their approaches in a simple PyTorch use-case, which you can hopefully expand upon as your problems require.

I hope this has been helpful.

