How to train your neural net

Pytorch [Tabular] — Regression

This blog post takes you through an implementation of regression on tabular data using PyTorch.

Akshaj Verma

Published in Towards Data Science

8 min read · Mar 28, 2020

We will use the red wine quality dataset available on Kaggle. This dataset has 12 columns where the first 11 are the features and the last column is the target column. The data set has 1599 rows.

Import Libraries

We’re using tqdm to enable progress bars for training and testing loops.

import numpy as np
import pandas as pd
import seaborn as sns
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

Read Data

df = pd.read_csv("data/tabular/classification/winequality-red.csv")
df.head()

EDA and Preprocessing

First off, we plot the output column to observe the class distribution. There’s a lot of imbalance here: classes 3, 4, and 8 have very few samples.

We will not treat the output variable as a set of classes here because we’re performing regression. We will convert the output column, which is all integers, to float values.

sns.countplot(x = 'quality', data=df)

Create Input and Output Data

In order to split our data into train, validation, and test sets, we need to separate out our inputs and outputs.

Input X is all but the last column. Output y is the last column.

X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]

Train — Validation — Test

To create the train-val-test split, we’ll use train_test_split() from Sklearn.

First, we’ll split our data into train+val and test sets. Then, we'll further split our train+val set to create our train and val sets.

Because there’s a “class” imbalance, we want every output class to appear in the same proportion in our train, validation, and test sets.

To do that, we use the stratify option of train_test_split().

Remember that stratification only works with discrete classes, not continuous numbers. So, in general, we can bin our numbers into classes using quartiles, deciles, a histogram (np.histogram()), and so on. You would then create a new dataframe which contains the output and its "class", where the "class" is obtained using one of these methods.
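As a rough sketch of that binning idea, here is one way to stratify on quartiles of a continuous target. The data is synthetic and the names (demo_X, demo_y, demo_bins) are made up for illustration; they are not part of this post's pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic continuous target, purely for illustration (not the wine data).
rng = np.random.default_rng(0)
demo_X = pd.DataFrame({"feature": rng.normal(size=200)})
demo_y = pd.Series(rng.normal(loc=5.0, scale=1.0, size=200), name="target")

# Bin the continuous target into quartiles; the bin labels act as "classes".
demo_bins = pd.qcut(demo_y, q=4, labels=False)

# Stratify on the bins, but keep the original continuous target as y.
X_tr, X_te, y_tr, y_te = train_test_split(
    demo_X, demo_y, test_size=0.2, stratify=demo_bins, random_state=0
)

# Each quartile should make up a quarter of both splits.
print(demo_bins.loc[y_tr.index].value_counts(normalize=True).sort_index())
print(demo_bins.loc[y_te.index].value_counts(normalize=True).sort_index())
```

The bins are only used to guide the split. The model would still be trained on the continuous values.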

In our case, let’s use the numbers as is because they are already like classes. After we split our data, we can convert the output to float (because regression).

# Train - Test
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=69)

# Split train into train-val
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.1, stratify=y_trainval, random_state=21)

Normalize Input

Neural networks generally train better when all input features lie on a similar, small scale, such as the range (0,1). There’s a ton of material available online on why we do this.

To scale our values, we’ll use the MinMaxScaler() from Sklearn. The MinMaxScaler transforms features by scaling each feature to a given range which is (0,1) in our case.

x_scaled = (x - min(x)) / (max(x) - min(x))

Notice that we use .fit_transform() on X_train while we use .transform() on X_val and X_test.

We do this because we want to scale the validation and test sets with the parameters learned from the train set, which avoids data leakage. .fit_transform() calculates the scaling values and applies them, while .transform() only applies the already calculated values.

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

X_train, y_train = np.array(X_train), np.array(y_train)
X_val, y_val = np.array(X_val), np.array(y_val)
X_test, y_test = np.array(X_test), np.array(y_test)
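To make the fit/transform distinction concrete, here is a tiny sketch on invented one-feature data (toy_train and toy_test are hypothetical, not the wine features). It shows that .transform() reuses the train minimum and maximum, so held-out values can fall outside (0,1).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy one-feature data, invented for illustration.
toy_train = np.array([[2.0], [4.0], [6.0]])
toy_test = np.array([[1.0], [5.0], [8.0]])

toy_scaler = MinMaxScaler()
toy_train_scaled = toy_scaler.fit_transform(toy_train)  # learns min=2, max=6
toy_test_scaled = toy_scaler.transform(toy_test)        # reuses min=2, max=6

# The same result by hand, using the scaling formula with train statistics.
manual = (toy_test - toy_train.min()) / (toy_train.max() - toy_train.min())

print(toy_train_scaled.ravel())  # values: 0.0, 0.5, 1.0
print(toy_test_scaled.ravel())   # values: -0.25, 0.75, 1.5
```

Values slightly outside the range on validation or test data are expected and are not a bug. They simply reflect that the scaler never saw those rows.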

Visualize Class Distribution in Train, Val, and Test

Once we’ve split our data into train, validation, and test sets, let’s make sure the distribution of classes is equal in all three sets.

To do that, let’s create a function called get_class_distribution(). This function takes as input the object y, i.e. y_train, y_val, or y_test. Inside the function, we initialize a dictionary which contains the output classes as keys and their counts as values. The counts are all initialized to 0.

We then loop through our y object and update our dictionary.

def get_class_distribution(obj):
    count_dict = {
        "rating_3": 0,
        "rating_4": 0,
        "rating_5": 0,
        "rating_6": 0,
        "rating_7": 0,
        "rating_8": 0,
    }
    
    for i in obj:
        if i == 3: 
            count_dict['rating_3'] += 1
        elif i == 4: 
            count_dict['rating_4'] += 1
        elif i == 5: 
            count_dict['rating_5'] += 1
        elif i == 6: 
            count_dict['rating_6'] += 1
        elif i == 7: 
            count_dict['rating_7'] += 1  
        elif i == 8: 
            count_dict['rating_8'] += 1              
        else:
            print("Check classes.")
            
    return count_dict
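If you prefer something shorter, the same counts can probably be obtained with pandas. This is only an alternative sketch on made-up ratings (toy_y is hypothetical), not the function used in the rest of this post.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings, standing in for y_train.
toy_y = np.array([5, 6, 5, 7, 3, 6, 5])

toy_counts = (
    pd.Series(toy_y)
    .value_counts()
    .reindex(range(3, 9), fill_value=0)   # keep ratings 3..8, even if absent
    .rename(lambda r: f"rating_{r}")      # match the "rating_N" key style
    .to_dict()
)
print(toy_counts)
```

Unlike the explicit loop, this version does not print a warning for unexpected ratings. It silently drops anything outside 3 to 8.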

Once we have the dictionary of counts, we use the Seaborn library to plot the bar charts.

To make the plot, we first convert our dictionary to a dataframe using pd.DataFrame.from_dict([get_class_distribution(y_train)]).

Subsequently, we .melt() our dataframe to convert it into the long format and finally use sns.barplot() to build the plots.

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(25,7))

# Train
sns.barplot(data = pd.DataFrame.from_dict([get_class_distribution(y_train)]).melt(), x = "variable", y="value", hue="variable",  ax=axes[0]).set_title('Class Distribution in Train Set')

# Val
sns.barplot(data = pd.DataFrame.from_dict([get_class_distribution(y_val)]).melt(), x = "variable", y="value", hue="variable",  ax=axes[1]).set_title('Class Distribution in Val Set')

# Test
sns.barplot(data = pd.DataFrame.from_dict([get_class_distribution(y_test)]).melt(), x = "variable", y="value", hue="variable",  ax=axes[2]).set_title('Class Distribution in Test Set')

Output distribution after train-val-test split [Image [4]]

Convert Output Variable to `Float`

y_train, y_test, y_val = y_train.astype(float), y_test.astype(float), y_val.astype(float)

Neural Network

Initialize Dataset

class RegressionDataset(Dataset):
    
    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data
        
    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]
        
    def __len__ (self):
        return len(self.X_data)

train_dataset = RegressionDataset(torch.from_numpy(X_train).float(), torch.from_numpy(y_train).float())
val_dataset = RegressionDataset(torch.from_numpy(X_val).float(), torch.from_numpy(y_val).float())
test_dataset = RegressionDataset(torch.from_numpy(X_test).float(), torch.from_numpy(y_test).float())

Model Params

EPOCHS = 150
BATCH_SIZE = 64
LEARNING_RATE = 0.001

NUM_FEATURES = len(X.columns)

Initialize Dataloader

train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=1)
test_loader = DataLoader(dataset=test_dataset, batch_size=1)
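To get a feel for what a loader yields, here is a small sketch on random data. It uses PyTorch's built-in TensorDataset as a stand-in for our RegressionDataset, and the names toy_X, toy_y, and toy_loader are invented for this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data: 10 samples, 11 features (same width as the wine data).
toy_X = torch.rand(10, 11)
toy_y = torch.rand(10)

# TensorDataset pairs each row of toy_X with the matching entry of toy_y.
toy_loader = DataLoader(TensorDataset(toy_X, toy_y), batch_size=4, shuffle=True)

for xb, yb in toy_loader:
    print(xb.shape, yb.shape)

# 10 samples with batch_size=4 give batches of 4, 4, and 2 samples.
print(len(toy_loader))
```

Note that len() of a loader is the number of batches, not the number of samples. That is the number we divide by later when averaging the loss per epoch.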

Define Neural Network Architecture

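We have a simple feedforward neural net here, with three hidden layers followed by a linear output layer. We use ReLU as the activation after every hidden layer. The output layer has no activation because we’re predicting a continuous value.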

class MultipleRegression(nn.Module):
    def __init__(self, num_features):
        super(MultipleRegression, self).__init__()
        
        self.layer_1 = nn.Linear(num_features, 16)
        self.layer_2 = nn.Linear(16, 32)
        self.layer_3 = nn.Linear(32, 16)
        self.layer_out = nn.Linear(16, 1)
        
        self.relu = nn.ReLU()

    def forward(self, inputs):
        x = self.relu(self.layer_1(inputs))
        x = self.relu(self.layer_2(x))
        x = self.relu(self.layer_3(x))
        x = self.layer_out(x)
        return (x)

    def predict(self, test_inputs):
        x = self.relu(self.layer_1(test_inputs))
        x = self.relu(self.layer_2(x))
        x = self.relu(self.layer_3(x))
        x = self.layer_out(x)
        return (x)
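A quick sanity check is to push a random batch through the network and look at the output shape. The sketch below rebuilds the same 11 → 16 → 32 → 16 → 1 stack with nn.Sequential so that it runs on its own. It is an equivalent stand-in, not the MultipleRegression class itself, and toy_model and toy_batch are made-up names.

```python
import torch
import torch.nn as nn

# The same layer sizes as above, written with nn.Sequential for brevity.
toy_model = nn.Sequential(
    nn.Linear(11, 16), nn.ReLU(),
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),            # no activation: raw regression output
)

toy_batch = torch.rand(64, 11)   # one batch of 64 samples, 11 features each
toy_out = toy_model(toy_batch)
print(toy_out.shape)             # torch.Size([64, 1])

# Weights plus biases: 192 + 544 + 528 + 17 = 1281 trainable parameters.
n_params = sum(p.numel() for p in toy_model.parameters())
print(n_params)
```

The output has shape (batch_size, 1), which matters later when we compare predictions with targets in the loss function.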

Check for GPU

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

###################### OUTPUT ######################
cuda:0

Initialize the model, optimizer, and loss function. Transfer the model to GPU.

We are using the Mean Squared Error loss.

model = MultipleRegression(NUM_FEATURES)
model.to(device)
print(model)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

###################### OUTPUT ######################
MultipleRegression(
  (layer_1): Linear(in_features=11, out_features=16, bias=True)
  (layer_2): Linear(in_features=16, out_features=32, bias=True)
  (layer_3): Linear(in_features=32, out_features=16, bias=True)
  (layer_out): Linear(in_features=16, out_features=1, bias=True)
  (relu): ReLU()
)

Train Model

Before we start our training, let’s define a dictionary which will store the loss/epoch for both train and validation sets.

loss_stats = {
    'train': [],
    "val": []
}

Let the training begin.

print("Begin training.")

for e in tqdm(range(1, EPOCHS+1)):
    
    # TRAINING
    train_epoch_loss = 0
    model.train()
    for X_train_batch, y_train_batch in train_loader:
        X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device)
        optimizer.zero_grad()
        
        y_train_pred = model(X_train_batch)
        
        train_loss = criterion(y_train_pred, y_train_batch.unsqueeze(1))
        
        train_loss.backward()
        optimizer.step()
        
        train_epoch_loss += train_loss.item()
        
        
    # VALIDATION    
    with torch.no_grad():
        
        val_epoch_loss = 0
        
        model.eval()
        for X_val_batch, y_val_batch in val_loader:
            X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)
            
            y_val_pred = model(X_val_batch)
                        
            val_loss = criterion(y_val_pred, y_val_batch.unsqueeze(1))
            
            val_epoch_loss += val_loss.item()

    loss_stats['train'].append(train_epoch_loss/len(train_loader))
    loss_stats['val'].append(val_epoch_loss/len(val_loader))                              
    
    print(f'Epoch {e+0:03}: | Train Loss: {train_epoch_loss/len(train_loader):.5f} | Val Loss: {val_epoch_loss/len(val_loader):.5f}')

###################### OUTPUT ######################
Epoch 001: | Train Loss: 31.22514 | Val Loss: 30.50931
Epoch 002: | Train Loss: 30.02529 | Val Loss: 28.97327
.
.
.
Epoch 149: | Train Loss: 0.42277 | Val Loss: 0.37748
Epoch 150: | Train Loss: 0.42012 | Val Loss: 0.37028

You can see we’ve put a model.train() before the loop over batches. model.train() tells PyTorch that you’re in training mode.

Well, why do we need to do that? If you’re using layers such as Dropout or BatchNorm, which behave differently during training and evaluation (for example, dropout is not applied during evaluation), you need to tell PyTorch to act accordingly.
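Our network has no Dropout or BatchNorm layers, so the mode switch has no visible effect here. To see what it does in general, here is a small standalone sketch with a lone nn.Dropout layer (drop and ones are names invented for this example).

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
ones = torch.ones(1000)

drop.train()                 # training mode: randomly zero elements, rescale the rest
train_out = drop(ones)

drop.eval()                  # evaluation mode: dropout passes the input through
eval_out = drop(ones)

print((train_out == 0).float().mean())  # roughly 0.5
print(train_out.unique())               # surviving elements are scaled to 2.0
print(torch.equal(eval_out, ones))      # True
```

In training mode, the surviving elements are scaled by 1 / (1 - p) so that the expected sum stays the same, which is why evaluation mode can simply do nothing.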

Similarly, we’ll call model.eval() when we test our model. We’ll see that below.

Back to training; we start a for-loop. At the top of this for-loop, we initialize our loss per epoch to 0. After every epoch, we’ll print out the loss and reset it back to 0.

Then we have another for-loop. This for-loop is used to get our data in batches from the train_loader.

We do optimizer.zero_grad() before we make any predictions. Since the backward() function accumulates gradients, we need to reset them manually for every mini-batch.
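The accumulation is easy to see on a single scalar parameter. This is a minimal sketch with a hypothetical tensor w, outside of any optimizer.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
grads = []

(3 * w).backward()           # d(3w)/dw = 3
grads.append(w.grad.item())

(3 * w).backward()           # without resetting, the new gradient is added on top
grads.append(w.grad.item())

w.grad.zero_()               # manual reset, in the spirit of optimizer.zero_grad()
(3 * w).backward()
grads.append(w.grad.item())

print(grads)                 # [3.0, 6.0, 3.0]
```

Recent PyTorch versions may reset gradients to None rather than to 0 inside optimizer.zero_grad(), but the effect on the next backward pass is the same: it starts from a clean slate.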

From our defined model, we then obtain a prediction, compute the loss for that mini-batch, perform back-propagation using train_loss.backward(), and update the weights using optimizer.step().
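One detail in the loss call deserves a note: the model outputs a tensor of shape (batch_size, 1) while the targets have shape (batch_size,), which is why the training loop uses .unsqueeze(1). The sketch below, on made-up tensors pred and target, shows what broadcasting would do if the shapes were left mismatched.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), like the model output
target = torch.tensor([1.0, 2.0, 3.0])       # shape (3,), like y_train_batch

toy_criterion = nn.MSELoss()

# Matching shapes: (3, 1) vs (3, 1). The predictions are perfect, so the loss is 0.
good = toy_criterion(pred, target.unsqueeze(1))

# Mismatched shapes: (3, 1) minus (3,) broadcasts to a (3, 3) matrix of all
# pairwise differences, so the mean squared value is no longer 0.
diff = pred - target
bad = (diff ** 2).mean()

print(good.item())   # 0.0
print(diff.shape)    # torch.Size([3, 3])
print(bad.item())    # about 1.3333
```

PyTorch typically only warns about the size mismatch instead of raising an error, so it is worth checking shapes explicitly.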

Finally, we add up the losses of all the mini-batches and divide the sum by the number of mini-batches, i.e. the length of train_loader, to obtain the average loss per epoch.

The procedure we follow for validation is exactly the same as for training, except that we wrap it in torch.no_grad() and do not perform any back-propagation. torch.no_grad() tells PyTorch not to track gradients, which reduces memory usage and speeds up computation.
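Here is a small sketch of what torch.no_grad() changes, using a throwaway nn.Linear layer (layer and x are hypothetical names, unrelated to our model).

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
x = torch.rand(4, 3)

# Normal forward pass: the output is attached to the autograd graph.
out_tracked = layer(x)
print(out_tracked.requires_grad)    # True

# Inside no_grad, no graph is built, so nothing can be back-propagated.
with torch.no_grad():
    out_untracked = layer(x)
print(out_untracked.requires_grad)  # False

# The numbers themselves are identical; only the bookkeeping differs.
print(torch.equal(out_tracked.detach(), out_untracked))  # True
```

Note that torch.no_grad() and model.eval() do different jobs: the first turns off gradient tracking, the second switches layer behaviour. That is why the validation and test loops use both.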

Visualize Loss

To plot the loss line plots, we again create a dataframe from the `loss_stats` dictionary.

train_val_loss_df = pd.DataFrame.from_dict(loss_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})

plt.figure(figsize=(15,8))
sns.lineplot(data=train_val_loss_df, x = "epochs", y="value", hue="variable").set_title('Train-Val Loss/Epoch')

Test Model

After training is done, we need to test how our model fared. Note that we call model.eval() before running our testing code. To tell PyTorch that we do not want to track gradients during inference, we use torch.no_grad(), just like we did for the validation loop above.

y_pred_list = []

with torch.no_grad():
    model.eval()
    for X_batch, _ in test_loader:
        X_batch = X_batch.to(device)
        y_test_pred = model(X_batch)
        y_pred_list.append(y_test_pred.cpu().numpy())

y_pred_list = [a.squeeze().tolist() for a in y_pred_list]

Let’s check the MSE and R-squared metrics.

mse = mean_squared_error(y_test, y_pred_list)
r_square = r2_score(y_test, y_pred_list)

print("Mean Squared Error :",mse)
print("R^2 :",r_square)

###################### OUTPUT ######################
Mean Squared Error : 0.40861496703609534
R^2 : 0.36675687655886924
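To see what these two numbers measure, here is a small sketch that computes both metrics by hand on made-up values and compares them with the Sklearn functions. The arrays y_true and y_hat are invented and are not our test predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up targets and predictions, for illustration only.
y_true = np.array([5.0, 6.0, 5.0, 7.0])
y_hat = np.array([5.5, 5.5, 5.0, 6.0])

# MSE: the mean of the squared residuals.
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2: 1 minus (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))   # both 0.375
print(r2_manual, r2_score(y_true, y_hat))              # both about 0.4545
```

An R^2 of 0 corresponds to always predicting the mean of the targets, so the value of roughly 0.37 above means our model explains about a third of the variance in wine quality on the test set.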

Thank you for reading. Suggestions and constructive criticism are welcome. :)

This blog post is part of the column “How to train your Neural Net”.
