How to train your neural net
Pytorch [Tabular] — Regression
This blog post takes you through an implementation of regression on tabular data using PyTorch.
We will use the red wine quality dataset available on Kaggle. This dataset has 12 columns where the first 11 are the features and the last column is the target column. The data set has 1599 rows.
Import Libraries
We’re using tqdm to enable progress bars for training and testing loops.
import numpy as np
import pandas as pd
import seaborn as sns
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
Read Data
df = pd.read_csv("data/tabular/classification/winequality-red.csv")
df.head()
EDA and Preprocessing
First off, we plot the count of each output value to observe the class distribution. There’s a lot of imbalance here: classes 3, 4, and 8 have very few samples.
We will not treat the output values as classes here because we’re performing regression. We will convert the output column, which is all integers, to float values.
sns.countplot(x = 'quality', data=df)
Create Input and Output Data
In order to split our data into train, validation, and test sets, we need to separate out our inputs and outputs.
Input X is all but the last column. Output y is the last column.
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
Train — Validation — Test
To create the train-val-test split, we’ll use train_test_split() from Sklearn.
First, we’ll split our data into train+val and test sets. Then, we'll further split our train+val set to create our train and val sets.
Because there’s a “class” imbalance, we want every output class to appear in similar proportions in our train, validation, and test sets.
To do that, we use the stratify option of the train_test_split() function.
Remember that stratification only works with classes, not continuous numbers. So, in general, we can bin our numbers into classes using quartiles, deciles, a histogram (np.histogram()), and so on. You would then create a new dataframe which contains the output and its "class", where the "class" is obtained using one of the above-mentioned methods.
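As a rough illustration of the binning idea, the sketch below assigns each continuous value to a quartile bin using only the Python standard library. The helper name bin_into_quantiles and the sample values are made up for this example; the resulting labels are the kind of "class" column that could be passed to stratify.

```python
import bisect
import statistics


def bin_into_quantiles(values, n_bins=4):
    """Return a bin label (0 .. n_bins - 1) for every value.

    Hypothetical helper for illustration, not part of the original post.
    """
    # statistics.quantiles returns the n_bins - 1 cut points.
    cuts = statistics.quantiles(values, n=n_bins)
    # bisect_right counts how many cut points are <= the value,
    # which is the index of the bin the value falls into.
    return [bisect.bisect_right(cuts, v) for v in values]


# Made-up continuous targets, for illustration only.
targets = [0.5, 1.2, 1.9, 2.4, 3.1, 3.8, 4.6, 5.0, 5.7, 6.3, 7.2, 8.8]
labels = bin_into_quantiles(targets)
print(labels)
```

Using deciles instead of quartiles would only require n_bins=10, provided there are enough samples in every bin for the split.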
In our case, let’s use the numbers as is because they are already like classes. After we split our data, we can convert the output to float (because regression).
# Train - Test
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=69)

# Split train into train-val
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.1, stratify=y_trainval, random_state=21)
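To see what share of the data each set ends up with, the two test_size values used above can be multiplied out. This is only back-of-the-envelope arithmetic; the exact row counts depend on how train_test_split() rounds.

```python
test_frac = 0.2                      # first split: held out for testing
trainval_frac = 1 - test_frac        # what remains for train + val

val_frac = trainval_frac * 0.1       # second split: 10% of train+val
train_frac = trainval_frac * 0.9     # the rest is used for training

print(f"train: {train_frac:.2f}, val: {val_frac:.2f}, test: {test_frac:.2f}")
# train: 0.72, val: 0.08, test: 0.20
```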
Normalize Input
Neural networks generally train better when the input features are on a similar scale, for example within the range (0,1). There’s a ton of material available online on why we need to do it.
To scale our values, we’ll use the MinMaxScaler() from Sklearn. The MinMaxScaler transforms features by scaling each feature to a given range, which is (0,1) in our case.
x_scaled = (x - min(x)) / (max(x) - min(x))
Notice that we use .fit_transform() on X_train while we use .transform() on X_val and X_test.
We do this because we want to scale the validation and test sets with the same parameters as the train set to avoid data leakage. .fit_transform() calculates the scaling values and applies them, while .transform() only applies the already calculated values.
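The difference between the two calls can be sketched by hand for a single feature. The function names and the numbers below are made up for illustration; this is not how MinMaxScaler is implemented internally, only the same arithmetic as the formula above.

```python
def fit_min_max(train_values):
    """Learn the scaling parameters from the training data only."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Scale values with previously learned parameters."""
    return [(v - lo) / (hi - lo) for v in values]


train_feature = [2.0, 4.0, 6.0, 10.0]   # made-up numbers
val_feature = [1.0, 12.0]               # values outside the train range

lo, hi = fit_min_max(train_feature)                    # the "fit" step
train_scaled = apply_min_max(train_feature, lo, hi)
val_scaled = apply_min_max(val_feature, lo, hi)        # "transform" only

print(train_scaled)  # [0.0, 0.25, 0.5, 1.0]
print(val_scaled)    # [-0.125, 1.25]
```

Note that validation or test values can land slightly outside (0,1) when they fall outside the range seen during training. That is expected, and it is the price of not letting those sets influence the scaling parameters.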
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

X_train, y_train = np.array(X_train), np.array(y_train)
X_val, y_val = np.array(X_val), np.array(y_val)
X_test, y_test = np.array(X_test), np.array(y_test)
Visualize Class Distribution in Train, Val, and Test
Once we’ve split our data into train, validation, and test sets, let’s make sure the distribution of classes is similar in all three sets.
To do that, let’s create a function called get_class_distribution(). This function takes as input the object y, i.e. y_train, y_val, or y_test. Inside the function, we initialize a dictionary which contains the output classes as keys and their counts as values. The counts are all initialized to 0.
We then loop through our y object and update our dictionary.
def get_class_distribution(obj):
    count_dict = {
        "rating_3": 0,
        "rating_4": 0,
        "rating_5": 0,
        "rating_6": 0,
        "rating_7": 0,
        "rating_8": 0,
    }

    for i in obj:
        if i == 3:
            count_dict['rating_3'] += 1
        elif i == 4:
            count_dict['rating_4'] += 1
        elif i == 5:
            count_dict['rating_5'] += 1
        elif i == 6:
            count_dict['rating_6'] += 1
        elif i == 7:
            count_dict['rating_7'] += 1
        elif i == 8:
            count_dict['rating_8'] += 1
        else:
            print("Check classes.")

    return count_dict
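The same counting can be written more compactly with collections.Counter. The sketch below is an alternative, not the function used in the rest of this post; the name get_class_distribution_counter is made up, it assumes the ratings are whole numbers from 3 to 8, and unlike the version above it silently ignores any other value instead of printing a warning.

```python
from collections import Counter


def get_class_distribution_counter(obj):
    """Count how often each rating from 3 to 8 occurs in obj."""
    # int() lets the function accept both integer and float ratings.
    counts = Counter(int(v) for v in obj)
    # Build the same "rating_<n>" keys as the dictionary above,
    # using 0 for ratings that never occur.
    return {f"rating_{k}": counts.get(k, 0) for k in range(3, 9)}


example = get_class_distribution_counter([5, 5, 6, 3, 8.0])
print(example)
```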
Once we have the dictionary of counts, we use the Seaborn library to plot the bar charts.
To make the plot, we first convert our dictionary to a dataframe using pd.DataFrame.from_dict([get_class_distribution(y_train)]).
Subsequently, we use .melt() to convert our dataframe into the long format and finally use sns.barplot() to build the plots.
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(25,7))

# Train
sns.barplot(data = pd.DataFrame.from_dict([get_class_distribution(y_train)]).melt(), x = "variable", y="value", hue="variable", ax=axes[0]).set_title('Class Distribution in Train Set')

# Val
sns.barplot(data = pd.DataFrame.from_dict([get_class_distribution(y_val)]).melt(), x = "variable", y="value", hue="variable", ax=axes[1]).set_title('Class Distribution in Val Set')

# Test
sns.barplot(data = pd.DataFrame.from_dict([get_class_distribution(y_test)]).melt(), x = "variable", y="value", hue="variable", ax=axes[2]).set_title('Class Distribution in Test Set')
Convert Output Variable to Float
y_train, y_test, y_val = y_train.astype(float), y_test.astype(float), y_val.astype(float)
Neural Network
Initialize Dataset
class RegressionDataset(Dataset):

    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data

    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]

    def __len__(self):
        return len(self.X_data)

train_dataset = RegressionDataset(torch.from_numpy(X_train).float(), torch.from_numpy(y_train).float())
val_dataset = RegressionDataset(torch.from_numpy(X_val).float(), torch.from_numpy(y_val).float())
test_dataset = RegressionDataset(torch.from_numpy(X_test).float(), torch.from_numpy(y_test).float())
Model Params
EPOCHS = 150
BATCH_SIZE = 64
LEARNING_RATE = 0.001

NUM_FEATURES = len(X.columns)
Initialize Dataloader
train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=1)
test_loader = DataLoader(dataset=test_dataset, batch_size=1)
Define Neural Network Architecture
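We have a simple feedforward neural net here, with three hidden layers followed by a linear output layer. We use ReLU as the activation after each hidden layer; the output layer has no activation because we are predicting a continuous value.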
class MultipleRegression(nn.Module):
    def __init__(self, num_features):
        super(MultipleRegression, self).__init__()

        self.layer_1 = nn.Linear(num_features, 16)
        self.layer_2 = nn.Linear(16, 32)
        self.layer_3 = nn.Linear(32, 16)
        self.layer_out = nn.Linear(16, 1)

        self.relu = nn.ReLU()

    def forward(self, inputs):
        x = self.relu(self.layer_1(inputs))
        x = self.relu(self.layer_2(x))
        x = self.relu(self.layer_3(x))
        x = self.layer_out(x)

        return (x)

    def predict(self, test_inputs):
        x = self.relu(self.layer_1(test_inputs))
        x = self.relu(self.layer_2(x))
        x = self.relu(self.layer_3(x))
        x = self.layer_out(x)

        return (x)
Check for GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

###################### OUTPUT ######################
cuda:0
Initialize the model, optimizer, and loss function. Transfer the model to GPU.
We are using the Mean Squared Error loss.
model = MultipleRegression(NUM_FEATURES)
model.to(device)
print(model)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

###################### OUTPUT ######################
MultipleRegression(
  (layer_1): Linear(in_features=11, out_features=16, bias=True)
  (layer_2): Linear(in_features=16, out_features=32, bias=True)
  (layer_3): Linear(in_features=32, out_features=16, bias=True)
  (layer_out): Linear(in_features=16, out_features=1, bias=True)
  (relu): ReLU()
)
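As a quick sanity check on the size of this network, the layer shapes in the printed summary can be turned into a parameter count by hand. The sketch below is plain arithmetic and does not touch the model object; the list of shapes is copied from the printout.

```python
# (in_features, out_features) of each Linear layer, as printed above.
layer_sizes = [(11, 16), (16, 32), (32, 16), (16, 1)]

# A Linear layer with bias has in * out weights plus one bias per output.
params_per_layer = [n_in * n_out + n_out for n_in, n_out in layer_sizes]
total_params = sum(params_per_layer)

print(params_per_layer)  # [192, 544, 528, 17]
print(total_params)      # 1281
```

With roughly 1,300 trainable parameters and about 1,600 rows in the dataset, this is a very small model.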
Train Model
Before we start our training, let’s define a dictionary which will store the loss/epoch for both train and validation sets.
loss_stats = {
    'train': [],
    "val": []
}
Let the training begin.
print("Begin training.")

for e in tqdm(range(1, EPOCHS+1)):

    # TRAINING
    train_epoch_loss = 0

    model.train()
    for X_train_batch, y_train_batch in train_loader:
        X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device)
        optimizer.zero_grad()

        y_train_pred = model(X_train_batch)

        train_loss = criterion(y_train_pred, y_train_batch.unsqueeze(1))

        train_loss.backward()
        optimizer.step()

        train_epoch_loss += train_loss.item()

    # VALIDATION
    with torch.no_grad():

        val_epoch_loss = 0

        model.eval()
        for X_val_batch, y_val_batch in val_loader:
            X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)

            y_val_pred = model(X_val_batch)

            val_loss = criterion(y_val_pred, y_val_batch.unsqueeze(1))

            val_epoch_loss += val_loss.item()

    loss_stats['train'].append(train_epoch_loss/len(train_loader))
    loss_stats['val'].append(val_epoch_loss/len(val_loader))

    print(f'Epoch {e+0:03}: | Train Loss: {train_epoch_loss/len(train_loader):.5f} | Val Loss: {val_epoch_loss/len(val_loader):.5f}')

###################### OUTPUT ######################
Epoch 001: | Train Loss: 31.22514 | Val Loss: 30.50931
Epoch 002: | Train Loss: 30.02529 | Val Loss: 28.97327
.
.
.
Epoch 149: | Train Loss: 0.42277 | Val Loss: 0.37748
Epoch 150: | Train Loss: 0.42012 | Val Loss: 0.37028
You can see we’ve put a model.train() before the inner training loop. model.train() tells PyTorch that you’re in training mode.
Well, why do we need to do that? If you’re using layers such as Dropout or BatchNorm, which behave differently during training and evaluation (for example, dropout is not applied during evaluation), you need to tell PyTorch to act accordingly.
Similarly, we’ll call model.eval() when we test our model. We’ll see that below.
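The effect of the two modes is easiest to see on a single Dropout layer. The snippet below is a standalone sketch and is not part of the model in this post, which only contains Linear and ReLU layers; for this particular model, switching modes does not change the outputs, but calling train() and eval() is still a good habit.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(8)

dropout.train()          # training mode: elements are randomly zeroed
train_out = dropout(x)   # surviving elements are scaled by 1 / (1 - p) = 2

dropout.eval()           # evaluation mode: dropout passes the input through
eval_out = dropout(x)

print(train_out)         # a random mix of 0.0 and 2.0
print(eval_out)          # all ones, identical to x
```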
Back to training; we start a for-loop. At the top of this for-loop, we initialize our loss per epoch to 0. After every epoch, we’ll print out the loss and reset it back to 0.
Then we have another for-loop. This for-loop is used to get our data in batches from the train_loader.
We do optimizer.zero_grad() before we make any predictions. Since the backward() function accumulates gradients, we need to reset them to 0 manually for each mini-batch.
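The accumulation is easy to observe on a single scalar parameter. The sketch below is a toy example with a made-up "loss" of 3 * w, whose gradient with respect to w is always 3; it clears the gradient by hand, which has a similar effect to what optimizer.zero_grad() does for every parameter of the model.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()          # 3.0

(3 * w).backward()             # no reset: the new gradient is added on top
accumulated = w.grad.item()    # 6.0

w.grad.zero_()                 # clear the stored gradient by hand
(3 * w).backward()
after_reset = w.grad.item()    # 3.0

print(first, accumulated, after_reset)
```

Without the reset, each optimizer step would use the sum of the gradients of all previous mini-batches rather than the gradient of the current one.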
From our defined model, we then obtain a prediction, compute the loss for that mini-batch, perform back-propagation using train_loss.backward(), and update the weights using optimizer.step().
Finally, we add up the losses of all the mini-batches and divide the sum by the number of mini-batches, i.e. the length of train_loader, to obtain the average loss per epoch.
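One detail worth knowing: nn.MSELoss() returns the mean loss of a mini-batch by default, so dividing by the number of mini-batches gives an average of per-batch means. When the last batch is smaller than the others, this differs slightly from the mean over all samples. The numbers below are made up to show the size of the effect.

```python
batch_losses = [0.5, 0.3, 0.1]   # made-up mean loss of each mini-batch
batch_sizes = [64, 64, 22]       # the last batch is smaller

# What the training loop above reports: the average of the per-batch means.
mean_of_batches = sum(batch_losses) / len(batch_losses)

# Alternative: weight each batch by the number of samples it contains.
weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

print(f"{mean_of_batches:.3f}")  # 0.300
print(f"{weighted:.3f}")         # 0.356
```

For tracking training progress the simpler average is usually fine. With a batch size of 1, as in the validation loader, the two quantities coincide.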
The procedure we follow for validation is exactly the same as for training, except that we wrap it in torch.no_grad() and do not perform any back-propagation. torch.no_grad() tells PyTorch not to track gradients, which reduces memory usage and speeds up computation.
Visualize Loss
To plot the loss line plots, we again create a dataframe from the `loss_stats` dictionary.
train_val_loss_df = pd.DataFrame.from_dict(loss_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})

plt.figure(figsize=(15,8))
sns.lineplot(data=train_val_loss_df, x = "epochs", y="value", hue="variable").set_title('Train-Val Loss/Epoch')
Test Model
After training is done, we need to test how our model fared. Note that we call model.eval() before we run our testing code. To tell PyTorch that we do not need gradients during inference, we use torch.no_grad(), just like we did for the validation loop above.
y_pred_list = []

with torch.no_grad():
    model.eval()
    for X_batch, _ in test_loader:
        X_batch = X_batch.to(device)
        y_test_pred = model(X_batch)
        y_pred_list.append(y_test_pred.cpu().numpy())

y_pred_list = [a.squeeze().tolist() for a in y_pred_list]
Let’s check the MSE and R-squared metrics.
mse = mean_squared_error(y_test, y_pred_list)
r_square = r2_score(y_test, y_pred_list)
print("Mean Squared Error :",mse)
print("R^2 :",r_square)

###################### OUTPUT ######################
Mean Squared Error : 0.40861496703609534
R^2 : 0.36675687655886924
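For reference, both metrics are simple to compute by hand. The sketch below spells out the standard definitions in plain Python on a few made-up values; the function names are illustrative, and in practice the Sklearn functions used above are the better choice.

```python
def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_by_hand(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up targets and predictions, for illustration only.
y_true_demo = [5.0, 6.0, 7.0, 5.0]
y_pred_demo = [5.5, 6.0, 6.5, 5.5]

print(mse_by_hand(y_true_demo, y_pred_demo))  # 0.1875
print(round(r2_by_hand(y_true_demo, y_pred_demo), 4))  # 0.7273
```

An R^2 of 0 corresponds to always predicting the mean of the targets, so the value of about 0.37 reported above means the model explains roughly a third of the variance in the test ratings.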
R^2 : 0.36675687655886924