
Logistic Regression with PyTorch

An introduction to applying logistic regression for binary classification with PyTorch.

Which door do we choose? (Image via iStock under license to Dennis Loevlie)

Binary Logistic Regression is used to classify two linearly separable groups. This linearly separable assumption makes logistic regression extremely fast and powerful for simple ML tasks. An example of linearly separable data that we will be performing logistic regression on is shown below:

Example of Linearly Separable Data (Image by author)

Here the linearly separable groups are:

  1. Red = 0
  2. Blue = 1

We want to use logistic regression to map any [x1, x2] pair to the corresponding class (red or blue).
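If you want to follow along, a similar two-class dataset can be generated with scikit-learn's make_blobs (a sketch of my own; not necessarily how the dataset in the figure above was built):

import numpy as np
from sklearn.datasets import make_blobs

# Two well-separated Gaussian blobs: class 0 (red) and class 1 (blue)
inputs, labels = make_blobs(n_samples=200, centers=2, n_features=2,
                            cluster_std=1.0, random_state=42)
labels = labels.astype(np.float32)   # float labels, as BCELoss will expect later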

Step 1: Splitting the Dataset into Training and Test Sets

We do this so we can evaluate our model's performance on data it didn't see during training. Usually, if you tell someone your model is 97% accurate, it is assumed you are talking about the validation/testing accuracy.

You can do this yourself pretty easily, but honestly, sklearn's train_test_split function is really nice to use for readability.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    inputs, labels, test_size=0.33, random_state=42)

Step 2: Building the PyTorch Model Class

We can create the logistic regression model with the following code:

import torch

class LogisticRegression(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LogisticRegression, self).__init__()
        # A single linear layer: z = w1·x1 + w2·x2 + β
        self.linear = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x):
        # Squash the linear output into a probability between 0 and 1
        outputs = torch.sigmoid(self.linear(x))
        return outputs

The "forward" pass of our PyTorch neural network (really just a single-layer perceptron) is visualized below, along with the corresponding equations:

Neural Network Architecture Visualization (Image by author)

z = w1·x1 + w2·x2 + β
ŷ = σ(z)

Where:

σ(z) = 1 / (1 + e^(-z))

The sigmoid function is extremely useful for two main reasons:

  1. It transforms our linear regression output to a probability from 0 to 1. We can then take any probability greater than 0.5 as being 1 and below as being 0.
  2. Unlike a stepwise function (which would transform the data into the binary case as well), the sigmoid is differentiable, which is necessary for optimizing the parameters using gradient descent (we will show later).

Sigmoid Function with Decision Boundary for Choosing Blue or Red (Image by author)
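As a quick illustration of the 0.5 threshold (a toy sketch of my own, not from the article):

import torch

z = torch.tensor([-3.0, -0.2, 0.0, 0.2, 3.0])   # example linear-layer outputs
probs = torch.sigmoid(z)                         # squashed into (0, 1)
classes = (probs > 0.5).int()                    # threshold at 0.5

print(probs)    # tensor([0.0474, 0.4502, 0.5000, 0.5498, 0.9526])
print(classes)  # tensor([0, 0, 0, 1, 1], dtype=torch.int32)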

Step 3: Initializing the Model

We also need to assign some hyper-parameters:

epochs = 200000
input_dim = 2 # Two inputs x1 and x2 
output_dim = 1 # Single binary output 
learning_rate = 0.01

Parameter Definitions:

  • Epoch – Indicates the number of passes through the entire training dataset the network has completed
  • learning_rate – A tuning parameter in an Optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function

    • A high learning rate means you might overshoot and never reach a minimum.
    • A low learning rate will take longer to converge.

The code below initializes the model class that we made previously:

model = LogisticRegression(input_dim, output_dim)

Step 4: Initializing the Loss Function and the Optimizer

criterion = torch.nn.BCELoss()
The binary cross-entropy loss is defined as:

L = -(1/m) · Σ [ y·log(ŷ) + (1 - y)·log(1 - ŷ) ]

  • m = Number of training examples
  • y = True y value
  • ŷ = Predicted y value
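As a tiny numeric check of the formula (my own example): for a single prediction ŷ = 0.9 with true label y = 1, the loss is -log(0.9) ≈ 0.105.

print(criterion(torch.tensor([0.9]), torch.tensor([1.0])))   # tensor(0.1054), i.e. -log(0.9)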
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

There are a plethora of common neural-network optimizers, but most are based on gradient descent. This optimization technique takes steps toward the minimum of the loss function, with the direction dictated by the gradient of the loss function with respect to the weights, and the magnitude (step size) determined by the learning rate.

Note: To reach the loss function’s minimum accurately and quickly, it is beneficial to slowly decrease your learning rate, and optimizers like the Adaptive Moment Estimation algorithm (Adam), which PyTorch has also implemented, do this for us. You can find out more about the PyTorch implementation of these optimizers at https://pytorch.org/docs/stable/optim.html.
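For example, swapping SGD for Adam is a one-line change (not shown in the article; the learning rate here is just illustrative, and Adam's default of 1e-3 is also common):

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)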

We update the parameters to minimize the loss function with the following equations:

w := w - α · (∂L/∂w)
β := β - α · (∂L/∂β)

  • α (Alpha) – Learning rate

You might be wondering where we get ∂L/∂w and ∂L/∂β, and that would be a great question! In neural networks, we use back-propagation to get the partial derivatives. Luckily for us, in logistic regression the equations simplify, and I will show that (along with backprop for the network) below.

Using the chain rule we can deduce that:

∂L/∂w = (∂L/∂ŷ) · (∂ŷ/∂z) · (∂z/∂w)

The partial derivatives (for a single training example) are shown below:

∂L/∂ŷ = -(y/ŷ) + (1 - y)/(1 - ŷ)
∂ŷ/∂z = ŷ·(1 - ŷ)
∂z/∂w = x

Simplify the equations and you get:

∂L/∂w = (ŷ - y)·x

So in reality you would do:

w := w - α · (1/m) · Σᵢ (ŷᵢ - yᵢ)·xᵢ

We can derive ∂L/∂β similarly. Luckily, PyTorch's autograd does all of this for us!
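To see this in action, here is a small sanity-check sketch (my own, not part of the original article) that compares the gradient autograd computes with the analytic (ŷ - y)·x formula, reusing the model and criterion defined in Steps 3 and 4:

x = torch.tensor([[1.0, 2.0]])     # one hypothetical training point
y = torch.tensor([[1.0]])          # its label

y_hat = model(x)                   # forward pass
loss = criterion(y_hat, y)
loss.backward()                    # autograd computes ∂L/∂w and ∂L/∂β

print(model.linear.weight.grad)    # gradient from autograd
print((y_hat.detach() - y) * x)    # analytic gradient, should match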

Step 5: Train the Model

First, we convert our inputs and labels from NumPy arrays to tensors.

X_train, X_test = torch.Tensor(X_train), torch.Tensor(X_test)
y_train, y_test = torch.Tensor(y_train), torch.Tensor(y_test)

Next, we build our training loop and store the losses. Every so often we can also print out the accuracy on the test data to see how our model is doing.
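The full loop lives in the gist linked at the end of the article; a minimal sketch of what it might look like, based on the description above (my own reconstruction, not the author's exact code), is:

losses = []
for epoch in range(epochs):
    optimizer.zero_grad()                            # reset gradients from the previous step
    outputs = model(X_train)                         # forward pass: predicted probabilities
    loss = criterion(outputs, y_train.unsqueeze(1))  # reshape labels to (N, 1) to match outputs
    loss.backward()                                  # back-propagate
    optimizer.step()                                 # gradient-descent update
    losses.append(loss.item())

    if (epoch + 1) % 10000 == 0:                     # check test accuracy every so often
        with torch.no_grad():
            predictions = model(X_test).round().squeeze()
            accuracy = (predictions == y_test).float().mean().item()
            print(f'Epoch {epoch + 1}: loss = {loss.item():.4f}, test accuracy = {accuracy:.4f}')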

Losses per 10,000 epochs:

BCE Loss as a function of Epoch (Image by author)

Step 6: Plotting the Results

Since we know the decision boundary is w1·x1 + w2·x2 + β = 0, we can plot it. The results are below:

Train:

(Image by author)

Test:

(Image by author)
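A sketch of how such plots could be produced (my own example assuming matplotlib, not the author's plotting code):

import numpy as np
import matplotlib.pyplot as plt

# Learned parameters: the decision boundary is w1·x1 + w2·x2 + β = 0
w = model.linear.weight.detach().numpy().flatten()
b = model.linear.bias.item()

X_np, y_np = X_train.numpy(), y_train.numpy()
plt.scatter(X_np[y_np == 0, 0], X_np[y_np == 0, 1], c='red', label='Class 0 (red)')
plt.scatter(X_np[y_np == 1, 0], X_np[y_np == 1, 1], c='blue', label='Class 1 (blue)')

# Solve w1·x1 + w2·x2 + β = 0 for x2 to draw the boundary line
x1_vals = np.linspace(X_np[:, 0].min(), X_np[:, 0].max(), 100)
x2_vals = -(w[0] * x1_vals + b) / w[1]
plt.plot(x1_vals, x2_vals, 'k--', label='Decision boundary')
plt.legend()
plt.show()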

Step 7: How to get Predictions on New Data!

If you had a new point at x1=1, x2=1 visually (in 2-dimensional space), it’s easy to tell that we should classify the point as "red". So let’s check if our model is working correctly and show how to get a prediction from the model on new data:

x1 = 1
x2 = 1
new_data = torch.tensor([x1, x2], dtype=torch.float32)

with torch.no_grad():
    prediction = model(new_data).round()  # recall the mapping above: red = 0, blue = 1
    if prediction == 0.0:
        print('The model classifies this point as RED')
    else:
        print('The model classifies this point as BLUE')

The new point is plotted against the training data below:

Predicting on New Data (Image by author)

OUTPUT:

>>> The model classifies this point as RED

The full code:

https://gist.github.com/loevlie/bf867387add01904dbcba6b78b25b606?file=Logistic_Regression_PyTorch.py


Let’s Connect!

  1. Twitter
  2. LinkedIn
  3. GitHub
