
How is Logistic Regression Used as A Classification Algorithm?

Does regression contradict classification?

Photo by Skye Studios on Unsplash

Supervised learning algorithms can be grouped under two main categories:

  • Regression: Predicting continuous target variables. For example, predicting the price of a house is a regression task.
  • Classification: Predicting discrete target variables. For example, predicting whether an email is spam is a classification task.

Logistic regression is a supervised learning algorithm that is mostly used to solve binary classification tasks, even though it contains the word "regression". "Regression" seems to contradict "classification", but the focus of logistic regression is on the word "logistic", which refers to the logistic function that actually does the classification in the algorithm. Logistic regression is a simple yet very effective classification algorithm, so it is commonly used for many binary classification tasks. Customer churn, spam email, and website or ad click predictions are some examples of the areas where logistic regression offers a powerful solution. It is even used as an activation function for neural network layers.

The basis of logistic regression is the logistic function, also called the sigmoid function, which takes in any real-valued number and maps it to a value between 0 and 1.
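In its standard form, the sigmoid function is written as

\[ y = \frac{1}{1 + e^{-x}} \]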

The logistic regression model takes a linear equation as input and uses the logistic function and log odds to perform a binary classification task. Before going into detail on logistic regression, it is better to review some concepts in the scope of probability.

Probability

Probability measures the likelihood of an event occurring. For example, if we say "there is a 90% chance that this email is spam", the probability that the email is spam is 0.9.

Odds is the ratio of the probability of the positive class (the email is spam) to the probability of the negative class (the email is not spam).

Log odds is the logarithm of odds.
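In formula form, for a probability p of the positive class:

\[ \text{odds} = \frac{p}{1 - p}, \qquad \text{log odds} = \ln\left(\frac{p}{1 - p}\right) \]

For the spam example above, with p = 0.9:

\[ \text{odds} = \frac{0.9}{0.1} = 9, \qquad \text{log odds} = \ln(9) \approx 2.20 \]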

All these concepts essentially represent the same measure but in different ways. In the case of logistic regression, log odds is used. We will see why log odds is preferred in the logistic regression algorithm.

Log odds is the logarithm of the odds, and the odds is the ratio of the probability of the positive class to that of the negative class.

A probability of 0.5 means that there is an equal chance for the email to be spam or not spam. Please note that the log odds of probability 0.5 is 0. We will use that.
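That is:

\[ \ln\left(\frac{0.5}{1 - 0.5}\right) = \ln(1) = 0 \]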


Let’s go back to the sigmoid function and show it in a different way:
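\[ y = \frac{1}{1 + e^{-x}} \quad\Longrightarrow\quad \frac{y}{1 - y} = e^{x} \]

(rearranging so that the odds of y appear on the left-hand side)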

Taking the natural log of both sides:
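\[ \ln\left(\frac{y}{1 - y}\right) = x \qquad (1) \]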

In equation (1), instead of x, we can use a linear equation z:
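\[ z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \]

where x_1, …, x_n are the independent variables and \beta_0, …, \beta_n are the coefficients learned from the data.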

Then equation (1) becomes:
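\[ \ln\left(\frac{y}{1 - y}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n \quad\Longleftrightarrow\quad y = \frac{1}{1 + e^{-z}} \]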

Assume y is the probability of the positive class. If z is 0, then y is 0.5. For positive values of z, y is higher than 0.5, and for negative values of z, y is less than 0.5. If the probability of the positive class is more than 0.5 (i.e. more than a 50% chance), we can predict the outcome as the positive class (1). Otherwise, the outcome is the negative class (0).
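As a decision rule:

\[ \hat{y} = \begin{cases} 1 & \text{if } y > 0.5 \\ 0 & \text{otherwise} \end{cases} \]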

Photo by Franck V. on Unsplash

Note: In binary classification, there are many ways to represent two classes such as positive/negative, 1/0, True/False.

The table below shows some values of z with corresponding y (probability) values. All real numbers are mapped between 0 and 1.
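For example (values computed from the sigmoid formula and rounded to three decimals):

    z     y (probability)
   -6     0.002
   -4     0.018
   -2     0.119
    0     0.500
    2     0.881
    4     0.982
    6     0.998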

If we plot this function, we will get the famous s-shaped graph of logistic regression:
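A quick way to see this shape is to plot the sigmoid over a range of z values. Here is a minimal sketch using NumPy and Matplotlib (the range and styling are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number to (0, 1)."""
    return 1 / (1 + np.exp(-z))

z = np.linspace(-8, 8, 200)
plt.plot(z, sigmoid(z))
plt.axhline(0.5, linestyle="--", color="gray")  # the 0.5 decision threshold
plt.xlabel("z")
plt.ylabel("y (probability of positive class)")
plt.title("The s-shaped logistic (sigmoid) curve")
plt.show()
```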

The classification problem comes down to solving a linear equation:
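\[ z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \]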

This seems just like solving a linear regression problem. The parameters of the function are determined in the training phase with the maximum likelihood estimation algorithm. Then, for any given values of the independent variables (x1, …, xn), the probability of the positive class can be calculated.

We can use the calculated probability ‘as is’. For example, the output can be that the probability the email is spam is 95%, or that the probability the customer will click on this ad is 70%. However, in most cases, probabilities are used to classify data points. If the probability is greater than 50%, the prediction is the positive class (1). Otherwise, the prediction is the negative class (0). And we have just converted the solution of a linear regression problem into a binary classification task.
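In practice, this whole pipeline takes only a few lines of code. Here is a minimal sketch using scikit-learn's LogisticRegression on a synthetic dataset (the data and variable names are illustrative, not from a real problem):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic binary classification data
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression()        # coefficients are fit by maximum likelihood
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]   # probability of the positive class
preds = (probs > 0.5).astype(int)            # threshold at 0.5 to get class labels

print(probs[:5])
print(preds[:5])
print("accuracy:", model.score(X_test, y_test))
```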


Logistic regression is a simple yet very powerful algorithm for solving binary classification problems. The logistic function (i.e. the sigmoid function) is also commonly used in very complex neural networks as the activation function of the output layer.


Thank you for reading. Please let me know if you have any feedback.

