
Regression for Classification | Hands on Experience

Logistic Regression and Softmax Regression with Core Concepts

Photo by Lance Anderson on Unsplash

We have all developed numerous regression models in our lives, but only a few of us are familiar with using regression models for classification. So my intention is to reveal the beauty of this hidden world.

As we all know, when we want to predict a continuous dependent variable from a number of independent variables, we use linear or polynomial regression. But when it comes to classification, we can't use that anymore.

Fundamentally, classification is about predicting a label and regression is about predicting a quantity.

Why can't linear regression be used for classification? The main reason is that the predicted values are continuous, not probabilistic, so we can't get an exact class to accomplish the classification. You will understand further by glancing at the predictions below.

Image by author

Probability ranges between 0 and 1. But in linear regression, we are predicting an absolute number, which can fall outside 0 and 1.
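
To see this concretely, here is a minimal sketch (using scikit-learn on made-up toy data, not the wine dataset) of a linear regression fit on binary labels; note how its predictions fall outside the 0–1 range:

import numpy as np
from sklearn.linear_model import LinearRegression

# toy data: one feature, binary labels
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

linReg = LinearRegression().fit(X, y)
print(linReg.predict([[0], [7]]))  # roughly [-0.4  1.4] – below 0 and above 1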

Yes, you can still normalize the values to the 0–1 range, but the results may be worse. This is because a linear regression fit is highly affected by the inclusion of outliers; even a single outlier can ruin your classification.

On the other hand, using linear regression for multi-class prediction makes no sense. Linear regression assumes an order between 0, 1, and 2, whereas in the classification regime these numbers are mere categorical placeholders.

To overcome the aforementioned problems, there are two great solutions.

  1. Logistic Regression – for binary classification
  2. Softmax Regression – for multi-class classification

I am using the Red Wine Quality dataset (available on Kaggle) to demonstrate this.

Red Wine Quality

The original dataset is publicly available in the UCI Machine Learning Repository.

Note: I will not perform any detailed preprocessing or dimensionality reduction techniques today, as my intention is mostly to walk you through the classification models.

Okay! First, we will see how binary classification is achieved using Logistic Regression.

Logistic Regression

Before applying the model, let’s understand some of the core concepts in logistic regression.

I'll show you how logistic regression works with an example. Consider the two outcome probabilities of tossing a coin.

Image by author

Let's take the probability of getting heads as p and the probability of getting tails as q. Then we define a new concept called odds as p/q.

In this scenario, we have only two possibilities: heads or tails. So we can write q as 1-p. Now let's see how p varies. It can range from 0 to 1. Let's take three representative points and analyze their odds.

Image by author

Now we can take the log of those odds, known as the log odds or logit [log(p/q)]. Note that the log odds are defined this way only for two classes.
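
As a quick sanity check, a few lines of NumPy (a sketch, not taken from my notebook; I am assuming p = 0.25, 0.5, and 0.75 as the example points) reproduce the odds and log odds:

import numpy as np

p = np.array([0.25, 0.5, 0.75])
odds = p / (1 - p)       # p/q
logit = np.log(odds)     # log odds
print(odds)   # [0.3333 1.     3.    ]
print(logit)  # [-1.0986  0.      1.0986]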

Image by author

As you can see, it shows a symmetric distribution. When p = 0.5, the logit function does not favor either class. As p approaches 1, the logit goes to +inf, and as p approaches 0 (i.e., q = 1-p approaches 1), it goes to -inf. This characteristic is the intuition behind logistic regression, as its nature lets it separate a binary problem.

Image by author

Then we can derive the Logistic Function from this Logit Function.

Image by author

If we plot it, the graph looks as follows.

Image by author
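
For reference, here is a minimal sketch of the logistic (sigmoid) function and its S-shaped curve, assuming NumPy and Matplotlib:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    # the inverse of the logit: p = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 200)
plt.plot(z, sigmoid(z))
plt.xlabel('z (log odds)')
plt.ylabel('p')
plt.show()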

The following diagram briefly shows how Logistic regression works.

Image by author

You can dig deeper into segments like analyzing the loss function (binary cross-entropy) and formulating the decision boundary. That is out of scope for this article, as it takes more time to explain. Let's see the implementations without further ado.

First of all, let's load our CSV file into a pandas DataFrame.

import pandas as pd

# load the dataset into a DataFrame
wineData = pd.read_csv('winequality-red.csv')
wineData.head()
Image by author

Here we have the quality as the target column.

wineData.quality.unique()

We can identify the wine quality range as 3 to 8.

Image by author

This is actually a multi-class classification problem (having six classes). I will explain this scenario with softmax regression. To explain logistic regression, I am adding another column with the following condition.

  • If wine quality is greater than or equal to 6 => "good" (encoded as 1)
  • Otherwise => "bad" (encoded as 0)

After adding the category column, I drop the quality column. So now our target will be the category column.

import numpy as np

# 1 = good (quality >= 6), 0 = bad
wineData['category'] = np.where(wineData['quality'] >= 6, 1, 0)
wineData = wineData.drop(['quality'], axis=1)
wineData.head()
Image by author

Let's look at the data distribution to check for any class imbalance. If you plot a count plot for the category column, you will see it as below.
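
The count plot itself is a one-liner with seaborn (a sketch; the styling in my notebook may differ):

import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(x='category', data=wineData)
plt.show()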

Image by author

We can see a class imbalance here. You can either drop some data points or use resampling techniques such as undersampling (e.g., NearMiss) or oversampling (e.g., SMOTE) to overcome this issue. For simplicity, I am dropping data points, as sketched below. After that, you can observe a balanced dataset.
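
One simple way to do this with pandas is to downsample every class to the size of the smallest one (a sketch; the notebook's exact implementation may differ):

# keep a random sample of each class, matched to the minority-class size
minority_size = wineData['category'].value_counts().min()
wineData = (wineData.groupby('category')
            .sample(n=minority_size, random_state=101)
            .reset_index(drop=True))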

Image by author

You can further analyze the correlations between features. Here is the correlation matrix for all the columns.
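
The matrix below was drawn roughly as follows (a sketch, assuming seaborn):

import seaborn as sns
import matplotlib.pyplot as plt

corr = wineData.corr()
sns.heatmap(corr, annot=True, fmt='.2f', cmap='Blues')
plt.show()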

Image by author

Let's do some minor preprocessing. First of all, I'll split the dataset into train and test sets, as we need to compute accuracy measures.

from sklearn.model_selection import train_test_split

target = wineData['category'].copy()
features = wineData.drop('category', axis=1)
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=101)

The dataset is not standardized yet. Let's apply the standard scaler to our data before fitting the model.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# fit the scaler on the training data only to avoid data leakage
scaler.fit(X_train)
train_scaled = scaler.transform(X_train)
test_scaled = scaler.transform(X_test)
X_train = pd.DataFrame(train_scaled, columns=X_train.columns)
X_test = pd.DataFrame(test_scaled, columns=X_test.columns)

After the standardization, the training set will be as follows.

Image by author

Now we can apply the logistic regression model. The scikit-learn library provides an easy implementation of Logistic Regression.

from sklearn.linear_model import LogisticRegression

logReg = LogisticRegression()
logReg.fit(X_train,y_train)

Too easy, right? We can get the model's predictions for our test set by calling the model.predict function.

predictions = logReg.predict(X_test)
y_hat = pd.DataFrame(predictions, columns=["predicted"])
Image by author

Boom! No continuous values: as you can see, our predictions appear as classes. Further accuracy measures can be calculated as follows.

from sklearn.metrics import classification_report
print(classification_report(y_test,y_hat))
Image by author

Here we have achieved 76% accuracy with standardization alone. Moreover, you can improve this model using outlier handling, transformations, and discretization techniques.

Let’s plot the confusion matrix as well.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

cf_matrix = confusion_matrix(y_test, y_hat)
ax = plt.subplot()
# normalize the counts so each cell shows a percentage of all predictions
sns.heatmap(cf_matrix/np.sum(cf_matrix), annot=True, fmt='.2%', cmap='Blues', ax=ax)
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')
Image by author

From the results of the confusion matrix, we can see our model performs quite well.

Great! We have successfully implemented the Logistic Regression model. Now let’s see how the softmax regression works.

Softmax Regression

As I mentioned earlier, softmax regression is used for multi-class classification. I hope you remember the last diagram of logistic regression; softmax works similarly, but here we have multiple classes. See the diagram below.

Image by author

For a given data point, we calculate the probability of that data point belonging to each class, one by one, and then find the maximum probability among them. That will be the predicted class for the data point.

Let’s see how the calculation proceeds.

Image by author

This is called the Softmax function.

Image by author

In order to estimate the weights w, we need to minimize the loss function, known as categorical cross-entropy.

Ultimately, we pick the class with the maximum softmax output as the prediction.
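
To make this concrete, here is a minimal NumPy sketch of the softmax function and the argmax prediction rule (the class scores z are hypothetical):

import numpy as np

def softmax(z):
    # subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])      # hypothetical per-class scores (w·x)
probs = softmax(z)
print(probs)             # [0.659 0.242 0.099] – sums to 1
print(np.argmax(probs))  # 0, the predicted class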

Alright! Let's get some hands-on experience. Previously, I separated the wine quality column into good and bad. Here I discretize the original column as follows (a sketch of the encoding appears after the list):

  • If wine quality is greater than or equal to 7 => "great" (encoded as 2)
  • If wine quality is equal to 6 => "good" (encoded as 1)
  • Otherwise => "bad" (encoded as 0)
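
A sketch of this encoding with NumPy's np.select (the notebook may use a different but equivalent approach):

# encode quality into three classes: 2 = great, 1 = good, 0 = bad
wineData['category'] = np.select(
    [wineData['quality'] >= 7, wineData['quality'] == 6],
    [2, 1],
    default=0)
wineData = wineData.drop(['quality'], axis=1)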

So this leads us to a multi-class classification problem (with three classes). Let's examine the data distribution.

Image by author

Here, too, we can see a class imbalance. Let's drop some data points to eliminate it; the full implementation can be seen in my Colab notebook. After the removal, the data will look as below.

Image by author

Note: If you want to apply a resampling technique, you should apply it only to the training data (after splitting into train and test).
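
If you go the resampling route instead, here is a sketch using SMOTE from the imbalanced-learn package (an assumption: it is not used in my notebook and must be installed separately), applied only after the split:

from imblearn.over_sampling import SMOTE

# oversample the training set only; the test set stays untouched
X_train, y_train = SMOTE(random_state=101).fit_resample(X_train, y_train)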

Alright! Let's do a train-test split and apply some preprocessing techniques. The splitting and standardization are the same as before. After standardization, your training set will look as follows.

Image by author

Now Let’s apply Softmax Regression.

We will again use LogisticRegression from scikit-learn, but we need to set the multi_class parameter to 'multinomial' in order for the function to carry out softmax regression. We also need a solver that supports softmax regression, such as solver='lbfgs'.

These solvers are used to find the parameter weights that minimize a cost function.

softReg = LogisticRegression(multi_class = 'multinomial', solver = 'lbfgs')
softReg.fit(X_train,y_train)

You can see the model's predictions for the test data below.

predictions = softReg.predict(X_test)
y_hat = pd.DataFrame(predictions, columns=["predicted"])
print(y_hat.head())
Image by author

Easy-peasy, right? So our softmax regression model is able to classify a multi-class problem. Let's see the evaluations.

Image by author

From the classification report, we can see an accuracy of 70%. That's actually not bad, because we didn't go through any deep preprocessing stages. As I said earlier, you can enhance this further with proper preprocessing as well as resampling techniques.

Image by author

As the confusion matrix shows, our model classifies classes 0 and 2 well but has some trouble identifying class 1. You can try out the methods suggested above and let me know the results. My notebook will support you with the implementations.

Great! Now you know how to do classification using Regression.

Resources

  • Complete Colab Python notebook.

Regression for Classification

  • Red Wine Quality dataset.

This dataset¹ is under Database Contents License (DbCL) v1.0 and publicly available in the UCI machine learning repository and Kaggle.

Red Wine Quality

[1] Paulo Cortez (University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez), A. Cerdeira, F. Almeida, T. Matos and J. Reis (Viticulture Commission of the Vinho Verde Region (CVRVV), Porto, Portugal), 2009.

Conclusion

Today we have learned how to use regression techniques to solve a classification problem. Specifically, we learned the main intuitions and core concepts behind logistic regression and softmax regression. Furthermore, we implemented both models with scikit-learn.

Thanks for going through this article. I hope it helps you. Feel free to leave a message with your valuable comments and suggestions. Follow me to get my latest articles on Medium. Stay safe! Happy learning! ❤️

