
We have all developed numerous regression models in our lives, but only a few of us are familiar with using regression models for classification. My intention here is to reveal the beauty of this hidden world.
As we all know, when we want to predict a continuous dependent variable from a number of independent variables, we use linear or polynomial regression. But when it comes to classification, those models no longer apply directly.
Fundamentally, classification is about predicting a label and regression is about predicting a quantity.
Why can’t linear regression be used for classification? The main reason is that its predicted values are continuous, not probabilistic, so we cannot extract a definite class from them. The predictions below illustrate this.

A probability ranges between 0 and 1, but linear regression predicts an unbounded number that can fall outside that range.
Yes, you can still normalize the output to the 0–1 range, but the results may be poor, because a linear regression fit is highly sensitive to outliers. Even a single outlier can ruin your classification.
On the other hand, using linear regression for multi-class prediction makes no sense. Linear regression assumes an order between 0, 1, and 2, whereas in classification these numbers are mere categorical placeholders.
To overcome these problems, there are two great solutions:
- Logistic Regression – for binary classification
- Softmax Regression – for multi-class classification
I am using the Red Wine Quality dataset (available on Kaggle) for the demonstration.
The original dataset is publicly available in the UCI Machine Learning Repository.
Note: I will not perform any detailed preprocessing or dimensionality reduction today, as my intention is mostly to walk you through the classification models.
Okay! First, we will see how binary classification is achieved using Logistic Regression.
Logistic Regression
Before applying the model, let’s understand some of the core concepts in logistic regression.
I’ll show you how logistic regression works with an example. Consider the two possible outcomes of tossing a coin.

Let’s take the probability of getting heads as p and the probability of getting tails as q. Then we define a concept called the odds: p/q.
In this scenario there are only two possibilities, heads or tails, so we can write q as 1 − p. Now let’s see how the odds behave as p varies from 0 to 1, using three representative points: at p = 0 the odds are 0, at p = 0.5 the odds are 1, and as p approaches 1 the odds grow to infinity.

Now we take the logarithm of the odds, known as the log-odds or logit: log(p/q). Note that this log-odds formulation holds only for two classes.

As you can see, the logit shows a symmetric distribution. When p = 0.5, the logit favors neither class. As p approaches 1, the logit goes to +∞; symmetrically, as p approaches 0 (that is, as q = 1 − p approaches 1), it goes to −∞. This characteristic is the intuition behind logistic regression, since by its nature it can separate a binary problem.
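You can check this symmetry numerically; here is a minimal sketch using NumPy:

import numpy as np

p = np.array([0.1, 0.5, 0.9])
logit = np.log(p / (1 - p))
print(logit)  # [-2.197  0.     2.197] -- symmetric around p = 0.5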

Then we can derive the Logistic Function from this Logit Function.
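To see how, set z = log(p / (1 − p)) and solve for p. Exponentiating both sides gives p / (1 − p) = e^z, so p = e^z / (1 + e^z) = 1 / (1 + e^(−z)). This is the logistic (sigmoid) function, and it squashes any real number back into the (0, 1) probability range.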

If we plot the logistic function, it appears as the following S-shaped curve.
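If you want to reproduce the curve yourself, here is a quick sketch:

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-6, 6, 200)
plt.plot(z, 1 / (1 + np.exp(-z)))
plt.xlabel('z')
plt.ylabel('p')
plt.title('Logistic (sigmoid) function')
plt.show()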

The following diagram briefly shows how Logistic regression works.

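In code, that flow boils down to a weighted sum passed through the sigmoid and then thresholded. A minimal sketch, with purely hypothetical weights w and bias b rather than a trained model:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(x, w, b, threshold=0.5):
    p = sigmoid(np.dot(w, x) + b)  # probability of class 1
    return int(p >= threshold)

w = np.array([0.8, -0.4])  # hypothetical weights
b = 0.1                    # hypothetical bias
print(predict(np.array([1.0, 2.0]), w, b))  # 1, since sigmoid(0.1) ≈ 0.52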
You can dig deeper into topics such as the loss function (binary cross-entropy) and the formulation of the decision boundary, but they are out of scope for this article. Let’s see the implementation without further ado.
First of all, let’s load our CSV file into a pandas DataFrame.
import pandas as pd

wineData = pd.read_csv('winequality-red.csv')
wineData.head()

Here we have the quality as the target column.
wineData.quality.unique()
We can identify the wine quality range as 3 to 8.

This is actually a multi-class classification problem (with six classes); I will return to that scenario under softmax regression. To demonstrate logistic regression, I am adding another column with the following condition:
- If wine quality is greater than or equal to 6 => "good" (encoded as 1)
- Otherwise => "bad" (encoded as 0)
After adding the category column, I drop the quality column, so our target is now the category column.
import numpy as np

wineData['category'] = np.where(wineData['quality'] >= 6, 1, 0)
wineData = wineData.drop(['quality'], axis=1)
wineData.head()

Let’s look at the data distribution to check for class imbalance. If you plot a countplot of the category column, you will see the following.

We can see a class imbalance here. You can either drop some data points or use resampling techniques such as undersampling (e.g., NearMiss) or oversampling (e.g., SMOTE) to overcome the issue. For simplicity, I am dropping data points, roughly as sketched below. Afterwards, you can observe a balanced dataset:
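A minimal sketch of one way to do this undersampling with pandas (the notebook may implement it differently):

# downsample each class to the size of the smallest class
min_count = wineData['category'].value_counts().min()
wineData = (wineData.groupby('category', group_keys=False)
                    .apply(lambda g: g.sample(min_count, random_state=101))
                    .reset_index(drop=True))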

You can further analyze the correlations between features. Here is the correlation matrix for all the columns.
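One way to produce such a matrix, assuming seaborn and matplotlib (a quick sketch):

import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(wineData.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.show()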

Let’s do some minor preprocessing. First, I’ll split the dataset into train and test sets, since we need to compute accuracy measures.
from sklearn.model_selection import train_test_split

target = wineData['category'].copy()
features = wineData.drop('category', axis=1)
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=101)
The features are not standardized. Let’s apply the standard scaler to our data before applying the model.
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# fit the scaler on the training data only, to avoid test-set leakage
scaler.fit(X_train)
train_scaled = scaler.transform(X_train)
test_scaled = scaler.transform(X_test)
X_train = pd.DataFrame(train_scaled, columns=X_train.columns)
X_test = pd.DataFrame(test_scaled, columns=X_test.columns)
After the standardization, the training set will be as follows.

Now we can apply the logistic regression model. The scikit-learn library provides an easy implementation of Logistic Regression.
from sklearn.linear_model import LogisticRegression
logReg = LogisticRegression()
logReg.fit(X_train,y_train)
Too easy, right? We can get the model’s predictions for our test set by calling the model.predict function.
predictions = logReg.predict(X_test)
y_hat = pd.DataFrame(predictions, columns=["predicted"])

Boom! No continuous values in sight. As you can see, our predictions come out as classes. Further accuracy measures can be calculated as follows.
from sklearn.metrics import classification_report
print(classification_report(y_test,y_hat))

Here we have achieved 76% accuracy with standardization alone. You can improve this model further using outlier handling, transformations, and discretization techniques.
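Incidentally, since logistic regression is probabilistic at heart, you can also inspect the class probabilities behind these hard predictions (a quick sketch):

# each row holds P(class 0) and P(class 1) for one test sample
probs = logReg.predict_proba(X_test)
print(probs[:5])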
Let’s plot the confusion matrix as well.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

cf_matrix = confusion_matrix(y_test, y_hat)
ax = plt.subplot()
sns.heatmap(cf_matrix / np.sum(cf_matrix), annot=True, fmt='.2%', cmap='Blues', ax=ax)
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')

From the results of the confusion matrix, we can see our model performs quite well.
Great! We have successfully implemented the Logistic Regression model. Now let’s see how the softmax regression works.
Softmax Regression
As I mentioned earlier, softmax regression is used for multi-class classification. I hope you remember the last diagram for logistic regression; softmax regression works similarly, but with multiple classes. See the diagram below.

For a given data point, we calculate the probability of it belonging to each class, one by one, and then take the class with the maximum probability as the prediction for that data point.
Let’s see how the calculation proceeds.
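For K classes with weight vectors w_1, …, w_K, the probability assigned to class k for a data point x is

P(y = k | x) = exp(w_k · x) / Σ_j exp(w_j · x), with the sum running over j = 1, …, K.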

This is called the Softmax function.

In order to estimate the weights w, we need to minimize the loss function, known as categorical cross-entropy.
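For a single data point whose true class is k, this loss is simply −log(p_k), the negative log of the probability the model assigns to the true class; averaged over m data points it becomes −(1/m) Σ_i log p_{y_i}(x_i).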
Ultimately, we take the class with the maximum softmax probability as the prediction.
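As a quick sketch of these mechanics in NumPy (the scores are hypothetical values of w_k · x):

import numpy as np

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical class scores
probs = softmax(scores)             # ≈ [0.66, 0.24, 0.10]
print(np.argmax(probs))             # predicted class: 0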
Alright! Let’s get some hands-on experience. Previously, I separated the wine quality column into good and bad. Here I discretize the original column as:
- If wine quality is greater than or equal to 7 => "great" (encoded as 2)
- If wine quality is equal to 6 => "good" (encoded as 1)
- Otherwise => "bad" (encoded as 0)
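In code, one possible way to do this encoding (a sketch; the Colab notebook may differ slightly):

# starting again from the original quality column
conditions = [wineData['quality'] >= 7, wineData['quality'] == 6]
wineData['category'] = np.select(conditions, [2, 1], default=0)
wineData = wineData.drop(['quality'], axis=1)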
So this leads us to a multi-class classification problem with three classes. Let’s examine the data distribution.

Here too we can see a class imbalance. Let’s drop some data points to eliminate it; the full implementation can be seen in my Colab notebook. After the removal, the data will look as below.

Note: if you want to apply a resampling technique, you should apply it only to the training data (after splitting into train and test sets).
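For example, with the imbalanced-learn package (an assumption here; this article’s notebook drops rows instead), SMOTE oversampling would look roughly like this:

from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=101)
# resample only the training split, never the test split
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)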
Alright! Let’s do the train-test split and apply some preprocessing. The splitting and standardization are the same as before. After standardization, your training set will look as follows.

Now let’s apply Softmax Regression.
We will again use LogisticRegression from scikit-learn, but we need to set the multi_class parameter to 'multinomial' so that the function carries out softmax regression. We also need a solver that supports softmax regression, such as solver='lbfgs'. These solvers are used to find the parameter weights that minimize the cost function.
softReg = LogisticRegression(multi_class='multinomial', solver='lbfgs')
softReg.fit(X_train, y_train)
You can see the model’s predictions for the test data below.
predictions = softReg.predict(X_test)
y_hat = pd.DataFrame(predictions, columns=["predicted"])
print(y_hat.head())

Easy peasy, right? Our softmax regression model is able to classify a multi-class problem. Let’s see the evaluation.
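The evaluation code is the same as in the binary case; for completeness, a sketch:

from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, y_hat))
cf_matrix = confusion_matrix(y_test, y_hat)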

From the classification report, we see an accuracy of 70%. That is actually not bad, given that we did not go through any deep preprocessing stages. As I said earlier, you can enhance this further with proper preprocessing as well as resampling techniques.

As the confusion matrix shows, our model classifies classes 0 and 2 well but has some trouble identifying class 1. You can try out the methods suggested above and let me know the results; my notebook will support you with the implementations.
Great! Now you know how to do classification using Regression.
Resources
- Complete Colab Python notebook.
- Red Wine Quality dataset.
This dataset¹ is released under the Database Contents License (DbCL) v1.0 and is publicly available in the UCI Machine Learning Repository and on Kaggle.
[1] P. Cortez (University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez), A. Cerdeira, F. Almeida, T. Matos and J. Reis (Viticulture Commission of the Vinho Verde Region (CVRVV), Porto, Portugal), 2009.
Conclusion
Today we learned how to use regression techniques to solve a classification problem. Specifically, we covered the main intuitions and core concepts behind logistic regression and softmax regression, and we implemented both models with scikit-learn.
Thanks for going through this article. I hope it helps you. Feel free to leave a message with your valuable comments and suggestions. Follow me to get the latest articles on Medium. Stay Safe! Happy Learning! ❤️