The world’s leading publication for data science, AI, and ML professionals.

The Difference Between Classification and Regression in Machine Learning

The Problem of Function Approximation

Photo by Green Chameleon on Unsplash

Differentiating between regression and classification algorithms can be challenging at the beginning of your machine learning career. After nearly four years in the field, it’s easy to forget how hard it was to learn machine learning. So much jargon gets thrown around that it’s difficult to find your bearings when you’re just starting out.

A vital part of machine learning is recognizing whether a task is a regression or a classification problem. Making this distinction gives practitioners clearer insight into which machine learning algorithms may be most suitable for the problem, since some models are more useful for classification than for regression – and vice versa.

What the two tasks have in common is that both are forms of supervised learning.

Supervised Learning

Both regression and classification problems fall into the supervised learning category. Each task involves developing a model that learns from historical data, which enables it to make predictions on new instances for which we do not have answers. More formally, all supervised learning tasks involve learning a function that maps an input to an output based on example input-output pairs.

The objective of a supervised learning problem is to approximate the mapping function (f) from the input features (X) to the output (y); in mathematics, this is known as the problem of function approximation.
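Written as formulas, and assuming one common way of making "best" precise (choose, from a family of candidate functions H, the one with the lowest average loss L over the n training pairs; H, L and n are my notation, not the article's), the problem looks roughly like this:

```latex
y = f(X)
\qquad\text{and}\qquad
\hat{f} = \arg\min_{g \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i,\, g(x_i)\bigr)
```

Here (x_i, y_i) are the example input-output pairs and f̂ is the learned approximation of the unknown f. The two task types mainly differ in what kind of value y_i is and, as a consequence, which loss L makes sense (for instance squared error for real numbers versus cross-entropy for class labels).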

Regardless of whether we are attempting to solve a classification or regression problem, whenever we develop a learning algorithm for a supervised learning problem, the job of the algorithm is to find the best mapping function given the available resources.
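As a minimal illustration of learning an approximation of f from example pairs, the sketch below fits a straight line by ordinary least squares. The choice of a straight line, the five data points, and the helper names fit_line and f_hat are all hypothetical, picked purely for demonstration; real learning algorithms search far richer families of functions.

```python
# Toy function approximation: learn f_hat from (x, y) pairs by ordinary least squares.
# The data and the straight-line model are assumptions made for illustration only.


def fit_line(xs, ys):
    """Return (slope, intercept) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # OLS slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical input-output pairs, built to roughly follow y = 2x + 1 plus noise.
X_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(X_train, y_train)


def f_hat(x):
    """The learned approximation of the unknown mapping f."""
    return slope * x + intercept


print(slope, intercept)
print(f_hat(5.0))  # prediction for an input that was not in the training pairs
```

The fitted slope and intercept come out close to 2 and 1, the values the toy data were built around, and f_hat can then produce outputs for inputs it never saw, such as x = 5.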

The Difference – Classification vs Regression

Despite the similarity in their overall goal (learning a mapping from inputs to outputs based on example input-output pairs), classification and regression problems are different.

In classification problems, our learning algorithm learns a function that maps inputs to outputs where the output value is a discrete class label (e.g. cat or dog, malignant or benign, spam or non-spam). Popular classification algorithms include:

In contrast, regression problems are concerned with mapping inputs to outputs where the output is a continuous real number (e.g. the price of a house). Popular regression algorithms include:

Note: Learn more about Ridge and Lasso regression in Fighting overfitting with L1 or L2 Regularization
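To make the output distinction concrete, here is a minimal sketch of two hand-written "models" that mirror the examples above: one predicts a house price (a continuous number), the other labels an email as spam or non-spam (a discrete label). The function names, the num_links feature, and every parameter value are hypothetical and set by hand rather than learned. The sigmoid-then-threshold step is the pattern logistic regression uses to turn a score into a class.

```python
import math

# Hand-picked, hypothetical parameters; a real model would learn them from data.


def predict_house_price(size_sqm: float) -> float:
    """Regression: the output is a continuous real number (a price)."""
    return 50_000.0 + 1_500.0 * size_sqm


def predict_spam(num_links: int) -> str:
    """Classification: the output is one label from a fixed, discrete set."""
    score = 0.9 * num_links - 4.0               # linear score, can be any real number
    prob_spam = 1.0 / (1.0 + math.exp(-score))  # sigmoid squashes it into (0, 1)
    return "spam" if prob_spam >= 0.5 else "non-spam"


print(predict_house_price(82.5))         # 173750.0
print(predict_spam(1), predict_spam(8))  # non-spam spam
```

Both functions compute a number internally; what makes the second one a classifier is that its final output is restricted to a discrete set of labels.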

Some algorithms are used exclusively for regression-style problems, such as Linear Regression, while others are used exclusively for classification tasks, such as Logistic Regression (which, despite its name, is a classification algorithm). However, some algorithms can handle both types of task once small modifications are made to them, such as Decision Trees, Random Forest, XGBoost, and Neural Networks.
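The "small modification" for tree-based models can be sketched with a one-split decision tree (a decision stump). This is a simplified illustration under my own assumptions, not how scikit-learn or XGBoost actually implement trees: the split search is identical for both tasks, and only two pieces change, namely the cost used to score a split (Gini impurity for labels, squared error for numbers) and what a leaf predicts (majority label versus mean). fit_stump and the toy house data are hypothetical.

```python
from statistics import mean


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def weighted_gini(labels):
    """Gini impurity scaled by node size, so the two children can be summed."""
    return len(labels) * gini(labels)


def sse(values):
    """Sum of squared errors around the node mean (regression criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def majority(labels):
    """Most frequent label in a node (ties go to the first one seen)."""
    return max(labels, key=labels.count)


def fit_stump(xs, ys, task):
    """Fit a one-split decision tree on a single numeric feature.

    The split search is shared; only the split cost and the leaf prediction change:
      classification -> weighted Gini impurity, leaf predicts the majority label
      regression     -> sum of squared errors, leaf predicts the mean value
    """
    if task == "classification":
        cost, leaf = weighted_gini, majority
    elif task == "regression":
        cost, leaf = sse, mean
    else:
        raise ValueError(f"unknown task: {task!r}")

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = cost(left) + cost(right)
        if best is None or total < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (total, threshold, leaf(left), leaf(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")

    _, threshold, left_pred, right_pred = best

    def predict(x):
        return left_pred if x <= threshold else right_pred

    return predict


# Made-up toy data: house size (m²), a size label, and a price (in thousands).
sizes = [50, 60, 70, 120, 130, 140]
labels = ["small", "small", "small", "large", "large", "large"]
prices = [150.0, 165.0, 180.0, 320.0, 335.0, 350.0]

clf = fit_stump(sizes, labels, task="classification")
reg = fit_stump(sizes, prices, task="regression")
print(clf(65), clf(125))  # small large
print(reg(65), reg(125))  # 165.0 335.0
```

Both stumps pick the same split (between 70 and 120 m²), yet one returns a class label and the other a continuous value. Full trees, forests and boosted ensembles repeat this idea recursively with many refinements.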

Final Thoughts

Essentially, we determine whether a task is a classification or a regression problem by looking at the output. Regression tasks are concerned with predicting a continuous value, whereas classification tasks are concerned with predicting discrete values. The way we evaluate each type of problem is also different. For example, mean squared error is a useful metric for regression tasks, but it wouldn’t work for a classification task. Similarly, accuracy wouldn’t be an effective metric for regression tasks, but it can be useful for classification tasks.
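A short sketch shows why the metrics don't transfer. The two helpers below are hand-written for illustration (scikit-learn offers equivalents in sklearn.metrics as mean_squared_error and accuracy_score), and all the "true" and "predicted" values are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference; meaningful for continuous targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of exact matches; meaningful for discrete labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical regression predictions: all close, none exactly right.
prices_true = [200.0, 310.0, 150.0]
prices_pred = [205.0, 300.0, 152.0]
print(mean_squared_error(prices_true, prices_pred))  # 43.0
print(accuracy(prices_true, prices_pred))            # 0.0, despite good predictions

# Hypothetical classification predictions.
labels_true = ["spam", "non-spam", "spam", "non-spam"]
labels_pred = ["spam", "non-spam", "non-spam", "non-spam"]
print(accuracy(labels_true, labels_pred))            # 0.75
```

Accuracy scores the close price predictions as 0 because almost no real-valued prediction matches its target exactly, while calling mean_squared_error on the string labels would simply raise a TypeError, since labels like "spam" cannot be subtracted.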

Thanks for Reading!

If you enjoyed this article, connect with me by subscribing to my FREE weekly newsletter. Never miss a post I make about Artificial Intelligence, Data Science, and Freelancing.

Related Articles

The Difference Between Data Scientists and ML Engineers

How To Become A Machine Learning Engineer

The Day To Day Of A Machine Learning Engineer

