ML Algorithms: One SD (σ) - Regression

An intro to machine learning regression algorithms

Sagi Shaier
Towards Data Science

--

The obvious questions to ask when facing a wide variety of machine learning algorithms are: “Which algorithm is better for a specific task, and which one should I use?”

The answers to these questions vary depending on several factors, including: (1) the size, quality, and nature of the data; (2) the available computational time; (3) the urgency of the task; and (4) what you want to do with the data.

This is one section of a series covering the many algorithms I wrote about in a previous article.
In this part I try to present and briefly explain, as simply as possible, the main algorithms (though not all of them) that are available for regression tasks.

Regression Algorithms:

Regression analysis is a predictive modelling technique that investigates the relationship between a dependent variable (target) and one or more independent variables (predictors). It can be used for time series modelling, forecasting, and exploring causal relationships between variables. For example, you can use it to examine the relationship between hasty driving and the number of road accidents a driver has.

Regression analysis has several benefits:

It tells us whether there are significant relationships between the dependent variable and the independent variables.

It tells us the strength of the impact of multiple independent variables on a dependent variable. By multiple independent variables I mean several Xs; for example, in “The Effects of Temporal Delay and Orientation on Haptic Object Recognition” we have temporal delay and orientation as the Xs and object recognition as our Y.

· Ordinary Least Squares Regression (OLSR)
A method in linear regression for estimating the unknown parameters by creating a model that minimizes the sum of the squared errors between the observed values and the values predicted by the model.
Basically, it is a method to calculate the coefficient (β) of each Xi:
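A standard way to write the OLS objective and its closed-form solution (assuming the design matrix X has full column rank) is:

```latex
\hat{\beta} \;=\; \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^{2}
            \;=\; (X^{\top}X)^{-1} X^{\top} y
```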

The β coefficients are found by minimizing the errors, hence the name “least squares” regression. The deviations are squared before being summed, so positive and negative errors do not cancel each other out.

OLSR has some limitations: redundant information, i.e., a linear association between two explanatory variables (aka collinearity), can lead to misinterpretation of the coefficients, and the method needs more observations than X variables. To overcome these issues you can use PCR (principal component regression).
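As a minimal sketch (with purely hypothetical synthetic data), the β coefficients can be computed directly with NumPy’s least-squares solver:

```python
import numpy as np

# Hypothetical data generated as y = 2 + 3*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Add an intercept column, then solve the least-squares problem for beta
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)  # should be close to [2, 3, -1]
```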

· Linear Regression
Used to estimate real values (cost of houses, number of calls, total sales, etc.) based on continuous variables.

Some things to consider:

There must be a linear relationship between the independent and dependent variables.

Multiple regression suffers from multicollinearity (multicollinearity in a multiple regression model means that two or more explanatory variables are highly linearly related).

Linear regression is very sensitive to outliers.

Linear regression is a parametric regression, meaning it assumes that the form of the relationship between the dependent and independent variables is known in advance (e.g., that it is linear).

You can evaluate model performance using the R-squared (R²) metric: the percentage of the variation in the response variable that is explained by the linear model.
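A minimal sketch of fitting a linear regression and checking R² with scikit-learn (the synthetic data here is purely illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data with a linear relationship plus noise
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))  # score() returns R-squared
```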

· Logistic Regression
Used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variables.

Some things to consider:

It is used for classification problems.

It doesn’t require a linear relationship between the dependent and independent variables.
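A minimal sketch using scikit-learn’s LogisticRegression (the synthetic classification data is purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical binary classification data
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
print("Predicted class probabilities:", clf.predict_proba(X_test[:3]))
```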

· Stepwise Regression
Used when we deal with multiple independent variables. It adds features to your model one by one until it finds an optimal score for your feature set. Stepwise selection alternates between forward and backward steps, bringing in and removing variables that meet the criteria for entry or removal, until a stable set of variables is attained.

Some things to consider:

It uses statistical values like R-squared, t-statistics, and the AIC metric to discern significant variables.
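A minimal forward-selection sketch using scikit-learn’s SequentialFeatureSelector (available in scikit-learn 0.24+). Note that it scores candidate feature sets with cross-validation rather than t-statistics or AIC, so treat it as an approximation of classical stepwise selection:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Hypothetical data with 10 candidate features, only a few of them informative
X, y = make_regression(n_samples=300, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Greedily add features one at a time (forward selection) until 3 are chosen
selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=3,
                                     direction="forward")
selector.fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))
```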

· Multivariate Adaptive Regression Splines (MARS)

A flexible regression modeling technique for high-dimensional data that searches for interactions and non-linear relationships that help maximize predictive accuracy.

This algorithm is inherently nonlinear (meaning that you don’t need to adapt your model to nonlinear patterns in the data by manually adding model terms such as squared terms or interaction effects). MARS is a nonparametric regression: it does not make any assumption about how the dependent variable is related to the predictors. Instead, it allows the regression function to be “driven” directly by the data. MARS constructs the relationship between the dependent and independent variables from a set of coefficients and so-called basis functions that are entirely determined from the regression data.

Some things to consider:

MARS is quite popular in the area of data mining because it does not assume any particular type or class of relationship (e.g., linear, logistic, etc.) between the predictor variables and the dependent (outcome) variable of interest.

MARS may be useful if you face complex non-linear relationships between predictors and the target, especially in high dimensions.

Both continuous and categorical predictors can be used with MARS. However, the basic MARS algorithm assumes that the predictor variables are continuous in nature.

Because MARS can handle multiple dependent variables, it is easy to apply the algorithm to classification problems as well.

MARS tends to overfit the data. To overcome this problem, MARS uses a pruning technique (similar to pruning in classification trees) to limit the complexity of the model by reducing the number of its basis functions. The selection and pruning of basis functions make this method a very powerful tool for predictor selection. Basically, the algorithm will pick up only those basis functions (and those predictor variables) that make a “sizeable” contribution to the prediction.

MARS is particularly useful in situations where regression-tree models are also appropriate, i.e., where hierarchically organized successive splits on the predictor variables yield accurate predictions.

You should think of MARS as a generalization of regression trees, in which the “hard” binary splits are replaced by “smooth” basis functions, rather than as a generalization of multiple regression.
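A minimal sketch using the third-party py-earth package (this is an assumption on my part: py-earth is not part of scikit-learn, and the exact import name can vary between forks):

```python
import numpy as np
from pyearth import Earth  # third-party MARS implementation (assumed installed)

# Hypothetical data with a non-linear relationship
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Fit a MARS model: hinge basis functions are selected from the data,
# then pruned to limit model complexity
model = Earth()
model.fit(X, y)
print(model.summary())  # lists the retained basis functions and coefficients
```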

· Locally Estimated Scatterplot Smoothing (LOESS)
A method for fitting a smooth curve between two variables, or a smooth surface between an outcome and up to four predictor variables. Basically, it’s a tool used in regression analysis that draws a smooth line through a scatter plot to help you see the relationship between variables and foresee trends.

The idea is that even if your data is not linearly distributed, you can still apply the idea of regression; this is called locally weighted regression. You can apply LOESS when the relationship between the independent and dependent variables is non-linear.

Today, most algorithms (like classical feedforward neural networks, support vector machines, nearest neighbor algorithms, etc.) are global learning systems that minimize a global loss function (e.g., the sum of squared errors). In contrast, local learning systems divide the global learning problem into multiple smaller/simpler learning problems, usually by dividing the cost function into multiple independent local cost functions. One of the disadvantages of global methods is that sometimes no single set of parameter values can provide a sufficiently good approximation. LOESS offers an alternative to global function approximation.

Some things to consider:

LOESS is typically used for fitting a line to a scatter plot where noisy data values, sparse data points or weak interrelationships interfere with your ability to see a line of best fit.
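A minimal sketch using the LOWESS smoother from statsmodels (LOWESS is the closely related locally weighted scatterplot smoother; the assumption here is that it is an adequate stand-in for LOESS with a single predictor):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Hypothetical noisy non-linear data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=200)

# frac controls the span: the fraction of the data used for each local fit
smoothed = lowess(y, x, frac=0.2)  # returns an array of (x, fitted y) pairs
print(smoothed[:5])
```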

So how do I choose which one to use?

Data exploration should be your first step before selecting the right model (to identify the relationships and impact of the variables).

To compare how good the models are, you can use different metrics like the statistical significance of parameters, R-squared, adjusted R-squared, AIC, BIC, and the error term.

Cross-validation is the best way to evaluate models used for prediction: divide your data into training and validation sets. A simple mean squared difference between the observed and predicted values gives you a measure of prediction accuracy.
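A minimal cross-validation sketch with scikit-learn (the dataset is a hypothetical stand-in for your own data):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical regression data
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# 5-fold cross-validated mean squared error (scikit-learn reports it negated)
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=5)
print("Mean squared error per fold:", -scores)
```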

Also keep in mind that regularization methods like Lasso, Ridge, and Elastic Net (keep reading) work well in the case of high dimensionality and multicollinearity among the variables in the data.

If you’re interested in more of my work you can check out my Github, my scholar page, or my website
