What is Predictive Modeling?
Predictive modeling is the process of taking known results and developing a model that can predict values for new occurrences; it uses historical data to predict future events. There are many types of predictive modeling techniques, including ANOVA, linear regression (ordinary least squares), logistic regression, ridge regression, time series, decision trees, neural networks, and many more. Selecting the correct technique at the start of your project can save a lot of time, while choosing the wrong one can result in inaccurate predictions and residual plots whose variance and/or mean are not constant.
Regression Analysis
Regression analysis is used to predict a continuous target variable from one or multiple independent variables. Typically, regression analysis is used with naturally occurring variables rather than variables that have been manipulated through experimentation. As stated above, there are many different types of regression, so once we’ve decided regression analysis should be used, how do we choose which technique to apply?
ANOVA

ANOVA, or analysis of variance, is used when the target variable is continuous and the independent variables are categorical. The null hypothesis in this analysis is that there is no significant difference between the group means. The populations should be normally distributed, the sample cases should be independent of each other, and the variance should be approximately equal among the groups.
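As a minimal sketch, a one-way ANOVA can be run with SciPy’s `f_oneway`; the three groups below are made-up measurements for illustration:

```python
from scipy import stats

# Hypothetical measurements for three categorical groups.
# Null hypothesis: all three group means are equal.
group_a = [23.1, 24.5, 22.8, 25.0, 23.9]
group_b = [26.4, 27.1, 25.8, 26.9, 27.5]
group_c = [23.5, 24.0, 22.9, 24.3, 23.8]

# One-way ANOVA returns the F statistic and the p-value.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
```

A small p-value (commonly below 0.05) suggests at least one group mean differs from the others; here group_b’s values are visibly higher, so we would expect to reject the null hypothesis.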
Linear Regression

Linear regression is used when the target (dependent) variable is continuous, the independent variable(s) are continuous or a mixture of continuous and categorical, and the relationship between the target and the independent variables is linear. Furthermore, the residuals should be normally distributed with constant variance, and the independent variables should demonstrate little to no multicollinearity or autocorrelation with one another.
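A minimal sketch with scikit-learn, fitting ordinary least squares to synthetic data generated from a known linear relationship (y ≈ 3x + 2 plus noise, values chosen purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a linear relationship y = 3x + 2 with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=50)

model = LinearRegression().fit(X, y)

# The fitted slope and intercept should recover roughly 3 and 2.
slope, intercept = model.coef_[0], model.intercept_
```

Because the data were generated to satisfy the assumptions above (linear relationship, constant-variance normal residuals), the fitted coefficients land close to the true values.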
Logistic Regression

Logistic regression does not require a linear relationship between the target and the independent variable(s). The target variable is binary (it assumes a value of either 0 or 1) or dichotomous. The errors/residuals need not be normally distributed, and their variance does not need to be constant. However, the observations must be independent of each other, there must be little to no multicollinearity or autocorrelation in the data, and the sample size should be large. Lastly, while this analysis does not require the target and independent variable(s) to be linearly related, the independent variables must be linearly related to the log odds.
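A short sketch with scikit-learn on synthetic data, where a binary label mostly flips from 0 to 1 as a single feature crosses a threshold of 5 (the data-generating rule is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary target: y = 1 when the feature (plus noise) exceeds 5,
# so the classes overlap near the threshold.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 1, size=200) > 5).astype(int)

clf = LogisticRegression().fit(X, y)

accuracy = clf.score(X, y)                     # training accuracy
prob_high = clf.predict_proba([[9.0]])[0, 1]   # P(y = 1) far above the threshold
```

Far from the decision boundary the predicted probability saturates toward 1, reflecting the linear relationship between the feature and the log odds.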

Ridge Regression

Ridge regression is a technique for analyzing multiple-regression data that suffer from multicollinearity. It starts from the ordinary least squares approach but, because multicollinearity inflates the variance of the least-squares estimates, adds a degree of bias to the regression estimates to reduce their standard errors. The assumptions follow those of multiple regression: the relationships must be linear, the variance must be constant with no outliers, and the observations must be independent.
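A sketch of the effect, assuming two nearly identical (and thus highly collinear) synthetic predictors: ordinary least squares can produce large, offsetting coefficients, while the ridge penalty shrinks them toward stable values that still sum to roughly the true effect.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic collinear predictors: x2 is an almost exact copy of x1.
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.01, size=100)   # near-perfect multicollinearity
X = np.column_stack([x1, x2])
y = 2 * x1 + rng.normal(0, 0.1, size=100)  # true combined effect is 2

# The alpha parameter controls the degree of bias (shrinkage).
ridge = Ridge(alpha=1.0).fit(X, y)

coef_sum = ridge.coef_.sum()            # close to the true combined effect, 2
largest_coef = np.abs(ridge.coef_).max()  # each coefficient stays moderate
```

With `alpha=1.0` the penalty splits the effect roughly evenly between the two collinear predictors instead of letting either estimate explode.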
Time Series

Time-series regression analysis is a method for predicting future responses based on response history. The data for a time series are a set of observations on the values a variable takes at different points in time; the data are bivariate and the independent variable is time. The series must be stationary, meaning its mean and variance are constant over long periods of time. Furthermore, the residuals should be normally distributed with a constant mean and variance over time, as well as uncorrelated, and the series should not contain any outliers. If random shocks are present, they should indeed be randomly distributed with a mean of 0 and a constant variance.
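A sketch of these ideas with NumPy only: we simulate a stationary AR(1) series driven by mean-zero, constant-variance shocks, run a crude stationarity check (do the two halves of the series have similar mean and variance?), and recover the autoregressive coefficient by least squares on lagged values. All numbers are illustrative assumptions.

```python
import numpy as np

# Simulated stationary AR(1) series: x[t] = 0.5 * x[t-1] + shock[t],
# with shocks that are random, mean 0, and constant variance.
rng = np.random.default_rng(3)
n = 2000
shocks = rng.normal(0, 1, size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + shocks[t]

# Crude stationarity check: mean and variance of each half should agree.
first, second = x[: n // 2], x[n // 2:]
mean_gap = abs(first.mean() - second.mean())
var_ratio = first.var() / second.var()

# Estimate the AR(1) coefficient by regressing x[t] on x[t-1].
phi = np.polyfit(x[:-1], x[1:], 1)[0]   # should be near the true 0.5
```

A trending or variance-changing series would fail this check and need differencing or another transformation before modeling.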
Classification Analysis
Decision Trees

Decision trees are a type of supervised learning algorithm that repeatedly splits the sample based on certain questions about its features. They are very useful for classification problems, relatively easy to understand, and very effective. Decision trees represent several decisions followed by different chances of occurrence, and the technique helps us identify the most significant variables and the relationships between two or more of them.
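A minimal classification sketch with scikit-learn on the bundled iris dataset; the depth limit of 3 is an illustrative choice to keep the tree interpretable:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# The tree repeatedly splits the samples on feature thresholds
# until the leaves are (mostly) a single class.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

accuracy = tree.score(X, y)                 # training accuracy
importances = tree.feature_importances_     # which variables drove the splits
```

The `feature_importances_` attribute (which sums to 1) is one way the technique surfaces the most significant variables, as described above.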
Neural Networks
Neural networks help to cluster and classify data. These algorithms are modeled loosely on the human brain and are designed to recognize patterns. Neural networks tend to be very complex, as they are composed of many interconnected layers of simple processing units. This type of analysis can be very useful; however, if you are trying to determine why something happened, it may not be the best model to use.
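A small classification sketch using scikit-learn's `MLPClassifier` on the bundled handwritten-digits dataset; the single hidden layer of 32 units and the iteration cap are illustrative assumptions, not tuned choices:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Handwritten digits: 8x8 pixel images flattened into 64 features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A network with one hidden layer of 32 units.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_train, y_train)

accuracy = net.score(X_test, y_test)   # held-out accuracy
```

Even this tiny network classifies well, but note that inspecting its learned weights tells us little about *why* a given digit was assigned its label, illustrating the interpretability caveat above.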

In conclusion, these are just a handful of the many predictive techniques that can be used to model data. It should be noted that drawing causal conclusions from predictive analysis is very dangerous: we cannot state that one variable caused another, only that one variable is associated with another and what the strength of that association is.
Let’s connect:
https://www.linkedin.com/in/mackenzie-mitchell-635378101/
Resources:
https://www.statisticssolutions.com/manova-analysis-anova/
https://skymind.ai/wiki/neural-network
https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Ridge_Regression.pdf
https://www.analyticsvidhya.com/blog/2015/01/decision-tree-simplified/2/