In this article, I will cover the four main processes in machine learning (ML) modeling that you should know thoroughly as a data practitioner.
Machine learning is a branch of Artificial Intelligence that mimics the human ability to learn by uncovering data patterns, that is, relationships between features and the target variable. Features are independent variables that represent attributes of a given observation or data point. On the other hand, a target variable is a dependent variable we are interested in modeling to make predictions.
ML modeling is an important step in the data science project life cycle and is one of the most interesting pieces of the project.
In a previous article, I discussed the main components of ML and provided an additional introduction to ML modeling. The link to the article can be found here.
4 Key Processes in ML Modeling
Now, let’s delve into the four main processes in ML modeling.
Training
This is the process of fitting an ML algorithm to the data so that it learns the patterns; the result is a model. The choice of algorithm may also be influenced by how long it takes to train on the available computing power.
The training process is typically conducted for a baseline model as a benchmark for the project before further experimentation is performed. The baseline model may be a simple algorithm such as linear regression or a random forest algorithm with default settings. The choice of a baseline model largely depends on the problem and the experience of the data practitioner.
In scikit-learn, and in libraries that follow its API, training is performed by calling the fit method of the algorithm.
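To make the idea of a baseline concrete, here is a minimal sketch of training a linear regression model with scikit-learn. The synthetic dataset from make_regression and the 80/20 split are assumptions for illustration only; a real project would load its own data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real project dataset (an assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold out 20% of the rows so the model can be assessed on unseen data later.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Training: fit() learns one coefficient per feature from the training data.
baseline = LinearRegression()
baseline.fit(X_train, y_train)

print(baseline.coef_)
print(baseline.intercept_)
```

Any later experiment can then be compared against this baseline on the same split.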
Below are common training terminologies:
Serial training: This type of training is performed on a single processor and is widely used for small to medium training jobs.
Distributed training: Here, the workload to fit an algorithm is split up and shared among multiple processors or machines. This is a form of parallel computing and it helps to speed up the process. More details can be found here.
Offline learning: In this case, the training is conducted periodically on all available data, and the model is deployed to production only if performance is satisfactory.
Online learning: Here, the model weights and parameters are updated continuously, in real time, as new data becomes available.
A detailed comparison between online and offline learning can be found here.
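The difference between the two learning modes can be sketched in code. The example below illustrates online learning with scikit-learn's SGDRegressor, whose partial_fit method updates the model one batch at a time. The simulated stream, the batch size, and the true_w weights are all hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])  # hypothetical weights behind the stream

model = SGDRegressor(random_state=0)

# Simulate a data stream: each loop iteration is a new batch arriving.
n_batches = 50
for _ in range(n_batches):
    X_batch = rng.normal(size=(20, 3))
    y_batch = X_batch @ true_w + rng.normal(scale=0.1, size=20)
    # partial_fit updates the existing weights instead of retraining from scratch.
    model.partial_fit(X_batch, y_batch)

print(model.coef_)
```

In offline learning, by contrast, all the batches would be collected first and passed to a single fit call.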
Tuning
This is the process of selecting the set of hyper-parameters that gives the best model. It is often the most time-consuming process in ML modeling because it involves creating several models with different sets of hyper-parameter values. Relevant metrics such as root mean square error (RMSE), mean absolute error (MAE), and accuracy may be used to choose the best model.
One common pitfall to avoid during tuning is the use of the test set for this process. Rather, a separate validation set should be created and used for this purpose. Even better, methods such as cross-validation can be employed to reduce the risk of overfitting to a single validation split.
There are some easy-to-use tools already implemented in Python that can be used for hyper-parameter optimization, namely GridSearchCV and RandomizedSearchCV in scikit-learn, and BayesSearchCV in scikit-optimize.
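As a rough illustration of tuning with cross-validation, the sketch below runs GridSearchCV over a small grid for a random forest classifier. The synthetic dataset, the grid values, and the choice of 3 folds are assumptions made to keep the example quick; they are not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for a real project dataset (an assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate hyper-parameter values: 2 x 2 = 4 combinations.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation on the training data only
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)
```

Note that the test set is never passed to the search; it stays untouched until the final evaluation.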
Prediction
Once the best model is chosen, it is used to make predictions on the test data and on other new datasets. Only the features are passed to the model as input; the target variable is not provided. This is also known as ML inference.
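A minimal sketch of inference with scikit-learn follows. The logistic regression model and the synthetic data are assumptions for illustration; X_new is a hypothetical name for the unseen rows, and its labels are deliberately never given to the model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data stands in for a real project dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# The first 150 rows are used for training.
X_train, y_train = X[:150], y[:150]
# The remaining 50 rows play the role of new data: features only, no target.
X_new = X[150:]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Inference: predict() returns class labels, predict_proba() class probabilities.
labels = model.predict(X_new)
probabilities = model.predict_proba(X_new)

print(labels[:5])
print(probabilities[:5].round(3))
```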
Evaluation
Model evaluation is the process of assessing the predictive performance of an ML model. The main idea is to quantify the quality of predictions from the model. The same metrics employed during hyper-parameter optimization may be used here and new ones may also be added for results presentation purposes.
More details about model evaluation including common metrics used in ML modeling can be found here.
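To show what the metrics mentioned in this article measure, here is a small sketch that computes MAE, RMSE, and accuracy directly with NumPy. The true and predicted values are made-up numbers for illustration, not results from any real model.

```python
import numpy as np

# Regression example: made-up true values and predictions.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean square error

# Classification example: made-up true and predicted labels.
labels_true = np.array([1, 0, 1, 1])
labels_pred = np.array([1, 0, 0, 1])
accuracy = np.mean(labels_true == labels_pred)  # fraction of correct labels

print(f"MAE={mae:.3f} RMSE={rmse:.3f} accuracy={accuracy:.2f}")
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging, which is one reason to report both.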
Conclusions
In this article, we covered the four main processes in Machine Learning modeling: training, tuning, prediction, and evaluation. Some helpful resource links were also provided as necessary.
I hope you enjoyed this article, until next time. Cheers!