Today, in the era of neural networks, you might not have much time to go through machine learning techniques in detail. Still, you might want to know how machine learning algorithms work, which algorithms are the most popular, and what the critical differences between them are.
In this article, I will introduce different machine learning algorithms along with their uses, their key factors, and their limitations. We will also look at some of the essential metrics that help validate these algorithms.
Linear Regression
In Linear Regression, we build a model that explains the relationship between the dependent variable (Y) and the independent variable (X). Y should follow a normal distribution, and there should be a linear relationship between X and Y.
Loss functions used: RMSE (Root Mean Square Error), MSE (Mean Square Error), MAPE (Mean Absolute Percentage Error), MAE (Mean Absolute Error)

Error Calculation: We use the sum of squared errors (residuals) to measure the error, and the main goal is to minimize this sum. We do it using Gradient Descent, which plots the sum of squared residuals on the Y-axis against a model parameter such as the intercept on the X-axis and steps toward the point where the sum of squared errors is minimal.
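Below is a minimal sketch of these ideas using scikit-learn and synthetic data. Note that `LinearRegression` minimizes the sum of squared residuals in closed form rather than by gradient descent, but the fitted line and the error metrics are the same ones discussed above; the data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Synthetic data: y has a roughly linear relationship with X, plus noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + 5.0 + rng.normal(0, 1.0, size=100)

model = LinearRegression()
model.fit(X, y)              # minimizes the sum of squared residuals
y_pred = model.predict(X)

mse = mean_squared_error(y, y_pred)              # MSE
rmse = np.sqrt(mse)                              # RMSE
mae = mean_absolute_error(y, y_pred)             # MAE
mape = np.mean(np.abs((y - y_pred) / y)) * 100   # MAPE (%)
print(f"RMSE={rmse:.3f}  MSE={mse:.3f}  MAE={mae:.3f}  MAPE={mape:.2f}%")
```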
Logistic Regression
This algorithm is an extension of linear regression that we use for classification problems. Here, the dependent variable (Y) does not follow a normal distribution, and it does not have a linear relationship with the X variables. The first step is to transform Y, which we do using a link function. In the case of binary classification, the link function is called the "logit". In practice, a large share of classification problems are binary.
Metrics Used for Validation: Precision, Recall, Sensitivity, Specificity.
To choose the threshold that separates the classes, we use the ROC curve (Receiver Operating Characteristic) and the AUC (Area Under the Curve). The ROC curve plots Sensitivity against 1 - Specificity across different threshold values, and whichever model encloses the largest area under its curve is the one we select for our data.
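Here is a small sketch of this workflow with scikit-learn: fitting a logistic regression on synthetic binary data, scoring with precision and recall at the default 0.5 threshold, and computing the AUC from the predicted probabilities. The dataset and parameters are illustrative assumptions, not from the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, roc_auc_score, roc_curve

# Synthetic binary classification data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]   # predicted probability of class 1
preds = (probs >= 0.5).astype(int)        # default 0.5 threshold

print("Precision:", precision_score(y_test, preds))
print("Recall   :", recall_score(y_test, preds))
print("AUC      :", roc_auc_score(y_test, probs))

# ROC curve: Sensitivity (TPR) vs. 1 - Specificity (FPR) across thresholds
fpr, tpr, thresholds = roc_curve(y_test, probs)
```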

Decision Tree
When the data gets complex, linear models fail to separate it into different classes, which introduces the need for non-linearity. A decision tree gives us a way to introduce non-linearity by splitting the data based on a series of decisions. We can use decision trees for regression, classification, and filling in missing values. The advantage of the decision tree is that it is easy to implement and explain. But as the number of conditions grows, the model becomes more complex, which introduces high variance and leads to over-fitting.
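A quick way to see this over-fitting behaviour is to compare an unconstrained tree against a depth-limited one. This is only a sketch on synthetic data; the depth of 3 is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A deep, unconstrained tree tends to over-fit; limiting depth reduces variance
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("Unconstrained tree train/test:",
      deep_tree.score(X_train, y_train), deep_tree.score(X_test, y_test))
print("Depth-limited tree train/test:",
      shallow_tree.score(X_train, y_train), shallow_tree.score(X_test, y_test))
```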

Bagging
When decision trees started facing the over-fitting issue, the concept of ensembling emerged. Bagging helps overcome over-fitting by making the model less complicated. Here, we use bootstrap samples, where we draw sample data from the dataset with replacement, which means the same sample may be drawn again.

We take bootstrap samples (with replacement) of the dataset with the same set of columns, build a different decision tree on each bootstrap sample, and take a consolidated output. Consolidating the results lowers the variance of the model, but the model becomes much harder to explain.
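A minimal sketch of bagging with scikit-learn is shown below. `BaggingClassifier` uses a decision tree as its default base estimator and draws a bootstrap sample for each of the trees; the number of estimators here is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each trained on a bootstrap sample (drawn with replacement);
# the consolidated output is the majority vote of all trees
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)

print("Bagged trees test accuracy:", bag.score(X_test, y_test))
```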
Random Forest
In bagging, we were taking sample data with the same set of columns, which leads to almost identical models: the root node is the same, and the splits follow accordingly. In a Random Forest, we take random samples with a random subset of columns, which leads to different root nodes across the models and ultimately to better accuracy. Random Forest is very useful when our model is over-fitting, and this approach also performs well in competitions.
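The sketch below shows the same synthetic setup with a random forest. The `max_features="sqrt"` setting is what restricts each split to a random subset of columns; the other hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree sees a bootstrap sample AND only a random subset of features at
# every split (max_features), so the trees become decorrelated
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)

print("Random forest test accuracy:", rf.score(X_test, y_test))
```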
Adaboost
When our model has an under-fitting issue, we use boosting models, which are sequential models. The boosting approach gives higher priority to the samples that were predicted incorrectly, thereby boosting the output for those samples. Adaboost (Adaptive Boosting) is one such boosting mechanism; it is based on decision trees, and here we give higher weight to the samples that received a wrong prediction.
Final Model = W1·M1 + W2·M2 + W3·M3 + ... + Wn·Mn

Here, M1 is the first decision tree, and W1 is the weight associated with M1. Each model helps correct a set of sample predictions. Finally, we combine all the models to get the final predicted values for our samples.
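A small sketch with scikit-learn's `AdaBoostClassifier` is shown below. Its default base estimator is a depth-1 decision tree; after fitting, the per-model weights (the W1..Wn above) are exposed as `estimator_weights_`. The data and number of estimators are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow trees are fit sequentially; misclassified samples get higher weight
# in the next round, and each tree Mi receives a model weight Wi
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("AdaBoost test accuracy:", ada.score(X_test, y_test))
print("First few model weights W1..:", ada.estimator_weights_[:5])
```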
Gradient Boost
Gradient Boosting is an improved mechanism over Adaboost. Here, we try to boost the model's accuracy by decreasing the error. The first model makes predictions and leaves behind residuals (errors), and the next model attempts to explain those residuals. In this way, the error keeps reducing with each subsequent model in the sequence. Gradient boosting takes care of under-fitting issues in your model.
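The sketch below illustrates this "fit the residuals" idea on a synthetic regression problem. `staged_predict` lets us watch the test error shrink as trees are added one after another; the learning rate and tree count are illustrative choices, not prescribed values.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree is fit to the residuals of the current ensemble,
# so the error shrinks as more trees are added
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, random_state=0)
gbr.fit(X_train, y_train)

# Test error after each boosting stage
errors = [mean_squared_error(y_test, pred) for pred in gbr.staged_predict(X_test)]
print("MSE after 1 tree   :", round(errors[0], 2))
print("MSE after all trees:", round(errors[-1], 2))
```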
XGBoost
XGBoost is an all-rounder: it takes care of both under-fitting and over-fitting issues. It also offers many other features, such as missing-data handling, a built-in cross-validation mechanism, and regularization.
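Here is a brief sketch using the scikit-learn-style `XGBClassifier` wrapper, assuming the separate xgboost package is installed. The specific hyperparameter values are illustrative; the point is that regularization and depth limits counter over-fitting while more boosting rounds counter under-fitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # assumes the xgboost package is installed

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# reg_lambda and max_depth guard against over-fitting, while adding boosting
# rounds addresses under-fitting; missing values in X are handled automatically
xgb = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, reg_lambda=1.0)
xgb.fit(X_train, y_train)

print("XGBoost test accuracy:", xgb.score(X_test, y_test))
```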
In Conclusion
Finally, we have covered the concepts behind these powerful machine learning algorithms. We observed that Random Forest is a good fit for over-fitting issues and gradient boosting for under-fitting issues. And, at last, we introduced XGBoost, which takes care of both under-fitting and over-fitting issues.
If you are enthusiastic about Deep Learning techniques, I have some excellent recent resources that might give you better insight into Deep Learning capabilities.
- Build & Deploy Custom Object Detection Model
- Increase Frame Per Second in Object Detection
- Data Augmentation in Object Detection
- Data Labeling for Object Detection
- Language Translation Model Deep Learning
- Tips for Azure Data Scientist Certification DP-100
Keep exploring Data Science!