
Life Insurance Risk Prediction using Machine Learning Algorithms- Part II: Algorithms and Results

Algorithmic Risk Prediction for Life Insurance Applications through supervised learning algorithms – By Bharat, Dylan, Leonie and Mingdao (Jack)

In Part 1, we described data pre-processing and dimensionality reduction for the Prudential Life Insurance Dataset. In this part, we describe the learning algorithms that we applied to the transformed dataset and the results that we obtained.

The link to the project GitHub repository is here.

Algorithms

We used four supervised learning algorithms on the dataset: Logistic Regression, Neural Networks, Random Tree, and REPTree. There are, of course, many other algorithms that could be used, including XGBoost and SVM, and one could also apply unsupervised learning algorithms such as clustering to look for hidden patterns in the features. For this project, however, we restricted ourselves to these four algorithms.

Before we discuss the results, here are the two measures that we used to evaluate the algorithms' performance:

Performance Metrics comparing Algorithm Performance

MAE and RMSE both measure how far the predicted values are from the actual values. Although this is a classification problem and strict classification error might seem a more appropriate measure, if we assume that the risk levels are ordinal, then MAE and RMSE are also acceptable measures for this dataset.
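For reference, both metrics can be computed in a couple of lines of Python; the values below are purely illustrative:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative values only: actual vs. predicted risk levels.
y_true = np.array([5, 7, 2, 8, 4])
y_pred = np.array([6, 7, 1, 6, 4])

mae = mean_absolute_error(y_true, y_pred)           # average absolute difference
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large errors more heavily
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```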

Logistic Regression

Logistic regression is a natural choice for a multi-class classification problem such as this one. We tuned the hyperparameters of Logistic Regression using grid search. The hyperparameters that we considered were the C value, which is the inverse of the regularization factor lambda, for which we used 5 values on a logarithmic scale ranging from 100 to 0.01, and the solver, for which we tried ‘newton-cg’, ‘lbfgs’ and ‘liblinear’. Grid search yielded an optimal C of 100 with the ‘newton-cg’ solver. The code snippet for hyperparameter tuning is shown below:
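(The sketch below is a minimal scikit-learn version of the search; X_train, y_train, and the scoring metric are placeholders, and the exact snippet is in the complete code linked below.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# 5 values of C on a logarithmic scale from 100 down to 0.01,
# and the three candidate solvers.
param_grid = {
    "C": np.logspace(2, -2, 5),                    # 100, 10, 1, 0.1, 0.01
    "solver": ["newton-cg", "lbfgs", "liblinear"],
}

grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),             # max_iter raised to help convergence
    param_grid,
    scoring="neg_mean_absolute_error",             # scoring choice is an assumption
    cv=5,
)
grid_search.fit(X_train, y_train)                  # X_train, y_train: transformed features and risk labels (placeholders)
print(grid_search.best_params_)                    # e.g. {'C': 100, 'solver': 'newton-cg'}
```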

The complete code for Logistic Regression is available here.

For the test data, we obtained the MAE and RMSE values shown in the table below. The CFS dataset had lower MAE and RMSE scores than the PCA dataset.

Results for Logistic Regression

An MAE of 1.5 means that, on average, the prediction from Logistic Regression will differ from the actual value by 1.5 levels. So if the actual risk rating is 5, the algorithm will typically produce a rating between 3.5 and 6.5, i.e. 4, 5, or 6.

Neural Networks

We chose a single hidden layer neural network, and its architecture is shown in the figure below:

Architecture of a Single Hidden Layer Neural Network¹

The complete code for Neural Networks is available here.

The hyperparameters that we used for the Neural Network are shown in the code snippet below:
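(The sketch below uses scikit-learn's MLPClassifier as one possible implementation; the library, the hidden layer size, and the other values are illustrative, and the exact settings are in the complete code linked above.)

```python
from sklearn.neural_network import MLPClassifier

# Single-hidden-layer network; hidden_layer_sizes=(64,) and the other values
# here are illustrative, not the exact settings used in the project.
mlp = MLPClassifier(
    hidden_layer_sizes=(64,),   # one hidden layer
    activation="relu",
    solver="adam",
    alpha=1e-4,                 # L2 regularization strength
    max_iter=500,
    random_state=42,
)
mlp.fit(X_train, y_train)       # X_train, y_train: transformed features and risk labels (placeholders)
y_pred = mlp.predict(X_test)    # X_test: held-out test features (placeholder)
```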

For the test data, we obtained the MAE and RMSE values shown in the table below. The CFS dataset again had lower MAE and RMSE scores than the PCA dataset.

Results for Neural Networks (Image by authors)

Random Tree and REPTree

Random Tree and REPTree are both decision trees, but they work in different ways, so some intuition about how each of them operates will be helpful. Here’s a diagram that illustrates the concepts:

Illustration of Random Tree² (Left) and REPTree (Right) (Image by authors)

On the left, we can see a decision tree. In a standard tree, each node is split using the best splitting feature among all variables. In a random tree, each node is split using the best feature among a subset of predictors chosen at random at that node. So, in essence, a Random Tree is a Decision Tree that selects features at random for splitting the data at each node.

On the right, we can see a REPTree, or Reduced Error Pruning Tree. This algorithm first constructs a complete tree using a standard decision tree construction method. It then uses a hold-out dataset to compare each parent node with its child nodes, working from the bottom up, to check whether there is any sub-tree in which the error of a child node is higher than the error of its parent node. In the figure, for example, the error of one of the child nodes in the red boxes is higher than the error of the parent node. In such a situation, REPTree deletes the child nodes and makes the parent node a leaf node.
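To make the two ideas concrete, here is a rough Python sketch using scikit-learn (our actual implementation, described next, is in R); the parameter values are illustrative, and scikit-learn's cost-complexity pruning only approximates REPTree's reduced-error pruning:

```python
from sklearn.tree import DecisionTreeClassifier

# Random-Tree-style tree: at each node, the best split is chosen from a random
# subset of the features rather than from all of them.
random_tree = DecisionTreeClassifier(max_features="sqrt", random_state=42)
random_tree.fit(X_train, y_train)   # X_train, y_train: transformed data (placeholders)

# Pruned tree: scikit-learn offers cost-complexity pruning (ccp_alpha) rather than
# REPTree's hold-out-based reduced-error pruning, but the goal of cutting back an
# overgrown tree to avoid overfitting is the same. The alpha value is illustrative.
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.001, random_state=42)
pruned_tree.fit(X_train, y_train)
```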

We implemented the REPTree algorithm in R using the rpart package, where the rpart function builds a single tree model and the prune.rpart function prunes the tree to avoid overfitting. We implemented the Random Tree algorithm in R using the ranger package, whose ranger function grows a random forest in which each split considers only a random subset of the attributes. The results of Random Tree and REPTree are shown in the table below:

Results for Random Tree (Left) and REPTree (Right)

The complete code for Random Tree and REPTree is available here.

Comparison of the Algorithms and Conclusion

The table comparing the results for the four algorithms for CFS and PCA is shown below:

Comparison of results of Algorithms for CFS and PCA

For this dataset, CFS is the better dimensionality reduction method: across the learning algorithms, its MAE and RMSE are never higher than PCA's. REPTree yields the same MAE and RMSE for both methods, but for the other algorithms CFS is clearly better than PCA.

Within CFS, Neural Networks yield the lowest MAE and RMSE, and the same is true within PCA. Overall, then, Neural Networks are the best supervised learning algorithm for this dataset.

As next steps, one can investigate the following questions:

  1. What are the features or combinations of features that lead to high-risk classifications?
  2. Are there customers who are similar to each other? If so, can we quantify similarity in a multi-variate setting like this one?
  3. If we apply the results of these learning algorithms on risk rating data from other geographical locations and periods, do we get similar results?

Please feel free to post any queries or comments you might have about our study. We hope you learned something from our post that you can apply to your own work or projects!


[1] Dake, Mysid, CC BY 1.0 <https://creativecommons.org/licenses/by/1.0>, via Wikimedia Commons

[2] T-kita at English Wikipedia, Public domain, via Wikimedia Commons

