PUBG Winner Ranking Prediction using R Interface ‘h2o’ Scalable Machine Learning Platform

Using Machine Learning & Deep Learning Algorithms to Predict Ranking based on PUBG

Yash Indulkar

Published in

Towards Data Science

11 min read · Apr 19, 2021

Important Points

PUBG stands for PlayerUnknown’s Battlegrounds, which is an online multiplayer game.
The ranking is important to understand the standing of various players based on the game they play.
PUBG is a multiplayer game supported on various platforms, with a huge number of players online every day.
The algorithms used in this research are Linear Regression and Random Forest in the category of Machine Learning, and Deep Neural Network in the category of Deep Learning.

INTRODUCTION

PUBG stands for PlayerUnknown’s Battlegrounds, a multiplayer game available on various platforms, including Windows, Android, and iOS. The game features three modes: Classic, Arcade, and EvoGround. In Classic, the player can choose among the maps Erangel, Miramar, Sanhok, and Vikendi. The Arcade mode offers War, Mini-Zone, Quick Match, and Sniper Training. There are 555 million players worldwide playing PUBG across all platforms, and with this huge number comes the difficulty of ranking them. A basic Battle Royale match consists of 100 people with only 1 winner (who gets the Chicken Dinner). Ranking these players on the basis of different attributes is difficult because several players may end up with similar rankings. This is where machine learning and deep learning come in handy: by analyzing various attributes and the similarities between them, a trained model can predict the ranking of players. The dataset used for training and testing comes from Kaggle, an online platform that hosts datasets for different use cases; it combines matches of various types (SOLO, DUO, SQUAD) with all their attributes.

https://www.kaggle.com/c/pubg-finish-placement-prediction/data

Training Dataset containing 29 different attributes- Image By Author

METHODOLOGY

This section of the paper deals with the methods used to obtain the outputs that are explained in the experimental results. The methodology is divided into 3 sub-sections:

A. Linear Regression

Linear Regression is a Machine Learning algorithm that makes predictions using a best-fit line; the target attribute in this algorithm is always numerical. The data is plotted with the dependent variable on the Y-axis and the independent variable on the X-axis, as shown below, and the best-fitting line (or plane, when there are several independent variables) is drawn through those values. The dots around the line are the actual values, and the line itself gives the predicted values.
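The best-fit line described above can be sketched in a few lines of Python. This is a minimal illustration of ordinary least squares for one independent variable, not the code used in the research; the toy values for kills and placement are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data: kills (independent) vs. a placement score (dependent).
kills = [0, 1, 2, 3, 4]
placement = [0.1, 0.3, 0.5, 0.7, 0.9]

slope, intercept = fit_line(kills, placement)
# Points on the fitted line are the predicted values.
predicted = [slope * x + intercept for x in kills]
```

The dots in the plot correspond to the pairs in `kills` and `placement`, while `predicted` holds the points that lie on the fitted line.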

Linear Regression Graph with Dependent & Independent Variables- Image By Author

B. Random Forest Algorithm

Random Forest, or Random Decision Forest, is an ensemble learning method based on bagging (bootstrap aggregating) that can be used for Regression or Classification depending on the use case. Each decision tree is a single tree among many that together form a forest, and combining the outputs of all trees gives better results than a single decision tree. Each tree is a hierarchy of root, parent, child, and leaf nodes. Every tree produces its own output: for classification the final output is decided by majority voting across the trees, and for regression it is the average of the tree outputs. Combining many trees built from the training data in this way usually gives more accurate and more stable predictions than a single tree; the visual representation can be observed below.
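The two ingredients of the forest, bootstrap sampling and combining tree outputs, can be sketched as follows. This is an illustrative sketch only: the trees themselves are left out, and the function names are hypothetical rather than taken from any library.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step for one tree)."""
    return [rng.choice(rows) for _ in rows]


def aggregate_regression(tree_outputs):
    """Regression forests average the numeric outputs of all trees."""
    return mean(tree_outputs)


def aggregate_classification(tree_outputs):
    """Classification forests take the majority vote over all trees."""
    return Counter(tree_outputs).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = bootstrap_sample(list(range(10)), rng)
avg_rank = aggregate_regression([0.2, 0.4, 0.6])
voted = aggregate_classification(["win", "lose", "win"])
```

Since the ranking target in this article is numerical, the averaging variant is the one that applies to the PUBG use case.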

Random Forest Algorithm with Final Class- Image By Author

C. Deep Neural Networks

1. Non-Deep Feed Forward Neural Networks

A neural network, the building block of deep learning, consists of 3 types of layers. The first is the input layer, whose input nodes pass values forward, where they are combined using weights and biases. The second is the hidden layer; the number of hidden layers depends on whether the network is deep or non-deep, and for a non-deep neural network it is usually 1 or 2. The last is the output layer, which produces the output value from the input and hidden layers; the number of output nodes is chosen according to the use case.

Single Layer Feed Forward Neural Network- Image By Author

2. Deep Feed Forward Neural Networks

The deep neural network has the same structure as the non-deep neural network, with an input layer, hidden layers, and an output layer; the difference lies in the number of hidden layers. The deep neural networks considered here have between 2 and 8 hidden layers, and the more hidden layers there are, the more complex the network is. This type of network is usually used to solve complex problems.
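The layered structure can be made concrete with a small forward pass. The sketch below is a plain-Python illustration with hypothetical weights; adding more entries to `layers` turns the non-deep network into a deep one without changing the code.

```python
def relu(x):
    """Rectifier activation: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def identity(x):
    """Linear output, suitable for predicting a numeric value."""
    return x


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias) per node."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through each (weights, biases, activation) layer in order."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense(values, weights, biases, activation)
    return values


# Hypothetical network: 2 inputs -> 1 hidden layer of 2 nodes -> 1 output node.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], identity),
]
output = forward([2.0, 1.0], layers)
```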

Multi-Layer Feed Forward Neural Network- Image By Author

EXPERIMENTAL RESULTS

The experimental section of this research paper covers the iterations run on the algorithms and the exploratory data analyses (EDA) performed on the Kaggle dataset of 29 attributes, which has a size of 446966 for the Training Dataset and 1934174 for the Testing Dataset. The system architecture for the experiment can be observed below.

Exploratory Data Analysis gives a visual understanding of how the attributes relate to the target variable, which helps in judging the importance of features. The first EDA looked at the number of players in each perspective. The perspective in PUBG is either FPP (First Person Perspective) or TPP (Third Person Perspective), as can be observed in the Fig below; the perspective changes the user's field of view between wide and narrow and affects how the player aims through weapons.

TPP VS FPP Users Perspective- Image By Author

As can be seen from the Fig, the majority of players play in FPP and very few play in TPP, because TPP offers a very narrow screen, which makes aiming difficult. The second EDA looked at the category in which players mostly play. PUBG provides 3 categories, SOLO, DUO & SQUAD, which can be observed in the Fig below.

Solo vs Squad vs Duo Users- Image By Author

As can be observed from the Fig, most players play in the Squad category, followed by Duo, and the fewest play Solo.

Kill Categories vs Win Place Prediction- Image By Author

The box plot in the above Fig shows the distribution of the win placement prediction across kill categories. The win prediction increases with the number of kills: roughly 3–5 kills correspond to a win placement percentile of about 0.8, and more than 10 kills in a match can give a win placement close to 100%. Another EDA plotted a histogram of the weapons acquired by players in-game (total of solo, duo & squad). As can be observed from the Fig below, the number of weapons acquired ranged from 0 to 15, with an average of 8 weapons per user.

Weapons Acquired vs The Prediction Place- Image By Author

The final EDA examined how the attributes correlate with each other, visualized with the help of a correlation matrix. The correlation matrix for the dataset can be seen in the Fig below.

Correlation Matrix for PUBG- Image By Author

The correlation matrix shows how the attributes in the dataset are correlated with each other, from highly correlated to barely correlated. In the above Fig 10, positive 1 (+1) indicates a strong positive correlation, negative 1 (-1) indicates a strong negative correlation, and values near 0 indicate little or no linear correlation. With this EDA, important information was extracted from the data through visualization. The next step was to extract the important features on which the model would be trained; features selected by importance can give better results than features passed to the model at random.
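Each cell of such a matrix is a Pearson correlation coefficient between two attributes. A minimal sketch of that coefficient, using made-up numbers rather than the PUBG data, could look like this:

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical columns to show the three typical cases.
rising = pearson([1, 2, 3], [2, 4, 6])           # moves together
falling = pearson([1, 2, 3], [6, 4, 2])          # moves in opposite directions
unrelated = pearson([1, 2, 3, 4], [1, -1, -1, 1])  # no linear relationship
```

The first pair gives a value of +1, the second -1, and the third 0, which matches the reading of the matrix colours described above.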

Important Features for Linear Regression Algorithm- Image By Author

The above Fig shows the features that are important to consider when building the linear regression model. Features such as boost and heal individually are the most valued in terms of sum, whereas the combined heal & boost feature is low valued in terms of sum. This observation indicates which features to select when modeling the algorithm for better results. Similarly, a second feature selection was done for the neural networks, to pick features that can increase the accuracy of the model compared to random selection. This feature selection, based on the training dataset and with the total sum as another parameter, can be observed in the Fig below.

Deep Neural Network Important Features- Image By Author

The next step is to apply the machine learning and deep learning algorithms and check how well they predict the win percentage of users. The linear regression used in this research is Multi-Varied Linear Regression, chosen because several features came out of the feature extraction. Its equation expresses the prediction f(x, y, z) in terms of the features x, y & z, with w1, w2 & w3 as the weights that the model tries to learn. The loss function used to optimize these weights is the Mean Squared Error (MSE), which compares the obtained value f(x, y, z) with the actual value.
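Written out, the description above corresponds to the standard forms below. This is a reconstruction from the text, not a copy of the original equations: the numbering (1) and (2) is assumed from the later reference to (3), the symbols t_i (actual value) and n (number of training rows) are introduced here, and a bias term is left out because the text mentions only the three weights.

```latex
f(x, y, z) = w_1 x + w_2 y + w_3 z \tag{1}

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( f(x_i, y_i, z_i) - t_i \bigr)^2 \tag{2}
```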

In this research paper two neural networks were used: one Non-Deep Neural Network and one Deep Neural Network. Hyper-parameter optimization was done to find the preferred number of nodes. The arguments passed for hyper-parameter optimization were:

· Epochs [10, 20, 50, 60, 100]

· Batch-Size [10, 20, 30, 40]

· With Cross-Validation=5
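To give a sense of the search size, the sketch below enumerates these values as an exhaustive grid with 5-fold cross-validation. Whether the original search was exhaustive or random is not stated, so the grid is an assumption, and `k_fold_indices` is a hypothetical helper, not an H2O function.

```python
from itertools import product

epochs_grid = [10, 20, 50, 60, 100]
batch_size_grid = [10, 20, 30, 40]
n_folds = 5

# Every (epochs, batch size) combination is one candidate model.
grid = [
    {"epochs": e, "batch_size": b}
    for e, b in product(epochs_grid, batch_size_grid)
]
# Each candidate is trained once per cross-validation fold.
total_fits = len(grid) * n_folds


def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous validation folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


folds = k_fold_indices(12, n_folds)
```

Under this assumption there are 20 candidate settings and 100 model fits in total, which is why the search is usually run on a scalable platform.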

The activation function used, which also helps with the vanishing gradient problem, was the Rectifier Activation Function; because it is linear for positive inputs, it suits the numerical prediction of the winner ranking from the datasets. Learning uses stochastic gradient descent (one training example per update) together with backpropagation. The formula for the rectifier can be observed in (3).

f(x) = max(0, x)    (3)
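How the rectifier interacts with stochastic gradient descent and backpropagation can be shown on a single neuron. The sketch below assumes a squared-error loss and a one-input model purely for illustration; it is not the network or loss used in the research.

```python
def relu(x):
    """Rectifier from (3): f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of the rectifier: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sgd_step(w, b, x, y_true, lr):
    """One stochastic gradient descent update on a single training example.

    Model: y_pred = relu(w * x + b); loss = (y_pred - y_true) ** 2.
    """
    z = w * x + b
    y_pred = relu(z)
    # Chain rule (backpropagation): d loss / d z.
    dloss_dz = 2.0 * (y_pred - y_true) * relu_grad(z)
    return w - lr * dloss_dz * x, b - lr * dloss_dz


new_w, new_b = sgd_step(w=1.0, b=0.0, x=2.0, y_true=1.0, lr=0.1)
```

For positive inputs the gradient passes through with a factor of 1 instead of shrinking at every layer, which is the property that helps against vanishing gradients.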

To measure the error during training and validation, MAE is used. MAE stands for Mean Absolute Error: it sums the absolute differences between the actual values and the obtained (predicted) values and divides by the number of observations.

In many gaming scenarios some players can be cheaters, who use invalid playing methods to gain rank and increase experience points, and some players can be AFK (Away from Keyboard). The total number of players in the datasets is shown below.
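In symbols, MAE = (1/n) * sum of |actual_i - predicted_i| over the n observations. A minimal Python sketch of that calculation, with made-up values, is:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all observations."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical placement values, not taken from the PUBG dataset.
mae = mean_absolute_error([1.0, 0.5, 0.0], [0.5, 0.5, 0.5])
```

A lower value means the predictions sit closer to the actual rankings, which is how the algorithms are compared in the tables that follow.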

Table II shows the MAE for the algorithms used in ranking prediction of PUBG datasets for the training datasets with all the algorithms.

MAE for Training Dataset- Image By Author

Table III shows the MAE for the algorithms used in ranking prediction of PUBG datasets for the validation (testing) datasets with all the algorithms.

MAE for Testing Dataset- Image By Author

From the above Tables II & III, it can be observed that the lowest MAE was obtained for the Deep Neural Network Algorithm: 0.02012 for the training dataset and 0.03121 for the testing dataset. For MAE, the lower the error, the better the model fits the data. By this measure, the highest MAE on the training data was obtained for the Linear Regression Algorithm, at 0.08521, and the highest MAE on the testing data was 0.06295, obtained for the Basic Random Forest Algorithm. The visual representation of MAE for all the algorithms used in the ranking prediction, built with the H2O package in R, can be seen in the Fig below.

CONCLUSION

The study aims to evaluate the ranking of PUBG players using machine learning and deep learning algorithms, along with EDA to analyze the dataset more thoroughly. The algorithms used in this research are Linear Regression, Random Forest & Deep Neural Network, and MAE (Mean Absolute Error) was calculated to check which algorithm fitted the huge dataset best. This evaluation was carried out for both the training and testing datasets, and the deep neural network performed best, with an MAE of 0.02012 for the training dataset and 0.03121 for the testing dataset. The highest MAE on the training dataset was obtained for Linear Regression, at 0.08521, and the highest on the testing data was obtained for the Basic Random Forest with n_estimators=40, max_features=Sqrt. The average number of kills per player was 0.9248, 99% of players have 7.0 kills or less, and the most kills ever recorded in a single match was 72. The best algorithm among all was the Deep Neural Network, with the lowest error value on the testing data.

BEFORE YOU GO

Research Paper: https://ieeexplore.ieee.org/document/9396823