League of Legends Win Conditions

Coding in Python to Understand the Most Important Factors in Winning a Game of League of Legends

Ankush Bharadwaj
Towards Data Science


Photo by Mateo Vrbnjak on Unsplash

Introduction

The e-sports community has grown rapidly in the past few years, and what used to be a casual pastime has morphed into an industry projected to generate $1.8 billion in revenue by 2022. While there are many video games in this ecosystem, few have been a staple of the community like League of Legends, which drew over 100 million unique viewers during its 2019 World Championship.

Released in late 2009, League of Legends is a freemium MOBA (multiplayer online battle arena) video game created by Riot Games. It developed a widespread competitive scene early on, with the first World Championship in 2011 drawing around 1.6 million viewers. The game has since grown in both popularity and gameplay, as Riot began to understand how changes could make the game more competitive and fun.

The current state of the game is quite complicated, and if you’re a complete newbie, you should check out this video. To summarize, a League of Legends match pits two teams of five players against each other, with each player controlling one unique character or “champion”, and ends when one team’s Nexus, located deep in that team’s base, is destroyed. Along the way, there are many objectives that a team can achieve, such as destroying turrets and killing neutral monsters like dragon and baron for team-wide buffs. Some objectives, such as destroying at least five turrets and one inhibitor, are necessary to win the game, while others, such as getting First Blood, are helpful but not necessary. Through this project, I wanted to better understand which of these objectives are the most important for winning a game of League of Legends. To that end, the question I posed is as follows:

What objectives are the most important win conditions for a game of League of Legends?

Gathering Data

Jumping in, I first applied for an app with the Riot Developer Portal and after my app was accepted, I browsed through the APIs tab to understand the type of data I could request. Unfortunately, there wasn’t a direct way for me to pull the last X number of ranked matches from a region, so I had to figure out a way around this.

My solution was to use a list of summoner names (usernames) to generate a list of recent matches for each player. Through a series of calls to the Riot API using the Python package Riot-Watcher, I populated a Pandas DataFrame of slightly under 10,000 rows with the most recent ranked League of Legends games played by the top 100 players in each of the five regions that make up the largest share of the League of Legends player base. At a glance, the DataFrame looks something like this:
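The player-by-player workaround described above can be sketched roughly as follows. This is only an illustration: `collect_match_ids`, `fake_fetch`, and the player names are hypothetical, the fetcher is a stand-in for whatever Riot-Watcher calls return a player's recent ranked matches, and the de-duplication step is my assumption rather than something the original code is known to do.

```python
from typing import Callable, Dict, List


def collect_match_ids(
    players_by_region: Dict[str, List[str]],
    fetch_recent: Callable[[str, str], List[int]],
    per_player: int = 10,
) -> Dict[str, List[int]]:
    """Gather recent match IDs per region, skipping matches already seen.

    `fetch_recent(region, player)` is a hypothetical callable standing in
    for the real API request; it should return that player's recent match IDs.
    """
    result: Dict[str, List[int]] = {}
    for region, players in players_by_region.items():
        seen = set()
        ordered: List[int] = []
        for player in players:
            for match_id in fetch_recent(region, player)[:per_player]:
                # Top players often meet in the same game, so keep each match once.
                if match_id not in seen:
                    seen.add(match_id)
                    ordered.append(match_id)
        result[region] = ordered
    return result


# Made-up match histories so the sketch runs without an API key.
FAKE_HISTORY = {
    ("NA", "alice"): [101, 102, 103],
    ("NA", "bob"): [103, 104],
    ("KR", "carol"): [201],
}


def fake_fetch(region: str, player: str) -> List[int]:
    return FAKE_HISTORY.get((region, player), [])


match_ids = collect_match_ids({"NA": ["alice", "bob"], "KR": ["carol"]}, fake_fetch)
print(match_ids)
```

Passing the fetcher in as an argument keeps the looping logic separate from the API client, so the same function can be exercised offline with fake data.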

First Ten Rows from Matches DataFrame

In the first seven columns, 0 indicates ‘False’ and 1 indicates ‘True’, while in the later columns, each cell holds the number of times that event occurred. Each row contains the stats of one team in a ranked League of Legends match. For example, the team in the first row did not secure any objective first, or at all, and lost the game.
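To make the one-row-per-team layout concrete, here is a minimal sketch of how a team's stats might be flattened into such a row. The column names, the `"Win"`/`"Fail"` encoding of the outcome, and the sample values are all assumptions for illustration; the article does not list its exact columns.

```python
import pandas as pd

# Assumed column names; the real dataset may use different ones.
FIRST_COLS = ["firstBlood", "firstTower", "firstInhibitor", "firstBaron", "firstDragon"]
COUNT_COLS = ["towerKills", "inhibitorKills", "baronKills", "dragonKills", "riftHeraldKills"]


def team_to_row(team: dict, region: str) -> dict:
    """Flatten one team's stats into 0/1 flags plus objective counts."""
    # Assumes the outcome arrives as the string "Win" or "Fail".
    row = {"region": region, "win": int(team["win"] == "Win")}
    for col in FIRST_COLS:
        row[col] = int(bool(team[col]))  # True/False -> 1/0
    for col in COUNT_COLS:
        row[col] = int(team[col])        # number of times the event occurred
    return row


# Two made-up teams from one match: the loser secured nothing at all.
sample_teams = [
    {"win": "Fail", **dict.fromkeys(FIRST_COLS, False), **dict.fromkeys(COUNT_COLS, 0)},
    {"win": "Win", **dict.fromkeys(FIRST_COLS, True),
     "towerKills": 9, "inhibitorKills": 2, "baronKills": 1,
     "dragonKills": 3, "riftHeraldKills": 1},
]

matches_df = pd.DataFrame([team_to_row(team, "NA") for team in sample_teams])
print(matches_df)
```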

Exploratory Data Analysis with Heat Map and PCA

I first found that around 91% of winning teams destroyed the first inhibitor, 80% killed the first baron, 70% destroyed the first tower, 63% killed the first dragon, and 59% of winning teams began the game with First Blood. Already, it seemed like the most important win condition is destroying the first inhibitor, which makes sense, as destroying a team’s inhibitor puts pressure on their base and allows the opposing team to have more map control.
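Percentages like these can be computed by filtering to winning teams and averaging the 0/1 columns. The sketch below uses a tiny made-up table with assumed column names, so its numbers are not the ones reported above.

```python
import pandas as pd

# Tiny made-up table; column names are assumptions.
df = pd.DataFrame({
    "win":            [1, 0, 1, 0, 1, 0, 1, 0],
    "firstBlood":     [1, 0, 0, 1, 1, 0, 0, 1],
    "firstInhibitor": [1, 0, 1, 0, 1, 0, 0, 1],
})

# Keep only winning teams, then take the mean of each 0/1 column:
# the mean of a binary column is the share of rows where it equals 1.
winners = df[df["win"] == 1]
first_rates = (
    winners[["firstBlood", "firstInhibitor"]]
    .mean()
    .sort_values(ascending=False)
)
print((first_rates * 100).round(1))
```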

Next, I visualized the correlation across columns in my dataset:

Heat Map Correlation Across Data

I also pulled up the same correlation heat maps for each individual region represented in my data to compare correlations across different regions, hoping to notice some differences in play styles. Generally though, the correlation matrices looked very similar to one another. A possible reason for this is that my data included matches played by the best players of each region, many of whom play on a professional level. Since good gameplay practices are consistent across the competitive community, the players in my data likely navigate each game more similarly to one another than lower-ranked players in each region would.
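For readers who want to reproduce this step, here is a minimal sketch of computing the correlations with the `win` column, both overall and per region. The data is synthetic and the column names are assumptions; only the pandas calls are the point.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for the matches table.
df = pd.DataFrame({
    "region": rng.choice(["NA", "KR"], size=n),
    "towerKills": rng.integers(0, 12, size=n),
    "firstBlood": rng.integers(0, 2, size=n),
})
# Fake outcome: teams with more tower kills win more often.
df["win"] = (df["towerKills"] + rng.normal(0, 2, size=n) > 5.5).astype(int)

# Overall correlation matrix (region is categorical, so it is dropped).
corr = df.drop(columns="region").corr()
win_corr = corr["win"].drop("win").sort_values(ascending=False)

# The same correlation with `win`, computed separately per region.
region_corr_df = pd.DataFrame({
    region: group.drop(columns="region").corr()["win"].drop("win")
    for region, group in df.groupby("region")
})

print(win_corr)
print(region_corr_df)
```

A matrix like `corr` can then be handed to a plotting function such as seaborn's `heatmap` to get a figure similar to the one above.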

I was now curious to see how well the variance in the data could be explained by fewer features than the ten I would be using to predict the outcome of a game. To that end, I performed a Principal Component Analysis to understand how few components I could reduce my data to while still preserving most of the variance:

Ratio of Variance Explained per New Component

Over 80% of the variance in the ten predictor columns could be explained by half as many components. This was definitely interesting, and by associating each component with the original dataset’s columns, I hoped to understand which features were the most important in explaining the variance of the data, which could help me figure out which columns were most critical to whether or not a team would win.
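As a rough sketch of how the explained-variance ratios and the "how many components reach 80%" figure can be obtained with scikit-learn: the data below is synthetic, and standardizing the columns before PCA is my assumption, not something stated above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_rows, n_features = 500, 10

# Synthetic data: ten columns driven by three hidden factors plus noise,
# so a handful of components should capture most of the variance.
latent = rng.normal(size=(n_rows, 3))
weights = rng.normal(size=(3, n_features))
X = latent @ weights + rng.normal(scale=0.5, size=(n_rows, n_features))

# Assumption: put every column on the same scale before PCA.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA().fit(X_scaled)                  # keep all ten components
explained = pca.explained_variance_ratio_  # share of variance per component
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 80%.
n_components_80 = int(np.argmax(cumulative >= 0.80)) + 1
print(explained.round(3))
print(n_components_80)
```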

Relation Between Dataset’s Columns and Principal Components

The components used to generate the above heat map came from a PCA object with six components, as I wanted the components to explain more than 90% of the variance in the data. It appeared that the number of tower kills, the number of inhibitor kills, and whether or not a team destroyed the first inhibitor were the most important features in explaining the variance in the data, as the first component explained 40% of the variance and these three columns were weighted the most heavily in it.
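The link between components and original columns comes from the PCA loadings. A minimal sketch, again on synthetic data with assumed column names (three columns are deliberately tied to a shared hidden factor so they dominate the first component):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumed column names for the ten predictors.
features = ["firstBlood", "firstTower", "firstInhibitor", "firstBaron", "firstDragon",
            "towerKills", "inhibitorKills", "baronKills", "dragonKills", "riftHeraldKills"]

rng = np.random.default_rng(1)
n = 1000

# Start from independent noise, then tie three columns to one hidden "lead".
data = pd.DataFrame(rng.normal(size=(n, len(features))), columns=features)
lead = rng.normal(size=n)
data["towerKills"] = lead + rng.normal(scale=0.3, size=n)
data["inhibitorKills"] = lead + rng.normal(scale=0.3, size=n)
data["firstInhibitor"] = (lead + rng.normal(scale=0.3, size=n) > 0).astype(float)

X_scaled = StandardScaler().fit_transform(data)
pca = PCA(n_components=6).fit(X_scaled)

# Rows are components, columns are the original features.
loadings = pd.DataFrame(
    pca.components_,
    columns=features,
    index=[f"PC{i + 1}" for i in range(pca.n_components_)],
)

# Features with the largest absolute weight in the first component.
top_pc1 = loadings.loc["PC1"].abs().sort_values(ascending=False).head(3)
print(top_pc1)
```

The sign of a loading is arbitrary in PCA, which is why the ranking uses absolute values.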

To reiterate the insights I had gathered at this point:

  • From my correlation heat map, whether or not a team destroyed the first inhibitor, how many tower kills a team had, and how many inhibitors a team had destroyed all had the highest correlation with winning.
  • From my PCA, whether or not a team destroyed the first inhibitor, how many tower kills a team had, and how many inhibitors a team had destroyed played the largest role in explaining the variance in the data.

Data Modeling with Logistic Regression

I used a Logistic Regression model to understand the win conditions for a ranked match of League of Legends. My process was to first split my data into a set of features and a target, where my features were all the columns except for the ‘win’ and ‘region’ columns, and my target was the ‘win’ column. I then split my data into a train set and a test set, fit a Logistic Regression model, and checked the classification report and confusion matrix to ensure a relatively strong predictive ability. When the Logistic Regression model was run on the overall dataset, the model’s precision and recall were 0.86 and 0.85 respectively.
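The workflow described above can be sketched as follows. The data is synthetic, the column names are assumptions, and the split ratio and `max_iter` value are illustrative choices rather than the settings used for the reported scores.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Synthetic stand-in for the matches table.
df = pd.DataFrame({
    "region": rng.choice(["NA", "EUW", "KR"], size=n),
    "firstBlood": rng.integers(0, 2, size=n),
    "firstInhibitor": rng.integers(0, 2, size=n),
    "towerKills": rng.integers(0, 12, size=n),
})
# Fake outcome drawn from a known logistic relationship.
logit = -4 + 0.3 * df["firstBlood"] + 2.0 * df["firstInhibitor"] + 0.5 * df["towerKills"]
df["win"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Features are every column except the target and the region label.
X = df.drop(columns=["win", "region"])
y = df["win"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

report = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: actual class, columns: predicted class
print(report)
print(cm)
```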

From here, I performed Logistic Regression on subsets of the data that included only one region, such as matches that were only played in NA, BR, etc., and recorded the model’s coefficients in a Pandas DataFrame. This DataFrame was then visualized so I could compare the different regions:

Log Regression Coefficients Across Regions and Overall

A regression coefficient describes the direction and strength of the relationship between a predictor variable and the target variable. For example, when looking at the First Blood predictor variable above, a team getting First Blood was a moderate predictor of the outcome of the game, as a team that achieved First Blood was more likely to win. On the other hand, Rift Herald Kills were actually related in the opposite direction (except for EUNE): teams that got more Rift Herald Kills were more likely to lose.
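A rough sketch of fitting one model per region and collecting the coefficients side by side is shown below, using synthetic data and assumed column names. In logistic regression, each coefficient is the change in the log-odds of winning per one-unit increase in that predictor, so exponentiating it gives an odds ratio.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["firstBlood", "firstInhibitor", "towerKills"]  # assumed names

rng = np.random.default_rng(7)
n = 1500
df = pd.DataFrame({
    "region": rng.choice(["NA", "KR"], size=n),
    "firstBlood": rng.integers(0, 2, size=n),
    "firstInhibitor": rng.integers(0, 2, size=n),
    "towerKills": rng.integers(0, 12, size=n),
})
# Fake outcome drawn from a known logistic relationship.
logit = -4 + 0.3 * df["firstBlood"] + 2.0 * df["firstInhibitor"] + 0.5 * df["towerKills"]
df["win"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)


def fit_coefficients(frame: pd.DataFrame) -> pd.Series:
    """Fit a logistic regression on one subset and return its coefficients."""
    model = LogisticRegression(max_iter=1000).fit(frame[FEATURES], frame["win"])
    return pd.Series(model.coef_[0], index=FEATURES)


# One column for the full dataset, then one per region.
coef_df = pd.DataFrame({
    "Overall": fit_coefficients(df),
    **{region: fit_coefficients(group) for region, group in df.groupby("region")},
})

# exp(coefficient) = multiplicative change in the odds of winning per unit increase.
odds_ratios = np.exp(coef_df["Overall"])
print(coef_df.round(2))
print(odds_ratios.round(2))
```

One caveat when comparing magnitudes: a coefficient on a 0/1 flag and a coefficient on a count column are measured per unit of different scales, so they are not directly comparable without rescaling.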

Using this analytical process, I understood which columns were more predictive of a win, helping me answer my question regarding win conditions in a game of League of Legends.

Conclusion

As a result of my project, here are the conclusions I made:

  • In order from greatest to least, First Inhibitor, First Tower, and Tower Kills were the most important win conditions across the dataset, according to my Logistic Regression model.
  • In order from greatest to least, Tower Kills, First Inhibitor, and Inhibitor Kills were the most important win conditions across the dataset, according to my correlation heat map.
  • Although NA and EUW teams that get the first baron were more likely to win, teams in these regions were more likely to lose with increasing numbers of baron kills.
  • The fact that teams in NA were more likely to win off a First Dragon compared to other regions perhaps indicates that games in NA were more prone to snowballing (when a team expands on a small advantage over the game to win) from dragon buffs and fights around dragon.
  • KR games weren’t unevenly impacted by one feature. This could indicate that KR players understand how to play from behind better than players in other regions, prompting a team to win off a combination of objectives more often than in other regions.

If there are any other interesting observations you think I missed out on, let us know in the comments!

The GitHub repository that contains my code and the dataset I put together for this analysis can be found here.
