
Why read this article?
A simple Python-based tutorial on data analysis and machine learning for personal improvement, using the Riot API.
If you are reading this article you might be a fan of League of Legends, a popular online game of the MOBA (multiplayer online battle arena) genre, developed and published by Riot Games. Or maybe you’re interested in the possible applications of machine learning and data analysis in the world of online gaming.
With millions of players around the world, the game boasts a wide audience of amateur and professional players, and it owes much of its appeal to its constant evolution and statistical complexity. The basic principles of the game are (very) simple, but with thousands of variables and possible scenarios, each game is different.
With so much data available, extracting the important parts can tell you interesting things about your style of play. You can use this information to improve your play style or to predict your future performance.
Many sites such as op.gg provide a huge amount of data, analysis and graphics. That’s great, but if you want to develop custom or more complex models, you’ll need some manual work.
The Riot Games API is a REST API that provides developers with useful data for building their own applications or websites.
I recommend reading the documentation before you start programming, so that you do not violate the terms of service and do not run into problems with the request rate limits.
In the next sections we will see how to:
- Extract useful data from the Riot API
- Process data to obtain useful information
- Create simple predictive models
Extract useful data from the Riot API
Let’s start by installing and importing some basic libraries. If you are working in a Google Colaboratory notebook, the commands below should run as they are; otherwise, you will need to install the individual libraries according to your operating system.
!pip3 install riotwatcher
!pip install -q seaborn
!pip install -q git+https://github.com/tensorflow/docs
import numpy as np
import matplotlib.pyplot as plt
import pathlib
import pandas as pd
import seaborn as sns
import tensorflow as tf
import time
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_docs as tfdocs
import tensorflow_docs.plots
import tensorflow_docs.modeling
from riotwatcher import LolWatcher, ApiError
For data extraction we use RiotWatcher, a thin wrapper on top of the Riot Games API for League of Legends. You need a Riot API key, which has to be regenerated every 24 hours. Remember that this key is personal and should not be shared. Let’s start by extracting some information about a player (or summoner), namely the rank of the desired player:
lol_watcher = LolWatcher('%YOUR RIOT API KEY%')
my_region = 'euw1'
me = lol_watcher.summoner.by_name(my_region, '%YOUR SUMMONER NAME%')
my_ranked_stats = lol_watcher.league.by_summoner(my_region, me['id'])
print(my_ranked_stats)
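The call above returns one entry per ranked queue. As a rough sketch of how such a response could be summarized, the snippet below computes a win rate per queue. The sample list is made-up data shaped like the League API entries (`queueType`, `tier`, `rank`, `leaguePoints`, `wins`, `losses`), and `summarize_ranked` is a hypothetical helper name, not part of RiotWatcher.

```python
def summarize_ranked(entries):
    """Return {queueType: summary string} for a list of league entries."""
    summary = {}
    for entry in entries:
        games = entry['wins'] + entry['losses']
        # Guard against a queue in which no games have been played yet
        win_rate = entry['wins'] / games if games else 0.0
        summary[entry['queueType']] = (
            f"{entry['tier']} {entry['rank']} ({entry['leaguePoints']} LP), "
            f"{win_rate:.1%} win rate over {games} games"
        )
    return summary


# Made-up sample, shaped like the output of league.by_summoner (assumption)
sample_ranked_stats = [
    {'queueType': 'RANKED_SOLO_5x5', 'tier': 'GOLD', 'rank': 'II',
     'leaguePoints': 45, 'wins': 60, 'losses': 40},
]
ranked_summary = summarize_ranked(sample_ranked_stats)
print(ranked_summary['RANKED_SOLO_5x5'])
```

With real data you would pass `my_ranked_stats` instead of the sample list.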
Next, let’s fetch the current versions of the champions, items, summoner spells and any other desired properties, and then our match history:
versions = lol_watcher.data_dragon.versions_for_region(my_region)
champions_version = versions['n']['champion']
summoner_spells_version = versions['n']['summoner']
items_version = versions['n']['item']
( ... )
current_champ_list = lol_watcher.data_dragon.champions(champions_version)
( ... )
my_matches = lol_watcher.match.matchlist_by_account(my_region, me['accountId'])
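Before downloading every match in detail, the match list itself can already answer simple questions. The sketch below counts the most frequently played champion ids. It assumes each entry of `my_matches['matches']` carries `gameId`, `champion` and `queue` fields; the sample data is invented for illustration and `most_played` is a hypothetical helper.

```python
from collections import Counter


def most_played(matches, top=3):
    """Return the `top` most frequent champion ids as (id, count) pairs."""
    counts = Counter(match['champion'] for match in matches)
    return counts.most_common(top)


# Invented sample with the assumed shape of a match list response
sample_matches = {'matches': [
    {'gameId': 1001, 'champion': 103, 'queue': 420},
    {'gameId': 1002, 'champion': 64, 'queue': 420},
    {'gameId': 1003, 'champion': 103, 'queue': 440},
    {'gameId': 1004, 'champion': 103, 'queue': 420},
]}
top_champions = most_played(sample_matches['matches'])
print(top_champions)
```

The numeric ids can then be translated into champion names through the Data Dragon champion list.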
We now have a huge amount of data at our disposal, and which parts of it matter is highly subjective. More features make the model more complex and can make it more accurate, although on a small sample they also increase the risk of overfitting. To obtain a reliable analysis you need information on as many games as possible, so that the models fit better and the results are more plausible. Let’s extract data from the last 100 games and define a series of pandas DataFrames containing all the main information.
n_games = 100
Games = {}
Game_duration=np.zeros(n_games)
Damage = np.zeros(n_games)
(...)
# Lookup tables from Data Dragon: numeric key -> readable id.
# They do not change between matches, so they are built once, before the loop.
champ_dict = {}
for key in current_champ_list['data']:
    row = current_champ_list['data'][key]
    champ_dict[row['key']] = row['id']
# static_summoners_list is assumed to have been loaded in the elided step above
summoners_dict = {}
for key in static_summoners_list['data']:
    row = static_summoners_list['data'][key]
    summoners_dict[row['key']] = row['id']
j = 0     # index of the next successfully stored game
cont = 0  # position in the match list
while cont < n_games:
    try:
        last_match = my_matches['matches'][cont]
        match_detail = lol_watcher.match.by_id(my_region, last_match['gameId'])
        # One dictionary of statistics per participant
        participants = []
        for row in match_detail['participants']:
            participants_row = {}
            participants_row['champion'] = row['championId']
            participants_row['win'] = row['stats']['win']
            participants_row['assists'] = row['stats']['assists']
            ( ... )
            participants.append(participants_row)
        # Summoner names, listed in the same order as the participants
        Summoner_name = []
        for row in match_detail['participantIdentities']:
            Summoner_name.append(row['player']['summonerName'])
        # Replace numeric ids with readable names
        for i, row in enumerate(participants):
            row['championName'] = champ_dict[str(row['champion'])]
            row['Summoner_name'] = Summoner_name[i]
            row['Summoner Spell 1'] = summoners_dict[str(row['spell1'])]
            row['Summoner Spell 2'] = summoners_dict[str(row['spell2'])]
        Games[j] = pd.DataFrame(participants)
        # Keep only our own statistics for the per-game arrays
        for index, row in Games[j].iterrows():
            if row['Summoner_name'] == '%YOUR SUMMONER NAME%':
                Damage[j] = row['totalDamageDealt']
                Gold[j] = row['goldEarned']
                ( ... )
        time.sleep(10)
        j += 1
        cont += 1
    except Exception:
        # Skip matches that cannot be fetched or parsed
        cont += 1
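The loop above pairs summoner names with statistics by their position in the two lists. A slightly more defensive variant, sketched here on invented data, joins the two sections on `participantId` instead. It assumes both `participants` and `participantIdentities` expose that field; `player_stats` is a hypothetical helper, not a RiotWatcher function.

```python
def player_stats(match_detail, summoner_name):
    """Return the stats dict of the given summoner, or None if not in the match."""
    # Map participantId -> summoner name from the identities section
    names = {
        identity['participantId']: identity['player']['summonerName']
        for identity in match_detail['participantIdentities']
    }
    for participant in match_detail['participants']:
        if names.get(participant['participantId']) == summoner_name:
            return participant['stats']
    return None


# Invented sample with the assumed shape of a match detail response;
# the two lists are deliberately in a different order
sample_match = {
    'participantIdentities': [
        {'participantId': 1, 'player': {'summonerName': 'Alice'}},
        {'participantId': 2, 'player': {'summonerName': 'Bob'}},
    ],
    'participants': [
        {'participantId': 2, 'championId': 64,
         'stats': {'win': False, 'goldEarned': 9000, 'totalDamageDealt': 120000}},
        {'participantId': 1, 'championId': 103,
         'stats': {'win': True, 'goldEarned': 11000, 'totalDamageDealt': 150000}},
    ],
}
alice = player_stats(sample_match, 'Alice')
print(alice['goldEarned'], alice['win'])
```

Joining on the id keeps the result correct even if the two lists ever arrive in a different order.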
At this point we have extracted all the data we are interested in, so let’s proceed with the data analysis. Warning: a 10-second pause has been inserted in each iteration of the loop so as not to exceed the request rate limits imposed by the Riot API.
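A fixed pause always costs the full waiting time, even when the previous request was slow. As one possible alternative, the sketch below enforces a minimum interval between calls and sleeps only for the time still missing. `Throttle` is a hypothetical helper written for this example; the `clock` and `sleep` parameters exist only to make it easy to test, and the interval you need depends on the limits of your own API key.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough to respect the interval, then record the call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# Short interval for the demo; the article's loop uses a 10-second pause
throttle = Throttle(min_interval=0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # in the extraction loop this would precede each API request
elapsed = time.monotonic() - start
print(f"3 calls took {elapsed:.2f} s")
```

In the extraction loop, `throttle.wait()` would take the place of `time.sleep(10)`.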
Data Processing
At this point we have a large amount of raw data that is not very meaningful on its own. To obtain useful information, you need to combine the game properties you care about with modern data analysis and machine learning algorithms. The data can answer many questions about in-game performance: you can discover strengths and weaknesses, or even predict the probability of winning from your live game statistics! The following paragraphs present, as an example, some simple analyses on a reduced set of parameters to make them easier to follow, but you can easily build more complex and interesting models. We start by preparing a dataset suitable for data analysis, collecting some simple features of our games:
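dataset = {}
dataset['Total Damage'] = Damage
dataset['Gold'] = Gold
( ... )
dataset['Victory'] = Victory  # Boolean
# Convert to a DataFrame so that .sample(), .drop() and .corr() can be used below
dataset = pd.DataFrame(dataset)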
Whether our goal is to solve a regression problem (continuous output) or a classification problem (discrete output), the starting dataset has to be divided into two separate datasets:
- Training set: this dataset is used to train the model (e.g. a neural network). It is important to achieve good predictive performance on the training set, but at the same time overfitting must be avoided.
- Test set: held-out data used to check how well the model generalizes; in this tutorial it also serves as the validation set during the training iterations.
train_dataset_raw = dataset.sample(frac=0.8,random_state=0)
test_dataset_raw = dataset.drop(train_dataset_raw.index)
train_dataset=train_dataset_raw.iloc[:,range(0,4)]
test_dataset=test_dataset_raw.iloc[:,range(0,4)]
train_labels=train_dataset_raw.iloc[:,4]
test_labels=test_dataset_raw.iloc[:,4]
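The split above selects features and labels by column position, which silently breaks if the column order changes. The following sketch performs the same 80/20 split on a toy DataFrame but selects columns by name. The data is random, and the column names `Kills` and `Deaths` are assumptions standing in for the features elided in the article.

```python
import numpy as np
import pandas as pd

# Toy dataset with random values (for illustration only)
rng = np.random.default_rng(0)
n_rows = 20
toy = pd.DataFrame({
    'Total Damage': rng.integers(50_000, 200_000, n_rows),
    'Gold': rng.integers(5_000, 20_000, n_rows),
    'Kills': rng.integers(0, 15, n_rows),      # assumed feature name
    'Deaths': rng.integers(0, 12, n_rows),     # assumed feature name
    'Victory': rng.integers(0, 2, n_rows).astype(bool),
})

# 80% of the rows for training, the remaining rows for testing
train_raw = toy.sample(frac=0.8, random_state=0)
test_raw = toy.drop(train_raw.index)

# Separate the features from the label by name rather than by position
train_features = train_raw.drop(columns='Victory')
test_features = test_raw.drop(columns='Victory')
train_target = train_raw['Victory']
test_target = test_raw['Victory']

print(len(train_raw), len(test_raw))
```

Because `test_raw` is built by dropping the sampled index, no game can end up in both sets.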
Pair plots
sns.pairplot(train_dataset_raw, diag_kind="kde")

Why analyze such a graph? Because it gives quick qualitative information about the pairwise relationships between the chosen variables. For example, we might notice that more total gold earned goes together with greater total damage, see how wins relate to the other game parameters, and inspect the probability distribution of each individual quantity over its range.
Cross-correlation matrix
For a more quantitative analysis of the correlation between data, we can refer to the correlation matrix. "There are several methods for calculating a correlation value. The most popular one is the _Pearson Correlation Coefficient_. Nevertheless, it should be noticed that it measures only a linear relationship between two variables. In other words, it may not be able to reveal a nonlinear relationship. The value of Pearson correlation ranges from -1 to +1, where +/-1 describes a perfect positive/negative correlation and 0 means no correlation. The correlation matrix is a symmetrical matrix with all diagonal elements equal to +1"
corr = dataset.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
f, ax = plt.subplots(figsize=(11, 9))
cmap = sns.diverging_palette(230, 20, as_cmap=True)
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=0.9, center=0, vmin=-0.2,
            square=True, linewidths=.5, cbar_kws={"shrink": .5}, annot=True)
plt.show()
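To make the numbers in the heatmap less of a black box, here is a small sketch that computes the Pearson coefficient by hand. It also illustrates the caveat quoted above: a perfectly deterministic but nonlinear relationship can still yield a coefficient of zero. The values are invented and `pearson` is a helper written for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Linear relationship: damage grows proportionally with gold (invented values)
gold = [8.0, 10.0, 12.0, 14.0]
damage = [2 * g + 1 for g in gold]
r_linear = pearson(gold, damage)

# Nonlinear relationship: y = x^2 on a symmetric range
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_parabola = pearson(x, [v ** 2 for v in x])

print(r_linear, r_parabola)
```

The first pair is perfectly correlated, while the parabola shows no linear correlation at all even though `y` is fully determined by `x`.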

Anyone familiar with the game will recognize that findings such as damage increasing with the gold owned, or the inverse trend between deaths and Victory, are correct, even if apparently trivial. However, validating the approach on such simple statements gives more confidence when it is applied to more complex datasets.
Estimation of the probability of victory: a simple classification problem
First of all, let’s normalize the data:
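# Per-feature mean and standard deviation, computed on the training set only
train_stats = train_dataset.describe().transpose()
def norm(x):
    return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)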
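The important detail in the normalization above is that the test data is scaled with the statistics of the training data, so no information from the test set leaks into the model. A minimal sketch with invented gold values follows; `fit_scaler` and `transform` are hypothetical helper names. `statistics.stdev` is the sample standard deviation, which is also what pandas reports as `std`.

```python
import statistics


def fit_scaler(values):
    """Return the mean and sample standard deviation of the training values."""
    return statistics.mean(values), statistics.stdev(values)


def transform(values, mean, std):
    """Apply z-score normalization with previously fitted statistics."""
    return [(v - mean) / std for v in values]


train_gold = [8000, 10000, 12000]   # invented training values
test_gold = [11000]                 # invented test value

mean, std = fit_scaler(train_gold)  # fitted on the training data only
normed_train = transform(train_gold, mean, std)
normed_test = transform(test_gold, mean, std)

print(normed_train, normed_test)
```

After the transformation the training values have zero mean and unit standard deviation; the test value is simply expressed on that same scale.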
Model generation is facilitated by the powerful tools provided by the Keras and TensorFlow libraries. Let’s define a simple sequential model for our classification, with the following properties:
- Model: sequential
- Input layer: 4 nodes
- 2 hidden layers: activation='relu'; nodes = 16/32
- 2 Dropout layers: rate = 0.2
- Output layer: activation='sigmoid'; nodes = 1
- Model properties: loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']
# The test set doubles as validation data during training
validation_data = (normed_test_data, test_labels)
epochs = 700
model = keras.Sequential()
model.add(layers.Dense(16, input_dim=4, activation='relu'))
model.add(layers.Dropout(0.2))  # the Dropout layers must be added to the model to take effect
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(normed_train_data, train_labels, epochs=epochs,
                    validation_data=validation_data)
Then we evaluate the quality of the model prediction on our entire test dataset:

Qualitatively we can notice very good performance on both the training and the validation set (associated with the simplicity of the studied case). Quantitatively we get:
- Test loss: 0.4501 – Test accuracy: 0.9583
- Train loss: 0.0891 – Train accuracy: 0.9688
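To see what the two kinds of numbers above measure, the sketch below computes binary cross-entropy and accuracy by hand from a few invented predicted probabilities. It assumes the usual definitions (mean negative log-likelihood, and a 0.5 decision threshold for accuracy); the helper names are made up for this example and the values are not taken from the model above.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p > threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


labels = [1, 0, 1, 1]          # invented outcomes (1 = victory)
probs = [0.9, 0.2, 0.6, 0.4]   # invented predicted win probabilities

loss = binary_crossentropy(labels, probs)
acc = accuracy(labels, probs)
print(f"loss={loss:.4f} accuracy={acc:.2f}")
```

Accuracy only counts whether each prediction lands on the right side of the threshold, while the loss also penalizes how confident a wrong prediction was. This is one way a test loss can be clearly higher than the training loss even though the two accuracies are close.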
Graphically, we get:

The model correctly predicts the outcome of the game from the chosen game parameters with an accuracy of about 96% on the test set (97% on the training set)! Obviously, the quality of the prediction will depend on the features chosen, the quality of the model and so on.
Conclusions
We have seen a simple application of the Riot API and developed a series of tools to analyze our skills in the game. We also built a model that predicts the outcome of a game from our statistics! This is a starting point for developing code, applications, etc. that take advantage of the huge amount of data generated in a League of Legends game.
Thanks for reading