Predict League of Legends Matches While Learning PyTorch (Part 2)

Learn to implement a simple feedforward network in PyTorch and train it with a GPU for a niche use case, with a little touch of theory along the way

Richard So
Towards Data Science

Part 2 of this little series, hand-drawn edition!

Hello, fellow reader! If you haven’t read the first part of this 2-part “series”, I highly recommend it before reading this. You can do so here or below 👇

Last time, we left off with making a logistic regressor with PyTorch for this same purpose. This time, we’re kicking things up a notch: creating a feedforward neural network (one with only fully connected layers). If you want to know more about the motivation behind this project, the dataset used for this mini project, and/or the data preparation process, you should check out my first article here or above.

League of Legends is one of my all-time favorites, even if I honestly do suck at it. LoL is an extremely competitive MOBA, where two teams of 5 (the blue and red team) are pitted against each other to destroy the base (nexus) opposite theirs. Winning usually requires a great amount of teamwork, coordination, or, for a tilted player, “luck”. Regardless, it is not too hard for a League player (even a pretty new one) to tell which team is probably going to win, based on the number of kills, deaths, and the numerous other stats that the game keeps track of. Sounds like something a neural network could predict…

Wait, what’s a neural network again?

Psst! If you’re not looking to learn some theory, feel free to skip over this section. You’ll just miss some of my own drawings :(

We’ve seen how a logistic regression model could do fairly well at prediction last time (it reached up to 74% accuracy on the test dataset). In fact, a logistic regressor is almost exactly a linear regressor, which in itself is just a matrix dot product between a batch of inputs and a weight matrix, plus a bias vector. The trainable weights and biases are what enable the model to learn and get better at what it’s doing. The only difference between linear and logistic regressors is the presence of a function that “squishes” the output into a range of values (usually from 0 to 1), for predictions involving a simple yes-or-no question (like whether a team wins a match) or a classification problem. A logistic regressor is the one that uses this “squish” function, which is usually a sigmoid or softmax.

The basic math behind a logistic regressor: the matrix operations of a linear regressor wrapped in a sigmoid function. Drawn by me 😬 🔥
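
Written out (with a batch of inputs x, a weight matrix W, and a bias vector b — my notation, mirroring the drawing), the logistic regressor is just:

```latex
\hat{y} = \sigma(xW + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```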

Well then, how does a neural network set itself up for further success? A vanilla neural network is, simply speaking, multiple linear regressors stacked together. Theoretically, this should allow the network to pick up on more relationships/trends in the data to help with prediction. But it can’t be that simple! Without doing anything extra, chaining matrix multiplications and additions gets us nowhere. Take a look:

This is what happens when you try to chain linear regression operations directly.
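
In symbols (using W₁, b₁ and W₂, b₂ for the two layers — again my own notation), two chained linear regressions collapse back into a single one:

```latex
W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2) = W' x + b'
```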

You can see that chaining two linear regressions is equivalent to just one, only with different weights and biases. So, how do we solve this? We introduce a non-linear activation function, which wraps around each instance of a linear regression operation. Not only does this resolve the issue above, but it also mimics (in a sense) how biological neurons work. For instance, a neuron only passes a signal forward to the next neuron if the signal surpasses a set threshold. Similarly, an activation function determines and regulates the final output of a layer of a neural network. I could elaborate more on activation functions, but we’d be diving way too deep!

By the way, to align our vocabulary with convention, from now on let’s call each instance of a linear regression between the input and the output a hidden layer of the neural network, and each individual unit in a layer (with its own weights and bias) a node. With this in mind, this is what a neural network “looks” like:

Or, you can google search `neural networks` and you’ll see much better images!

Alright, now let’s get back to coding! If you want a more intuitive approach to neural networks & much more, check out 3blue1brown’s video series on deep neural networks!

Making a feedforward neural network

TL;DR for what we just went over: a neural network is basically multiple linear regression operations (hidden layers) chained together, with an activation function after each layer. Here’s what it would look like when defining the model:
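
Here’s a minimal sketch of such a model definition (the hidden-layer sizes of 64 and 32 are illustrative choices of mine, not values prescribed anywhere above):

```python
import torch.nn as nn
import torch.nn.functional as F

class LOLModelmk2(nn.Module):
    def __init__(self, in_size=29, hidden1=64, hidden2=32, out_size=2):
        super().__init__()
        # one nn.Linear instance per layer of the network
        self.linear1 = nn.Linear(in_size, hidden1)
        self.linear2 = nn.Linear(hidden1, hidden2)
        self.linear3 = nn.Linear(hidden2, out_size)

    def forward(self, xb):
        # pass the batch through each layer, applying ReLU in between
        out = F.relu(self.linear1(xb))
        out = F.relu(self.linear2(out))
        return self.linear3(out)  # raw scores for the two outcomes (win/lose)
```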

The input size will be 29, one for each of the features (see the first article), and the output size will be 2, one score each for a win and a loss, respectively.
We now have multiple `nn.Linear` instances when we initialize the model, and we’ll pass the input through each layer and `F.relu()` (more on that later).

Cool, but what’s `F.relu()`? Rectified Linear Unit (ReLU) is one of the many activation functions used in deep learning, and it performs very well compared to other alternatives (e.g. sigmoid). If you want to know more about ReLU and other activation functions, go check this article out. PyTorch provides a plethora of activation functions in `torch.nn.functional` (usually imported as `F`), so be sure to check their documentation as well to see the options you have for your own usage.
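
For a quick sense of what it does: ReLU simply clamps negative values to zero and leaves positive values untouched.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(F.relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000])
```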

We’ll be using the SGD optimizer and the cross entropy loss function for training the model. We define the training loop below:
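
Here’s a rough sketch of those helper functions (the exact names and structure are my own reconstruction; `train_loader` and `val_loader` are assumed to be the DataLoaders built during data preparation in Part 1):

```python
import torch
import torch.nn.functional as F

def accuracy(outputs, labels):
    # fraction of predictions that match the true labels
    preds = torch.argmax(outputs, dim=1)
    return (preds == labels).float().mean()

def evaluate(model, val_loader):
    # average cross entropy loss and accuracy over the validation set
    device = next(model.parameters()).device
    model.eval()
    losses, accs = [], []
    with torch.no_grad():
        for xb, yb in val_loader:
            xb, yb = xb.to(device), yb.to(device)  # keep batches on the model's device
            out = model(xb)
            losses.append(F.cross_entropy(out, yb).item())
            accs.append(accuracy(out, yb).item())
    return sum(losses) / len(losses), sum(accs) / len(accs)

def fit(epochs, lr, model, train_loader, val_loader):
    # train with SGD + cross entropy, recording validation metrics every epoch
    device = next(model.parameters()).device
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = []
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)
            loss = F.cross_entropy(model(xb), yb)  # forward pass + loss
            loss.backward()                        # backpropagate gradients
            optimizer.step()                       # update weights and biases
            optimizer.zero_grad()                  # reset gradients for the next batch
        val_loss, val_acc = evaluate(model, val_loader)
        history.append((val_loss, val_acc))
        print(f"Epoch {epoch + 1}: val_loss={val_loss:.4f}, val_acc={val_acc:.4f}")
    return history
```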

We define quite a few helper functions here to make up the training loop. Comments are provided to show the purpose of most lines of code.

Training on a GPU

As neural network models get more complex, the computational demands of training them rise dramatically. Graphics Processing Units, known as GPUs or graphics cards, are specially designed to handle massive matrix operations. If you didn’t know already, PyTorch uses your CPU for computations unless you tell it otherwise, which is far less efficient than a GPU for this kind of work. This time, we’ll see how to harness the GPU to crunch numbers for our neural network.

Before we start, only NVIDIA GPUs are supported, sorry AMD fans 😢.

PyTorch offers a function, `torch.cuda.is_available()`, which outputs a boolean indicating the presence of a compatible (NVIDIA) GPU with CUDA installed. You could go through the setup process if you have a supported GPU, or you can make a Kaggle or Google Colab account and get access to a free GPU for deep learning purposes (with some limitations, of course). Let’s use the `is_available()` function to set up for GPU use, but fall back to the CPU if a GPU is absent:
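
A minimal version of that setup could look like this:

```python
import torch

# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)  # prints "cuda" if a compatible GPU was found, otherwise "cpu"
```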

`torch.device(...)` is how you refer to the available hardware in PyTorch.

With PyTorch, you can move data in and out of the GPU by using the `.to()` method on any tensor or model. So, to start working with GPUs, you first have to move your model to the GPU:
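
Roughly, continuing with the `LOLModelmk2` sketch and the `device` variable from above:

```python
model = LOLModelmk2().to(device)  # copies the model's parameters onto the GPU

# tensors are moved the same way, e.g. a batch of inputs and labels:
# xb, yb = xb.to(device), yb.to(device)
```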

We initialize the model `LOLModelmk2()` and move it to the GPU by using the method `.to(device)`, where device is `torch.device("cuda")`.

Now, we start training:
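
With the `fit()` and `evaluate()` sketches from earlier, kicking off a run might look like this (the epoch count and learning rate are illustrative, not tuned values):

```python
history = [evaluate(model, val_loader)]            # baseline before any training
history += fit(epochs=100, lr=1e-3, model=model,
               train_loader=train_loader, val_loader=val_loader)
```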

Testing the model with test data before training. Loss hovers at around 16, and accuracy at 50%.
You can see a sharp decrease in validation loss and a spike in accuracy.
The trend continues, though at a much smaller magnitude.

And here are some pretty graphs below😁:
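
If you’re following along with the sketches above, graphs like these can be made from the returned `history` with matplotlib, for example:

```python
import matplotlib.pyplot as plt

# history holds the (val_loss, val_acc) tuples returned by the fit() sketch above
losses, accs = zip(*history)

plt.plot(accs)
plt.xlabel("epoch")
plt.ylabel("validation accuracy")
plt.show()

plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("validation loss")
plt.show()
```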

And, here are our results from the test dataset:

hmmm…

Hmm… It looks like our model performed about the same as our logistic regression model (74%). Now, there are a few possible explanations for this outcome:

  1. Some piece of code is incorrect
  2. A neural network is generally worse than a logistic regression model
  3. The neural network was overfitting
  4. The logistic regression model was lucky in its training (which was possible since the dataset was randomly split into train, validation, and test sets for both the regressor and neural network)
  5. Using a neural network for this scenario may not be advantageous, and we are experiencing diminishing returns.

Well, let’s use process of elimination, shall we?

After a long debugging session, I couldn’t find anything wrong in the code (if you do find something, please let me know!!!), so #1 is out. #2 is likely not the case: we established earlier how neural networks are built on linear regression models, which are basically logistic regressors without a sigmoid/softmax function. They should be able to draw more relationships out of the data, which calls for better accuracy, not the opposite.

#3 is much more probable than the other two, since a neural network is much more complex than a logistic regressor and is thus more susceptible to this sort of issue. Usually, overfitting can be addressed with dropout, which simply means disabling a randomly chosen fraction of the model’s nodes while training. For PyTorch, that means initializing an `nn.Dropout()` layer in `__init__()` and putting it in between the layers with ReLU. Here is the implementation:
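
A sketch of how that might look, continuing from the earlier model sketch (the dropout probability of 0.2 is an illustrative choice):

```python
import torch.nn as nn
import torch.nn.functional as F

class LOLModelmk2(nn.Module):
    def __init__(self, in_size=29, hidden1=64, hidden2=32, out_size=2, p=0.2):
        super().__init__()
        self.linear1 = nn.Linear(in_size, hidden1)
        self.linear2 = nn.Linear(hidden1, hidden2)
        self.linear3 = nn.Linear(hidden2, out_size)
        self.dropout = nn.Dropout(p)  # a single dropout instance, reused below

    def forward(self, xb):
        # randomly zero out a fraction p of each hidden layer's outputs during training
        out = self.dropout(F.relu(self.linear1(xb)))
        out = self.dropout(F.relu(self.linear2(out)))
        return self.linear3(out)
```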

We only have to initialize one instance of `nn.Dropout`, since it can be used multiple times in the forward function of the model class.
Still, the model’s accuracy on the test dataset remained in the low 70s.

Surprisingly, even this didn’t work, meaning that the model wasn’t overfitting the training data. Finally, to test hypothesis #4, I revisited my old notebook on the logistic regressor and ran a few more trials with the model. It turns out that the logistic regressor’s 74% accuracy last time was pretty lucky. In fact, let’s look at the plot of accuracy over the number of epochs again:

The accuracy was pretty unstable for the most part, but overall it hovered in the low 70s, which is more in line with the later trials I ran on the logistic regressor, and with the neural network in this article.

Conclusion

There’s a lot to be learned about the discipline of deep learning through this example. Mainly, deep learning is no voodoo magic; it can’t magically solve every classification problem you give it. It can’t predict every single League of Legends match; in many cases, the first 10 minutes of a match aren’t enough to determine which team is going to win (I can testify from experience). Nonetheless, there’s a lot to be gained from this experience: learning the concept of a neural network and implementing it in PyTorch, utilizing the GPU, and using dropout in case the model overfits. On that note, I hope you enjoyed your journey with me building a PyTorch model for this League of Legends dataset. Happy coding (and keep on playing League)!

If you want the source for the jupyter notebook used for this mini-project, look here: https://jovian.ml/richardso21/lol-nn.
