Finding patterns in Game of Thrones deaths using Machine Learning

Valar Morghulis (“All men must die”) is the most haunting phrase for any GoT fan. Barely an episode goes by without a jaw-dropping death, and most characters, including prominent ones, meet a violent end. But does death strike at random, or does it come for characters who share certain features? Is there a pattern among those who die, or among those who manage to survive?

When you play the game of thrones, you win or you die.

Machine learning is a technique that lets computers find hidden insights without being explicitly programmed where to look. A model learns from a sufficiently large number of past examples and uses them to make predictions about new ones. In this project, I applied machine learning to the Game of Thrones dataset on Kaggle to identify the features that influence character deaths. (The dataset is based on the books, “A Song of Ice and Fire”, not the TV show.)

Because real fans read books.

DATASET

The dataset contains 27 features (title, gender, culture, age, nobility, presence in each book, number of dead relations, popularity, etc.) for roughly 2,000 characters. It also records whether each character is alive or dead by the end of book 5 (A Dance with Dragons).

FEATURE SELECTION

Transforming the categorical features into numerical ones left me with 685 features in total. I then used SelectFromModel with a linear support vector classifier (SVC) to select the 32 best features.
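The post doesn't list its exact selection settings, so here is a minimal sketch of the idea in scikit-learn. It assumes a synthetic stand-in for the encoded data (generated with `make_classification`) and a hypothetical regularization strength `C=0.1`. Passing `max_features=32` together with `threshold=-np.inf` makes `SelectFromModel` keep exactly the 32 features with the largest absolute SVC coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

# Synthetic stand-in (assumption): ~2000 characters x 685 encoded numeric columns.
X, y = make_classification(n_samples=2000, n_features=685, n_informative=20, random_state=0)

# Rank columns by |coefficient| of a linear SVC and keep the top 32.
# threshold=-np.inf disables the default mean-importance cutoff, so
# max_features alone decides how many columns survive.
selector = SelectFromModel(
    LinearSVC(C=0.1, dual=False, max_iter=5000),  # C=0.1 is a hypothetical choice
    threshold=-np.inf,
    max_features=32,
)
X_selected = selector.fit_transform(X, y)
kept_columns = np.flatnonzero(selector.get_support())  # indices of the kept columns

print(X_selected.shape)  # (2000, 32)
```

On the real data, `kept_columns` would map back to the names of the encoded columns.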

CROSS-VALIDATION AND HYPERPARAMETER OPTIMIZATION

I split the dataset into training and test sets (80/20). A linear SVC with 10-fold cross-validation achieved an accuracy of 0.76 on the test set. Tuning the kernel parameter increased the accuracy to 0.78, and an optimised Random Forest classifier improved it further, to 0.82.
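A minimal sketch of this evaluation workflow with scikit-learn's `GridSearchCV`. It assumes synthetic data in place of the 32 selected features and hypothetical hyperparameter grids, since the post doesn't say which values were searched. Cross-validation runs only on the 80% training split, and the held-out 20% is scored once per model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in (assumption) for the 32 selected features.
X, y = make_classification(n_samples=1000, n_features=32, random_state=0)

# 80/20 split; stratify keeps the alive/dead ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical search spaces: the SVC kernel and C, and a few forest settings.
searches = {
    "svc": GridSearchCV(SVC(), {"kernel": ["linear", "rbf"], "C": [0.1, 1]}, cv=10),
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
        cv=10,
    ),
}

test_accuracy = {}
for name, search in searches.items():
    search.fit(X_train, y_train)  # 10-fold CV on the training split only
    # score() uses the refit best estimator; accuracy is the default metric.
    test_accuracy[name] = search.score(X_test, y_test)
    print(name, search.best_params_, round(test_accuracy[name], 3))
```

Keeping the test split out of the grid search is what makes the test accuracy an unbiased estimate. The numbers this toy example prints will not match the ones reported in the post.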

METHOD PERFORMANCE

I measured precision, recall and F-score from the counts of TP (true positives, i.e. correctly predicted dead characters), FP (false positives, i.e. alive characters predicted dead), TN (true negatives, i.e. correctly predicted alive characters) and FN (false negatives, i.e. dead characters predicted alive).
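For the “dead” class these counts give precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-score as their harmonic mean, 2 · precision · recall / (precision + recall). Swapping the roles (TN as the hits, FN and FP as the two kinds of error) gives the “alive” row. A small plain-Python sketch, using hypothetical counts rather than the model's actual confusion matrix:

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for whichever class is treated as positive."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical confusion-matrix counts, not the post's results.
tp, fp, tn, fn = 35, 15, 300, 15

dead = class_metrics(tp, fp, fn)   # "dead" as the positive class
alive = class_metrics(tn, fn, fp)  # "alive" as positive: FN become false alarms, FP become misses
print("dead ", [round(v, 3) for v in dead])
print("alive", [round(v, 3) for v in alive])
```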

The per-class results of my model were:

+--------+-----------+--------+---------+
| Labels | Precision | Recall | F-Score |
+--------+-----------+--------+---------+
| Alive  | 93%       | 85%    | 86.5%   |
| Dead   | 47%       | 70%    | 63.5%   |
+--------+-----------+--------+---------+

IMPORTANT FEATURES

Using the Random Forest's feature importance measure, I found that the following features contributed most (sorted from most to least important):

  1. Number of dead characters to whom a character is related
  2. Character’s appearance in the book “A Feast for Crows”
  3. Character’s appearance in the book “A Dance with Dragons”
  4. Gender of the character
  5. Character’s appearance in the book “A Game of Thrones”
  6. Character’s nobility
  7. Character’s appearance in the book “A Storm of Swords”
  8. Title (Social status) of the character
  9. House to which a character belongs
  10. Character’s appearance in the book “A Clash of Kings”
  11. Popularity of the character
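In scikit-learn, a fitted `RandomForestClassifier` exposes these scores as `feature_importances_`, impurity-based importances that sum to 1. A minimal sketch of producing such a ranking, assuming synthetic data and placeholder column names `feature_0` to `feature_10` (hypothetical, so the resulting order is not the one listed above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical placeholder names and synthetic data (assumptions).
feature_names = [f"feature_{i}" for i in range(11)]
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Sort columns by mean decrease in impurity, from most to least contributing.
order = np.argsort(forest.feature_importances_)[::-1]
ranking = [(feature_names[i], float(forest.feature_importances_[i])) for i in order]

for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.3f}")
```

Impurity-based importances can favour high-cardinality columns, so a permutation-based check on held-out data is a common complement.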

EXPLORING RELATIONS BETWEEN FEATURES AND DEATH

Is there a relationship between survival and the number of dead relatives?
Does appearing in more books relate to survival?
Does belonging to a noble family make you more prone to death?
How does a character's house relate to survival? (Only houses with more than 10 members have been considered.)
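Each of these questions comes down to comparing survival rates across groups of characters. A plain-Python sketch, assuming hypothetical toy records with made-up field names (`house`, `dead_relations`, `alive`) rather than the Kaggle column names. The `min_group_size` argument mirrors the house filter: “more than 10 members” corresponds to `min_group_size=11`, and the toy example uses a smaller value so that it has something to show.

```python
from collections import defaultdict


def survival_rate_by(records, key, min_group_size=1):
    """Fraction of characters alive in each group, skipping groups that are too small."""
    counts = defaultdict(lambda: [0, 0])  # group -> [alive, total]
    for record in records:
        group = counts[record[key]]
        group[1] += 1
        if record["alive"]:
            group[0] += 1
    return {g: alive / total for g, (alive, total) in counts.items() if total >= min_group_size}


# Hypothetical toy records, not rows from the Kaggle dataset.
characters = [
    {"house": "House A", "dead_relations": 0, "alive": True},
    {"house": "House A", "dead_relations": 0, "alive": True},
    {"house": "House A", "dead_relations": 2, "alive": False},
    {"house": "House B", "dead_relations": 2, "alive": False},
    {"house": "House B", "dead_relations": 0, "alive": False},
    {"house": "House C", "dead_relations": 2, "alive": True},
]

by_relations = survival_rate_by(characters, "dead_relations")
by_house = survival_rate_by(characters, "house", min_group_size=3)  # the post's filter would be 11
print(by_relations)
print(by_house)
```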

Brace yourselves, Winter is Coming! And so are more deaths and plot twists. Until The Winds of Winter comes out, we can only wait to see what happens next, but this analysis might help us prepare. Because the night is dark and full of terrors.

I plan to collect more data and analyze these results further to predict the fates of characters in the coming books. You can find my code on GitHub, and you can connect with me on Twitter to discuss ideas. Let’s get our geek on and show our love for the books and the show.

Unless he kills them all…