One Graph To Rule Them All — The Lord of The Rings Network Analysis

Alon Cohen
Towards Data Science
6 min read · Nov 17, 2019

Link to the final results → LINK

The first time I watched The Lord of The Rings was in 2001, and I felt my life had changed forever. A lot of time has passed since then, but I still enjoy watching the extended trilogy to this day. Having watched the movies too many times, I decided to do something different and combine The Lord of The Rings with the world of data analysis.

How can The Lord of The Rings and data analysis be combined? It’s a good question…

As with every data analysis project, it all starts with the data.

A quick search on the web led me to One Wiki to Rule Them All, a site that serves as an encyclopedia of J. R. R. Tolkien’s books. Fortunately, the site contains thousands of pages with details about characters, places, races and fictional historical events.

In addition, it allows anyone to scrape its pages using a crawler. How do I know? Because for any site, “/robots.txt” can be appended to the root URL to check whether the site allows scraping and under what terms. (link to the site’s robots.txt, for further reading about robots.txt)
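The robots.txt check can also be done in code. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.org` URLs are made up for illustration and are not the wiki's real rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# The wiki's real rules are not reproduced here.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a generic crawler ("*") may fetch each URL.
can_fetch_character = parser.can_fetch("*", "https://example.org/wiki/Gandalf")
can_fetch_private = parser.can_fetch("*", "https://example.org/private/page")
```

Against a live site, `RobotFileParser.set_url(...)` followed by `read()` would download the file instead of parsing a string.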

To start, I had to crawl the site and scrape its pages. I decided to scrape only the character pages and to focus on them.

What data can be scraped?

Each page contains text, photos and links, and in many cases biographical data as well. I was particularly interested in the biographical data because I assumed its fields would be similar across pages, so I could arrange them into one data frame.

Let’s start with the biographical data.

After scraping all the character pages, this is what the data frame looked like: (plotted using missingno)

As you can see, there are many columns, but most of them are half-full or almost empty. After some cleaning and rearranging of the data, I ran some analysis on the following fields: Weapon, Realms, Race, Gender, Culture.
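To put a number on how full each column is, one option is to compute the share of non-empty values per field. This is only a sketch in plain Python. The records are hypothetical stand-ins for the scraped biography boxes, and the 0.5 threshold is an arbitrary assumption.

```python
# Hypothetical records standing in for scraped biography boxes.
records = [
    {"name": "Aragorn", "race": "Men", "weapon": "Sword", "realms": "Gondor"},
    {"name": "Legolas", "race": "Elves", "weapon": "Bow"},
    {"name": "Gimli", "race": "Dwarves"},
    {"name": "Shelob"},
]


def fill_rates(rows):
    """Return, per column, the share of rows holding a non-empty value."""
    if not rows:
        return {}
    # The table's columns are the union of all fields seen on any page.
    columns = sorted({key for row in rows for key in row})
    return {
        col: sum(1 for row in rows if row.get(col) not in (None, "")) / len(rows)
        for col in columns
    }


rates = fill_rates(records)

# Keep only columns that are at least half full (threshold is an assumption).
well_filled = [col for col, rate in rates.items() if rate >= 0.5]
```

With a pandas data frame, `df.notna().mean()` should give the same per-column shares.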

Biographical data

The graphs above show the distribution of the characters’ races and origins. As the race graph shows, men are the most common, followed by hobbits, elves, dwarves, orcs and so on. During the analysis, I encountered characters I didn’t know about. For example, did you know there are three different characters that belong to the Balrog race?

In the cultures graph, unsurprisingly, the most common cultures are those of men, with Gondor and Rohan at the top. This graph correlates with the races graph and tells the same story.

The gender graph shows there are many more males than females (almost five times more). It is likely that if the books had been written today rather than in the 20th century, the graph would be more balanced.

The last graph presents the distribution of weapons. To create this graph, I had to normalize the weapon names, since in most cases a single type of weapon appeared in several versions. For example, under the “sword” category there were “king’s sword”, “elven sword”, “rohirrim sword” and more.
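One simple way to do this kind of normalization is keyword matching, sketched below. The category list and the sample values are assumptions for illustration. Only the “sword” examples come from the data described above.

```python
import re
from collections import Counter

# Assumed category keywords; only "sword" is confirmed by the examples above.
CATEGORIES = ("sword", "bow", "axe", "spear")


def normalize_weapon(raw: str) -> str:
    """Map a free-text weapon name to a coarse category."""
    # Split into lowercase words so that "bow" does not match inside "elbow".
    words = re.findall(r"[a-z]+", raw.lower())
    for category in CATEGORIES:
        if category in words:
            return category
    return "other"


# Hypothetical raw values as they might appear in the scraped field.
raw_weapons = [
    "King's sword", "Elven sword", "Rohirrim sword",
    "Bow", "Battle axe", "Staff",
]

weapon_counts = Counter(normalize_weapon(w) for w in raw_weapons)
```

Word-level matching is deliberately strict: a compound such as "longbow" would land in "other" unless it were added to the category list.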

Text

NLP is a fascinating field in which one can extract insights from text using a wide range of algorithms. Since I wanted to stay focused, I decided to create only one graph that emphasizes the power of text.

The graph above was created using the words from Gandalf’s page. The words are arranged in the shape of the ring, and each word’s size is proportional to its frequency in the text.
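The frequency part of such a graph can be sketched with the standard library alone. The sample text, the stop-word list and the size range below are all assumptions. Arranging the words in the shape of a ring is a job for a word-cloud library and is not covered here.

```python
import re
from collections import Counter

# Tiny assumed stop-word list, just for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "was", "he", "his"}

# Hypothetical text, standing in for the content of a character page.
sample_text = (
    "Gandalf the Grey was a wizard. "
    "Gandalf carried a staff and the wizard rode Shadowfax."
)


def word_frequencies(text: str) -> Counter:
    """Count lowercase words, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def font_sizes(freqs, min_size=10, max_size=60):
    """Scale each word's size linearly with its frequency."""
    top = max(freqs.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in freqs.items()
    }


freqs = word_frequencies(sample_text)
sizes = font_sizes(freqs)
```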

Links

Links may not look like a powerful data source, but in my opinion that would be underestimating them. Using link analysis, we can draw several insights, for example: A) how popular a character is, and B) what the connections and relations between the characters are.
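Before any link analysis, the links have to be pulled out of each page. Here is a rough sketch with the standard `html.parser` module. The HTML snippet and the `/wiki/<name>` URL pattern are assumptions, not the wiki's confirmed markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def character_links(html, known_characters):
    """Return linked character names, in order, keeping repeats."""
    collector = LinkCollector()
    collector.feed(html)
    # Assumes the last path segment of the link is the page name.
    names = [href.rsplit("/", 1)[-1] for href in collector.hrefs]
    return [name for name in names if name in known_characters]


# Hypothetical page fragment and character list.
sample_html = (
    '<p><a href="/wiki/Gandalf">Gandalf</a> met '
    '<a href="/wiki/Frodo">Frodo</a> in '
    '<a href="/wiki/Shire">the Shire</a>. Later '
    '<a href="/wiki/Gandalf">Gandalf</a> left.</p>'
)
linked = character_links(sample_html, {"Gandalf", "Frodo", "Sauron"})
```

Keeping repeats matters here, because repeated links are what separate a raw mention count from a count of unique connections.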

This graph is very interesting because it teaches us that the most mentioned characters are not necessarily the central characters. Sauron and Gandalf are important characters, but in my opinion they are less important than Frodo. Nevertheless, Frodo ended up in ninth place, while Sauron and Gandalf took first and second place respectively. The rank doesn’t tell us about a character’s centrality in the story; it tells us about the character’s connectivity.

In network theory, degree centrality is a metric used to measure the centrality of nodes in a network. The more nodes pointing at you, the more central you are. The graph above differs from the previous graph because it measures the centrality of the characters by the number of unique links rather than a naive link count.
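The difference between the two counts is easy to see on a toy example. The link pairs below are invented. The normalization by `n - 1` follows the usual definition of degree centrality and is an assumption about how the graph above was scaled.

```python
from collections import Counter

# Hypothetical (source page, linked character) pairs, one per link occurrence.
links = [
    ("Frodo", "Gandalf"), ("Frodo", "Gandalf"), ("Frodo", "Sauron"),
    ("Aragorn", "Gandalf"), ("Aragorn", "Sauron"),
    ("Gandalf", "Sauron"),
    ("Gandalf", "Frodo"), ("Gandalf", "Frodo"), ("Gandalf", "Frodo"),
]


def mention_counts(pairs):
    """Naive count: every link occurrence pointing at a character."""
    return Counter(target for _, target in pairs)


def in_degree(pairs):
    """Unique count: number of distinct pages linking to a character."""
    return Counter(target for _, target in set(pairs))


def degree_centrality(pairs):
    """In-degree divided by the number of other nodes in the network."""
    nodes = {node for pair in pairs for node in pair}
    scale = max(len(nodes) - 1, 1)
    unique = in_degree(pairs)
    return {node: unique[node] / scale for node in nodes}


mentions = mention_counts(links)
unique_links = in_degree(links)
centrality = degree_centrality(links)
```

In this toy data all three characters are mentioned three times, yet only Sauron is linked from three different pages, so the two rankings disagree.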

An additional metric that teaches us about a node’s importance is betweenness centrality. This metric measures how important a node is to the shortest paths through the network. For example, the Brooklyn Bridge is a central location not because of the people who visit it but because of the people who cross it on their way to somewhere else.
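To make the bridge analogy concrete, here is a small sketch of Brandes' algorithm for an undirected, unweighted graph. The graph is invented: two tightly knit triangles joined by a single "bridge" node. Graph libraries such as networkx ship their own implementations, so this is for illustration only.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness (Brandes), undirected unweighted graph."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search, counting shortest paths from the source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = dict.fromkeys(adjacency, 0)
        distance = dict.fromkeys(adjacency, -1)
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    path_count[neighbor] += path_count[node]
                    predecessors[neighbor].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = dict.fromkeys(adjacency, 0.0)
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: score / 2 for node, score in scores.items()}


def to_adjacency(edges):
    """Build a symmetric adjacency dict from an undirected edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Hypothetical graph: two triangles joined through one "bridge" node.
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a1", "bridge"), ("bridge", "b1"),
]
betweenness = betweenness_centrality(to_adjacency(edges))
```

The bridge node has only two neighbors, yet every shortest path between the two triangles passes through it, so it gets the highest score.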

To gain insight into the connections between characters, there are several community detection algorithms. In the next example, the graph shows the characters colored by the communities detected using the Louvain algorithm.

Top 100 Characters Network
Louvain Detected Communities (Top 100)

Since the data I scraped already contained many details about the characters, I saw no use for these complex models in this case. The characters could simply be divided by their races or by their cultures.
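One way to check how well a grouping by race fits the link structure is to score it with modularity, the quantity the Louvain algorithm tries to maximize. The sketch below is not part of the original analysis, and the edges and race labels are invented for illustration.

```python
from collections import defaultdict


def modularity(edges, group_of):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside the group
    degree_sum = defaultdict(int)  # total degree of the group's nodes
    for a, b in edges:
        degree_sum[group_of[a]] += 1
        degree_sum[group_of[b]] += 1
        if group_of[a] == group_of[b]:
            internal[group_of[a]] += 1
    return sum(
        internal[g] / m - (degree_sum[g] / (2 * m)) ** 2
        for g in degree_sum
    )


# Hypothetical connections: two tight groups joined by a single link.
edges = [
    ("Frodo", "Sam"), ("Sam", "Merry"), ("Frodo", "Merry"),
    ("Legolas", "Elrond"), ("Elrond", "Galadriel"), ("Legolas", "Galadriel"),
    ("Frodo", "Elrond"),
]

race = {
    "Frodo": "Hobbits", "Sam": "Hobbits", "Merry": "Hobbits",
    "Legolas": "Elves", "Elrond": "Elves", "Galadriel": "Elves",
}
everyone_together = {name: "all" for name in race}

q_by_race = modularity(edges, race)
q_single_group = modularity(edges, everyone_together)
```

A score well above zero suggests the grouping lines up with how densely the characters link to each other. Putting everyone in one group always scores zero.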

Lord of The Rings Network
Characters Details

Link to An Interactive Graph

In the graph above, you can see the connections between the characters as a network. In this interactive network (made using pyvis), mousing over a node reveals the character’s biographical data. Mousing over an edge reveals the number of references, which can teach us about the strength of the connection between the characters. Node size is proportional to degree centrality, and node positions were calculated from the connections using the Barnes–Hut simulation algorithm.

Thank you all for reading my article. If you have any questions, comments or ideas for improvement, please leave a comment below!

Code