How Geo-Mapping Helps Identify Trends in Global Terrorism

Stephen Stirling
Towards Data Science
3 min read · Aug 27, 2019

The United Nations estimates that nearly 70% of the world’s population will be living in urban areas by 2050. Living in cities has obvious benefits, such as public transportation infrastructure, attractions, shopping, and access to medical services. Disadvantages include a higher cost of living, noise, light pollution, and limited space. Data science provides a multitude of methods to quantify the pros and cons of urban dwelling, and the insights gleaned from that analysis allow us to make informed decisions.

For this week’s article, I sought to determine whether cities offer increased safety when compared to rural areas. If so, I next wanted to know which countries offered the safest cities. I arrived at this concept after completing a thorough comparison of different datasets. Instead of working backward from the narrative to the data, I decided to build a story around the data. The data source used in this article contains more than 180,000 records, which is more than enough information to create a narrative and derive meaningful insights.

The first step in attempting to answer the question of safety by city and country is choosing the right indicators. That choice was dictated by the available data. Kaggle is well known in the data science realm for maintaining a plethora of datasets ranging from wine reviews to baseball statistics. I selected the Global Terrorism Database, a comprehensive dataset maintained by the University of Maryland’s National Consortium for the Study of Terrorism and Responses to Terrorism (START). The database contains detailed information about casualties, motivation, perpetrators, and geography.

The next step involved reading the Excel sheet with Python in a Jupyter Notebook. This particular dataset includes different levels of geographic indicators. To establish which countries had the highest numbers of terrorist attacks, I decided to group the records by region. I visualized the result with Altair, a Python visualization library that offers a robust feature set and impressive visuals.
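As a rough illustration of this step, here is a minimal sketch of grouping by region and charting the result. The article does not show the original notebook, so everything here is an assumption: the column names `region_txt` and `country_txt`, the helper names, and the commented-out file name are all hypothetical, and the sample rows are made up purely to make the snippet runnable.

```python
import pandas as pd


def top_region_countries(df, n_regions=3):
    """Count records per country within the n_regions regions that have the most records."""
    # Regions ranked by number of records; keep only the top few.
    top_regions = df["region_txt"].value_counts().head(n_regions).index
    subset = df[df["region_txt"].isin(top_regions)]
    # One row per (region, country) pair with its record count.
    return (
        subset.groupby(["region_txt", "country_txt"])
        .size()
        .reset_index(name="attacks")
        .sort_values("attacks", ascending=False)
    )


def region_chart(counts):
    """Horizontal bar chart of attacks per country, colored by region."""
    # Imported here so the aggregation above can run without Altair installed.
    import altair as alt

    return (
        alt.Chart(counts)
        .mark_bar()
        .encode(
            x=alt.X("attacks:Q", title="Recorded attacks"),
            y=alt.Y("country_txt:N", sort="-x", title="Country"),
            color=alt.Color("region_txt:N", title="Region"),
        )
    )


# With the real data, something like this would load the sheet
# (hypothetical file name):
# df = pd.read_excel("globalterrorismdb.xlsx")

# Made-up sample rows, not real figures.
sample = pd.DataFrame({
    "region_txt": ["Region A", "Region A", "Region A", "Region B", "Region B", "Region C", "Region D"],
    "country_txt": ["Country a1", "Country a1", "Country a2", "Country b1", "Country b1", "Country c1", "Country d1"],
})
counts = top_region_countries(sample, n_regions=2)
```

Aggregating before charting keeps the table passed to Altair small, which matters because Altair by default refuses to embed datasets larger than 5,000 rows. In a notebook, `region_chart(counts)` as the last expression in a cell would display the chart.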

The image above contains countries belonging to the three regions with the highest number of terror attacks. Notably, France, the United Kingdom, and the United States, among others, are excluded due to the comparatively low numbers of attacks in their respective regions. Rather than presenting raw numbers, this visual gives an idea of where attacks most often occur and identifies the biggest regional contributors.

After establishing roughly where attacks occur, the next step involves exploring the data using another indicator. The pie chart in the following image depicts the proportions of different types of attacks. While the data source has different levels of detail for attack type, only the highest-level categorization was used for simplicity. Matplotlib, another Python library, includes an option for pie charts known as “explode,” which offsets selected wedges so that they jut out from the rest of the pie.

When working with particularly slim slices of a pie chart, this technique makes them easier to see at a glance. Bar charts offer another option for comparing related groups. However, I felt that a pie chart better exhibits proportionality.
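The wedge offset described above can be sketched with the `explode` argument of Matplotlib's `pie`. This is an illustration, not the original notebook: the 5% threshold, the offset size, the helper names, and the category labels and counts are all assumptions chosen for the example.

```python
import matplotlib.pyplot as plt


def explode_offsets(sizes, threshold=0.05, offset=0.2):
    """Return one offset per wedge: `offset` for wedges below `threshold` of the total, else 0."""
    total = sum(sizes)
    return [offset if size / total < threshold else 0.0 for size in sizes]


def attack_type_pie(labels, sizes):
    """Draw a pie chart in which the slimmest wedges are pulled outward."""
    fig, ax = plt.subplots()
    ax.pie(
        sizes,
        labels=labels,
        explode=explode_offsets(sizes),  # fraction of the radius each wedge is offset
        autopct="%1.1f%%",               # print each wedge's percentage
        startangle=90,
    )
    ax.axis("equal")  # keep the pie circular
    return fig


# Made-up categories and counts, not figures from the dataset.
labels = ["Type A", "Type B", "Type C", "Type D"]
sizes = [60, 30, 7, 3]
offsets = explode_offsets(sizes)
```

Here only the last wedge (3% of the total) falls under the threshold, so it is the only one offset. Calling `attack_type_pie(labels, sizes)` in a notebook would render the figure.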

While we have established a cursory understanding of which countries make up the lion’s share of global attacks, only a map can provide definitive answers in this case. The Global Terrorism Database stood out from other datasets because it contained longitude and latitude coordinates for every record. These coordinates are easily the most accurate geo-mapping option. On a global scale, ambiguity, incomplete names, and variations in spelling greatly constrain the ability to use other indicators such as cities and towns. Programs such as Tableau are adept at determining locations from these indicators. However, on a scale of more than 180,000 records, other approaches must be taken. In this particular case, there were still approximately 30,000 rows that the program could not interpret.
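One way to see which rows can be placed on a map is to check the coordinates before handing the data to a mapping tool. The sketch below is an assumption about how that check might look in pandas, not a description of what was done for this article: the `latitude` and `longitude` column names, the helper name, and the sample rows are hypothetical.

```python
import pandas as pd


def split_by_coordinates(df):
    """Split a frame into rows with usable coordinates and rows without."""
    # Coerce anything non-numeric to NaN so it fails the range checks below.
    lat = pd.to_numeric(df["latitude"], errors="coerce")
    lon = pd.to_numeric(df["longitude"], errors="coerce")
    # NaN never satisfies between(), so missing values land in the second frame.
    valid = lat.between(-90, 90) & lon.between(-180, 180)
    return df[valid], df[~valid]


# Made-up rows: one usable, one missing, one out of range.
sample = pd.DataFrame({
    "city": ["Known", "Unknown", "Bad value"],
    "latitude": [10.5, None, 95.0],
    "longitude": [20.25, None, 30.0],
})
mappable, unmappable = split_by_coordinates(sample)
```

Keeping the rejected rows in their own frame, rather than dropping them silently, makes it easy to report how many records were left off the map.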

Returned Peace Corps volunteer in Ukraine, data visualization analyst, current business analyst at Stanford Graduate School of Business.