Data for Change

Heated Discussions: Predicting Conflict Intensity Using Climate Data

Building a machine learning model to predict the intensity of conflicts using a century of climate change data

Avril Aysha

Published in Towards Data Science

9 min read · Feb 11, 2021

Image via iStock under license to Richard Pelgrim

This story is part of a linked series tracking my progress through my first independent data science project. Find the previous post here and Jupyter Notebooks here.

tl;dr

Climate change is leading to increased political tensions and, some researchers speculate, is therefore driving increased armed conflict across the world. This project attempts to build a machine learning model to predict conflict intensity (measured as number of deaths per day) in India based on available precipitation and temperature data from the surrounding area (< 300 km). The project concludes that it is not possible to accurately predict conflict intensity using local climate data. Nevertheless, it has stumbled upon some interesting meatballs of information along the way.

The Problem

Climate change is increasingly causing extreme weather patterns across the globe. These extreme weather situations, such as high temperatures, drought, floods, etc., are causing massive population movements and increased competition over land and other natural resources in some parts of the world. This competition can lead to political tension and may take the form of armed conflict.

Existing research in this field hints at the possible relationship between climate change and increases in armed conflict; for example here and here. However, most experts point out that this relationship remains largely speculative and would be hard to prove, since there are so many other contributing factors that drive conflict, such as geopolitical and/or socioeconomic factors.

Against this background, I decided to use my first Capstone Project (completed as part of the Springboard Data Science Career Track) to investigate whether a direct relationship can be established between climate measures and conflict intensity measures. Establishing such a relationship would be valuable to:

illustrate yet another reason to increase efforts to mitigate climate change as soon as possible,
anticipate humanitarian aid and/or military interventions in response to expected conflict developments.

The Data

This project combines data from two separate datasets:

the Uppsala Conflict Data Program Global Armed Conflict dataset containing information on 220,000+ conflict incidents between 1989 and 2019, and
the Global Historical Climatology Network dataset containing weather measurements from 115,000+ stations across the globe between 1889 and 2020.

Just plotting the conflict incidents and the weather stations already makes for a fascinating map:

It’s immediately clear that, on the whole, countries with many conflict incidents have very few weather stations, with the exception of a few countries like India, South Africa and Turkey.

Getting Down and Dirty

Now we get to the really meaty and most satisfying part of the project (remember: I LOVE MEATBALLS, especially when they show up in the right places). I spent over 70 hours on the Exploratory Data Analysis stage of the project: getting to grips with the large datasets, mastering the geopandas package for working with spatial data, and figuring out which meatballs were actually worth staring at. Exponential learning curve, fo sho.

I’ll share three cool (and important) findings from this part of the project:

Meatball #1 — Availability of Climate Data

This was a tough one to swallow, but very important for everything that follows: not all weather stations have climate data for all years. A weather station may record only Precipitation for 2 decades, then record only Average Temperature for a few years, then nothing for a while…you get the point.

Coverage is especially poor for the years before 1950 and after 2012. The second gap matters most, because it limits the comparisons we can make across conflict incidents, which took place between 1989 and 2019.
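To make the coverage problem concrete, here is a minimal sketch of how one might check which stations have every required measurement for every required year. It is an illustration under stated assumptions, not the notebook's actual code: the station names, the two-year window and the long-format table layout are made up, while PRCP and TAVG are the GHCN element codes for precipitation and average temperature.

```python
import pandas as pd

# Hypothetical long-format table: one row per station, year and measured element.
records = pd.DataFrame({
    "station": ["A", "A", "A", "A", "B", "B", "C"],
    "year":    [1989, 1990, 1989, 1990, 1989, 1990, 1989],
    "element": ["PRCP", "PRCP", "TAVG", "TAVG", "PRCP", "PRCP", "TAVG"],
})

# Assumed requirements for the illustration (the real project spans 1989 to 2019).
required_years = {1989, 1990}
required_elements = {"PRCP", "TAVG"}

# Collect the set of years each (station, element) pair actually covers.
covered = {}
for station, element, year in zip(records["station"], records["element"], records["year"]):
    covered.setdefault((station, element), set()).add(int(year))

def is_complete(station):
    """True if the station has every required element for every required year."""
    return all(
        required_years <= covered.get((station, element), set())
        for element in required_elements
    )

usable = [s for s in records["station"].unique() if is_complete(s)]
print(usable)
```

In this toy table only station A survives the filter: B never records temperature and C has a single year of it.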

Meatball #2 — Seasonal Peaks

The second meatball is more interesting: plotting the count of armed conflict incidents by month over the entire observation period (1989–2019) shows:

an overall increase in the number of conflict incidents, which may be due to an increase in actual conflicts or (more likely) an increase in coverage and recording, and
consistent seasonal peaks.

Could there be something about seasonal weather patterns that increases the likelihood of conflict incidents?
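For anyone who wants to reproduce this kind of plot, the monthly counts can be sketched in a few lines of pandas. The dates and the column names (`date_start`, `deaths`) below are invented for illustration and are not taken from the actual dataset.

```python
import pandas as pd

# Made-up incident start dates; the real data has 220,000+ rows.
dates = pd.to_datetime([
    "2015-01-03", "2015-01-20", "2015-02-11", "2015-06-02",
    "2016-01-09", "2016-06-15", "2016-06-30",
])
incidents = pd.DataFrame({"date_start": dates, "deaths": [1, 4, 2, 7, 3, 1, 5]})

# Time series of incident counts per month (months without incidents show up as 0).
monthly_counts = (
    incidents.set_index("date_start")["deaths"]
    .resample("MS")   # "MS" = calendar month, labelled by its first day
    .count()
)

# Seasonal profile: incidents per calendar month, pooled over all years.
by_calendar_month = incidents["date_start"].dt.month.value_counts().sort_index()

print(monthly_counts.head())
print(by_calendar_month)
```

The first series is what you would plot to see the overall trend; the second pools all years together, which is one simple way to check whether the peaks really do fall in the same months.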

Meatball #3 — The Power of Maps

This final meatball is particularly powerful. The map below shows all conflict incidents, coloured by the year in which they took place, with the marker size increasing depending on the total death count of that incident. The massive purple blob in the center of the image is the superimposition of the incidents constituting the Rwandan genocide. Other atrocities, like the Srebrenica massacre and the 9/11 attacks, also jump out.

I have to admit I actually choked up when this plot materialised on my screen at 2 AM. It’s a poignant reminder both of the power of maps to communicate information and of the fact that the impersonal, sterile four-letter word we use to refer to the massive amounts of information sitting on our computers always ties back to actual human lives (or deaths). In other words: it matters.

Hocus Pocus by Focus(sing on India)

If you don’t get that epic subculture-reference, then please pause and go to this link to enjoy six minutes and 42 seconds of life-changing musical interlude.

You back?

No need to thank me.

All these global patterns are fascinating, of course, but at this point in the project I was getting tangled up in the mass of spaghetti and possible routes to take (it turns out not all spaghetti strands lead to Rome). So I decided to narrow down the scope of the project to a single country and FOCUS. It was time to go to India.

I selected India as a case study because it had the necessary overlap of both conflict and climate data: 15,000+ conflict incidents and 3,800+ weather stations. Huge potential……

….not! It turns out that only 25 of these 3,800+ weather stations had sufficient climate data for the years in which the conflict incidents took place. Luckily, these 25 stations were evenly spread throughout the country and there was at least 1 station in the vicinity of each distinguishable cluster of conflict incidents.

I used a cKDTree nearest-neighbour lookup to match each conflict incident with the nearest of these 25 weather stations. I could then calculate a bunch of new climate features that would give us a sense of what the patterns in climate change in the area around each conflict incident looked like.
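A minimal sketch of that matching step, assuming SciPy's `scipy.spatial.cKDTree` and plain latitude/longitude arrays. The coordinates are made up, and converting to 3D unit vectors before querying is my own choice for this illustration (it keeps "nearest" meaningful on a sphere); the original notebook may well have done it differently.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0

def to_unit_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to 3D points on the unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack((
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ))

def nearest_station(incident_lat, incident_lon, station_lat, station_lon):
    """Return, per incident, the index of the nearest station and its distance in km."""
    tree = cKDTree(to_unit_xyz(station_lat, station_lon))
    chord, idx = tree.query(to_unit_xyz(incident_lat, incident_lon), k=1)
    # Straight-line (chord) distance on the unit sphere -> great-circle distance.
    dist_km = 2.0 * EARTH_RADIUS_KM * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))
    return idx, dist_km

# Made-up coordinates, roughly Delhi, Mumbai and Chennai.
station_lat = [28.6, 19.1, 13.1]
station_lon = [77.2, 72.9, 80.3]
# Two made-up incidents.
incident_lat = [28.0, 12.9]
incident_lon = [77.0, 80.0]

idx, dist_km = nearest_station(incident_lat, incident_lon, station_lat, station_lon)
print(idx, dist_km.round(1))
```

The returned distances also make it easy to check a cut-off such as the 300 km radius mentioned in the tl;dr, for example with `dist_km < 300`.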

Let’s Correlate.

In need of another interlude? I got you, boo — you know triangles are my favourite shape, too.

Digging into the correlations between our original and new features revealed three important things:

There are no significant correlations between the total death count and any of the climate features (new or old). This means it’s going to be pretty hard to build a predictive model…
There is some correlation between a conflict incident’s total death count and its duration (measured in days). This makes sense: longer conflicts are more likely to lead to more deaths.
The climate features are almost all highly correlated with one another.
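The three observations above come from reading a correlation matrix. Here is a small sketch of that kind of check on synthetic data; the feature names are invented, and the data is deliberately built so that the two temperature features track each other while the death count is independent of everything, which mimics the pattern described but says nothing about the real numbers.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for climate features and the death count.
temp_mean = rng.normal(25, 5, n)
df = pd.DataFrame({
    "temp_mean": temp_mean,
    "temp_max": temp_mean + rng.normal(5, 1, n),  # built to track temp_mean
    "precip_total": rng.gamma(2.0, 50.0, n),
    "deaths": rng.poisson(3, n),                  # independent of the climate columns
})

# Pearson correlation between every pair of columns.
corr = df.corr()

# Correlation of each climate feature with the target.
climate_vs_target = corr["deaths"].drop("deaths")

print(corr.round(2))
print(climate_vs_target.round(2))
```

Passing `corr` to a heatmap function (seaborn's `heatmap` is a common choice) gives the triangle-shaped plot this section is joking about.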

Both the total death count and the duration in days could be used as indicators of conflict intensity, the thing we are trying to predict. To collapse them both into one variable, and to simultaneously account for the correlation between them, I decided to reformulate our target feature as death rate: the number of deaths per day of the conflict incident.
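As a rough illustration of that target, here is how a deaths-per-day column could be derived with pandas. The column names and values are hypothetical, and counting a same-day incident as lasting 1 day (so the division never hits zero) is an assumption made for this sketch, not something confirmed by the original analysis.

```python
import pandas as pd

# Hypothetical incidents with start date, end date and total deaths.
incidents = pd.DataFrame({
    "date_start": pd.to_datetime(["2010-01-01", "2010-03-05", "2011-07-10"]),
    "date_end":   pd.to_datetime(["2010-01-01", "2010-03-08", "2011-07-11"]),
    "deaths": [3, 8, 10],
})

# Assumption: durations are inclusive, so a same-day incident lasts 1 day.
incidents["duration_days"] = (
    incidents["date_end"] - incidents["date_start"]
).dt.days + 1

# Target feature: deaths per day of the incident.
incidents["death_rate"] = incidents["deaths"] / incidents["duration_days"]

print(incidents[["deaths", "duration_days", "death_rate"]])
```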

Our final dataframe had 14,364 observations and 33 features (1 target, 32 predictors).

SuperModel

We’re now ready to start building our predictive model. I built 6 different models to compare performance:

A Dummy Regressor that simply predicts the mean death rate for each conflict incident — this was going to be our baseline model against which to compare the performance of the actual models,
An out-of-the-box Linear Regression model
A Linear Regression model optimised using SelectKBest features
An out-of-the-box Random Forest Regression model
A RF Regression model optimised using hyperparameter tuning, and
A Lasso Regression model optimised using hyperparameter tuning.
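The comparison set-up can be sketched with scikit-learn roughly as follows. This is an illustration on synthetic data, not the project's code: the features are random noise, the target is unrelated to them by construction, mean absolute error is my own choice of metric, and the hyperparameter tuning steps are left out (fixed values such as `k=3` and `alpha=0.1` are placeholders).

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic data: 8 noise features and a skewed target with no link to them.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 8))
y = rng.exponential(scale=2.0, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "dummy_mean": DummyRegressor(strategy="mean"),  # baseline: always predicts the training mean
    "linear": LinearRegression(),
    "linear_kbest": make_pipeline(SelectKBest(f_regression, k=3), LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "lasso": Lasso(alpha=0.1),
}

# Fit every model on the same split and score it on the same held-out data.
mae = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name:>14}: MAE = {mae[name]:.3f}")
```

The point of the baseline is that every other score only means something relative to it: a model that cannot clearly beat "always predict the mean" has not learned anything useful from the features.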

The figure below contains a whole bunch of numbers that basically tell us that I might as well just have stopped at the first model. For all practical purposes, none of the 5 machine learning models does any better than the baseline model, which is basically just an educated guess.

The project therefore concludes that

it is not possible to accurately predict the intensity of conflicts in India using the selected climate features.

Key Takeaways

Disappointing? A little. Unexpected? Not really. The existing research in the field already warned us that the relationship was going to be hard to pin down, and the Correlations heatmap that was generated during the EDA phase of the project basically already confirmed that.

So….time wasted? Definitely not. The project has left us with some important takeaways:

Firstly, climate data alone is likely not enough to predict conflict intensity, but may well prove a useful addition to conflict prediction models containing other features (socioeconomic, political, etc.). There is some marginal increase in predictive power over the baseline model.

Secondly, I think one of the most important findings is all the way at the top of this post, in the map of the weather stations and conflict incidents. It’s clear as day that there is a mismatch between where armed conflicts occur and where climate data is mostly collected. This is a strong reason to advocate for additional research efforts in those parts of the world that are consistently experiencing armed conflict.

What’s Next?

The work is never done, of course.

I’d say there are 3 main ways in which this current model could be improved moving forward:

The first is to expand the scope of the project, both beyond the Indian case study to other countries and beyond using only climate data to include socioeconomic and political factors. Patterns in other countries may be stronger, and including data other than just climate data is likely to improve performance.

The second is to source additional climate data. We saw that in the case of India, while we started out with close to 4000 weather stations, only 25 of those had sufficient climate records for the years we were interested in. Additional data would enable a more fine-grained modelling of climate patterns, which could well improve the predictive power of the model.

Finally, it would be interesting to reframe the analysis and compare areas with conflict to areas without conflict. In the current model, we compare conflict incidents with one another but it may well be that there are more pronounced differences between areas in which armed conflict occurs and areas where it doesn’t. This would mean changing our unit of observation from ‘conflict incidents’ to ‘countries’ or some other unit area. We could then build a model that predicts the number of incidents per unit area. This would allow us to investigate the seasonal peaks we presented above (as meatball #2), which are peaks in conflict incident count, not in death count. My intuition is that there might well be a stronger signal if we reframe the problem that way.

That’s all for now, folks. Please do reach out with any questions or feedback, especially if you are a fellow data scientist (or researcher in general) passionate about this topic — I’d love to connect!

Below is a link to a recording of my presentation of this project. Please feel free to share it with anyone to whom it could be useful or instructive.

Last but not least, a big shout-out to my mentor Guy Maskall for his invaluable support and engaging feedback along the way!