AI for Earthquake Damage Modelling

A case study on how artificial intelligence and predictive analytics can speed up damage recovery after an earthquake

Arpan Das
Towards Data Science

--

Image credit: Sky News

In April 2015, Nepal was hit by a massive earthquake with a magnitude of 7.8 Mw (8.1 Ms) and a maximum Mercalli intensity of VIII (Severe). According to the Nepal open data portal, it affected 3,677,173 individuals and 762,106 properties. Post-disaster, it took Nepal years to collect and assess the damage, which in turn produced one of the world's largest damage assessment data sets. After a large-scale disaster like an earthquake, recovery is generally modelled over two phases:

  1. Collection of demographic, architectural and legal data
  2. Damage assessment by domain experts using this large-scale, noisy data

Based on aspects of building location and construction, our goal is to predict the level of damage to buildings caused by the 2015 Gorkha earthquake in Nepal.

What does the data look like?

In this case study we have used structural, ownership and damage data to prepare the train and test data sets. The raw data is obtained from the Nepal open data portal. In case you want to use my preprocessed data, you can get it from the link in the End Notes section. Now let's take a closer look at the cleaned data.

shape of the cleaned training and test data

The data is imbalanced, with 60% 'high', 22% 'low' and 18% 'medium' damage grades. To deal with the imbalance, the data was sampled manually: from the initial ~700k cleaned records, 100k were sampled from each class to prepare a balanced training set of 300k data points. Stratified sampling was then used to prepare the final train, validation and test sets. Very few data points contained missing values (< 30), so we dropped them.
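A minimal sketch of this sampling step, assuming hypothetical file and column names (cleaned_building_damage.csv, damage_grade) rather than the repository's actual ones:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv('cleaned_building_damage.csv')

    # Manual undersampling: 100k rows per damage grade -> 300k balanced rows.
    balanced = df.groupby('damage_grade').sample(n=100_000, random_state=42)

    # Stratified split keeps the class proportions in every subset; the
    # same call on the training portion carves out a validation set.
    X = balanced.drop(columns='damage_grade')
    y = balanced['damage_grade']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)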

Is Age a factor?

Our final data set has 41 dimensions. Our independent variables are either numerical, categorical or binary. We analysed the numerical and categorical variables to gain insights into the data. For example, let's take a snapshot of how buildings constructed over the last 10 years were affected.

The ‘Age Factor’

Interestingly, there are some properties whose age is more than 950 years! Are these outliers? As per Wikipedia, there are a few properties in Nepal that really are that old. As per our data, the number is 2521.

damage grade of properties aged over 950 years
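A quick way to reproduce this check, assuming the cleaned data sits in a DataFrame df with hypothetical columns age and damage_grade:

    # Filter the very old properties and inspect their damage distribution.
    old_buildings = df[df['age'] > 950]
    print(len(old_buildings))                        # 2521 in our data
    print(old_buildings['damage_grade'].value_counts(normalize=True))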

Performance Metric

We are predicting the level of damage from 1 to 3 (low, medium, high). The level of damage is an ordinal variable, meaning that the ordering matters. This can be viewed as a classification or an ordinal regression problem. (Ordinal regression is sometimes described as a problem somewhere in between classification and regression.)

To measure the performance of our algorithms, we have used the F1 score, which balances the precision and recall of a classifier. Traditionally the F1 score is used to evaluate a binary classifier, but since we have three possible labels we used a variant called the micro-averaged F1 score.

F1_micro = 2 · P_micro · R_micro / (P_micro + R_micro),

where P_micro = Σ_k TP_k / Σ_k (TP_k + FP_k) and R_micro = Σ_k TP_k / Σ_k (TP_k + FN_k). TP is true positive, FP is false positive, FN is false negative, and k ranges over each class in {1, 2, 3}.
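Because every building gets exactly one predicted label, the micro-averaged F1 works out to overall accuracy. A minimal check with made-up labels:

    from sklearn.metrics import f1_score

    # Toy labels for the three damage grades (1 = low, 2 = medium, 3 = high).
    y_true = [1, 2, 3, 3, 2, 1]
    y_pred = [1, 2, 2, 3, 2, 1]

    # 5 of 6 predictions are correct, so micro F1 = accuracy = 0.8333...
    print(f1_score(y_true, y_pred, average='micro'))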

Models and Performance

After preprocessing and data preparation, we started with a random model as a baseline, then tried various machine learning models: logistic regression, linear SVM with the Nyström approximation (for the kernel trick), random forest, LightGBM, etc. We began with a very basic logistic regression model and gradually increased the complexity.

To get the best out of the various models, GridSearchCV and simple cross-validation techniques were used as necessary. In practice, the tuned logistic regression, SVM and random forest models produced micro-averaged F1 scores in the range of 0.65 to 0.69. To get a better score, a majority voting classifier and LightGBM models were developed. Let's take a look at how we can define a custom evaluation metric for multi-class classification problems in order to apply LightGBM (see the sketch below).
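Here is one way to write such a metric: a micro-averaged F1 feval for LightGBM's native training API. The shape of preds handed to a custom eval function differs across LightGBM versions, so this sketch handles both layouts; it assumes the damage grades have been encoded as 0-2.

    import lightgbm as lgb
    from sklearn.metrics import f1_score

    NUM_CLASSES = 3  # low / medium / high damage grade

    def micro_f1_eval(preds, eval_data):
        """Custom LightGBM evaluation: micro-averaged F1 for multi-class."""
        y_true = eval_data.get_label()
        # Older LightGBM versions pass a flat, class-major array; newer
        # versions pass a 2-D (n_samples, n_classes) array.
        if preds.ndim == 1:
            preds = preds.reshape(NUM_CLASSES, -1).T
        y_pred = preds.argmax(axis=1)
        return 'micro_f1', f1_score(y_true, y_pred, average='micro'), True

    # Usage sketch -- X_train, y_train, X_val, y_val assumed prepared,
    # with labels encoded as 0, 1, 2.
    dtrain = lgb.Dataset(X_train, label=y_train)
    dval = lgb.Dataset(X_val, label=y_val, reference=dtrain)
    params = {'objective': 'multiclass', 'num_class': NUM_CLASSES}
    model = lgb.train(params, dtrain, valid_sets=[dval], feval=micro_f1_eval)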

With proper hyperparameter tuning, LightGBM and the majority voting classifier obtained micro-averaged F1 scores of 0.78 and 0.74 respectively. We also tried various deep learning architectures (MLP, LSTM, 1D CNN), but their performance was poor compared to the tuned machine learning models.
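For reference, a simple hard-voting ensemble over the tuned base models could look like the sketch below; the hyperparameters shown are placeholders, not the tuned values.

    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.kernel_approximation import Nystroem
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    # Hard voting takes the majority of the predicted class labels.
    voter = VotingClassifier(
        estimators=[
            ('lr', make_pipeline(StandardScaler(),
                                 LogisticRegression(max_iter=1000))),
            ('svm', make_pipeline(StandardScaler(),
                                  Nystroem(n_components=300),  # kernel trick
                                  LinearSVC())),
            ('rf', RandomForestClassifier(n_estimators=500, n_jobs=-1)),
        ],
        voting='hard',
    )
    voter.fit(X_train, y_train)
    y_pred = voter.predict(X_test)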

Here is a comparative view of the obtained results:

Real-world Impact

The automated assessment can help two types of end user:

  1. Government agencies: government bodies can get a faster, reasonably accurate view of the damage caused by an earthquake without manual intervention, which can catalyse damage recovery.
  2. Insurers: after a large-scale disaster, insurers' claims systems are overwhelmed with new claims, and it becomes hard for claim handlers to work through all of the damage data and judge severity. Integrating claims systems with an AI-based damage assessment service lets claim handlers rely on a single index (the damage grade) to decide the severity of damage, which in turn can mean faster claims processing.

End Notes

You can find all the necessary files, code and data sets for this case study in my GitHub repository.

Citations:

  1. https://eq2015.npc.gov.np/#/
  2. https://arxiv.org/abs/1606.07781
  3. https://www.npc.gov.np/en
  4. https://en.wikipedia.org/wiki/April_2015_Nepal_earthquake
