Hands-on Tutorials

Extreme Value Theory in a Nutshell with Various Applications

Published in

Towards Data Science

8 min read · Feb 19, 2021

When statistics first took shape in the 18th century to answer questions about the chances of winning at gambling, the normal distribution was a very satisfactory tool. In many other cases, where you want to study the impact of large events in order to understand them and anticipate future ones, the normal distribution will not do the job. A lot of data fits this description, e.g. financial data, where you need to study the impact of large financial losses and estimate the probability that they occur. Because such events are rare, the normal distribution overlooks them as if they never happen. Extreme Value Theory (EVT) addresses this problem by focusing on the extreme part of the data and modeling it separately to answer questions about extreme events.

Any expression in statistics that contains the word “THEORY” gives the impression of a black box filled with complicated, untouchable content, and EVT has the same reputation. This article gives a simplified introduction to EVT along with various applications. By the end you will have a general idea of what EVT is, and why and when you need to use it.

Overview

This article is organized as follows:

A simplified introduction to EVT.
A list of different applications that used EVT.
The main EVT-related packages in R, with an application to data on YouTube trending videos.

Introduction

“In cauda venenum” (“the poison is in the tail”) is the first sentence you see in the book Extreme Value Theory: An Introduction by Laurens de Haan and Ana Ferreira, and it expresses very well the nature of the data you deal with when applying EVT. Extreme data usually carries its most important information in the tail, which reflects the true behavior. Among simple summary statistics, kurtosis is the most suitable measure for detecting extreme data: high kurtosis indicates a heavy-tailed distribution, while low kurtosis indicates a rather light-tailed one. Still, kurtosis alone is not enough to tell accurately how heavy the tail is, to estimate the endpoint (if one exists), and so on.

According to EVT, for data to be treated and analyzed as extreme data, its sample maximum must have a limit distribution. Statistically speaking, the suitably normalized sample maximum must converge in distribution to a non-degenerate limit.
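In the standard textbook formulation this condition can be written as follows. The notation is an assumption made here for illustration: $X_1, \dots, X_n$ are independent observations with common distribution function $F$, and $a_n > 0$ and $b_n$ are normalizing constants.

```latex
\lim_{n \to \infty} P\!\left( \frac{\max(X_1, \dots, X_n) - b_n}{a_n} \le x \right)
  = \lim_{n \to \infty} F^{n}(a_n x + b_n) = G(x)
```

Here $G$ is a non-degenerate distribution function, and the limit is required at every continuity point $x$ of $G$.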

The theory and fundamentals of EVT were established by Maurice Fréchet, Ronald Fisher, Leonard Tippett, Richard von Mises and Boris Gnedenko. They specified the set of non-degenerate limit distributions of the sample maximum, which is known as the “class of extreme value distributions”.
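In its usual one-parameter form, and up to location and scale, this class can be written as below. The symbol $\gamma$ for the parameter is the common textbook notation and is assumed here.

```latex
G_{\gamma}(x) = \exp\!\left( -(1 + \gamma x)^{-1/\gamma} \right),
  \qquad 1 + \gamma x > 0, \quad \gamma \in \mathbb{R}
```

For $\gamma = 0$ the right-hand side is read as its limit, $\exp(-e^{-x})$.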

This class of distributions depends on one main parameter, known as the Extreme Value Index (EVI), which is the key to understanding the nature of the limit distribution. The EVI divides the general class of extreme value distributions into three subclasses:

A positive EVI indicates a distribution with an infinite endpoint, which means you are dealing with a heavy-tailed distribution.
A zero EVI indicates a light-tailed distribution: the tail decays exponentially fast, and the limit distribution has an infinite endpoint.
A negative EVI indicates a distribution with a finite endpoint, which for the limit distribution equals minus the reciprocal of the EVI; this is a short-tailed distribution.
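To make the three subclasses concrete, here is a minimal Python sketch of the standardized extreme value distribution function and its right endpoint. The function names (`gev_cdf`, `right_endpoint`, `tail_type`) are made up for this illustration and are not taken from any package.

```python
import math

def gev_cdf(x, gamma):
    """Standardized extreme value distribution function:
    exp(-(1 + gamma*x)^(-1/gamma)), read as exp(-exp(-x)) when gamma = 0."""
    if gamma == 0.0:
        return math.exp(-math.exp(-x))
    t = 1.0 + gamma * x
    if t <= 0.0:
        # Outside the support: below the lower endpoint when gamma > 0,
        # at or above the upper endpoint when gamma < 0.
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))

def right_endpoint(gamma):
    """Right endpoint of the standardized limit distribution."""
    return -1.0 / gamma if gamma < 0 else math.inf

def tail_type(gamma):
    """Name of the subclass selected by the sign of the EVI."""
    if gamma > 0:
        return "heavy-tailed"
    if gamma == 0:
        return "light-tailed"
    return "short-tailed"

# Probability of exceeding x = 10 under each subclass (illustrative EVI values).
for g in (0.5, 0.0, -0.5):
    print(g, tail_type(g), right_endpoint(g), 1.0 - gev_cdf(10.0, g))
```

The exceedance probability at the same point shrinks as the EVI decreases, and it is exactly zero beyond the finite endpoint when the EVI is negative.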

Extreme value analysis usually begins with a relatively large data set and then narrows it down to the extreme observations only. There are two main approaches to selecting these observations: the block maxima method and the peaks over threshold (POT) method. The block maxima method divides the data into several blocks and takes the maximum of each block; it requires a very large data set to yield a sufficient number of blocks. The POT method is the more modern approach to modeling extreme events: it sets a high threshold and keeps all observations above it for the analysis. Choosing the threshold is always the critical step in the POT method, and there are many tools for doing so, such as the Hill plot.
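The two selection methods, and the values that a Hill plot displays, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is simulated from a Pareto distribution (true EVI 0.5), the helper names are hypothetical, and the Hill estimator is only meaningful for positive data with a positive EVI.

```python
import math
import random

def block_maxima(data, block_size):
    """Split the data into consecutive full blocks and keep each block's maximum."""
    n_blocks = len(data) // block_size  # a trailing partial block is dropped
    return [max(data[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]

def peaks_over_threshold(data, threshold):
    """Keep every observation strictly above the threshold."""
    return [x for x in data if x > threshold]

def hill_estimates(data, k_max):
    """Hill estimates of a positive EVI for k = 1..k_max upper order statistics.
    Plotting these values against k gives the Hill plot."""
    ordered = sorted(data, reverse=True)
    logs = [math.log(x) for x in ordered[:k_max + 1]]
    estimates = []
    running_sum = 0.0
    for k in range(1, k_max + 1):
        running_sum += logs[k - 1]
        # Mean log of the k largest values minus the log of the (k+1)-th largest.
        estimates.append(running_sum / k - logs[k])
    return estimates

random.seed(0)
sample = [random.paretovariate(2.0) for _ in range(5000)]  # simulated, EVI = 1/2

maxima = block_maxima(sample, block_size=100)
k = len(sample) // 10                               # use the top 10% of the sample
threshold = sorted(sample, reverse=True)[k]
exceedances = peaks_over_threshold(sample, threshold)
hill = hill_estimates(sample, k_max=k)
print(len(maxima), len(exceedances), round(hill[-1], 3))
```

A stable stretch of the Hill estimates over a range of k is the usual informal guide for picking the number of upper order statistics, and therefore the threshold.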

Applications

From the introduction you may already have an idea of the cases that call for extreme value analysis: briefly, when you are interested in extreme or irregular events, possibly ones that have never yet occurred in the data. A simple tool such as kurtosis may give a first hint. If it's not clear yet, don't worry! Here I will give you several real applications, with their conclusions and how EVT was incorporated into the analysis.

I. Limit of human life span

This application considers data on the ages at death of Dutch residents who died between 1986 and 2015. Based on this data, the authors wanted to determine the limit of the human life span. Using the POT method, the EVI was estimated by the maximum likelihood estimator to be negative for both females and males, which strongly indicates a finite endpoint for the age distribution. The endpoint was then estimated at 124 years for females and 125 years for males. For a detailed look at the analysis and data, see the paper Limits to Human Life Span Through Extreme Value Theory.

II. Ultimate sports record

Data on athletic records for running, throwing, and jumping were collected to answer the question: what is the ultimate record for each specific event? The authors first estimated the EVI with the moment estimator, which turned out to be negative for most of the events and therefore indicates finite endpoints. The endpoints were then estimated from the estimated EVI. Further details can be found in the paper Records in Athletics Through Extreme-Value Theory.

III. Dike’s height

This is considered one of the most famous applications of EVT. It is well known that almost 40% of the Netherlands lies below sea level, so it is tremendously important to protect the country from floods such as the one that happened in 1953. EVT is needed to answer an important question: how high should the dikes be so that the probability of a flood in a given year is very small? Using storm data collected over 100 years, the question was answered by estimating the extreme quantile that corresponds to a flood probability of 0.0001, which gives the required dike height.
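One common way to estimate such an extreme quantile in the POT setting is to invert a generalized Pareto tail fitted to the exceedances. The sketch below assumes the threshold, scale and EVI have already been estimated. The function name and all input values are invented for illustration; they are not the figures from the Dutch study, and the study's exact estimator may differ.

```python
import math

def pot_quantile(threshold, scale, gamma, n, k, p):
    """Level exceeded with probability p, assuming the k exceedances (out of n
    observations) above `threshold` follow a generalized Pareto distribution
    with the given scale and EVI (gamma)."""
    ratio = k / (n * p)
    if gamma == 0.0:
        return threshold + scale * math.log(ratio)
    return threshold + (scale / gamma) * (ratio ** gamma - 1.0)

# Hypothetical inputs, chosen only to show the call.
level = pot_quantile(threshold=3.0, scale=1.0, gamma=-0.2, n=1000, k=100, p=0.0001)
endpoint = 3.0 - 1.0 / (-0.2)   # finite endpoint implied by the negative EVI
print(level, endpoint)
```

With a negative EVI the estimated level stays below the implied endpoint however small p becomes, while with a positive EVI it grows without bound as p shrinks.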

IV. Skyscrapers

Another interesting application modeled data on skyscrapers to examine the limits of their height and number of floors. The data on skyscrapers worldwide was obtained from the Council on Tall Buildings and Urban Habitat (CTBUH). A log-linear model was fitted to the distribution of the number of skyscrapers, and an EVT analysis was conducted to predict extreme heights and numbers of floors. The paper Forecasting the Urban Skyline with Extreme Value Theory has the detailed analysis and results.

V. Risk management

Here I won’t name one specific application, because many risk management applications in insurance and banking use EVT. Two key tools are the value at risk (VaR) and the expected shortfall, both of which are used to evaluate solvency cover under extreme scenarios. There are other EVT tools and implementations for these fields; see Extreme Value Theory as a Risk Management Tool for further discussion and applications.

Other applications can be found in fields such as networks, classification, etc.

R packages and implementation on YouTube data

Now it’s time for “You need to know how to do it yourself!” For that purpose I’ll point to a source for some important EVT-related functions in R, and then move on to an implementation on real data. R, as a rich statistical environment, offers a number of ready-to-use packages for extremes; you can check the regularly updated list of Extreme Value Analysis by Christophe Dutang and Kevin Jaunatre for several of them. This source covers the main packages for univariate and multivariate extreme value analysis, which provide different estimators of the EVI and of extreme quantiles, as well as models and important plots.

For more exposure to the application side and to working with data, I’ll briefly analyze real data on the most trending videos on YouTube to gain some insight into the tail of its distribution.

I took data on trending YouTube videos in different countries from Kaggle. The data contains the number of views, likes, dislikes, and comments for each video. I mainly want to see how the distribution of views behaves, especially in the tail, and to estimate its EVI to see how heavy the tail of the views distribution is in the selected countries.

Empirical distribution of the most trending videos on YouTube in different countries

The plots of the empirical distribution show the general shape of the data and some extreme observations in each country, but they don’t give a clear answer about the extreme observations or the heaviness of the tail. For clearer insight, I estimate the kurtosis for each country.

All the kurtosis values are very high, which may reflect heavy-tailed data. However, since kurtosis mainly reflects outliers, the highest kurtosis may not correspond to the heaviest tail. To really check the heaviness of the tail, I’ll estimate the EVI for each country to get a clearer idea of which of the selected countries has the most heavy-tailed distribution.
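For reference, sample kurtosis can be computed directly from the data as the fourth central moment divided by the squared variance. The sketch below uses that plain moment definition, under which a normal distribution scores 3. Whether the original analysis used this exact variant or a bias-corrected or excess version is not stated, and the toy numbers are invented.

```python
def kurtosis(values):
    """Sample kurtosis: fourth central moment divided by the squared second
    central moment (a normal distribution gives a value of about 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

# A single very large value is enough to raise the kurtosis sharply.
print(kurtosis([1, 2, 3, 4, 5]), kurtosis([1, 2, 3, 4, 100]))
```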

EVI estimates for the selected countries

I used the POT method to estimate the EVI with the maximum likelihood estimator. The second row gives the number of observations above the threshold, which was set at 10% of the sample size. The estimates indicate that all the selected countries have heavy-tailed distributions with an infinite endpoint. From the EVI estimates we can see that the heaviest tail, with the highest EVI, is in France, while Japan has a lower value even though it has a higher kurtosis. This again supports the earlier statement that kurtosis is not enough to draw conclusions about the tail of the data. Based on these results, I’m interested in calculating one of the best-known extreme risk measures, the Expected Shortfall. It gives the expected number of views given that the views exceed a very high quantile. In simpler words, it is the average number of views of those videos whose views exceed a very high level, say the 99th percentile of the views.
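The definition of the expected shortfall can be illustrated with a purely empirical version: find a high sample quantile, then average the observations above it. This is a simplified sketch with hypothetical function names and toy numbers. It is not the estimator behind the reported results, which is not described here and may be model-based.

```python
import math

def empirical_quantile(values, level):
    """Smallest order statistic with at least a fraction `level` of the
    sample at or below it."""
    ordered = sorted(values)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]

def expected_shortfall(values, level=0.99):
    """Average of the observations strictly above the `level` sample quantile."""
    q = empirical_quantile(values, level)
    tail = [v for v in values if v > q]
    if not tail:  # nothing lies above the quantile (very small samples)
        return q
    return sum(tail) / len(tail)

# Toy view counts, made up for illustration.
views = [120, 340, 560, 780, 1500, 2300, 9800, 45000]
print(empirical_quantile(views, 0.75), expected_shortfall(views, 0.75))
```

With heavy-tailed data and a level as high as 0.99, only a handful of observations lie above the quantile, which is one reason a model-based tail estimate is often preferred over this raw average.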

Expected shortfall estimates for the selected countries

As shown in the previous table, the highest expected number of views is estimated at 142 million, in the UK, while the lowest is in Japan. The data and code used to produce these results are available 👉 Here.