
Extreme Value Analysis (EVA) is a statistical methodology used primarily to estimate the probability of events rarer than any previously observed. Its fields of application include engineering, meteorology, hydrology, finance, and oceanography. It is a long-established approach for dealing with extreme deviations from a reference distribution.
What we try to do here is a step further. We use EVA during the evaluation of our deep-learning model in an anomaly detection application. We don't try to reinvent anything; we simply use Extreme Value Theory to provide an additional explanation of the results of our supervised approach. Our methodology is not tied to a specific algorithm and can easily be generalized or adapted to suit any modeling pipeline.
In this post, we develop a time series forecasting application based on a deep-learning architecture. After validating it, we move on to inspect the reliability of its predictions with EVA. We examine the residuals generated on a selected control period to determine how 'extreme' they are and how 'frequently' they may appear in the future. Generating statistics and confidence intervals on these aspects allows us to point out how anomalies, i.e. situations not modeled by our deep-learning framework, may manifest in the future.
THE DATA
We use a dataset that comes from the Vancouver Open Data Catalogue. It is easily accessible on Kaggle and reports more than 500K records of crimes registered in the Canadian city from 2003 to 2017. We focus only on the aggregated series of daily crimes. This series appears to be very noisy, as shown in the figure below.
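A minimal sketch of how the daily series can be assembled from the Kaggle export. The file name and the YEAR/MONTH/DAY column names are assumptions based on the public dataset, not values taken from the post.

```python
import pandas as pd

# Kaggle export of the Vancouver Open Data crime records (file name and
# column names are assumptions based on the public dataset).
crimes = pd.read_csv('crime.csv')

# Build a date column and aggregate to the daily number of crimes.
parts = crimes[['YEAR', 'MONTH', 'DAY']].rename(columns=str.lower)
crimes['DATE'] = pd.to_datetime(parts)
daily = crimes.groupby('DATE').size().rename('n_crimes').asfreq('D').fillna(0)
```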

However, some form of seasonality can be detected at the weekly and monthly levels.
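A quick way to check these patterns, reusing the daily series built in the sketch above, is to average the counts by weekday and by calendar month:

```python
# Average daily counts by weekday (0 = Monday) and by calendar month;
# `daily` comes from the loading sketch above.
weekly_profile = daily.groupby(daily.index.dayofweek).mean()
monthly_profile = daily.groupby(daily.index.month).mean()
print(weekly_profile, monthly_profile, sep='\n\n')
```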

THE MODEL
We use this information to build a forecasting model that predicts crimes in the upcoming days given a past sequence of observations. For this kind of task, we develop a Seq2Seq LSTM autoencoder. Its structure suits the data at our disposal well because we can combine the raw time signals with temporal embeddings. The encoder is fed with sequences of numerical inputs (the target history) plus the numerical embeddings of the historical weekdays and months. The decoder receives the encoder states plus the numerical embeddings of the future weekdays and months to produce a 7-day-ahead forecast.
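Below is a minimal Keras sketch of such an architecture. The window length, layer sizes, and embedding dimensions are illustrative assumptions, not the values used in the post.

```python
from tensorflow.keras.layers import (Input, LSTM, Dense, Embedding,
                                     Concatenate, TimeDistributed)
from tensorflow.keras.models import Model

LOOK_BACK, HORIZON = 30, 7   # assumed input window; 7-day-ahead output

# Encoder: past target values plus weekday/month embeddings.
enc_seq = Input((LOOK_BACK, 1))                  # past daily crime counts
enc_wd = Input((LOOK_BACK,), dtype='int32')      # past weekday ids (0-6)
enc_mo = Input((LOOK_BACK,), dtype='int32')      # past month ids (0-11)

emb_wd = Embedding(7, 4)                         # temporal embeddings,
emb_mo = Embedding(12, 6)                        # shared by encoder/decoder

x = Concatenate()([enc_seq, emb_wd(enc_wd), emb_mo(enc_mo)])
_, state_h, state_c = LSTM(64, return_state=True)(x)

# Decoder: future weekday/month embeddings, initialized with encoder states.
dec_wd = Input((HORIZON,), dtype='int32')
dec_mo = Input((HORIZON,), dtype='int32')
d = Concatenate()([emb_wd(dec_wd), emb_mo(dec_mo)])
d = LSTM(64, return_sequences=True)(d, initial_state=[state_h, state_c])
out = TimeDistributed(Dense(1))(d)               # 7-day-ahead forecast

model = Model([enc_seq, enc_wd, enc_mo, dec_wd, dec_mo], out)
model.compile(optimizer='adam', loss='mse')
```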
Training is carried out by optimizing the hyperparameters on the validation set with a grid search through keras-hypetune, a simple framework for Keras hyperparameter tuning based only on NumPy.
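A sketch of how the search could be wired with keras-hypetune is shown below. The grid values and the stand-in hypermodel are hypothetical, and X_train/X_val are assumed to be prepared as described above.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from kerashypetune import KerasGridSearch

# Hypothetical grid; the actual search space used in the post is not shown.
param_grid = {
    'units': [32, 64, 128],
    'lr': [1e-2, 1e-3],
    'epochs': 100,
    'batch_size': [64, 128],
}

def get_model(param):
    # Stand-in hypermodel: in the post this would build the Seq2Seq
    # autoencoder above, with `param` wired into layers and optimizer.
    model = Sequential([Dense(param['units'], activation='relu'), Dense(7)])
    model.compile(optimizer=Adam(learning_rate=param['lr']), loss='mse')
    return model

kgs = KerasGridSearch(get_model, param_grid,
                      monitor='val_loss', greater_is_better=False)
kgs.search(X_train, y_train, validation_data=(X_val, y_val))  # data assumed
print(kgs.best_params)
```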
The predictions generated on our validation set are depicted below, together with the corresponding performance computed for each future time lag. Our reference baseline is a naive repetition of the last valid observation.


EXTREME VALUE ANALYSIS
At this point, an optimized version of our model is trained and ready to use. What we try to do now is explain its performance by applying some techniques typical of EVA. For our experiment, the term 'validation set' is used in a more general way to denote a 'control group', i.e. a time interval used for tuning and for the application of EVA.
The first ingredient we are interested in is the residuals generated by our model on the validation set. We consider the absolute values of the residuals as extreme values. This choice is reasonable because they represent the unknown situations where our forecasting model registers a lack of performance. In other words, the situations where our model tends to be wrong have not been learned yet and are therefore labeled as anomalous events. The degree of anomaly is measured by the distance between the reality and the prediction. A standard EVA approach consists of identifying as anomalous/extreme either all the observations above a fixed threshold (Peaks over Threshold) or the sequence of maxima obtained by segmenting the original series into blocks (Block Maxima Method). The choice of method is domain related and can produce different results. In any case, deriving the extreme values from the model residuals gives us confidence that we are operating on a stationary series, which is a prerequisite of EVA.
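A minimal sketch of this extraction, assuming y_true and y_pred hold the observed and predicted daily counts on the control period:

```python
import numpy as np

# Absolute residuals on the control (validation) period; `y_true` and
# `y_pred` are assumed arrays of observed and predicted daily counts.
abs_resid = np.abs(y_true - y_pred)

# Block Maxima: segment the residual series into 30-day blocks and keep
# the maximum of each block.
BLOCK = 30
n_blocks = len(abs_resid) // BLOCK
block_maxima = abs_resid[:n_blocks * BLOCK].reshape(n_blocks, BLOCK).max(axis=1)
```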

The second step consists of modeling the extreme values: we want to identify the reference distribution from which these values are drawn. Depending on the choice made above on how to select the anomalies/extremes, we have different candidate distributions to choose from. For our work, we select the Block Maxima Method with blocks of 30 days. With this approach, we can reasonably assume that the extreme events are drawn from a Generalized Extreme Value (GEV) or Gumbel distribution. The choice can be made mathematically by performing maximum likelihood estimation, from which we select the best distribution with the best parameters.
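A sketch of this fit with scipy, reusing the block_maxima computed above (the Gumbel is the shape-zero special case of the GEV):

```python
from scipy import stats

# Fit both candidate distributions by maximum likelihood.
gev_params = stats.genextreme.fit(block_maxima)   # (shape, loc, scale)
gum_params = stats.gumbel_r.fit(block_maxima)     # (loc, scale)

# Compare the fits through their log-likelihoods and keep the best one.
ll_gev = stats.genextreme.logpdf(block_maxima, *gev_params).sum()
ll_gum = stats.gumbel_r.logpdf(block_maxima, *gum_params).sum()
```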

With these pieces in place, we are ready for the last step. We only need to select some time horizons, technically called return periods, on which to calculate the corresponding return values. For each return period, we can expect at least one observation to exceed the estimated return value. Given a return period t (expressed in 30-day blocks), the return value E is the level exceeded in any given block with probability 1/t, i.e. exceeded on average once over the next t*30 days. In our case study, the return values are the residual magnitudes that we may expect to be exceeded in the upcoming future, i.e. unexpected events that we can classify as anomalies.
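In other words, the return value for a return period t is the (1 - 1/t) quantile of the fitted block-maxima distribution. A sketch reusing the GEV fit above, with illustrative return periods:

```python
import numpy as np
from scipy import stats

# Return periods expressed in 30-day blocks (e.g. 100 blocks = 3000 days);
# the values here are illustrative.
return_periods = np.array([2, 5, 10, 50, 100])

# Return value = level exceeded in any given block with probability 1/t,
# i.e. the (1 - 1/t) quantile of the fitted GEV.
return_values = stats.genextreme.ppf(1 - 1 / return_periods, *gev_params)
```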
In the plot below, we show the modeled return values for the future together with the observed maxima of our validation set. The accompanying table summarizes our findings more formally. For example, the first row says that in the upcoming 30 days we can expect (with a relatively high probability) an observation that exceeds the predictions of our model by 21 absolute points. In the same way, the last row says that over the following 3000 days we can expect (with a relatively low probability) an observation that exceeds the predictions of our model by 53 absolute points. The confidence intervals are calculated using bootstrapped statistics, as sketched below.
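A minimal sketch of how such bootstrapped intervals can be obtained, resampling the block maxima and refitting the GEV each time (the number of resamples is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Refit the GEV on resampled block maxima and collect the return values.
boot = []
for _ in range(1000):
    sample = rng.choice(block_maxima, size=len(block_maxima), replace=True)
    params = stats.genextreme.fit(sample)
    boot.append(stats.genextreme.ppf(1 - 1 / return_periods, *params))

# 95% bootstrap confidence intervals for each return period.
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)
```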


The procedure described above was applied to the residuals of the one-step-ahead forecasts, but the same computation can be carried out for other forecasting horizons.


SUMMARY
In this post, we presented a time series forecasting task. First, we built a Seq2Seq model for multi-step-ahead forecasting. Then, we tried to give further explanations of our model's behavior by incorporating techniques from Extreme Value Theory. We took advantage of this combination to explain anomalies that may appear in the normal data stream and that our model cannot recognize.
Keep in touch: LinkedIn