
Interpretable Nowcasting using DeepXF with Minimal Coding

Can very short-term strategic business goals be backed with precise future insights to make quick decisions?

In this post, we will revisit a key short-term strategic problem applicable to almost every business scenario. We will see how "Deep-XF", a Python library, can be used intuitively for nowcasting tasks with ease, along with a demo use case. Further, we will also see how the nowcasting model's results can be interpreted to gain better insights. For a quick introduction to the package, go through the blog [here](https://ajay-arunachalam08.medium.com/building-explainable-forecasting-models-with-state-of-the-art-deep-neural-networks-using-a-ad3fa5844fef). Also, take a look at the related hands-on Forecasting demo tutorial from here.

Overview: Forecasting vs. Nowcasting

For an intuitive, detailed explanation of time series, check [here](https://ajay-arunachalam08.medium.com/multi-dimensional-time-series-data-analysis-unsupervised-feature-selection-with-msda-package-430900a3829a); and for unsupervised feature selection on time-series data, check here.

Let's understand the core difference between forecasting and nowcasting in simple layman's terms.

Forecasting is the science of determining the direction of future trends using scientific approaches that combine historical and present information to infer the future over short or long intervals. Simply put, a forecast is a prediction, an educated guess about the future.

Nowcasting, on the other hand, is the science of determining a trend or a trend reversal objectively, in real time, over very short intervals. Nowcasting is fact-based: it focuses on the known and knowable, and therefore avoids forecasting. Simply put, nowcasting is the basis of a robust decision-making process.

Nowcasting based on the "Expectation-Maximization" algorithm

In general, many machine learning problems are solved in an iterative fashion. The general principle of the expectation-maximization (EM) algorithm also lies in iteration: it optimizes the likelihood of seeing the observed data while estimating the parameters of a statistical model with unobserved variables. You simply start with initial random guesses, use those guesses to compute the expectation step, and then maximize the resulting likelihood, iterating until the solution converges. Real-world applications where EM is widely used include filling in missing data in a sample space, image reconstruction, estimating hidden Markov model parameters, fitting Gaussian mixture densities, and many more.
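To make the iteration concrete, here is a minimal, self-contained EM sketch fitting a two-component 1-D Gaussian mixture with NumPy. This is an illustration of the principle, not Deep-XF's internal implementation; the data and initial guesses are toy values chosen for the example:

```python
import numpy as np

def em_gmm_1d(x, mu, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture.
    x: data array; mu: initial guesses for the two means."""
    mu = np.array(mu, dtype=float)
    var = np.array([1.0, 1.0])   # initial variances
    pi = np.array([0.5, 0.5])    # initial mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = (pi / np.sqrt(2 * np.pi * var)
                * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

# Two well-separated clusters around 0 and 5
x = np.array([-0.1, 0.0, 0.1, 4.9, 5.0, 5.1])
mu, var, pi = em_gmm_1d(x, mu=[1.0, 4.0])
print(mu)  # means converge near 0 and 5
```

The alternating E- and M-steps are exactly the "guess, estimate expectation, maximize, repeat" loop described above.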

Interpretable ML & Explainable AI (XAI)

The core idea behind building interpretable models is that machine learning models and their inferences no longer remain a complete "black box" to us. In layman's terms, the results of the model should be understood, interpreted, and trusted by us, presented in a simple, human-readable form. This is where the science of interpretable ML/explainable AI comes into the picture: it helps you understand and interpret the results generated by machine learning models.

Use Case – Weather Nowcasting Problem

Let's go through a hands-on demo with a multivariate dataset for a classic meteorology application. Here, we will use the Canadian climate dataset from here. This dataset is compiled from several public sources. It consists of daily temperatures and precipitation from 13 Canadian centres. Precipitation is either rain or snow (likely snow in winter months). In 1940, there was daily data for seven out of the 13 centres; by 1960 there was daily data from all 13 centres, with the occasional missing value. We have around 80 years of daily records, and we want to predict the future over a very short-range period. The step-by-step illustration below will guide you through the demo use case.

Once the library is installed, we import it, load the data, and check the dataset's shape, attribute datatypes, etc. Next, we use the missing() function from the library to find the missing values in our dataset, followed by imputing/replacing any NaN values. For imputing the missing values, we use the impute() function, which provides several strategies such as filling with zero, mean, median, backfill, etc.
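These helpers wrap standard missing-value handling; a rough plain-pandas equivalent of what missing() and impute() do might look like the following (the toy values are for illustration only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "MEAN_TEMPERATURE_VANCOUVER": [6.2, np.nan, 7.1, 6.8],
    "TOTAL_PERCIPITATION_VANCOUVER": [0.0, 1.2, np.nan, 0.4],
})

# Count missing values per column (what missing() reports)
print(df.isna().sum())

# Fill with the column mean (one of the impute() strategies)
df_mean = df.fillna(df.mean(numeric_only=True))

# Or backfill: propagate the next valid observation backwards
df_bfill = df.bfill()
```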

Next, once the data is preprocessed, we set custom nowcast model parameters with the set_model_config() function, selecting the 'expectation-maximization' algorithm, the input scaler (e.g., MinMaxScaler, StandardScaler, MaxAbsScaler, RobustScaler), and the nowcast period window. The set_variables() and get_variables() functions are then used for setting and parsing the timestamp index, specifying the outcome variable of interest, sorting the data, removing duplicates, etc.
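The scaler options mirror scikit-learn's preprocessing scalers. Min-max scaling, for instance, can be sketched in a few lines of NumPy (toy temperature values, for illustration):

```python
import numpy as np

def min_max_scale(x):
    """Scale each column to [0, 1], as sklearn's MinMaxScaler does."""
    x = np.asarray(x, dtype=float)
    mn, mx = x.min(axis=0), x.max(axis=0)
    return (x - mn) / (mx - mn)

temps = np.array([[-5.0], [0.0], [15.0], [25.0]])
scaled = min_max_scale(temps)
print(scaled.ravel())  # smallest value maps to 0, largest to 1
```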

Further, we visualize the trend of the outcome column with respect to time using interactive graph plots via the intuitive plot_dataset() function. Then, we train the nowcast model using the nowcast() function, followed by interpreting the model's inferred results for the future data points using the explainable_nowcast() function. One can also visualize the historical and nowcasted values with interactive plots. The explainability module displays the significant contributing attributes in descending order of importance, depicted with a graph plot.

For library installation, follow the steps [here](https://github.com/ajayarunachalam/Deep_XF#requirements); for manual prerequisites installation, check here.

Step 1: Import library

Step 2: Import data; check dataset shape & attributes information

(29221, 27)

Image by author: get attributes information

Step 3: Check missing values for whole dataset

Image by author: Printing missing values information in the dataset

Step 4: Impute missing values

Step 5: Setting custom user-inputs for nowcasting model

Step 6: Setting and Parsing Timestamp, Outcome variable, etc

Image by author: peek into the dataset displaying a single row

Step 7: Data Visualization with interactive plots

Image by author: Visualizing interactive plotly plot with timestamp and forecast column

Step 8: Feature Engineering

Step 9: Train the nowcast model, and plot interactive Nowcast future predictions with endpoint visualization

Image by author: Training nowcast model until convergence
Image by author: Visualizing interactive plotly plot with historical and nowcast values

Step 10: Get Nowcast model’s prediction interpretation

Image by author: Explainability module plot displaying the significant attributes in descending order of importance

The above Shapley plot shows the result for a future nowcasted data point. It can be seen that the features that contributed significantly to the model's inference for the outcome column 'MEAN_TEMPERATURE_VANCOUVER' are mainly 'TOTAL_PERCIPITATION_VANCOUVER', the mean temperatures from the weather stations of Saskatoon, Calgary, and Moncton, and the derived date and cyclic features, especially the ones reflecting importance with respect to the day, week, and month timeline.
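Attributions like these come from Shapley's game-theoretic formula: a feature's value is its average marginal contribution over all coalitions of the other features. For a hypothetical toy two-feature model (standing in for the nowcaster, with a zero baseline chosen for illustration), the values can be computed exactly by enumeration:

```python
from itertools import combinations
from math import factorial

def model(x1, x2):
    # Hypothetical additive model standing in for the nowcaster
    return x1 + 2 * x2

def shapley_values(x, baseline):
    """Exact Shapley values by enumerating all feature coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                def v(members):
                    # Model value with features in `members` present,
                    # the rest held at their baseline
                    args = [x[j] if j in members else baseline[j]
                            for j in range(n)]
                    return model(*args)
                weight = (factorial(size) * factorial(n - size - 1)
                          / factorial(n))
                phi[i] += weight * (v(set(S) | {i}) - v(set(S)))
    return phi

phi = shapley_values(x=[3.0, 4.0], baseline=[0.0, 0.0])
print(phi)  # [3.0, 8.0] -- attributions sum to model(3, 4) - model(0, 0)
```

The SHAP library referenced below applies the same idea, with approximations that make it tractable for real models with many features.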

Conclusion

In this post, we quickly walked through an overview of forecasting and nowcasting. We discussed the EM algorithm, and also glimpsed the need for building globally interpretable models. We saw a classic nowcasting use case in the meteorological domain, using the deep-xf package to build a nowcasting predictor based on a Dynamic Factor model. One can also automatically build explainable deep-learning-based forecasting models with ease using this 'simple', 'easy-to-use' and 'low-code' solution. Further, one can get the model results written to disk as flat CSV files, as well as interactive visualization plots, with a single line of code.

The complete notebook accompanying this blog post can be found here.

Contact

You can reach me at [email protected]; Connect with me – Linkedin

Thanks for reading. Cheers, keep learning 🙂

Biography

I am an AWS Certified Machine Learning Specialist & Cloud Solutions Architect. I truly believe that addressing opacity in AI systems is the need of the hour before we fully accept the power of AI. With this in mind, I have always strived to democratize AI and be more inclined towards building interpretable models. My interests lie in applied Artificial Intelligence, Machine Learning, Deep Learning, Deep Reinforcement Learning, and Natural Language Processing, specifically in learning good representations. From my experience working on real-world problems, I fully acknowledge that finding good representations is the key to designing systems that can solve interesting, challenging real-world problems, go beyond human-level intelligence, and ultimately explain the complicated data that we don't understand. To achieve this, I envision learning algorithms that can learn feature representations from both unlabelled and labelled data, be guided with and/or without human interaction, and operate at different levels of abstraction in order to bridge the gap between low-level data and high-level abstract concepts.

References

Nowcasting (meteorology) – Wikipedia

Time series – Wikipedia

Explainable artificial intelligence – Wikipedia

Shapley value – Wikipedia

What is explainable AI?

Heavy rainfall nowcasting (RAVAKE)


GitHub – slundberg/shap: A game theoretic approach to explain the output of any machine learning…

PyTorch

TensorFlow

statsmodels.tsa.statespace.dynamic_factor_mq.DynamicFactorMQ – statsmodels

Expectation-Maximization (EM) data mining algorithm in plain English – Hacker Bits

Machine Learning -Expectation-Maximization Algorithm (EM)
