Time Series for Climate Change: Reducing Food Waste with Clustering

Using time series clustering for better demand forecasts

Reducing Food Waste

Improving the supply chain is another key step for reducing our ecological footprint. In developed countries, there’s often a large surplus of consumer goods such as food. Producing this surplus takes significant energy and resources, and much of it goes to waste.

Reducing overproduction is an important milestone for decreasing greenhouse gas emissions. We can tackle this problem by better understanding how much we need.

Let’s take food as an example. Each year, we lose about 1.3 billion metric tons of food [1]. Of course, not all of this comes from leftovers or from the supply chain. Some of it is lost during production or transportation, for example, due to poor refrigeration conditions. Still, better demand forecasting models can make a significant impact on reducing overproduction.

Clustering of Food Demand Time Series

We can use clustering analysis to improve demand forecasts.

Clustering involves grouping observations based on their similarity. In this case, each observation is a time series representing some product’s sales. In general, you can use time series clustering to:

  • identify time series with similar patterns, such as trend or seasonality;
  • segment time series into different groups. This is especially useful if the number of time series is large.

In the case of demand time series, clustering can be useful to identify products with similar sales patterns. Forecasting models can then be tailored to the characteristics of each cluster. Ultimately, this leads to better forecasts.

Clustering demand time series is also valuable to businesses. Identifying similar products is useful to create better marketing or promotion strategies.


In the rest of this article, we’ll do a clustering analysis of food demand time series. You’ll learn how to:

  • summarise a set of time series using feature extraction;
  • use K-Means and a hierarchical method for time series clustering.

The full code is available on GitHub:

Data set

We’ll use a set of weekly food sales time series collected by the US Department of Agriculture [2]. This data set contains information about food sales by product category and subcategory. The time series are split by state, but we’ll use national total sales in each period.

Below is a sample of the data set:

Here’s what the whole data looks like:

Feature-based Time Series Clustering

We’ll use a feature-based approach to time series clustering. This process involves two main steps:

  1. Summarise each time series into a set of features, such as the average value;
  2. Apply a conventional clustering algorithm to the feature set, such as K-means.
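
The two steps above can be sketched on a toy example. This is only an illustration under stated assumptions: the data is synthetic, the three features (mean, standard deviation, linear slope) are hand-picked stand-ins for a real feature extractor, and all variable names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
t = np.arange(104)  # two years of weekly observations

# Toy data (assumption): three flat, noisy series and three with an upward trend
toy = pd.DataFrame({f"flat_{i}": 10 + rng.normal(0, 1, t.size) for i in range(3)})
for i in range(3):
    toy[f"trend_{i}"] = 10 + 0.5 * t + rng.normal(0, 1, t.size)

# Step 1: summarise each series (one column each) into a few features
toy_features = pd.DataFrame({
    "mean": toy.mean(),
    "std": toy.std(),
    "slope": toy.apply(lambda s: np.polyfit(t, s.to_numpy(), 1)[0]),
})

# Step 2: scale the features and apply a conventional clustering algorithm
scaled = MinMaxScaler().fit_transform(toy_features)
toy_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

# One cluster label per series, indexed by series name
toy_clusters = pd.Series(toy_labels, index=toy_features.index)
print(toy_clusters)
```

Each row of `toy_features` describes one series, so the clustering algorithm never sees the raw observations, only their summary.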

Let’s do each step in turn.

Feature extraction using tsfel

We start by extracting a set of statistics to summarise each time series. The goal is to convert each series into a small set of features.

There are several tools for time series feature extraction. We’ll use tsfel, which offers competitive performance relative to other approaches [3].

Here’s how you can use tsfel:

import pandas as pd
import tsfel

# get configuration
cfg = tsfel.get_features_by_domain()

# extract features for each food subcategory
features = {col: tsfel.time_series_features_extractor(cfg, data[col])
            for col in data}

features_df = pd.concat(features, axis=0)

This process results in a large number of features. Some of these may be redundant, so we carry out a feature selection process.

Below, we apply three operations to the feature set:

  • normalization: convert variables into a 0–1 value range;
  • selection by variance: remove any variable with 0 variance;
  • selection by correlation: remove any variable with a high correlation with another existing one.

from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import VarianceThreshold
from src.correlation_filter import correlation_filter

# normalizing the features to the 0-1 range
features_norm_df = pd.DataFrame(MinMaxScaler().fit_transform(features_df),
                                columns=features_df.columns)

# removing features with 0 variance
min_var = VarianceThreshold(threshold=0)
min_var.fit(features_norm_df)
features_norm_df = pd.DataFrame(min_var.transform(features_norm_df),
                                columns=min_var.get_feature_names_out())

# removing correlated features (correlation above 0.9)
features_norm_df = correlation_filter(features_norm_df, 0.9)
# one row per time series, indexed by its name
features_norm_df.index = data.columns
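
The correlation_filter function is imported from the project’s own code and is not shown here. As a rough idea of what such a filter might do, here is a minimal sketch. It is an assumption, not the author’s implementation: the function name `correlation_filter_sketch` and the rule (drop a column when its absolute Pearson correlation with an earlier column exceeds the threshold) are hypothetical.

```python
import numpy as np
import pandas as pd


def correlation_filter_sketch(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Drop each column that is highly correlated with an earlier column."""
    corr = df.corr().abs()
    # keep only the upper triangle so that each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


example = pd.DataFrame({
    "a": [0.0, 0.2, 0.5, 0.9, 1.0],
    "b": [0.0, 0.4, 1.0, 1.8, 2.0],  # exactly 2 * a, so perfectly correlated with a
    "c": [1.0, 0.0, 0.7, 0.1, 0.6],
})
filtered = correlation_filter_sketch(example, 0.9)
print(list(filtered.columns))
```

In this toy case, column b carries no information beyond column a, so only a and c are kept.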

Clustering with K-Means

After preprocessing the data set, we’re ready to cluster the time series. We summarised each series into a small set of unordered features. So, we can use any conventional algorithm for clustering. A popular choice is K-means.

With K-means, we need to pick the number of clusters we want. Unless we have some domain knowledge, there’s no obvious a priori value for this parameter. But we can take a data-driven approach to select the number of clusters: we test different values and pick the best one.

Below, we test K-means with 2 to 24 clusters. Then, we pick the number of clusters that maximizes the silhouette score. This metric quantifies how cohesive each cluster is and how well separated it is from the other clusters.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

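kmeans_parameters = {
    'init': 'k-means++',
    'n_init': 100,
    'max_iter': 50,
}

# candidate numbers of clusters: 2, 3, ..., 24
n_clusters = range(2, 25)
silhouette_coef = []
for k in n_clusters:
    kmeans = KMeans(n_clusters=k, **kmeans_parameters)
    kmeans.fit(features_norm_df)

    # silhouette score of the clustering obtained with k clusters
    score = silhouette_score(features_norm_df, kmeans.labels_)
    silhouette_coef.append(score)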

The silhouette score is maximized for 5 clusters as shown in the figure below.
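
To make the selection step concrete, here is a small self-contained sketch of picking the best number of clusters and refitting the final model. It runs on synthetic blobs rather than the food sales features, and the names `candidate_k`, `best_k`, and `final_model` are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy feature matrix with four well-separated groups (stand-in for real features)
X, _ = make_blobs(n_samples=200,
                  centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
                  cluster_std=0.5,
                  random_state=0)

candidate_k = list(range(2, 9))
scores = []
for k in candidate_k:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores.append(silhouette_score(X, model.labels_))

# the best k is the candidate with the highest silhouette score
best_k = candidate_k[int(np.argmax(scores))]

# refit with the selected number of clusters to get the final labels
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
cluster_labels = final_model.labels_
print(best_k)
```

On real data the maximum is usually less clear-cut than in this toy case, so it is worth plotting the scores against the number of clusters before committing to a value.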

We can draw a parallel coordinates plot to understand the profile of each cluster. Here’s an example with a sample of three features:
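
A parallel coordinates plot can be drawn with pandas. The sketch below uses random numbers in place of the real features, and the column names (`feature_a`, `feature_b`, `feature_c`, `cluster`) are placeholders, not the features used in this analysis. It assumes matplotlib is installed.

```python
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(1)

# Hypothetical normalised features for 12 series (values in the 0-1 range)
plot_df = pd.DataFrame(rng.random((12, 3)),
                       columns=["feature_a", "feature_b", "feature_c"])
# Hypothetical cluster assignment: four series in each of three clusters
plot_df["cluster"] = np.repeat(["0", "1", "2"], 4)

# One line per series, coloured by cluster; one vertical axis per feature
ax = parallel_coordinates(plot_df, class_column="cluster", colormap="viridis")
ax.set_ylabel("normalised value")
```

Calling `matplotlib.pyplot.show()` afterwards displays the figure. Clusters whose lines sit at different heights on an axis differ on that feature.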

We can also use the information about clusters to improve demand forecasting models, for example, by building a model for each cluster. The paper in reference [5] is a good example of this approach.
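
As a rough illustration of the one-model-per-cluster idea, the sketch below pools the series of each cluster and fits a simple autoregressive linear model on lagged values. This is a simplification chosen for brevity: the paper in [5] uses recurrent neural networks, and the data, the cluster labels, and the helper names (`make_lagged`, `fit_model_per_cluster`) are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def make_lagged(series: pd.Series, n_lags: int):
    """Return (X, y): each row of X holds the n_lags values that precede y."""
    values = series.to_numpy()
    X = np.column_stack([values[i:len(values) - n_lags + i] for i in range(n_lags)])
    y = values[n_lags:]
    return X, y


def fit_model_per_cluster(data: pd.DataFrame, labels: pd.Series, n_lags: int = 4):
    """Fit one linear autoregressive model on the pooled series of each cluster."""
    models = {}
    for cluster_id in sorted(labels.unique()):
        members = labels.index[labels == cluster_id]
        parts = [make_lagged(data[name], n_lags) for name in members]
        X = np.vstack([p[0] for p in parts])
        y = np.concatenate([p[1] for p in parts])
        models[cluster_id] = LinearRegression().fit(X, y)
    return models


# Toy data (assumption): two seasonal series and two trending series
rng = np.random.default_rng(2)
t = np.arange(60)
sales = pd.DataFrame({
    "s1": np.sin(t / 4) + rng.normal(0, 0.05, t.size),
    "s2": np.sin(t / 4) + rng.normal(0, 0.05, t.size),
    "s3": 0.1 * t + rng.normal(0, 0.05, t.size),
    "s4": 0.1 * t + rng.normal(0, 0.05, t.size),
})
labels = pd.Series([0, 0, 1, 1], index=sales.columns)  # assumed cluster labels

models = fit_model_per_cluster(sales, labels)

# One-step-ahead forecast for s3 using the model of its cluster
last_window = sales["s3"].to_numpy()[-4:].reshape(1, -1)
next_value = models[labels["s3"]].predict(last_window)[0]
print(next_value)
```

Pooling gives each model more training data than a single series would, while keeping series with very different dynamics apart.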

Hierarchical clustering

Hierarchical clustering is an alternative to K-means. It combines pairs of clusters iteratively, leading to a tree-like structure. The library scipy provides an implementation for this method.

import scipy.cluster.hierarchy as shc

# hierarchical clustering using the ward method
clustering = shc.linkage(features_norm_df, method='ward')

# plotting the dendrogram
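# labelling the leaves with the series names is an assumption
dend = shc.dendrogram(clustering,
                      labels=features_norm_df.index.to_list())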

The results of a hierarchical clustering model are best visualized with a dendrogram plot:

We can use the dendrogram to understand the clusters’ profiles. For example, we can see that most canned items are grouped together (orange color). Oranges also cluster with pancake/cake mixes. These two often go together at breakfast.
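
Besides reading the tree visually, we can cut it to obtain flat cluster labels, which makes the result comparable with K-means. Here is a minimal sketch using scipy’s fcluster on toy data. The data and the choice of two clusters are assumptions for illustration only.

```python
import numpy as np
import scipy.cluster.hierarchy as shc

rng = np.random.default_rng(3)

# Toy feature matrix: two groups of five points each
points = np.vstack([rng.normal(0, 0.1, (5, 2)),
                    rng.normal(5, 0.1, (5, 2))])

# Build the tree with the ward method, as in the snippet above
tree = shc.linkage(points, method="ward")

# Cut the tree so that at most two flat clusters remain
flat_labels = shc.fcluster(tree, t=2, criterion="maxclust")
print(flat_labels)
```

The labels returned by fcluster start at 1, not 0, which matters when comparing them with the labels from K-means.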

Key Takeaways

  • Developed countries have a large surplus of food resources. This surplus often goes to waste;
  • Reducing food waste can have a strong impact on climate change by reducing greenhouse gas emissions;
  • We can achieve this by improving food demand forecasting models;
  • Clustering demand time series can improve forecasting models involving many time series. One way to do that is by training a model for each cluster;
  • Clustering can also be valuable to understand the profile of different groups within the dataset.

Thank you for reading, and see you in the next story!


[1] Gustavsson, Jenny, et al. "Global food losses and food waste." Food and Agriculture Organization of the United Nations, Rome (2011).

[2] Weekly Retail Food Sales by Economic Research Service (License: Public Domain)

[3] Henderson, Trent, and Ben D. Fulcher. "Feature-Based Time-Series Analysis in R using the theft Package." arXiv preprint arXiv:2208.06146 (2022).

[4] Rolnick, David, et al. "Tackling climate change with Machine Learning." ACM Computing Surveys (CSUR) 55.2 (2022): 1–96.

[5] Bandara, Kasun, Christoph Bergmeir, and Slawek Smyl. "Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach." Expert Systems with Applications 140 (2020): 112896.

