Temporal Graph Learning in 2023

The story so far

Shenyang(Andy) Huang
Towards Data Science

--

Real world networks such as social, traffic and citation networks often evolve over time and the field of Temporal Graph Learning (TGL) aims to extract, learn and predict from these evolving networks. Recently, TGL has gained increasing attention from the ML community, with a surge in the number of papers and the first workshop in this area held last year at NeurIPS 2022!

Evolutions in a temporal graph. Image by authors.

This post was co-authored with Emanuele Rossi, Michael Galkin and Kellin Pelrine. Thanks to Farimah Poursafaei for the helpful feedback.

In this blog post, we present major progress in TGL until 2022 and discuss promising future directions. Note that we use the term “dynamic graph” and “temporal graph” interchangeably. If you want to learn or start a project in TGL, this article would be a good reference and starting point.

Please share with us in the comment section any other advances you are excited about.

Introduction to Temporal Graph Learning

In this section, we provide a brief overview of some well-known TGL methods in the literature. There exist two main broad classes of methods for learning on Continuous-Time Dynamic Graphs (CTDGs): Temporal Graph Networks and Walk Aggregating methods. For more details on the formulation of CTDGs, see this survey by Kazemi et al.

Temporal Graph Networks (TGNs) generalize Message Passing Neural Networks (MPNNs) to temporal graphs. They do so by introducing a node memory which represents the state of the node at a given time, acting as a compressed representation of the node’s past interactions. Every time two nodes are involved in an interaction, they send messages to each other which are then used to update their memories. When computing the embedding of a node, an additional graph aggregation is performed over the temporal neighbors of the node, using both the original node features and memory at that point in time. Below we show a diagram of the computation of TGN.

Computations of TGN on a batch of training edges.
Image source: Rossi et al.

TGNs are a general framework that generalizes previous models such as Joint Dynamic User-Item Embeddings (JODIE) and Temporal Graph Attention (TGAT) as specific cases. A more comprehensive introduction to TGNs can be found in the below blog post by one of the authors.

Walk Aggregating methods such as Causal Anonymous Walks (CAW) instead rely on (temporal) random walks. In particular, to predict the existence of a link (u, v) at time t, CAW first extracts multiple random walks starting from u and v such that the timestamps of edges in a walk can only be monotonically decreasing. The walks are first anonymized by replacing each node identifier with a count vector of how many times that node appears at each of the possible positions in the walks. Each walk is then encoded using an RNN, and the encodings are then aggregated by using self-attention or taking a simple average.

Expressiveness of Temporal Graph Networks

There is a large body of work studying the expressive power of GNNs operating on static graphs. Xu et al. 2019 first characterized the discriminative power of Graph Neural Networks (GNNs) by connecting them to the Weisfeiler-Lehman (WL) graph isomorphism test and showing that many GNNs are no more powerful than the 1-WL test. Subsequent more expressive models such as subgraph GNNs, graph transformers and higher order GNNs are designed to be more expressive than 1-WL test (below is link to Michael Bronstein’s excellent blog post on how to go beyond WL test).

Until this year, there has been little work on understanding the expressive power of TGL methods. The first effort to bridge this gap was by Ribeiro et al. where the key idea is to categorize existing TGL methods into time-and-graph and time-then-graph frameworks.

Converting a TG into time-then-graph representation.
image source: Ribeiro et al.

1️). In time-and-graph, GNNs are used to generate node embeddings on the snapshot graph at each time thus forming a sequence of node embeddings.

2️). In time-then-graph, each edge in the TG is converted to a time series which indicates at which time the edge exists, therefore collapsing the temporal edges into edge features in a static graph.

It was shown that a time-then-graph representation can be constructed from any given time-and-graph representation thus proving time-then-graph is at least as expressive as time-and-graph. With the static representation in time-then-graph, we can directly use the WL-test expressiveness framework from the static graph for TGL methods. In this way, time-then-graph is more expressive than time-and-graph as long as a 1-WL GNN is used as the backbone model.

Souza et al. also aims to establish the 1-WL expressiveness framework for TGL methods. Notably, they view a CTDG as a sequence of time-stamped multi-graphs where the multi-graph G(t) at a given time t is obtained by sequentially applying all events prior to t. A multi-graph here means there can be multiple edges between two nodes in the graph and the edge attribute is the timestamp information.

Now, the Temporal WL test can be defined by applying the WL test on the multigraphs constructed from CTDGs. Therefore, more expressive TGN methods must be injective on its temporal neighborhood (i.e. hashing two different multi-set nodes into different colors), called injective MP-TGNs. Souza et al. also analyzed walk based TGNs such as CAW and show that MP-TGNs and CAW are not more expressive than each other (as seen above). Their proposed PINT method combines benefits from both categories of methods thus being the most expressive. The example below shows two temporal graphs that MP-TGNs are unable to distinguish. The colors are node labels and the edge has timestamps starting from t₁.

Examples of temporal graphs for which MP-TGNs are unable to distinguish graph structures such as diameter, girth, and number of cycles.
Image source: Souza et al.

Rethinking Evaluation in Temporal Graphs

To a large extent, the evaluation procedure in TGL is relatively under-explored and heavily influenced by static graph learning. For example, evaluation on the link prediction task on dynamic graphs (or dynamic link prediction) often involves: 1). fixed train, test split, 2). random negative edge sampling and 3). small datasets from similar domains. Such evaluation protocols often lead to result tables where reported metrics are already around 95+% and it’s really hard to distinguish whether new models bring any benefit or just rehash existing methods again.

A typical temporal link prediction result table reporting Average Precision (AP). Are we really making any progress when even baselines yield 98%?
Image source: Souza et al.

You et al. discussed the limitations of current TGL methods in model design, evaluation settings and training strategies for Discrete Time Dynamic Graphs (DTDGs). They argue that the evolving nature of data and models are not accounted for. In standard evaluation, all time points are split chronologically into a training set, an evaluation set and then a test set. The split is fixed for a given dataset.

They pointed out that such a fixed split means only edges from the chosen test period would be evaluated thus long term behavior potentially spanning training, validation and test period would not be correctly evaluated. In addition, many TGL methods are stale at test time, meaning that the model representation is not updated with information during evaluation. Consider an example transaction graph, if information on the prior day is available, the user would likely want to update the model with such information to achieve the best possible performance. Therefore, a live-update evaluation is proposed where models are finetuned with newly observed data, utilizing historical information and predicting future links.

Grey / red bars indicate the amount of recurring / novel edges respectively in the Wikipedia / MOOC dataset. Many edges recur over time in temporal graphs.
Image source: Poursafaei et al.

Recent work by two of the authors examines which negative edges to sample for evaluation of CTDG methods and introduces more datasets from diverse domains. When evaluating dynamic link prediction, negative edges are often randomly sampled from any node pair. However, many edges in a temporal graph recur over time (as seen in the figure above). Considering the sparsity of real world graphs, the majority of node pairs are unlikely to form an edge. Therefore, random negative edges can be seen as easy negative edges.

Avg. Performance of TGL methods. Using more difficult negative edges significantly impacts model performance. The simple baseline EdgeBank also works surprisingly well.
Image source: Poursafaei et al.

Now, what can be considered as hard negative edges? First, we introduce the historical negative edges which are edges that appeared in the training set but are absent in the current test step. We also define inductive negative edges as test edges which occurred previously in the test set but are not present at the current step. Lastly, we propose a baseline EdgeBank relying solely on memorizing past edges (essentially a hashtable of seen edges). In the plot above, we see that by changing the negative edges for evaluation, the average performance of existing TGL methods reduces significantly in the historical and inductive setting when compared to the standard setting. EdgeBank is also a surprisingly strong baseline for the standard setting. For details, see the blog below from one of the authors.

Temporal Knowledge Graphs

In the realm of Knowledge Graphs (KGs), temporal setup is slightly different from the homogeneous world, i.e., timestamped graph snapshots are not that common. Instead, some (or all) triples have an accompanying (start time, end time) pair of attributes denoting the time frame when a given fact was true. Triples thus become quintuples, or, in Wikidata, time attributes become qualifiers of a more general statement (main triple + multiple key-value qualifiers), and statements form so-called hyper-relational KGs.

For example, `(President of the French republic, office holder, Nicolas Sarkozy, 2007, 2012)` is a quintuple describing the time period where Nicolas Sarkozy was the President of the French republic. Alternatively, there can be only one timestamp per triple (forming quadruples). The most common prediction task is scoring head/tail prediction given the time attributes, e.g. , `(President of the French Republic, office holder, ???, 2007, 2012)` — this can be considered as a particular case of the hyper-relational link prediction where qualifiers are only dateTime literals. A classic example of temporal KG completion model is TNTComplex (ICLR 2020).

Krause et al. has taken the first effort towards bridging the gap between temporal KGs and homogeneous graphs. In this work, the authors propose a framework to formalize various temporal aspects in KGs. Namely, they define temporal KGs as local extensions, i.e., graphs that have timestamps on the edges, and dynamic KGs as global extensions, i.e., graphs that change topology over time by adding or removing nodes and edges. Even more, there can exits combinations of those basic types, e.g., a temporal and dynamic KG would be called incremental. We hope this work would bring a bit more order and clarity to the hectic literature on temporal KGs and the community would stick to the nice taxonomy. Next step: finalize a proper evaluation protocol for those graph types.

Temporal and Dynamic KGs (and their combinations).
Image source: Krause et al.

Wang et al. addressed the task of few-shot link prediction over temporal + dynamic graphs where the edges have timestamps and new nodes might appear at later timesteps (Incremental graphs as to the above classification by Krause et al.). The few-shot scenario makes the task even more challenging — we only have access to a limited number of training and inference points (usually, <5) to reason about the queries link. Here, the authors propose MetaTKGR, a meta-learning based approach that builds representations of new nodes by aggregating features of existing nodes within a certain delta t temporal neighborhood. The scalar difference between timestamps is vectorized via the Fourier transform.

Components of MetaTKGR.
Image source: Wang et al.

Libraries and Datasets

The lack of large datasets and challenging tasks has been holding back research on Temporal Graph Learning in the past few years. Luckily, new datasets from diverse domains are emerging. For example, Poursafaei et al. introduced six new publicly available TG datasets from the transportation, politics, economics and proximity domains. However, the field is still lacking a consistent effort to standardize benchmarks and evaluation to a high quality, what OGB did for static graphs. We hope that in 2023, we can see more standardized TG benchmarks with a focus on real applications.

Regarding libraries, a well-known one is Pytorch Geometric Temporal, an extension of Pytorch Geometric for temporal graphs. However, Pytorch-Geometric Temporal seems to only feature discrete-time methods and datasets. A library which also includes continuous-time methods would be a great added value for the community. Recently, Zhou et al. introduced TGL, a unified framework for large-scale offline Temporal Graph Neural Network training. In particular, on a 4-GPU machine, TGL can train one epoch of more than one billion temporal edges within 1–10 hours.

We list links to various TGL libraries and datasets below.

Disease Modeling with Temporal Graphs

In the recent COVID-19 pandemic, epidemic modeling is instrumental for understanding the spread of the disease as well as designing corresponding intervention strategies. Human contact networks are in fact temporal graphs. By combining contact graphs with classical compartment based models such as SEIR and SIR, we can more accurately forecast COVID-19 infection curve and go beyond the homogenous mixing assumption (all individuals are equally likely to be in contact with each other).

Chang et al. derived a temporal mobility network from cell phone data and mapped the hourly movement of 98 million people from census block groups (CBGs) to specific points of interest (POIs) in the US. By combining the hourly contact network with SEIR models on the CBG level, they are able to accurately fit real infection trajectory. In particular, the model shows that some ‘superspreader’ POIs such as restaurants and fitness centers account for a large majority of the infections. Also, differences in mobility between racial and socioeconomic groups lead to different infection rates among these groups. This work showcased the real world potential of utilizing large scale temporal graphs for disease forecasting and informing policies on intervention strategies.

Besides human contact networks, dynamic transportation networks also play an important role in the spread of COVID-19. In recent work by one of the authors, we incorporated daily flight networks into the SEIR model to estimate imported COVID-19 cases. By incorporating flight networks, it is possible for early detection of outbreaks and forecast the impact of travel restrictions. See the blog post by one of the authors for more details.

Despite the empirical success of temporal-graph-based disease models, it is also important to answer questions such as “how does the contact network structure impact the spread of disease?” and “what is the best way to modify the contact pattern such that the spread of COVID-19 can be slowed or prevented?Holme et al. compared the difference in outbreak characteristics between using temporal, static and fully-connected networks on eight network datasets and examined various network structures affecting the spread of the disease. They showed that converting temporal networks into static ones can lead to severe under- or over-estimation of both the outbreak size and extinction time of the disease.

What are the next steps for TGL on epidemic modeling?

1️). First, forecasting the entire contact or mobility network snapshot for the immediate future is a crucial challenge. With the predicted structure, we can apply network based SEIR models to estimate the infection curve.

2️). Second, defining and understanding the impact of interaction patterns on the contact network is crucial for policy making and interpretability. Analyzing the interplay between graph structures and the infection curve can help us identify the most effective intervention strategies.

Anomaly Detection in Temporal Graphs

Anomaly detection is a fundamental task in analyzing temporal graphs which identifies entities that deviate significantly from the rest. For example, fraud detection can be modeled as detecting abnormal edges in a transaction network and traffic accident identification can be seen as detecting anomalous events in a traffic network.

There is growing interest in utilizing the representation power of temporal graph networks for anomaly detection. Cai et al. designed an end-to-end structural temporal Graph Neural Network model for detecting anomalous edges, called StrGNN. An enclosing subgraph, a k-hop subgraph centered around an edge, is first extracted based on the edge of interest to reduce computational complexity. A Graph Convolutional Neural Network (GCN) is then used to generate structural embedding from the subgraph. Gated Recurrent Units (GRUs) are then used to capture temporal information. One of the challenges of anomaly detection is the lack of labeled examples. Therefore, Cai et al. proposed to generate “context-dependent” negative edges by replacing one of the nodes in a normal edge and training the model with these negative edges.

When comparing with unsupervised, non-GNN based anomaly detection methods such as SEDANSPOT and AnomRank, GNN based methods can easily incorporate any given attribute and has the potential to achieve stronger performance. However, there are two significant challenges for GNN based approaches.

1). First, how to scale to dynamic graphs with millions of edges and nodes? This is an open question for both the GNN module in extracting the graph features but also the temporal module such as GRUs and transformers in processing long term information.

2️). Second, how to produce accurate explanations for the detected anomalies? In real applications, detected anomalies are often verified and then potentially resulting in punitive measures for those detected entities. GNN explainability on dynamic graphs remains an open challenge.

LAD detects 2013 as a change point in the Canadian MP voting network due to abnormal amounts of edges between political parties.
Image source: Huang et al.

The task of change point detection aims to detect time points in a dynamic graph where the graph structure or distribution deviates significantly from what was observed before. This change can be attributed to external events (such as traffic disruption and COVID-19 related flight restrictions) or simply natural evolution of the dynamic graph. Recent work by one of the authors utilized the eigenvalues of the Laplacian matrix of each graph snapshot to embed the graph structure while applying sliding windows to compare the changes in graph structure in the long and short term. In the above, the proposed Laplacian Anomaly Detection (LAD) method detects a change in the Canadian Member of Parliament (MP) voting network due to increased edges between political parties. This coincides with Justin Trudeau being selected as the Liberal party leader in 2013.

Detecting Misinformation on Temporal Graphs

Misinformation spreads in different patterns and rates when compared to true information (Vosoughi et al.). There has been considerable research studying these network patterns in a static graph while dynamic graph based methods are underexplored (Song et al.). However, in the past year an increased amount of TGL methods were employed for misinformation detection and understanding. For instance, Zhang et al. developed a method based on Temporal Point Processes while Dynamic GCN (DynGCN) and DGNF are dynamic GNN based methods.

The illustration below shows the architecture of DynGCN. They construct graph snapshots with even spacing in time, feed each through GCN layers, and then combine the representations and learn the snapshots’ evolution patterns using attention. This is a relatively simpler approach to leverage temporal information compared to some methods discussed above like TGN or CAW, but nonetheless gives better performance than previous state-of-the-art for misinformation detection on the datasets that the authors examined.

DynGCN processes individual graph snapshots using GCN layers with shared weights, then combines the representations over time with an attention mechanism.
Image source: Choi et al.

Dynamic interaction patterns are shown to be quite informative for misinformation detection (Plepi et al.). With significant recent advances in TGL methods, we can expect novel state-of-the-art misinformation detection methods that incorporate dynamic graphs.

Joining Temporal Graph Learning Community

2022 has seen increased attention in TGL from the ML community. The first ever TGL workshop was held at NeurIPS 2022. The recordings of the talks and panel will be available soon on the NeurIPS virtual site. The accepted papers are available on the workshop website. Keep an eye out for announcements of new iterations of the TGL workshop there and join the workshop slack (up-to-date link in the website) to engage with the community. This year, we are also planning a TGL reading group, if you would like to share your work or be involved in co-organizing the reading group, please email shenyang.huang@mail.mcgill.ca

Image Source: Logo of the NeurIPS 2022 Temporal Graph Learning Workshop. Image by authors.

--

--