Spatial Data: Graph-Spectrum as Features

Sria Louis
Towards Data Science
10 min read · Aug 28, 2020


A bird listening to Jimi Hendrix’s “Red House”, and a ship. Photo by Amos Hochman. Reposted with permission.

For my first Medium post I will show a nice, easy way to enrich your spatial data with features from Graph Theory. These features capture important information in your data that is hard to access otherwise.

The idea won the Windward award for innovation at DataHack IL 2016, and it is joint work with Mor Nitzan, Maria Dyshel, Noa Weiss and Guy Keren.

TL;DR Spatial data (e.g. GPS locations of vehicles over time) can be represented as graphs. The complex structure of those graphs can be compressed into mathematical objects, the “eigenvalues”. Those eigenvalues are easy to calculate and very good at differentiating spatial behavior (e.g. the behavior of the drivers). You can use those eigenvalues as new features!

What I like about the trick in-question is:

  1. It can easily be applied in many domains (I will give a few examples below).
  2. It captures a real behavior in your data that is very hard to process otherwise.
  3. People are not using it because the math involved sounds scary. BUT:
  4. From Python and R to Matlab, it is super easy to use in any standard ML/DS platform. You probably don’t even need to import/load new packages/libraries.

Let’s dive in.

The Data — What is “Spatial”?

We are focused on spatial data, which can be geospatial, such as GPS pings from a fleet of cars or the geolocation of a mobile app’s users. It can also be “spatial” in a more abstract space, such as online users moving between websites, or how memes (or COVID-19) spread in a social network.

As a concrete example I will focus on the data where the idea first came to us: Windward data.

Windward is a start-up company focused on maritime intelligence. They have a lot of unique data regarding ships, but for simplicity we will focus on GPS locations of vessels in a time resolution of, say, every hour.

Imagine we have data for 100K vessels: a GPS location every hour for the last 5 years. There are many features you can engineer even from such simplified data: the ship’s velocity, direction, territorial waters, distance from maritime points-of-interest (such as straits, ports, etc.), and you can add domain knowledge and external data (rivers, straits, countries, ports, ocean depth, daylight, weather, etc.). But one important aspect of the spatial data might still be missing. It is well hidden and hard to access directly.
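As a quick aside, here is a minimal sketch of that straightforward kind of feature engineering: computing speed from consecutive hourly pings. The column names and DataFrame layout are assumptions for illustration, not Windward’s actual schema.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def add_speed_feature(pings: pd.DataFrame) -> pd.DataFrame:
    """pings: one vessel's rows, sorted by 'timestamp', with 'lat'/'lon' columns (assumed schema)."""
    hours = pings["timestamp"].diff().dt.total_seconds() / 3600.0
    dist_km = haversine_km(pings["lat"].shift(), pings["lon"].shift(),
                           pings["lat"], pings["lon"])
    pings["speed_kmh"] = dist_km / hours  # NaN for the vessel's first ping
    return pings
```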

But we’ll get to that hidden aspect in a bit. Let’s define our objective first.

The Objective

We are going to focus on a classical supervised classification problem, but the features I’m going to introduce are also applicable to other tasks: they are simply very good descriptors of real phenomena, encapsulating principal information.

The Classification challenge: There are different types of maritime vessels, from tiny fishing boats to super tankers and bulk ships. Our objective is to classify 100K vessels into 10 given types.

The Challenge

Assume the data is clean and balanced and that there is enough training data. Every beginner Data Scientist will easily engineer cool features from the hourly GPS locations, but there is an aspect of the spatial behavior that is hard to encode as features, even for experts (this difficulty may be intrinsic, as I will show later).

Let’s explain this with an intuitive example.

Consider a fishing boat in Hawaii and a fishing boat in Italy, and compare them to a large oil-tanker: you will notice the difference. Fishing boats tend to have a “home port” and move in and out of it, maybe daily, maybe monthly, but the movement “shape” stays similar. On the contrary, an oil-tanker departing a port will usually sail in virtually straight lines to its destination and back. On the map, the two fishing boats in Hawaii and Italy will draw a star-like shape, while the tanker will draw huge zigzags across the ocean.

This is clearly an over-simplification, but I hope that now it is clear what type of information I was referring to earlier: It is very easy to differentiate ships when you look at the shape they draw on the map. But how can we code this drawing as a feature? And here comes Graph Theory to the rescue.

The Math Part

Here I will make a short detour, briefly introducing some terms which you are probably familiar with: Graph, Adjacency Matrix, Eigenvalues. I will then elaborate on a graph’s spectrum and its properties. That being said, we won’t get into precise, formal definitions (I added references below if you are interested). Instead, I’ll try to go over the general ideas and relevant intuitions.

I do assume some prior knowledge about eigenvalues of matrices. If you care less about the math, you can skip to the next section.

Graphs. Graphs are mathematical objects composed of a set of vertices V (aka nodes) and a set of edges E (aka links). Each edge is a link between two vertices. An edge can be directed, such that an edge eᵢⱼ is an edge from the i-th vertex to the j-th vertex (but not vice versa), but for simplicity we will assume an undirected graph, where for each edge eᵢⱼ there also exists an edge eⱼᵢ.

Adjacency Matrix. One way to represent a graph is by a square matrix: the (i,j) entry in the matrix is 1 if there is an edge eᵢⱼ and 0 otherwise. This matrix is called the “adjacency matrix” of the graph (side-note: it is closely related to the Laplacian matrix L = D - A, where D is the diagonal matrix of vertex degrees).

Eigenvalues. Without getting into the definitions, I’ll just remind you that for a square n×n matrix, the eigenvalues are n numbers with algebraic and geometric meaning. Each eigenvalue comes with an eigenvector. Eigenvalues/vectors of a graph are the eigenvalues/vectors of its adjacency matrix.

(Alternative definitions of a graph’s eigenvalues exist in the literature, for instance using the normalized Laplacian, but they are closely related.)

The eigenvalues/vectors of a graph reflect super-interesting mathematical properties (such as graph connectivity, graph cuts, expansion and more), and they are beautifully utilized in other domains such as physics (Diffusion and Percolation Theory), social networks (centrality measures, spread of memes & viruses) and machine learning (e.g. eigenvector centrality measures, such as in the famous PageRank algorithm).

Graph Spectrum and its properties. If we order the eigenvalues in descending order and concatenate them as a vector — this vector is known as “the spectrum of the graph”.
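To make the definitions concrete, here is a tiny sketch (a toy 4-node cycle, nothing to do with the maritime data yet) of building an adjacency matrix and reading off its spectrum:

```python
import numpy as np

# Toy undirected graph: a cycle on 4 nodes
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1          # undirected, so the adjacency matrix is symmetric

# The spectrum: eigenvalues of the adjacency matrix in descending order
spectrum = np.linalg.eigvalsh(A)[::-1]   # eigvalsh handles symmetric matrices
print(spectrum)                          # ≈ [ 2.  0.  0. -2.] for the 4-cycle
```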

Amongst other open questions in Graph Theory, there are some beautiful phenomena that are experimentally observed (e.g. by computer simulations) even though they are yet to be proved. Some of the most intriguing questions are related to graphs’ spectra: When will two different graphs have similar spectra? When will two graphs with the exact same spectrum be different? What’s the probability that two (random) graphs have similar spectra?

Mathematicians don’t know a lot about the link between graph similarity/distance and the graph spectrum; there is more to it than meets the eye. But the little we know is enough to begin with.

What we do know is that:

  1. Two graphs with the exact same topology, i.e., equal up to a permutation of the nodes (aka isomorphic) will have the exact same spectrum.
  2. The other direction is not always true: there are graphs that are totally different but have the exact same spectrum (aka “cospectral”).

But we also have empirical evidence for the two following conjectures:

  1. Highly similar graphs will tend to have similar spectrum.
  2. Highly different graphs tend to have very different spectra.

See how I used ‘similar’, ‘different’ and ‘tend to’ without explaining exactly what I mean? That’s because the exact definitions and actual conjecture statements are somewhat technical. But this is the main intuition you should take away.

Although the above conjectures remain unproven, they are used in practical algorithms. The idea came from the fact that the spectrum of a graph is used in heuristic solutions for graph isomorphism [3, 4], so why not for “almost-isomorphism”? In social networks, it has been used successfully for pattern detection in a social graph [5].

We’re also going to rely on these conjectures for our purposes!

There is a lot more we don’t know about graphs’ spectra. How exactly does the spectrum change when we gradually move between two graphs? Is it monotonic in any sense? What’s the probability that two random graphs are cospectral (see [6])? Beautiful open questions, and I didn’t even start with the properties of the eigenvectors!

But the little we know is enough to help us for our needs today.

The trick came from another prominent open question at the intersection of Graph Theory and Complexity Theory: Graph Isomorphism. Given two graphs, it is unknown how hard it is to verify whether they are isomorphic (i.e., have the same topology up to a permutation of the vertices). The problem of graph similarity is closely related to Graph Isomorphism, and therefore its difficulty might be intrinsic. For Graph Isomorphism, a useful trick is an approximation using the graph spectrum: different spectra imply non-isomorphism. You can think of the spectrum as a low-collision hash function from graph-space into a lower-dimensional vector-space.

In other words: to measure similarity between graphs we embed the graphs into an n-dimensional vector-space, and the Euclidean distance in the embedding defines a distance over the graph space. If you want to dive even deeper, note that the l2-norm of the spectrum is the square root of the trace of the squared adjacency matrix, which is also its Frobenius norm: sqrt(tr(A²)) = ‖A‖_F.

Let’s take one last step and reduce the dimensionality: if you remember the usage of eigenvalues in PCA for dimensionality reduction, here we can do the same trick and take only the first k eigenvalues (in practice, the rest are usually very close to zero and negligible anyway).

To conclude the math:

  • We embed graphs into a low-dimension space by taking the eigenvalues of the adjacency matrix, aka spectral-space.
  • There are (unproven but empirically observed) phenomena indicating that similarity between graphs tends to be preserved in the spectral-space too (using the l2-norm).
  • As in PCA, we can consider only the k largest eigenvalues, without losing too much information.
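A minimal numpy sketch of the conclusion above: truncate each spectrum to its k largest eigenvalues and compare graphs by the Euclidean distance between those vectors (the function names are mine, for illustration):

```python
import numpy as np

def spectrum_top_k(A: np.ndarray, k: int) -> np.ndarray:
    """Top-k eigenvalues (descending) of a symmetric adjacency matrix, zero-padded if n < k."""
    eigs = np.linalg.eigvalsh(A)[::-1]        # descending order
    out = np.zeros(k)
    out[:min(k, len(eigs))] = eigs[:k]
    return out

def spectral_distance(A1: np.ndarray, A2: np.ndarray, k: int = 10) -> float:
    """l2 distance between two graphs in the (truncated) spectral embedding."""
    return float(np.linalg.norm(spectrum_top_k(A1, k) - spectrum_top_k(A2, k)))
```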

Finally, we can go back to maritime data (or any spatial data of your interest).

The Trick: Embedding a Ship as its Graph-Spectrum

Let’s focus on one vessel sailing in the ocean. We define a graph for each such vessel:

The Nodes Define nodes over the ocean (e.g. ports, straits, predefined polygons in the ocean, traffic separation schemes, etc.). Note that we use the same nodes for all vessels; only the edges will be different.

The Edges We say there is an edge between two nodes if the given ship sailed directly between them. (This definition is highly negotiable; see the improvements below.)

Now, finally, each ship has a graph describing its spatial behavior, and we can talk about the “spectrum of a ship”.
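A sketch of this construction, assuming each hourly GPS ping has already been mapped to one of the shared node ids (that zone-mapping step is domain-specific and omitted here):

```python
import numpy as np

def vessel_adjacency(visited_nodes: list, n_nodes: int) -> np.ndarray:
    """Build one vessel's undirected, unweighted adjacency matrix from the
    time-ordered sequence of nodes (ports, straits, ocean polygons) it passed through."""
    A = np.zeros((n_nodes, n_nodes))
    for a, b in zip(visited_nodes[:-1], visited_nodes[1:]):
        if a != b:                     # the ship sailed directly from node a to node b
            A[a, b] = A[b, a] = 1
    return A

# Example: a star-like "fishing boat" pattern, in and out of home port 0
A_fisher = vessel_adjacency([0, 3, 0, 5, 0, 3, 0], n_nodes=6)
```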

Ships → Graphs → Matrices → Spectra → Distance. Voila!

We asked earlier: Since we can visually differentiate between ships by looking at the shape they draw on the map, can we code this drawing as a feature?

Now we can use Graph Theory to answer that:

The spectrum is an embedding of maritime vessels into a low-dimension vector-space capturing the spatial behavior of the ship.

A diagram showing the logic behind the trick. If the spatial behavior of two vessels is similar, then their eigenvalues tend to be similar. Although the other direction is not always true, in practice, eigenvalues are a good approximation for similarity. Image by author.

How to use it in real life?

In real life you have much more info on each vessel. So what you actually want to do is to concatenate the new graph-features to the other features.

Let’s look at the whole process (a code sketch follows the list):

  1. For each vessel, create a graph.
  2. For each graph calculate the spectrum. Take the top-k eigenvalues.
  3. Use those k eigenvalues as k features (concatenated to your other features, if you have any).
  4. Enjoy!
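Putting it together, a minimal sketch of steps 1 to 3 (it reuses the hypothetical vessel_adjacency and spectrum_top_k helpers from the earlier sketches):

```python
import numpy as np

def spectral_features(vessel_paths: dict, n_nodes: int, k: int = 10) -> np.ndarray:
    """vessel_paths: {vessel_id: time-ordered list of visited node ids}.
    Returns an (n_vessels, k) matrix with one row of spectral features per vessel."""
    rows = []
    for vessel_id, path in vessel_paths.items():
        A = vessel_adjacency(path, n_nodes)   # step 1: a graph per vessel
        rows.append(spectrum_top_k(A, k))     # step 2: its top-k eigenvalues
    return np.vstack(rows)

# Step 3: concatenate to whatever other features you already have
# X = np.hstack([X_other, spectral_features(vessel_paths, n_nodes)])
```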

Possible Improvements

For simplicity I described the basic method, but you might improve your results by using subtler or more sophisticated versions:

  • The definition of the edges can be improved according to your task. For robustness, maybe take into account only edges that the ship used more than a threshold number of times. You can make the edges directed or undirected, weighted or unweighted, or maybe even be brave and try multi-edges or hyper-edges.
  • Instead of using the eigenvalues of the Adjacency Matrix, consider using the singular-values of the (Normalized) Laplacian Matrix, or even of a weighted variant [1].
  • To improve efficiency, instead of calculating all eigenvalues and then taking the top-k, consider calculating only the top-k singular values, e.g., in Python use scipy.sparse.linalg.svds or sklearn.decomposition.TruncatedSVD (see the sketch after this list).
  • Real graphs are usually highly sparse, so don’t use an SVD implementation that centers the features; centering destroys the sparsity of the matrices and makes your computation much heavier.
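A sketch of that last point: computing only the top-k singular values of a sparse matrix without densifying it (here on a random sparse symmetric matrix standing in for an adjacency or Laplacian matrix):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def top_k_singular_values(M: sp.spmatrix, k: int = 10) -> np.ndarray:
    """Top-k singular values (descending) of a sparse matrix, keeping it sparse."""
    k = min(k, min(M.shape) - 1)          # svds requires k < min(M.shape)
    s = svds(M, k=k, return_singular_vectors=False)
    return np.sort(s)[::-1]

# Toy example: a random sparse symmetric matrix as a stand-in for a real graph
M = sp.random(100, 100, density=0.05, format="csr", random_state=0)
M = M + M.T
print(top_k_singular_values(M, k=5))
```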

Conclusion

The inter-connectivity in modern data leaves us no choice but to embrace the graph perspective. By exploiting the power of Graph Theory we extract a few more bits of hidden information from our complex data.

The technique presented above — using the eigenvalues — was applied in several scenarios on real data in production. It improved classification models (as described above), but also showed good results in clustering tasks, risk models and others.

There were two “hard” sub-problems that had to be overcome: First, create a reasonable graph per vessel. Second, embed the graph by taking the eigenvalues.

Working on your data, I recommend you also split the task into two parts: First, how can you describe your data as a graph, or graphs? Second, since graphs are “heavy” objects, which graph-theory tools should you use to embed the data back into a lower dimension space?

Please let me know if you try the trick and see good or bad results in your model.

References

Good place to start

[1] Zhu, P., & Wilson, R. C. (2005, September). A study of graph spectra for comparing graphs. In BMVC.

[2] Spielman, D. A. Spectral Graph Theory. Lecture notes.

Graph matching/isomorphism problems

[3] Umeyama, S. (1988). An eigendecomposition approach to weighted graph matching problems. IEEE transactions on pattern analysis and machine intelligence, 10(5), 695–703.

[4] Xu, L., & King, I. (2001). A PCA approach for fast retrieval of structural patterns in attributed graphs. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 31(5), 812–817.

Practical usage example in social networks

[5] Leskovec, J., Singh, A., & Kleinberg, J. (2006, April). Patterns of influence in a recommendation network. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 380–389). Springer, Berlin, Heidelberg.

Cospectral Graphs

[6] Haemers, W. H., & Spence, E. (2004). Enumeration of cospectral graphs. European Journal of Combinatorics, 25(2), 199–211.

