Using Graph Convolutional Neural Networks on Structured Documents for Information Extraction

Code to convert Structured Documents to Graphs and a comprehensive survey of GCN implementations to classify text entities in them

Published in

Towards Data Science

7 min readFeb 2, 2020

While Deep Learning solutions such as CNNs effectively capture patterns in data in Euclidean space, there is an increasing number of applications where data is represented in the form of a Graph and lack a grid-like regularity. As Graphs can be irregular, they may have a variable size of un-ordered nodes and each node may have a different number of neighbors, resulting in mathematical operations such as convolutions difficult to apply to the Graph domain.

Some examples of such non-Euclidean data include:

Protein-Protein Interaction Data where interactions between molecules are modeled as graphs
Citation Networks where scientific papers are nodes and citations are uni- or bi-directional edges
Social Networks where people on the network are nodes and their relationships are edges

This article particularly discusses the use of Graph Convolutional Neural Networks (GCNs) on structured documents such as Invoices and Bills to automate the extraction of meaningful information by learning positional relationships between text entities. Multiple approaches are discussed along with methods to convert documents to graphs since documents do not inherently have a graph-like layout.

What is a Graph?

A Graph is represented as:

Where,

Now let,

denote a node and,

denote an edge pointing from v_i to v_j. Then, the Adjacency Matrix A is an n x n matrix with,

where n is the number of nodes in the Graph.

Further, a Graph may have node attributes such that,

where X is a node feature matrix with,

representing the feature vector of node v.

Here are some illustrations of undirected graphs with no self-loops along with their corresponding Adjacency Matrices:

Each of these nodes can further have node attributes or feature vectors denoted by the feature matrix:

Example where each of the four nodes has a 3-dimensional feature vector

In the real world example of a social network, node attributes could be demographic information on people (nodes) and their relationships would be the edges. Be sure to check out this article for an in-depth understanding of Graphs.

How to convert Structured Documents to Graphs?

Structured documents such as Invoices and Bills have a certain order of occurrence when it comes to entities such as headers and tables. For example in most invoices, tables lie between the header and the footer. Also, the total amount, which is an important entity of the invoice which we hope to extract, generally lies in the bottom right corner of the table. Such recurring structural information along with text attributes can help a Graph Neural Network learn neighborhood representations and perform node classification as a result.

But how do we get such documents in a graphical format?

The answer is OCR coupled with Geometric Algorithms.

OCR:

In this step, we use commercially available OCR Engines such as Amazon Textract or Google Vision API to generate what is called an Object Map of the document.

Each “object” in the map is a text entity — a word or a sentence. In the context of this article, we refer to the object map as a Pandas DataFrame having five columns — xmin, ymin, xmax, ymax, Object.

(xmin, ymin) and (xmax, ymax) give us the coordinates of the rectangular bounding box while the Object column gives the actual text inside that box.

Here is a GitHub Gist going over the generation of such an Object Map given the path of the document image and the endpoint URL of the Google Vision API instance.

OCR Script to generate an “Object Map”

Object Map to Graph:

Once we have generated an object map we make use of a visibility based Geometric Algorithm to connect objects to each other, thus forming a Graph with objects as the nodes and the connections as the edges.

Geometric Algorithm: Connecting objects based on visibility

Input: 
  Object MapOutput: 
  Graph Dictionary: {source_node: [dest_node_1, dest_node_2, …], ..}
  Text List: Text for each node

Step-1: Start with the object in the top-left corner. For each object in the map, iterate over all other objects.
Step-2: For each object, look to the right.
Step-3: If another object is present in direct visibility, make a connection.
Step-4: For each object, look directly underneath.
Step-5: If another object is present in direct visibility, make a connection.

The output is in the form of a Dictionary so that we can use networkx, a Python library on network analysis to construct a graph from it. Further, we return the Text List so as to use the text of each node as its feature vector.

Here is the Python implementation.

Next, we use networkx to create an Adjacency Matrix A for the document along with a feature matrix X generated by count vectorizing at character level for each text object.

Here is the code for the same.

As the end result, we expect the document to look like this:

Convolution on Document Graphs for Information Extraction

Once we have our document in the form of a Graph, the next step is feeding this data to a GCN. Here, we have the option of choosing from a number of GCN implementations most notable of which are described below:

GraphSAGE — Inductive Representation Learning on Large Graphs:
Paper: arXiv
Website: snap.stanford.edu
Code: GitHub
Illustration:

Inductive Representation Learning, W. L. Hamilton et al.

Semi-supervised Classification with Graph Convolutional Neural Networks:
Paper: arXiv
Website: tkipf.github.io
Code: GitHub
Illustration:

Multi-layer GCN with first order filters, T. N. Kipf et al.

Here are implementations of GCNs specifically suited to Information Extraction from Invoices:

Table understanding in Structured Documents:
Paper: arXiv
Description:
The implementation of GCN in this paper does not strictly follow the mathematical definition of Graph Convolution instead, for each node the feature vectors of connected nodes in the Graph are simply concatenated with the feature vector of the node under consideration. Thus, a new aggregated feature vector is formed for each node which is then simply fed through a fully connected layer.

Table Detection in Invoice Documents by Graph Neural Networks:
Paper: ICDAR
Slides: priba.github.io
Code: Although the code isn’t provided by the author, here is my starter code for the same.
Illustration:

Overview of the Model Architecture, P. Riba et al.

Experiments

All of these methods give encouraging results. For example, here is the output of the Semi-supervised classification approach on a sample Invoice:

Output of Semi-supervised Classification, T. N. Kipf

Since the paradigm of Semi-supervised learning states that both labeled and unlabeled nodes be present in the Graph at the time of learning representations, this output was observed on a single Invoice as a Graph.

There were total four classes — info, contact, table item and other (not displayed). A set of seed nodes for each class were labeled initially. The GCN was then able to learn representations for the unlabeled nodes from these initial seed nodes.

An end-to-end solution can be implemented by first identifying seed nodes by using standard NLP techniques and then feeding the Graph to the network. The most noteworthy feature here is the ability to generate node representations for previously unseen documents with very low latency.

Conclusion

Graph Convolutional Neural Networks prove to be increasingly useful in novel applications where data assumes a connectionist structure. Further, data having spatial meaning as in the case of Structured Documents, can be adapted to a graphical structure and then be used with GCNs.

Semi-supervised learning can be used on-the-fly on static Graphs to generate representations for nodes without the need for large training sets. Moving beyond vanilla CNNs for non-euclidean data opens up exciting opportunities for new areas in applied research.

Here is a comprehensive survey on Graph Neural Networks as of 2019 for further reading.

Dhaval Potdar is a Data Science enthusiast with experience in AI Research. He has graduated from the Data Analyst and Machine Learning Engineer Nanodegrees with Udacity and is currently working in an Analytics Startup based out of Mumbai. Taking up messy data and turning it into something of value is what I’m getting better at day by day. Connect with me on LinkedIn and check out my GitHub for more cool projects.