Graph visualisation basics with Python Part II: Directed graph with NetworkX

Creating directed acyclic graph with the NetworkX package

Himalaya Bir Shrestha
Towards Data Science

--

In the first part of this series, I shared how to create a flowchart using the SchemDraw package in Python. My quest for learning about graph visualisation techniques in Python led me to explore some packages such as NetworkX and graphviz.

A graph G = (V, E) is a set of vertices V and edges E where each edge (u, v) is a connection between vertices where u, v ∈ V (Reducible, 2020). In Python, graphs are visualised using the nodes and edges. While the nodes represent any features, the edges represent the interaction between features in the graph.

Graph visualisation can have important domain applications such as networking, spatial data science, software engineering, bioinformatics, energy informatics, machine learning, and visual interfaces for other technical domains. In this post, I am going to share an example of creating a directed acyclic graph using NetworkX, exploring the characteristics of the graph including the centrality concept, and a method to get all the paths from the root (start node) to the leaves (end nodes) of the graph. Let’s get started.

NetworkX

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. In NetworkX, nodes can be any hashable object¹ (except None) e.g. a number, a text string, an image, another Graph, a customised node object, etc.

Directed and Undirected graph

Edges represent the connection between nodes and can hold arbitrary data such as weights, direction, or relation between the nodes. If the edges are undirected, then the Graph object is known as an Undirected graph (Graph). And if the edges are directed, then the Graph object is known as a Directed graph (DiGraph). Self-loops are allowed in DiGraphs but multiple (parallel) edges are not.

Cyclic and acyclic graph

A directed acyclic graph is a special type of directed graph with no directed cycles, such that following the direction of the edges will never form a closed loop. On the other hand, if the edges of the graph form a closed loop at any node, then it is known as a directed cyclic graph.

Image by Fabrice Villard in Unsplash.

Organogram using NetworkX package

Examples of directed acyclic graphs include family tree, organisational hierarchy tree, folder tree, etc. A tree is a hierarchical and acyclic data structure with a set of connected nodes. Each node in the tree can be connected with many children but must be connected to exactly one parent, except for the root node, which has no parent.

An organogram is a diagram that shows the hierarchical structure of an organisation, and the relationships between employees at different levels and departments within it. While the organogram could be created manually using programs such as MS PowerPoint, and Paint, I wanted to explore the possibility of creating it using Python so that the shape, size, and elements could be adapted easily by coding.

Consider a company X with two teams A and B within it. Let’s assume there are eight employees in the company: a CEO, two team leads for each of teams A and B, two staff in team A, and three staff in team B. In this example, there is only one root node (CEO), and exactly one path between the root and any node. As each node has at most three child nodes, this is an example of a ternary tree. If each node would have at most two child nodes, that would have been a binary tree. The steps to construct the organogram using NetworkX in Python are described below step by step.

Plain Graph

I started by creating a DiGraph object. I added eight nodes to this object starting from 0 to 7 for each of the employees. Next, I added the edges from the CEO to each of the team leads, and from the team leads to the staff in the corresponding team. It is possible to draw a graph object in NetworkX in different layouts such as circular, random, shell, spectral, planar, spring, etc. as available here. However, I set the x and y positions for each node manually inside a dictionary to give it a hierarchical inverted tree-like structure. And I also assigned labels for each node in the form of a dictionary.

G = nx.DiGraph()nodes = np.arange(0, 8).tolist()G.add_nodes_from(nodes)G.add_edges_from([(0,1), (0,2),
(1,3), (1, 4),
(2, 5), (2, 6), (2,7)])
pos = {0:(10, 10),
1:(7.5, 7.5), 2:(12.5, 7.5),
3:(6, 6), 4:(9, 6),
5:(11, 6), 6:(14, 6), 7:(17, 6)}
labels = {0:”CEO”,
1:”Team A Lead”, 2: “Team B Lead”,
3: “Staff A”, 4: “Staff B”,
5: “Staff C”, 6: “Staff D”, 7: “Staff E”}
nx.draw_networkx(G, pos = pos, labels = labels, arrows = True,
node_shape = “s”, node_color = “white”)
plt.title(“Organogram of a company.”)
plt.savefig(“Output/plain organogram using networkx.jpeg”,
dpi = 300)
plt.show()

The code above resulted in a plain organogram as shown below:

Plain organogram created using the code above. Image by Author.

Graph with colored nodes and edges

I discovered that it is possible to assign unique color and size to each node in the form of lists for node_color and node_size respectively. However, the node_shape of all nodes has to be uniform, and there are limitations to available shapes. The parameters that can be assigned to draw nodes in the graph can be found here. In the code below, I provided white color for the CEO, skyblue color for Team A, and mistyrose color for Team B for the nodes. Similarly, I increased the size of nodes for team leads as compared to other nodes. I also assigned blue and red colors as edge_color for teams A and B respectively, and the gray color as edgecolors for the border of nodes.

Furthermode, I added labels for the edges from CEO to the team leads. This was done using nx.draw_networkx_edge_labels() and passing G, pos, and edge_labels in the form of a dictionary.

colors = ["white", "skyblue","mistyrose", "skyblue",
"skyblue","mistyrose", "mistyrose", "mistyrose"]
edge_colors = ["blue", "red", "blue","blue", "red","red","red"]sizes = [1000, 2000, 2000, 1200, 1200, 1200, 1200, 1200]nx.draw_networkx(G, pos = pos, labels = labels, arrows = True,
node_shape = "s", node_size = sizes,
node_color = colors,
edge_color = edge_colors, #color of the edges
edgecolors = "gray") #edges of the box of node
nx.draw_networkx_edge_labels(G, pos = pos,
edge_labels={(0, 1): 'A', (0, 2): 'B'},
font_color='black')
plt.title("Organogram of Company X")
plt.show()

Consequently, I got the following organogram:

Colorful organogram created using the code above. Image by Author.

Graph with a bounding box as node

Next, I wanted to annotate the labels within a bounding box for each node. For this, I did not specify any shape, size, or color to the nodes. Instead, I added a parameter for bbox in the form of dict. I specified the facecolor, boxstyle, edgecolor and, pad for the bbox. The arguments that can be passed for bbox are available here.

nx.draw_networkx(G, pos = pos, labels = labels, 
bbox = dict(facecolor = "skyblue",
boxstyle = "round", ec = "silver", pad = 0.3),
edge_color = "gray"
)
plt.title("Organogram of Company X")
plt.show()

As a result, I got the organogram as shown below. It is to be noted that the arrows at the end of the edges are hidden by the bounding box, and it is not possible to assign different colors, shapes, or sizes to the individual bbox.

Organogram with labels inside the bounding box. Image by Author.

Characteristic of the digraph

I wanted to explore the characteristics of the DiGraph object G. The list of all nodes in G is obtained using G.nodes, and the list of edges is obtained using G.edges. G.degree returns the number of edges that a node is connected to. In the case of directed graph, the degree is further split into InDegree and OutDegree. G.in_degree returns the number of edges pointing to each node. For G, it is 0 for the CEO and 1 for every other employee. Similarly, G.out_degree returns the number of edges pointing out from each node. For G, it is 2 for the CEO, 2 for Team A lead, 3 for Team B Lead, and 0 for each of the staff below. This is depicted in the code snippet below.

Exploring the characteristics of the DiGraph representing the organogram. Image by Author.

Centrality concept

In graph analytics, the centrality concept refers to identifying the important nodes in a graph, by measuring the centrality of nodes relative to other nodes including their neighbors (connected nodes) or edges in the graph (Bhasin, 2019). There are various centrality concepts, which define the importance of a node from a different perspective and provide further information to analyse the graph and its nodes.

Degree centrality of a node is the fraction of the total nodes it is connected to. A node with a high degree centrality is generally considered highly active. In G, node 3 i.e. Team B Lead has the highest degree centrality since it is connected to four other nodes.

Betweenness centrality is a measure of how many times a particular node lies on the shortest path between all pairs of nodes in a graph. In G, team B lead has the highest betweenness centrality followed by team A lead. This implies that the team leads act as bridges between the CEO and the staff. The CEO and the staff have zero betweenness centrality because they don’t lie between any two nodes.

Closeness centrality is a measure of the proximity of a node to other nodes. It is calculated as the average of the shortest path length from the node to every other node in the network (Golbeck, 2013). In the case of closeness centrality, the nodes with lower values have higher centrality. This implies that the CEO and team leads have more centrality as compared to the staff (Golbeck, 2013).

A node with a high degree centrality will likely have higher betweenness centrality and closeness centrality as is the case with team leads in this example. It is to be noted that the betweenness centrality and closeness centrality values would change if G was an undirected graph while the degree centrality would remain the same.

The centrality values discussed above were obtained for different nodes by using nx.degree_centrality(G), nx.betweenness_centrality(G), and nx.closeness_centrality(G) respectively.

Plot showing the degree centrality, betweenness centrality and closeness centrality of different nodes. Image by Author.

Getting the list of all paths from the root to the leaves of the tree

From the given DiGraph object G, I wanted to get the list of all simple paths without repeating any node starting from the CEO (root node) to individual staff (leaf nodes). This was possible using nx.algorithms.all_simple_paths() and passing the DiGraph object, source node (CEO), and the list of target nodes (all staff) to it. Then I created a pandas dataframe df out of the generator object. By looping through the index and column of df, I got the exact positions in the root, intermediate, and leaf nodes respectively as shown below. df.values.tolist() returned each of the paths in the form of a list.

Retrieving all possible paths from the start node to the end node in the DiGraph in the form of a dataframe and list. Image by Author.

Conclusion

In this post, I used an example of an organogram to describe a way to plot a directed acyclic graph using the NetworkX package. I discussed the possibility of customising the graph utilising attributes such as shape, size, and color of nodes, edges, and bounding box. I depicted the possibility of exploring the characteristics of the graph object and analysing its centrality using NetworkX. And finally, I showcased a way to get the list of all simple paths from the root to the leaves of the given graph object.

In my experience, NetworkX package works well for graph network analysis and manipulation. However, it has certain limitations in terms of graph visualisation. For example, there are limitations in the shapes that can be used to represent nodes. Nodes can be in the shape of a square, circle, triangle, etc. but an elliptical shape is not possible because of which the labels can come outside of the nodes. Moreover, the shape of the nodes cannot be set different for different nodes. While using the bounding box to annotate labels, the facecolor cannot be set different for different nodes. In the next post of this series, I am going to share how the techniques in the graphviz package can be leveraged not only to overcome these limitations but also to create more comprehensive graphs conveniently.

Thank you for reading!

Footnotes

[1] An object is hashable if it has a hash value that never changes during its lifetime, and can be compared to other objects. The hash value can be obtained by simply passing the object to hash(). And the hash value is an integer which is used to quickly compare dictionary keys while looking at a dictionary.

References

Bhasin, 2019. Graph Analytics — Introduction and Concepts of Centrality.

Golbeck, 2013. Chapter 3: Network Structure and Measures. Analysing the Social Web.

Reducible, 2020. Introduction to Graph Theory- A Computer Science Perspective.

--

--

I write about the intersection of data science with sustainability in simple words. Views reflected are of my own, and don’t reflect that of my employer.