Python Tree Implementation with BigTree

Integrating trees with Python lists, dictionaries, and pandas DataFrames

Kay Jan Wong
Towards Data Science

--

Photo by Jan Huber on Unsplash
Photo by Jan Huber on Unsplash

Python has built-in data structures for lists, arrays, and dictionaries, but not for tree-like data structures. In LeetCode, questions for Trees are limited to Binary Search Trees and its implementation does not have many functionalities.

bigtree Python package can construct and export trees to and from Python lists, dictionaries, and pandas DataFrames, integrating seamlessly with existing Python workflows.

Tree-like data structures can be used to show hierarchical relationships, such as family trees and organizational charts.

This article will introduce basic tree concepts, how to construct trees with the bigtree Python package, tree traversal, search, modification, and export methods. This article ends off with ways to use trees for To-Do List implementation, and extend tree implementation to Trie and Directed Acyclic Graph data structures.

If you’re interested to contribute to the bigtree Python package, do reach out to collaborate!

Table of Contents

Tree Basics and Terminologies

Trees are non-linear data structures that store data hierarchically and are made up of nodes connected by edges. For example in a family tree, a node would represent a person and an edge would represent the relationship between two nodes.

Fig 1: Example of a tree — Image by author
Fig 1: Example of a tree — Image by author

After knowing the components that make up a tree, there are a few terminologies that extend to these components,

  • Root: Node that does not have any parent and the entire tree originates from it. In Fig 1, the root is Node a
  • Leaf: Nodes that do not have any child. In Fig 1, leaf nodes are Nodes c, d, and e
  • Parent Node: Immediate predecessor of a node. In Fig 1, Node a is the parent node of Nodes b and c
  • Child Node: Immediate successor of a node. In Fig 1, Node b is the child node of Node a
  • Ancestors: All predecessors of a node. In Fig 1, Nodes a and b are ancestors of Node d
  • Descendants: All successors of a node. In Fig 1, Nodes d and e are descendants of Node b
  • Siblings: Nodes that have the same parent. In Fig 1, Node d and e are siblings
  • Left Sibling: Sibling to the left of the node. In Fig 1, Node d is the left sibling of Node e
  • Right Sibling: Sibling to the right of the node. In Fig 1, Node e is the right sibling of Node d
  • Depth: Length of the path from Node to root. In Fig 1, the depth of Node b is 2, and Node d is 3
  • Height/Max Depth: Maximum depth of root to a leaf node. In Fig 1, the height of the tree is 3

Bigtree Setup

Bigtree is easy to set up, simply run the following command on Terminal.

$ pip install bigtree

If you want to export trees to image, run the following command on Terminal instead.

$ pip install 'bigtree[image]'

Without further ado, let’s dive into implementing trees!

Constructing Trees

To construct trees, we must first define the nodes and link the nodes by specifying the parent and children of the node.

For example, to construct a family tree,

from bigtree import Node

root = Node("a", age=90)
b = Node("b", age=65, parent=root)
c = Node("c", age=60, parent=root)

root.children
# (Node(/a/b, age=65), Node(/a/c, age=60))

root.depth, b.depth
# (1, 2)

root.max_depth
# 2

root.show(attr_list=["age"])
"""
a [age=90]
├── b [age=65]
└── c [age=60]
"""

root.hshow()
"""
┌─ b
─ a ─┤
└─ c
"""

In the example above, we define Node b and c to be children of Node a with 3 lines of codes. We can also add attributes, such as the age attribute to Nodes. To view the tree structure, we can use the show or hshow method.

We can also query the root, leaves, parent, children, ancestors, descendants, siblings, left_sibling, right_sibling, depth, and max_depth of the nodes as covered in the previous section.

The above method to define every node and edge can be manual and tedious. There are alternative ways to construct trees with lists, dictionaries, and pandas DataFrame!

If there are no node attributes, the simplest way to construct a tree would be using Python lists and the list_to_tree method.

from bigtree import list_to_tree

path_list = ["a/b/d", "a/b/e", "a/c"]

root = list_to_tree(path_list)

root.show()
"""
a
├── b
│ ├── d
│ └── e
└── c
"""

root.hshow()
"""
┌─ d
┌─ b ─┤
─ a ─┤ └─ e
└─ c
"""

If there are node attributes, it is recommended to construct a tree with a dictionary or pandas DataFrame, using dict_to_tree and dataframe_to_tree methods respectively.

from bigtree import dict_to_tree

path_dict = {
"a": {"age": 90},
"a/b": {"age": 65},
"a/c": {"age": 60},
"a/b/d": {"age": 40},
"a/b/e": {"age": 35},
}
root = dict_to_tree(path_dict)

root.show(attr_list=["age"])
"""
a [age=90]
├── b [age=65]
│ ├── d [age=40]
│ └── e [age=35]
└── c [age=60]
"""
import pandas as pd
from bigtree import dataframe_to_tree

path_data = pd.DataFrame(
[
["a", 90],
["a/b", 65],
["a/c", 60],
["a/b/d", 40],
["a/b/e", 35],
],
columns=["PATH", "age"],
)
root = dataframe_to_tree(path_data)

root.show(attr_list=["age"])
"""
a [age=90]
├── b [age=65]
│ ├── d [age=40]
│ └── e [age=35]
└── c [age=60]
"""

For more Node attributes and methods and other data structures to tree methods, do refer to bigtree Documentation.

Tree Traversal Algorithms

There are two types of tree traversal, Depth-First Search (DFS) and Breadth-First Search (BFS),

  • Depth-First Search starts at the root and explores each branch to its leaf node before moving to the next branch
  • Breadth-First Search starts at the root and explores every child node, and recursively does so for every node

Pre-Order Traversal (DFS, NLR)

Pre-Order Traversal is a Depth-First Search (DFS) method that performs 3 steps recursively,

  1. Visit the current node (N)
  2. Recursively traversal the current node’s left subtree (L)
  3. Recursively traverse the current node’s right subtree (R)
Fig 6: Example of a tree — Image by author
Fig 2: Example of a tree — Image by author

For pre-order traversal, it will traverse the tree in Fig 2 in the order,

['a', 'b', 'd', 'e', 'g', 'h', 'c', 'f']

Post-Order Traversal (DFS, LRN)

Post-Order Traversal is a Depth-First Search (DFS) method that performs 3 steps recursively,

  1. Recursively traversal the current node’s left subtree (L)
  2. Recursively traverse the current node’s right subtree (R)
  3. Visit the current node (N)
Fig 6: Example of a tree — Image by author
Fig 3: Example of a tree — Image by author

For post-order traversal, it will traverse the tree in Fig 3 in the order,

['d', 'g', 'h', 'e', 'b', 'f', 'c', 'a']

Level-Order Traversal (BFS)

Level-Order Traversal is a Breadth-First Search method.

For level-order traversal, it will traverse the tree in Fig 3 in the order,

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

Level-Order Group Traversal (BFS)

Level-Order Group Traversal is similar to Level-Order Traversal, with the difference being that each level will be returned as a nested list; list[idx] denotes the items in depth idx + 1.

For level-order group traversal, it will traverse the tree in Fig 3 in the order,

[['a'], ['b', 'c'], ['d', 'e', 'f'], ['g', 'h']]

There are other traversal algorithms available on bigtree, namely:

  • In-Order Traversal (DFS, LNR): Applicable to binary trees and not generic trees
  • Zig Zag Traversal (BFS): Similar to level-order traversal, but in a zigzag manner across different levels
  • Zig Zag Group Traversal (BFS): Similar to level-order group traversal, but in a zigzag manner across different levels

Tree Search Methods

We can use tree search methods to get one or multiple nodes that fulfil certain criteria, with methods find for one node and findall for multiple nodes.

from bigtree import Node, find, findall

root = Node("a", age=90)
b = Node("b", age=65, parent=root)
c = Node("c", age=60, parent=root)
d = Node("d", age=40, parent=b)

root.show(attr_list=["age"])
"""
a [age=90]
├── b [age=65]
│ └── d [age=40]
└── c [age=60]
"""

find(root, lambda node: node.age == 65)
# Node(/a/b, age=65)

findall(root, lambda node: node.age >= 60)
# (Node(/a, age=90), Node(/a/b, age=65), Node(/a/c, age=60))

For generic search methods without defining a lambda function, there are built-in methods,

  • find_attr and find_attrs: find one/multiple nodes by attribute
  • find_name and find_names: find one/multiple nodes by name
  • find_path and find_paths: find one/multiple nodes by full or partial paths
  • find_full_path: find one node by their full path
  • find_relative_path: find nodes matching relative path, supports unix folder expression and wildcards
  • find_child and find_children: find one/multiple children nodes by custom condition, removes the need for searching the whole tree
  • find_child_by_name: find one child node by their name

Tree Modification Methods

bigtree supports cases where nodes must be shifted or copied from a location to a destination. For instance, we can shift and reorder nodes in a To-Do list implementation.

from bigtree import Node, shift_nodes

root = Node("List")
groceries = Node("Groceries", parent=root)
urgent = Node("Urgent", parent=root)
groceries_milk = Node("Milk", parent=groceries)

shift_nodes(
root,
from_paths=["List/Groceries/Milk"],
to_paths=["List/Urgent/Milk"],
)

root.show()
# List
# ├── Groceries
# └── Urgent
# └── Milk

There are also other tree modifications methods such as,

  • copy_nodes: Make a copy of the node from location to destination, node will exist in two locations
  • shift_and_replace_nodes: Shift node from location to replace node in destination
  • copy_nodes_from_tree_to_tree: Make a copy of the node from one tree to another tree, node will exist in two trees
  • copy_and_replace_nodes_from_tree_to_tree: Make a copy of the node from one tree and replace node in another tree

Exporting Trees

As mentioned at the start of the article, bigtree integrates seamlessly with Python dictionaries and pandas DataFrame. Trees can be exported to a dictionary, nested dictionary, pandas DataFrame, and more formats.

Printing Tree to Console

Given a tree, we can print the tree to the console using print_tree with the ability to specify the attributes to print and the style of the tree. More customizations are also available in the bigtree Documentation.

from bigtree import print_tree

print_tree(root, attr_list=["age"], style="ascii")
"""
a [age=90]
|-- b [age=65]
| |-- d [age=40]
| +-- e [age=35]
| |-- g [age=10]
| +-- h [age=6]
+-- c [age=60]
+-- f [age=38]
"""

For a generator method, you can use the yield_tree method instead.

Export Tree to Dictionary

Given a tree, we can export the tree to a dictionary using tree_to_dict with the ability to store all attributes with names as-is or map tree attributes to custom attribute names using a dictionary.

from bigtree import tree_to_dict

tree_to_dict(root, all_attrs=True)
# {
# '/a': {'name': 'a', 'age': 90},
# '/a/b': {'name': 'b', 'age': 65},
# '/a/b/d': {'name': 'd', 'age': 40},
# '/a/b/e': {'name': 'e', 'age': 35},
# '/a/b/e/g': {'name': 'g', 'age': 10},
# '/a/b/e/h': {'name': 'h', 'age': 6},
# '/a/c': {'name': 'c', 'age': 60},
# '/a/c/f': {'name': 'f', 'age': 38}
# }

tree_to_dict(root, attr_dict={"age": "AGE"})
# {
# '/a': {'name': 'a', 'AGE': 90},
# '/a/b': {'name': 'b', 'AGE': 65},
# '/a/b/d': {'name': 'd', 'AGE': 40},
# '/a/b/e': {'name': 'e', 'AGE': 35},
# '/a/b/e/g': {'name': 'g', 'AGE': 10},
# '/a/b/e/h': {'name': 'h', 'AGE': 6},
# '/a/c': {'name': 'c', 'AGE': 60},
# '/a/c/f': {'name': 'f', 'AGE': 38}
# }

The original tree can be reconstructed back using the dict_to_tree method!

Export Tree to DataFrame

Given a tree, we can export the tree to a DataFrame using tree_to_dataframe with the ability to store all attributes as columns with names as-is or map tree attributes to custom column names using a dictionary.

from bigtree import tree_to_dataframe

tree_to_dataframe(root, all_attrs=True)
# path name age
# 0 /a a 90
# 1 /a/b b 65
# 2 /a/b/d d 40
# 3 /a/b/e e 35
# 4 /a/b/e/g g 10
# 5 /a/b/e/h h 6
# 6 /a/c c 60
# 7 /a/c/f f 38

The original tree can be reconstructed back using the dataframe_to_tree method!

Export Tree to Image (and more)

Given a tree, we can export the tree to an image or other graphics or files using tree_to_dot. This uses pydot under the hood, which uses Dot language and can be interfaced with Graphviz.

from bigtree import tree_to_dot

graph = tree_to_dot(root)
graph.write_png("tree.png")
graph.write_dot("tree.dot")
graph.to_string()

In the code snippet above, graph is of pydot.Dot data type which has built-in implementation to write to dot, PNG, SVG file formats, and more. The output is similar to Fig 3.

There are other export methods available, namely:

  • tree_to_nested_dict: Export to nested dictionary, original tree can be reconstructed back using nested_dict_to_tree
  • tree_to_newick: Export to Newick notation, original tree can be reconstructed back using newick_to_tree
  • tree_to_mermaid: Export to mermaid Markdown text
  • tree_to_pillow: Export to Pillow image

Additional: Using To-Do List with bigtree

If at this point, you are still wondering what you can do with bigtree, bigtree comes with a built-in To-Do List workflow with the ability to import and export from a JSON file.

This To-Do List implementation has three levels — app name, list name, and item name. You can add lists to app or add items to list. For example,

from bigtree import AppToDo

app = AppToDo("To-Do App")
app.add_item(item_name="Homework 1", list_name="School")
app.add_item(item_name=["Milk", "Bread"], list_name="Groceries", description="Urgent")
app.add_item(item_name="Cook")

app.show()
# To Do App
# ├── School
# │ └── Homework 1
# ├── Groceries
# │ ├── Milk [description=Urgent]
# │ └── Bread [description=Urgent]
# └── General
# └── Cook

app.save("list.json")
app2 = AppToDo.load("list.json")

In the example above,

  • App name refers to To-Do App
  • List name refers to School, Groceries, and General
  • Item name refers to Homework 1, Milk, Bread, and Cook

Additional: Extending to Trie

Trie is a type of k-ary search tree used for storing and searching a specific key from a set, derived from the word reTRIEval. Trie can be used to sort a collection of strings alphabetically or search if a prefix is present for a string.

Fig 14: Example of Trie — Image by author
Fig 4: Example of Trie — Image by author

To extend bigtree with Trie, we can add leading symbol ^ and trailing symbol $ to each word, and use tree search methods to find a specific word or subword with find_path. A Trie can be constructed as such,

from bigtree import list_to_tree, find_path

bag_of_words = ["ant", "and", "angry", "buoy", "buoyant"]
list_of_paths = ["/".join(["^"] + list(x) + ["$"]) for x in bag_of_words]

list_of_paths
# [
# "^/a/n/t/$",
# "^/a/n/d/$",
# "^/a/n/g/r/y/$",
# "^/b/o/y/$",
# "^/b/o/y/a/n/t/$"
# ]

trie = list_to_tree(list_of_paths)
find_path(trie, "/".join(list("^boy$")))

The code snippet above results in the image Fig 4 if exported using the tree_to_dot method.

Additional: Directed Acyclic Graph (DAG)

Directed Acyclic Graph (DAG) is a graph data structure where each node can have more than one parent. A tree is considered a restricted form of a graph. This difference results in the following differences,

  • Root: There is no concept of root in DAG since a node can have multiple parents
  • Depth: There is no concept of depth in DAG since there is no root node
  • Height/Max depth: There is no concept of height in DAG since there is no root node

DAGs are best used to represent workflows as workflows have a certain order (directed), and do not repeat indefinitely; does not have loops (acyclic).

Similar to trees, DAGs in bigtree can be constructed manually, from Python lists, dictionaries, or pandas DataFrames with methods list_to_dag, dict_to_dag, and dataframe_to_dag respectively.

from bigtree import DAGNode, dag_to_dot

a = DAGNode("Ingest Data from Source A")
b = DAGNode("Ingest Data from Source B")
c = DAGNode("Clean data", parents=[a, b])
d = DAGNode("Feature Engineering", parents=[c])

graph = dag_to_dot(a, node_colour="gold")
graph.write_png("dag.png")

The above code snippet results in the following image,

Fig 15: Example of DAG — Image by author
Fig 5: Example of DAG — Image by author

Hope you have gained a better understanding of tree structures and how to implement them using the bigtree Python package. If there are any suggestions, feedback, bugs, or new use cases, feel free to reach out!

Related Links

bigtree Documentation: https://bigtree.readthedocs.io

bigtree Official GitHub: https://github.com/kayjan/bigtree

--

--

Data Scientist, Machine Learning Engineer, Software Developer, Programmer | Someone who loves coding, and believes coding should make our lives easier