In computer science, a heap is a type of tree-shaped data structure that has the special property of being an almost-completely binary structure satisfying the heap property. This property corresponds to max heaps and min heaps. A max heap is a data structure where each child node is less than or equal to its parent node. A min heap is a similar type of data structure where each child node is greater than or equal to its parent node. When these constraints are placed on tree data structures, we end up with trees of relatively short length. This makes the process of searching for values within the tree much faster. Let’s consider the following max heap:

At the top of the tree we have the root node with a value of 90. The property of the max heap is that the root node has the maximum values. Further, the value of each node is less than or equal to its parent node. We see that 90 is the largest value in the tree. Further, at the second level we see the values 79 and 72, which are less than 90, and then 30 and 65 which are less than 79, and so on.
Conversely, take a look at the example of the min heap below:

If we look at the value at the root compared to the values at each node below the root, we see that 12 is the smallest value in the tree. At the level below, we have 20 and 29 which are both greater than 12, and so on.
The task of heapifying a tree is the process of reordering the elements of a tree such that it has the properties of a min or max heap. Specifically, max-heapify is the process of taking an array that is represented as a binary tree and recording the values at each node such that the child nodes are either less than or equal to the parent, satisfying a max heap:

Min-heapify is the process of recording the values at each node such that the child is greater than or equal to the parent node, satisfying a min heap:

Heapifying is useful because of the favorable properties of the heap data structure. Making a tree satisfy the heap properties can speed up many algorithmic tasks that are very important in software engineering. For example, heap data structures can be used for finding order statistics. An order statistic corresponds to the Kth smallest (or largest) value in a collection of items. This has applications in tasks such as quickly finding the median in an array.
Heap data structures can also be used for finding and keeping track of the minimum/maximum value in an array. This can be useful for scheduling tasks in a priority queue for customers, where customers with issues that take the shortest amount of time are prioritized. This can lead to a lower average waiting time for all customers. Heaps are also used in graph Algorithms such as Djiktra’s algorithm which is used to find the shortest path. This can be used for infrastructure planning tasks such as establishing a road network, electricity line or oil pipeline.
Understanding how to implement a heap data structure is an important skill set for every data scientist. Further, understanding the basic applications of a heap data structure can make it a powerful tool for many algorithmic tasks across a variety of software engineering applications.
Heapify a Binary tree
Heapifying is the process of converting a binary tree into a heap data structure. To see how this is done, let’s consider the following array:
array_in = [3, 5, 8, 10, 17, 11, 13, 19, 22, 24, 29]
This array has the corresponding complete binary tree:

We can define a heapify function that takes the array as input and converts it into a max or min-heap. Let’s consider converting this binary tree into a max heap. The first thing we need to do is find the last node that isn’t a leaf. A leaf is a node that doesn’t have any children. We see that 11, 13, 19, 22, 24, and 29 are all leaves since they do not point to any children:

Further, reading the nodes in each tree level from left to right we see that the last non-leaf node is 17. This is also the parent of the last node:

We can find the index of the last non-leaf node by taking the floor of half the number of nodes -1:
index of last non-leaf node = floor of (number of nodes)/2–1.
In our example there are 11 nodes so the index of the last non-leaf node is:
index of last non-leaf node = floor of (11)/2–1 = 5.5 -1 = floor of 4.5 = 4.0.
So the index of the last non-leaf node is 4, which has the value of 17 (remember we start with an index value of 0).
We want to build a max-heap from our binary tree. We can do this by heapifying the nodes up to the last non-leaf node [3,5,8,10,17] going in reverse order. We apply the heapify operation in reverse level order meaning starting from right to left at each level we compare each child node to its parent. For max-heapify, if the child node is greater than its parent, swap the values. For example, we start the heapify operation by swapping 17 with the value of its furthest right child, 29, since the child is greater than the parent:

We then move to the next node, going from left to right, and compare 24 with 29. This satisfies the max-heap property so we then move on to 22, which we compare to 10. Since 10 is at a parent node and is less than 22, it does not satisfy the heap property so we swap:

We then move to the next node. Since 19 is less than 22, it satisfies the max- heap so we move on to the next level. We start at 13 and compare it to its parent. It does not satisfy the heap property so we swap 8 and 13:

The next node values to swap are 5 and 29, then 5 and 24:

Then we swap 3 and 29, 3 and 24, and then 3 and 17:

Let’s write some c++ code that implements this heapify logic. Let’s create a .cpp file called heapify_code.cpp. In terminal type:
vi heapify_code.cpp
Let’s start by including which allows us to write to the standard input/output streams.
#include <iostream>
Let’s also define a function called heapify that returns void:
void heapify()
{
}
The function will take an integer array input. Let’s call the integer array array_in. It will also take an integer, subtree_root_index, for the index of subtree root . It will also take an integer, array_size, for the size of the array:
void heapify(int array_in[], int index, int array_size)
{
}
Next, we need to define a few variables within the scope of our function. Let’s initialize a variable called largest_value. Let’s also initialize variables for the left and right children. For the left child, the index is 2subtree_root_index +1 and the right child is 2subtree_root_index +2.
void heapify(int array_in[], int array_size, int subtree_root_index)
{
int largest_value = subtree_root_index;
int left = 2*subtree_root_index + 1;
int right = 2*subtree_root_index + 2;
}
Next let’s add logic that checks if the left child is larger than the root. If the left child is greater than the root, we redefine the largest_value as the left child. Within this logic we also need to make sure that the index of the left child is less than the size of the array:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
...//code truncated for clarity
if (left < array_size && array_in[left] > array_in[largest_value])
{
largest_value = left;
}
}
Next we need to =add logic that checks if the right child is larger than the root. Like the previous check, if the right child is greater than the root, we redefine the largest_value as the right child. We also need to make sure that the index of the right child is less than array_size:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
...//code truncated for clarity
if (left < array_size && array_in[left] > array_in[largest_value])
{
largest_value = left;
}
if (right < array_size && array_in[right] > array_in[largest_value]){
largest_value = right;
}
}
Finally, we need to check if the largest value is equal to the value at the root. If it is not we swap the values at the root with the largest value:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
...//code truncated for clarity
if (largest_value != subtree_root_index )
{
swap(array_in[subtree_root_index], array_in[largest_value];
}
}
And finally we recursively call the heap function on the subtree under the condition largest_value is not equal subtree_root_index:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
...//code truncated for clarity
if (largest_value != subtree_root_index )
{
swap(array_in[subtree_root_index], array_in[largest_value]
heapify(array_in, array_size, subtree_root_index);
}
}
The full function is as follows:
void heapify(int array_in[], int array_size, int subtree_root_index)
{
int largest_value = subtree_root_index;
int left = 2*subtree_root_index + 1;
int right = 2*subtree_root_index + 2;
if (left < array_size && array_in[left] > array_in[largest_value])
{
largest_value = left;
}
if (right < array_size && array_in[right] > array_in[largest_value]){
largest_value = right;
}
if (largest_value != subtree_root_index )
{
swap(array_in[subtree_root_index], array_in[largest_value]
heapify(array_in, array_size, largest_value);
}
}
Build Heap
Now that we’re done writing our heapify function, we can write another function that allows us to construct a heap given an input array. This function will take an array and its size as inputs, and within a for loop call the heapify function on the array starting from the last-node leaf node. We will call the function constructHeap:
void construct_heap(int array_in[], int array_size){
}
Let’s define a variable called last_non_leaf_node which is the array_size/2 -1:
void construct_heap(int array_in[], int array_size)
{
int last_non_leaf_node = (array_size/2) -1;
}
Next we can loop in reverse order starting from the last leaf node, iteratively reducing the index by 1, and call the heapify function with each value for the index:
void construct_heap(int array_in[], int array_size){
int last_non_leaf_node = (array_size/2) -1;
for (int subtree_root_index = last_non_leaf_node; subtree_root_index >=0; subtree_root_index-=1)
{
heapify(array_in, array_size, subtree_root_index);
}
}
Next let’s define a print function that will allow use to print out the values in our heap:
void print_heap(int array_in[], int array_size){
cout << "Printing values at each node in heap" << endl;
for (int index = 0; index < array_size; index+=1)
{
cout<< array_in[index] << endl;
}
}
Now we can define our main function which will serve as the driver code for executing our heapify, construct_heap and print_heap functions. Let’s define the array we work working with earlier, array_in = [3, 5, 8, 10, 17, 11, 13, 19, 22, 24, 29], which has the corresponding tree representation:

int main(){
int array_in[] = { 3, 5, 8, 10, 17, 11, 13, 19, 22, 24, 29};
int array_size = sizeof(array_in) / sizeof(array_in[0]);
construct_heap(array_in, array_size);
print_heap(array_in, array_size);
}
Let’s compile our script:
g++ heapify_code.cpp
And run our compiled script:
./a.out
And we should get the following output:

This has the array representation heap = [29, 24, 13, 22, 17, 11, 8, 19, 10, 5, 3] and the following transformation we performed is as follows:

The code used in this post is available on GitHub.
Conclusions
Heapifying a tree is important since it allows us to benefit from the favorable properties of the heap data structure. A heap is an essential data structure that has applications such as min/max searching, order statistics and finding the shortest paths. Heap data structures can significantly speed up these algorithmic tasks. It is generally useful any time you need to repeatedly select largest or smallest values from a collection of items which is the case with priority queues and order statistics.
_This post was originally published on the BuiltIn blog. The original piece can be found here._