
In an earlier article I explained the back-propagation algorithm in the context of deep learning. This article is a continuation of that one, and I recommend reading it first to get the maximum benefit from this one.
Here I'll cover computational graphs in PyTorch and TensorFlow. This is the machinery that allows these frameworks to calculate gradients for your neural networks. I'll start with an introduction to the types of computational graphs, followed by framework-specific details.
Types of Computational Graphs [1]
All deep learning frameworks rely on computational graphs to calculate the gradient values required for gradient descent optimization. Generally, you build the forward propagation graph and the framework takes care of the backward differentiation for you.
But before starting with computational graphs in PyTorch, I want to discuss static and dynamic computational graphs.
Static computational graphs:
These typically involve two phases as follows.
- Phase 1: Define an architecture (maybe with some primitive flow control like loops and conditionals)
- Phase 2: Run a bunch of data through it to train the model and/or make predictions
One of the advantages of static graphs is that they allow for powerful offline optimization and scheduling of the graph. This means they are generally faster than dynamic graphs, although the difference may not be significant in every use case and depends on the graph. The disadvantage is that handling structured and even variable-sized data is ugly.
Dynamic computational graphs:
The graph is defined implicitly (e.g., using operator overloading) as the forward computation is executed.
Dynamic graphs have the advantage of being more flexible. The library is less invasive and allows for interleaved construction and evaluation of the graph. The forward computation is written in your favourite programming language with all its features and algorithms. The downside is that there is little time for graph optimization, and if the graph does not change, the effort can be wasted. Dynamic graphs are also debug friendly: finding problems in your code is much easier, because you can execute the code line by line and have access to all the variables. This is a very important feature if you want to use deep learning for any real purpose in industry.
PyTorch uses dynamic computational graphs. TensorFlow allows the creation of optimized static graphs but also has eager execution, which enables something similar to dynamic graphs: an imperative programming environment that evaluates operations immediately, without building graphs, so operations return concrete values instead of constructing a computational graph to run later.
Now let’s look at computational graphs in PyTorch.
Computational Graphs in PyTorch [7]
At its core PyTorch provides two features:
- An n-dimensional Tensor, similar to NumPy but can run on GPUs.
- Automatic differentiation for building and training neural networks.
Deep learning architectures and their training involve a lot of matrix operations. A Tensor is nothing but an n-dimensional array. For people coming from a Python background, NumPy should ring a bell. It is an extremely powerful and optimized library for matrix operations. However, for deep learning purposes, the matrices are huge and require enormous computational power.
A PyTorch Tensor is nothing but an n-dimensional array. The framework provides a lot of functions for operating on these Tensors. But to accelerate the numerical computations for Tensors, PyTorch allows the utilization of GPUs, which can provide speedups of 50x or greater. PyTorch Tensors can also keep track of a computational graph and gradients.
In PyTorch, the autograd package provides automatic differentiation to automate the computation of the backward passes in neural networks. The forward pass of your network defines the computational graph; nodes in the graph are Tensors and edges are functions that produce output Tensors from input Tensors. Back-propagation through this graph then gives the gradients.
Every Tensor in PyTorch has a flag, requires_grad, that allows for fine-grained exclusion of subgraphs from gradient computation and can increase efficiency. If x is a Tensor with x.requires_grad=True, then x.grad is another Tensor holding the gradient of some scalar value with respect to x.
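The snippet illustrating this appears as an image in the original post; here is a minimal sketch of the behaviour it describes (the variable names are purely illustrative):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.randn(3)                 # requires_grad defaults to False

z = x * y                          # one input requires grad -> output does too
w = y + 2.0                        # no input requires grad -> output does not

print(z.requires_grad)             # True
print(w.requires_grad)             # False
```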
As seen from the above example, if even a single input to an operation requires gradient, its output will also require gradient. Conversely, the output won't require gradient only if none of the inputs require it.
Autograd under the hood
Conceptually, autograd keeps a graph recording of all of the operations that created the data as you execute operations, giving you a directed acyclic graph whose leaves are the input tensors and roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule (back-propagation).
Internally, autograd represents this graph as a graph of Function objects, which can be apply()-ed to compute the result of evaluating the graph. When computing the forward pass, autograd simultaneously performs the requested computations and builds up a graph representing the function that computes the gradient (the .grad_fn attribute of each torch.Tensor is an entry point into this graph). When the forward pass has completed, the graph is evaluated in the backward pass to compute the gradients.
As discussed earlier the computational graphs in PyTorch are dynamic and thus are recreated from scratch at every iteration, and this is exactly what allows for using arbitrary Python control flow statements that can change the overall shape and size of the graph at every iteration. You don’t have to encode all possible paths before you launch the training – what you run is what you differentiate.
Every primitive autograd operator is really two functions that operate on Tensors: the forward function computes output Tensors from input Tensors, and the backward function receives the gradient of the output Tensors with respect to some scalar and computes the gradient of the input Tensors with respect to that same scalar.
To summarize, Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of the computation. Each Tensor has a .grad_fn attribute that references the Function that created it (except for Tensors created by the user, whose grad_fn is None). If you want to compute the derivatives, you can call .backward() on a Tensor. After the backward call, the gradient values are stored as Tensors in the .grad attribute.
These concepts are represented in the following diagram.
![Source: By the author [7]](https://towardsdatascience.com/wp-content/uploads/2021/01/0p9_fUhKXCf0LWAxh.png)
For example, if you create two Tensors a and b, followed by c = a / b, the grad_fn of c would be DivBackward, which is the backward function for the / operator. As discussed earlier, a collection of these grad_fn objects makes up the backward graph. The forward and backward functions are members of torch.autograd.Function, and you can define your own autograd operator by subclassing torch.autograd.Function.
is_leaf: All Tensors that have requires_grad set to False are leaf Tensors by convention. Tensors that have requires_grad set to True are leaf Tensors only if they were created by the user, meaning they are not the result of an operation, so their grad_fn is None. Only leaf Tensors have their grad populated during a call to backward(); to get grad populated for non-leaf Tensors, you can use retain_grad().
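A minimal sketch of these properties (values chosen here only for illustration):

```python
import torch

a = torch.tensor(2.0, requires_grad=True)   # created by the user -> leaf
b = torch.tensor(4.0, requires_grad=True)   # created by the user -> leaf
c = a / b                                   # result of an operation -> non-leaf

print(a.is_leaf, a.grad_fn)    # True None
print(c.is_leaf, c.grad_fn)    # False <DivBackward0 object at ...>

c.retain_grad()                # ask autograd to also populate c.grad
c.backward()                   # c is a scalar, so no argument is needed
print(a.grad, b.grad, c.grad)  # gradients of c w.r.t. a, b and c itself
```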
Let us construct the computational graph example used in part 1 of the post and use PyTorch to compute the gradients.

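The notebook code appears as an image in the original post, and the exact expression from part 1 is not reproduced here; as a stand-in, the sketch below assumes a graph of the form e = (a + b) * ln(c * d):

```python
import torch

# Hypothetical example graph: e = (a + b) * ln(c * d)
# (substitute the actual expression from part 1 if you are following along)
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(4.0, requires_grad=True)
d = torch.tensor(5.0, requires_grad=True)

u = a + b              # add node
v = torch.log(c * d)   # multiply followed by log
e = u * v              # multiply node, the root of the graph
```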
The above code constructs the computational graph in PyTorch. Let us look at a few properties of the nodes in the graph.

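Continuing the sketch above, the node properties can be inspected like this:

```python
# Before backward() is called, no gradients exist yet.
for name, t in [("a", a), ("b", b), ("u", u), ("v", v), ("e", e)]:
    print(name, "is_leaf:", t.is_leaf, "grad_fn:", t.grad_fn)

print(a.grad)   # None, because backward() has not been called yet
```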
The leaves don’t have grad_fn but will have gradients. Non leaf nodes have grad_fn but don’t have gradients. Before the backward() is called there are no grad values.
The gradients that we calculated theoretically in the previous post are calculated using PyTorch and shown below.

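Again as a hedged sketch based on the illustrative graph above (the numbers in the original notebook come from the part 1 example):

```python
e.backward()     # back-propagate from the root of the graph

print(a.grad)    # de/da
print(b.grad)    # de/db
print(c.grad)    # de/dc
print(d.grad)    # de/dd
```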
The properties of the nodes after the backward() call are shown below.

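Continuing the same sketch:

```python
# After backward(), the leaf tensors have .grad populated, while the
# intermediate (non-leaf) tensors still only expose their grad_fn.
for name, t in [("a", a), ("b", b), ("c", c), ("d", d)]:
    print(name, "is_leaf:", t.is_leaf, "grad:", t.grad)

print("e grad_fn:", e.grad_fn)
```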
As you can see, once the graph is built, calculating the gradients in PyTorch is a piece of cake. It takes care of the differentiation for you. The Jupyter notebook for this tutorial can be found at: https://github.com/msminhas93/ComputationalGraphs
This completes the discussion of computational graphs in PyTorch. In the next section let us look at computational graphs in TensorFlow.
Computational Graphs in TensorFlow
TensorFlow uses a dataflow graph to represent computation in terms of the dependencies between individual operations. This leads to a low-level programming model in which you define the dataflow graph and then create a TensorFlow session to run parts of the graph across a set of local and remote devices.
An example of a dataflow or computational graph in TensorFlow is shown below.

In TensorFlow, any kind of computation is represented as an instance of a tf.Graph object. These objects consist of a set of tf.Tensor objects and tf.Operation objects. The tf.Tensor objects serve as the edges, while the tf.Operation objects serve as the nodes, which are added to the tf.Graph instance.
In TensorFlow, a tf.Session() object stores the context under which a computation is performed. It is a class for running TensorFlow operations. A Session object encapsulates the environment in which Operation objects are executed, and Tensor objects are evaluated.
Now, let us construct the computational graph in TensorFlow using the example from Part 1 of the post.

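The notebook code appears as an image in the original post; a sketch of the same steps, using the TF1-style API (via tf.compat.v1 in TensorFlow 2.x) and the same stand-in expression e = (a + b) * ln(c * d) assumed earlier, could look like this:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()   # needed in TF 2.x to use placeholders and sessions

# Four placeholders, fed with concrete values at session run time
a = tf.placeholder(tf.float32, name="a")
b = tf.placeholder(tf.float32, name="b")
c = tf.placeholder(tf.float32, name="c")
d = tf.placeholder(tf.float32, name="d")

# Build the graph with the add, log and multiply operations
u = tf.add(a, b, name="u")
v = tf.log(tf.multiply(c, d), name="v")
e = tf.multiply(u, v, name="e")

with tf.Session() as sess:
    grads = tf.gradients(e, [a, b, c, d])   # de/da, de/db, de/dc, de/dd
    feed = {a: 2.0, b: 3.0, c: 4.0, d: 5.0}
    print(sess.run(e, feed_dict=feed))
    print(sess.run(grads, feed_dict=feed))
```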
We start by creating four placeholders. A TensorFlow placeholder is a proxy for a tensor which is fed during session execution. It requires the feed_dict argument to Session.run(), Tensor.eval(), or Operation.run().
Next, we use the TensorFlow operations add, log, and multiply to construct the example computational graph from the defined placeholders.
Once the graph is constructed, the next step is to run it in a Session. Python has a with statement which takes care of opening and closing the Session. In the session scope, we run the tf.gradients function to obtain the required gradients for our example. The output is shown below.

TensorFlow has a utility called TensorBoard that gives you a pictorial representation of the computational graph, along with a lot of visualization functionality. The graph for the previous example is shown below.
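As a hedged sketch (assuming the session-style graph built above), the graph definition can be exported for TensorBoard with a FileWriter:

```python
# Write the graph so it can be inspected by running:
#   tensorboard --logdir ./logs
with tf.Session() as sess:
    writer = tf.summary.FileWriter("./logs", sess.graph)
    writer.close()
```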

As can be seen, the graph is the same as the one we constructed in the example picture. The Jupyter Notebook can be found at: https://github.com/msminhas93/ComputationalGraphs
Next, let us look at the timing comparison between static graph execution and eager execution.
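The notebook's benchmark is not reproduced here; as an assumed, illustrative alternative in TF 2.x, a similar comparison can be made by timing the same computation eagerly and wrapped in tf.function:

```python
import timeit
import tensorflow as tf   # TF 2.x, eager by default
# Note: run this in a fresh runtime, since the earlier sketch disabled eager execution.

x = tf.random.normal((200, 200))

def model(t):
    # a small stand-in workload: a few chained matrix multiplications
    for _ in range(10):
        t = tf.matmul(t, t) / 200.0
    return t

graph_model = tf.function(model)   # traced and compiled into a static graph
graph_model(x)                     # the first call performs the tracing

eager_time = timeit.timeit(lambda: model(x), number=100)
graph_time = timeit.timeit(lambda: graph_model(x), number=100)
print(f"eager: {eager_time:.3f}s  graph: {graph_time:.3f}s")
```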


We can clearly see the performance difference here. The static graph was faster than the dynamic graph for this example.
With this, we reach the end of the "Back-propagation Demystified" series.
Conclusion
The key takeaways are as follows.
- Back-propagation is used for calculating the gradients required for the gradient descent based optimizations for training deep learning networks.
- Calculating an analytical expression for gradients is straightforward but computationally expensive.
- Computational graphs are methods of representing mathematical expressions and in the case of deep learning models, these are like a descriptive language giving the functional description of the required computations.
- Deep learning frameworks such as PyTorch and TensorFlow depend on the creation of these computational graphs to implement the back-propagation algorithm for the defined networks and calculate the gradients.
Finally, here is a comparison of how computational graphs are represented in PyTorch and TensorFlow.
| Aspect | PyTorch | TensorFlow (graph mode) |
| --- | --- | --- |
| Graph type | Dynamic, rebuilt from scratch at every iteration | Static tf.Graph, defined once and then executed |
| How it is built | Recorded implicitly by autograd during the forward pass | Constructed explicitly from tf.Operation nodes and tf.Tensor edges |
| Execution | Imperative; results are available immediately | Run inside a tf.Session (or immediately with eager execution) |
| Gradients | tensor.backward() populates .grad via the grad_fn graph | tf.gradients() adds gradient operations to the graph |
I hope you gained some knowledge from reading this and enjoyed the article. I would love to connect on LinkedIn.
References
[1] http://www.cs.cornell.edu/courses/cs5740/2017sp/lectures/04-nn-compgraph.pdf
[2] https://pytorch.org/tutorials/beginner/examples_autograd/tf_two_layer_net.html
[3] https://www.tensorflow.org/api_docs/python/tf/Graph
[4] https://www.tensorflow.org/guide/intro_to_graphs
[5] https://kth.instructure.com/files/1864796/download?download_frd=1
[6] https://jdhao.github.io/2017/11/12/pytorch-computation-graph/