A beginner introduction to TensorFlow (Part-1)

Narasimha Prasanna HN
Towards Data Science
7 min read · Oct 28, 2017


TensorFlow is one of the most widely used libraries for implementing machine learning and other algorithms that involve a large number of mathematical operations. It was developed by Google and is one of the most popular machine learning libraries on GitHub. Google uses TensorFlow in almost all of its applications: if you use Google Photos or Google voice search, you are indirectly using TensorFlow models. They run on large clusters of Google hardware and are powerful at perceptual tasks.

The main aim of this article is to provide a beginner-friendly introduction to TensorFlow; I assume that you already know a bit of Python. The core components of TensorFlow are the computational graph and Tensors, which flow between the nodes of that graph along its edges. Let's have a brief introduction to each of them.

Tensors:

Mathematically, a Tensor is an N-dimensional array of values, which means a Tensor can be used to represent N-dimensional datasets. The figure above is complex to understand, so we'll look at a simplified version.

The above figure shows some simplified Tensors with minimal dimensions. As the number of dimensions increases, the data representation becomes more and more complex. For example, if we take a Tensor of shape (3x3), I can simply call it a matrix of 3 rows and 3 columns. If I select another Tensor of shape (1000x3x3), I can call it a set of 1000 3x3 matrices. Here we call (1000x3x3) the shape or dimension of the resulting Tensor. Tensors can be either constants or variables.
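To make this concrete, here is a minimal sketch of Tensors of different shapes in TensorFlow's Python API (the values are arbitrary and chosen only for illustration):

```python
import tensorflow as tf

# Tensors of increasing rank; the values are arbitrary examples.
scalar = tf.constant(3.0)                  # shape ()          : a single value
vector = tf.constant([1.0, 2.0, 3.0])      # shape (3,)        : a 1-D Tensor
matrix = tf.constant([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]])          # shape (3, 3)      : a 3x3 matrix
batch  = tf.zeros([1000, 3, 3])            # shape (1000, 3, 3): 1000 3x3 matrices

# Tensors can also be variables, whose values can change (e.g. during training).
weights = tf.Variable(tf.zeros([3, 3]))

print(matrix.shape, batch.shape)           # (3, 3) (1000, 3, 3)
```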

Computational graphs (flow):

Now that we understand what a Tensor really means, it's time to understand Flow. Flow refers to a computational graph, or simply a graph. The graph can never be cyclic; each node in the graph represents an operation such as addition or subtraction, and each operation results in the formation of a new Tensor.

The figure above shows a simple computational graph. The computational graph has the following properties:

The expression for the above graph:

e = (a+b)x(b+1)

  • Leaf vertices, or start vertices, are always Tensors. An operation can never occur at the beginning of the graph, so we can infer that each operation in the graph should accept a Tensor and produce a new Tensor. In the same way, a Tensor can never appear as a non-leaf node; Tensors are always supplied as inputs to operations.
  • A computational graph always represents a complex operation in a hierarchical order. The above expression can be organized hierarchically by representing a+b as c and b+1 as d. Therefore we can write e as:

e = (c)x(d) where c = a+b and d = b+1.

  • Traversing the graph in reverse order results in the formation of sub-expressions, which are combined to form the final expression.
  • When we traverse in the forward direction, the vertex we encounter always becomes a dependency for the next vertex. For example, c cannot be obtained without a and b; in the same way, e cannot be obtained without solving for c and d.
  • Operations in nodes at the same level are independent of each other. This is one of the most important properties of a computational graph: when we construct a graph in the way shown in the figure, nodes at the same level, for example c and d, are independent of each other, meaning there is no need to know c before evaluating d. Therefore they can be executed in parallel. (A minimal code sketch of this graph follows the list.)
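
As a rough sketch of how this graph could be built in code, using the TensorFlow 1.x API that was current at the time of writing (the names a, b, c, d, and e simply mirror the figure):

```python
import tensorflow as tf

# Leaf vertices are Tensors; here they are placeholders fed at run time.
a = tf.placeholder(tf.float32, name="a")
b = tf.placeholder(tf.float32, name="b")

c = tf.add(a, b, name="c")        # c = a + b
d = tf.add(b, 1.0, name="d")      # d = b + 1
e = tf.multiply(c, d, name="e")   # e = c * d

# Running the graph in a Session (Sessions are explained in more detail later).
with tf.Session() as sess:
    print(sess.run(e, feed_dict={a: 2.0, b: 3.0}))   # (2+3) * (3+1) = 20.0
```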

Parallelism in computational graphs:

The last property mentioned above is of course one of the most important: it says that nodes at the same level are independent, so there is no need to sit idle until c is evaluated; you can compute d in parallel while c is still being evaluated. TensorFlow greatly exploits this property.

Distributed Execution:

TensorFlow allows users to make use of parallel computing devices to perform operations faster. The nodes or operations of a computational graph are automatically scheduled for parallel computing. This all happens internally; for example, in the above graph, operation c can be scheduled on the CPU and operation d on the GPU. The figure below shows two perspectives of distributed execution:

The first is single-system distributed execution, where a single TensorFlow session (explained later) creates a single worker, and that worker is responsible for scheduling tasks on the various devices. In the second case there are multiple workers; they can be on the same machine or on different machines, and each worker runs in its own context. In the above figure, worker process 1 runs on a separate machine and schedules operations on all the devices available to it.
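
As a small illustration of the single-worker case, the 1.x API lets you ask a Session to log which device each operation was placed on. This is only a sketch of how to observe the scheduling, not of the scheduling itself; multi-machine setups additionally use tf.train.ClusterSpec and tf.train.Server, which are beyond the scope of this sketch.

```python
import tensorflow as tf

a = tf.constant([1.0, 2.0], name="a")
b = tf.constant([3.0, 4.0], name="b")
c = tf.add(a, b, name="c")

# Ask TensorFlow to log, for every operation, the device the worker placed it on.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))   # placement lines such as ".../device:GPU:0" appear in the console
```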

Computational Subgraphs:

Subgraphs are parts of the main graph and are themselves computational graphs by nature. For example, in the above graph we can obtain many subgraphs; one of them is shown below.

The graph above is a part of the main graph. From property 2 we can say that a subgraph always represents a sub-expression, as c is a sub-expression of e. Subgraphs also satisfy the last property: subgraphs at the same level are independent of each other and can be executed in parallel. Therefore it is possible to schedule an entire subgraph on a single device.

The figure above illustrates the parallel execution of subgraphs. Here there are two matrix multiplication operations; since both of them are at the same level, they are independent of each other, consistent with the last property. Because of this independence, the nodes can be scheduled on different devices, gpu_0 and gpu_1.
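
A minimal sketch of that idea, assuming a machine with two GPUs (the device names and matrix sizes are made up for illustration):

```python
import tensorflow as tf

# Two independent matrix multiplications, each pinned to a different GPU.
with tf.device('/device:GPU:0'):
    m1 = tf.matmul(tf.random_normal([100, 100]), tf.random_normal([100, 100]))

with tf.device('/device:GPU:1'):
    m2 = tf.matmul(tf.random_normal([100, 100]), tf.random_normal([100, 100]))

total = tf.add(m1, m2)    # depends on both subgraphs, so it runs after them

with tf.Session() as sess:
    sess.run(total)       # m1 and m2 can be evaluated in parallel
```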

Exchanging data between workers:

Now we know that TensorFlow distributes all of its operations across different devices governed by workers. It is common for data, in the form of Tensors, to be exchanged between workers; for example, in the graph of e = (c)*(d), once c is calculated it needs to be passed on to the node that computes e, so the Tensor flows upward from node to node. This movement is shown in the figure:

Here a Tensor from device A has been passed on to device B. This induces performance delays in a distributed system. The delay depends on an important property: the size of the Tensor. Device B stays idle until it receives its input from device A.

Need for compression:

Well, it’s obvious that in computational graphs, Tensors flow between nodes. It’s important to reduce the delays caused by the flow before it reaches the node where it can be processed. One such idea of reducing the size is by using Lossy compression.

The data type of a Tensor plays a major role here; let's understand why. We usually go for higher degrees of precision in machine learning operations. For example, if we use float32 as the data type of a Tensor, then each value is represented using a 32-bit floating-point number, so each value occupies 32 bits; the same logic applies to 64-bit values. Consider a Tensor of shape (1000, 440, 440, 3): the number of values it contains is 1000*440*440*3, and if the data type is 32-bit, the total size in bits is 32 times this big number. It occupies a significant amount of memory and thus imposes delays on the flow. Compression techniques can be used to reduce the size.
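
A quick back-of-the-envelope calculation in plain Python, just to make the numbers above concrete:

```python
num_values = 1000 * 440 * 440 * 3          # values in a (1000, 440, 440, 3) Tensor
print(num_values)                          # 580,800,000 values

bytes_float32 = num_values * 4             # 32 bits = 4 bytes per value
bytes_float16 = num_values * 2             # 16 bits = 2 bytes per value
print(bytes_float32 / 1024 ** 2, "MB")     # about 2,216 MB
print(bytes_float16 / 1024 ** 2, "MB")     # about 1,108 MB
```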

Lossy compression:

Lossy compression reduces the size of the data without preserving its exact value, meaning the value may become less accurate during compression. But if we have a 32-bit floating-point number like 1.01010e-12, little importance needs to be given to the least significant digits; changing or removing them will not make much difference to our calculation. So TensorFlow automatically converts 32-bit floating-point numbers to a 16-bit representation by dropping the bits that are negligible. This reduces the size by almost half; if it's a 64-bit number, compressing it to 16 bits reduces the size by almost 75%. Thus the space occupied by Tensors can be minimized.

Once the Tensor reaches the receiving node, the 16-bit representation can be brought back to an approximation of its original form simply by appending 0s. Thus a 32- or 64-bit representation is recovered after it reaches the node for processing.
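
As a rough illustration of this "truncate, then append zeros" idea (using NumPy purely for demonstration; this is not TensorFlow's actual internal code):

```python
import numpy as np

x = np.array([1.01010e-12], dtype=np.float32)

# "Compression": reinterpret the float32 bits and keep only the top 16 bits
# (sign, exponent, and the most significant part of the mantissa).
bits32 = x.view(np.uint32)
bits16 = (bits32 >> 16).astype(np.uint16)

# "Decompression": append 16 zero bits and reinterpret as float32 again.
restored = (bits16.astype(np.uint32) << 16).view(np.float32)

print(x[0], restored[0])   # the restored value is close to, but not exactly, the original
```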

This ends Part 1 of the TensorFlow introduction; programming and constructing simple subgraphs will be explained in the next part.

Thanks ☺
