Let’s talk Dataflow
Google uses TensorFlow in both research and production. Since Google is one of the largest technology companies applying machine learning, it is worth understanding what TensorFlow is and taking at least a mild interest in its new update. TensorFlow is used through an application programming interface (API): a set of routines, protocols, and tools for building software applications. If you are already familiar with TensorFlow or simply want to check the new updates, this text may not be for you; go directly to TensorFlow’s article about the update on Medium. In this article I will attempt to explain and explore dataflow, which is a central concept in TensorFlow.
What is TensorFlow?
TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks.
This short first sentence contains two concepts that may be confusing or need clarification:
- Dataflow
- Differentiable programming
In this article I will be focusing on the first concept.
Dataflow or Datastream Programming
Dataflow programming models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture.
In mathematics, a directed graph (or digraph) is a graph made up of a set of vertices connected by edges, where each edge has a direction associated with it.
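To make the definition concrete, here is a minimal sketch of a directed graph in plain Python. It assumes nothing beyond the standard language; the vertex names and the helper functions `build_digraph` and `predecessors` are made up for illustration and do not come from TensorFlow or any graph library.

```python
# A tiny directed graph stored as an adjacency mapping: vertex -> list of successors.
# The vertex names are hypothetical and only serve the example.
edges = [("a", "add"), ("b", "add"), ("add", "square")]

def build_digraph(edge_list):
    """Return {vertex: [successors]} for a list of (source, target) pairs."""
    graph = {}
    for source, target in edge_list:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])  # make sure vertices without outgoing edges appear too
    return graph

def predecessors(graph, vertex):
    """Return the vertices that have an edge pointing into `vertex`."""
    return [v for v, successors in graph.items() if vertex in successors]

graph = build_digraph(edges)
print(graph["a"])                  # where the edge from "a" leads
print(predecessors(graph, "add"))  # which vertices feed "add"
```

Because every edge has a direction, "a" leads to "add" but "add" does not lead back to "a". That asymmetry is what lets a graph describe data flowing from one operation to the next.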

Flawnson Tong, in his article Everything you Need to Know About Graph Theory for Deep Learning, explains a graph as: "A graph, in the context of graph theory, is a structured datatype that has nodes (entities that hold information) and edges (connections between nodes that can also hold information)." He then goes on to explain why this is important for deep learning. Deep learning is a type of machine learning, which in turn is a subset of artificial intelligence.


Some use the term datastream instead of dataflow. This is to avoid possible confusion with an earlier concept called dataflow architecture, which is based on an indeterministic machine paradigm. Indeterminism is the idea that events are not caused, or not caused deterministically. It is the opposite of determinism and is related to chance. In such a model an outcome occurs only with a certain weighted probability, and a changing bias (input) influences which outcome occurs.


Dataflow principles include dataflow-based schedule representations. A paper called Equivalence between Schedule Representations: Theory and Applications says the following: "A schedule is usually represented as the mapping of a set of jobs to a set of processors; this mapping varies with time." As such, scheduling can be described as an optimisation problem: a choice among different mappings, some optimal and others suboptimal (less desirable).
The Wikipedia description of dataflow programming was so interesting that I decided to quote it directly:
_"Where a sequential program can be imagined as a single worker moving between tasks (operations), a dataflow program is more like a series of workers on an assembly line, each doing a specific task whenever materials are available. Since the operations are only concerned with the availability of data inputs, they have no hidden state to track, and are all "ready" at the same time."_
Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing. In functional programming, programs are treated as a sequence of stateless function evaluations.
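The assembly-line picture quoted above can be sketched in a few lines of plain Python. This is a toy illustration under my own assumptions, not how TensorFlow schedules work: the operation names, the `operations` table and the `run_dataflow` helper are all hypothetical. Each operation is a stateless function that fires as soon as all of its inputs are available.

```python
import operator

# Each operation lists the values it needs and the stateless function that combines them.
# All names here are hypothetical.
operations = {
    "sum":     (("x", "y"), operator.add),
    "product": (("sum", "z"), operator.mul),
    "negated": (("product",), operator.neg),
}

def run_dataflow(operations, inputs):
    """Fire every operation as soon as all of its inputs are available."""
    available = dict(inputs)    # values produced so far
    pending = dict(operations)  # operations that have not fired yet
    firing_order = []
    while pending:
        # An operation is ready when every value it needs already exists.
        ready = [name for name, (needs, _) in pending.items()
                 if all(n in available for n in needs)]
        if not ready:
            raise ValueError("some operations can never receive their inputs")
        for name in ready:
            needs, func = pending.pop(name)
            available[name] = func(*(available[n] for n in needs))
            firing_order.append(name)
    return available, firing_order

values, order = run_dataflow(operations, {"x": 2, "y": 3, "z": 4})
print(order)
print(values["negated"])
```

Note that nothing in the loop depends on the order in which the operations were written down. The firing order follows purely from which data is available, which is the point of the assembly-line analogy.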
As such!
TensorFlow: A machine-learning library based on dataflow programming.
However, it is also a symbolic math library, and it is used for machine learning applications such as neural networks.
I hope this makes you feel slightly happier, although it likely raises more questions than it answers.

TensorFlow 2.0
Google wants to scale machine learning and has released TensorFlow 2.0. It is described as an easy-to-use framework for enterprises and researchers, aimed at scalable ML-powered applications, with a tight integration of Keras into TensorFlow.
Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.
As such, TensorFlow 2.0 prioritises Python developers (Python being a programming language). It also has a more complete low-level API. A low-level interface is the most detailed kind of programming interface: it allows the programmer to manipulate functions within a software module or within hardware at a very granular level. Granular data is detailed data, or the lowest level that data can be in a target set. In a philosophical sense: "…granular computing can describe a way of thinking that relies on the human ability to perceive the real world under various levels of granularity (i.e., abstraction) in order to abstract and consider only those things that serve a specific interest and to switch among different granularities."
The new 2.0 uses the SavedModel format, which can be run on a variety of runtimes (deployment to web and cloud). Runtime is the period during which a program is running: when you start a program on a computer, it is runtime for that program. A runtime, in the sense used here, is the environment that executes it. TensorFlow 2.0 can run in the browser and has multi-GPU support. Multi-GPU (multiple graphics processing units) means using two or more graphics cards to speed up computation.
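In addition to this there are expanded, standardised TensorFlow Datasets. The recommendation is to write regular Python with eager execution. I have written about eager execution previously. In TensorFlow, eager execution means that operations are evaluated immediately as they are called from Python and return concrete values, rather than first being added to a computational graph that is run later. This differs from the hardware sense of the term, where eager execution is a form of speculative execution in which both sides of a conditional branch are executed and the results are committed only if the predicate is true.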
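In the TensorFlow sense of the term, eager execution means each operation runs immediately when it is called, whereas graph-style execution first describes the computation and runs it later. The difference can be illustrated with a toy sketch in plain Python. This is deliberately not TensorFlow code: the `Deferred` class and the `add` and `mul` helpers are hypothetical names invented for the example.

```python
class Deferred:
    """A node in a toy computation graph: it records an operation but computes nothing yet."""

    def __init__(self, func, *parents):
        self.func = func
        self.parents = parents

    def run(self):
        # Evaluate any parent nodes first, then apply this node's function.
        args = [p.run() if isinstance(p, Deferred) else p for p in self.parents]
        return self.func(*args)

def add(a, b):
    return a + b

def mul(a, b):
    return a * b

# Eager style: every call returns a concrete number right away.
eager_result = mul(add(2, 3), 4)

# Deferred (graph) style: first describe the computation, then run it explicitly.
graph = Deferred(mul, Deferred(add, 2, 3), 4)
deferred_result = graph.run()

print(eager_result, deferred_result)
```

Both styles compute the same value. The eager version is easier to debug because intermediate results are ordinary Python values, while the deferred version keeps the whole computation available as a graph before anything runs.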
This is day 125 of #500daysofAI. If you enjoy this article please give me a response as I do want to improve my writing or discover new research, companies and projects.