
Building an efficient input pipeline is an important performance optimization for training deep neural networks. Beyond that, data provisioning needs to be well structured and transparent so it can be ruled out as a source of errors in your training. While a lot of current development happens on PyTorch, TensorFlow is still the way to go if you plan to deploy to edge devices or if you want to run on giant training clusters with terabytes of data. This is where the tf.data API with tf.data.Dataset comes in: an efficient pipeline that provides you with training data and is versatile enough to scale up to data-center dimensions. Still, using it with your own data can be frustrating, as you might hit some edges of the existing tutorials (I hit a lot of them).
This is my advice for you, and my biggest learning out of it: do not try to take any shortcut. Use the full-blown pipeline as it is meant to be used, and things will be incredibly easy to use and to understand. That's why I wrote this tutorial for you: one end-to-end example which is simple at its core but uses most of the concepts of the tf.data API (without the shortcut of writing files with special filenames into a directory structure named cats/dogs).
We use the following:
- 3000 images
- each image contains an object (a dot) of one of three colors
- each dot is placed on a random position on that image
Of course we want to predict where the dot is and which color it has, based on the given images. A simple task, but can the labels be represented as a directory structure? I see no way. Do I want to save the labels in a CSV file, to be matched back to images via a UUID filename? I tried that, and it was no fun.
I want my data together as Dataset records storing image data and label information and I want these streamed into my model for training.
Read on if you like that idea too.
In this article I mostly concentrate on the relevant code to make the pipeline based on tf.data work. You will get the full boiler-plate code in the connected notebook: https://gist.github.com/FHermisch/1a517121ecb11d0e0206226ac69915ee
Create images
To ‘simulate’ large, complicated setups for generating test data (e.g. data augmentation, mechanical turks, etc.), I chose to generate images with some random data and use these. So there will be no ‘ready to use’ set loaded from some internet source. We create the simplest possible images for our training and validation data. What matters for us is that the data has a structural complexity comparable to custom image classification and object detection tasks. As said before, our task is to detect the position of an object within an image and the color of that object.
We could use just a plain black background. Or, to have something more visually appealing, use a NASA image, as these are often quite impressive and mostly public domain.

Datatype/shape of base image, web base image:
uint8 / (112, 112, 3) , uint8 / (112, 112, 3)
We continue with the NASA image. Note that both images are 112x112 with RGB color channels in ‘channels last’ order (TensorFlow style).
Now, let's place some random things on that base. We build a function which places an object of a given color on the image and returns the position where the object was placed. This is our simple solution to generate some ‘object detection like’ images.
In the ‘placeobject’ function we initialize the object to be placed on the image:
- build a numpy array of ones in the size the object should have
- multiply by color value to convert every pixel into the desired color
For placing the object on the image, we choose a random relative y and x position. Now, we can calculate the absolute pixel position and copy the object data into the base image.
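The full function is in the linked notebook; a minimal sketch of the idea (OBJSIZE and the exact signature are my own assumptions) could look like this:
import numpy as np

OBJSIZE = 8  # assumed object size in pixels; the real value is in the notebook

def placeobject(baseimg, colorval):
    """Place a small square of the given color on baseimg and return its relative position."""
    # build an array of ones and multiply by the color to get a uniformly colored object
    obj = np.ones((OBJSIZE, OBJSIZE, 3), dtype=np.uint8) * np.array(colorval, dtype=np.uint8)
    # pick a random relative position and convert it to absolute pixel coordinates
    posy = np.random.uniform()
    posx = np.random.uniform()
    y = int(posy * (baseimg.shape[0] - OBJSIZE))
    x = int(posx * (baseimg.shape[1] - OBJSIZE))
    # copy the object data into the base image
    baseimg[y:y + OBJSIZE, x:x + OBJSIZE, :] = obj
    return posy, posx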
Let's have a look: there is now an object on our base image, and the printed position matches the object's position in the image.

Position 0.9109465116914442 0.13220923689802044
We now have a way to place objects with a certain color on our base image, and we know exactly the position we placed each object at. Position and color will be our label/ground-truth for the later detection. The image containing the object will be our training data. Let's automate this by choosing the color randomly and generating masses of these images and labels.
Let's generate 5 images first and print out the labels.
Generated data (112, 112, 3) 0.8395090863371965 0.9547828984929204 ObjColorTypes.SPECIAL
Generated data (112, 112, 3) 0.5531254931489215 0.4768844126516376 ObjColorTypes.GREEN
Generated data (112, 112, 3) 0.47239734539676304 0.23156864975331592 ObjColorTypes.RED
Generated data (112, 112, 3) 0.539600313491926 0.14757914149460205 ObjColorTypes.SPECIAL
Generated data (112, 112, 3) 0.6978451492963156 0.5689848230831969 ObjColorTypes.RED
We should also have a visual look at the data. Later, we want to train an AI to learn something from these images, so be nice to the AI and have a look yourself before you feed it in: are you able to see what you want the AI to see?!
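If you kept the generated images and labels around, a quick matplotlib check could look like this (the variable names and the label layout are just illustrative):
import matplotlib.pyplot as plt

# 'images' and 'labels' stand for whatever the generation step above collected
fig, axes = plt.subplots(1, len(images), figsize=(15, 3))
for ax, img, (posy, posx, coltype) in zip(axes, images, labels):
    ax.imshow(img)  # can you spot the object yourself?
    ax.set_title(f"{posy:.2f}/{posx:.2f} {coltype.name}")
    ax.axis('off')
plt.show()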

Write TFRecord data
First, we set up a class which contains everything needed to write these records. Pardon me: I was raised with OO paradigms, and having classes and instantiated objects often just feels natural to me. You can use functions with partial parameters or anything else here too.
import random

import numpy as np
import tensorflow as tf


class QTFRec():
    def __init__(self, fname):
        self.fname = fname
        # the writer streams serialized examples into a single TFRecord file
        self.tfwriter = tf.io.TFRecordWriter(self.fname)

    def _bytes_feature(self, nparr):
        return tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[nparr.tobytes()]))

    def _float_feature(self, nparr):
        return tf.train.Feature(
            float_list=tf.train.FloatList(value=nparr))

    def write_record(self, image, poslabel, collabel):
        # each example stores the flattened image, its shape, and both labels
        feature = {
            'image_raw': self._float_feature(
                image.ravel()),
            'img_shape': self._bytes_feature(
                np.array(image.shape, dtype=np.uint8).ravel()),
            'poslabel': self._float_feature(
                poslabel.ravel()),
            'collabel': self._float_feature(
                collabel.ravel())
        }
        tf_example = tf.train.Example(
            features=tf.train.Features(feature=feature))
        self.tfwriter.write(
            tf_example.SerializeToString())

    def close_record(self):
        self.tfwriter.flush()
        self.tfwriter.close()
In our class, we construct a TFRecord writer which will be used to write the data to disk in the TFRecord format. TFRecord is a way to store data examples sequentially, where each example consists of a bunch of features. We define our features within the ‘write_record’ function as a dictionary. In this case, we have the image data, the position of the object on the image, the color of the object, and we also want to store the shape of the image data. TFRecord allows certain datatypes to be selected for a feature; we go with bytes_list and float_list for our features.
Now, we put our actual data into the features: we start with data in numpy arrays, which is just super convenient. We flatten them (‘.ravel()’) and pass them to the respective feature constructors. You may wonder why we store the image data as floats? This is a design choice (oops! read later about the effects of this design choice), and it means we already store the image data with color values in the 0<=val<=1 range, so we can later feed it directly to the training. You will see that there are a couple of places suitable for data conversions: if you save it here as uint8, you can convert it later in the feeding pipeline. The last thing we need is a way to close the writer to make sure everything is written to disk; the close_record method does this (little add-on: you might change this to work in conjunction with the Python ‘with’ statement).
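As a side note: if you would rather not inflate the size on disk (more on that below), one option is to store the raw uint8 bytes instead and do the float conversion in the parsing step. A hedged sketch with my own helper names:
import numpy as np
import tensorflow as tf

def encode_image_uint8(image):
    # store the image as raw uint8 bytes; assumes 'image' holds floats in [0, 1]
    arr = (image * 255).astype(np.uint8)
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[arr.tobytes()]))

def decode_image_uint8(raw_bytes, shape):
    # decode the bytes again and rescale to [0, 1] floats inside the pipeline
    img = tf.io.decode_raw(raw_bytes, tf.uint8)
    img = tf.cast(img, tf.float32) / 255.0
    return tf.reshape(img, shape)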
That's it. One more thing we will come back to later: we currently do not write validation data separately from the training data. One might expect a simple ‘split_dataset’ function to use later, but there is none for Datasets. This is understandable, as tf.data is built to work with billions of records, and one does not simply split billions of records in a certain way. We will later extend our class to actually write two sets of records. But let's continue with the training data first…
We create an instance of QTFRec and build another small class to encapsulate it, providing a function that matches the callback signature of the data generation. OK, this works. Now we can generate a reasonable amount of records for our training set.
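The TFRsaver wrapper itself lives in the notebook; a minimal sketch of what such an adapter might look like, assuming the generator calls back with the image, the two relative positions, and the color type:
class TFRsaver():
    """Adapts the data-generation callback to QTFRec.write_record."""
    def __init__(self, qtfrec):
        self.qtfrec = qtfrec

    def savedata(self, image, posy, posx, coltype):
        # scale colors to [0, 1] floats and pack labels as numpy arrays;
        # assumes the ObjColorTypes enum values are 0, 1, 2
        self.qtfrec.write_record(
            image.astype(np.float32) / 255.0,
            np.array([posy, posx], dtype=np.float32),
            np.array([coltype.value], dtype=np.float32))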
qtfr = QTFRec(fname)
tfrsaver = TFRsaver( qtfr)
generatedata(baseimg, 3000, tfrsaver.savedata)
qtfr.close_record()
Let's use this dataset to set up an input pipeline to train a model. The next steps are quite comparable to tutorials using tfds, e.g. MNIST from tfds. We will focus on the parts that take some conversion logic to adapt our data to the model's training needs (we could also have written the data in a better-suited way up front, but let's do it this way to show you the places where you can put custom conversions for your data and pipeline).
Build the pipeline
Opening the written dataset again is easy.
tfrds = tf.data.TFRecordDataset(TRAINSET_FNAME)
We can go with the online documentation, but I like the convenient style of using the built-in ‘help(tfrds)’ to see what type I got and what functions it offers.
As expected, a TFRecordDatasetV2. Notable functions:
- ‘apply’ for mapping a transformation function onto it and building a pipeline of transformations: looks good, we will use this later
- ‘as_numpy_iterator’: sounds handy for investigating the structure and the contents
- other pipeline functions: batch, shuffle, etc.
- ‘take’ takes a number of elements out of the pipeline
Let's play around to see what happens:
for npelem in tfrds.as_numpy_iterator():
    print(type(npelem))
    barr = np.frombuffer(npelem, dtype=np.byte)
    print(barr.shape)
    break
Let's see what it prints out.
<class 'bytes'>
(150629,)
We got a numpy shape of 150629. Shouldn't this be around 112x112x3 = 37632? Wait, what happened? OK, we stored the image data as floats (out of convenience) and therefore blew every color value up from one byte (uint8) to four bytes (float32): 37632 values x 4 bytes = 150528 bytes, plus a bit of overhead for the other features. We should really change this, so always have a look at your data. Thinking about it, it is crystal clear, but I missed it. I leave it here for you as an example. There are much better ways to waste your disk and IO than storing four times the size just out of convenience.
Let's continue. We got a tensor from the dataset, and we have options to map different transformations onto that set. We build a DataRead class which acts as the counterpart of our writer class from above.
class DataRead():
    def __init__(self):
        self.feature_description = {
            'image_raw':
                tf.io.VarLenFeature(dtype=tf.float32),
            'img_shape':
                tf.io.FixedLenFeature([], tf.string),
            'poslabel':
                tf.io.VarLenFeature(dtype=tf.float32),
            'collabel':
                tf.io.VarLenFeature(dtype=tf.float32)
        }

    def prepdata(self, fmap):
        pmap = tf.io.parse_single_example(
            fmap, self.feature_description)
        imgraw = tf.sparse.to_dense(pmap['image_raw'])
        imshape = tf.io.decode_raw(pmap['img_shape'], tf.uint8)
        poslabel = tf.sparse.to_dense(pmap['poslabel'])
        # color index -> one-hot vector of length 3
        collabel = tf.one_hot(tf.cast(
            tf.sparse.to_dense(pmap['collabel']), tf.uint8), tf.constant(3))[0]
        # restore the original image shape and concat position + color into one label
        return (tf.reshape(imgraw, tf.cast(imshape, tf.int32)),
                tf.concat([poslabel, collabel], axis=-1))
We first have to parse the tensor, as everything inside is just bytes. Therefore, we set up a feature_description dictionary for the different elements. Our ‘prepdata’ function will later be mapped onto the pipeline. We parse a single entry to be able to access every feature within a record as specified. This is a good point to put additional transformation code for the data. We have two transformations to do:
- reshape the raw image data back into something structured: we decode the shape first and use it to reshape the image data back into its original 112x112x3 shape
- put the label data together in one tensor: we concat the position label with the color type, which we convert to a one-hot representation first
Now we have nice image data shaped 112x112x3 to feed into training and a 5-value label as the target/ground-truth.
The input pipeline just chains a bunch of transformations together. Map the parsing function we just built. Add caching. Shuffle the data after each full iteration. Form batches out of the single piped items. Prefetch so batches are ready as soon as the training needs them.
datar = DataRead()
traindat = tfrds.map(
    datar.prepdata,
    num_parallel_calls=tf.data.experimental.AUTOTUNE)
traindat = traindat.cache()
traindat = traindat.shuffle(
    1000, seed=1234, reshuffle_each_iteration=True)
traindat = traindat.batch(
    BATCHSIZE, drop_remainder=True)
traindat = traindat.prefetch(
    tf.data.experimental.AUTOTUNE)
As the result is still a Dataset, we use the as_numpy_iterator function again. Now, data pops out in our converted formats and we can easily visualize the image data and labels.

[0.602948 0.2850269 0. 0. 1. ]
A VERY simple object detection
This article is not focused on how to do object detection, so we will not go into detail for the next steps: set up a model with a few convolutions and some fully connected layers at the end. The output is just a sigmoid which will be trained to match our labels (this is VERY basic but works for this extremely simplified example).
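A hedged sketch of such a model; the pool size, padding, and dropout rate are my assumptions, chosen so the shapes and parameter counts match the summary below:
import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(shape=(112, 112, 3))
x = layers.Conv2D(16, 3, padding='same')(inp)
x = layers.ReLU()(x)
x = layers.MaxPooling2D(pool_size=3, padding='same')(x)
x = layers.Conv2D(32, 3, padding='same')(x)
x = layers.ReLU()(x)
x = layers.MaxPooling2D(pool_size=3, padding='same')(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(128)(x)
x = layers.ReLU()(x)
x = layers.Dense(64)(x)
x = layers.ReLU()(x)
x = layers.BatchNormalization()(x)
out = layers.Dense(5, activation='sigmoid')(x)  # position (2) + one-hot color (3)
model = tf.keras.Model(inp, out)
model.summary()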
Model: "functional_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 112, 112, 3)] 0
_________________________________________________________________
conv2d (Conv2D) (None, 112, 112, 16) 448
_________________________________________________________________
re_lu (ReLU) (None, 112, 112, 16) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 38, 38, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 38, 38, 32) 4640
_________________________________________________________________
re_lu_1 (ReLU) (None, 38, 38, 32) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32) 0
_________________________________________________________________
flatten (Flatten) (None, 5408) 0
_________________________________________________________________
dropout (Dropout) (None, 5408) 0
_________________________________________________________________
dense (Dense) (None, 128) 692352
_________________________________________________________________
re_lu_2 (ReLU) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 64) 8256
_________________________________________________________________
re_lu_3 (ReLU) (None, 64) 0
_________________________________________________________________
batch_normalization (BatchNo (None, 64) 256
_________________________________________________________________
dense_2 (Dense) (None, 5) 325
=================================================================
Total params: 706,277
Trainable params: 706,149
Non-trainable params: 128
_________________________________________________________________
Compile it with SGD as optimizer and MeanSquaredError as loss.
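In code, a minimal sketch of the compile and fit calls (no validation data yet):
model.compile(
    optimizer=tf.keras.optimizers.SGD(),
    loss=tf.keras.losses.MeanSquaredError())
history = model.fit(traindat, epochs=10)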
Run it and … oh wait … we have no validation data!
Epoch 1/10
46/46 [==============================] - 1s 15ms/step - loss: 0.1976
Epoch 2/10
46/46 [==============================] - 1s 15ms/step - loss: 0.1801
Epoch 3/10
46/46 [==============================] - 1s 14ms/step - loss: 0.1655
...
We do need a validation set to get any meaningful insights. Let's change our code for writing the data. We want to set aside a certain percentage of our images for validation and save these to another Dataset. We need to add code to initialize (and later close) an additional writer. Within ‘write_record’, we add a random step which draws a uniform random number between 0 and 1 and routes the generated data either to training or to validation, depending on the comparison with the provided validation split percentage.
Here we split the data randomly, but this would also be the spot for some more ‘intelligent’ logic, e.g. ensuring in a face detection scenario that the validation set only contains people who are not present in the training set at all. This splitting logic should live close to the generation of the data; it cannot be done later or during training.
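The extended writer (QTFRecVal in the notebook) is not printed here in full; a hedged sketch of the routing logic, reusing the QTFRec class from above, might look like this:
import random

class QTFRecVal(QTFRec):
    """QTFRec plus a second writer and a random train/validation split."""
    def __init__(self, fname, valsplit, fnameval):
        super().__init__(fname)
        self.valsplit = valsplit
        self.valwriter = tf.io.TFRecordWriter(fnameval)

    def write_record(self, image, poslabel, collabel):
        feature = {
            'image_raw': self._float_feature(image.ravel()),
            'img_shape': self._bytes_feature(
                np.array(image.shape, dtype=np.uint8).ravel()),
            'poslabel': self._float_feature(poslabel.ravel()),
            'collabel': self._float_feature(collabel.ravel())
        }
        tf_example = tf.train.Example(
            features=tf.train.Features(feature=feature))
        # roughly 'valsplit' of the examples go to the validation file
        if random.uniform(0.0, 1.0) < self.valsplit:
            self.valwriter.write(tf_example.SerializeToString())
        else:
            self.tfwriter.write(tf_example.SerializeToString())

    def close_record(self):
        super().close_record()
        self.valwriter.flush()
        self.valwriter.close()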
We run again and provide filenames for train and validation as well as a split percentage of 20%.
qtfrec = QTFRecVal(FNAME, 0.2, FNAMEVAL)
tfrsaver = TFRsaver( qtfrec)
generatedata(baseimg, 3000, tfrsaver.savedata)
qtfrec.close_record()
Run it to generate 3000 images (20% will go into the validation set).
Build a second pipeline for the validation data (we don't have to shuffle the validation set).
tfvalrds = tf.data.TFRecordDataset(FNAMEVAL)
valdat = tfvalrds.map(
datar.prepdata, num_parallel_calls=tf.data.experimental.AUTOTUNE)
valdat = valdat.cache()
valdat = valdat.batch(BATCHSIZE, drop_remainder=True)
valdat = valdat.prefetch( tf.data.experimental.AUTOTUNE)
Generate, compile and run for 100 epochs.
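Assuming the model from above has been rebuilt, this is again just a minimal sketch of the calls, now with the validation pipeline attached:
model.compile(
    optimizer=tf.keras.optimizers.SGD(),
    loss=tf.keras.losses.MeanSquaredError())
history = model.fit(traindat, validation_data=valdat, epochs=100)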
...
Epoch 98/100
38/38 [==============================] - 1s 19ms/step - loss: 0.0020 - val_loss: 8.0567e-04
Epoch 99/100
38/38 [==============================] - 1s 20ms/step - loss: 0.0022 - val_loss: 8.2248e-04
Epoch 100/100
38/38 [==============================] - 1s 18ms/step - loss: 0.0021 - val_loss: 7.9342e-04
See how the losses have evolved. A word on accuracy here: we cannot use the out-of-the-box accuracy functions, as they simply would not represent what we did. If you want some accuracy measure, you have to provide your own function: e.g. check whether the correct color value was predicted and whether the Euclidean distance between the target position and the predicted position is below a certain threshold.
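Such a metric is not part of the notebook; a hedged sketch of what it could look like (the threshold is arbitrary):
def detection_accuracy(y_true, y_pred, dist_threshold=0.05):
    # label layout: [pos_y, pos_x, one-hot color (3)]
    color_ok = tf.equal(tf.argmax(y_true[:, 2:], axis=-1),
                        tf.argmax(y_pred[:, 2:], axis=-1))
    # Euclidean distance between true and predicted relative positions
    dist = tf.norm(y_true[:, :2] - y_pred[:, :2], axis=-1)
    pos_ok = dist < dist_threshold
    correct = tf.logical_and(color_ok, pos_ok)
    return tf.reduce_mean(tf.cast(correct, tf.float32))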

Does it predict?
I would say yes, mostly. The color value is predicted very well, but I expected the position detection to perform better… The good thing: you now have everything to build your own custom examples with tf.data pipelines for your detections.
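To produce a comparison like the one below, one can pull a single batch from the validation pipeline and run it through the model (a sketch):
for images, labels in valdat.take(1):
    preds = model.predict(images)
    print('Groundtruth label:', labels[0].numpy())
    print('Prediction from model:', preds[0])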

Groundtruth label: [0.5074 0.7342 0. 0. 1. ]
Prediction from model: [0.5104 0.7335 0.0157 0.0145 0.9913]
The full notebook is available as a GIST:
https://gist.github.com/FHermisch/1a517121ecb11d0e0206226ac69915ee