Create TFRecords Dataset and use it to train an ML model

Useful for anyone working with sequential learning models

Hi Geeks,

Hope you are well and safe.

In this story, you will learn:

  1. What are TFRecords?
  2. How to save data as TFRecord files?
  3. How to extract TFRecord data?
  4. How to use a dataset from a TFRecord file for training a model?

PS: If you are here just to get the code, take it from here and enjoy!

What are TFRecords?

TFRecord is a TensorFlow format for storing a sequence of binary records. Besides sequential data, TFRecord can also store images and 1D vectors. In this article we will see how to store and read data of the following types: (i) integers (int64, uint8, etc.), (ii) floats, (iii) strings, and (iv) images. TFRecord files can only be read and written sequentially, so they are generally used with sequential models such as RNNs and LSTMs. That does not mean they are limited to sequential learning, though.

How to save data as TFRecord files?

To store any data in a TFRecord file, we first need to create TensorFlow examples. These can be created with the tf.train.Example class, which builds an example object containing a set of named features. The code is as follows:

import tensorflow as tf
example = tf.train.Example(features=tf.train.Features(feature={}))  # named features go inside the dict

What should go in the features?

These features hold our images (or other data) along with the filename of each item. For supervised learning, the label corresponding to each image is stored in the features as well. Each value is wrapped in a tf.train.Feature and placed in a dictionary keyed by feature name.

NOTE: the images and their corresponding labels are saved in byte format.
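As a minimal sketch of what such a feature dictionary might look like: the helper names (bytes_feature, int64_feature, float_feature, make_example) and the dummy 8x8 image are assumptions for illustration, and the keys follow the (filename, image, label, image_shape) fields used later in this story.

```python
import numpy as np
import tensorflow as tf


def bytes_feature(value):
    """Wrap a bytes object (raw image data, encoded string) as a feature."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def int64_feature(values):
    """Wrap a sequence of integers as a feature."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))


def float_feature(values):
    """Wrap a sequence of floats as a feature (not used below, shown for completeness)."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))


def make_example(filename, image, label):
    """Build a tf.train.Example; image and label are stored as raw bytes."""
    feature = {
        "filename": bytes_feature(filename.encode("utf-8")),
        "image": bytes_feature(image.tobytes()),
        "label": bytes_feature(label.tobytes()),
        "image_shape": int64_feature(image.shape),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# Hypothetical data: a blank 8x8 RGB image with a single-byte label.
image = np.zeros((8, 8, 3), dtype=np.uint8)
label = np.array([1], dtype=np.uint8)
example = make_example("img_0.png", image, label)
```

Storing the image shape next to the raw bytes is a design choice of this sketch: the byte string alone does not remember its dimensions, so the shape is needed to rebuild the array when reading.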

Once we have created an example for an image, we need to write it to a TFRecord file. This is done with a TFRecord writer. In the code below, tfrecord_file_name is the name of the TFRecord file in which we want to store the images. TensorFlow creates this file automatically.

# tf.python_io was removed in TensorFlow 2; tf.io.TFRecordWriter is the current name.
with tf.io.TFRecordWriter(tfrecord_file_name) as writer:
    writer.write(example.SerializeToString())

Code for storing images
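Putting the pieces together, a rough sketch of storing several images in one file could look like this. The function names (image_example, write_images), the random 8x8 images, and the temporary output path are all assumptions made to keep the sketch self-contained.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def image_example(filename, image, label):
    """Build one tf.train.Example; the feature keys are illustrative."""
    def _bytes(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    return tf.train.Example(features=tf.train.Features(feature={
        "filename": _bytes(filename.encode("utf-8")),
        "image": _bytes(image.tobytes()),
        "label": _bytes(label.tobytes()),
        "image_shape": tf.train.Feature(
            int64_list=tf.train.Int64List(value=list(image.shape))),
    }))


def write_images(tfrecord_file_name, samples):
    """Serialize every (filename, image, label) triple into one TFRecord file."""
    with tf.io.TFRecordWriter(tfrecord_file_name) as writer:
        for filename, image, label in samples:
            example = image_example(filename, image, label)
            writer.write(example.SerializeToString())


# Three random 8x8 RGB images stand in for a real image folder.
rng = np.random.default_rng(0)
samples = [
    (f"img_{i}.png",
     rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8),
     np.array([i % 2], dtype=np.uint8))
    for i in range(3)
]
tfrecord_file_name = os.path.join(tempfile.mkdtemp(), "images.tfrecord")
write_images(tfrecord_file_name, samples)
```

Using the writer as a context manager closes the file even if serialization fails partway through the loop.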


Extract TFRecord data

Reading a TFRecord file is simple if you know how it was written, because the process mirrors writing. First, we create a dictionary describing the features that were used to write the TFRecord file. Then we create a dataset object using tf.data.TFRecordDataset.

Once the dataset object has been created, we map it to our desired dataset using the code given below.
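A minimal sketch of this reading step follows. It assumes the records were written with the feature keys filename, image, label, and image_shape, with image and label stored as raw uint8 bytes. It first writes one dummy record so that it can run on its own; the file path and the name feature_description are hypothetical.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Write one dummy record so there is something to read back.
original_image = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
path = os.path.join(tempfile.mkdtemp(), "images.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(tf.train.Example(features=tf.train.Features(feature={
        "filename": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"img_0.png"])),
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[original_image.tobytes()])),
        "label": tf.train.Feature(bytes_list=tf.train.BytesList(value=[bytes([1])])),
        "image_shape": tf.train.Feature(int64_list=tf.train.Int64List(value=[8, 8, 3])),
    })).SerializeToString())

# The dictionary must mirror the features used when writing.
feature_description = {
    "filename": tf.io.FixedLenFeature([], tf.string),
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.string),
    "image_shape": tf.io.FixedLenFeature([3], tf.int64),
}


def _extract_fn(serialized_example):
    """Parse one serialized example and decode the raw bytes into tensors."""
    parsed = tf.io.parse_single_example(serialized_example, feature_description)
    image_shape = parsed["image_shape"]
    # decode_raw returns a flat uint8 vector; reshape it with the stored shape.
    image = tf.reshape(tf.io.decode_raw(parsed["image"], tf.uint8), image_shape)
    label = tf.io.decode_raw(parsed["label"], tf.uint8)
    return parsed["filename"], image, label, image_shape


dataset = tf.data.TFRecordDataset([path]).map(_extract_fn)
```

The dtype passed to tf.io.decode_raw has to match the dtype of the array that was serialized; a mismatch silently yields wrong values or a wrong element count rather than a clear error at parse time.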

In the above piece of code, the function _extract_fn maps each record in the dataset to a list of the things we want (filename, image, labels, image_shape). To do this, we first parse the examples that were created when generating the TFRecords. After parsing, we decode the parsed examples back into images using the tf.io.decode_raw() function.

How to use a dataset from a TFRecord file for training a model?

To use the data extracted from a TFRecord file for training a model, we create an iterator on the dataset object.

iterator = tf.compat.v1.data.make_initializable_iterator(batch_dataset)

After creating this iterator, we loop over it so that we can train the model on every image it yields. The function extract_image does this for every image in the TFRecord file by calling iterator.get_next(). Refer to the code below for this function.
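One way such a loop might look is sketched here. It assumes TensorFlow 2, where the initializable iterator needs an explicit graph and a tf.compat.v1.Session. The helpers write_demo_file and parse_fn, the 4x4 dummy images, and the body of extract_image are assumptions for illustration; the training step is only marked with a comment.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def write_demo_file(path, num_records=5):
    """Write a few tiny dummy images so the sketch is self-contained."""
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(num_records):
            image = np.full((4, 4, 3), i, dtype=np.uint8)
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[image.tobytes()])),
                "label": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[i % 2])),
            }))
            writer.write(example.SerializeToString())


def parse_fn(serialized):
    """Decode one record into an (image, label) pair."""
    parsed = tf.io.parse_single_example(serialized, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.reshape(tf.io.decode_raw(parsed["image"], tf.uint8), (4, 4, 3))
    return image, parsed["label"]


def extract_image(path, batch_size=2):
    """Loop over all batches with iterator.get_next(); returns the batch shapes."""
    shapes = []
    with tf.Graph().as_default():  # graph mode is required for this iterator
        batch_dataset = tf.data.TFRecordDataset(path).map(parse_fn).batch(batch_size)
        iterator = tf.compat.v1.data.make_initializable_iterator(batch_dataset)
        next_batch = iterator.get_next()
        with tf.compat.v1.Session() as sess:
            sess.run(iterator.initializer)
            while True:
                try:
                    images, labels = sess.run(next_batch)
                except tf.errors.OutOfRangeError:
                    break  # the dataset is exhausted
                # A training step on (images, labels) would go here.
                shapes.append(images.shape)
    return shapes


path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
write_demo_file(path)
batch_shapes = extract_image(path)
```

With eager execution in TensorFlow 2, a plain `for images, labels in batch_dataset:` loop is a simpler alternative that needs no session or explicit iterator.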

Hope you now have an idea of what TFRecords are and how to use them for training a model.

Happy coding!

