In this article, we will build a Convolutional Neural Network layer by layer using the Functional API of tf.keras. Next, we will explore and integrate the data augmentation techniques provided by the ImageDataGenerator class in Keras.
What is tf.keras?
- tf.keras is TensorFlow's implementation of the Keras API.
- Keras requires a backend to train custom neural networks. It used Theano as its default backend, before switching to TensorFlow starting from v1.1.0.
This tutorial is also available as a Google Colab notebook for a hands-on experience.
We will use the CIFAR-10 dataset, a popular baseline for image classification tasks. It is a collection of 60000 RGB images of size 32×32, spread over 10 classes with 6000 images per class. 50000 images are available for training and 10000 for testing.
Let’s Begin….
Importing the necessary libraries-
import numpy as np
from tensorflow.keras import Input, Model, activations, datasets, layers, losses, optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
import seaborn as sns
1. Data Preparation
Downloading CIFAR-10 Dataset-
(X, y), (test_X, test_y) = datasets.cifar10.load_data()
Out of the 50000 images available for training, we will hold out the last 3000 as a validation set and train on the remaining 47000.
train_X = X[:47000]
train_y = y[:47000]
val_X = X[-3000:]
val_y = y[-3000:]
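The slices above take the hold-out images in the order in which the dataset stores them. If a randomized hold-out is preferred, one possible sketch with NumPy follows. The helper name `split_train_val`, the seed, and the toy arrays are illustrative assumptions and not part of the original code.

```python
import numpy as np

def split_train_val(X, y, val_size, seed=0):
    """Shuffle the sample indices once and hold out `val_size` samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    val_idx, train_idx = idx[:val_size], idx[val_size:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])

# Toy stand-in with CIFAR-10-like shapes (100 images instead of 50000)
X_demo = np.zeros((100, 32, 32, 3), dtype=np.uint8)
y_demo = (np.arange(100) % 10).reshape(-1, 1)

(train_X_demo, train_y_demo), (val_X_demo, val_y_demo) = split_train_val(
    X_demo, y_demo, val_size=6
)
print(train_X_demo.shape, val_X_demo.shape)
```

With the real arrays, the same call would be `split_train_val(X, y, val_size=3000)`.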
2. Model Architecture
We will use the Keras Functional API since it gives us more control over, and access to, each layer than the Sequential API.
(Conv2D → BatchNorm → Conv2D → BatchNorm → MaxPooling2D) × 3 → (Dense → BatchNorm) × 3 → Softmax
The layers which we will use-
- Input() instantiates a symbolic tensor object
- Conv2D() creates a kernel which is convolved with the layer input. We will use it to perform a Spatial Convolution over images.
- BatchNormalization() normalizes its inputs. During training it uses the mean and std of the current batch of inputs, while during inference it uses moving averages of the batch statistics seen during training.
- MaxPooling2D() is used for the pooling operation for 2D spatial data
- Flatten() layer simply flattens the input so that it can be fed to a Dense layer
- Dense() layer is a regular NN layer, with options available for adding an activation and regularization within the method itself.
- softmax() applies the softmax activation function
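The difference between the training and inference behaviour of batch normalization can be sketched in NumPy. This is a simplified illustration and not the Keras implementation: the class name `SimpleBatchNorm` is hypothetical, the learnable scale and shift are left out, and the `momentum` and `eps` values are assumptions chosen for the demo.

```python
import numpy as np

class SimpleBatchNorm:
    """Illustrative batch norm over the feature axis (no learnable scale/shift)."""

    def __init__(self, num_features, momentum=0.99, eps=1e-3):
        self.momentum = momentum
        self.eps = eps
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)

    def __call__(self, x, training):
        if training:
            # Use the statistics of the current batch ...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ... and fold them into the running averages for later inference
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # Inference: rely on the running averages only
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(64, 4))

bn = SimpleBatchNorm(num_features=4)
train_out = bn(batch, training=True)    # normalized with batch statistics
infer_out = bn(batch, training=False)   # normalized with running averages
print(train_out.mean(axis=0).round(6))
```

After a single training step the running averages have barely moved from their initial values, which is why the inference output still differs noticeably from the training output.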
Kernel size, stride, and padding are important parameters when designing a Convolutional Neural Network. The CS231n guide by Stanford University builds a good intuition for choosing appropriate values for these parameters and explains how to determine the output dimensions after each layer.
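The output size along one spatial dimension follows the standard formula (W − F + 2P) / S + 1, where W is the input size, F the kernel size, P the padding per side, and S the stride. A small sketch, with the helper names `conv_output_size` and `same_padding` chosen here for illustration:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension: (W - F + 2P) // S + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

def same_padding(kernel_size):
    """Padding per side that preserves the size for stride 1 and an odd kernel."""
    return (kernel_size - 1) // 2

size = 32
after_3x3 = conv_output_size(size, 3, padding=same_padding(3))   # 'same' 3x3 conv
after_5x5 = conv_output_size(size, 5, padding=same_padding(5))   # 'same' 5x5 conv
after_pool = conv_output_size(size, 2, stride=2)                 # 2x2 pooling, stride 2
no_padding = conv_output_size(size, 3)                           # 3x3 conv without padding

print(after_3x3, after_5x5, after_pool, no_padding)  # 32 32 16 30
```

This is why convolutions with 'same' padding keep a 32×32 feature map at 32×32, while each 2×2 pooling step halves it.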
inputs = Input(shape=(32, 32, 3))  # named `inputs` to avoid shadowing the built-in input()
# Convolutional feature extractor ('same' padding keeps the spatial size)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D(2)(x)  # 32x32 -> 16x16
x = layers.Conv2D(128, (5, 5), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(256, (5, 5), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(512, (5, 5), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D(2)(x)  # 16x16 -> 8x8
# Classifier head
x = layers.Flatten()(x)
x = layers.Dense(1024, activation='relu')(x)
x = layers.BatchNormalization()(x)
x = layers.Dense(512, activation='relu')(x)
x = layers.BatchNormalization()(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dense(10)(x)  # one logit per class
outputs = activations.softmax(x)
model = Model(inputs=inputs, outputs=outputs, name="TF_Functional_API")
model.summary()
The number of parameters involved with each layer and the dimension of the layer outputs can be viewed with the summary() method.
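The parameter counts reported by summary() can be checked by hand. The sketch below assumes layers with a bias term; the helper names are illustrative.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias, times filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_param_count(in_features, units):
    """One weight per input feature plus one bias, for each unit."""
    return (in_features + 1) * units

first_conv = conv2d_param_count(3, 3, 3, 32)     # 3x3 conv on an RGB input, 32 filters
second_conv = conv2d_param_count(3, 3, 32, 64)   # 3x3 conv on 32 channels, 64 filters
last_dense = dense_param_count(128, 10)          # 128 features -> 10 classes

print(first_conv, second_conv, last_dense)  # 896 18496 1290
```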

The 3 primary methods used in Keras for model training and evaluation are compile, fit, and evaluate. TensorFlow provides a comprehensive guide for each of these methods.
Some Hyperparameters –
- Loss Function → SparseCategoricalCrossentropy
- Optimizer → Adam
- Learning Rate → 0.001
- Batch size → 64
- Epochs → 30
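The sparse variant of the categorical cross-entropy takes integer class labels rather than one-hot vectors, and averages the negative log-probability assigned to the true class. A minimal NumPy sketch of that idea follows; it omits details of the Keras implementation such as numerical clipping, and the toy probabilities are made up for illustration.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(p[true class]) over the batch, with integer labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(-np.log(picked).mean())

# Toy softmax outputs for two samples and three classes
toy_probs = [[0.7, 0.2, 0.1],
             [0.1, 0.8, 0.1]]
toy_labels = [0, 1]
toy_loss = sparse_categorical_crossentropy(toy_probs, toy_labels)

# A model that predicts a uniform distribution over 10 classes scores ln(10)
uniform_loss = sparse_categorical_crossentropy(np.full((4, 10), 0.1), [0, 3, 5, 9])
print(round(toy_loss, 4), round(uniform_loss, 4))
```

The uniform case gives a useful sanity check: an untrained 10-class classifier is expected to start near a loss of ln(10) ≈ 2.30.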
3. Running our model (without Data Augmentation)
model.compile(loss=losses.SparseCategoricalCrossentropy(),optimizer=optimizers.Adam(learning_rate=0.001),metrics=['accuracy'])
model.fit(train_X, train_y.flatten(), batch_size=64, epochs=30, validation_data=(val_X,val_y.flatten()))
model.evaluate(test_X, test_y.flatten(), verbose=2)
The model gives an accuracy of 81.59 % on the test set.
The fit method returns a ‘History’ object where the attribute ‘History.history’ is a dictionary of the training and validation metrics (loss and accuracy) over the epochs. Thus it maintains a log which can be used to visualize these metrics as we will see later.
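As a small illustration of working with that dictionary, the sketch below picks the epoch with the best validation accuracy. The key names assume the model was compiled with metrics=['accuracy'] on a TensorFlow 2 release, the helper name `best_epoch` is hypothetical, and the numbers are toy values, not results from this model.

```python
def best_epoch(history, metric="val_accuracy"):
    """Return (1-based epoch, value) for the highest value of `metric`."""
    values = history[metric]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Toy stand-in for `History.history` (illustrative numbers only)
toy_history = {
    "loss": [1.5, 1.0, 0.8],
    "accuracy": [0.45, 0.64, 0.72],
    "val_loss": [1.3, 1.1, 1.2],
    "val_accuracy": [0.52, 0.61, 0.60],
}

epoch, value = best_epoch(toy_history)
print(epoch, value)  # 2 0.61
```

With a real run, the call would be `best_epoch(history.history)` where `history` is the object returned by `model.fit(...)`.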
4. Data Augmentation using ImageDataGenerator class
Now, we will see how data augmentation can boost the accuracy of our model. Data augmentation helps prevent overfitting and helps the neural network generalize better to unseen variations of the test images. Normalization, horizontal flipping, minor rotation, and resizing are among the most frequently used augmentation techniques.
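To make two of these transformations concrete, here is a minimal NumPy sketch of a horizontal flip and a horizontal shift on a channels-last image. It is a simplification for illustration: the function names are hypothetical, and the shift fills the vacated pixels with zeros, which is only one of several possible fill strategies.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror a (height, width, channels) image along its width axis."""
    return image[:, ::-1, :]

def shift_width(image, pixels):
    """Shift right by `pixels` (left if negative), zero-filling the vacated columns."""
    shifted = np.zeros_like(image)
    if pixels > 0:
        shifted[:, pixels:, :] = image[:, :-pixels, :]
    elif pixels < 0:
        shifted[:, :pixels, :] = image[:, -pixels:, :]
    else:
        shifted[...] = image
    return shifted

# Tiny 2x4 single-channel "image" so the effect is easy to read
toy_image = np.arange(8).reshape(2, 4, 1)
flipped = horizontal_flip(toy_image)
shifted_right = shift_width(toy_image, 1)
shifted_left = shift_width(toy_image, -1)

print(flipped[0, :, 0], shifted_right[0, :, 0], shifted_left[0, :, 0])
```

The labels stay unchanged under these transformations, which is what makes them suitable for classification.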
We will use the ImageDataGenerator class in Keras. It provides a range of transformations for the input images which we can incorporate easily into our training.
datagen_train = ImageDataGenerator(featurewise_center=True,
                                   featurewise_std_normalization=True,
                                   rotation_range=20,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   horizontal_flip=True)
# For the validation and test sets, we only normalize the images
datagen_test = ImageDataGenerator(featurewise_center=True,
                                  featurewise_std_normalization=True)
# Compute the normalization statistics on the training images for both generators,
# so that validation and test data are normalized exactly like the training data
datagen_train.fit(train_X)
datagen_test.fit(train_X)
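Featurewise normalization amounts to subtracting a mean and dividing by a standard deviation computed from a reference set of images. The NumPy sketch below assumes per-channel statistics taken from the training images and reused unchanged for validation and test data; the function names, the `eps` value, and the random toy data are illustrative assumptions.

```python
import numpy as np

def fit_featurewise_stats(train_images):
    """Per-channel mean and std over all training images and pixels."""
    train_images = train_images.astype(np.float64)
    mean = train_images.mean(axis=(0, 1, 2))
    std = train_images.std(axis=(0, 1, 2))
    return mean, std

def normalize(images, mean, std, eps=1e-6):
    """Apply previously fitted statistics to any set of images."""
    return (images.astype(np.float64) - mean) / (std + eps)

rng = np.random.default_rng(0)
toy_train = rng.integers(0, 256, size=(50, 32, 32, 3), dtype=np.uint8)
toy_test = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)

mean, std = fit_featurewise_stats(toy_train)   # statistics from training data only
train_norm = normalize(toy_train, mean, std)
test_norm = normalize(toy_test, mean, std)     # same statistics reused for test data

print(mean.shape, train_norm.mean(axis=(0, 1, 2)).round(6))
```

Reusing the training statistics keeps the input distribution seen at evaluation time consistent with what the model saw during training.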
Train and evaluate-
model.compile(loss=losses.SparseCategoricalCrossentropy(),optimizer=optimizers.Adam(learning_rate=0.001),metrics=['accuracy'])
model.fit(datagen_train.flow(train_X, train_y.flatten(), batch_size=64),
          steps_per_epoch=len(train_X) // 64,  # integer number of batches per epoch
          epochs=30,
          validation_data=datagen_test.flow(val_X, val_y.flatten(), batch_size=64))
model.evaluate(datagen_test.flow(test_X, test_y.flatten(), batch_size=1), verbose=2)
We reach a test accuracy of 88.30 %, a boost of ~7 percentage points over the 81.59 % obtained without augmentation.
5. Results
Here are the visualizations of how the training progressed-
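Curves of this kind can be drawn from the dictionary stored in `History.history`. The following is one possible sketch with matplotlib, not the plotting code used for the original figures: the function name `plot_history` is hypothetical, the key names assume metrics=['accuracy'], and the numbers are toy values rather than results from this model.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt

def plot_history(history, title="Training curves"):
    """Plot loss and accuracy for training and validation, side by side."""
    epochs = range(1, len(history["loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, history["loss"], label="train")
    ax_loss.plot(epochs, history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    ax_acc.plot(epochs, history["accuracy"], label="train")
    ax_acc.plot(epochs, history["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    fig.suptitle(title)
    return fig

# Toy stand-in for `History.history` (illustrative numbers only)
toy_curves = {
    "loss": [1.5, 1.0, 0.8],
    "accuracy": [0.45, 0.64, 0.72],
    "val_loss": [1.3, 1.1, 1.2],
    "val_accuracy": [0.52, 0.61, 0.60],
}
fig = plot_history(toy_curves)
```

With a real run, `plot_history(history.history)` followed by `fig.savefig(...)` or `plt.show()` would produce the figure.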


6. Conclusion
- We saw what a general image classification pipeline looks like, starting from data preparation and visualization, then model design, hyperparameter selection, training, and evaluation.
- The Keras API in TensorFlow, especially the Functional API, makes it very convenient for a user to design a neural network.
- Introducing data augmentation alone gave the model a boost of ~7 percentage points in test accuracy, demonstrating the benefit of the technique as discussed.
- We used the CIFAR-10 dataset for demonstration; however, this template can be extended to any image dataset.
- Here is an article which provides an extensive discussion on the key differences between Keras and tf.keras
The code is available here for further development.
7. Further exploration
Some tricks to improve accuracy:
- Use architectures with skip connections, like ResNet, which have proven to perform well on images.
- Use transfer learning, e.g. load a model pretrained on a larger, related dataset like ImageNet and then fine-tune it on our dataset. A model pretrained on CIFAR-100 could be more beneficial for CIFAR-10 because of the similarity in image size and classes.
- Tweak the layers and hyperparameters, e.g. try an SGD optimizer, a larger batch size, or different filter sizes and numbers of filters in each layer.
- Use Learning Rate Scheduling to decay the learning rate with epochs as the model learns.
- Use regularization techniques like Dropout and advanced strategies like CutMix, Label Smoothing, Mixup training etc.
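As an example of learning rate scheduling, the sketch below builds an exponential decay schedule as a plain Python function. The factory name `make_exponential_decay` and the decay rate of 0.9 are arbitrary choices for illustration, not tuned values. The returned function accepts the epoch index and, optionally, the current learning rate, so it should be usable with the `tf.keras.callbacks.LearningRateScheduler` callback.

```python
def make_exponential_decay(initial_lr=0.001, decay_rate=0.9):
    """Build a schedule returning initial_lr * decay_rate ** epoch."""
    def schedule(epoch, lr=None):
        # `lr` (the current rate) is accepted for callback compatibility but unused,
        # so the decay always starts from `initial_lr`
        return initial_lr * decay_rate ** epoch
    return schedule

schedule = make_exponential_decay(initial_lr=0.001, decay_rate=0.9)
rates = [schedule(epoch) for epoch in range(30)]
print(rates[0], rates[10], rates[29])
```

Under these assumptions the learning rate starts at 0.001, matching the value used above, and shrinks by 10 % every epoch.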
Note:
We should note that the ResNet models were originally designed for much larger images, i.e. 224×224 images from ImageNet. They therefore use MaxPooling operations and larger convolution strides to reduce the spatial dimensions between layers. For CIFAR-10 images, we should implement the model layers ourselves and change the filter sizes and stride values accordingly.