Neural Networks

Intro
A particular category of Neural Networks called Convolutional Neural Networks (CNN) is designed for image recognition. While it may sound super fancy, I assure you that anyone can grasp the main ideas behind it.
In this article, I will go through the essential components of CNNs and provide you with illustrated examples of how each part works. I will also talk you through the Python code that you can use to build Deep Convolutional Neural Networks with the help of Keras/Tensorflow libraries.
Contents
- Convolutional Neural Networks within the universe of Machine Learning algorithms
- What is the structure of Convolutional Neural Networks, and how do they work?
- A complete Python example showing you how to build and train your own Deep CNN models
Deep Convolutional Neural Networks (DCN) within the Machine Learning universe
The below chart is my attempt to categorise the most common Machine Learning algorithms.
While we often use Neural Networks in a supervised manner with labelled training data, I felt that their unique approach to Machine Learning deserved a separate category.
Hence, my graph shows Neural Networks (NNs) branching out from the core of the Machine Learning universe. Convolutional Neural Networks occupy a sub-branch of NNs and contain algorithms such as DCN, DN and DCIGN.
What is the structure of Convolutional Neural Networks, and how do they work?
Let’s start by comparing the structure of a typical Feed-Forward Neural Network and a Convolutional Neural Network.
In a traditional Feed-Forward Neural Network, we have Input, Hidden and Output layers, where each of them may contain multiple nodes. We commonly refer to networks with more than one Hidden layer as "Deep" networks.

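To make this concrete, here is a minimal Keras sketch of such a "deep" Feed-Forward network. The layer sizes below are purely illustrative and not taken from any example in this article:
# A minimal "deep" Feed-Forward network sketch (layer sizes are illustrative)
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
ffn = Sequential(name="Feed-Forward-Sketch")
ffn.add(keras.Input(shape=(64,), name='Input-Layer'))   # Input layer with 64 features
ffn.add(Dense(32, activation='relu', name='Hidden-1'))  # Hidden layer 1
ffn.add(Dense(16, activation='relu', name='Hidden-2'))  # Hidden layer 2 (more than one hidden layer = "Deep")
ffn.add(Dense(4, activation='softmax', name='Output'))  # Output layer with 4 classes
ffn.summary()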
Meanwhile, Convolutional Neural Networks (CNN) tend to be multi-dimensional and contain some special layers, unsurprisingly called Convolutional layers. Moreover, Convolutional layers are often accompanied by Pooling layers (Max or Average), which help reduce the size of convolved features.

Convolutional layer
It is worth highlighting that we can have Convolutional layers of different dimensions:
- One-dimensional (Conv1D) – suitable for text embeddings, time-series or other sequences.
- Two-dimensional (Conv2D) – typical choice for images.
- Three-dimensional (Conv3D) – can be used for videos, which are essentially just sequences of images, or for 3D images such as MRI scans.
Since I focus on image recognition in this article, let’s take a closer look at how 2D convolution works. 1D and 3D convolutions work in the same way, except they have one fewer or one extra dimension.

Note that for a greyscale picture, we would only have one channel. Meanwhile, we would have three separate channels for a colour picture, each containing values for its respective colour (Red, Green, Blue).
We can also specify how many filters we want to have in the Convolutional layer. Having multiple filters lets us extract a broader range of features from the image.
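To make the dimensions concrete, here is a short sketch showing how the channel and filter counts appear in each layer's output shape. The input shapes and filter counts below are made up purely for illustration:
# Illustrative only: output shapes of Conv1D / Conv2D / Conv3D layers
from tensorflow import keras
from keras.layers import Conv1D, Conv2D, Conv3D
sequence = keras.Input(shape=(100, 8))           # 1D: 100 timesteps with 8 features each
image    = keras.Input(shape=(128, 128, 3))      # 2D: colour image with 3 channels (1 for greyscale)
video    = keras.Input(shape=(30, 128, 128, 3))  # 3D: 30 frames of colour images
print(Conv1D(filters=16, kernel_size=3)(sequence).shape)       # (None, 98, 16)
print(Conv2D(filters=16, kernel_size=(3, 3))(image).shape)     # (None, 126, 126, 16)
print(Conv3D(filters=16, kernel_size=(3, 3, 3))(video).shape)  # (None, 28, 126, 126, 16)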
How does convolution work?
There are three parts to a convolution: Input (e.g., 2D image), a filter (a.k.a. kernel) and an output (a.k.a. convolved feature).

The convolution process is iterative. First, a filter is applied over a section of an input image, and the output value is recorded. The filter is then shifted by one position when stride=1 or by multiple positions when the stride is set to a higher number, and the same process is repeated until the convolved feature is complete.
The below gif image illustrates the process of applying a 3×3 filter on a 5×5 input.

Let me elaborate on the above to give you a better understanding of the filter’s purpose. First, you will note that my custom filter has all 1’s down the middle column. This type of filter is designed to identify vertical lines in the input image as it gives a strong signal (high values) whenever vertical lines are present.
For comparison, here is what the Convolved Feature (output) would look like if we applied a filter designed to find horizontal lines:

As you can see, the entire output is populated with the same value, meaning that there is no firm indication of a horizontal line being present in the input image.
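If you want to reproduce this behaviour yourself, here is a minimal NumPy sketch of the stride=1 convolution described above, using a made-up 5x5 input containing a vertical line and the same vertical-line filter:
# Manual 2D convolution sketch (stride=1, no padding); input values are made up
import numpy as np
image = np.array([[0, 0, 1, 0, 0],
                  [0, 0, 1, 0, 0],
                  [0, 0, 1, 0, 0],
                  [0, 0, 1, 0, 0],
                  [0, 0, 1, 0, 0]])
vertical_filter = np.array([[0, 1, 0],
                            [0, 1, 0],
                            [0, 1, 0]])
convolved = np.zeros((3, 3))
for i in range(3):                      # slide the filter down
    for j in range(3):                  # slide the filter across
        window = image[i:i+3, j:j+3]    # 3x3 section of the input covered by the filter
        convolved[i, j] = np.sum(window * vertical_filter)
print(convolved)  # strong signal (3) only in the column where the vertical line sits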
It is important to note that we do not need to specify the values for the different filters manually. The filter values are learned automatically during the training of the Convolutional Neural Network. We can, however, tell the algorithm how many filters we want to have.
Additional options
There are a couple more options for us to tweak when setting up a Convolutional layer:
- Padding – in some scenarios, we may want the output to be the same size as the input, which we can achieve by adding padding around the edges of the image. Padding can also make it easier for the model to capture essential features residing at the edges of an image.

- Stride – if we have large images, then we may want to use larger strides, i.e., shifting the filter by multiple pixels at a time. While this helps reduce the size of the output, larger strides may result in some features being missed, like in the example below (the code sketch after this list shows how padding and stride affect the output size):

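As a rough illustration of both options, the sketch below shows how padding and stride change the output size of a Conv2D layer. The 128x128x3 input and the 16 filters are arbitrary choices for demonstration:
# Illustrative only: effect of padding and strides on Conv2D output size
from tensorflow import keras
from keras.layers import Conv2D
inp = keras.Input(shape=(128, 128, 3))
print(Conv2D(16, (3, 3), padding='valid', strides=(1, 1))(inp).shape)  # (None, 126, 126, 16) - output shrinks
print(Conv2D(16, (3, 3), padding='same',  strides=(1, 1))(inp).shape)  # (None, 128, 128, 16) - same size as input
print(Conv2D(16, (3, 3), padding='same',  strides=(2, 2))(inp).shape)  # (None, 64, 64, 16) - stride 2 halves the output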
Multiple convolutional layers
It is often beneficial to set up multiple Convolutional layers to improve the network. The benefits arise from subsequent Convolutional layers identifying extra complexity within the image.
The first layer in a Deep Convolutional Network (DCN) tends to find low-level features (e.g., vertical, horizontal, diagonal lines…). Meanwhile, the deeper layers can identify higher-level characteristics, such as more complex shapes, representing real-world elements like eyes, nose, ears etc.
Pooling layer
It is common to add a Pooling layer following a Convolutional layer. Its purpose is to reduce the size of the Convolved Features, which improves computational efficiency. It can also help to de-noise the data by keeping only the strongest activations.

There are two commonly used types of pooling:
- Max pooling – takes the highest value from the area covered by the kernel (suitable for de-noising).
- Average pooling – calculates the average value from the area covered by the kernel (the short sketch below compares the two).

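Here is a minimal sketch comparing the two pooling types on a small made-up 4x4 feature map with a 2x2 kernel:
# Illustrative only: Max vs Average pooling with a 2x2 kernel
import numpy as np
import tensorflow as tf
feature_map = np.array([[1, 3, 2, 0],
                        [4, 8, 1, 1],
                        [0, 2, 6, 5],
                        [1, 1, 7, 2]], dtype="float32").reshape(1, 4, 4, 1)
max_pooled = tf.keras.layers.MaxPool2D(pool_size=(2, 2))(feature_map)
avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))(feature_map)
print(max_pooled.numpy().reshape(2, 2))  # [[8. 2.] [2. 7.]] - keeps the strongest activations
print(avg_pooled.numpy().reshape(2, 2))  # [[4. 1.] [1. 5.]] - averages each window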
Flatten and Dense Layers
Once we have finished deriving the Convolved Features, we need to flatten them. This gives us a one-dimensional input vector that we can feed into a traditional Feed-Forward network architecture. Finally, we train the network to find the optimum weights and biases, which enables us to classify images correctly.

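As a small illustration (the feature map size below is arbitrary), flattening simply turns the multi-dimensional Convolved Features into one long vector that a Dense layer can consume:
# Illustrative only: Flatten converts a 3D feature map into a 1D vector
from tensorflow import keras
from keras.layers import Flatten, Dense
feature_maps = keras.Input(shape=(32, 32, 64))    # e.g., the output of the last Pooling layer
flat = Flatten()(feature_maps)
print(flat.shape)                                  # (None, 65536) = 32 * 32 * 64
print(Dense(16, activation='relu')(flat).shape)    # (None, 16)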
Depending on the size and complexity of your data, you may want to have multiple pairs of Convolutional and Pooling layers followed by multiple Dense Layers, making your network "Deep."

A complete Python example showing you how to build and train your own Deep CNN models
Setup
We will need to get the following data and libraries:
- Caltech 101 image data set (source)
Data license: Attribution 4.0 International (CC BY 4.0)
Reference: L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. IEEE CVPR 2004, Workshop on Generative-Model Based Vision. 2004
- Pandas and Numpy for data manipulation
- OpenCV and Matplotlib for ingesting and displaying images
- Tensorflow/Keras for building Neural Networks
- Scikit-learn library for splitting the data (train_test_split), label encoding (OrdinalEncoder), and model evaluation (classification_report)
Let’s import libraries:
# Tensorflow / Keras
import tensorflow as tf # used to access argmax function
from tensorflow import keras # for building Neural Networks
print('Tensorflow/Keras: %s' % keras.__version__) # print version
from keras.models import Sequential # for creating a linear stack of layers for our Neural Network
from keras import Input # for instantiating a keras tensor
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout # for adding Convolutional and densely-connected NN layers
# Data manipulation
import pandas as pd # for data manipulation
print('pandas: %s' % pd.__version__) # print version
import numpy as np # for data manipulation
print('numpy: %s' % np.__version__) # print version
# Sklearn
import sklearn # for model evaluation
print('sklearn: %s' % sklearn.__version__) # print version
from sklearn.model_selection import train_test_split # for splitting the data into train and test samples
from sklearn.metrics import classification_report # for model evaluation metrics
from sklearn.preprocessing import OrdinalEncoder # for encoding labels
# Visualization
import cv2 # for ingesting images
print('OpenCV: %s' % cv2.__version__) # print version
import matplotlib
import matplotlib.pyplot as plt # for showing images
print('matplotlib: %s' % matplotlib.__version__) # print version
# Other utilities
import sys
import os
# Assign main directory to a variable
main_dir=os.path.dirname(sys.path[0])
#print(main_dir)
The above code prints the package versions I used in this example:
Tensorflow/Keras: 2.7.0
pandas: 1.3.4
numpy: 1.21.4
sklearn: 1.0.1
OpenCV: 4.5.5
matplotlib: 3.5.1
Next, we download and ingest the Caltech 101 image data set. Note that we will only use four categories ("dalmatian", "hedgehog", "llama", "panda") in this example as opposed to all 101.
At the same time, we prep the data by resizing and standardising it, encoding labels and splitting it into train and test samples.
# Specify the location of images after you have downloaded them
ImgLocation=main_dir+"/data/101_ObjectCategories/"
# List image categories we are interested in
LABELS = set(["dalmatian", "hedgehog", "llama", "panda"])
# Create two lists to contain image paths and image labels
ImagePaths=[]
ListLabels=[]
for label in LABELS:
    for image in list(os.listdir(ImgLocation+label)):
        ImagePaths=ImagePaths+[ImgLocation+label+"/"+image]
        ListLabels=ListLabels+[label]
# Load images and resize to be a fixed 128x128 pixels, ignoring original aspect ratio
data=[]
for img in ImagePaths:
    image = cv2.imread(img)
    image = cv2.resize(image, (128, 128))
    data.append(image)
# Convert image data to numpy array and standardize values (divide by 255 since RGB values range from 0 to 255)
data = np.array(data, dtype="float") / 255.0
# Show data shape
print("Shape of whole data: ", data.shape)
# Convert Labels list to numpy array
LabelsArray=np.array(ListLabels)
# Encode labels
enc = OrdinalEncoder()
y=enc.fit_transform(LabelsArray.reshape(-1,1))
# ---- Create training and testing samples ---
X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.2, random_state=0)
y_train=y_train.reshape(-1,1)
y_test=y_test.reshape(-1,1)
# Print shapes
# Note, model input must have a four-dimensional shape [samples, rows, columns, channels]
print("Shape of X_train: ", X_train.shape)
print("Shape of y_train: ", y_train.shape)
print("Shape of X_test: ", X_test.shape)
print("Shape of y_test: ", y_test.shape)
The above code prints the shape of our data, which is [samples, rows, columns, channels] for input data and [samples, labels] for target data:
Shape of whole data: (237, 128, 128, 3)
Shape of X_train: (189, 128, 128, 3)
Shape of y_train: (189, 1)
Shape of X_test: (48, 128, 128, 3)
Shape of y_test: (48, 1)
To better understand what data we are working with, let’s display a few input images.
# Display images of 10 animals in the training set and their true labels
fig, axs = plt.subplots(2, 5, sharey=False, tight_layout=True, figsize=(12,6), facecolor='white')
n=0
for i in range(0,2):
    for j in range(0,5):
        axs[i,j].matshow(X_train[n])
        axs[i,j].set(title=enc.inverse_transform(y_train)[n])
        n=n+1
plt.show()

Training and evaluating Deep Convolutional Neural Network (DCN)
You can follow the comments in the code to understand what each section does. In addition, here is a high-level description.
I have structured the model to have multiple Convolutional, Pooling and Dropout layers to create a "deep" architecture. As mentioned earlier in the article, the initial Convolutional layers help extract low-level features, while later ones identify more high-level features.
So the structure of my DCN model is:
- Input layer
- The first set of Convolutional, Max Pooling and Dropout layers
- The second set of Convolutional, Max Pooling and Dropout layers
- The third set of Convolutional, Max Pooling and Dropout layers
- Flatten layer
- Dense Hidden layer
- Output layer
Note that the Dropout layer randomly sets input units to 0 based on the rate we provide (in this case, 0.2). It means that a random 20% of the inputs (features/nodes) will be zeroed out on each training pass and will not contribute to that weight update. The purpose of the Dropout layer is to help prevent overfitting.
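As a quick sketch of that behaviour (the input values here are made up), note that Keras also scales the remaining inputs by 1/(1-rate) during training so that their expected sum stays the same:
# Illustrative only: Dropout(0.2) zeroes roughly 20% of inputs during training
import numpy as np
import tensorflow as tf
drop = tf.keras.layers.Dropout(0.2)
x = np.ones((1, 10), dtype="float32")
print(drop(x, training=True).numpy())   # roughly 2 out of 10 values become 0, the rest become 1.25
print(drop(x, training=False).numpy())  # all ones - dropout is inactive at prediction time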
Finally, note that I have listed all possible parameters in the first set of Convolutional and Max Pooling layers as I wanted to give you an easy reference to what you can change. However, we keep most of them at default values, so we do not need to explicitly list them every time (see the second and third set of Convolutional and Max Pooling layers).
##### Step 1 - Specify the structure of a Neural Network
#--- Define a Model
model = Sequential(name="DCN-Model") # Model
#--- Input Layer
# Specify input shape [rows, columns, channels]
model.add(Input(shape=(X_train.shape[1],X_train.shape[2],X_train.shape[3]), name='Input-Layer')) # Input Layer - need to specify the shape of inputs
#--- First Set of Convolution, Max Pooling and Dropout Layers (all parameters shown)
model.add(Conv2D(filters=16, # Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).
kernel_size=(3,3), # An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
strides=(1,1), # Default=(1,1), An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
padding='valid', # Default='valid', "valid" means no padding. "same" results in padding with zeros evenly to the left/right or up/down of the input. When padding="same" and strides=1, the output has the same size as the input.
data_format=None, # Default=None, A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch_size, height, width, channels) while channels_first corresponds to inputs with shape (batch_size, channels,height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be channels_last.
dilation_rate=(1, 1), # Default=(1, 1), an integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated convolution. Can be a single integer to specify the same value for all spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any stride value != 1.
groups=1, # Default=1, A positive integer specifying the number of groups in which the input is split along the channel axis. Each group is convolved separately with filters / groups filters. The output is the concatenation of all the groups results along the channel axis. Input channels and filters must both be divisible by groups.
activation='relu', # Default=None, Activation function to use. If you don't specify anything, no activation is applied (see keras.activations).
use_bias=True, # Default=True.
kernel_initializer='glorot_uniform', # Default='glorot_uniform', Initializer for the kernel weights matrix (see keras.initializers).
bias_initializer='zeros', # Default='zeros', Initializer for the bias vector (see keras.initializers).
kernel_regularizer=None, # Default=None, Regularizer function applied to the kernel weights matrix (see keras.regularizers).
bias_regularizer=None, # Default=None, Regularizer function applied to the bias vector (see keras.regularizers).
activity_regularizer=None, # Default=None, Regularizer function applied to the output of the layer (its "activation") (see keras.regularizers).
kernel_constraint=None, # Default=None, Constraint function applied to the kernel matrix (see keras.constraints).
bias_constraint=None, # Default=None, Constraint function applied to the bias vector (see keras.constraints).
name='2D-Convolutional-Layer-1')
) # Convolutional Layer, relu activation used
model.add(MaxPool2D(pool_size=(2,2), # Default=(2,2), integer or tuple of 2 integers, window size over which to take the maximum. (2, 2) will take the max value over a 2x2 pooling window. If only one integer is specified, the same window length will be used for both dimensions.
strides=(2,2), # Default=None, Integer, tuple of 2 integers, or None. Strides values. Specifies how far the pooling window moves for each pooling step. If None, it will default to pool_size.
padding='valid', # Default='valid', One of "valid" or "same" (case-insensitive). "valid" means no padding. "same" results in padding evenly to the left/right or up/down of the input such that output has the same height/width dimension as the input.
data_format=None, # Default=None, A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).
name='2D-MaxPool-Layer-1')
) # Max Pooling Layer,
model.add(Dropout(0.2, name='Dropout-Layer-1')) # Dropout Layer
#--- Second Set of Convolution, Max Pooling and Dropout Layers
model.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='valid', activation='relu', name='2D-Convolutional-Layer-2')) # Convolutional Layer
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2), padding='valid', name='2D-MaxPool-Layer-2')) # Second Max Pooling Layer,
model.add(Dropout(0.2, name='Dropout-Layer-2')) # Dropout Layer
#--- Third Set of Convolution, Max Pooling and Dropout Layers
model.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu', name='2D-Convolutional-Layer-3')) # Convolutional Layer
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2), padding='same', name='2D-MaxPool-Layer-3')) # Third Max Pooling Layer
model.add(Dropout(0.2, name='Dropout-Layer-3')) # Dropout Layer
#--- Feed-Forward Densely Connected Layer and Output Layer (note, flattening is required to convert from 2D to 1D shape)
model.add(Flatten(name='Flatten-Layer')) # Flatten the shape so we can feed it into a regular densely connected layer
model.add(Dense(16, activation='relu', name='Hidden-Layer-1', kernel_initializer='HeNormal')) # Hidden Layer, relu(x) = max(x, 0)
model.add(Dense(4, activation='softmax', name='Output-Layer')) # Output Layer, softmax(x) = exp(x) / tf.reduce_sum(exp(x))
With the model structure specified, let’s compile it, train it and print the results.
##### Step 2 - Compile keras model
model.compile(optimizer='adam', # default='rmsprop', an algorithm to be used in backpropagation
loss='SparseCategoricalCrossentropy', # Loss function to be optimized. A string (name of loss function), or a tf.keras.losses.Loss instance.
              metrics=['accuracy'], # List of metrics to be evaluated by the model during training and testing. Note, the lowercase 'accuracy' string lets Keras pick the accuracy variant that matches the loss (here, sparse categorical accuracy). Each of these can be a string (name of a built-in function), a function or a tf.keras.metrics.Metric instance.
loss_weights=None, # default=None, Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs.
weighted_metrics=None, # default=None, List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing.
run_eagerly=None, # Defaults to False. If True, this Model's logic will not be wrapped in a tf.function. Recommended to leave this as None unless your Model cannot be run inside a tf.function.
steps_per_execution=None # Defaults to 1. The number of batches to run during each tf.function call. Running multiple batches inside a single tf.function call can greatly improve performance on TPUs or small models with a large Python overhead.
)
##### Step 3 - Fit keras model on the dataset
history = model.fit(X_train, # input data
y_train, # target data
batch_size=1, # Number of samples per gradient update. If unspecified, batch_size will default to 32.
epochs=20, # default=1, Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided
verbose=0, # default='auto', ('auto', 0, 1, or 2). Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. 'auto' defaults to 1 for most cases, but 2 when used with ParameterServerStrategy.
callbacks=None, # default=None, list of callbacks to apply during training. See tf.keras.callbacks
validation_split=0.0, # default=0.0, Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.
#validation_data=(X_test, y_test), # default=None, Data on which to evaluate the loss and any model metrics at the end of each epoch.
shuffle=True, # default=True, Boolean (whether to shuffle the training data before each epoch) or str (for 'batch').
class_weight=None, # default=None, Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
sample_weight=None, # default=None, Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only).
initial_epoch=0, # Integer, default=0, Epoch at which to start training (useful for resuming a previous training run).
steps_per_epoch=None, # Integer or None, default=None, Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined.
validation_steps=None, # Only relevant if validation_data is provided and is a tf.data dataset. Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch.
validation_batch_size=None, # Integer or None, default=None, Number of samples per validation batch. If unspecified, will default to batch_size.
validation_freq=1, # default=1, Only relevant if validation data is provided. If an integer, specifies how many training epochs to run before a new validation run is performed, e.g. validation_freq=2 runs validation every 2 epochs.
max_queue_size=10, # default=10, Used for generator or keras.utils.Sequence input only. Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
workers=1, # default=1, Used for generator or keras.utils.Sequence input only. Maximum number of processes to spin up when using process-based threading. If unspecified, workers will default to 1.
use_multiprocessing=False, # default=False, Used for generator or keras.utils.Sequence input only. If True, use process-based threading. If unspecified, use_multiprocessing will default to False.
)
##### Step 4 - Use model to make predictions
# Note, we need to pass model output through argmax to convert from probability to label
# Also, we convert output from tensor to numpy array
# Predict class labels on training data
pred_labels_tr = np.array(tf.math.argmax(model.predict(X_train),axis=1))
# Predict class labels on a test data
pred_labels_te = np.array(tf.math.argmax(model.predict(X_test),axis=1))
##### Step 5 - Model Performance Summary
print("")
print('------------------------- Model Summary -------------------------')
model.summary() # print model summary
print("")
print("")
print('------------------------- Encoded Names -------------------------')
for i in range(0,len(enc.categories_[0])):
    print(i,": ",enc.categories_[0][i])
print("")
print('------------------ Evaluation on Training Data ------------------')
# Print the last value in the evaluation metrics contained within history file
for item in history.history:
    print("Final", item, ":", history.history[item][-1])
print("")
# Print classification report
print(classification_report(y_train, pred_labels_tr))
print("")
print('-------------------- Evaluation on Test Data --------------------')
print(classification_report(y_test, pred_labels_te))
print("")
The above code prints the summary of a model structure:

It also prints the performance summary in the form of a classification report:

We can see that the model has identified almost all training images correctly (f1-score of 0.99). However, the performance on the test data was not as good, with an f1-score of 0.81.
There may be some overfitting happening, so it is worth experimenting with various parameters and network structures to find the best setup. At the same time, the number of images we have is relatively small, making training and evaluation of the model much harder.
Additional evaluation
Finally, I wanted to see what category the model would put my dog in. While my dog is not a dalmatian, he is black and white. I wondered if the model would recognise him to be a dog and not a panda 😂
# Read in the image
mydog = cv2.imread(main_dir+"/data/mydog.JPG")
# Display the image
plt.matshow(mydog)
plt.show()

Prep the image and use the previously trained DCN model to predict the label.
# Resize
mydog = cv2.resize(mydog, (128, 128))
# Standardize (divide by 255 since RGB values range from 0 to 255)
mydog = mydog / 255.0
# The current shape of mydog array is [rows, columns, channels].
# Add extra dimension to make it [samples, rows, columns, channels] that is required by the model
mydog = mydog[np.newaxis, ...]
# Print shape
print("Shape of the input: ", mydog.shape)
print("")
#----- Predict label of mydog image -----
# Note, we need to pass model output through argmax to convert from probability to label
# Also, we convert output from tensor to numpy array
# Finally, we do inverse transform to convert from encoded value to categorical label
pred_mydog = enc.inverse_transform(np.array(tf.math.argmax(model.predict(mydog),axis=1)).reshape(-1, 1))
print("DCN model prediction: ", pred_mydog)
#----- Show Probabilities of each prediction -----
pred_probs=model.predict(mydog)
# Print in a nice format with label and probability next to each other
print("")
print("Probabilities for each category:")
for i in range(0,len(enc.categories_[0])):
    print(enc.categories_[0][i], " : ", pred_probs[0][i])
And here are the results:
Shape of the input: (1, 128, 128, 3)
DCN model prediction: [['dalmatian']]
Probabilities for each category:
dalmatian : 0.92895913
hedgehog : 0.004558794
llama : 0.010929748
panda : 0.055552367
So, the model has identified my dog to be a dalmatian, although with a 5.5% probability of being a panda 😆
Final remarks
I sincerely hope you enjoyed reading this article and obtained some new knowledge.
You can find the complete Jupyter Notebook code in my GitHub repository. Feel free to use it to build your own Deep Convolutional Neural Networks, and do not hesitate to get in touch if you have any questions or suggestions.
Also, you can find my other Neural Network articles here: Feed-Forward, Deep Feed-Forward, RNN, LSTM, GRU.
Cheers! 🤓 Saul Dobilas