Discriminating network for Classification

How I used a Siamese network to build a classifier with very few images

Partha Deka
Towards Data Science
7 min read · Apr 13, 2019


A discriminating network: what is it, why do we need it, and how can we build it?

We often come across the problem that we don't have enough good-quality images and labeled data to train a robust CNN-based classifier: either we don't have enough images, or we don't have the manual resources to label them. Building a robust classifier from very few images is a challenge, since training a robust neural network architecture normally requires thousands of labeled images.

A discriminating Siamese architecture is often used in facial recognition. Based on this research paper, modern facial recognition models are often built using a Siamese network, a.k.a. one-shot learning:

The above is a Siamese network. It is simply two networks, in this case two ResNet-50 architectures sharing the same weights: a left network and a right network. Each network outputs an encoding, a string of numbers, for its corresponding image. Siamese networks are popularly used for facial recognition; here I have used one for cat-vs-dog classification. The intuition behind how this architecture learns is that a cat is more similar to another cat than to a dog. Different cats have different features, but the dissimilarity between two cats is smaller than that between a cat and a dog. The distance between the two encodings (the code below uses the element-wise absolute difference) measures how different the two images are. If the two encodings belong to two dog images or two cat images, we label the target as the positive class (1); if one image is a dog and the other a cat, we label the target as the negative class (0).
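To make the pairing rule concrete, here is a minimal sketch (a hypothetical helper of my own, not part of the original code) of how a pair of class labels maps to a target:

# Pair-labeling rule: 1 = same class, 0 = different class
def pair_target(label_a, label_b):
    return 1. if label_a == label_b else 0.

# e.g. two dogs: pair_target(1, 1) -> 1.0; a dog and a cat: pair_target(1, 0) -> 0.0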

Another interesting research paper in this area:

Code snippets:

Loading all required libraries

from keras.layers import Input, Conv2D, Lambda, merge, Dense, Flatten,MaxPooling2D,Activation, Dropout
from keras.models import Model, Sequential
from keras.regularizers import l2
from keras import backend as K
from keras.optimizers import Adam
from keras import optimizers
#from skimage.io import imshow
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
from keras.backend.tensorflow_backend import set_session
from keras.applications import resnet50, vgg16, vgg19, xception, densenet, inception_v3, mobilenet, mobilenetv2, nasnet, inception_resnet_v2
import tensorflow as tf
from keras.callbacks import ModelCheckpoint, TensorBoard, CSVLogger, EarlyStopping
from keras.applications.resnet50 import preprocess_input
#from keras.applications.xception import preprocess_input
import os
import datetime
import json
from keras.preprocessing.image import ImageDataGenerator

Let's grab and process the images using the OpenCV library. I have used 100 cat images and 100 dog images, belonging to various breeds.

100 cat images:

100 dog images:

Code snippet to load and pre-process the images:

import glob
import cv2
from random import shuffle

dog_path = 'Y:\\Partha\\dog_cats_100\\dog\\*.jpg'
cat_path = 'Y:\\Partha\\dog_cats_100\\cat\\*.jpg'
addrsd = glob.glob(dog_path)
addrsc = glob.glob(cat_path)

labelsd = [1 for addr in addrsd]  # 1 = dog, 0 = cat
labelsc = [0 for addr in addrsc]

# loop over the input images
datad = []
for imagePath in addrsd:
    # load the image, pre-process it, and store it in the data list
    img = cv2.imread(imagePath)
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    datad.append(img)
datac = []
for imagePath in addrsc:
    # load the image, pre-process it, and store it in the data list
    img = cv2.imread(imagePath)
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    datac.append(img)

# to shuffle data
shuffle_data = True
if shuffle_data:
    d = list(zip(datad, labelsd))
    c = list(zip(datac, labelsc))
    e = d + c
    shuffle(e)
    data, labels = zip(*e)

del datad
del datac
del addrsd
del addrsc

Y_train = np.array(labels)
# pixel values run 0-255, so use float32 (int8 would overflow and corrupt the images)
X_train = np.array(data, dtype="float32")
# preprocess for ResNet-50
X_train = preprocess_input(X_train)
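As a quick sanity check (my addition, not in the original post), we can confirm the array shapes and the class balance before building the network:

# Expect roughly 200 images of shape (224, 224, 3) and an even class split
print(X_train.shape)         # e.g. (200, 224, 224, 3)
print(np.bincount(Y_train))  # e.g. [100 100] -> 100 cats, 100 dogs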

Code snippet to define the architecture:

# Two inputs, one each for the left and right image
left_input = Input((224,224,3))
right_input = Input((224,224,3))

# Import the ResNet-50 architecture from keras.applications and initialize
# each layer with pretrained ImageNet weights.
'''
Please note that it is usually better to initialize the layers with ImageNet
weights than with random weights. While training, I will be updating the
weights of every layer in every epoch. Don't confuse this with transfer
learning: I am not freezing any layer, only initializing each layer with
ImageNet weights.
'''
convnet = resnet50.ResNet50(weights='imagenet', include_top=False, input_shape=(224,224,3))

# Add the final fully connected layers
x = convnet.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
preds = Dense(18, activation='sigmoid')(x)  # Apply sigmoid to get the 18-dim encoding
convnet = Model(inputs=convnet.input, outputs=preds)

# Apply the above model to both the left and right images
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)

# Distance between the two encodings: element-wise absolute difference
Euc_layer = Lambda(lambda tensor: K.abs(tensor[0] - tensor[1]))
# use and add the distance function
Euc_distance = Euc_layer([encoded_l, encoded_r])

# the prediction: a single sigmoid unit on top of the distance vector
prediction = Dense(1, activation='sigmoid')(Euc_distance)

# Define the network with the left and right inputs and the output prediction
siamese_net = Model(inputs=[left_input, right_input], outputs=prediction)

# define the optimizer; here I have used SGD with Nesterov momentum
optim = optimizers.SGD(lr=0.001, decay=.01, momentum=0.9, nesterov=True)

# compile the network using binary cross-entropy loss and the above optimizer
siamese_net.compile(loss="binary_crossentropy", optimizer=optim, metrics=['accuracy'])
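Note that the Lambda layer computes the element-wise absolute difference between the two 18-dimensional encodings (an L1-style distance) rather than a true Euclidean distance; the final one-unit sigmoid layer then learns to map that difference vector to a similarity score. A tiny NumPy illustration of what the layer does:

# Toy illustration of the distance layer on two 4-dimensional encodings
a = np.array([0.9, 0.1, 0.4, 0.7])
b = np.array([0.8, 0.3, 0.4, 0.1])
print(np.abs(a - b))  # [0.1 0.2 0.  0.6] -> small values mean similar encodings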

Now I create the image pairs. There are two labels, 1 and 0: the positive and negative classes for the output.

Code snippet to create the train and test datasets:

image_list = X_train[:180]
label_list = Y_train[:180]
left_input = []
right_input = []
targets = []

# Number of pairs per image
pairs = 8

# create the dataset to train on
for i in range(len(label_list)):
    for j in range(pairs):
        # make sure that we are not comparing with the same image
        compare_to = i
        while compare_to == i:
            compare_to = random.randint(0, 179)
        left_input.append(image_list[i])
        right_input.append(image_list[compare_to])
        if label_list[i] == label_list[compare_to]:
            # if the images belong to the same class, label = 1
            targets.append(1.)
        else:
            # if the images belong to different classes, label = 0
            targets.append(0.)

# remove single-dimensional entries from the shapes of the arrays,
# making them ready to create the train & test datasets

# the train data: left and right image arrays and target labels
left_input = np.squeeze(np.array(left_input))
right_input = np.squeeze(np.array(right_input))
targets = np.squeeze(np.array(targets))

# Creating the test datasets: left and right images and target labels
dog_image = X_train[4]  # dog_image = 1, cat_image = 0
test_left = []
test_right = []
test_targets = []
for i in range(len(Y_train) - 180):
    test_left.append(dog_image)
    test_right.append(X_train[i + 180])
    test_targets.append(Y_train[i + 180])
test_left = np.squeeze(np.array(test_left))
test_right = np.squeeze(np.array(test_right))
test_targets = np.squeeze(np.array(test_targets))
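A quick check of the resulting dataset sizes (my addition): 180 training images times 8 pairs per image gives 1,440 training pairs; the remaining images, each paired against the reference dog image, form the validation set (22 pairs in the run below), which matches the "Train on 1440 samples, validate on 22 samples" line in the training log.

print(left_input.shape, right_input.shape, targets.shape)
# e.g. (1440, 224, 224, 3) (1440, 224, 224, 3) (1440,)
print(test_left.shape, test_right.shape, test_targets.shape)
# e.g. (22, 224, 224, 3) (22, 224, 224, 3) (22,)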

Code to train the network on a GPU:

import tensorflow as tf
import os

# select the GPU before creating the session so the setting takes effect
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)  # make Keras use this session
siamese_net.summary()

# with CUDA_VISIBLE_DEVICES masking, the selected GPU is visible as /gpu:0
with tf.device('/gpu:0'):
    siamese_net.fit([left_input, right_input], targets,
                    batch_size=16,
                    epochs=10,
                    verbose=1,
                    validation_data=([test_left, test_right], test_targets))
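The imports at the top already include ModelCheckpoint and EarlyStopping; as an optional sketch (not used in the original run), the fit call could save the best weights and stop early when the validation loss plateaus:

# Optional callbacks: pass callbacks=[ckpt, early] to siamese_net.fit(...)
ckpt = ModelCheckpoint('siamese_best.h5', monitor='val_loss', save_best_only=True)
early = EarlyStopping(monitor='val_loss', patience=5)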

Result:

__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_10 (InputLayer) (None, 224, 224, 3) 0
__________________________________________________________________________________________________
input_11 (InputLayer) (None, 224, 224, 3) 0
__________________________________________________________________________________________________
model_7 (Model) (None, 18) 126367634 input_10[0][0]
input_11[0][0]
__________________________________________________________________________________________________
lambda_4 (Lambda) (None, 18) 0 model_7[1][0]
model_7[2][0]
__________________________________________________________________________________________________
dense_12 (Dense) (None, 1) 19 lambda_4[0][0]
==================================================================================================
Total params: 126,367,653
Trainable params: 126,314,533
Non-trainable params: 53,120
__________________________________________________________________________________________________
Train on 1440 samples, validate on 22 samples
Epoch 1/10
1440/1440 [==============================] - 91s 64ms/step - loss: 0.7086 - acc: 0.5354 - val_loss: 0.6737 - val_acc: 0.5455
Epoch 2/10
1440/1440 [==============================] - 76s 53ms/step - loss: 0.5813 - acc: 0.7049 - val_loss: 0.6257 - val_acc: 0.5909
Epoch 3/10
1440/1440 [==============================] - 76s 53ms/step - loss: 0.4974 - acc: 0.8257 - val_loss: 0.6166 - val_acc: 0.5909
Epoch 4/10
1440/1440 [==============================] - 76s 53ms/step - loss: 0.4494 - acc: 0.8799 - val_loss: 0.6190 - val_acc: 0.5909
Epoch 5/10
1440/1440 [==============================] - 76s 53ms/step - loss: 0.4190 - acc: 0.9042 - val_loss: 0.5966 - val_acc: 0.6364
Epoch 6/10
1440/1440 [==============================] - 76s 53ms/step - loss: 0.3968 - acc: 0.9243 - val_loss: 0.5821 - val_acc: 0.6818
Epoch 7/10
1440/1440 [==============================] - 76s 53ms/step - loss: 0.3806 - acc: 0.9368 - val_loss: 0.5778 - val_acc: 0.6818
Epoch 8/10
1440/1440 [==============================] - 76s 53ms/step - loss: 0.3641 - acc: 0.9535 - val_loss: 0.5508 - val_acc: 0.7273
Epoch 9/10
1440/1440 [==============================] - 76s 53ms/step - loss: 0.3483 - acc: 0.9715 - val_loss: 0.5406 - val_acc: 0.7273
Epoch 10/10
1440/1440 [==============================] - 76s 53ms/step - loss: 0.3390 - acc: 0.9778 - val_loss: 0.5341 - val_acc: 0.7273

We can see that we reached about 73% accuracy on the validation set by the eighth epoch, with just 100 cat and 100 dog images, using pairs of images created from those 200 images.
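To use the trained network as a classifier at inference time (a sketch I have added; the original post stops at training), pair each new image with the reference dog image and threshold the similarity score. Here new_img is a hypothetical (224, 224, 3) array that has already gone through the same resize and preprocess_input steps as the training data:

# Score near 1 -> same class as the reference dog image; near 0 -> cat
pair_left = np.expand_dims(dog_image, axis=0)  # reference dog image
pair_right = np.expand_dims(new_img, axis=0)   # new_img: hypothetical preprocessed image
score = siamese_net.predict([pair_left, pair_right])[0][0]
print('dog' if score > 0.5 else 'cat', score)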

Conclusion:

My motivation for writing this article is that we often don't have hundreds or thousands of good-quality labelled images with which to build robust CNN-based classifiers. Alternatively, we can train a Siamese network, popularly used for facial recognition, with very few images to build a classifier. I have used this technique for one of my material defect detection use cases, where we have very few images of defective materials available. To maintain the confidentiality of that work, I have demonstrated the approach with images of cats and dogs in this blog. Please hit the clap button if you like my blog, and stay tuned for future blogs, as I regularly write about machine learning and deep learning.
