Neural Structured Learning & Adversarial Regularization

Improving Classification Model Robustness with Adversarial Regularization in TensorFlow

Chris Price
Towards Data Science


Image: author’s own

Introduction

As many of us are no doubt aware, the steady progress made in the field of Computer Vision has led to some incredible achievements across multiple disciplines, from healthcare and self-driving cars to climate study and gaming, to name but a few.

From state-of-the-art, liquid-cooled hardware in the form of Tensor Processing Units (TPUs), to increasingly sophisticated, multi-million-parameter deep convolutional networks such as GoogLeNet and AlexNet, the capability of such technology continues to break previously unassailable barriers.

Adversarial Vulnerability

Despite these incredible achievements, it has been shown that even the most skilful models are not infallible. Multiple research efforts have demonstrated how sensitive these models are to even imperceptibly small changes in the input data. First highlighted in the joint Google and New York University research paper ‘Intriguing properties of neural networks’ (2014), model vulnerability to adversarial examples is now recognised as a subject of such importance that competitions exist to tackle it.

The existence of these errors raises a variety of questions about out-of-sample generalization, and about how such examples might be used to abuse deployed systems.

Neural Structured Learning

In some applications these errors may not be introduced intentionally; they can arise as a result of human error or simply from input instability. In the mining industry, computer vision has innumerable, highly useful applications, from streaming processing-plant conveyor belt imagery in order to predict ore purity, to detecting commodity stockpile levels and illegal shipping/mining using satellite imagery.

Quite often we find that such image data is corrupted during collection as a result of camera misalignment or vibration, or contains highly unusual out-of-sample examples, any of which can lead to misclassification.

In order to overcome examples such as these and generally improve our models against corrupt or perturbed data, we can employ a form of Neural Structured Learning called Adversarial Regularization.

Neural Structured Learning (NSL) is a relatively new, open-source framework developed by the good folks at TensorFlow for training deep neural networks with structured signals (as opposed to the conventional single sample). NSL implements Neural Graph Learning, in which a neural network is trained using graphs (see image below) which carry information about both a target (node) and neighbouring information in other nodes connected via node edges.

Image from TensorFlow Blog: Introducing Neural Structured Learning in TensorFlow, 2019

In doing so, the trained model can simultaneously exploit both labelled and unlabelled data through:

  1. Training the model on labelled data (standard procedure in any supervised learning problem);
  2. Biasing the network to learn similar hidden representations for neighbouring nodes on a graph (with respect to the input data labels)
Image from TensorFlow Blog: Neural Structured Learning, Adversarial Examples, 2019.

Consistent with point two, we can observe in the above expression the minimisation of both the empirical loss, i.e. the supervised loss, and the neighbour loss. In the above example, the neighbour loss is computed as the product of a weight term and a distance measure (e.g. L1 or L2 distance) between the hidden representation of the input, X, and that of the same input with some degree of noise added to it:
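For reference, the general NSL objective shown in that figure can be written in LaTeX notation as follows (a paraphrase of the TensorFlow blog; the symbols are mine and may differ slightly from the image):

\mathcal{L}(\theta) = \sum_{i} \mathcal{L}_s\big(y_i, g_\theta(x_i)\big) + \alpha \sum_{i} \sum_{j \in \mathcal{N}(i)} w_{ij}\, d\big(h_\theta(x_i), h_\theta(x_j)\big)

where the first term is the supervised (empirical) loss, \mathcal{N}(i) denotes the neighbours of sample i, w_{ij} is the edge weight between samples i and j, and d(\cdot,\cdot) is the chosen distance between hidden representations h_\theta.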

Adversarial examples are typically created by computing the gradient of the loss with respect to the input, x_i, and then perturbing the input so as to maximise that loss. For example, given a model that classifies Chihuahuas and muffins, to create adversarial examples you would feed a 128 x 128 pixel Chihuahua image into your network, compute the gradient of the loss w.r.t. the input (a tensor of the same shape as the image), then add a small perturbation in the direction of that gradient (for instance its sign, as in the Fast Gradient Sign Method) until the network classifies the image as a muffin. By training on these generated images again, with the correct label, the network becomes more robust to noise/perturbation.
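To make this concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM) in TensorFlow. It assumes a Keras classifier called model with inputs scaled to [0, 1]; the function name and epsilon value are purely illustrative and not part of the NSL library:

import tensorflow as tf

def fgsm_example(model, images, labels, epsilon=0.01):
    """Return adversarially perturbed copies of `images` (FGSM sketch)."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(images)
        loss = loss_fn(labels, model(images))
    # Move *up* the loss surface: step in the direction of the gradient's sign.
    gradient = tape.gradient(loss, images)
    adversarial = images + epsilon * tf.sign(gradient)
    # Keep pixel values in the valid range.
    return tf.clip_by_value(adversarial, 0.0, 1.0)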

Why use NSL?

  • Higher accuracy: the structured signal(s) among samples can provide information that is not always available in feature inputs.
  • Greater Robustness: models trained with adversarial examples are demonstrably more robust against adversarial perturbations designed to mislead a model’s prediction or classification.
  • Less labelled data required: NSL enables neural networks to harness both labelled and unlabelled data, forcing the network to learn similar hidden representations for “neighbouring samples” that may or may not have labels.

Adversarial Regularisation

What can we do if we do not have such explicit structures available as inputs?

What is particularly useful about TensorFlow’s Neural Structured Learning library is that it provides methods enabling users to dynamically construct induced adversarial examples as implicit structures from raw input data, through adversarial perturbation. This generalisation of NSL is known as Adversarial Regularisation, where adversarial examples are constructed to intentionally confuse the model during training, resulting in models that are robust against small input perturbations.
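In similar notation, the adversarially regularised training objective roughly becomes (shown for the infinity-norm case; this is my paraphrase rather than the library’s exact formulation):

\mathcal{L}(\theta) = \mathcal{L}_{emp}(\theta) + \lambda\, \mathcal{L}\big(y, g_\theta(x + \delta)\big), \qquad \delta = \epsilon \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(y, g_\theta(x))\big)

where the adversarial ‘neighbour’ x + \delta is generated on the fly from each training example; \lambda and \epsilon correspond loosely to the multiplier and adv_step_size hyperparameters configured later in this article.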

Adversarial Regularisation In Practice

In the following example, we are going to compare the performance of a baseline image classification model (specifically, a Convolutional Neural Network), against a variant that utilises adversarial regularisation. Unfortunately, we cannot demonstrate the use of AR on any of the aforementioned mining data, as this is proprietary.

We will instead perform this analysis with two models trained on a well-known image classification dataset, Beans. We will compare the results of the baseline model against one trained on adversarial examples, in order to understand the effect that adversarial regularisation has on the performance of each model.

The Colab notebook containing the code used in this article can be found here. An excellent tutorial, where the inspiration for this article and where some of the code originated from, can be found on the TensorFlow NSL page.

Before we get started, we must first install TensorFlow’s Neural Structured Learning package:

!pip install neural_structured_learning

Imports

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import keras_preprocessing
import neural_structured_learning as nsl
import tensorflow as tf
import tensorflow_datasets.public_api as tfds
from tensorflow.keras import models
from keras_preprocessing import image
from keras_preprocessing.image import ImageDataGenerator

Load & Inspect Image Data

TensorFlow hosts a number of renowned datasets within its TensorFlow Datasets collection.

We can load the Beans dataset that we want to train our model on using the tfds.load() method, which performs two operations:

  1. Downloads the dataset and saves it as tfrecord files.
  2. Loads the tfrecord files and returns an instance of tf.data.Dataset
# load dataset
dataset = 'beans' #@param
dataset = tfds.load(dataset, shuffle_files=True)
train, test = dataset['train'], dataset['test']
IMAGE_INPUT_NAME = 'image'
LABEL_INPUT_NAME = 'label'

Prior to performing any image scaling or image augmentation/perturbation, we can inspect a sample of the images within the dataset to gain an understanding of the various structures and compositions that a Convolutional layer might pick up as a feature(s), and to understand the differences between the various classes within the dataset:

# Get a random batch of 10 images
raw_images = train.take(10)
# Convert the tensors to np.array format
raw_images = [item['image'] for item in
              raw_images.as_numpy_iterator()]
# Plot the batch
fig = plt.gcf()
fig.set_size_inches(10, 10)
for i, img in enumerate(raw_images):
    sp = plt.subplot(2, 5, i+1)
    sp.axis('Off')
    plt.imshow(img)
plt.show()

By default, the tf.data.Dataset object contains a dict of tf.Tensors. We can iterate over the batch of images (the tf.data.Dataset key values) by calling .as_numpy_iterator() on raw_images within our list comprehension. This method returns a generator that converts the batch elements of the dataset from tf.Tensor to np.array format. We can then plot the resulting batch of images:

Image Generated by Author: Sample batch of 10 training images from the ‘Beans’ dataset, depicting its three distinct classes: ‘Healthy’, ‘Bean Rust’ and ‘Angular Leaf Spot’

Preprocessing

We perform a simple scaling operation on our image data to map the inputs to a float tensor between 0 and 1 (the Beans dataset is a collection of 500 x 500 x 3 images). Helpfully, TFDS datasets store feature attributes as dictionaries:

FeaturesDict({
'image': Image(shape=(500, 500, 3), dtype=tf.uint8),
'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
})
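As an aside, one way to inspect this schema programmatically (not part of the original pipeline, and reusing the tfds import from above) is to reload the dataset together with its DatasetInfo object:

# Reload with metadata to inspect the feature schema
dataset, info = tfds.load('beans', shuffle_files=True, with_info=True)
print(info.features)                 # the FeaturesDict shown above
print(info.features['label'].names)  # human-readable class names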

As a result, we can access the individual images and their labels and perform these preprocessing ops in-place with the .map() attribute of our train and test tf.Dataset instances:

def normalize(features):
    """Scale images to within the 0-1 bound, based on the max image size."""
    features[IMAGE_INPUT_NAME] = tf.cast(
        features[IMAGE_INPUT_NAME], dtype=tf.float32) / 500.0
    return features

def examples_to_tuples(features):
    return features[IMAGE_INPUT_NAME], features[LABEL_INPUT_NAME]

def examples_to_dict(image, label):
    return {IMAGE_INPUT_NAME: image, LABEL_INPUT_NAME: label}

# Define train and test sets, preprocess. (Note: inputs shuffled on load)
train_dataset = train.map(normalize).batch(28).map(examples_to_tuples)
test_dataset = test.map(normalize).batch(28).map(examples_to_tuples)

The function examples_to_dict will be explained shortly.

Baseline Model

We then build a simple, baseline convolutional neural network model and fit it to our image data:

def conv_nn_model(img_input_shape: tuple) -> tf.keras.Model:
    """Simple Conv2D Neural Network.
    Args:
        img_input_shape: An (m x n x o) tuple defining the input image shape.
    Returns:
        model: An instance of tf.keras.Model.
    """
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(16, (3, 3), activation='relu',
                               input_shape=img_input_shape),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        # Note: adjust the output layer for the number of classes
        tf.keras.layers.Dense(3, activation='softmax')])
    return model
# Beans dataset image dims (pixels x pixels x channels)
input_shape = (500, 500, 3)
# Establish baseline
baseline_model = conv_nn_model(input_shape)
baseline_model.summary()
baseline_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['acc'])
baseline_history = baseline_model.fit(
    train_dataset,
    epochs=5)

Our baseline Conv2D model architecture (output of baseline_model.summary())

results = baseline_model.evaluate(test_dataset)
print(f'Baseline Accuracy: {results[1]}')
3/3 [==============================] - 0s 72ms/step - loss: 0.1047 - acc: 0.8934
Baseline Accuracy: 0.8934375

We can see that our baseline model has performed well on the test dataset, achieving 89% accuracy.

Adversarial Regularization Model

We will now examine how this model performs against a test set that includes adversarially perturbed examples, and pitch it against a model trained on a dataset that includes said examples. We proceed by first creating another convolutional NN model, only this time we will incorporate adversarial training into its training objective.

Next, using TensorFlow’s NSL framework, we define a config object with NSL’s helper function, nsl.configs.make_adv_reg_config:

#@title ADV Regularization Config
# Create new CNN model instance
base_adv_model = conv_nn_model(input_shape)
# Create AR config object
adv_reg_config = nsl.configs.make_adv_reg_config(
    multiplier=0.2,
    adv_step_size=0.2,
    adv_grad_norm='infinity')
# Model wrapper
adv_reg_model = nsl.keras.AdversarialRegularization(
    base_adv_model,
    label_keys=[LABEL_INPUT_NAME],
    adv_config=adv_reg_config)

We can note that this function requires us to set a number of hyperparameters. Some of these can be left at their default values; others require our input:

  • multiplier: the weight of the adversarial loss relative to the labelled loss in our AR model’s training objective. We apply 0.2 as the regularisation weight.
  • adv_step_size: The degree/magnitude of adversarial perturbation to be applied during training.
  • adv_grad_norm: the tensor norm (‘l1’, ‘l2’ or ‘infinity’) used to normalise the gradient, i.e. a measure of the magnitude of the adversarial perturbation. Defaults to ‘l2’; here we use the infinity norm.

We can then wrap our newly created model using the nsl.keras.AdversarialRegularization function, which will add the adversarial regularisation we configured earlier with our adv_reg_config object to the training objective (the loss function to be minimised) of our base model.

An important point to note at this stage is that our model expects its input to be a dictionary mapping feature names to feature values. Notice that when we instantiate our adversarial model, we must pass in label_keys as a parameter. This enables our model to distinguish between input data and target data. Here, we can use our examples_to_dict function and map it to our training and test datasets:

train_set_for_adv_model = train_dataset.map(examples_to_dict)
test_set_for_adv_model = test_dataset.map(examples_to_dict)

We then compile, fit and evaluate our adversarially regularised model as normal:
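A minimal sketch of that step, assuming the same optimiser, loss and number of epochs as the baseline, would look like this:

adv_reg_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['acc'])
adv_history = adv_reg_model.fit(
    train_set_for_adv_model,
    epochs=5)
adv_results = adv_reg_model.evaluate(test_set_for_adv_model)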

4/4 [==============================] - 0s 76ms/step - loss: 0.1015 - sparse_categorical_crossentropy: 0.1858 - sparse_categorical_accuracy: 0.8656 - scaled_adversarial_loss: 0.1057  accuracy: 0.911625

Similarly, our adversarially regularised model generalises well to our test dataset, achieving similar accuracy (roughly 91%) to that of our baseline_model.

Evaluation Against Adversarially Perturbed Data

Now for the interesting part.

In much the same way that one would evaluate a trained model’s ability on a test set, we shall perform the same operation on our two models. In this instance, however, we will compare our two models, the baseline CNN and the variant trained on adversarially perturbed input data, against a test dataset containing adversarially perturbed examples.

In order to generate the aforementioned examples, we must first create a reference model, whose configuration (losses, metrics and calibrated/learned weights) will be used to generate the perturbed examples. To do so, we once again wrap our baseline model with the nsl.keras.AdversarialRegularization function and compile it (note that we do not fit this model to our dataset, as we want to retain the same learned weights as our base model):

# Wrap the baseline model
reference_model = nsl.keras.AdversarialRegularization(
    baseline_model,
    label_keys=[LABEL_INPUT_NAME],
    adv_config=adv_reg_config)
reference_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['acc'])
# Models to evaluate and their accuracy metrics
models_to_eval = {
    'base': baseline_model,
    'adv-regularized': adv_reg_model.base_model}
metrics = {
    name: tf.keras.metrics.SparseCategoricalAccuracy()
    for name in models_to_eval.keys()}

If at this point you are like me and like to understand the logic behind these things, you can find the source code containing the adversarial regularization class here.

We then store our two models, the baseline and the adversarially regularised variant, in a dictionary, and subsequently loop over each batch of our test dataset (evaluation in batches is a requirement of the AdversarialRegularization model).

With the .perturb_on_batch() method of our newly wrapped reference_model, we can generate adversarially perturbed batches consistent with our adv_reg_config object, and evaluate the performance of our two models on them:

# Generate perturbed batches and evaluate both models on them
labels, y_preds = [], []
for batch in test_set_for_adv_model:
    perturbed_batch = reference_model.perturb_on_batch(batch)
    # Clip perturbed pixel values back into the valid 0-1 range
    perturbed_batch[IMAGE_INPUT_NAME] = tf.clip_by_value(
        perturbed_batch[IMAGE_INPUT_NAME], 0.0, 1.0)
    # Drop the label from the batch
    y = perturbed_batch.pop(LABEL_INPUT_NAME)
    labels.append(y.numpy())
    y_preds.append({})
    for name, model in models_to_eval.items():
        y_pred = model(perturbed_batch)
        metrics[name](y, y_pred)
        y_preds[-1][name] = tf.argmax(y_pred, axis=-1).numpy()
for name, metric in metrics.items():
    print(f'{name} model accuracy: {metric.result().numpy()}')

>> base model accuracy: 0.2201466 adv-regularized model accuracy: 0.8203125

Results

The effectiveness of adversarial learning on improving model robustness is immediately apparent by the dramatic reduction in our baseline model’s performance on adversarially perturbed data, vs that of the adv_reg_model.

The baseline model’s accuracy dropped from roughly 89% on clean test data to around 22% on the perturbed test set, whereas our adversarially regularised model fell only from roughly 91% to 82%.

With the Keras Layers API, we can examine the effect of adversarially perturbed data on our baseline model by visualising its convolutional layers, to understand what features are extracted both before and after perturbation:

Before perturbation

# Random image & convolution-filter indices
IDX_IMAGE_1 = 2
IDX_IMAGE_2 = 5
IDX_IMAGE_3 = 10
CONVOLUTION_NUMBER = 10
# Preprocessed test images as np arrays (one possible definition;
# this variable is assumed rather than defined earlier in the article)
test_images = [item[IMAGE_INPUT_NAME] for item in
               test.map(normalize).as_numpy_iterator()]
# Get baseline_model layer outputs
layer_outputs = [layer.output for layer in baseline_model.layers]
activation_model = tf.keras.models.Model(
    inputs=baseline_model.input,
    outputs=layer_outputs)
# Plot each image at the specified convolution filter,
# for the first two layers (x = layer index)
f, axarr = plt.subplots(3, 2, figsize=(8, 8))
for x in range(0, 2):
    f1 = activation_model.predict(
        test_images[IDX_IMAGE_1].reshape(1, 500, 500, 3))[x]
    axarr[0, x].imshow(f1[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[0, x].grid(False)
    f2 = activation_model.predict(
        test_images[IDX_IMAGE_2].reshape(1, 500, 500, 3))[x]
    axarr[1, x].imshow(f2[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[1, x].grid(False)
    f3 = activation_model.predict(
        test_images[IDX_IMAGE_3].reshape(1, 500, 500, 3))[x]
    axarr[2, x].imshow(f3[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[2, x].grid(False)
Image Generated By Author: Intermediate image representations for a given convolutional layer

We can observe in the image above that our baseline model appears to have identified the relevant distinguishing features that define each class (angular leaf spot, bean rust and healthy), made visible by the distinct colour gradients.

After perturbation

Now we can examine what features the baseline model identifies in the perturbed data:

# Perturbed test data
perturbed_images = []
for batch in test_set_for_adv_model:
    perturbed_batch = reference_model.perturb_on_batch(batch)
    perturbed_batch[IMAGE_INPUT_NAME] = tf.clip_by_value(
        perturbed_batch[IMAGE_INPUT_NAME], 0.0, 1.0)
    perturbed_images.append(perturbed_batch)
# Get the perturbed image tensors
pt_img = [item['image'] for item in perturbed_images]
IDX_IMAGE_1 = 0
IDX_IMAGE_2 = 1
IDX_IMAGE_3 = 2
CONVOLUTION_NUMBER = 11
# Rebuild the activation model over the baseline layers
base_mod_layer_out = [layer.output for layer in baseline_model.layers]
base_mod_activ = tf.keras.models.Model(
    inputs=baseline_model.input,
    outputs=base_mod_layer_out)
# x: layer index, reused from the plotting loop above
f1 = base_mod_activ.predict(pt_img[IDX_IMAGE_1].numpy())[x]
f2 = base_mod_activ.predict(pt_img[IDX_IMAGE_2].numpy())[x]
f3 = base_mod_activ.predict(pt_img[IDX_IMAGE_3].numpy())[x]
Image Generated by Author: Intermediate image representations; adversarially perturbed data.

As we can observe in the above representations, the network has struggled to represent the raw pixels in each image, and the resulting feature maps are considerably more abstract as a result of the perturbation. In the image to the far right, it would appear that the network managed to retain the features representative of the ‘angular leaf spot’ class, but the basic structure of the leaf is mostly lost. Of course, this is just a single convolutional layer within our untuned network, but it still serves as a credible demonstration of how a previously skilful model can be unseated by adversarial input data.

Conclusion

In this article we examined how we can significantly increase a convolutional neural network model’s robustness and generalisation performance on adversarially perturbed data, using adversarial regularisation. Additionally, we explored:

  • How to add adversarial regularisation to a Keras model.
  • How to compare an adversarially regularised model against a baseline performance model.
  • How to examine the effect of adversarially perturbed data on a conventionally trained model by visualising intermediate layers.

Please do comment if you find errors or have any constructive criticism/builds.

Thank you for reading.
