
Adversarially-Trained Classifiers for Generalizable Real World Applications

CS282A Designing and Understanding Neural Networks at UC Berkeley

Photo by Kevin Ku on Unsplash

Motivation: The Purpose of Generalizable AI

The field of computer vision continuously calls for improved accuracy in classifiers. Researchers everywhere are trying to beat the previous benchmark by small margins on one particular dataset. We think this trend is great for pushing the edge of human understanding, but we also believe there is a larger problem that has been largely underexplored: building a generalizable classifier.

So, what is a generalizable neural network? And why is generalizability important? Why can’t we just fine-tune our neural networks to each particular dataset, since that specificity lets us maintain higher overall accuracy on each of those tasks?

Generalizability refers to a machine learning model’s resistance to data perturbations that could occur in the real world (e.g. random objects in the background, image distortions). The more sensitive a model is to such randomness, the less generalizable it is. Improving generalizability allows models to perform significantly better when we deploy them to solve problems with fully unknown data distributions.

This is incredibly important because, in real-world applications where end-users supply test data in real time, we never know when the underlying data distribution will change! For this reason, a breakthrough in generalizing neural models could translate to performance improvements across multiple machine learning tasks such as autonomous driving and voice recognition.

Therefore, our group’s goal in this project is to create neural models capable of classifying unseen data with unknown perturbations.

In this article, we will discuss methods of using adversarial examples as training data as well as how to generate them. As a bonus, we will also explore CAM Visualization as a way to explain the adversarial misclassification behaviour at the very end.

Model Design Choices

Before we started the design process, we considered TensorFlow 2.0 with Keras and PyTorch. Ultimately, we went with TensorFlow and Keras since it allows for simpler implementations and a wider variety of pre-trained models. We wanted our project to ultimately be more readable and concise.

Major design choices in our projects include:

  • Incorporating Data Augmentation with random flips, random cropping, colour jittering, and common additive Gaussian, Poisson, and Salt-and-Pepper noises (a sketch of such a pipeline follows this list).
  • Selecting MobileNet v2 as the base model for having a very high accuracy-to-parameter-count ratio among pre-trained models (see the Optimization Choices section), and a reasonable model size for Tiny ImageNet, the dataset we worked with.
  • Incorporating Neural Structured Learning (NSL) Adversarial Regularization to improve robustness by injecting adversarial loss in the training process.
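A minimal sketch of such an augmentation pipeline is shown below, assuming 64×64 Tiny ImageNet images scaled to [0, 1]; the padding size, jitter strengths, and noise level are illustrative placeholders rather than our tuned settings.

import tensorflow as tf

def augment(image, label):
    # Random horizontal flip, then pad and randomly crop back to 64x64.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.resize_with_crop_or_pad(image, 72, 72)
    image = tf.image.random_crop(image, size=[64, 64, 3])
    # Colour jittering.
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_saturation(image, lower=0.9, upper=1.1)
    # Additive Gaussian noise; Poisson and Salt-and-Pepper noise can be added similarly.
    image = image + tf.random.normal(tf.shape(image), stddev=0.02)
    return tf.clip_by_value(image, 0.0, 1.0), label

train_data = train_data.map(augment)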

We also considered implementing TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization (TRADES) [Zhang et al. 2019], but ultimately left that out for future consideration.

Prior Research in Adversarial Example Generation

Even though recent neural networks for computer vision tasks have reached remarkable accuracy, they are still extremely vulnerable to small, human-imperceptible perturbations.

Previously, researchers speculated that this is due to the nonlinear components of neural networks polarizing their activations (think exploding gradients). However, in Explaining and Harnessing Adversarial Examples, the authors argued that this vulnerability is actually due to the linear components of neural networks, such as those in ReLUs and LSTMs. Even in the case of the nonlinear sigmoid function, the model will often keep most values in the linear regime (i.e. where inputs are close to 0), further supporting this finding.

Image by Goodfellow, Shlens, and Szegedy

In this paper, the authors also described a fast way to generate adversarial examples along with an adversarial training method, which we used. The authors introduced the Fast Gradient Sign Method (FGSM) to efficiently generate adversarial examples: calculate the gradient with respect to the input, and then perturb the input such that:

input = input + epsilon * sign(gradient)

Below is an example of FGSM applied to a logistic regression model trained on MNIST threes and sevens in the original paper.

Image by Goodfellow, Shlens, and Szegedy

Perturbation was applied directly to the sevens and inverted on the threes to make the model misclassify the sevens. FGSM can be incorporated directly into the loss function, which results in an additional regularization effect.
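As a rough illustration (not the paper's original code), an FGSM perturbation in TensorFlow might look like the sketch below; model, loss_fn, and epsilon are assumed to be defined elsewhere.

import tensorflow as tf

def fgsm_perturb(model, loss_fn, images, labels, epsilon):
    # Compute the gradient of the loss with respect to the input images.
    with tf.GradientTape() as tape:
        tape.watch(images)
        loss = loss_fn(labels, model(images))
    gradient = tape.gradient(loss, images)
    # Step in the direction that increases the loss: x + epsilon * sign(grad).
    adversarial = images + epsilon * tf.sign(gradient)
    return tf.clip_by_value(adversarial, 0.0, 1.0)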

The authors cited better final model accuracy and increased robustness: the error rate on adversarial examples was reduced to just 17.9%, from 89.4% in the base model.

Model Set-up: Training and Validating

For our model, we used an adversarial wrapper around our model during training time only. We won’t include all the code here, but the general framework should look like this.

import neural_structured_learning as nsl
from tensorflow import keras

base = create_model('base', dim, 2, len(classes))
adv_model = nsl.keras.AdversarialRegularization(base, adv_config=config, ...)
adv_model.compile(optimizer=keras.optimizers.SGD(), ...)

The hyperparameters that matter most for the wrapper are the multiplier and the adversarial step size. The multiplier had a significant influence on regularization strength, and the step size controls the perturbation used to find the adversarial examples, including later on during validation.

config = nsl.configs.make_adv_reg_config(
    multiplier=0.2,            # weight of the adversarial loss term
    adv_step_size=0.2,         # magnitude of the adversarial perturbation
    adv_grad_norm='infinity',  # norm used to constrain the perturbation
)

The training-time procedure is basically the same as for other Keras models, but make sure that your datasets are converted to dictionaries instead of tuples, since you are feeding the data to the wrapper, not the actual classifier.

def convert(image, label):
  # The keys must match what the wrapper expects (the label key must appear in its label_keys).
  return {IMAGE_INPUT_NAME: image, LABEL_INPUT_NAME: label}

train_data_adv = train_data.map(convert)
val_data_adv = val_data.map(convert)

During validation, be sure to also create a base reference model to check that your adversarial wrapper training is working. You should see significantly higher performance from your adversarial model on perturbed data, and only marginally lower performance on unperturbed data.
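One way to generate the perturbed data, following the pattern in the Neural Structured Learning tutorials, is to wrap the trained classifier once more purely as a perturbation generator and call perturb_on_batch; the loss and metric arguments below are placeholder assumptions, and the same loop can be run with both the adversarially trained classifier and the plain reference model to produce the comparison shown next.

import tensorflow as tf

# Wrapped copy of the trained classifier, used only to generate adversarial batches.
reference_model = nsl.keras.AdversarialRegularization(
    base, label_keys=[LABEL_INPUT_NAME], adv_config=config)
reference_model.compile(optimizer='sgd',
                        loss='sparse_categorical_crossentropy',
                        metrics=['accuracy'])

correct, total = 0, 0
for batch in val_data_adv:
    perturbed = reference_model.perturb_on_batch(batch)
    # Clip back to a valid pixel range after the perturbation.
    images = tf.clip_by_value(perturbed[IMAGE_INPUT_NAME], 0.0, 1.0)
    labels = tf.cast(tf.squeeze(perturbed[LABEL_INPUT_NAME]), tf.int64)
    preds = tf.argmax(base(images), axis=-1)
    correct += int(tf.reduce_sum(tf.cast(preds == labels, tf.int32)))
    total += int(tf.shape(preds)[0])

print('accuracy on perturbed validation data:', correct / total)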

Image by Author, adversarial accuracy comparison

Lastly, remember to use the base model defined at the beginning during test time and validation. The adversarially wrapped model should only be used at training time, and even if you save the wrapped model’s weights using the standard Keras API, it will only save the weights of the base model. So make sure you always load the weights into the base model first, then add the wrapper, or you will have mismatch issues.
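A minimal sketch of that order, assuming the same create_model helper and the dim and classes variables from above; since the wrapper shares its weights with base, saving base after training captures everything needed at test time.

# After adversarial training, save the underlying classifier's weights.
base.save_weights('adv_trained_weights.h5')

# Test time / validation: rebuild the plain classifier and load the weights into it
# before re-applying any adversarial wrapper.
test_model = create_model('base', dim, 2, len(classes))
test_model.load_weights('adv_trained_weights.h5')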

Optimization Choices

We used SGD in favour of Adam, based on previous experience and papers suggesting SGD is more suitable for computer vision tasks. For the same reason, instead of deciding the number of epochs for fine-tuning different layers in advance, we moved on to higher layers as the training loss plateaued (a rough sketch of this schedule is shown below).
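One way to implement such a schedule is sketched here; the number of unfrozen layers, the learning rate, the momentum, and the patience are illustrative assumptions rather than our exact settings.

from tensorflow import keras

# Stop each stage once the training loss stops improving.
plateau = keras.callbacks.EarlyStopping(monitor='loss', patience=3,
                                        restore_best_weights=True)

# Progressively unfreeze more of the backbone at each stage, keeping SGD throughout.
for num_trainable in (20, 60, len(base.layers)):
    for layer in base.layers[-num_trainable:]:
        layer.trainable = True
    adv_model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
    adv_model.fit(train_data_adv, validation_data=val_data_adv,
                  epochs=30, callbacks=[plateau])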

Since we were using Keras as our framework of choice, we also tested the same training method on other network architectures:

  • ResNeXt, DenseNet
  • NASNet, NASNet Mobile
  • EfficientNet B2, B4, B6

NASNet tended to overfit due to its high number of parameters, and we found that MobileNet V2 performed the best. EfficientNet was a close contender, but we were unable to use it due to its late addition to TensorFlow (it was only available in tf-nightly, so we ran into new technical issues almost every day).

We will skip over the process of choosing batch sizes and learning rates, since methods for optimizing those are ubiquitous in deep learning papers and other Medium posts. We will simply note that they played a big role in training performance and varied across the different models.

Image by Author, NASNet overfitting the Tiny ImageNet dataset

[EXTRA] Explainable AI Challenge

Current methods of classification take inputs and produce classes without any decipherable explanation of context or results. Effectively, this makes deep learning algorithms a black box for researchers and engineers. In this project, we tackled the Explainable AI Challenge by offering our own explanations for adversarial behaviour and general misclassification.

For the next few sections, saliency refers to the unique features of a particular input in the context of visual processing. Basically, saliency visualization methods emphasize the visually salient locations in an image that could have "contributed to" the neural network making a particular classification decision.

There are many choices for which type of saliency to visualize. Some examples include linear activations or noise generation using guided backpropagation, but we don’t discuss them here. We simply changed the last layer to a linear activation to see which pixels had the biggest influence on the classification decision, using the keras-vis module. The two saliency maps in each figure show the positive gradients and negative gradients respectively.

We will note here that keras-vis is slightly outdated and might not work with some versions of TensorFlow 2.0 Keras.
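A rough sketch of this workflow with keras-vis is below; the grad_modifier values for separating positive and negative gradients are our assumptions, and img and class_idx are placeholders for a preprocessed image and its predicted class index.

from tensorflow import keras
from vis.utils import utils
from vis.visualization import visualize_saliency

model = base  # the trained classifier from earlier

# Swap the final softmax for a linear activation so the gradients are not squashed.
model.layers[-1].activation = keras.activations.linear
model = utils.apply_modifications(model)

# Saliency maps with respect to the (now linear) output layer.
pos_map = visualize_saliency(model, layer_idx=-1, filter_indices=class_idx,
                             seed_input=img, grad_modifier='relu')
neg_map = visualize_saliency(model, layer_idx=-1, filter_indices=class_idx,
                             seed_input=img, grad_modifier='negate')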

Image by Author

Our model was able to correctly predict the cockroach label. We can see that the model was able to do this easily, since the linear activation maximizations in the last layer clearly show the shape of the cockroach.

Image by Author

Our model did not classify this object correctly. This can be explained by the fact that Tiny ImageNet has generally lower resolution, resulting in similar activations for objects with similar shapes. The misclassification here was Broom (n02906734 in ImageNet). We can also attribute this to our random crops during training data augmentation, where we might have cropped out the broom handle.

In the following syringe example, the activations from these images are very similar, so our model decided it was a toss-up between Oboe, Syringe, Broom, and Beer Bottle, which were amongst the top 5 predictions.

Image by Author

The misclassification was Beer Bottle (n02823428), a reasonable guess, perhaps?

We also visualized some of the adversarial training examples that we generated. The objects are recognizable to humans, but to a neural network, they are often masked with some type of activation nullification. In the following examples, it is clear that the activations no longer resemble the original object shapes in the way seen earlier.

Image by Author

Activations and inhibitions become random and distorted.

Image by Author

Activation and inhibition are obscured.

