
Using Distillation to Protect Your Neural Networks

How Distillation is Being Used to Keep Neural Networks Safe

Distillation is a hot research area. In distillation, you first train a deep learning model, the teacher network, to solve your task. Then, you train a student network, which can be any model.

While the teacher is trained on real data, the student is trained on the teacher’s outputs: it learns to predict whatever the teacher produces, whether that is the predicted label, the probability distribution over labels, or something else.
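To make this concrete, here is a minimal PyTorch sketch of that idea; the temperature value and function names are illustrative assumptions, not anything prescribed in this post. The frozen teacher produces softened probabilities, and the student is trained to match them.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between the softened teacher and student distributions.

    A higher temperature softens the distributions, exposing more of the
    teacher's knowledge about which classes it considers similar.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures
    return F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2

def train_student_step(student, teacher, x, optimizer, temperature=4.0):
    """One training step: the teacher is frozen, the student learns from its outputs."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(x)  # soft labels from the teacher
    student_logits = student(x)
    loss = distillation_loss(student_logits, teacher_logits, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```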

Photo by Charles Deluvio on Unsplash

Note: There are variants of distillation. One is self-distillation, where the teacher and student networks are the same!


Photo by Bermix Studio on Unsplash

Benefits of Distillation

Distillation offers several benefits. For example, you can pick a student network that is smaller than the teacher network. You will be able to achieve higher accuracy than training the student model from scratch, while ending up with a smaller, less complex, faster model than the teacher.

Distillation also provides regularization, making your network more generalizable. In this post, we will discuss a major use of distillation: protecting your neural networks from attacks!


Why It Works

As discussed in a past post on rethinking regularization, distillation leads to flatter local minima. As a result, small changes to the input data are less likely to change the predicted values. Why does that matter? Attackers can create adversarial examples: real inputs with small changes (e.g., a few modified pixels) that result in wrong predictions.

This is part of the reason that distillation was included in my last post on COVID mask prediction. Without distillation, it would be quite easy to fool the mask prediction model by modifying a few pixels.

One particularly dangerous example: self-driving cars. Imagine if a hacker taped a sticker to stop signs. The sticker could look like part of the original stop sign, but a change to 2% of the pixels could make self-driving cars miss the stop sign entirely. These changes are called adversarial perturbations. With distillation, hackers would need to use a more tailored attack or change more pixels, which could be noticeable.
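To show how little an attacker needs to change, here is a hedged sketch of the fast gradient sign method (FGSM), one common way such adversarial perturbations are generated; the model and epsilon value are placeholders, not part of the original post.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, true_label, epsilon=0.01):
    """Fast gradient sign method: nudge each pixel slightly in the direction
    that most increases the loss, producing a near-identical adversarial input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), true_label)
    loss.backward()
    # Each pixel moves by at most epsilon, so the image looks unchanged to a human
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()
```

With flatter minima, the gradient signal this attack relies on is weaker, so the same small epsilon is less likely to flip the prediction.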


Going Forward

Photo by Willian Justen de Vasconcellos on Unsplash

You often do not need foolproof security, just better security than similar targets. It would be ideal if we could find a way to make every neural network highly secure, but that looks to be out of reach for now.

Security is a cat-and-mouse game, with attacks often advancing faster than defenses. Distillation is a basic defense that can make your networks more robust to attacks. You could use libraries like KD_Lib to try distilling your networks today.
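As a starting point, the sketch below follows the VanillaKD workflow from KD_Lib’s documentation; treat the exact class names and arguments as assumptions and check the library’s docs, since the API may have changed.

```python
import torch
from KD_Lib.KD import VanillaKD

# teacher_model, student_model, train_loader, and test_loader are assumed to be
# defined elsewhere (any torch.nn.Module models and standard DataLoaders).
teacher_optimizer = torch.optim.SGD(teacher_model.parameters(), lr=0.01)
student_optimizer = torch.optim.SGD(student_model.parameters(), lr=0.01)

distiller = VanillaKD(teacher_model, student_model, train_loader, test_loader,
                      teacher_optimizer, student_optimizer)
distiller.train_teacher(epochs=5)   # first train the teacher on real data
distiller.train_student(epochs=5)   # then distill it into the student
distiller.evaluate(teacher=False)   # check the student's accuracy
```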

As attack variants evolve to sidestep defenses, distillation variants evolve in turn to resist those attacks. Distillation will make attacks harder, not impossible. It is one step in your security toolbox.
