Avoiding Detection with Adversarial T-shirts

Understanding the research behind a new t-shirt pattern that fools state-of-the-art human detection systems

Param Raval
Towards Data Science


Photo by Lianhao Qu on Unsplash

How can simply wearing a specific type of t-shirt make you invisible to person detection and human surveillance systems? Well, researchers have found and exploited the Achilles’ heel of deep neural networks, the framework behind some of the best object detectors out there (YOLOv2, Faster R-CNN, and HRNetv2, to name a few).

Earlier approach:

In [1], the authors achieve a benchmark attack success rate of 57% in real-world use cases. However, this is not the first attempt to deceive an object detector. In [2], the authors designed a way for their model to learn and generate patches that could deceive the detector. This patch, when carried on a piece of cardboard (or any flat surface), could evade the person detector, albeit with a success rate of only 18%.

From [2]. Left: The person without a patch is successfully detected. Right: The person holding the patch is ignored.

“Confusing” or “fooling” a neural network like this is known as a physical adversarial attack, or real-world adversarial attack. Such attacks, originally based on intricately altered pixel values, confuse the network (relative to its training data) into labeling the object as “unknown” or simply ignoring it.
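To get a feel for how small pixel changes can flip a network’s prediction, here is a minimal sketch of a classic digital attack, the Fast Gradient Sign Method. This is not the method used in [1] or [2]; `model`, `image`, and `label` are placeholders for any PyTorch classifier and its input.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.03):
    """Fast Gradient Sign Method: nudge every pixel a tiny step in the
    direction that increases the loss, within an epsilon budget."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # A single signed gradient step is often enough to change the prediction
    adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1)
    return adversarial.detach()
```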

The authors of [2] transform the images in their training data, apply an initial patch, and feed the resulting images into the detector. The object loss obtained is then used to update the pixel values of the patch, with the aim of minimising the objectness score.

From [2]. Generating patches and getting the object loss.
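A rough PyTorch sketch of that optimisation loop may make the idea more concrete. This is not the authors’ code: `person_loader`, `detector`, and `apply_patch` are hypothetical placeholders for the training images, the pretrained detector’s objectness scores, and the routine that pastes the (scaled and rotated) patch onto each person.

```python
import torch

# Start the patch from random noise and optimise its pixels directly
patch = torch.rand(3, 300, 300, requires_grad=True)
optimizer = torch.optim.Adam([patch], lr=0.03)

for images, person_boxes in person_loader:             # images of people (placeholder loader)
    patched = apply_patch(images, patch.clamp(0, 1), person_boxes)
    obj_scores = detector(patched)                      # objectness score per person box
    loss = obj_scores.max(dim=1).values.mean()          # suppress the strongest "person" response
    optimizer.zero_grad()
    loss.backward()                                     # gradients flow back into the patch pixels
    optimizer.step()
```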

However, besides the low success rate of 18%, this approach is limited to rigid carriers like cardboard and doesn’t perform well when the captured frame contains a distorted or skewed patch. Moreover, it certainly doesn’t work well when printed on t-shirts.

“A person’s movement can result in significantly and constantly changing wrinkles (aka deformations) in their clothes” [1]. This makes the task of developing a generalised adversarial patch even more difficult.

New Approach:

The new approach in [1] employs Thin Plate Spline (TPS) mapping to model cloth deformations. These deformations are precisely the realistic problem that defeated previous attempts at using adversarial patterns. Accounting for different deformations drastically improves the system’s performance, since the pattern can then evade detection in a far larger number of frames.

Understanding splines is enough to get a rough idea of what the authors are trying to do with this approach.

Splines:

For a more formal, mathematical definition you can check this out, and for a more simplified understanding, I think this article does it best.

In an intuitive sense, splines help plot arbitrary functions smoothly, especially ones that require interpolation: they fill in the missing data between known samples. Here, in modeling cloth deformation, where the deformations of the patch shape can be observed across successive frames, we can use an advanced form of polynomial spline called the Thin Plate Spline (TPS).
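As a quick illustration of ordinary spline interpolation (unrelated to the papers’ code), SciPy’s `CubicSpline` smoothly fills in values between a handful of known samples; the sample points below are made up:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# A few known samples of some underlying smooth function
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.7])

spline = CubicSpline(x, y)            # piecewise cubic, smooth at every knot
x_dense = np.linspace(0.0, 4.0, 100)
y_dense = spline(x_dense)             # smoothly interpolated "missing" values
```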

Check out this article by Columbia that illustrates and explains TPS Regression well.

These changes, or displacements, of the patch across frames over time are then modeled simply as a regression problem (since we only need to predict the TPS parameters for new frames).
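To make the TPS mapping concrete, here is a small sketch using SciPy’s `RBFInterpolator` with a thin-plate-spline kernel. The control points and their displacements are made up for illustration and are not taken from the paper:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Control points on the flat, undeformed patch ...
src = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
# ... and where the same points land on the wrinkled fabric in a later frame
dst = src + np.array([[0.02, 0.00], [0.00, -0.03], [0.01, 0.02],
                      [-0.02, 0.01], [0.05, 0.04]])

# Fit a thin plate spline mapping from the flat patch to the deformed one
tps = RBFInterpolator(src, dst, kernel='thin_plate_spline')

# Any point on the flat patch can now be warped into the deformed frame
grid = np.stack(np.meshgrid(np.linspace(0, 1, 10),
                            np.linspace(0, 1, 10)), axis=-1).reshape(-1, 2)
warped = tps(grid)                    # shape (100, 2): deformed coordinates
```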

Generating the T-shirt Pattern:

The said pattern is just an adversarial example: a patch that works against the purpose of the object detector. The authors use the Expectation Over Transformation (EOT) algorithm [3], which generates such adversarial examples over a given distribution of transformations.

Here, the transformation distribution is made up of the TPS transformations since we want to replicate the real-time wrinkling, minor twisting, and changes in the contours of the fabric.

From [1]: Modeling the effects of cloth deformation.

Along with the TPS transformation, they also use a physical color transformation and conventional physical transformations within the person’s bounding box. Together, these give rise to the equation that models the pixel values of the perturbed image.

Based on all of these transformations, the EOT formulation can finally compute the attack loss and optimise the patch towards fooling the object detector.
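In code, the EOT attack loss is essentially a Monte Carlo average of the detector’s objectness score over randomly sampled transformations. A minimal sketch, reusing the hypothetical `apply_patch` and `detector` helpers from above and a placeholder list of `transforms`:

```python
import random

def eot_attack_loss(patch, images, boxes, detector, transforms, n_samples=8):
    """Average the attack loss over randomly sampled transformations
    (TPS deformation, color shifts, scaling/rotation, ...) so the patch
    stays adversarial however the fabric moves."""
    loss = 0.0
    for _ in range(n_samples):
        t = random.choice(transforms)                   # sample one transformation
        patched = apply_patch(images, t(patch), boxes)  # render the transformed patch
        loss = loss + detector(patched).max(dim=1).values.mean()
    return loss / n_samples                             # Monte Carlo estimate of the expectation
```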

The explanation of the procedure so far, in its most simplified form, applies to a single object detector. The authors also propose a strategy for attacking multiple object detectors at once, which applies min-max optimisation to the single-detector formulation, as sketched below.
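A minimal sketch of that min-max idea (not the authors’ exact formulation): the patch is optimised against a weighted mixture of detectors, while the weights are simultaneously pushed towards whichever detector is currently hardest to fool. The `detectors` list, `attack_loss` helper, and `person_loader` are placeholders, and `patch` is assumed to be a leaf tensor with `requires_grad=True` as in the earlier sketch:

```python
import torch

lr_patch, lr_w = 0.03, 0.01
# One weight per detector, initialised uniformly
w = torch.full((len(detectors),), 1.0 / len(detectors), requires_grad=True)

for images, boxes in person_loader:
    per_detector = torch.stack([
        attack_loss(patch, images, boxes, det) for det in detectors  # placeholder loss
    ])
    loss = (torch.softmax(w, dim=0) * per_detector).sum()

    patch_grad, w_grad = torch.autograd.grad(loss, [patch, w])
    patch = (patch - lr_patch * patch_grad).detach().requires_grad_(True)  # minimise over the patch
    w = (w + lr_w * w_grad).detach().requires_grad_(True)                  # maximise over the weights
```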

Finally:

The results after training and testing on their own dataset are impressive.

From [1]. Results after generating a custom adversarial patch on the authors’ dataset.

And the use of TPS shows great improvement too:

From [1]. Results from different poses compared using TPS (second row) and without TPS (first row)

What the future holds:

  • In an article by Northeastern University, Xue Lin, one of the authors of [1], clarified that their goal isn’t to create a T-shirt that lets people furtively slip past detectors.

“The ultimate goal of our research is to design secure deep-learning systems, … But the first step is to benchmark their vulnerabilities.” — Xue Lin

  • The authors also acknowledge that there is significant room for improvement in their results and mention that further research will be done to achieve it.

Photo by Sebastian Molina fotografía on Unsplash

Thank you for reading all the way through! You can reach out to me on LinkedIn for any messages, thoughts, or suggestions.

References:

[1]: Xu, Kaidi, et al., Adversarial t-shirt! Evading person detectors in a physical world (2019), arXiv preprint arXiv:1910.11099.

PDF: https://arxiv.org/pdf/1910.11099.pdf

[2]: Thys, Simen, Wiebe Van Ranst, and Toon Goedemé, Fooling automated surveillance cameras: adversarial patches to attack person detection (2019), Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.

PDF: https://arxiv.org/pdf/1904.08653.pdf

[3]: Athalye, Anish, and Ilya Sutskever, Synthesizing robust adversarial examples (2017), arXiv preprint arXiv:1707.07397.

PDF: https://arxiv.org/pdf/1707.07397.pdf
