
Apple’s NeuralHash — How it works and how it might be compromised

A guide to the technology, its vulnerabilities and possible mitigations

Swee Kiat Lim
Towards Data Science
7 min read · Aug 20, 2021


Apple recently announced various measures to combat the spread of Child Sexual Abuse Material (CSAM).

Next, iOS and iPadOS will use new applications of cryptography to help limit the spread of CSAM online, while designing for user privacy. CSAM detection will help Apple provide valuable information to law enforcement on collections of CSAM in iCloud Photos.
— Apple’s statement announcing these measures

Among other things, these include the scanning of iCloud Photos for CSAM using a new technology known as NeuralHash.

Image by author, inspired by Apple’s Technical Summary on CSAM Detection. Hashes listed here are not actual CSAM hashes. This excludes details such as the blinding step. Refer to the Technical Summary for a detailed diagram.

Apple is scanning my photos?!?

“Scanning” is probably the wrong word to use. In reality, Apple intends to check for CSAM without having to look at your photos.

Photo by Immo Wegmann on Unsplash

Consider the idea of a fingerprint — something unique to every individual. It identifies a person but by itself does not tell you anything about the individual. In computer science, a hash function helps us compute a hash that acts like a file’s fingerprint. The hash is (kinda) unique to the file but by itself, it doesn’t tell us anything about the file’s contents.

If two files produce the same hash, they are very likely to be the same file. Just like how if two sets of fingerprints match, they are likely to be from the same person.

Apple intends to store a database of hashes that corresponds to known CSAM. Then, by hashing all your iCloud photos and comparing your hashes to the CSAM hash database, Apple can identify matches without looking at your iCloud photos.
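To make that matching idea concrete, here is a deliberately naive sketch in Python. The real protocol adds blinding and private set intersection (see the Technical Summary), and the hash values and file name below are made-up placeholders:

```python
import hashlib

# Made-up placeholder digests standing in for a database of known hashes.
known_hashes = {
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
}

def md5_of_file(path: str) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_flagged(path: str) -> bool:
    """Flag a photo if its hash appears in the known-hash database."""
    return md5_of_file(path) in known_hashes

print(is_flagged("my_photo.jpg"))  # hypothetical file name
```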

But it’s easier said than done. Traditional hash functions (MD5, SHA256 etc.) are very sensitive to changes in the file. For instance, these two images have totally different MD5 hashes, even though they look the same (the one on the right is 2 pixels shorter in width).

(Left) Original Doge meme, MD5: 53facff91ec83f60a88235ab628590bb | (Right) Image cropped by author, MD5: da25273f33c4ec95f71984075079bd16

Being super sensitive to changes is usually a useful feature of hash functions, because it makes it really easy to tell if a file has been tampered with in any way.

But this becomes a huge limitation if we are using it to detect CSAM. Imagine if the image on the left above were a banned image. Apple may store the MD5 hash 53f…0bb. But just by cropping the image slightly, we still get an image that looks the same (on the right) but has a totally different MD5 hash of da2…d16, which would have evaded detection.
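You can reproduce this sensitivity yourself with a few lines of Python; the file name is a placeholder for your own copy of the image, and the exact digests will depend on the file:

```python
import hashlib
from PIL import Image

def md5_of_pixels(img: Image.Image) -> str:
    """Hash the raw pixel bytes of an image with MD5."""
    return hashlib.md5(img.tobytes()).hexdigest()

original = Image.open("doge.png")  # placeholder path to a local copy of the meme
cropped = original.crop((0, 0, original.width - 2, original.height))  # 2 px narrower

print(md5_of_pixels(original))
print(md5_of_pixels(cropped))  # a completely different digest
```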

More generally, even if Apple has a database of known CSAM, CSAM distributors can just rotate, crop or resize the images to evade detection. Apple will have to store an infinite number of hashes to account for the infinite ways to rotate, crop or resize the images. Imagine if people’s fingerprints changed whenever they lost weight, got a haircut or clipped their nails!

In order for the detection to work, the hash needs to be the same even if the image has been modified slightly. Introducing… NeuralHash.

Okay tell me about this NeuralHash

NeuralHash is a hashing algorithm that is insensitive to small changes in the input image.

(Left) Original Doge meme, NeuralHash: 11d9b097ac960bd2c6c131fa | (Right) Image cropped by author, NeuralHash: 11d9b097ac960bd2c6c131fa

For the same pair of images that had very different MD5 hashes earlier because of the unnoticeable crop, we get identical NeuralHashes.
(All NeuralHashes in this article were computed with this in-browser demo.)

(Left) Original Doge meme, NeuralHash: 11d9b097ac960bd2c6c131fa | (Right) Image flipped by author, NeuralHash: 20d8f097ac960ad2c7c231fe

Even in the case where the image is flipped, significant portions of the NeuralHashes are still the same:
Left NeuralHash: 11d9b097ac960bd2c6c131fa
Right NeuralHash: 20d8f097ac960ad2c7c231fe
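A quick way to quantify how much of the two hashes survives the flip is the bitwise Hamming distance between the hex strings:

```python
def hamming_bits(hex_a: str, hex_b: str) -> int:
    """Count differing bits between two equal-length hex hashes."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

left = "11d9b097ac960bd2c6c131fa"   # original image
right = "20d8f097ac960ad2c7c231fe"  # flipped image

print(hamming_bits(left, right), "of", len(left) * 4, "bits differ")
```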

(Left) Original Doge meme, NeuralHash: 11d9b097ac960bd2c6c131fa | (Right) Image edited by author, NeuralHash: 11d9b0b7a8120bd286c1b1fe

Here’s another example where we overlay some words onto the image but the resulting NeuralHash is still similar:
Left NeuralHash: 11d9b097ac960bd2c6c131fa
Right NeuralHash: 11d9b0b7a8120bd286c1b0fe

Without going too deep into the details, the NeuralHash algorithm uses a convolutional neural network (CNN) to compute the hash. During training, the CNN is shown pairs of images. Positive pairs consist of images that are simple transformations of each other (rotations, crops, resizes), such as the pairs of images above. Negative pairs consist of images that are totally different. The CNN is trained to map positive pairs to the same hashes and negative pairs to different hashes. In doing so, it learns to ignore small transformations applied to the image.
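Apple has not published its training code, so the following is only a rough sketch of the idea, using a PyTorch-style contrastive loss; `encoder` stands in for the CNN and the margin value is an arbitrary choice:

```python
import torch
import torch.nn.functional as F

def pair_loss(encoder, img_a, img_b, is_positive, margin: float = 1.0):
    """Pull positive pairs' embeddings together, push negative pairs apart.

    is_positive is 1.0 when (img_a, img_b) are transformations of the same
    image and 0.0 when they are unrelated images.
    """
    emb_a = F.normalize(encoder(img_a), dim=-1)
    emb_b = F.normalize(encoder(img_b), dim=-1)
    dist = (emb_a - emb_b).norm(dim=-1)
    # Positive pairs: minimise distance. Negative pairs: push it past the margin.
    loss = is_positive * dist.pow(2) + (1 - is_positive) * F.relu(margin - dist).pow(2)
    return loss.mean()
```

In the full system, the CNN's floating-point output is then converted into the short binary hash shown above; the loss here is only meant to illustrate the pair-based training described in this section.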

And you said it can be broken?

Most researchers and students in AI and ML will probably have heard of adversarial attacks, where neural networks are tricked by adding noise to images.

Image by author showing how adversarial attacks can trick AI models, inspired by the canonical panda example from Figure 1 of Explaining and Harnessing Adversarial Examples by Goodfellow et al., 2014.

For instance, a neural network may initially label our Doge image as “dingo”. But we can add a small amount of noise to the same photo and now the neural network labels the photo as a “Siamese cat”.
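The simplest version of such an attack is the fast gradient sign method from the Goodfellow et al. paper referenced above. A minimal PyTorch sketch, with `model`, `image` and `true_label` as assumed placeholders, looks like this:

```python
import torch
import torch.nn.functional as F

def fgsm(model, image, true_label, epsilon=0.01):
    """One-step fast gradient sign attack: nudge every pixel in the direction
    that increases the classification loss, by at most epsilon."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```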

These attacks have been widely known for several years in the AI community and there has been a constant back-and-forth between researchers developing adversarial attacks and defenses.

The same attack can be applied quite easily to the CNN in the NeuralHash algorithm. In mid-August, Reddit user AsuharietYgvar released instructions on how to export a copy of Apple’s NeuralHash model. Several scripts appeared demonstrating successful attacks just hours later.

Adversarial attacks on the NeuralHash model come in two forms. The first type of attack — let’s call it “Different Image, Same Hash” — adds noise to a source image so that the resulting NeuralHash is identical to that of a target image.

Example of a hash collision pair, generated by the author from the Doge (left) and Grumpy Cat (right) memes. Both images have the same NeuralHash 11d9b097ac960bd2c6c131fa, after adding noise to the Grumpy Cat image.

In the example above, noise was added to the Grumpy Cat image so that the result has the exact same NeuralHash as the Doge image. This is also known as a hash collision: the same hash computed from two different images. This is problematic because it means that someone can add noise to innocuous images so that they correspond to CSAM NeuralHashes, then distribute them on random websites, disguised as regular images. These fake CSAM images would raise false alarms on Apple’s servers and result in people being wrongly flagged. The same attack can also be used to disguise CSAM as regular images.
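Conceptually, this attack is just gradient descent on the pixels of the source image: nudge them until the model's output matches the target's. A rough sketch, treating the exported model as a function `neuralhash_model` that returns the pre-hash embedding, might look like this (the released proof-of-concept scripts follow a broadly similar recipe, though the details differ):

```python
import torch

def collision_attack(neuralhash_model, source_img, target_img, steps=1000, lr=0.01):
    """Optimise a perturbation of source_img so that its embedding (and hence,
    ideally, its NeuralHash) matches that of target_img."""
    with torch.no_grad():
        target_emb = neuralhash_model(target_img)

    delta = torch.zeros_like(source_img, requires_grad=True)
    optimiser = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        optimiser.zero_grad()
        adv = (source_img + delta).clamp(0, 1)
        loss = (neuralhash_model(adv) - target_emb).pow(2).mean()
        loss.backward()
        optimiser.step()

    return (source_img + delta).clamp(0, 1).detach()
```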

Another attack — let’s call this “Same Image, Different Hash” — adds a tiny bit of noise but dramatically changes the NeuralHash.

(Left) Original Doge meme, NeuralHash: 11d9b097ac960bd2c6c131fa | (Right) Image generated by author, NeuralHash: f8d1b897a45e0bf2f7e1b0fe

In this example, even though both images look similar (the added noise shows up as faint yellow-green blobs in the right image), the NeuralHashes are quite different. This noise is specially crafted to attack the underlying CNN powering NeuralHash. Contrast this with the earlier example, where we added some words to the image but the NeuralHash stayed largely the same.

With this attack, distributors of CSAM can evade detection by adding a little bit of noise to their images and vastly changing the resulting NeuralHashes. As shown above, this can be a small amount of noise that doesn’t degrade the image significantly.
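This evasion attack is essentially the mirror image of the collision sketch above: keep the perturbation inside a small per-pixel budget and maximise, rather than minimise, the distance from the original embedding. Using the same assumed `neuralhash_model` placeholder:

```python
import torch

def evasion_attack(neuralhash_model, img, epsilon=0.05, steps=200, lr=0.01):
    """Keep the perturbation within +/- epsilon per pixel while pushing the
    embedding as far as possible from the original one."""
    with torch.no_grad():
        original_emb = neuralhash_model(img)

    delta = torch.zeros_like(img, requires_grad=True)
    optimiser = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        optimiser.zero_grad()
        adv = (img + delta.clamp(-epsilon, epsilon)).clamp(0, 1)
        loss = -(neuralhash_model(adv) - original_emb).pow(2).mean()  # maximise distance
        loss.backward()
        optimiser.step()

    return (img + delta.clamp(-epsilon, epsilon)).clamp(0, 1).detach()
```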

Don’t bring me problems, bring me solutions

Photo by Rirri on Unsplash

One way to avoid these attacks would be to never store the CNN on user devices. Without access to the CNN weights, these attacks become much harder to mount. However, that would also mean Apple running the hashing on its servers, which in turn means all our iCloud photos would have to be accessible to those servers.

Alternatively, Apple can run the NeuralHash with many CNN models instead of just one. Using many models increases the difficulty of generating these attacks, since an attack will have to trick all the models at the same time. However, this increases the compute that has to be done on-device, which may be undesirable for users.
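As a purely illustrative sketch (Apple has not described such a scheme), the device could hash against the concatenated outputs of several independently trained models:

```python
import torch

def ensemble_embedding(models, img):
    """Concatenate embeddings from several independently trained CNNs;
    an adversarial image now has to fool every model at once."""
    with torch.no_grad():
        return torch.cat([model(img) for model in models], dim=-1)
```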

Another possible solution is to do some preprocessing on the image before hashing it, such as changing it to black-and-white, increasing the contrast or generating multiple random crops and rotations. This helps because the attacks are sometimes fragile and may be negated by sufficient preprocessing.
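A hypothetical version of this defence might hash several randomised variants of each photo and require the attack to survive all of them; the transformations below are illustrative choices, not anything Apple has announced:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

def preprocessed_variants(img: Image.Image, n: int = 5):
    """Return simple randomised variants (grayscale, crops, contrast tweaks)
    that a fragile adversarial perturbation would have to survive."""
    variants = [ImageOps.grayscale(img)]
    w, h = img.size
    for _ in range(n):
        dx, dy = random.randint(0, w // 20), random.randint(0, h // 20)
        cropped = img.crop((dx, dy, w - dx, h - dy))
        variants.append(ImageEnhance.Contrast(cropped).enhance(random.uniform(1.0, 1.5)))
    return variants
```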

The NeuralHash algorithm is a new technology and the extent of its vulnerabilities is perhaps not yet fully explored. While I am heartened by Apple’s intention to combat CSAM, I would encourage researchers to continue looking into potential weaknesses of the system in the hope of improving it. Hopefully this serves as a nice overview of the NeuralHash technology! Try running the algorithm here!
