Security and Privacy considerations in Artificial Intelligence & Machine Learning — Part 5: When the attackers use AI.

Manish Prabhu
Towards Data Science
10 min read · Nov 5, 2018


Note: This is Part 5 of a series of articles on ‘Security and Privacy in Artificial Intelligence & Machine Learning’. Here are the links to all articles (so far):

ML & AI as an attacker’s weapon

In the last post, we reviewed some constructive use cases of AI & ML for security-related scenarios. However, just as we, the defenders, have increasingly started using AI to improve security, attackers have started using it to make their attacks more sophisticated!

Let us look at how AI & ML are getting deployed and leveraged as part of the attackers’ armory. As before, we will begin with cybersecurity scenarios and then touch on broader issues around ‘weaponization’ and the worries emerging from AI going rogue on us.

AI & ML in the Attacker’s Toolkit

Let us now assume that attackers have started using AI & ML as a core part of their weaponry and see what that makes possible.

Why have those CAPTCHAs become so hard?

Can you remember the ‘easy’ CAPTCHAs we used to happily ‘pass’ when challenged just 3-4 years ago? You read a bunch of squiggly letters and numbers off an image, typed them into the text box and, voila, you got certified as a human (and not a bot)! Nowadays, when I see a CAPTCHA with that large grid of images, I feel intimidated (and nervous if an important action I need to complete is guarded by such a CAPTCHA). It seems that CAPTCHAs have somehow got things mixed up: the problems they pose now appear more difficult for humans to solve than for machines!

However, this is an inevitable outcome of attackers using AI & ML to break CAPTCHAs. Image recognition and transcription techniques have advanced so much that old-style CAPTCHAs are ‘sitting ducks’ for any attacker keen enough to use AI/ML to get past them. This BlackHat paper covers in detail how even the most sophisticated CAPTCHAs in use today can be broken by cleverly combining existing services with machine learning.
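To make the ‘sitting ducks’ point concrete, here is a minimal sketch (not the approach from the BlackHat paper) of the kind of pipeline an attacker might start from: basic image cleanup with Pillow followed by off-the-shelf OCR via pytesseract. The file name ‘captcha.png’, the threshold value and the OCR settings are illustrative assumptions; modern image-grid CAPTCHAs require far more sophisticated models than this.

```python
# Illustrative sketch: transcribing an old-style text CAPTCHA with
# off-the-shelf OCR. Assumes Pillow, pytesseract and the Tesseract
# binary are installed; 'captcha.png' is a hypothetical sample image.
from PIL import Image, ImageFilter
import pytesseract

def guess_captcha(path: str) -> str:
    img = Image.open(path).convert("L")                # grayscale
    img = img.filter(ImageFilter.MedianFilter(3))      # remove speckle noise
    img = img.point(lambda p: 255 if p > 140 else 0)   # binarize (threshold is a guess)
    # Restrict the OCR engine to a single line of alphanumeric characters.
    config = "--psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    return pytesseract.image_to_string(img, config=config).strip()

if __name__ == "__main__":
    print(guess_captcha("captcha.png"))
```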

Do CAPTCHAs make you nervous these days?

Adaptive Malware and Intelligent Botnets

Just as security features such as authentication have become ‘adaptive’ through the use of AI & ML, so have cyber attacks. Malware that has machine learning capabilities at its disposal can ‘learn’ what might and might not work in an environment and morph itself to get past defenses and infiltrate systems. And once malware has landed on a target system, it can significantly improve its chances of evading detection if it can evolve a different hiding strategy for each target it acquires.
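The ‘learn what might and might not work’ idea boils down to a standard explore-and-exploit loop. The sketch below is purely conceptual: an epsilon-greedy bandit choosing among abstractly named tactics against a simulated success signal. The tactic names and success rates are made up; the point is only to show how adaptive software can converge on whichever behavior its environment fails to block.

```python
# Conceptual sketch only: an epsilon-greedy bandit that learns which of
# several abstract "tactics" succeeds most often in a given environment.
# The tactics and the success signal are simulated placeholders.
import random

TACTICS = ["tactic_A", "tactic_B", "tactic_C"]

def simulated_outcome(tactic: str) -> int:
    # Stand-in for "did the environment block this behavior?"
    success_rate = {"tactic_A": 0.1, "tactic_B": 0.7, "tactic_C": 0.3}[tactic]
    return 1 if random.random() < success_rate else 0

def adaptive_loop(rounds: int = 500, epsilon: float = 0.1) -> str:
    counts = {t: 0 for t in TACTICS}
    rewards = {t: 0.0 for t in TACTICS}
    for _ in range(rounds):
        if random.random() < epsilon:      # explore a random tactic
            choice = random.choice(TACTICS)
        else:                              # exploit the best estimate so far
            choice = max(TACTICS, key=lambda t: rewards[t] / counts[t] if counts[t] else 0.0)
        counts[choice] += 1
        rewards[choice] += simulated_outcome(choice)
    return max(TACTICS, key=lambda t: rewards[t] / counts[t] if counts[t] else 0.0)

print("Converged on:", adaptive_loop())
```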

Malware that can ‘learn’

Attackers often use a fleet of compromised computers to carry out large-scale attacks. These fleets are usually called ‘botnets’ and are controlled by attackers from a command and control (C&C) center. With the introduction of AI & ML, botnets are being designed to become ‘self-learning’ and ‘intelligent’: more peer-to-peer in how they discover possible attack points and more autonomous in their decision making, with far less reliance on the C&C center than traditional botnets. This approach promises to make these swarms of bots immensely efficient and lethal at the same time.
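To see what ‘less reliance on the C&C center’ means mechanically, here is a tiny, generic gossip simulation (nothing botnet-specific): each node that knows something shares it with a random peer every round, so information spreads through the swarm without any central coordinator. The node and round counts are arbitrary.

```python
# Generic gossip/epidemic simulation: a piece of information known to one
# node spreads peer-to-peer, with no central coordinator involved.
import random

NUM_NODES, ROUNDS = 50, 10
knowledge = [None] * NUM_NODES
knowledge[0] = "finding"                     # one node starts with the information

for r in range(1, ROUNDS + 1):
    for node in range(NUM_NODES):
        if knowledge[node] is not None:
            peer = random.randrange(NUM_NODES)   # pick a random peer
            knowledge[peer] = knowledge[node]    # share what this node knows
    informed = sum(k is not None for k in knowledge)
    print(f"round {r}: {informed}/{NUM_NODES} nodes informed")
```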

Swarm botnets

Machine Learning to break Crypto

Much of information security today is built on a foundation of cryptography. Many cryptographic primitives are based on algorithms designed by cryptographers with the goal of achieving ‘good enough’ security (in lieu of the elusive ‘perfect’ security) by making a breach of the crypto ‘sufficiently difficult in practice’ for attackers.

For instance, symmetric encryption techniques attempt to achieve the effect of a ‘pseudo random’ permutation function. That is, for all practical purposes, to an attacker who does not know the encryption key, the mapping between a block of plain text and the corresponding encrypted block must look as if it had been picked by a randomly chosen permutation over all possible blocks, with no discernible relationship between the two.

However, realize that there is a gap between the ‘pseudo random’ behavior of the as-designed algorithm and true randomness. Note also that the hidden key and the algorithm together essentially represent the ‘pseudo random’ function.
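A quick way to observe this ‘pseudo random’ behavior in practice is to encrypt two plaintext blocks that differ in a single bit and compare the ciphertexts. The sketch below assumes the Python ‘cryptography’ package is installed; the key and plaintext values are arbitrary.

```python
# Sketch: with AES, flipping one plaintext bit changes roughly half of the
# ciphertext bits -- to anyone without the key, the mapping looks like a
# random permutation of the block space. Requires the 'cryptography' package.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)                          # the hidden key

def encrypt_block(block16: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block16) + enc.finalize()

p1 = b"sixteen byte msg"                      # a 16-byte plaintext block
p2 = bytes([p1[0] ^ 0x01]) + p1[1:]           # same block with one bit flipped

c1, c2 = encrypt_block(p1), encrypt_block(p2)
diff_bits = sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))
print(f"ciphertext bits changed: {diff_bits} / 128")   # typically close to 64
```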

Now consider that, at their core, ML techniques are about discovering a hidden ‘target function’. Would it be possible for a powerful neural network to ‘infer’ a key given a sufficiently large number of samples of plain text and encrypted text? (Note that ‘sufficient’ here may be a very large number. However, in crypto, even a small improvement beyond the theoretical difficulty of an algorithm, often represented by the effort involved in a brute-force approach, is counted as a breach or a weakness.) In a similar vein, could a ‘trained’ neural network produce a plausible ciphertext for a new plaintext without knowledge of the key?
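To ground the question, consider the degenerate case: a deliberately weak ‘cipher’ (repeating-key XOR), where a single plaintext/ciphertext pair gives the key away. Modern ciphers are designed precisely so that no such shortcut exists; the open question above is whether a sufficiently powerful learner could still find some small statistical shortcut. The key and messages below are made up, and the key length is assumed known for simplicity.

```python
# Toy illustration: for a trivially weak cipher (repeating-key XOR), a single
# plaintext/ciphertext pair fully reveals the 'hidden function' (the key).
# Real ciphers such as AES are built so that no comparable shortcut is known.
def xor_encrypt(plaintext: bytes, key: bytes) -> bytes:
    return bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))

secret_key = b"K3y!"                                   # hidden from the attacker
sample_pt = b"attack at dawn, bring the usual kit"     # known plaintext
sample_ct = xor_encrypt(sample_pt, secret_key)         # observed ciphertext

# Attacker: XOR the known plaintext with the ciphertext to expose the key
# stream; keep the first 4 bytes (key length assumed known for simplicity).
recovered = bytes(p ^ c for p, c in zip(sample_pt, sample_ct))[:4]
print("recovered key:", recovered)                     # b'K3y!'

# ...and now produce the ciphertext of a brand-new plaintext without ever
# being told the key, which is exactly the capability asked about above.
new_pt = b"meet at the north gate"
print(xor_encrypt(new_pt, recovered) == xor_encrypt(new_pt, secret_key))  # True
```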

Can Crypto be attacked using machine learning?

Interestingly, the Google Brain team seems to have worked on something along these lines (crypto attack and defense using AI), as outlined in this paper. Although this Reddit thread has people disputing the value of that work, I feel that the risks to crypto from AI are neither entirely hypothetical nor far-fetched.

At a very basic level, one area where AI & ML have already started helping attackers immensely is speeding up brute-force attacks on crypto. One of the challenges of brute-force attacks is inferring when a decryption attempt has yielded ‘potentially valid’ plain text. In the past, this required human examination and significantly slowed down brute-force attacks. Given that the original plain text is natural language in many common use cases, the use of Natural Language Processing (NLP) techniques to automatically flag decrypted messages that ‘look like’ valid ones has eased the attacker’s burden considerably.
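Here is the same idea at its simplest: brute-force a Caesar cipher and let a crude ‘does this look like English?’ score pick the winner, with no human in the loop. A real attack would use proper language models rather than the tiny common-word list below, and the ciphertext is a made-up example.

```python
# Sketch: automating the "does this decryption look like valid plaintext?"
# check that used to require a human. A crude common-word score stands in
# for a real NLP/language model.
COMMON_WORDS = {"the", "and", "at", "to", "of", "in", "we", "attack", "dawn"}

def caesar_decrypt(ciphertext: str, shift: int) -> str:
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base - shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def englishness(text: str) -> int:
    # Count how many words look like common English words.
    return sum(word in COMMON_WORDS for word in text.lower().split())

ciphertext = "zh dwwdfn dw gdzq"   # made-up example (Caesar shift of 3)
best = max((caesar_decrypt(ciphertext, s) for s in range(26)), key=englishness)
print(best)                        # "we attack at dawn"
```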

Social Engineering and beyond…

Considering that humans have consistently been the ‘weakest link’ in the security of systems, ‘social engineering’ is natural low-hanging fruit for attackers looking to deploy AI & ML.

What can AI & ML do to make the ‘social engineering’ attacks scarier?

Phishing emails, one of the more popular techniques social engineers use, can now be made significantly more convincing and ‘personalized’. In the past, an astute recipient (or detection software) could often tell that a particular email was a scam. However, with the ability to analyze a target’s context and simulate natural writing styles, that task is becoming ever more difficult. Furthermore, attackers who wanted to sound convincing to a specific target used to have to do the bulk of the ‘attack personalization’ manually. Not anymore: ML & AI have solved a major scaling challenge for social engineering attacks.

Future ‘social engineering’ will be AI-powered

If we look at phone-call-based attacks, notice how, today, we usually hear a stranger’s voice on the other end attempting the ‘social engineering’: someone previously unknown to you (but armed with some information about you) trying to convince you to give away sensitive information. Given the advances in AI & ML, however, in the future the person on the other side may well sound like a close family member (e.g., a child or spouse seeking help in distress). What will we do when that happens?

At the other end of the spectrum of such attacks are the capabilities ML & AI have delivered for ‘mass social engineering’. Chasing those very quickly gets us into the threats AI & ML pose to foundational social institutions like democracy. Elections these days are far more about data science than about a candidate’s campaign promises and track record, and AI & ML have provided the techniques needed to target (and manipulate) individual citizens in a scalable manner. (We will revisit this aspect in a future article.)

Democracy under attack?

How about our courts? In most jurisdictions, courts tend to lag in awareness and adoption of new technologies. With the advances in ML-based audio and video synthesis, how far are we from being able to fabricate a scene that is indistinguishable from reality? What happens to the admissibility of audio-visual evidence, and what should the hapless judge do, when that day arrives?

Fabricated evidence will get more real and convincing.

Autonomous Weapons

Taking this ‘risks to humanity’ theme further, there is much debate about the use of AI & ML in the context of ‘autonomous weapons’. This is how a signature campaign against Autonomous Weapons defines the term:

Autonomous weapons select and engage targets without human intervention. They might include, for example, armed quadcopters that can search for and eliminate people meeting certain pre-defined criteria, but do not include cruise missiles or remotely piloted drones for which humans make all targeting decisions. Artificial Intelligence (AI) technology has reached a point where the deployment of such systems is — practically if not legally — feasible within years, not decades, and the stakes are high: autonomous weapons have been described as the third revolution in warfare, after gunpowder and nuclear arms.

Essentially, with autonomous weapons a human is excluded from the final ‘pull the trigger’ decision, granting AI ‘a license to kill’.

Even today, AI & ML are already extensively deployed, albeit piecemeal, in various weapons systems, and they give the side that has these advanced systems a significant advantage. Autonomous systems, however, change the equation altogether and bring in major moral and ethical dilemmas, apart from the risk of rapid, uncontrolled escalation. Almost all major software vendors engaged in the development of AI & ML have announced resolutions not to partake in the development of such weapons. But that should hardly be a consolation in the face of resourceful nation states interested in creating them. Also, if we go by the history of warfare, where similar concerns arose in the context of machine guns, submarines and chemical warfare, the eventual deployment and use of autonomous weapons seems inevitable. We know too well from the past that ‘transnational trust’ is a complex concept; nations are likely to engage in such work driven simply by the apprehension that ‘if we don’t, someone else will’.

Autonomous Weapons — the next frontier of weaponization!

Will AI control humanity one day?

Most of this article has been about (human) attackers using AI & ML as part of their toolkit. However, a large community is worried about AI developing ‘consciousness’ or ‘a mind of its own’ and going rogue. In that setting, we have to consider AI itself as the adversary!

There are two basic concepts involved in this scenario.

The first is the distinction between ‘special purpose’ and ‘general purpose’ intelligence. It is akin to how, before the advent of general purpose computers, we had computers built for dedicated, domain-specific use cases. Using that analogy, we are still in the era of special purpose (dedicated) AI. We have a lot of sophistication and are seeing great results, but in vertical, task-oriented domains (self-driving cars, speech recognition, etc.). These scenarios and the type of AI enabling them can be categorized as Artificial Narrow Intelligence (ANI), i.e., ‘special purpose’ AI. However, serious efforts are under way to conquer the next frontier and create what would qualify as Artificial General Intelligence (AGI): machines that can, by themselves, learn previously unknown tasks across a range of unrelated domains.

The second concept is the exponential speedup in the capabilities of AI (something we are familiar with, having seen it happen with other technologies in the past). Here, it is projected that in about 15–20 years AI will be on par with humans, although there is dispute over how the capabilities of a human brain map to those of an AI/ML system. Add ‘consciousness’ to that picture and scary possibilities start emerging for the period beyond that point, taking us into the realm of Artificial Super Intelligence (ASI) and its uneasy implications.

An important thing to note here is that, although ‘human intelligence level’ seems like a significant milestone to us (humans), to a conscious ASI it may be nothing more than a point of passing academic interest!

The road to ASI (Courtesy: Wait but Why — Tim Urban)

As far as we’re concerned, if an ASI comes to being, there is now an omnipotent God on Earth — and the all-important question for us is: Will it be a nice God?

— Tim Urban (Wait but Why)

Naturally, this is a hotly debated topic, and there are bigwigs from the ‘Who’s Who of AI & ML’ on both sides of the discussion (whether AI will take over vs. remain firmly under human control). Other researchers have suggested a different, less dramatic way to frame the problem: “how do we build AI so that it does not develop goals that are misaligned with those of its creators?”

What next?

In this article, we touched on a very broad set of issues: we started with cyber attacks leveraging AI & ML and ended with existential questions for the human race.

Although the series is titled ‘Security and Privacy of AI & ML’, so far we have covered privacy only sporadically. In the next article, we will take a closer look at the privacy implications of AI & ML and at the techniques being developed to deliver scenarios with various levels of privacy assurance.
