
Implementing Defensive Design in AI Deployments


A series of insights and battle scars from the world of medical device design

Picture from Pixabay on Pexels.com

With the upcoming launch of one of our AI products, one question kept coming up with clients. The same question also shows up once in a while in our consulting engagements, to a lesser degree, but it still demands an answer. The simple version of the question is this:

How can I know that the AI is doing a good job?

Now, it’s easy to throw confusion matrices and neural activation graphs at clients, but they are asking a much deeper question – and raising a very valid concern. They are not asking about the performance of the system; they are asking about its alignment with their own problems. If this model is now in charge of one or many of their business processes, how can they manage it if they cannot see the criteria it uses to execute its tasks?

This touches on a combination of management fundamentals, business logic, and the ongoing evolution of the machine learning field. The goal of bespoke AI solutions is to accelerate key processes, either to alleviate the workload of the rest of the staff or to make decisions in real time. As such, a system that cannot reliably execute a process within a trustworthy tolerance range might as well not be implemented at all.

Here are a few real-world example cases that we came across over the last few months:

  • What if the system needs to review, organize, and highlight key sentences inside legal reports? Is there a cost associated with each key sentence missed or mislabeled?
  • What if it’s performing a medical diagnostic? Is there someone to vouch for or question the results if a patient is prescribed months of painful therapy?
  • What if a malicious agent successfully penetrates a server’s firewalls? Are there any lessons that can be learned about how the attack was performed, rather than simply categorizing it?

The main reason for asking about a model’s reliability is quality assurance. A supervisor may accept a lack of rationale for an employee making certain decisions, but would want a track record or historical proof that the employee’s judgement is reliable, resilient, and reproducible. (This is partially why I recommend that all my fellow engineers and data scientists build a portfolio as well as a resume when looking for new jobs or projects.)

What is defensive design?

Defensive design is a design methodology that assumes a system will fail and, as such, should fail without harming the rest of the system, the patient, or anything else involved. It is intended to mitigate damage "when" things break, not "if" they break. The premise of defensive design is to include multiple mitigation layers so that the deployed system does not hang by a very, very thin thread.

New machine learning experts and data scientists do not have the same pedigree and battle scars as DevOps and security engineers, and that can quickly become a problem – especially as more and more machine learning solutions are woven into the business logic of an organization.

Medical device design to the rescue

Let’s look at an industry with heavy requirements engineering activities that can inspire best practices: medical device design. There’s a lot that can go wrong, and the worst-case scenarios can be colorful, to say the least. So many things can go wrong, in fact, that we wrote a patent on just the safety systems required for a new neurology device.

In terms of designing a new medical device from scratch, there is a recommended methodology that ensures a degree of certainty in verification and traceability, called the V-Model. Is this just a fancy way of drawing the dreaded Waterfall model? Maybe! But there are some useful themes and ideas that we can apply to most AI deployments.

The V-Model of software development. I also call it "The Waterfall Model’s bastard child". Although it’s not as cyclical and responsive as Continuous Deployment, it does ensure that there is traceability back to the original design. From Wikimedia.

V-Model Takeaways

The following themes can be pulled out of the V-Model:

  1. There is a deliberate reason for doing things. Whether it be client requirements, technology limitations, or performance expectations, there is a reason for every action that can be traced back to the genesis of the project.
  2. There is verification of every design versus its implementation. Whether it is a sub-sub-module or the entire system, there is a way to verify the true performance of the system against expected performance.
  3. There is a difference between verification and validation. "Did I build the thing right?" is not the same question as "Did I build the right thing?", and usually the latter is what’s important when working with clients.

Implementing defensive design

There are three main stages at which designers want to ensure that they follow a solid framework for defensive design: at the client discussion/requirements stage, during coding, and during & after deployment.

Defensive design at the requirements stage

Requirements, requirements, and more requirements – that’s how we ensure good, clean fun at the design stage. "But programming is fun!" Yes, Karen, it is, but wasting time and money is not.

Present a tool to the customer, not a sentient being. Remind the client that machine learning is boring, and is intended to do one or more steps of a well-defined process very well.

Verify the data. Does the client have the data they think they have? Is the raw data itself clean? Does the API change its format once in a while?

Then verify it again. Are you able to have repeatable access to the working dataset? Does it match up to the first audit that you’ve performed?
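
To make the "verify, then verify again" step concrete, here is a minimal audit sketch in Python. The column names, file path, and CSV/pandas setup are assumptions for illustration: fingerprint the working dataset on the first audit, then compare every later pull against it so schema drift or silent data changes get caught early.

```python
# A minimal data-audit sketch, assuming a CSV working set and pandas.
# Column names, the file path, and the schema check are illustrative only.
import hashlib

import pandas as pd

EXPECTED_COLUMNS = {"report_id", "sentence", "label"}  # assumed schema

def audit_dataset(path: str) -> dict:
    df = pd.read_csv(path)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Schema drift: missing columns {missing}")
    # Fingerprint the content so a later pull can be compared to this audit.
    fingerprint = hashlib.sha256(
        pd.util.hash_pandas_object(df, index=True).values.tobytes()
    ).hexdigest()
    return {
        "rows": len(df),
        "rows_with_nulls": int(df.isna().any(axis=1).sum()),
        "fingerprint": fingerprint,
    }

first_audit = audit_dataset("working_dataset.csv")
# ...later, before training or re-training, confirm nothing changed underneath you:
assert audit_dataset("working_dataset.csv")["fingerprint"] == first_audit["fingerprint"]
```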

Start small and gain momentum. Limit the scope of the initial promise. If you can solve a simple problem first, you can now take a larger and larger chunk of the business process. This also forces the client to ensure that the data they are presenting to the system is cleansed.

(We’ve seen a lot of success in various projects throwing a cold towel over the whole project at the beginning, which sobers everybody up and allows for a clear business case to emerge. It’s easy for your champions at the client site to promise everything but the kitchen sink in terms of functionality.)

Agnes Skinner making her requirements known. From Tenor.

If you want to read more about requirements engineering, I recommend the book of the same name, written by Elizabeth Hull and Jeremy Dick.

Defensive design at the coding stage

Now that you get to start coding and working with the client on the right user interface or interaction level, be sure to have clearly defined requirements that can be measured, assessed, tested, and explained.

User-centric design. We always talk about letting the user ask the right questions rather than giving the user a prepared answer. Design the UI so that the user is always in charge of the critical decisions and the system is deployed as an assistant – the human-in-the-loop has the last say.

A screenshot of our AuditMap.ai tool.

Preprocessing/data validation at the input. An example of this is a simple classifier: before sorting all of the statements, you can put an "English/non-English" filter up front to make sure you get clean data going in, or a "looks good/looks bad" gate so that a filtering mechanism protects the rest of the system.
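
As a rough illustration of such an input gate, here is a minimal "looks good/looks bad" filter sketch. The heuristics (token count, ASCII ratio) are placeholders for whatever language-ID or data-quality model fits your data, and `classifier` is a stand-in for the downstream model.

```python
# A minimal "looks good / looks bad" gate in front of a classifier.
# The heuristics below are placeholders for a real language-ID or
# data-quality model; `classifier` is a stand-in callable.
def looks_good(sentence: str) -> bool:
    tokens = sentence.split()
    if len(tokens) < 3:  # too short to classify meaningfully
        return False
    ascii_ratio = sum(c.isascii() for c in sentence) / max(len(sentence), 1)
    return ascii_ratio > 0.9  # crude English/non-English proxy

def classify_all(sentences, classifier):
    accepted = [s for s in sentences if looks_good(s)]
    rejected = [s for s in sentences if not looks_good(s)]
    # Rejected items go to a human review queue instead of being silently dropped.
    return classifier(accepted), rejected
```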

Multi-model design. Whether it’s a tight ensemble or just different API calls, being able to review each model’s performance allows for traceability all the way back to the business logic.
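
One lightweight way to keep that traceability, sketched below with illustrative model names and a simple majority vote, is to record each model's individual answer alongside the final decision, so a reviewer can walk any output back to the models that produced it.

```python
# Sketch of a multi-model pipeline that keeps each model's answer for review.
# Model names and the majority-vote rule are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Decision:
    final_label: str
    per_model: dict = field(default_factory=dict)  # traceability record

def run_pipeline(text: str, models: dict) -> Decision:
    per_model = {name: model(text) for name, model in models.items()}
    labels = list(per_model.values())
    final = max(set(labels), key=labels.count)  # simple majority vote
    return Decision(final_label=final, per_model=per_model)

# Example: run_pipeline(sentence, {"bert_clf": a, "rules_clf": b, "keyword_clf": c})
# The returned .per_model dict shows exactly which model said what.
```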

Defensive design at the deployment stage

Congrats! Your AI is deployed, your client/boss is happy, you go read up on Hacker News for a while, and now you’re fresh for your next project. However, from our experience, this is where you want to put the most effort, at least for the first 6 months of live deployment.

Tracking corrections. What mistakes does the system make now that it’s deployed? If you haven’t accounted for a corrective-action tracking mechanism, then you won’t know what the long-term fixes to the datasets are. (Bonus: this allows for a nice increase in your dataset, helping it grow over time. A longer explanation is available in one of Dan’s articles.)
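
A corrective-action log can be as simple as an append-only file. The sketch below (field names and file path are assumptions) records each reviewer correction so it can later be folded back into the training set.

```python
# A bare-bones corrective-action log: an append-only JSONL file.
# Field names and the file path are assumptions for illustration.
import datetime
import json

def log_correction(item_id: str, predicted: str, corrected: str,
                   path: str = "corrections.jsonl") -> None:
    record = {
        "item_id": item_id,
        "predicted": predicted,
        "corrected": corrected,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# At retraining time, each record becomes a fresh labelled example,
# which is how the dataset grows from real-world mistakes.
```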

Ongoing client conversations. There are now use cases that were unforeseen when the tool was designed. To ensure that your client has the best experience, continue the conversation after the hand-off to verify that the tool still performs as expected. (Do they have to reset the server after 50 requests because of a weird memory leak? If so, they’re probably not going to call you for a follow-on engagement, no matter how high your accuracy.)

User mitigation. Is your system resilient to erroneous user inputs, filetypes, and corrupted files? What about multiple users uploading multiple files at once?
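
A minimal upload guard along these lines might look like the sketch below. The allowed extensions and size limit are illustrative assumptions, and a real deployment would also attempt to parse the document rather than just read its bytes.

```python
# Sketch of an upload guard: reject wrong filetypes, oversized files, and
# unreadable files before they reach the model. Limits are illustrative.
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB, an assumed limit

def validate_upload(path: str) -> None:
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported filetype: {p.suffix}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError("File too large")
    try:
        p.read_bytes()  # a real check would also attempt to parse the document
    except OSError as exc:
        raise ValueError("File appears to be unreadable or corrupted") from exc
```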

Intermediate documentation. Make sure that both module developers and integrators understand the rationale behind design decisions. You can write your own documentation, or let a tool like Swagger manage the documentation from (properly) commented code.
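
For example, if the service is exposed through a typed web framework such as FastAPI, a Swagger UI is generated directly from the code's type hints and docstrings. The endpoint and payload below are hypothetical placeholders, not our actual API.

```python
# Hypothetical example of documentation generated from the code itself:
# a FastAPI service exposes a Swagger UI at /docs built from type hints
# and docstrings. The endpoint and payload are placeholders.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Classifier service")

class ClassifyRequest(BaseModel):
    sentence: str

@app.post("/classify")
def classify(req: ClassifyRequest) -> dict:
    """Classify a single sentence and return the label with its confidence."""
    return {"label": "placeholder", "confidence": 0.0}  # model call goes here
```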

Load balancing. Is your system resilient to load volatility? What happens after your client launches publicly or gets a burst of attention? (Conference talks and TV interviews are especially bad for this.) We like a lot of the DigitalOcean options.

Never forget that users will invent use cases that no one ever dreamed of.

We’re always impressed by unforeseen client use cases. From Imgur.

The weakest part of your system is most likely the humans involved. They are also the most essential part for a successful deployment.


What defensive design is and is not

Defensive design is boring. It’s systematic, and more often than not, people will ask why some of the structures and deliverables are built beyond their primary need. After a few near misses, your team should start realizing the importance of assuming the worst.

What defensive design is not, however, is a guarantee or insurance policy. Just because you followed every possible step does not mean that your system is future-proof and can withstand a technological hurricane.

Just be ready for "when" your system craps out, and don’t just think about a theoretical "if". This way, you can make sure that the AI is doing a good job by minimizing the damage when it doesn’t.


If you have additional questions about this article or our design methodologies when starting new client projects, feel free to reach out on LinkedIn.
