Towards Responsible AI (Part 4)

Responsible use of AI should start with a detailed assessment of the key risks posed by AI [1], followed by a good understanding of the principles that should be followed [2], and then the governance of AI from a top-down and end-to-end perspective [3]. We discussed these in our previous articles [1, 2, 3]. In this article, we focus on the first line of defense and dive into the nine-step Data Science process [4], spanning the phases of value scoping, value discovery, value delivery, and value stewardship, and highlight the dimensions of governance at each step.
Overview
Given the focus on governance, we look to answer the key questions of who is making what decision, and based on what rationale, along the nine-step process. Although the process has nine steps, we consider only those points where a key decision is being made. The figure below shows our nine-step process, the key outcomes of each step, and the six stage gates where a major decision is made. These decisions are:
- Is it worth having an AI solution or not?
- How do we design (build, buy, or rent) the AI solution?
- Does the model meet our expectations?
- Do we deploy the model into production?
- Is the model ready to be transitioned for ‘business-as-usual’ operation?
- Should the model continue as-is, or be retrained, redesigned, or retired?

Stage Gate 1: Is it worth having an AI solution or not?
This stage gate occurs in the first step, business and data understanding, of the value scoping phase of the project. This is undoubtedly the most critical step, as it determines whether we want to go ahead with an AI solution at all. We should understand five key areas to make this determination:
- Business Impact: First is the business objective or use case being contemplated. What business activity, decision, or action are we making more efficient or effective? Is this solution an opportunity for a new process or system altogether? What is the business impact to the organization that deploys the solution: time savings, cost savings, increased revenues, better experience, etc.?
- Data access and use: Second is understanding what data is available, accessible, and can be ethically used to develop the AI solution. Do we have sufficient data to build the model? Is the data labelled appropriately, or can it be annotated? How much historical data is available? How dynamic is the data, i.e., how frequently and significantly does it change?
- Model use context: Third is understanding how the model will be used once it is put into production. Who are the users of this model? What is their level of business or technical understanding? What is the interaction between humans and the model, e.g., human-in-the-loop, human-on-the-loop, or human-out-of-the-loop?
- Social Impact & Harms Assessment: Fourth is understanding the broader societal and ethical implications of the AI solution. How many people will be impacted? Does it infringe on human rights, such as loss of individual liberty or loss of privacy, or impact the environment adversely? Does it cause physical, emotional, psychological, or financial harm? Does it have the potential to mislead or manipulate opinions?
- Risk Assessment: Fifth is determining the risk tier of the AI solution. Risk tiers are typically on a scale of 1–3 or, in some cases, 1–5. The tiering is based on the severity and frequency of the harm the AI solution could cause, as well as its potential societal impact. This assessment should answer questions such as: What types of risk does the AI solution entail? What is the likelihood of the risk and the exposure (financial or non-financial) of the solution? What are the potential risk mitigation strategies and control categories that need to be considered? (A minimal sketch of one such tiering scheme follows this list.)
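To make the tiering concrete, here is a minimal sketch in Python, assuming a 1–3 tier scale derived from the severity and likelihood of harm. The category names and thresholds are illustrative placeholders that an organization would calibrate to its own risk taxonomy, not a prescribed standard.

```python
from enum import IntEnum

class Severity(IntEnum):
    LOW = 1        # minor inconvenience, easily reversible
    MODERATE = 2   # e.g., financial or emotional harm, reversible with effort
    SEVERE = 3     # e.g., physical harm or rights infringement, hard to reverse

class Likelihood(IntEnum):
    RARE = 1       # harm expected only in unusual edge cases
    OCCASIONAL = 2 # harm plausible in normal operation
    FREQUENT = 3   # harm expected in routine operation

def risk_tier(severity: Severity, likelihood: Likelihood) -> int:
    """Map severity x likelihood onto a 1-3 risk tier (3 = highest risk).

    One simple rule: severe harms are always top tier regardless of
    likelihood; otherwise the tier follows the product of the two scores.
    """
    if severity == Severity.SEVERE:
        return 3
    score = int(severity) * int(likelihood)
    if score >= 6:
        return 3
    if score >= 3:
        return 2
    return 1

# A moderate harm that occurs occasionally lands in the middle tier.
assert risk_tier(Severity.MODERATE, Likelihood.OCCASIONAL) == 2
```

The virtue of encoding the scheme, even this crudely, is that every gate decision applies the same rule and the rule itself can be reviewed and versioned.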
This step involves a number of stakeholders. We describe their roles using a RACI (Responsible, Accountable, Consulted, Informed) matrix. A summary of the decision, documents, and RACI matrix is captured in the stage gate card below. (We also sketch how such a matrix can be captured as data at the end of this stage gate's discussion.)

The business impact, in terms of the value of the AI solution to the organization, together with its societal impact and potential risks, will determine the seniority of the accountable business sponsor. The higher the economic impact or the greater the potential risk, the greater the need for approval by the business ethics board.
The business specification and the social impact & risk assessment documents are key artifacts that should be produced by the product owner. The accountable person, in this case the business sponsor, makes the go/no-go decision to proceed further and evaluate the technical feasibility of the solution. It is important to uncover the risks and potential harms as well as the potential benefits of the application in order to weigh the trade-offs of proceeding with development.
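As flagged above, one way to keep the RACI assignments auditable is to record the matrix as structured data next to the other gate artifacts. The sketch below is hypothetical: the deliverable and role names are assumptions drawn from the discussion above, and the actual assignments would follow each organization's own stage gate card.

```python
# A RACI matrix for a stage gate kept as plain data, so it can be
# versioned and reviewed alongside the other gate documents.
# Role names and assignments below are illustrative, not prescriptive.
STAGE_GATE_1_RACI: dict[str, dict[str, list[str]]] = {
    "business_specification": {
        "R": ["product_owner"],
        "A": ["business_sponsor"],
        "C": ["business_domain_experts"],
        "I": ["data_science_team"],
    },
    "social_impact_and_risk_assessment": {
        "R": ["product_owner"],
        "A": ["business_sponsor"],
        "C": ["ethics_team", "risk_function"],
        "I": ["data_science_team"],
    },
}

def accountable_for(raci: dict[str, dict[str, list[str]]],
                    deliverable: str) -> list[str]:
    """Return the role(s) accountable for a given deliverable."""
    return raci[deliverable]["A"]

assert accountable_for(STAGE_GATE_1_RACI,
                       "business_specification") == ["business_sponsor"]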
Stage Gate 2: How do we design (build, buy, or rent) an AI solution?
This stage gate occurs in the second step, solution design, of the value scoping phase of the project. The business specification from Step 1 is further refined during this step, and we start exploring both the data and the model that are required. This step also determines whether the AI solution should be built, or whether it can be bought or rented from the large ecosystem of analytics and AI vendors or cloud platform providers. Key areas of consideration in this step are:
- Success and Acceptance Criteria: The business specification from the previous step should be refined to formulate the key activity that is being automated or the decision that is being augmented with AI [5]. This should also take into account how the model will be used and what data will feed the model. Based on these, the performance criteria of the model (e.g., accuracy, specificity, false positive rate), along with its interpretability, explainability, safety, privacy, security, and robustness requirements, should be agreed upon. We refer to these broadly as the success criteria. They should also be translated into a set of high-level testing and validation procedures that must be met, which we refer to broadly as the acceptance criteria.
- Build vs Buy vs Rent Analysis: With the above artifacts, an organization can determine whether the solution required already exists and can be rented as a managed solution. Key considerations here are how well the generic model meets the business requirements and the success and acceptance criteria already agreed. The security and confidentiality of the data required to make the inferences, and where that data is stored, also play a significant role. If the solution cannot be rented but can be bought, one should perform similar due diligence to ensure it meets the business requirements and the success and acceptance criteria, and is also cost-effective. Any retraining of the model using proprietary data should be factored into this consideration. Finally, if we are building the model, an overall solution design should be developed covering how the model will be built during the training phase and how it will eventually be deployed and used. For instance, if explainability is extremely important per the success criteria, then a simpler model structure, or a solution with explainable AI elements added to it, would be required.
- (Preliminary) Datasheets for Datasets: One practice that is becoming more commonplace in the data science community is datasheets for datasets [6]. Datasheets capture not only the key data items within a dataset and their definitions, but also the motivation for gathering the data; its composition; the collection process; the preprocessing, cleansing, and labelling required; recommended uses; sharing and distribution of the data; and maintenance and retirement of the data. While the datasheet may not be comprehensive at this stage of the process, a preliminary datasheet that captures some of these elements is valuable for making an informed go/no-go decision on model building.
- (Preliminary) Model Cards: Similar to datasheets for datasets, model cards for model reporting [7] are also becoming widely used in the data science community. Model cards provide a specification of the models being built, including details of the model, the algorithms used, the intended use of the model, ethical considerations (such as bias, fairness, and explainability requirements), and the training, validation, and test data. At this early stage of the process the model cards will be preliminary and will be refined during the subsequent phases. The model specifications should take into account the social impact and risk assessments discussed in the previous step. (A sketch of both the preliminary datasheet and model card as structured records follows this list.)
- Solution Design: The concept of solution architecture is a well-accepted practice in software development. As models move from standalone models to a model factory [8], we can enhance this traditional solution architecture with a few additional dimensions. A model architecture that captures how different models interact with each other, how the data is labelled or validated by end users, and how the data pipelines are built to gather, cleanse, process, and feed the models should be part of the overall solution design.
- Detailed Development Plan: With all of the above analysis, a reasonable plan for experimentation, and an initial cost estimate for it, can be built. The project or sprint plan should identify the key roles, responsibilities, and timing for the training, testing, and development of the AI solution. Note our discussion elsewhere [9] that project scoping for models, especially models that have not been built before, suffers from a number of traps that should be taken into account while planning for value delivery.
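As a sketch of what these preliminary records might look like in code, the dataclasses below loosely mirror some of the headings in the datasheets [6] and model cards [7] proposals. The field names and defaults are assumptions for illustration; both published templates contain many more sections than shown here.

```python
from dataclasses import dataclass, field

@dataclass
class DatasheetDraft:
    """Preliminary datasheet for a dataset, loosely following [6].

    At stage gate 2 many fields may still be unknown; they are filled
    in as the dataset is explored during value discovery.
    """
    motivation: str                      # why the data was collected
    composition: str                     # what the instances represent
    collection_process: str
    preprocessing: str = "TBD"
    labelling: str = "TBD"
    recommended_uses: list[str] = field(default_factory=list)
    prohibited_uses: list[str] = field(default_factory=list)
    maintenance_plan: str = "TBD"

@dataclass
class ModelCardDraft:
    """Preliminary model card, loosely following [7]."""
    model_details: str                   # model family, version, owners
    intended_use: str
    out_of_scope_uses: list[str] = field(default_factory=list)
    candidate_algorithms: list[str] = field(default_factory=list)
    ethical_considerations: str = "TBD"  # bias, fairness, explainability
    training_data: str = "TBD"           # link to the datasheet
    evaluation_data: str = "TBD"
```

Keeping these as structured records, rather than prose documents alone, makes it straightforward to diff the preliminary versions against the updated ones produced at the later stage gates.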
The stage gate card for this stage is shown below.

The success and acceptance criteria will be determined jointly by the business domain experts and the data science teams. The build vs buy vs rent decision should be made by the product owner in conjunction with the business sponsor. The datasheets will be a joint deliverable by the data owners and the data scientists, while the model cards will be led by the data scientists. The software solution architects need to collaborate with Machine Learning or Model architects and data architects to complete the overall solution design document.
The final decision of whether to go ahead and build, buy, or rent the AI solution, or to abandon the effort entirely, is taken jointly by the product owner and the business sponsor. All of the documents detailed in this and the previous step are taken as input into the decision.
Stage Gate 3: Does the model meet our expectations?
This stage gate occurs at the end of the value discovery phase of the project. When we come to this stage gate, the training of the model has been completed through an iterative cycle of data extraction, pre-processing, model building, and testing. The key decision made at this stage gate is either to proceed towards model deployment or to delay or abandon the model development. Key areas of consideration in this step are:
- Success and Acceptance Test Results: The detailed testing of the model as described by the success and acceptance criteria should be carried out prior to this stage gate. The results should summarize the performance of the model, along with its interpretability, explainability, fairness, safety, control, security, privacy, robustness, and reproducibility. All of the acceptance tests mandated or recommended in the acceptance criteria should be performed and evaluated. Ideally, the model should meet the success and acceptance criteria agreed at the previous stage gate. However, in some cases a decision may be made to relax the acceptance thresholds and approve the model for the next stage. This should occur only on an exceptional basis and should be escalated to the business sponsor and product owner for consideration. (A sketch of checking test results mechanically against the agreed criteria follows this list.)
- Datasheets, model cards, and updated solution design: When model training is complete, the key documents from the previous stage gate can be updated. The datasets and data items that were useful in training the model should be highlighted. The models chosen, the hyperparameters used, the experiments performed, and the ensemble of models that provided the best results should be documented. Any changes to the overall model architecture, data pipelines, and solution architecture should also be reflected.
- Integration and deployment requirements: Once the model has been approved for deployment, the specifications for model deployment can be drafted. They should include whether model inferencing in production will be batch or real-time, how the model will be validated during the transition period, and how frequently the model will require retraining (e.g., whether training in production is continuous or periodic). Performance tuning and load testing requirements should also be detailed, along with how the model is to be integrated with the rest of the application systems (e.g., shipped as a Docker container, exposed as an API call, or given its own standalone front-end).
- Change management plan: The final step in planning for eventual deployment is the change management plan, which includes any process changes, validation of the AI solution, and training and redeployment of personnel.
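As a sketch of how the agreed criteria can be checked mechanically rather than by inspection, the snippet below assumes the success criteria are recorded as named metric thresholds. The metric names and values are placeholders; a real gate would cover the full fairness, robustness, security, and reproducibility batteries described above.

```python
# Success criteria agreed at stage gate 2, expressed as named metric
# thresholds. Metric names and values here are placeholders.
SUCCESS_CRITERIA = {
    "accuracy":               ("min", 0.90),
    "false_positive_rate":    ("max", 0.05),
    "demographic_parity_gap": ("max", 0.02),
}

def evaluate_gate(results: dict[str, float],
                  criteria: dict[str, tuple[str, float]]) -> dict:
    """Compare measured results against the agreed criteria and list any
    failures, so they can be escalated to the business sponsor and
    product owner rather than silently waived."""
    failures = []
    for metric, (direction, threshold) in criteria.items():
        value = results.get(metric)
        if value is None:
            failures.append(f"{metric}: not measured")
        elif direction == "min" and value < threshold:
            failures.append(f"{metric}: {value} below minimum {threshold}")
        elif direction == "max" and value > threshold:
            failures.append(f"{metric}: {value} above maximum {threshold}")
    return {"passed": not failures, "failures": failures}

# Fails on the false positive rate and the unmeasured parity gap,
# even though accuracy clears its threshold.
print(evaluate_gate({"accuracy": 0.92, "false_positive_rate": 0.07},
                    SUCCESS_CRITERIA))
```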
The stage gate card for this stage is shown below.

The success and acceptance test results will be prepared by the data scientists and data engineers and validated by second-line model validators and the ethics team. The updated datasheets will be the responsibility of the data engineers, while the updated model cards will be the responsibility of the data scientists. The solution architects, along with ML engineers and ModelOps personnel, will be responsible for the integration and deployment requirements. The change management plan will be the responsibility of the business team and the product owner.
Stage Gate 4: Do we deploy the model into production?
This stage gate occurs only if the previous stage gate has approved the model for deployment. It occurs at the end of the model deployment step of the value delivery phase of the project. When we come to this stage gate, the model is ready to be deployed, and the key decision is whether to release the deployment for production use.
- Integration & deployment test results: Based on the integration and deployment requirements specified in the earlier stage the model could be served as an API endpoint or have its own web interface or delivered as a standalone application with UI. The integration tests should be performed at this stage. In addition, performance tests, load balancing – especially if the models will be trained in production, and model use bias tests, explainability tests with end users, robustness tests, and adversarial attack tests should be performed at this stage.
- Process checklist: All changes to processes based on the interaction mode of the deployed model (e.g., human-in-the-loop, human-on-the-loop, human-out-of-the-loop) should be carried out. The process flow for validation of models, if required, should be in place, with appropriate criteria for exceptions, escalations, and approvals. ModelOps, DataOps, and SecOps processes should be in place at this stage.
- Training checklist: Any new roles (e.g., ModelOps, DataOps) or changes to existing roles should be documented with clear responsibilities and activities. If a large number of users (e.g., casual or power users) require training for new or changed roles, that training should be carried out before deployment.
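As one small example of what the integration tests might look like for a model served as an API endpoint, here is a hypothetical smoke test. The endpoint URL, payload schema, response format, and latency budget are all assumptions for illustration; the bias, robustness, and adversarial tests mentioned above would require considerably more machinery than this.

```python
import time
import requests  # third-party HTTP client; assumes the model is served over HTTP

# Hypothetical endpoint and budget; in practice both come from the
# integration and deployment requirements agreed at the previous gate.
ENDPOINT = "https://models.example.com/churn/v1/predict"
LATENCY_BUDGET_S = 0.5

def smoke_test(payload: dict) -> None:
    """Minimal integration check: the endpoint answers, returns a
    well-formed score, and stays within the agreed latency budget."""
    start = time.monotonic()
    response = requests.post(ENDPOINT, json=payload, timeout=5)
    elapsed = time.monotonic() - start

    assert response.status_code == 200, f"unexpected status {response.status_code}"
    body = response.json()
    assert "prediction" in body, "response missing 'prediction' field"
    assert 0.0 <= body["prediction"] <= 1.0, "score outside [0, 1]"
    assert elapsed <= LATENCY_BUDGET_S, f"latency {elapsed:.2f}s over budget"

if __name__ == "__main__":
    smoke_test({"tenure_months": 12, "monthly_spend": 42.0})  # illustrative features
```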
The stage gate card for this stage is shown below.

The integration and deployment test results will be prepared by the data scientists, ML engineers, and software engineers, and validated by second-line model validators and the ethics team. The process and training checklists will be the responsibility of the different XOps (ModelOps, DataOps, SecOps) groups. The training itself will be the responsibility of the business teams and change management experts.
Stage Gate 5: Is the model ready to be transitioned for ‘business-as-usual’ operation?
This stage gate occurs at the end of the value delivery phase of the project. Although the model is in production, its operation should be closely monitored before it can be transitioned into BAU ('business-as-usual') operation.
- Model Monitoring Results: This stage gate is normally not required for traditional software deployments, as the behavior of the software is not sensitive to, and does not change with, new data. In the case of models, however, there is potential for a number of 'drifts' [11] (data drift, feature drift, concept drift, etc.) that should be monitored closely during the first few months of a deployed model. The period of close observation depends on the frequency of use of the model and the arrival rate of the data. (A sketch of one common drift statistic follows this list.)
- BAU Transition checklist: After this period of close monitoring, if the model is still worthy of production use, it should be transitioned to the operations team. Ensuring appropriate operations personnel are assigned and trained is a critical aspect of the transition to BAU. At this stage the ongoing monitoring criteria (e.g., performance, bias, explainability), the thresholds for intervention, and the frequency of testing should be established and agreed.
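The article does not prescribe a particular drift statistic, but one common choice for detecting data drift on a single feature is the Population Stability Index (PSI). The NumPy sketch below is a minimal illustration, with the conventional rule-of-thumb thresholds noted in the docstring; real intervention thresholds are organization-specific and set at BAU transition.

```python
import numpy as np

def population_stability_index(expected: np.ndarray,
                               observed: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a baseline (training-time) sample and a production
    sample of the same feature.

    Common rules of thumb: PSI < 0.1 stable, 0.1-0.25 investigate,
    > 0.25 significant drift (thresholds vary by organization).
    """
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    obs_pct = np.histogram(observed, bins=edges)[0] / len(observed)
    # Avoid division by zero / log(0) on empty bins.
    exp_pct = np.clip(exp_pct, 1e-6, None)
    obs_pct = np.clip(obs_pct, 1e-6, None)
    return float(np.sum((obs_pct - exp_pct) * np.log(obs_pct / exp_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)    # feature at training time
production = rng.normal(0.3, 1.0, 10_000)  # same feature, shifted in production
print(f"PSI = {population_stability_index(baseline, production):.3f}")
```

Similar checks can be run per feature and on the model's output scores, with the results feeding the monitoring report reviewed at this gate.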
The stage gate card for this stage is shown below.

The model monitoring results will be prepared by the different operational roles (ModelOps, DataOps) with assistance from the data scientists. The BAU transition will be the joint responsibility of the business team and the operational roles.
Stage Gate 6: Should the model continue as-is, or be retrained, redesigned, or retired?
This stage gate occurs periodically during the evaluation and check-in step of the value stewardship phase. The frequency of evaluation is determined at the end of the transition step of the value delivery phase. The decision to be made at each evaluation is to continue the model with no changes, retrain it, redesign it, or retire it. To make this determination, the following areas should be examined:
- BAU Monitoring results: All of the model monitoring results from the previous stage should be included in BAU monitoring. In addition, model decay should be evaluated by comparing the results of the current period against the previous period and the original deployment and training results. Any decay or instability can trigger retraining or a redesign of the model.
- ROI Value: In addition to model performance, the business should also produce an ROI or value assessment of the model. This should take into account the benefits of the model (e.g., efficiency gains, effectiveness, enhanced experience, cost savings, revenue gains) and determine whether the benefits of having the model in production outweigh the costs of maintaining it. This determines whether the model should be retired or replaced with a different or newer model. (A sketch combining these checks into a single gate decision follows this list.)
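Putting the two bullets together, the gate decision itself can be sketched as a small rule, as below. The drift and decay thresholds and the ROI inputs are hypothetical placeholders; in practice these are the intervention thresholds agreed at BAU transition, and the final call rests with the product owner.

```python
def stage_gate_6_decision(psi: float,
                          performance_decay: float,
                          annual_benefit: float,
                          annual_run_cost: float) -> str:
    """Combine drift/decay monitoring with an ROI check into one of the
    four outcomes. Thresholds here are illustrative placeholders; the
    real ones are set at BAU transition and owned by the product owner.
    """
    if annual_benefit <= annual_run_cost:
        return "retire"    # the model no longer pays for itself
    if psi > 0.25 and performance_decay > 0.10:
        return "redesign"  # drifted inputs and degraded outputs
    if performance_decay > 0.05:
        return "retrain"   # refresh the model on recent data
    return "continue"

# Example: mild drift, small decay, healthy ROI -> keep the model as-is.
print(stage_gate_6_decision(psi=0.12, performance_decay=0.02,
                            annual_benefit=500_000, annual_run_cost=120_000))
```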
The stage gate card for this stage is shown below.

The BAU monitoring results will be prepared by the different operational roles (ModelOps, DataOps) with assistance from the data scientists and the business team. The ROI value analysis will be the responsibility of the business team working with the data scientists and ModelOps personnel. The ultimate decision to retrain, redesign, or retire the model should be made by the product owner.
The six stage gates that are part of the nine-step model development and management process are critical milestones in any Software 2.0 management process. The detail and depth of the specifications we have outlined here can be adjusted based on the scale, scope, complexity, and societal impact of the models. In subsequent articles, we will discuss these specifications and how to modulate them based on the risk tiering of models.
Authors: Anand S. Rao, Ilana Golbin, and Vidhi Tembhurnikar
References
[1] Five Views of AI Risk: Understanding the darker side of AI (Towards Responsible AI – Part 1)
[2] Ten principles of Responsible AI for corporates (Towards Responsible AI – Part 2)
[3] Top-down and end-to-end governance for the responsible use of AI (Towards Responsible AI – Part 3)
[4] Model lifecycle: From ideas to value
[5] Ten human abilities and four intelligences to exploit human-centered AI
[6] Datasheets for Datasets
[7] Model Cards
[8] Model evolution: From standalone models to model factory
[9] Consequences of mistaking model for software
[10] Endpoint vs API