
User feedback – the missing piece of your ML monitoring stack

A complete guide to building user-centric AI.

Image from Unsplash

The misalignment of AI models and users

Have you ever spent months, and who knows how much $$$, implementing an AI model, only to find no one uses it? Even if you do overcome the challenges of adoption, how do you know whether the model outputs are truly adding value to the user’s decisions, queries, or professional and daily activities?

Machine learning performance metrics and real-time monitoring tools are an excellent way to calculate the performance of a model, and identify when things may be going wrong from a technical point of view. But without understanding user engagement or satisfaction, it is difficult to know if the model is being used for its intended purpose.

Additionally, listening to the users of AI models may uncover incorrectly predicted edge cases; explainability algorithms that don’t explain things quite as clearly as we hoped; or user experience flaws that impact how users engage with a model.

The remainder of this article will cover the importance of understanding user feedback on AI models, the different types of user feedback, and how user feedback can be collected to improve model performance, increase user adoption, and ultimately align AI models to users.

Contents

  • The misalignment of AI models and users
  • What is user feedback for AI?
  • Why is user feedback for AI important?
  • What are the different types of feedback?
  • A guide to collecting user feedback
  • Wrapping up

What is user feedback for AI?

When we refer to user feedback, the user in question depends on the use case you are implementing. For example, this might be an internal business user or stakeholder of an internal ML-based demand forecasting application; it may be an external domain expert, such as a medical oncologist, leveraging a MedTech product that assists with detecting tumours in medical scans; or it may be the end user of an external-facing job application assistant, leveraging generative AI to help write and refine resumes.

The concepts, methods and benefits outlined in this article apply to all of these use cases. However, some benefits may apply to a greater or lesser extent depending on the use case itself, and should be weighed on a case-by-case basis.

For the purpose of this article, we will use the resume assistant, described above, to illustrate the benefits of user feedback for this application.

A second important point, when referring to user feedback, is that we don’t just mean relabelling incorrect predictions, or a feedback loop for automated model retraining. User feedback involves any information given by users, providing an understanding of the usefulness and adoption of the AI application. In our resume assistant example, user feedback could include user satisfaction scores, providing insights into how happy users are with the generated resumes, or written comments to highlight specific issues.

This type of feedback should not always be pushed directly into an automated retraining pipeline for several reasons:

  1. User feedback is typically unstructured, and highlights issues beyond incorrect predictions, so it can’t always be used to retrain a model directly. For example, a user highlighting that the resume assistant uses overly formal language may call for more examples of less formal text in the training data, as opposed to retraining directly on this feedback.
  2. Focusing solely on correct/incorrect predictions overlooks valuable information provided by users. Understanding user feedback allows AI teams to improve the application based on user experiences and usage patterns.
  3. Training strategies such as reinforcement learning from human feedback (RLHF) work very well within a controlled environment. However, real-world user feedback can be noisy and potentially harmful. For example, blindly incorporating user feedback into the training data can lead to data poisoning, where malicious users intentionally mislead the model.

Therefore, AI teams should review user feedback to extract the different insights and determine the next best course of action to improve the overall AI application.

Why is user feedback for AI important?

Enabling Model Evaluation

Many AI models lack a ground truth. This makes evaluation on a test dataset difficult, as it is usually based on a proxy metric that often only tells part of the story. This is especially true for generative models, where understanding whether users are satisfied with model predictions is usually the most important metric.

Increasing Model Performance

User feedback can be used to continuously improve the performance of AI models. Users often hold the domain knowledge required to build a robust model. Additionally, monitoring user engagement can help identify whether a model performs poorly because the train/test set is a poor representation of the real world.

Increasing User Alignment

User feedback provides insight into what aspects of the model work well and what causes friction. This enables the AI team to enhance user experience, making the model more intuitive and user-friendly. Additionally, AI teams can ensure the models are aligned to all users, and not just smaller sub groups. For example, ensuring the resume assistant maintains quality across all languages, not just English.

As users feel their voices are heard, they are more likely to trust the AI model and remain engaged, leading to increased user alignment and adoption.

Increasing AI Responsibility

Through user feedback, AI teams can identify and address concerns related to safety, bias, or other ethical considerations. This proactive approach results in the development of safer and more responsible AI models. By seeking and responding to user feedback, AI teams demonstrate their accountability and commitment to creating high-quality and reliable AI solutions. Feedback may also reveal the need for additional educational resources and documentation, which AI teams can provide to ensure users have a clear understanding of the model’s capabilities and promote best practices.

In summary, leveraging user insights allows AI teams to refine models, optimise user experience, and address ethical concerns, leading to higher user satisfaction and trust.

Now we have clarified what user feedback is and its benefits, let’s cover the different types of feedback and what they are used for.

What are the different types of feedback?

There are two main categories of user feedback, explicit and implicit. This can be explained and illustrated nicely by our new best friend, ChatGPT (in the image below).

ChatGPT illustrating the difference between explicit and implicit feedback. Screenshot taken from ChatGPT by OpenAI, and edited by author.

Explicit user feedback refers to direct, intentional, and consciously provided input from users regarding their experiences, opinions or preferences. As you would’ve seen in the ChatGPT interface, the thumbs up/down feedback is an example of explicit feedback.

Explicit feedback can be broken down further into quantitative and qualitative. Quantitative feedback includes measurable scales, such as thumbs up/down, user satisfaction ratings (often measured on a 5-point Likert scale), or any custom scale that best fits what you are trying to understand from your users.

Qualitative feedback typically involves an open textbox to allow users to provide written feedback. Combining a quantitative measure with qualitative feedback, enables the AI team to understand the "why" behind the user’s comment, and uncover details such as AI bugs, domain knowledge, or user preferences.

Submitting qualitative feedback, after selecting a negative quantitative response. Screenshot taken from ChatGPT by OpenAI.

Implicit user feedback refers to indirect, unintentional, and unconsciously provided data based on users’ behaviours, actions or patterns. Looking again at the ChatGPT UI, the ‘copy to clipboard’ button is an example of how OpenAI collects implicit feedback. For the resume assistant example, implicit user feedback could also be captured by tracking any edits the user makes to the generated output.
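As a rough sketch of that idea, the difference between the generated text and the user’s final version can serve as an implicit signal, here computed with Python’s standard-library `difflib` (the function name and example strings are illustrative, not part of any particular product):

```python
from difflib import SequenceMatcher

def edit_ratio(generated: str, final: str) -> float:
    """Fraction of the generated text the user changed.
    0.0 means the output was kept verbatim; values near 1.0
    mean the user rewrote almost everything."""
    similarity = SequenceMatcher(None, generated, final).ratio()
    return 1.0 - similarity

# A heavily edited resume bullet is a strong implicit signal
generated = "Responsible for managing a team of engineers."
final = "Led a cross-functional team of 8 engineers, shipping 3 products."
print(round(edit_ratio(generated, final), 2))
```

Logging this ratio alongside each session gives a per-output engagement measure without asking the user to do anything extra.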

Careful consideration is needed when choosing which types of feedback to implement. Explicit feedback provides a much clearer picture of the user’s thoughts. However, for external use cases, end users may not always provide explicit feedback, as they may not understand how they will benefit (or feel they don’t have time!). In this case, implicit feedback can still give a good understanding of how the AI application is being used, without relying on the user to take direct action.

Based on the application, and the challenges you currently face, you should also consider what measures you want to implement. For example, if you are focused on increasing the model performance, then a thumbs up/down measure with a comment can help identify model issues. But if you are more focused on increasing adoption, then perhaps a user satisfaction score would be better.

A guide to collecting user feedback

In this section, we walk through four key steps for collecting user feedback, and integrating the user insights back into the ML monitoring system (as shown below).

System diagram outlining a high-level architecture for collecting user feedback, and integrating the user insights back into the ML monitoring system. Image by author.

Step 1: Design & build feedback components within your AI app

After defining why you are collecting user feedback, you can determine what type of feedback best fits your requirements. Feedback components are typically placed after a model output has been generated. However, you may also wish to collect feedback at other points in your application, to understand how specific features are used.

AI model metadata should be captured along with all feedback submitted through the component. This includes the model version, the prompt or request, the model output, and user demographics (such as user ID and location).

Step 2: Develop analytics capabilities to understand user feedback

For quantitative feedback, this might include plots such as user satisfaction (CSAT/NPS) or the average positive/negative response over time, with the ability to compare these metrics across different model versions, users or other metadata.
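As an illustration, assuming feedback events have been logged to a table with `date`, `model_version` and `rating` columns (an assumed layout, not a standard one), a pandas aggregation along these lines could produce a weekly satisfaction comparison per model version:

```python
import pandas as pd

# Toy feedback log; in practice this would be loaded from storage
feedback = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08"]
    ),
    "model_version": ["v1", "v2", "v1", "v2"],
    "rating": [3, 4, 2, 5],   # 1-5 satisfaction score
})

# Average satisfaction per week, broken down by model version
weekly = (
    feedback
    .groupby([pd.Grouper(key="date", freq="W"), "model_version"])["rating"]
    .mean()
    .unstack("model_version")
)
print(weekly)
```

The resulting table (weeks as rows, model versions as columns) is exactly the shape needed for the version-comparison plots mentioned above.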

For qualitative feedback, use ML to analyse the sentiment of user comments and to classify the feedback into categories. This allows sentiment and satisfaction metrics to be monitored across the different categories of comments.
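In practice the classification step would use a trained text classifier; purely for illustration, the sketch below stands in a keyword lookup (the category names and keywords are invented for the resume assistant example):

```python
# Illustrative category -> keyword mapping; a real system would
# use a trained classifier rather than keyword matching
CATEGORIES = {
    "tone": ["formal", "casual", "tone", "wording"],
    "accuracy": ["wrong", "incorrect", "error", "mistake"],
    "formatting": ["layout", "format", "bullet", "spacing"],
}

def categorise(comment: str) -> list[str]:
    """Tag a free-text comment with zero or more feedback categories."""
    text = comment.lower()
    matched = [
        cat for cat, keywords in CATEGORIES.items()
        if any(kw in text for kw in keywords)
    ]
    return matched or ["other"]

print(categorise("The language is far too formal and the bullet layout is off"))
# → ['tone', 'formatting']
```

Once each comment carries category tags, the quantitative metrics from the previous step can be sliced per category, e.g. average satisfaction for comments about tone versus accuracy.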

Step 3: Identify AI issues

Using the analytics capabilities, recurring topics and themes in the feedback can be identified to categorise areas for improvements. AI issues can then be raised and prioritised to be solved by the AI team.

The role of the AI team at this stage is to identify model issues as well as user issues, and determine the best course of action to resolve them.

For a reminder of the types of insights the AI team may find within the user feedback, have a look back at the "What is user feedback for AI?" section.

Step 4 – Integrate user feedback back into your ML monitoring system

Integrating user feedback into your current ML monitoring system allows you to set up alerting (similar to performance monitoring or drift detection). For example, if the global user satisfaction score drops below a certain threshold, an alert can be triggered to notify the AI team to take action.

Additionally, summaries and daily reporting can be sent to the AI team or stakeholders, providing an overview of the user feedback.

Wrapping up

To summarise, user feedback enables AI teams to identify bugs, fine-tune models and align models to users.

The above can also be achieved with ML monitoring systems. However, by assessing the model from a different perspective, i.e. from the user’s point of view, we can identify additional information that would have been missed by the traditional ML monitoring systems.

I hope this article has sparked your interest, and provided you with initial ideas on how you can start listening to your users and enhancing your AI applications.

If you would like to learn more about user feedback for AI, or share and discuss your ideas around this topic, please feel free to get in touch via LinkedIn or email.

