Making the Right Decisions: AI Advice, Decision Aids, and the Promise of LLMs

Exploring the new dawn of decision-making with LLMs

Ujwal Gadiraju
Towards Data Science


Introduction

The democratization of AI has led to the adoption of AI systems across a variety of domains. The recent wave of generative models, such as pre-trained large language models (LLMs), has made them a part of everyday activities, from drafting emails more productively to helping novice and expert writers alike overcome the dreaded "blank page" obstacle. Given this growing reliance on LLMs to aid decision-making, this article presents a synthesis of human decision-making and the evolution of human-AI decision-making. Finally, the article reflects on the opportunities that LLMs offer for aiding decision-making tasks and the concomitant risks of relying on LLMs for decision-making.

Human Decision-Making

In a world characterized by a growing spectrum of choices for nearly every decision we encounter in our daily lives (from the food we buy, the clothes we wear, and the books, music, and movies we consume, to lifestyle choices and travel destinations), the quality of decision-making has received renewed interest. In his influential work exposing the "paradox of choice," Barry Schwartz articulated this growing difficulty in decision-making on the heels of technological advances at the stroke of the millennium [12]. Schwartz elucidates this with the example of a doctor offering a patient an array of treatments, conveying the potential risks and weighing them against the benefits of each. In such a situation, the burden of a high-stakes decision shifts from the expert doctor to the non-expert patient. Among other factors, an abundance of choices often impedes effective human decision-making.

Different research communities, ranging from evolutionary psychology to cognitive science and neuroscience, have explored the nature of human decision-making and the various factors that shape decision-making processes among humans [2, 10]. It is no secret that human decision-making is plagued by cognitive biases and punctuated with irrationality, as most famously documented by Nobel Prize-winning behavioral economist Daniel Kahneman in Thinking, Fast and Slow [6].


Human-AI Decision-Making

The advent of new technologies has ushered in a growth of decision-support systems that can help humans overcome obstacles in their decision-making processes. Decision-support systems take various shapes and forms in broader socio-technical contexts, from algorithms that power user interactions to complex machine-learning models that aid users with predictions and forecasting. For example, recommender systems can help users by presenting them with the content or products most likely to satisfy their needs. Other algorithmic systems can mine large volumes of data to offer advice in a plethora of decision-making tasks.
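
To make this concrete, here is a minimal sketch of one such decision aid: an item-based recommender that ranks the options a user has not yet tried. The toy ratings matrix and item names are invented for illustration and are not tied to any particular product or library.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 means "not rated".
# All numbers and item names below are illustrative only.
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)
items = ["thriller", "mystery", "cookbook", "travel guide"]

def recommend(user_index: int, top_k: int = 2) -> list[str]:
    """Rank a user's unrated items via item-item cosine similarity."""
    norms = np.linalg.norm(ratings, axis=0, keepdims=True)
    sim = (ratings.T @ ratings) / (norms.T @ norms + 1e-9)   # item-item similarity
    user = ratings[user_index]
    # Predicted preference: similarity-weighted average of the user's own ratings.
    scores = (sim @ user) / (np.abs(sim).sum(axis=1) + 1e-9)
    unrated = np.where(user == 0)[0]
    ranked = unrated[np.argsort(scores[unrated])[::-1]]
    return [items[i] for i in ranked[:top_k]]

print(recommend(user_index=0))  # ranks the items user 0 has not rated yet
```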

A central objective common to all human-AI decision-making contexts is to improve decision-making effectiveness by combining human intelligence with the computational power of algorithmic systems. This, however, is far from how many collaborative human-AI decision-making processes unfold in the real world. Humans often fail to rely appropriately on AI systems in decision-making tasks, leading to sub-optimal team performance. Appropriate reliance has been conceptualized as relying on AI advice when it is correct and on oneself when the AI is incorrect [11]. A number of factors shape such outcomes: human factors (e.g., domain knowledge, affinity for technology interaction, prior experience), system factors (e.g., the accuracy or confidence of the AI system), and task factors (e.g., task complexity, uncertainty, and stakes).
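
To make the notion of appropriate reliance a little more operational, here is a minimal sketch, loosely in the spirit of [11], that summarizes how often a decision maker adopts correct AI advice and resists incorrect advice. The trial log, field names, and metric names are simplifications chosen for illustration rather than the exact measures defined in that work.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    human_initial: str   # decision before seeing AI advice
    ai_advice: str       # the AI's recommendation
    human_final: str     # decision after seeing AI advice
    ground_truth: str    # the correct answer

def reliance_summary(trials: list[Trial]) -> dict[str, float]:
    """How often does the human switch to correct AI advice,
    and stay with their own answer when the AI is wrong?"""
    switch_cases = [t for t in trials
                    if t.human_initial != t.ground_truth and t.ai_advice == t.ground_truth]
    resist_cases = [t for t in trials
                    if t.human_initial == t.ground_truth and t.ai_advice != t.ground_truth]
    rair = (sum(t.human_final == t.ground_truth for t in switch_cases)
            / len(switch_cases)) if switch_cases else float("nan")
    rsr = (sum(t.human_final == t.ground_truth for t in resist_cases)
           / len(resist_cases)) if resist_cases else float("nan")
    return {"relative_ai_reliance": rair, "relative_self_reliance": rsr}

# Toy log of four decisions (purely illustrative).
log = [
    Trial("approve", "reject", "reject", "reject"),    # correctly adopted good advice
    Trial("approve", "reject", "approve", "approve"),  # correctly resisted bad advice
    Trial("reject", "approve", "approve", "reject"),   # over-relied on bad advice
    Trial("reject", "approve", "reject", "approve"),   # under-relied on good advice
]
print(reliance_summary(log))
```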

Empirical explorations of human-AI decision-making in various contexts, including loan application decisions and medical diagnosis, have revealed that humans either under-rely on AI advice and lose out on the opportunity to improve their decisions, or over-rely on it and achieve sub-optimal outcomes. To tackle over-reliance and under-reliance and foster appropriate reliance on AI advice, prior works have proposed the use of explanations [13], cognitive forcing functions (i.e., interventions that force critical consideration and reflection during the decision-making process) [4], tutorials or training sessions that communicate the strengths and weaknesses of AI systems, and initiatives to increase the general AI literacy of populations. Recent work has proposed an alternative framework called "evaluative AI" to promote appropriate reliance on AI advice: rather than issuing a recommendation to accept or reject, decision-support tools should provide evidence for and against the decisions people are considering [7].
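
The contrast between recommendation-driven and evidence-driven support can be sketched in a few lines. The snippet below only illustrates the interaction style suggested by the evaluative AI framing [7]; the loan-decision scenario and the evidence strings are invented, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class EvaluativeSupport:
    """Instead of a single verdict, show evidence for and against an option
    the human is considering (in the spirit of evaluative AI [7])."""
    option: str
    evidence_for: list[str] = field(default_factory=list)
    evidence_against: list[str] = field(default_factory=list)

# Recommendation-style output: one verdict the user can only accept or reject.
recommendation = {"decision": "reject loan", "confidence": 0.78}

# Evaluative-style output: the user weighs the evidence and decides.
evaluation = EvaluativeSupport(
    option="approve loan",
    evidence_for=["stable income for 6+ years", "no missed payments on record"],
    evidence_against=["debt-to-income ratio above portfolio average"],
)

print(recommendation)
print(evaluation)
```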

Cognitive biases have also been shown to influence human-AI decision-making [1, 3]. Rastogi et al. [9] argued that our perception and understanding of decision-making tasks can be distorted by cognitive biases such as confirmation bias, anchoring bias, and availability bias; they explored the role of anchoring bias and proposed methods to mitigate its negative effects on collaborative decision-making performance. He et al. [21] showed that the Dunning-Kruger effect, a metacognitive bias, can influence how people rely on advice from AI systems. They revealed that users who overestimate their own ability or performance tend to under-rely on AI systems, hindering optimal team performance in decision-making tasks. Other factors, such as algorithm aversion and algorithm appreciation, have also been shown to influence the fruitfulness of human-AI decision-making [17].

Despite the ongoing work in the broad realm of human-AI collaboration, fostering appropriate reliance on AI systems in decision-making tasks remains an unsolved problem. Different research communities at the intersection of Artificial Intelligence, Machine Learning, and Human-Computer Interaction are actively working on advancing our understanding of this area and developing methods, tools, and frameworks that can help us benefit from the potential of human-AI collaboration.

Today, large language models (LLMs) have found widespread application and adoption across domains. In the remainder of this article, we will explore the opportunities that LLMs provide in aiding human decision-making, along with the potential perils intertwined with those benefits.

LLMs for Decision-Making Tasks

LLMs are being increasingly used in a variety of sociotechnical systems despite demonstrating biases and the potential to cause harm. Having said that, they have also shown promise for positive impact at scale. For instance, Rastogi et al. [8] demonstrated an auditing tool powered by a generative LLM, proposing to leverage the complementary strengths of humans and generative models in the collaborative auditing of commercial language models. Wu et al. [14] proposed AutoGen, a framework that enables complex LLM-based workflows using multi-agent conversations. AutoGen can support online decision-making tasks such as game-playing or web interactions.
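
To give a flavor of what a multi-agent conversation looks like, here is a deliberately framework-agnostic sketch of the pattern that frameworks such as AutoGen implement. It does not use AutoGen's actual API; the call_llm function is a placeholder that you would back with a real model client.

```python
from dataclasses import dataclass

def call_llm(system_prompt: str, history: list[str]) -> str:
    """Placeholder for a real LLM call (swap in whichever client you use)."""
    return f"[{system_prompt[:20]}...] reply to: {history[-1]}"

@dataclass
class Agent:
    name: str
    system_prompt: str  # the role this agent plays in the conversation

def converse(agents: list[Agent], task: str, max_turns: int = 4) -> list[str]:
    """Agents take turns responding to a shared conversation history."""
    history = [f"user: {task}"]
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]
        reply = call_llm(agent.system_prompt, history)
        history.append(f"{agent.name}: {reply}")
    return history

agents = [
    Agent("planner", "You break the task into steps."),
    Agent("critic", "You check the plan for mistakes and risks."),
]
for line in converse(agents, "Decide which of two suppliers to contract."):
    print(line)
```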

On the one hand, there is evidence that LLMs like GPT-3.5 exhibit behavior that strikingly resembles human-like intuition, along with the cognitive errors that come with it [16]. Recent research has highlighted the feasibility of using ChatGPT for radiologic decision-making, potentially improving clinical workflows and the responsible use of radiology services [18]. To enhance AI safety in decision-making processes, Jin et al. [15] have aimed to equip LLMs with the ability to determine when a rule should be broken, especially in novel or unusual situations. On the other hand, LLMs can inadvertently perpetuate stereotypes toward marginalized groups [20] and exhibit biases involving race, gender, religion, and political orientation. As with the challenge of fostering appropriate trust and reliance on other decision-support systems, if humans are to rely on LLMs for decision-making, we will need to better understand the benefits and pitfalls of such interactions. A risk that is particularly magnified in LLM-powered interactions stems from the apparent ease with which conversational interactions can be carried out. Prior work has already uncovered an illusory sense of explanatory depth created by explanations in decision-making tasks, resulting in overreliance on AI systems. If human interactions with decision-support systems become even more seamless (for example, through interactive or conversational interfaces), we can expect to unearth more instances of inappropriate reliance.
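
As a concrete example of the kind of probing reported in [16], the sketch below sends classic cognitive reflection test (CRT) items to a model and checks whether it returns the intuitive-but-wrong answer. The ask_llm function is a hypothetical stand-in for whichever client you use, and the items are standard CRT questions rather than the paper's exact materials.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to your own model or API."""
    raise NotImplementedError("wire this up to a real LLM client")

# Classic CRT items, each with the intuitive (wrong) and deliberate (correct) answer.
crt_items = [
    {
        "question": ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
                     "more than the ball. How much does the ball cost, in cents?"),
        "intuitive": "10",
        "correct": "5",
    },
    {
        "question": ("If it takes 5 machines 5 minutes to make 5 widgets, how many "
                     "minutes would it take 100 machines to make 100 widgets?"),
        "intuitive": "100",
        "correct": "5",
    },
]

def probe(items) -> None:
    """Label each model answer as an intuitive error, correct, or other."""
    for item in items:
        answer = ask_llm(item["question"] + " Answer with a single number.")
        label = ("intuitive error" if item["intuitive"] in answer
                 else "correct" if item["correct"] in answer
                 else "other")
        print(f"{label}: {answer}")

# probe(crt_items)  # uncomment once ask_llm is connected to a model
```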

It is increasingly difficult to study LLMs through the lens of their architecture and hyperparameters. There is sufficient evidence, at this point, to understand that generative AI can produce high-quality written and visual content that may be used for the greater good or misused to cause harm. Porsdam Mann et al. [19] argue that a credit–blame asymmetry arises when assigning responsibility for the outputs of LLMs, and explore the ethical and policy implications this holds for how LLMs are developed and used.

What Needs To Be Done Next?

It is evident that more research and empirical work is needed to inform the safe and robust use of LLMs in decision-making tasks. This is particularly true given the current limitations of multimodal and multilingual LLMs. Here is a compilation of some questions that remain critical in determining the extent to which we can consistently benefit from weaving LLMs into everyday decision-making:

  • How can we facilitate appropriate reliance on LLMs or LLM-infused systems for effective decision-making?
  • How can we increase the robustness, reliability, and trustworthiness of LLM-infused decision-support systems?
  • How can we foster appropriate trust and reliance on LLMs in multimodal and multilingual decision-making contexts?
  • How can people of varying abilities, individual traits, prior knowledge, education and qualifications, and other demographics be equally supported by LLMs in decision-making tasks?

So, if you have an LLM at your fingertips, do not be in a hurry to rely on it as a black-box decision-making aid just yet!

Dr. ir. Ujwal Gadiraju is a tenured Assistant Professor at Delft University of Technology. He co-directs the Delft “Design@Scale” AI Lab and co-leads a Human-Centered AI and Crowd Computing research line. He is a Distinguished Speaker of the ACM and a board member of CHI Netherlands. Ujwal spends a part of his time working at Toloka AI with their AI, data, and research teams and is also an advisory board member for Deeploy, a growing MLOps company.

References

  1. Bertrand, A., Belloum, R., Eagan, J. R., & Maxwell, W. (2022, July). How cognitive biases affect XAI-assisted decision-making: A systematic review. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (pp. 78–91).
  2. Bossaerts, P., & Murawski, C. (2017). Computational complexity and human decision-making. Trends in Cognitive Sciences, 21(12), 917–929.
  3. Boonprakong, N., He, G., Gadiraju, U., van Berkel, N., Wang, D., Chen, S., Liu, J., Tag, B., Goncalves, J., & Dingler, T. (2023). Workshop on Understanding and Mitigating Cognitive Biases in Human-AI Collaboration.
  4. Buçinca, Z., Malaya, M. B., & Gajos, K. Z. (2021). To trust or to think: cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1), 1–21.
  5. Haupt, C. E., & Marks, M. (2023). AI-generated medical advice — GPT and beyond. JAMA, 329(16), 1349–1350.
  6. Kahneman, D. (2011). Thinking, Fast and Slow. Macmillan.
  7. Miller, T. (2023, June). Explainable AI is Dead, Long Live Explainable AI! Hypothesis-driven Decision Support using Evaluative AI. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 333–342).
  8. Rastogi, C., Tulio Ribeiro, M., King, N., Nori, H., & Amershi, S. (2023, August). Supporting human-AI collaboration in auditing LLMs with LLMs. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (pp. 913–926).
  9. Rastogi, C., Zhang, Y., Wei, D., Varshney, K. R., Dhurandhar, A., & Tomsett, R. (2022). Deciding fast and slow: The role of cognitive biases in AI-assisted decision-making. Proceedings of the ACM on Human-Computer Interaction, 6(CSCW1), 1–22.
  10. Santos, L. R., & Rosati, A. G. (2015). The evolutionary roots of human decision making. Annual Review of Psychology, 66, 321–347.
  11. Schemmer, M., Hemmer, P., Kühl, N., Benz, C., & Satzger, G. (2022). Should I follow AI-based advice? Measuring appropriate reliance in human-AI decision-making. arXiv preprint arXiv:2204.06916.
  12. Schwartz, B. (2004). The paradox of choice: Why more is less. New York: Ecco.
  13. Vasconcelos, H., Jörke, M., Grunde-McLaughlin, M., Gerstenberg, T., Bernstein, M. S., & Krishna, R. (2023). Explanations can reduce overreliance on AI systems during decision-making. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1–38.
  14. Wu, Q., Bansal, G., Zhang, J., Wu, Y., Zhang, S., Zhu, E., Li, B., Jiang, L., Zhang, X., & Wang, C. (2023). AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155.
  15. Jin, Z., Levine, S., Gonzalez Adauto, F., Kamal, O., Sap, M., Sachan, M., Mihalcea, R., Tenenbaum, J., & Schölkopf, B. (2022). When to make exceptions: Exploring language models as accounts of human moral judgment. Advances in Neural Information Processing Systems, 35, 28458–28473.
  16. Hagendorff, T., Fabi, S., & Kosinski, M. (2022). Machine intuition: Uncovering human-like intuitive decision-making in GPT-3.5. arXiv preprint arXiv:2212.05206.
  17. Erlei, A., Das, R., Meub, L., Anand, A., & Gadiraju, U. (2022, April). For what it’s worth: Humans overwrite their economic self-interest to avoid bargaining with AI systems. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (pp. 1–18).
  18. Rao, A., Kim, J., Kamineni, M., Pang, M., Lie, W., & Succi, M. D. (2023). Evaluating ChatGPT as an adjunct for radiologic decision-making. medRxiv, 2023–02.
  19. Porsdam Mann, S., Earp, B. D., Nyholm, S., Danaher, J., Møller, N., Bowman-Smart, H., … & Savulescu, J. (2023). Generative AI entails a credit–blame asymmetry. Nature Machine Intelligence, 1–4.
  20. Dhingra, H., Jayashanker, P., Moghe, S., & Strubell, E. (2023). Queer people are people first: Deconstructing sexual identity stereotypes in large language models. arXiv preprint arXiv:2307.00101.
  21. He, G., Kuiper, L., & Gadiraju, U. (2023, April). Knowing About Knowing: An Illusion of Human Competence Can Hinder Appropriate Reliance on AI Systems. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1–18).
