Opinion
Understanding LLM Hallucinations
How LLMs can make stuff up and what to do about it
Main Objectives
Working with large language models is not without risks, including responses based on what’s called an LLM “hallucination.” Hallucinations can be a serious problem because they can spread misinformation, expose confidential information, and create unrealistic expectations about what LLMs can do. Understanding hallucinations, and staying critical of the information LLMs generate, helps explain and mitigate the problems hallucinations can cause.
What’s a LLM Hallucination?
LLMs are a type of artificial intelligence (AI) trained on massive datasets of text and code. They can generate text, translate languages, write different kinds of creative content, and answer questions in informative ways. However, LLMs are also prone to “hallucinating,” meaning they can generate text that is factually incorrect or nonsensical. As people often put it, “LLMs can be confidently full of sh**.” Such hallucinations happen because LLMs are trained on data that is often incomplete or contradictory. As a result, they may learn to associate certain words or phrases with certain concepts, even if those associations are not accurate or are unintentionally “overly accurate” (by this I mean they can make up things that are true but not meant to be shared). This can lead to LLMs generating text that is factually incorrect, inadvertently overly indulgent, or simply nonsensical.
Types of Hallucinations
Lies! Lies! Lies!
LLMs can sometimes generate text that is factually incorrect. Here’s an example; most of it is correct, but the closing claim is simply untrue:
Prompt: Write a brief history of the United States
Response: The 20th century was a time of great change for the United States.
The country fought in two world wars, and it emerged as a global superpower
after World War II. The Civil Rights Movement of the 1950s and 1960s led to
the end of segregation and discrimination against African Americans.

To suggest that segregation and discrimination no longer exist is factually untrue. For me to call it a “lie” is also technically untrue, because the models don’t understand truth or lies; they just know how to assemble words. Whatever the reason, LLMs can still generate factually untrue content. This doesn’t happen infrequently, and it’s important to fact-check everything.
Nonsense
At a very high level, LLMs use probabilities to arrange words. While the range of words and their probabilities usually produces something that makes sense, that’s not always the case; LLMs can also generate text that is nonsensical. For example, if you ask an LLM to write a poem, it might generate something that is grammatically correct but does not make any sense. (A minimal sketch of this sampling idea follows the poem below.)
Prompt: write a funny poem about swimming in green water
Partial (summarized) Response:
Swimming in Green Water
I went swimming in the green water,
The water was so green,
I couldn't see my feet,
I couldn't see my hand,
I couldn't see my head.
I looked around and saw a fish,
A big green fish,
With big green eyes,
And a big green smile,
And a big green fin.
I said, "Hello, fish,"
And the fish said, "Hello,"
And we swam around together,
In the green water,
For a long time.
Then we got tired,
And we lay down on the beach,
And we took a nap,
In the sun,
And the green water.

I can see how creative types may say it’s perfectly fine for a fish to lie in the sand in a poem (poetic license and all), but I can also argue the model is making up some nonsense, which is what I’m going with in this case. The ideas drifted from the logical to the illogical. Note that you’ll see more serious examples if you work with LLMs for any length of time. Again, check the model output and make corrections where necessary. In the poem example, I’d probably change “…we lay down on the beach…” to “…we lay down by the coral reef…” or maybe I’d just remove that line, since humans can’t really take a nap underwater.
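To make the “arranging words by probability” idea concrete, here is a minimal, hypothetical sketch in Python. The vocabulary and probabilities are made up purely for illustration; a real LLM scores tens of thousands of tokens with a neural network at every step, but the sampling loop looks roughly like this.

```python
import random

# Toy "model": for a given previous word, a made-up probability
# distribution over possible next words. A real LLM computes these
# probabilities with a neural network over a huge vocabulary.
NEXT_WORD_PROBS = {
    "green": {"water": 0.5, "fish": 0.3, "smile": 0.1, "nap": 0.1},
    "water": {"was": 0.6, "green": 0.3, "fin": 0.1},
    "the":   {"green": 0.5, "fish": 0.3, "beach": 0.2},
}

def sample_next(prev_word: str) -> str:
    """Pick the next word at random, weighted by its probability."""
    dist = NEXT_WORD_PROBS.get(prev_word, {"the": 1.0})
    words = list(dist.keys())
    weights = list(dist.values())
    return random.choices(words, weights=weights, k=1)[0]

# Generate a short continuation. Each step is plausible on its own,
# but chaining locally likely words can still drift into nonsense.
word = "the"
output = [word]
for _ in range(6):
    word = sample_next(word)
    output.append(word)
print(" ".join(output))
```

Each individual choice is locally reasonable, which is exactly why the output can read fluently while still failing to make sense as a whole.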
Source Conflation
LLMs can sometimes conflate different sources of information, which can lead to them generating text that is inaccurate or misleading. For example, if you ask an LLM to write a news article about a current event, it might combine information from different news sources, even if those sources contradict each other. Note also that combining the model’s inferences from historical training data with retrieved current information (via something like LangChain, for example) can really conflate sources (and formats) of information.
Here is an example of how conflation can lead to factually untrue (or, at minimum, very misleading) information.
Prompt: write a one paragraph summary of the miami grand prix
Response:
The 2023 Miami Grand Prix was the fifth round of the 2023 Formula One
World Championship. It was held on May 7, 2023 at the Miami International
Autodrome in Miami Gardens, Florida, United States.
Sergio Pérez of Red Bull Racing started the race on pole position, but
was overtaken by Charles Leclerc of Ferrari on the opening lap. Leclerc
led the race for the first 20 laps, but was then passed by Max Verstappen
of Red Bull Racing. Verstappen went on to win the race, with Leclerc
finishing second and Pérez finishing third.
The Miami Grand Prix was a popular event, with over 270,000 spectators in
attendance. The race was also well-received by the media, with many
praising the new circuit and the atmosphere at the event.

The response opens accurately, describing the race held on May 7, 2023. It then appears the model conflated the 2022 results thereafter. Leclerc qualified seventh and did not lead the first 20 laps in 2023, but he did finish second in 2022 and may have led the first 20 laps of that race. (Sergio Pérez finished second in 2023.) It’s possible, in this case, that the conflation happened in whatever tooling (e.g., LangChain) was used to combine current events with the LLM text, but the same hallucination-by-conflation idea holds true.
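As a simplified, hypothetical illustration of how that kind of conflation can happen, consider an augmentation step that retrieves snippets about “the Miami Grand Prix” and stuffs them into one prompt. This is not real LangChain code, just a sketch of the pattern: once snippets from different years sit side by side in the same context without clear attribution, the model is free to blend them.

```python
# Hypothetical retrieval step: two snippets about "the Miami Grand Prix",
# one describing the 2022 race and one describing the 2023 race, returned
# by a search over news articles. Names and text are illustrative only.
retrieved_snippets = [
    "Leclerc led the early laps before Verstappen passed him and won.",    # 2022 race
    "Perez started on pole; Verstappen won, with Perez finishing second.",  # 2023 race
]

question = "write a one paragraph summary of the miami grand prix"

# The augmentation step concatenates everything into a single context.
# Nothing here tells the model which snippet describes which year, so a
# generated summary can easily merge the two races into one account.
prompt = (
    "Use the following context to answer the question.\n\n"
    "Context:\n" + "\n".join(retrieved_snippets) + "\n\n"
    "Question: " + question
)
print(prompt)
```

Labeling each snippet with its source and date, or retrieving from a single authoritative source, reduces (but does not eliminate) this kind of blending.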
Overindulgence
Given that LLMs can put some fairly significant text together using probabilities, and they can conflate information, it’s statistically possible for an LLM to make up information that “accidentally” discloses confidential information.
For the sake of protecting confidential information, I can’t provide details about the situation I recently ran into where an LLM did just that. However, I asked an LLM about a particular topic that I knew should not be disclosed and the model guessed at a logically correct but overindulgent reply. While the situation I ran into was not a matter of national security, it can be very serious under the right circumstances.
How to Manage Hallucinations?
Here are some tips for managing hallucinations:
- Test different settings for things like temperature and top-k, which control how the model samples from its output probability distribution. This is one of the most important ways to manage model output (a short sketch after this list shows the effect).
- Don’t trust the output — fact check (don’t worry, you’re still saving massive amounts of time!).
- Generally treat LLM output as a drafting mechanism — for example, an LLM created the basic layout and a little of the content for this article. But I edited it significantly.
- Tune the model you’re using. Depending on how you’re using model output, you may want to tune the model — there are many ways to do this, including prompt engineering, parameter-efficient tuning (PET), and full model tuning. There’s quite a bit of nuance and complexity in that simple list, but if you know how to do it, such tuning can reduce hallucinations.
- Accept the reality that models hallucinate. Unlike with humans (in most cases), LLM hallucination is usually an unintended consequence, and I believe the positives generally well outweigh the negatives. Accept this and acknowledge and communicate the possibility that hallucinations can happen.
- EXPLORE! While this article provided an overview of LLM hallucinations, what this means for you and your application can vary significantly. Additionally, your perception of these words may not exactly align with reality. The only way to truly understand and appreciate how LLM hallucination affects what you’re trying to do is to explore LLMs extensively.
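Here is a minimal sketch of the temperature and top-k idea from the first tip above. The candidate tokens and logits are made up for illustration; the point is to show how a low temperature concentrates probability on the most likely tokens, a high temperature flattens the distribution, and top-k simply discards everything outside the k most likely candidates before sampling.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(tokens, probs, k=2):
    """Keep only the k most probable tokens and renormalize."""
    ranked = sorted(zip(tokens, probs), key=lambda tp: tp[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return [(t, p / total) for t, p in ranked]

# Made-up candidate next tokens and raw model scores (logits).
tokens = ["second", "third", "seventh", "banana"]
logits = [2.0, 1.5, 0.5, -1.0]

for temp in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, temperature=temp)
    print(f"temperature={temp}:",
          {t: round(p, 3) for t, p in zip(tokens, probs)})

# Top-k then removes unlikely tokens entirely before sampling.
probs = softmax_with_temperature(logits, temperature=1.0)
print("top-2:", {t: round(p, 3) for t, p in top_k_filter(tokens, probs, k=2)})
```

Low temperature plus a small top-k makes output more predictable and less prone to odd continuations, at the cost of creativity; a high temperature does the opposite.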
More to be Revealed
The widespread use of LLMs is really in its infancy and the pros/cons have yet to be enumerated with any accuracy. In my opinion, an open mind is the best technique to understand all the dimensions of an LLM, including hallucinations. Enjoy this ride and explore as much as you can because such rapid evolution seldom happens (in my experience) and those who embrace the ride gain the most from it.