Message added by Mayank Gupta,

AI or Artificial Intelligence is a self-learning and/or self-rewriting technology that mimics the human mind, intelligence, and decision making. It has the ability to evolve and learn based on the responses it receives in different situations. As per IEEE SA, AI is “the combination of cognitive automation, machine learning (ML), reasoning, hypothesis generation and analysis, natural language processing and intentional algorithm mutation producing insights and analytics at or above human capability.”

 

Hallucination (in plain English) refers to the experience of perceiving something that isn't actually present. It can involve seeing, hearing, or sensing things that are not real.
In Artificial Intelligence, hallucination refers to the situation where AI (generally large language models) generates responses that seem to be correct but are either factually incorrect, misleading, or completely fabricated. Hallucinations occur because AI relies on patterns in its training data rather than verified knowledge.

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Nwamaka Benedicta Olorungbade on 19th May 2025.

 

Applause for all the respondents - Nwamaka Benedicta Olorungbade, Airat Aroyewun, Sakshi Dixit, Divya Iyer, Swarandeep Kaur Juneja, Diop Saliou, A.Kumar, Giridarasanmugaraja Kathirvel.

Question

Posted

Q 770. AI agents powered by LLMs can sometimes generate convincing but completely incorrect responses — a phenomenon known as hallucination.
Think of a scenario in your domain where hallucination could lead to confusion, loss of trust, or even serious consequences. What steps would you take — using prompts, flow logic, or system design — to detect, reduce, or recover from hallucinated responses?

 

🏆 The best answer will be selected on the basis of:

  • Realism and impact of the chosen scenario

  • Thoughtfulness in identifying when and how hallucination might occur

  • Practical strategies to prevent or contain the risk

 

Note for website visitors -

16 answers to this question

Recommended Posts

  • 0
Posted

Avoiding harmful LLM hallucinations requires significant investment before any medical or pharmaceutical LLM is deployed in the real world.

Scenario: AI Hallucination During a Drug Recall Triggered by the National Agency for Food and Drug Administration and Control (NAFDAC)

Context

A major pharmaceutical and consumer goods company in Lagos, Nigeria, uses an LLM-powered AI chatbot to handle customer inquiries on its website and social media platforms.

During a real product recall of a popular anti-malarial drug, triggered by NAFDAC’s discovery of unregistered batches in circulation, concerned customers begin flooding the chatbot with queries like:

“Is batch DX41-002 part of the NAFDAC recall?”

The chatbot, which is powered by a general-purpose LLM, tries to respond based on its training data and internal logic, but it lacks direct access to official NAFDAC data.

Because there is no accurate real-time information, the LLM confidently generates a hallucinated response: “No, batch DX41-002 is not affected by the recall and is safe to use.” In reality, that batch has been recalled and identified by NAFDAC as unregistered.

Impact of the Hallucination

  • Patient harm: Consumers may ingest the drug believing it is safe even though NAFDAC has flagged it, with potentially adverse effects on patient safety.
  • Compliance issues: Misleading safety statements from the chatbot could trigger NAFDAC penalties.
  • Legal liability: The drug manufacturer could face lawsuits and consumer-protection action.
  • Reputational damage: The company's reputation could suffer if the false information spreads widely on social media.

Reasons for the hallucination

  • Lack of data grounding – the LLM has no access to live NAFDAC data or an internal batch registry.
  • Ambiguous user input – batch codes follow different formats and naming conventions, which can lead to misinterpretation.
  • Sampling configuration – high top-p or temperature values give room for more speculative completions.
  • Training uncertainty – the LLM overuses reassuring language when it is uncertain.

 

Strategies to mitigate hallucinations

1. Prompt Engineering and Intent Restriction

Give the chatbot clear and specific prompts that guide it towards the desired outputs and restrict its intents. Disclaimers should also be included.

E.g.:
“Only respond if the batch code exists in the verified NAFDAC recall database. If it is missing or you are unsure, refer the user to official support. Do not guess.” “Note that this response is based on the data currently available. Please verify this information with NAFDAC or contact your healthcare provider.”
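
A minimal sketch of how such a guarded prompt and disclaimer could be wired together is shown below; call_llm() is only a stand-in for whichever chat-completion API the chatbot uses, and all names are illustrative assumptions.

```python
# Sketch only: SYSTEM_PROMPT and DISCLAIMER reuse the wording quoted above;
# call_llm() is a placeholder for the real chat-completion call.
SYSTEM_PROMPT = (
    "Only respond if the batch code exists in the verified NAFDAC recall "
    "database. If it is missing or you are unsure, refer the user to official "
    "support. Do not guess."
)
DISCLAIMER = (
    "This response is based on the data currently available. Please verify it "
    "with NAFDAC or contact your healthcare provider."
)

def call_llm(system: str, user: str) -> str:
    # Placeholder: swap in the chosen provider's API call here.
    return "Please contact official support to confirm this batch."

def answer(user_query: str) -> str:
    # Every reply carries the disclaimer, as suggested above.
    return call_llm(SYSTEM_PROMPT, user_query) + "\n\n" + DISCLAIMER
```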

2. Flow Logic and Escalation Paths

Design the chatbot's behaviour with layered safeguards and fallback logic, e.g.: if the user mentions “batch”, “recall”, or “NAFDAC”, trigger a recall-check intent; if there is no match or confidence is low, escalate to a human agent or provide a static contact link.
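
A minimal sketch of this routing and escalation logic, with illustrative keywords and thresholds, might look like this:

```python
# Sketch only: keyword set, confidence threshold, and return labels are assumptions.
RECALL_KEYWORDS = {"batch", "recall", "nafdac"}

def route(user_message: str, confidence: float) -> str:
    words = set(user_message.lower().split())
    if RECALL_KEYWORDS & words:
        # Recall-check intent triggered.
        if confidence < 0.7:
            return "ESCALATE_TO_HUMAN"   # low confidence -> human agent or static contact link
        return "RECALL_CHECK"            # proceed with the database lookup
    return "GENERAL_FAQ"                 # default flow

# Example: route("Is batch DX41-002 part of the NAFDAC recall?", 0.9) -> "RECALL_CHECK"
```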

3. System Architecture Enhancements

Using Retrieval-Augmented Generation (RAG), connect the LLM to a live database of affected batches sourced from NAFDAC or internal compliance systems. Instead of generating responses from scratch, the model retrieves the relevant facts first and then summarizes them.

The model configuration below can be adopted (a retrieval sketch follows the list).

  • Restrict randomness by using a lower top_p (e.g., 0.1–0.2).
  • Use temperature = 0 for deterministic outputs in safety-critical queries.
  • Use only template-based responses and generate answers only as configured.
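
A minimal sketch of the retrieval-grounded check described above, with an illustrative local copy of the recall list, could look like this:

```python
# Sketch only: the recall set is illustrative data that would be refreshed from
# NAFDAC or internal compliance systems; decoding settings follow the bullets above.
RECALLED_BATCHES = {"DX41-002"}   # illustrative entry, not real NAFDAC data

def answer_recall_query(batch_code: str) -> str:
    # Retrieve the relevant fact first, then let the LLM summarize it with
    # temperature=0 and a low top_p so the wording stays deterministic.
    if batch_code.upper() in RECALLED_BATCHES:
        return (f"Batch {batch_code} appears on the recall list we hold. "
                "Please stop using it and contact NAFDAC or your pharmacist.")
    return (f"Batch {batch_code} is not on the recall list we hold. "
            "Please confirm with NAFDAC before use.")
```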

4. Monitoring, Feedback, and Recovery Process

  • Automatically flag risky responses by detecting potentially misleading or overconfident statements.

  • Provide user feedback controls so users can report incorrect or unhelpful information.

  • Keep audit logs of all responses to maintain traceability for legal and regulatory reviews.

Conclusion

In pharmaceutical manufacturing, especially in heavily regulated markets such as Nigeria where compliance is a major concern, LLM hallucination is a serious risk. Mitigating it means combining LLM safeguards with sound system architecture, data integration, and human fallback channels.

 

 

  • 0
Posted

In my area of work, which is manufacturing support, if an AI agent provides the wrong material code or material status, production is delayed or the wrong items are used.

To reduce hallucination:

I write highly specific prompts along the lines of: “You may only reply with material codes that are listed in the database.”

I connect the AI to a verified Google Sheet so it only pulls live figures.

I add confirmation checks, such as “Is this the right code? Yes/No”, so users confirm before proceeding.

I also have logic to display the source data, so users can verify the information themselves.
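
A minimal sketch of this lookup-and-confirm pattern, using a CSV export in place of the verified Google Sheet (an assumption, as are the column names):

```python
# Sketch only: the CSV export stands in for the verified Google Sheet.
import csv

def load_material_codes(path: str) -> dict[str, str]:
    """Read material_code -> status from a CSV exported from the live sheet."""
    with open(path, newline="") as f:
        return {row["material_code"]: row["status"] for row in csv.DictReader(f)}

def lookup(code: str, materials: dict[str, str]) -> str:
    if code not in materials:
        return f"{code} is not in the verified database; please check with planning."
    # Source data is shown so the user can verify, and the user confirms before proceeding.
    return f"{code} has status '{materials[code]}'. Is this the right code? (yes/no)"
```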

  • 0
Posted

AI is trained to predict based on past patterns and might not verify the facts and understand the context .It would be better to verify from sources for critical decisions and do some cross questioning.
AI might not understand  the human sense as it wouldn't know what is true or false.It responds only as what seems correct to it  based on training data.
It might be giving  outdated or incorrect medical advice  without understanding the backgound of the patient or it might misquote Misquoting historical events(bias) or give incorrect advice on laws as there could be training limitations .

Even the best models could be trained using imperfect data extracted from the internet or other sources of information which might be outdated. As a User we should always keep in mind that confidence does not mean accuracy.
Its always better to not rely completely on AI in critical situations.

  • 0
Posted

In supply chain processes, AI hallucinations can lead to serious issues such as incorrect inventory forecasts or misunderstood contract terms and agreements. In extreme scenarios this can also lead to financial and reputational loss.

 

A few steps that can be taken to mitigate such situations:

1. Ensure the AI retrieves data only from verified sources such as the supplier database.

2. Cross-verify product requirements against product availability.

3. Ensure transparency by stating whether AI-generated information comes from predictive models or from verified sources (a small labelling sketch is included at the end of this answer).

4. Human intervention in any red-flagged scenarios.

 

Overall, focusing on correct sources, verification, and a strict workflow can help avoid major issues.
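
A minimal sketch of the source-transparency labelling from point 3, with illustrative source names:

```python
# Sketch only: source tags and wording are assumptions.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    source: str   # e.g. "supplier_database" (verified) or "forecast_model" (predictive)

def present(answer: Answer) -> str:
    label = ("Verified supplier data" if answer.source == "supplier_database"
             else "Predictive model output - please verify before acting")
    return f"{answer.text}\n[Source: {label}]"

# Example: present(Answer("Stock of item X: 1,200 units", "forecast_model"))
```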

  • 0
Posted

A scenario in the HR domain where hallucination could lead to confusion, loss of trust, or even serious consequences is a chatbot set up to share details and answer questions about company policies. For example, an employee asks the chatbot about the sandwich leave policy and the bot responds with a detailed explanation; however, the response is entirely fabricated and not based on the actual company policy.

This can cause considerable confusion, employees may lose trust in HR's ability to provide accurate information, and in extreme circumstances, if an employee takes leave or makes decisions based on the incorrect information, it could affect their job security or career advancement.

Controls –

System Design – Ensure that the LLM is grounded in accurate and up-to-date policy data, and implement validation mechanisms to verify the accuracy of responses. Include the source of information in AI responses to build employee trust in the authenticity of the data.

HITL (Human in the Loop) – All AI-generated responses must be reviewed by HR professionals.

Continuous Monitoring and Improvement – Regularly monitor the performance of the AI model and update it as needed to prevent hallucinations and improve accuracy.
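
A minimal sketch of the validation-and-source control described above, with an illustrative policy store:

```python
# Sketch only: the policy store, IDs, and wording are assumptions; the excerpt
# placeholder would be the verbatim text of the approved HR policy.
POLICIES = {
    "sandwich leave": ("HR-POL-012", "<verbatim excerpt from the approved sandwich leave policy>"),
}

def answer_policy_question(topic: str) -> str:
    entry = POLICIES.get(topic.lower())
    if entry is None:
        # No grounded answer available: do not improvise, refer to HR instead.
        return "I could not find this topic in the policy handbook; please contact HR."
    policy_id, excerpt = entry
    # The source is cited so employees can verify the answer themselves.
    return f"{excerpt} (Source: {policy_id}; please confirm with HR before acting.)"
```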

  • 0
Posted

Let's consider the saltiness of a product as a quality release criterion.

The AI confidently answers with the quantity of salt to be added to the recipe.

If the saltiness level is too high or too low, consumers could reject the product, which could lead to:

- loss of consumer trust,

- loss of market share,

- penalties from the regulatory authority.

 

To avoid this:

 

- Prompt engineering with validation checks against an internal database

- Human validation, or escalation to a human agent when the answer is not found in the database

 

  • 0
Posted

Here is one instance where AI may sound confident but be totally wrong.

Take an example from the research domain. One discovery leads to another, and researchers refer to published articles for the available knowledge. Traditionally, research articles were in paper format, but today they are available digitally. For a given problem statement there are numerous research articles in the public domain, which can be both genuine (proven) and non-genuine (ambiguous).

A researcher working with an AI agent will generally refer to both internal and external knowledge bases to understand and predict the outcome of their research. Based on its training data sets and acquired knowledge, the AI will sound confident about the outcome, but the inclusion of unidentified, misunderstood, or ambiguous data can produce a result far from reality.

A similar case can arise in diagnostics in the healthcare domain: the symptoms may point to a disease, when in fact they are an adverse reaction to a medicine the patient is taking.

  • 0
Posted

AI Agent: Customer Support Agent


Scenario:

A customer inquires about a feature of a product they want to buy and wants to confirm whether the product has that feature. The e-commerce platform provides a customer service agent to answer queries about product features.

Product: Mixer Grinder
Customer question: Does the mixer grinder model A210 have a self-cleaning feature?

 

Hallucination:
The AI agent replies, "Yes, the mixer grinder model A210 is equipped with a self-cleaning feature; after grinding, press the self-cleaning button on the mixer grinder."

In reality, this model does not have a self-cleaning feature and there is no such button.

Consequences:
The customer might buy the product expecting it to have the self-cleaning feature. When the customer receives the product and finds the feature missing, they are disappointed, which leads to a return and a negative review. The customer loses trust in the company's AI agent and in the brand.

 

To make it better, 

Prompt:
When a customer asks about a product feature of any model, always refer to the official product specification in the product database or on the website.
Do not invent features; if a requested feature is not explicitly listed in the product data or specifications, respond to the customer with: "The enquired feature is not listed in the specification for this model."

Flow Logic:
Knowledge Base restriction

The AI agent should look up the product database for queries related to product specifications, where each model is listed with its feature list and specifications.

Product verification
If the feature is found in the database, then the AI agent should reply, "Yes, the feature is available." If not, "This model doesn't have this feature."

It should not give the customer a false-positive response when the feature is not available in the database.

"Was this information helpful about this Mixer Grinder A210?" This kind of small follow-up message will allow customers to correct if they suspect an error during conversation. 

Control
The AI agent should be integrated with, and primarily restricted to, the official product information database. The LLM's ability to generate information about specific products should be limited; the output should be derived from the verified product data in the knowledge base.
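
A minimal sketch of this verification flow, with an illustrative catalogue entry:

```python
# Sketch only: the catalogue content and feature names are illustrative.
CATALOGUE = {
    "A210": {"features": {"3 speed settings", "overload protection"}},
}

def feature_reply(model: str, feature: str) -> str:
    spec = CATALOGUE.get(model)
    if spec is None:
        return f"Model {model} is not in our product database."
    if feature.lower() in {f.lower() for f in spec["features"]}:
        return f"Yes, '{feature}' is listed in the official specification of the {model}."
    # Never a false positive: anything not in the data is reported as not listed.
    return (f"'{feature}' is not listed in the specification for the {model}; "
            "please check the product page or contact support.")

# Example: feature_reply("A210", "self-cleaning") -> "... not listed ..."
```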

  • 0
Posted

Nwamaka Benedicta Olorungbade has provided the best answer to this question by quoting a genuine hallucination example, its downside and how to prevent it from happening again. Well done!

  • -1
Posted

Artificial intelligence has rapidly evolved into a powerful tool for communication, research, and decision-making. Language models like ChatGPT, Bing Copilot, and others are now used to generate emails, write reports, assist in legal drafting, and even provide medical insights. But there's a critical flaw in how these systems interact with users: they often sound extremely confident — even when they're completely wrong.

This mismatch between confidence and correctness isn't just a coincidence. It's a core limitation of how AI models work, and it can have serious consequences. Here's a deep dive into the phenomenon, why it happens, and how to protect yourself from being misled.

1.    The Illusion of Authority

AI-generated responses often use professional, well-structured, and assertive language. This tone creates an illusion of authority, even when the underlying facts are incorrect.

Example:

Ask AI "Is it safe to mix ibuprofen with alcohol?"
AI Answer: “It is generally safe to consume a moderate amount of alcohol with ibuprofen, as there is no known interaction between the two.”
In reality, combining alcohol with NSAIDs like ibuprofen increases the risk of stomach bleeding and liver damage. The advice is dangerous — but sounds clinical and calm, which may disarm the reader.

This isn't deception; it's design. Language models are trained to produce fluent, plausible-sounding text, not to verify facts.

 

2.    Hallucination Under Pressure

When AI is asked about something obscure or under-documented, it tends to hallucinate — a term researchers use to describe AI inventing answers that sound credible but are entirely fabricated.

Example:

Ask: “What journal published Dr. Laila Thompson’s theory on quantum biology?”

If such a person doesn’t exist, an AI might fabricate an answer like: “Dr. Thompson’s theory was published in the Journal of Advanced Quantum Studies in 2018.”

Neither the person nor the journal may exist. But the tone remains scholarly and assured.

 

3.    No Real Understanding of the World

AI models don’t understand facts the way humans do. They don’t know that birds aren’t mammals or that Paris isn’t in Italy; they just predict the next word based on patterns.

Example:

Prompt: “What kind of mammal is an eagle?”

An AI might respond: “Eagles are large birds of prey and belong to the family Accipitridae. They are powerful mammals known for their vision and hunting skills.”

The model merged conflicting concepts (bird vs. mammal) but still delivered a grammatically perfect — and confidently incorrect — sentence.

 

4.    When the Stakes Are High

This confidence-error mismatch becomes dangerous in high-stakes domains like medicine, law, finance, or engineering. If professionals rely on unverified AI outputs, the risk of serious error increases.

Ask AI: A researcher asks AI to help write a grant application that includes claims about CRISPR-Cas9:

AI answer: “CRISPR-Cas9 has been clinically proven to reverse Alzheimer's in humans.”

While CRISPR shows promise in genetic therapy, no clinical reversal of Alzheimer’s has been proven. This is a hallucinated or overstated claim.

 

Language models like GPT or Claude don’t have a database of facts. Instead, they use probabilities based on patterns from billions of text samples. That’s why they’re so good at mimicking human speech — and also why they sometimes lie with style. The model doesn't know it's wrong; it has no internal conscience, no sense of doubt. Its confidence comes from its fluency, not its truthfulness.

AI might sound like the smartest person in the room — but it’s often just the most confident. As we enter a world where AI co-authors emails, scripts news reports, and powers search engines, it’s vital to remember: confidence is not competence.

Until AI can truly distinguish fact from fiction, we must — and that means listening carefully, questioning often, and always verifying.

  • -1
Posted

In financial services, AI hallucinations, such as incorrect investment advice or misinterpretation of regulations, can cause confusion, loss of client trust, or regulatory issues. To reduce this risk, I would use constrained prompts, implement validation checks against trusted data, and include a human review layer before sharing outputs with clients. Adding disclaimers or confidence indicators can also help users identify and question uncertain responses, ensuring greater reliability and accountability.

  • -1
Posted

 

In the domain of telecom project and process management, AI hallucinations — where models generate confident but factually incorrect content — can have serious implications.

Scenario:

Imagine using an AI assistant to auto-generate performance reports or root cause analysis (RCA) summaries for customer-impacting incidents. The AI misinterprets log patterns and asserts a hardware fault, whereas the real issue is a misconfigured software update. Because the explanation sounds technically sound and confident, it’s accepted without cross-checking.

Result:
    •    Wrong team mobilized (e.g., hardware vendors instead of DevOps)
    •    Delays in resolution
    •    Escalations from customers
    •    Incorrect internal reporting to leadership
    •    Regulatory reporting errors

How to Prevent or Manage This:
    1.    Human-in-the-Loop (HITL):
Always involve SMEs to validate AI-generated analysis before it reaches key decision-makers or clients.
    2.    Fact-Referencing Enforcement:
Require the AI to cite log IDs, ticket numbers, or source documents. If it can't, flag the output for mandatory review (a small check of this kind is sketched after this list).
    3.    Domain-Specific Fine-Tuning:
Train the AI model on validated, telecom-specific datasets and terminologies to reduce context mismatches.
    4.    Restricted Prompting:
Use structured prompts that limit speculation and only allow summarizing known inputs.
E.g., “Summarize ticket data without assuming causes unless explicitly mentioned.”
    5.    Risk-Based Output Segmentation:
Tag AI responses into risk levels. High-risk outputs (e.g., RCA, financial impacts) should be double-checked.
    6.    Feedback Loops:
Capture hallucinations and feed them back into the model’s training pipeline or validation filters.
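
A small sketch of the fact-referencing check from point 2; the ticket and log ID formats are purely illustrative assumptions:

```python
# Sketch only: INC/LOG identifier formats are illustrative, not a real convention.
import re

ID_PATTERN = re.compile(r"\b(INC\d{6}|LOG-\d{4,})\b")

def needs_mandatory_review(rca_summary: str) -> bool:
    """Flag an AI-generated RCA summary that cites no ticket or log IDs."""
    return not ID_PATTERN.search(rca_summary)

# Examples:
# needs_mandatory_review("Outage caused by a faulty line card.")          -> True
# needs_mandatory_review("Outage traced to INC123456; see LOG-20250519.") -> False
```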

Conclusion:

While AI can enhance productivity, unchecked confidence in its responses can mislead teams and damage trust. Building a robust validation framework, enforcing traceability, and using human oversight are essential steps in ensuring responsible AI usage in critical domains like telecom operations.

 

  • -1
Posted

In an audit firm, an AI agent assists in drafting responses to client queries on complex tax provisions (e.g., applicability of Section 115BAB or interpretation of ICDS). A hallucinated explanation — such as misstating the turnover threshold for concessional tax rates or inventing a non-existent circular — could mislead clients, erode credibility, or trigger regulatory non-compliance if relied upon.

Hallucination Risk Points:
    •    When the AI draws on outdated or fabricated citations (e.g., “CBDT Circular No. XYZ”)
    •    When summarizing provisions without context (e.g., ignoring amendments or carve-outs)
    •    When overconfidently stating positions without flagging uncertainty.

Mitigation Strategy:
    1.    Prompt Engineering:
    •    Frame prompts to explicitly instruct citation boundaries, e.g.:
“Summarize Section 115BAB as per Income Tax Act, 1961, without inventing any provisions. If unsure, say ‘Needs expert review’.”
    2.    Guardrails via Flow Logic:
    •    Add a “confidence check” node: if the AI's output has no verified source or includes legal terms, route it to human review (a small sketch of this check follows the list).
    •    Include a flag like: “This content is AI-generated and requires review before sharing externally.”
    3.    System Design:
    •    Integrate a legal/tax database API (like Taxmann or CCH) to validate key facts before output is finalized.
    •    Use RAG (Retrieval-Augmented Generation) to anchor responses in reliable excerpts.
    4.    Recovery Plan:
    •    Maintain a feedback loop where auditors can mark hallucinated responses.
    •    Automatically retrain or fine-tune the system with validated corrections and red-flagged errors.
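
A small sketch of the confidence-check routing from point 2; the citation register and legal-term list are illustrative assumptions:

```python
# Sketch only: the verified-citation register and keyword list are assumptions.
import re

VERIFIED_CITATIONS: set[str] = set()   # populated from the firm's register of checked CBDT circulars
LEGAL_TERMS = {"section", "circular", "notification", "provision"}

def route_to_human(draft: str) -> bool:
    """Return True if the draft reply must go to an expert reviewer."""
    cited = set(re.findall(r"Circular No\.\s*[\w/]+", draft))
    has_unverified_citation = bool(cited - VERIFIED_CITATIONS)
    uses_legal_terms = any(term in draft.lower() for term in LEGAL_TERMS)
    return has_unverified_citation or uses_legal_terms
```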

Impact:
By combining prompt discipline, flow logic, and retrieval validation, hallucination risk is minimized. This preserves trust, ensures compliance, and upholds professional standards expected in regulated domains like audit and tax.

  • -1
Posted

Again, I like to use a flour mill as an example - Hallucination in Flour Milling Operations

A factory manager uses an AI-powered assistant to optimize wheat blending ratios for specific flour quality targets (for example, pasta vs. noodles). The LLM, based on prior conversations or generic knowledge, suggests:

"Use 70% hard red winter wheat and 30% soft red wheat to meet noodle flour specifications in Nigeria."

However, this blending recommendation is hallucinated — it doesn’t match the actual wheat properties available in the silos or the target ash and protein content needed. Following this advice could lead to:

  • Off-spec flour impacting product quality
  • Rejection from QC or downstream customers
  • Increased rework or wastage
  • Loss of trust in AI system recommendations

Steps to Prevent or Correct Hallucination

1. Prompt Engineering

Use prompts that anchor the AI to local, real-time data, e.g.: “Based on current silo stock wheat types and lab-tested specs, suggest a blend for noodle flour with 12.5% protein and 0.65% ash.”

2. Flow Logic Safeguards

  • Require the LLM to reference lab test results or ERP data before answering.
  • Set rules to stop the AI from generating recommendations without access to validated data.

3. System Design Improvements

  • Integrate the LLM with a wheat spec database or lab interface (a RAG-based system).
  • Add warnings or uncertainty tags when confidence is low or input data is missing.
  • Enable a review mode where a miller can verify and adjust the recommendation before implementing it (a small sketch follows).
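
A minimal sketch of the "no validated data, no recommendation" safeguard, with an illustrative two-wheat protein blend calculation; all field names and numbers are assumptions, not real wheat data:

```python
# Sketch only: field names and numbers are illustrative.
def recommend_blend(silo_specs: dict, target_protein: float, target_ash: float) -> str:
    if not silo_specs:
        return ("No lab-tested silo data available - recommendation withheld; "
                "please load current wheat specs for miller review.")
    if len(silo_specs) < 2:
        return "Need at least two lab-tested wheat lots to compute a blend."
    (w1, s1), (w2, s2) = list(silo_specs.items())[:2]
    p1, p2 = s1["protein"], s2["protein"]
    if p1 == p2:
        return f"Both wheats test at {p1}% protein; any ratio meets the protein target (verify ash in lab)."
    if not min(p1, p2) <= target_protein <= max(p1, p2):
        return "Target protein is outside the range of available wheats - flag for miller review."
    x = (target_protein - p2) / (p1 - p2)    # fraction of wheat 1 in the blend
    return (f"Suggested blend: {x:.0%} {w1} / {1 - x:.0%} {w2} "
            f"(confirm {target_ash}% ash against lab results before release).")

# Example: recommend_blend({"HRW": {"protein": 13.5}, "SRW": {"protein": 10.0}}, 12.5, 0.65)
```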

By embedding AI into the real-time process flow with verified data and human checks, flour milling businesses can leverage LLMs effectively while minimizing operational risk from hallucinations.

  • -1
Posted

Scenario: The Access Review Recommendation is generated with the assistance of an AI Agent that has comprehensive access to all relevant information about the user, their activities, and their access permissions. The expected outcome is for the AI Agent to analyze this data and provide a detailed recommendation to the access owner or manager on whether to approve or revoke access.

 

Process Flow:

  1. The Access Owner initiates the Voiceflow and inquires about any pending access reviews for a specific individual.
  2. The Access Review Agent queries various APIs and knowledge bases to identify pending access reviews.
  3. The Access Review Recommendation Agent is then activated to generate recommendations for each pending access review.
  4. The Access Owner receives the recommendation, which includes explanations supported by data and rationale, and then takes the recommended action.

Potential Risks: If the AI Agent starts to generate recommendations based on incomplete data or focuses too narrowly on certain aspects, it could result in incorrect access approvals, posing significant risk and compliance issues, or inappropriately revoking access, which could adversely impact business operations. Even hallucinated results are often backed up by rationale and data, making it difficult for the end-user to challenge the recommendation, leading to a high likelihood (99%) of the end-user following the suggested action.

 

Avoiding Hallucination:

  • Provide clear instructions to the Recommendation Agent to only make recommendations if the required data is available.
  • Include a confidence percentage to indicate to the user how confident the AI is in its proposed recommendation.
  • Classify proposed recommendations using an existing model to categorize them as either low risk or high risk.
  • High-risk items should follow an additional verification flow, with another level of verification using a different LLM model (a small routing sketch follows).
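
A small sketch of this risk-based routing; the thresholds, labels, and second-model check are illustrative assumptions:

```python
# Sketch only: thresholds, labels, and the cross-check helper are assumptions.
def verify_with_second_model(recommendation: str) -> str:
    # Stub for a cross-check by a different LLM; returns "agree" or "disagree".
    return "agree"

def route_recommendation(recommendation: str, confidence: float, risk: str) -> str:
    if confidence < 0.6:
        return "WITHHOLD: required data incomplete - ask the access owner to review manually."
    if risk == "high":
        # High-risk items get an additional verification pass by a different model.
        if verify_with_second_model(recommendation) != "agree":
            return "ESCALATE: models disagree - route to a human reviewer."
    return f"RELEASE: {recommendation} (confidence {confidence:.0%})"
```
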
  • -2
Posted

Scenario: Project Management Training

Imagine an LLM-powered assistant giving guidance on a project management methodology. A learner asks, "Can you explain how Critical Chain Project Management (CCPM) integrates with Agile?" and the AI confidently invents a hybrid that doesn't exist or fabricates terminology. This could:

  • Confuse trainees

  • Undermine the trainer’s credibility

  • Lead to flawed project plans in real-world applications


Where It Goes Wrong

  • LLM fills gaps with plausible-sounding text

  • There's no fact-check layer or source attribution

  • Learners assume correctness due to confident tone

How to Detect, Reduce, or Recover from Hallucination

1. Prompt Engineering

  • Use verifier prompts:

    “Only answer if you are confident. If unsure or if the source is ambiguous, say you’re not certain.”

  • Encourage source citation:

    “Cite your sources and provide links or references for any factual claims.”

2. Flow Logic Enhancements

  • Add a "Validation Loop":

    • After generating a response, run a follow-up prompt like:

      “Double-check this answer for accuracy. Are any parts speculative or unverifiable?”

    • Flag the answer if uncertainty is detected.

  • Offer a confidence score or a "Reviewed/Unverified" tag per response (a minimal sketch of this loop follows).
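
A minimal sketch of this validation loop; ask_llm() stands in for any chat-completion call, and the checker prompt wording is an assumption:

```python
# Sketch only: ask_llm() is a placeholder; replace it with the real model call.
def ask_llm(prompt: str) -> str:
    return "OK"   # stub response

def answer_with_validation(question: str) -> dict:
    draft = ask_llm(question)
    check = ask_llm(
        "Double-check the following answer for accuracy. Reply 'FLAG' if any "
        "part is speculative or unverifiable, otherwise reply 'OK'.\n\n" + draft
    )
    tag = "Unverified" if check.strip().upper().startswith("FLAG") else "Reviewed"
    return {"answer": draft, "tag": tag, "checker_note": check}
```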

3. System Design Solutions

  • Grounding with reliable sources:
    Connect the AI to a curated knowledge base (e.g., PMBOK, Agile Alliance, NASA's systems engineering docs) to restrict outputs to validated material.

  • Human-in-the-loop review:

    • Show a preview or summary to a trainer before auto-distributing content to students.

    • Use a “Request Review” button for users to flag suspicious answers.

  • Fallback responses:
    If hallucination is suspected or detected:

    “I’m not confident about that answer. Would you like to review a vetted source or consult an expert?”

4. Recovery Strategy

  • Log user interactions and provide a feedback loop:

    “Was this answer helpful or accurate?” with a feedback form.

  • Maintain a correction history to revise and improve the model's behavior.


Bonus: Domain-Aware Safeguards

  • In your book writing domain: always flag or verify historical or scientific claims.

  • In astrological ceremonies: warn if the AI makes date-specific predictions without traditional validation (e.g., using actual panchang data).
