The avoidance of harmful LLM hallucinations requires extensive investments when developing any medical LLMs before real-world deployment.
Scenario: AI Hallucination During the National Agency for Food and Drug Administration and Control (NAFDAC)-Triggered Drug Recall
Context
A major pharmaceutical - consumer goods company in Lagos, Nigeria uses an LLM-powered AI chatbot to handle customer inquiries across its website and social media platforms.
During a real product recall of a popular anti-malarial drug, triggered by NAFDAC’s discovery of unregistered batches in circulation, concerned customers begin flooding the chatbot with queries like:
“Is batch DX41-002 part of the NAFDAC recall?”
The chatbot which is powered by a general-purpose LLM, tries to give response based on its training and internal logic, but it lacks direct access to the official NAFDAC data.
Because there is no accurate real-time information, the LLM confidently generates a hallucinated response saying: “No, batch DX41-002 is not affected by the recall and is safe to use.” But realistically, AX45-003 has been recalled and identified by NAFDAC as unregistered batches.
Impact of the Hallucination
Mistreatment: Consumers may ingest the drug thinking it is safe to use, and since not recommended by NAFDAC, may be unsafe for use and this may have an adverse effect of the safety of the patient.
Compliance Issue: Misleading safety statements given by the Chatbot could trigger penalties by NAFDAC.
Legal Liability: The company responsible for the manufacture of the drug could become susceptible to lawsuits and consumer protection action.
Damage of Reputation: The company can incur bad reputation if the false information gets widely spread on social media platforms.
Reason for the hallucination
Root Cause
Description
Lack of data grounding
The LLM has no access to live NAFDAC data or an internal batch registry
Ambiguous user input
Batch codes follow different formats and naming conventions which could lead to misinterpretation
Sampling configuration
The use of high top-p or temperature values give room for more speculative completions
Training Uncertainty
The LLM overuses reassuring language when uncertain
Strategies to mitigate hallucinations
1. Prompt Engineering and Intent Restriction
Give clear and specific prompts so that the chatbot can be guided towards desired outputs and intent restrictions. Disclaimers should also be included.
E.g.:
“Only respond if the batch code exists in the verified NAFDAC recall database. If missing or unsure, refer the user to an official support. Do not guess.” “Do note that this response is based on the data currently available. Please, kindly verify this information with NAFDAC or contact your healthcare service provider.”
2. Flow Logic and Escalation Paths
Design the chatbot’s behaviour with layered safeguards and fallback logic e.g., If the user mentions “batch,” “recall,” “NAFDAC”, trigger a recall-check intent. If no match or confidence level is low, escalate to human agent or provide a static contact link.
3. System Architecture Enhancements
By using Retrieval-Augmented Generation (RAG), connect the LLM to a live database of affected batches sourced from NAFDAC or internal compliance systems. Also, instead of fetching responses from scratch, the model can retrieve the relevant facts first, then summarize them.
The below model configuration can be adopted.
Restrict randomness by using lower top_p (e.g., 0.1–0.2) feature.
Use temperature = 0 for deterministic outputs in safety-critical queries.
Make use of only template-based responses and generate answers only as configured.
4. Monitoring, Feedback, and Recovery Process
· By detecting potential misleading or overconfident statements, automatically flag risky responses.
· Provision of user feedback controls by allowing users to report incorrect or unhelpful information.
· Keep audit logs for all responses by maintaining traceability for legal and regulatory reviews.
Conclusion
In pharmaceutical manufacturing domains especially in heavy markets in Nigeria where compliance is a major subject, LLM hallucination is a great ordeal. Mitigating these risks means combining LLM safeguards with a high-level system architecture, data integration and human fallback channels.