Message added by Nisusho Zhimomi, October 21, 2025

AI, or Artificial Intelligence, is a self-learning and/or self-rewriting technology that mimics the human mind, intelligence and decision making. It can evolve and learn based on the responses it receives in different situations. As per IEEE SA, AI is “the combination of cognitive automation, machine learning (ML), reasoning, hypothesis generation and analysis, natural language processing and intentional algorithm mutation producing insights and analytics at or above human capability.”

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Shashi Prakash on 20 October 2025.

Applause for all the respondents - Adil Khan, Manik Sood, Shashi Prakash, Sanjib Ghosal

How Confident Should AI Be Before It Acts?

October 16, 2025

Q 816.
In many AI-driven processes, a model or agent produces a result with a certain level of confidence — but confidence doesn’t always mean correctness.

If AI acts too early, it risks errors; if it waits too long, it may slow down the process or frustrate users.

Think of one process in your domain where AI makes or recommends decisions.

How should the system decide when it’s confident enough to act automatically, and when it should pause or escalate to a human?

What factors — such as data quality, past accuracy, or risk level — should influence that threshold?

⚠️ Note: Any answer that is generic or does not connect with a specific, relevant process will not be approved.

🏆 The best answer will be selected on the basis of:

Relevance of the chosen process scenario
Clarity and depth in defining confidence thresholds
Practicality of balancing speed, accuracy, and trust

Note for website visitors -

This platform hosts two weekly questions, one on Monday and the other on Thursday.
All previous questions can be found here: https://www.benchmarksixsigma.com/forum/lean-six-sigma-business-excellence-questions/.
To participate in the current question, please visit the forum homepage at https://www.benchmarksixsigma.com/forum/.
The question will be open until Monday or Thursday at 5 PM Indian Standard Time, depending on the launch day.
Responses will not be visible until they are reviewed, and only non-plagiarised answers with less than 5-10% plagiarism will be considered for winner selection.
If you are unsure about plagiarism, please check your answer using a plagiarism checker tool such as https://smallseotools.com/plagiarism-checker/ before submitting.
All correct answers shall be published, and the top-rated answer will be displayed first. The author will receive an honourable mention in our Business Excellence dictionary at https://www.benchmarksixsigma.com/forum/business-excellence-dictionary-glossary/ along with the related term.
Some people seem to be using AI platforms to find forum answers. This is a risky approach as AI responses are error-prone because our questions are application-oriented (they are never straightforward). Have a look at this funny example - https://www.benchmarksixsigma.com/forum/topic/39458-using-ai-to-respond-to-forum-questions/
We also use an AI content detector at https://quillbot.com/ai-content-detector. Only answers with less than 45-50% AI-generated content will be considered for winner selection.

Solved by Shashi Prakash

October 20, 2025

October 16, 2025

Domain: Quality Assurance in Manufacturing

How Confident Should AI Be Before It Acts?

In a Quality department, confidence is not about a percentage threshold (>80% GO / <80% NO-GO); it is about risk, reversibility and responsibility.

AI is now embedded in many day-to-day QA activities, from in-line inspection and NCR tracking to performance monitoring of critical suppliers.
While final accountability stays with humans, AI can make informed decisions in defined, low-risk zones that help speed up the process without compromising integrity.

The Process: In-Process Inspection and Defect Flagging

On the shop floor, AI-integrated laser 3D scanners detect surface marks, scratches or stains.
Here is how it plays out:

For critical sealing surfaces or safety features, even if the AI detects a high probability of a defect, it should only alert and hold the parts; it should never proceed to reject or scrap them.
QA must validate before any action is taken.

For example, an AI-integrated laser 3D scanner may flag a white patch mark under 20X magnification. Is the white patch below or above the coating surface? The rejection criteria are “visible to the naked eye, not under 20X magnification” and “fluorescent under UV light in a dark room”, and only QA can check these and decide. No matter how high the confidence percentage based on correlation with historical data, the AI should not decide.

AI’s confidence is information, not authority. It raises the alarm; it does not pull the plug.

For non-functional or cosmetic areas, if the detection system has proven reliable over time, AI can auto-tag parts for recheck or rework and allow the line to continue.
These are safe, reversible decisions where AI genuinely adds value.

When Can AI Decide on Its Own?

AI should act automatically only in areas where:

The risk is minimal: e.g., non-functional cosmetic issues or repetitive, well-understood defects. For example, a dent of up to 1 x 1 mm in 10 cm² is permitted. If the laser scanner measures the dent and it is within the acceptance limit, the AI can decide to accept the part, saving the pictures for future reference.
The action is reversible: such as routing a part for re-inspection or initiating re-measurement.
The decision is data-driven and routine, for example:
- Auto-adjusting sampling plans when process stability is proven.
- Triggering calibration reminders when measurement drift from the mean is detected.
- Flagging repeated rejections to tighten incoming checks and supplier outgoing inspection.

These are operational support decisions, not customer-impacting ones.
They save time, reduce fatigue, and allow engineers to focus on complex, judgment-based issues.
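The gate described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Finding` fields and the returned action strings are hypothetical names, not part of any real inspection system. Note that the model's confidence is deliberately not an input, which reflects the point that risk and reversibility, not a percentage, decide who acts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical description of one flagged surface finding."""
    critical_surface: bool   # sealing surface or safety feature
    reversible_action: bool  # e.g. re-inspection or re-measurement
    proven_reliable: bool    # detection has a proven track record for this defect type


def decide(finding: Finding) -> str:
    # Critical areas: AI only alerts and holds, whatever its confidence.
    if finding.critical_surface:
        return "alert and hold for QA"
    # Low-risk zone: AI may act only if the action can be undone
    # and the detection system has proven reliable over time.
    if finding.reversible_action and finding.proven_reliable:
        return "AI acts: auto-tag or route, line continues"
    # Everything else goes to a human.
    return "escalate to QA"
```

For instance, a cosmetic dent found by a proven scanner and routed for re-measurement falls in the second branch, while a white patch on a sealing surface always lands in the first.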

The Balance

AI should be trusted to act where it can’t cause harm and required to ask where it can.
It can manage data, spot deviations and trigger safe, reversible actions, but the moment a decision touches safety, customer experience or compliance, the final say belongs to humans. That’s the real balance:

AI ensures speed and consistency, humans ensure wisdom and accountability.
Together, they create a Quality system that is fast, reliable and deeply human at its core.

October 20, 2025

Solution

AI-Based Quality Inspection for Tea Bags – Ekaterra Lipton - UAE

I am sharing this from my training and consulting experience at Ekaterra Lipton – UAE. Like most modern manufacturing plants, Ekaterra Lipton UAE uses AI-powered vision systems to inspect tea bags for defects such as empty bags, torn or damaged bags, missing tags and smears. Once filled, the tea bags travel along the conveyor belt before packaging, where the screener continuously scans every bag. Their in-house AI model assigns each item a defect probability score (0–1) based on image analysis.

Confidence Threshold Logic

AI Confidence Level	System Action	Rationale
> 0.90 (High confidence: clear defect or clearly no defect)	AI acts automatically: accepts or rejects the item	The model has historically achieved >95% accuracy in these cases, allowing fast throughput without delaying the process.
0.50–0.90 (Moderate confidence)	AI flags the item for human re-inspection	Mixed visual indicators (e.g., a slight smear or a tag issue) reduce the certainty of an accept or reject call; a quality inspector validates the decision.
< 0.50 (Low confidence)	AI pauses and escalates for manual inspection and potential model retraining with the new information	New defect patterns, image noise or faulty sensors make autonomous action risky.
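The three bands in the table can be written as a small routing function. This is a minimal sketch, not the plant's actual code: the names `Action` and `route` are my own, and I have assumed that a score of exactly 0.90 or 0.50 falls in the moderate band, as the table's ranges suggest.

```python
from enum import Enum


class Action(Enum):
    AUTO_DECIDE = "AI accepts or rejects automatically"
    HUMAN_REINSPECT = "flag for human re-inspection"
    ESCALATE = "pause and escalate for manual inspection"


def route(confidence: float, auto_min: float = 0.90, review_min: float = 0.50) -> Action:
    """Map a model confidence in [0, 1] to one of the three bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > auto_min:        # high band: strictly above 0.90
        return Action.AUTO_DECIDE
    if confidence >= review_min:     # moderate band: 0.50 to 0.90 inclusive
        return Action.HUMAN_REINSPECT
    return Action.ESCALATE           # low band: below 0.50
```

With these defaults, a score of 0.97 is decided automatically and a score of 0.66 goes to a quality inspector.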

Factors Influencing Confidence Thresholds

Sensor Conditions & Calibration
Poor lighting, dust accumulation on the camera or screener lens, and camera calibration issues are the most common causes of distorted images; they lower the confidence score and thereby trigger more human reviews.
Past Model Accuracy & Drift
Thresholds are dynamic: the system puts more weight on the most recent trend and less on older results. If the false rejection rate over the rolling window of the 60 most recent data points exceeds 2.5%, the system self-corrects and tightens the auto-decision range to prevent wastage.
Continuous Learning Loop
All human re-inspection results are fed back in real time to retrain the model. Over time, this raises confidence reliability and reduces manual interventions.
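To make the rolling-window rule concrete, here is a rough sketch of how such a self-correcting threshold could work. The window of 60 and the 2.5% limit come from the description above; everything else is my assumption for illustration: the class name `AdaptiveThreshold`, the 0.01 step, the 0.99 ceiling, and the reading of "tightens the auto-decision range" as raising the minimum confidence for automatic decisions. A plain rolling window is also only a crude stand-in for weighting recent results more heavily.

```python
from collections import deque


class AdaptiveThreshold:
    """Raise the auto-decision cut-off when recent false rejections are too frequent."""

    def __init__(self, auto_min=0.90, window=60, max_false_reject_rate=0.025,
                 step=0.01, ceiling=0.99):
        self.auto_min = auto_min
        self.max_rate = max_false_reject_rate
        self.step = step          # hypothetical tightening step
        self.ceiling = ceiling    # hypothetical upper limit for the cut-off
        # Only the most recent `window` outcomes are kept; older ones drop out.
        self.outcomes = deque(maxlen=window)

    def record(self, was_false_rejection: bool) -> float:
        """Store one human-verified outcome and return the current cut-off."""
        self.outcomes.append(was_false_rejection)
        if len(self.outcomes) == self.outcomes.maxlen:
            rate = sum(self.outcomes) / len(self.outcomes)
            if rate > self.max_rate:
                # Tighten: fewer items qualify for automatic decisions.
                self.auto_min = round(min(self.ceiling, self.auto_min + self.step), 4)
        return self.auto_min
```

In this sketch, two false rejections in the last 60 verified items (about 3.3%) are enough to raise the cut-off, while one (about 1.7%) is not. A real system would also need a rule for relaxing the cut-off again once performance recovers.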

Balancing Speed, Accuracy, and Trust

Speed: AI handles routine inspections in real time, maintaining production flow.
Accuracy: Ambiguous cases are reviewed by experts, reducing false positives.
Trust: Operators see AI as a collaborator, not a replacement — decisions are explainable, auditable, and based on confidence logic.

Example

On a tea bag production line at Ekaterra Lipton, the AI system detects a torn tea bag.
AI confidence score = 0.97: clear auto-reject case.
Another image shows a minor smear (confidence = 0.66): flagged for human review.
If the defect is confirmed, the feedback improves future accuracy.

October 20, 2025

Consider the use of AI in content moderation within the publishing industry, where AI reviews manuscripts submitted by authors before publication. It automatically flags plagiarism, ethical issues, or copyright violations. Confidence is just one signal. AI should weigh confidence against consequence, context, and historical reliability. A well-designed publishing AI will not just ask “Am I confident?”—it will ask “Am I confident enough, given what is at stake?”

To determine when the AI should act automatically versus when it should escalate to a human, the AI needs a decision framework built around several key factors, as follows:

1. Risk associated with a decision: High-risk decisions should always involve human review, regardless of AI’s confidence level. For example, publishing a medical research article on COVID-19 carries far more risk than approving a mathematics book’s manuscript.

2. Ability to give reasoning with sources: If the AI can clearly explain its reasoning and cite its reference sources, its decisions can be trusted more readily, e.g., “This manuscript matches 95% of this article, and this is the website link”.

3. Past feedback through human overrides: The system should learn from past human overrides. If humans have often reversed the AI’s decisions in a specific domain, that domain should be flagged for mandatory human review until the model's accuracy improves.

4. Novelty of articles: AI algorithms are best at handling familiar patterns taught to them. If the manuscript content includes emerging technologies or an unknown subject, the system should escalate to humans.

5. Matters that can have legal consequences: For sensitive topics such as politics or religion, the reputational and legal risks are too substantial to automate these decisions even when the AI's confidence is high, so they must be escalated to humans.
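The five factors can be combined into a simple check that returns the reasons for escalation. This is a sketch under stated assumptions: the field names, the 0.90 confidence floor and the 20% override limit are hypothetical values chosen for illustration, not figures from any real publishing workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManuscriptCheck:
    confidence: float           # model confidence in its own verdict, 0..1
    high_risk_topic: bool       # factor 1: risk associated with the decision
    has_cited_sources: bool     # factor 2: reasoning backed by sources
    human_override_rate: float  # factor 3: share of past AI verdicts reversed by humans
    novel_subject: bool         # factor 4: unfamiliar or emerging subject
    legally_sensitive: bool     # factor 5: possible legal consequences


def escalation_reasons(check, min_confidence=0.90, max_override_rate=0.20):
    """Return the reasons to involve a human; an empty list means the AI may act."""
    reasons = []
    if check.high_risk_topic:
        reasons.append("high-risk decision")
    if not check.has_cited_sources:
        reasons.append("no cited sources")
    if check.human_override_rate > max_override_rate:
        reasons.append("frequent human overrides")
    if check.novel_subject:
        reasons.append("novel subject")
    if check.legally_sensitive:
        reasons.append("legally sensitive topic")
    if check.confidence < min_confidence:
        reasons.append("low confidence")
    return reasons
```

Because confidence is checked last and independently, a legally sensitive manuscript is escalated even when the model reports 99% confidence.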

October 20, 2025

In manufacturing, especially in multi-stakeholder environments like cold rolling steel strip manufacturing, AI must act only when its confidence level is high enough to avoid costly errors. Confidence here refers to the probability that the AI's prediction or decision is correct. The right confidence level is not a single static number; it depends on the risk and impact of the action on final product quality. The higher the potential cost of an error (a false positive or a false negative), the higher the confidence the AI should require before acting.

Low confidence -> AI should alert an operator or wait for more data.
Medium confidence -> AI can suggest actions but not execute them.
High confidence (typically >95%) -> AI can act automatically, especially in routine or time-sensitive operations.
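One way to see why the required confidence rises with the cost of an error is a simple expected-cost comparison. This is my own simplified model, not something the plant uses: assume a wrong automatic action costs `cost_wrong_action`, a correct one costs nothing, and sending the case to a human always costs `cost_of_review` (delay, effort). Acting automatically is then cheaper only when (1 - p) * cost_wrong_action < cost_of_review, which gives p > 1 - cost_of_review / cost_wrong_action.

```python
def min_confidence_to_act(cost_wrong_action: float, cost_of_review: float) -> float:
    """Smallest confidence p at which acting automatically has a lower
    expected cost than asking a human, under the simplified model above."""
    if cost_wrong_action <= 0:
        raise ValueError("cost_wrong_action must be positive")
    if cost_of_review < 0:
        raise ValueError("cost_of_review must not be negative")
    # p must exceed 1 - review/wrong; never below 0.
    return max(0.0, 1.0 - cost_of_review / cost_wrong_action)
```

With a review cost of 5 units, a wrong action costing 100 units needs about 95% confidence, while one costing 500 units needs about 99%. The costlier the mistake, the higher the bar.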

Cold rolling is a continuous process in which steel strips are passed through rollers to reduce thickness and improve surface finish. A key challenge is strip breaks: tears in the steel that stop production and damage costly machinery. Let’s consider strip break classification as an example; it is one of the most critical and high-impact areas in the cold rolling process.

If the AI is 99% confident that a strip will break due to high tension, it slows the rollers to adjust the tension automatically. If it is only 70% confident, it alerts the operator to check manually and validate with a functional expert.

Example: Strip Break Classification

Problem Impact:

Strip breaks cause a 4–5% production loss annually, worth a few crores.
Diagnosing the causes manually takes a few weeks.
They lead to unplanned downtime and production loss.

AI Solution:

AI analyses sensor data (tension, torque, electric voltage) every 10 milliseconds.
It classifies strip breaks in real-time, reducing downtime and manual effort.

Confidence Threshold:

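AI acts only when its confidence in the classification is >95%. The 95% cut-off is chosen so that automatic action is taken only when a misclassification is unlikely; model confidence is not the same as statistical significance, so the cut-off should be checked against the model's historical accuracy.
If confidence is <95%, it flags the event for manual review, or flags the potential break for RCA.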
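The threshold rule above could look like this for a classifier that outputs a probability per break cause. It is a sketch only: the function name and the cause labels in the usage note are hypothetical, and I have assumed that exactly 95% is treated as "not above the cut-off", since the rule leaves that case open.

```python
def handle_classification(probabilities: dict, act_threshold: float = 0.95):
    """Pick the most likely break cause and decide who handles it.

    `probabilities` maps a cause label to the model's probability for it.
    Returns a (decision, label) pair.
    """
    if not probabilities:
        raise ValueError("no class probabilities supplied")
    # Most likely cause and the model's confidence in it.
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence > act_threshold:
        return ("act_automatically", label)
    # At or below the cut-off: a person reviews the event and runs RCA.
    return ("manual_review", label)
```

For example, `handle_classification({"high_tension": 0.99, "weld_defect": 0.01})` returns an automatic action for `high_tension`, whereas a 70/30 split between the same two causes is sent for manual review.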
