Benchmark Six Sigma Forum

Message added by Mayank Gupta,

AI, or Artificial Intelligence, is a self-learning and/or self-rewriting technology that mimics the human mind, intelligence, and decision making. It can evolve and learn based on the responses it receives in different situations. As per IEEE SA, AI is “the combination of cognitive automation, machine learning (ML), reasoning, hypothesis generation and analysis, natural language processing and intentional algorithm mutation producing insights and analytics at or above human capability.”

 

Bias (or Accuracy) is the difference between the average of observed values and the standard (actual) value.
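As a minimal sketch of that definition, the snippet below computes bias as the observed average minus the reference value. The readings and the reference value of 10.0 are made-up illustration data, and `measurement_bias` is a hypothetical helper name.

```python
from statistics import mean

def measurement_bias(observed, reference):
    """Bias = average of the observed values minus the reference (standard) value."""
    return mean(observed) - reference

# Hypothetical repeated readings of an item whose reference value is 10.0
readings = [10.2, 10.4, 10.3, 10.1]
bias = measurement_bias(readings, reference=10.0)
print(f"bias = {bias:.2f}")
```

A positive result means the observations run high on average; a negative result means they run low.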

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Pavitra Jain on 25 August 2025.

 

Applause for all the respondents - Vatsala Muthukumaraswamy, Sunny Prithviraj, Solomon Gnanaraj, Monica Salunkhe, Arunungshu, Pavitra Jain, Gagan Kathuria, Swapnil Madhav Chaukar, Ayomide Otokiti, Tushar Ghosh.

Bias in, Bias out: How Do We Break the Cycle?

Featured Replies

Q 799. How Can We Prevent Bias From Creeping Into AI-Enabled Processes?
AI agents learn patterns from data, prompts, and flows — but those patterns can sometimes carry hidden biases that affect fairness, quality, or even compliance. In a service delivery context, biased AI outputs could impact customer satisfaction, employee morale, or regulatory standing. Think of one process in your domain where bias risk could appear (e.g., prioritizing cases, recommending actions, responding to customers). What specific steps would you take — in design, testing, or monitoring — to minimize the impact of bias?

 

The best answer will be selected on the basis of: 

  • Realism and relevance of the bias scenario
  • Practicality of the prevention or mitigation method
  • Clarity and creativity in the proposed approach

 


Solved by Pavitra Jain

Bias in AI is not a technical bug; it's a systemic threat that can cascade through healthcare outcomes, compliance, and trust.

Bias Scenario: Automated Case Prioritization in Medical Coding
Suppose an AI tool is employed to rank medical coding cases by urgency, complexity, or reimbursement value. If the training dataset disproportionately represents particular populations (for example, older adults, inner-city hospitals, or affluent zip codes), the AI will tend to rank those cases higher more often, discriminating against cases from rural, low-income, or minority communities.

In this scenario, medical coders from diverse backgrounds should be involved in the design process to highlight possible blind spots. They should also be able to flag suspicious prioritizations, so that those instances can be fed back into the model for retraining.
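One way to check the training data for the skew described above is to compare each group's share of the training set with its share of the population the tool will serve. This is only a sketch: the group labels, the population shares, and the 5-point tolerance are assumptions for illustration, and `representation_gaps` is a hypothetical helper.

```python
from collections import Counter

def representation_gaps(training_groups, population_share, tolerance=0.05):
    """Return groups whose share of the training data falls short of their
    population share by more than `tolerance` (an assumed threshold)."""
    counts = Counter(training_groups)
    total = len(training_groups)
    gaps = {}
    for group, expected in population_share.items():
        actual = counts.get(group, 0) / total
        if expected - actual > tolerance:
            gaps[group] = round(expected - actual, 3)
    return gaps

# Hypothetical data: 80% of training cases come from urban hospitals,
# while rural hospitals make up 40% of the cases the tool will see.
training = ["urban"] * 80 + ["rural"] * 20
gaps = representation_gaps(training, {"urban": 0.6, "rural": 0.4})
print(gaps)
```

Groups that show up in the result are candidates for extra data collection or for closer review by the coders involved in the design.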

Bias in any AI model can be attributed primarily to two causes: the model design, or bias in the training data. An ideal AI model is expected to work without any bias, but bias can creep in even after the model is implemented, because the data on which it continues to be trained can introduce it.

AI agents are usually deployed as chatbots in the service industry to free up team members' manual time and to reduce costs. Their responses, however, must be free of bias.

For example, suppose an AI agent is deployed to work on behalf of program managers on tasks such as taking minutes of meetings, taking part in PI epic rationalization, prioritizing epics and stories, and mapping technical resources to the prioritized epics and stories.
If bias appears, for instance epics from particular departments being preferred, or less complex epics being assigned to particular team members based on gender, it sets a wrong precedent and is not acceptable for program management or the organization.

There are two ways to address this scenario. One is preventive: the data model is examined at regular intervals (say weekly, bi-weekly, or monthly), and rectification steps are taken if any discrepancy is observed.

The other is reactive: after discrepancies are observed, the model is retrained or corrected.

The monitoring work can be handed over to data professionals for regular audits, since legal and compliance requirements must be met to avoid penalties or business impact.

Bias In – Refers to the bias that enters an AI system through its inputs (data, design choices, and assumptions). Unfairness can arise from the way the data is collected, a lack of diversity, choices made by the developers, and historical data.

How to break Bias In? – Collect diverse, representative data and use fairness techniques. Build a diverse team to reduce blind spots.

Bias Out – Refers to the output results or decisions based on the biased inputs. These can be unfair predictions and recommendations.

How to break Bias Out? – Run fairness checks on outputs, keep a human in the decision making, and monitor continuously.

In Service Delivery – In my domain, Alternative Investments, we have different product structures for the same client and the same type of work (private equity, private credit, REIT funds, interval funds, etc.). This caused issues when AI standardized the rules for new account creation or redemptions.

Also, a few source files from a particular broker/dealer need tax reporting handled by our transfer agency. So these need one series of account types, while all others should receive a different series of accounts. These are the challenges we see today.

How to prevent/overcome?

Design – We continue to work with our tech teams to identify these different product structures and to use balanced training data across all product structures and account types.

Test – Test with all such rules and scenarios.

Monitor – A few steps currently involve HITL (human in the loop) to ensure the bias is addressed: our processors manually check such account types and correct them if necessary.
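The "balanced training data across all product structures" step could look something like the sketch below, which downsamples every product structure to the size of the smallest one. The record layout, the product names, and the helper `balance_by_product` are assumptions for illustration; oversampling or reweighting the smaller groups are alternatives when discarding data is too costly.

```python
import random
from collections import defaultdict

def balance_by_product(records, key="product", seed=0):
    """Downsample so every product structure contributes the same number of records."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[key]].append(record)
    target = min(len(rows) for rows in buckets.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for rows in buckets.values():
        balanced.extend(rng.sample(rows, target))
    return balanced

# Hypothetical history: many private equity accounts, few interval fund accounts
records = (
    [{"product": "private_equity", "id": i} for i in range(50)]
    + [{"product": "interval_fund", "id": i} for i in range(5)]
)
balanced = balance_by_product(records)
print(len(balanced))
```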

 

Edited by Solomon Gnanaraj

Q. Bias in, Bias out: How to break the cycle?

Answer -

In a service delivery context, taking the given examples of prioritizing cases, recommending actions, and responding to customers, the following steps can be taken at various stages of solution development –

Design stage –

·       Assess project objective, scope, metrics, and success measures with timelines

·       Reach out to stakeholders in case of differences of opinion

·       Interview users and empathize with the issues they face

·       Brainstorm and validate assessment criteria. Design FMEA

·       Course-correct metrics and success measures if required

·       Involve developers and testers in the kick-off call

 

Testing phase –

·       Develop test cases for the use cases

·       Build the agentic AI workflow with what-if scenarios and AND/OR logic

·       Link the knowledge base repository to a correctly calibrated, clean database

Monitoring phase –

·       Intelligent dashboards with a Power Apps workflow triggered when any shift in data is observed

·       Calibrate and retrain the AI for precision and accuracy

·       Periodic governance

This is how one can take the bias that comes in and drive it out through careful design, testing, and monitoring to break the cycle.

Bias is inevitable in AI-enabled processes because the data and patterns that AI algorithms learn from contain bias. Hence, to keep bias in check, we need to measure the voice-to-noise ratio. The objective is to reduce the noise so as to get the optimum voice from the data patterns, prompts, and flows that are the source of bias. Below is an overview of the process.

Invoice Prioritization & Payment Scheduling

AI agents will help determine the priority of invoice processing based on empirical data. If the training relied only on past payments data, the AI may:

  • Focus only on vendors with high-volume, high-value transactions while deferring small/local suppliers.
  • Give inadequate focus to newer vendors because there is less historical data.
  • Bias urgency toward how an invoice is written, giving an advantage to vendors with “clean and crisp” submissions.

Steps to Prevent and Minimize Bias

1. Design Phase

  • Modify Training Data: Ensure vendor data covers all types of vendors (large, mid-size, small, new entrants) so the AI doesn’t give additional weight to big players.
  • Ensure Transparency: Define clear business rules, such as: all vendors, irrespective of size or demography, should be paid within agreed terms.

2. Testing Phase

  • Bias Test: Test identical invoices across various vendor strata (local vs. global, large vs. small) to confirm AI recommendations are consistent.
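A bias test of this kind can be sketched as a swap test: score the same invoice several times, changing only the vendor attribute, and check that the score does not move. In the sketch below, `priority_score` is a hypothetical stand-in for the real AI prioritizer, and the field names are assumptions.

```python
def priority_score(invoice):
    """Hypothetical stand-in for the AI prioritizer; replace with the real model call."""
    return max(0, 30 - invoice["days_to_due"]) + invoice["amount"] / 10_000

def vendor_swap_test(invoice, vendor_field, variants, scorer, tolerance=1e-9):
    """Score the same invoice under each vendor variant and report the spread."""
    scores = {v: scorer({**invoice, vendor_field: v}) for v in variants}
    spread = max(scores.values()) - min(scores.values())
    return spread <= tolerance, scores

invoice = {"amount": 12_000, "days_to_due": 10, "vendor_size": "large"}
passed, scores = vendor_swap_test(
    invoice, "vendor_size", ["large", "small", "new"], priority_score
)
print(passed, scores)
```

If the check fails, the scorer is using the vendor attribute (or something derived from it) to rank otherwise identical invoices.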

3. Monitoring Phase

  • Dashboards: Track payment schedules across vendor categories. Highlight if specific groups are consistently delayed.
  • HIL: Introduce a human-in-the-loop model; let the accounts payable team override AI-driven prioritization where they observe AI bias.
  • Feedback Incorporation: Store vendor complaints about late payments and use them as a signal for reducing model bias. Ensure optimum residual value, i.e. highest voice, lowest noise.
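The dashboard check could be backed by something as simple as a late-payment rate per vendor category, with a flag on any category that lags the best one. The payment records and the 10-point gap threshold below are assumptions for illustration.

```python
from collections import defaultdict

def late_rate_by_category(payments):
    """Share of payments made late, per vendor category."""
    totals, late = defaultdict(int), defaultdict(int)
    for p in payments:
        totals[p["category"]] += 1
        if p["days_late"] > 0:
            late[p["category"]] += 1
    return {c: late[c] / totals[c] for c in totals}

def flag_delayed_groups(rates, max_gap=0.10):
    """Categories whose late rate exceeds the best category by more than max_gap."""
    best = min(rates.values())
    return sorted(c for c, r in rates.items() if r - best > max_gap)

# Hypothetical month of payments: 1 of 10 large-vendor payments late, 4 of 10 small
payments = (
    [{"category": "large", "days_late": 3 if i == 0 else 0} for i in range(10)]
    + [{"category": "small", "days_late": 5 if i < 4 else 0} for i in range(10)]
)
rates = late_rate_by_category(payments)
flagged = flag_delayed_groups(rates)
print(rates, flagged)
```

Flagged categories are the ones the accounts payable team would review first for possible override.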

 

Bias in, Bias out in AI simply means that the responses, text, and images generated by our GPTs inherit the bias in what they were fed (Bias-in). Since these models also learn from the responses they give, the bias gets further into their veins (Bias-out).
 
This bias is not a technical issue; it is a dataset issue. The datasets that the models are fed are biased in the first place, and the bias could be of any type: race, ethnicity, or societal prejudice. Example: US visa applications see a much higher rejection rate for Indian applicants than for Australian applicants. If the historical data for these applications is fed into an AI model that is then asked to start approving or rejecting applications, it will inherently reject more applications from Indians than from Australians.
 
Some of the major reasons for these biases are: 
  • Algorithmic design flaws: Bias can also be introduced through algorithm design, where certain variables—like zip codes—can act as proxies for race or socioeconomic status, leading to discriminatory outcomes.
  • Training data bias: The visa rejection example covered above is a training data bias. It is among the most common types of bias in AI today.
  • Human bias: The labels provided to AI models can be biased to the extent of the bias in developers' assumptions.
  • Incomplete samples: If the data used to train the AI model does not represent the population correctly, that bias will inevitably carry into the model outcomes. The underrepresented group will not get accurate outcomes.
  • Training loops: The decisions made by AI can be fed back into the AI dataset, which in turn can further worsen the bias in an already biased model.
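The proxy problem in the first bullet (a variable such as zip code standing in for a protected attribute) can be screened with a rough check: how well does the feature alone predict the protected attribute, compared with simply guessing the most common group? This is a simple heuristic sketch, not a standard named test; the rows and field names are made up.

```python
from collections import Counter, defaultdict

def proxy_strength(rows, feature, protected):
    """Share of rows whose protected attribute is guessed correctly by predicting
    the most common protected value for each value of the feature."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[protected]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(rows)

def baseline(rows, protected):
    """Accuracy of always guessing the overall most common protected value."""
    return Counter(r[protected] for r in rows).most_common(1)[0][1] / len(rows)

# Hypothetical data: zip code almost determines the group
rows = (
    [{"zip": "11111", "group": "A"}] * 9 + [{"zip": "11111", "group": "B"}]
    + [{"zip": "22222", "group": "A"}] + [{"zip": "22222", "group": "B"}] * 9
)
print(proxy_strength(rows, "zip", "group"), baseline(rows, "group"))
```

A proxy strength far above the baseline suggests the feature can leak the protected attribute into the model even when that attribute is excluded.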
How can we break the bias cycle?
  • Inclusive data collection: Data sampling exercises can be used to correct the data that is fed into AI models. Remember, if you feed in the right data, the right outcome will come.
  • Bias measurement tools: There are tools available in the market to test for bias; test for bias regularly.
  • Source and quality scores: Show the source information and quality scores in output results so that the end user can see what input was used.
  • Bias training and awareness: Developers and users should be educated about the types of bias and the ways to avoid them; this will help build better solutions.

Is an AI solution biased? Before asking this question, let us dwell on human nature. Is human response or process building biased? It has to be: bias forms the basis of the selection criteria, the baseline on which the entire process is set or supposed to operate. Similarly, when we create an AI agent, there will be bias. Bias in AI-enabled customer service processes, especially in banking, can have serious consequences, from unfair treatment of customers to regulatory violations.

Let’s break this down using the example of a third-party contact center handling banking queries, such as Annual Maintenance Charges (AMC) or unauthorized UPI transactions, and explore how bias can creep in and how to mitigate it.


What Bias Can Appear in Banking Customer Service and Where?

1. Case Prioritization

  • Risk of bias: AI may prioritize cases based on customer profile (e.g., high-value customers), potentially delaying resolution for others.
  • Example: AMC-related queries from senior citizens may be deprioritized if the model learns they are less likely to escalate.

2. Action Recommendations

  • Bias possibility: AI may suggest refunds or escalations based on historical patterns that reflect biased decisions.
  • Example: UPI fraud cases from Tier-2 cities may be less likely to get recommended for escalation due to historical underreporting.

3. Response Generation

  • Bias Risk: Language models may respond differently based on customer name, language, or tone. Regional differences matter too: an AI agent may accept the tone, politeness, and choice of words of customers in northern India, yet read the same language from customers in southern India as rude or condescending and deny service.
  • Example: A polite query may get a more helpful response than an agitated one, even if both are valid.

4. Billing Model Influence

  • Bias Risk: If billing is based on connect minutes, agents may be incentivized to prolong calls. If based on call count, they may rush.
  • Example: AMC queries may be wrapped up quickly without full resolution under a per-call billing model.

So, what do we do to minimize bias in Design, Testing, and Monitoring?

A. Design Phase

  1. Diversify Training Data
    • Whether low-income customers or high rollers, include varied customer profiles, geographical regions, languages, customer net worth, and complaint types.
    • The transaction amount should not matter when a customer complains of an unauthorized transaction by a merchant. Bias can set in based on the amount, so that the AI prioritizes only high-amount unauthorized transaction cases.
    • We must ensure representation of vulnerable groups (e.g., low-income customers, senior citizens, rural customers).
  2. Provide clear objectives that kill bias
    • Design AI models with fairness constraints (e.g., equal resolution rates across demographics).
    • Avoid optimizing solely for efficiency metrics like AHT (Average Handling Time).
  3. Human-in-the-Loop
    • Keep humans involved in sensitive decisions (e.g., refund approvals, fraud escalations).
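The human-in-the-loop point can be sketched as a simple routing rule: sensitive actions always go to a person, and so does anything the model is unsure about. The action names follow the examples above; the confidence score and the 0.9 threshold are assumptions.

```python
# Actions that always need a human decision (taken from the examples above)
SENSITIVE_ACTIONS = {"refund_approval", "fraud_escalation"}

def route(action, confidence, min_confidence=0.9):
    """Send sensitive or low-confidence decisions to a human reviewer."""
    if action in SENSITIVE_ACTIONS or confidence < min_confidence:
        return "human_review"
    return "auto"

print(route("refund_approval", 0.99))   # sensitive, so a human decides
print(route("balance_inquiry", 0.95))   # routine and confident
print(route("balance_inquiry", 0.50))   # routine but uncertain
```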

B. Testing Phase

  1. Inclusion of Bias Audits
    • Test model outputs across different customer segments.
    • Use synthetic data to simulate edge cases (e.g., same query from different regions).
  2. Scenario-Based Testing
    • Create test cases for AMC and UPI queries with varying tones, languages, and urgency levels.
    • Check for consistency in response quality and resolution.
  3. Metric Diversification
    • Track fairness metrics alongside performance metrics (e.g., resolution equity, escalation parity).
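Fairness metrics such as resolution equity and escalation parity can be approximated by comparing outcome rates across customer segments. The sketch below assumes each case record carries a segment label and boolean outcomes; the segment names and data are made up.

```python
from collections import defaultdict

def rate_by_segment(cases, outcome):
    """Share of cases in each segment for which `outcome` is true."""
    totals, hits = defaultdict(int), defaultdict(int)
    for case in cases:
        totals[case["segment"]] += 1
        hits[case["segment"]] += int(case[outcome])
    return {s: hits[s] / totals[s] for s in totals}

def parity_gap(rates):
    """Difference between the best-served and worst-served segment."""
    return max(rates.values()) - min(rates.values())

# Hypothetical UPI fraud cases: 6 of 10 escalated in Tier-1, 3 of 10 in Tier-2
cases = (
    [{"segment": "tier1", "escalated": i < 6} for i in range(10)]
    + [{"segment": "tier2", "escalated": i < 3} for i in range(10)]
)
escalation_rates = rate_by_segment(cases, "escalated")
print(escalation_rates, parity_gap(escalation_rates))
```

The same function works for a `resolved` field, so both metrics can sit next to AHT on the same report.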

C. Monitoring Phase

  1. Set up real-time dashboards
    • Monitor call outcomes by customer segment, query type, and agent behavior.
    • Flag anomalies (e.g., unusually short calls for UPI fraud cases).
  2. VOC (Voice of Customer) Feedback
    • Collect customer feedback post-call and correlate with AI decisions.
    • Use feedback to retrain models and adjust flows.
  3. Billing Model Alignment
    • Ensure billing models don’t incentivize biased behavior.
    • Consider hybrid models (e.g., quality-adjusted call count) to balance efficiency and fairness.
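The "unusually short calls" flag from the monitoring step could be sketched as a comparison against the median duration for the same query type. The cutoff of half the median is an assumed threshold, and the call records are made up.

```python
from collections import defaultdict
from statistics import median

def flag_short_calls(calls, fraction=0.5):
    """Return ids of calls shorter than `fraction` of the median for their query type."""
    by_type = defaultdict(list)
    for call in calls:
        by_type[call["query_type"]].append(call["seconds"])
    medians = {t: median(v) for t, v in by_type.items()}
    return [c["id"] for c in calls if c["seconds"] < fraction * medians[c["query_type"]]]

# Hypothetical call log
calls = [
    {"id": 1, "query_type": "upi_fraud", "seconds": 600},
    {"id": 2, "query_type": "upi_fraud", "seconds": 620},
    {"id": 3, "query_type": "upi_fraud", "seconds": 580},
    {"id": 4, "query_type": "upi_fraud", "seconds": 610},
    {"id": 5, "query_type": "upi_fraud", "seconds": 90},
    {"id": 6, "query_type": "amc", "seconds": 120},
    {"id": 7, "query_type": "amc", "seconds": 130},
    {"id": 8, "query_type": "amc", "seconds": 110},
]
flagged_calls = flag_short_calls(calls)
print(flagged_calls)
```

Comparing within a query type matters here: a 90-second AMC call may be normal, while a 90-second fraud call probably is not.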

How do we break the “Bias In, Bias Out” Cycle?

  • Continuous Learning: Regularly update models with new, unbiased data and feedback.
  • Make it transparent: Make AI decision-making explainable to agents and supervisors.
  • Assign ownership: As a check mechanism, assign accountability for bias monitoring and remediation.
  • Cross-Functional Collaboration: Involve a friendly customer base, the compliance team, the QA team, and customer experience teams in AI governance.

Bias in AI isn’t some abstract problem; it shows up in small, practical ways that most people don’t notice until it’s already causing a problem. I’ll give you an example from the kind of work I’ve seen. In service delivery, we often deal with maintenance requests or customer cases that need to be prioritized. Now imagine if the AI is trained mostly on past data. If, historically, tickets raised by one team or location were often resolved slowly, the AI might start to assume those cases are “less urgent.” Before long, those teams will always find themselves at the back of the queue. The irony is the delay might have been caused by resource issues or the way the forms were filled out, not because their problems were any less important.

That’s how bias creeps in: quietly, reinforcing itself over time. People on the ground start to feel ignored, managers get a distorted picture of performance, and eventually trust in the system erodes.

For me, the way to handle this is to think of it in three steps: how you design the system, how you test it, and how you keep an eye on it after it’s live.

In design, you can reduce the risk by being intentional with the criteria. Don’t just ask the AI “which case is important.” That leaves too much room for bias. Ask it to sort based on specific, objective measures: downtime cost, safety impact, compliance deadlines. The way you phrase the prompt matters a lot here. It’s like training a new employee: if you give vague instructions, they’ll pick up the wrong habits.
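To make the "specific, objective measures" idea concrete, here is a rough sketch of a scoring rule that uses only downtime cost, safety impact, and compliance deadline. The weights, the 0 to 3 safety scale, and the field names are all assumptions; the point is that the location of the ticket never enters the score.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    ticket_id: str
    downtime_cost_per_hour: float
    safety_impact: int                 # 0 (none) to 3 (severe), hypothetical scale
    days_to_compliance_deadline: int
    location: str                      # kept for reporting, deliberately not scored

def priority(ticket):
    """Score a ticket from objective fields only (weights are illustrative)."""
    score = ticket.safety_impact * 100
    score += min(ticket.downtime_cost_per_hour / 100, 100)
    score += max(0, 30 - ticket.days_to_compliance_deadline)
    return score

a = Ticket("T1", 5000.0, 3, 5, "plant_north")
b = Ticket("T2", 5000.0, 3, 5, "plant_south")   # same issue, different location
c = Ticket("T3", 200.0, 0, 60, "plant_north")
ranked = sorted([a, b, c], key=priority, reverse=True)
print([t.ticket_id for t in ranked])
```

Because `location` is not part of the score, two tickets that differ only in where they were raised always rank the same.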

In testing, don’t just look at overall accuracy. You’ve got to watch for patterns. Run “what if” scenarios, like two tickets with the same severity logged by different shifts, and see if the AI treats them fairly. I’ve found that having people from different teams review sample outputs helps because each group notices different red flags.

In monitoring, you can’t just set and forget. Build in transparency. For example, if the AI deprioritizes a ticket, it should also give a one-line reason: “low safety risk, low downtime impact.” That way, if someone disagrees, they can challenge the logic. And just as importantly, track whether one group of users keeps getting the short end of the stick. If you see patterns, adjust the flows or roll back to an earlier version.
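The one-line reason can be generated from the same fields used to rank the ticket, so the explanation and the decision cannot drift apart. A small sketch, with assumed thresholds and field names:

```python
def explain(ticket):
    """Build a one-line reason from the fields used for prioritization.
    The cutoffs (safety <= 1, downtime < 1000 per hour) are assumptions."""
    safety = "low" if ticket["safety_impact"] <= 1 else "high"
    downtime = "low" if ticket["downtime_cost_per_hour"] < 1000 else "high"
    return f"{safety} safety risk, {downtime} downtime impact"

ticket = {"id": "T7", "safety_impact": 0, "downtime_cost_per_hour": 200}
reason = explain(ticket)
print(reason)
```

Storing this reason alongside each decision also gives reviewers something specific to challenge.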

Here’s a rough sketch of how I see it working in practice:

Biased Data      →  Biased Prompts   →  Biased Outputs
     ↓                    ↓                   ↓
Balanced inputs     Neutral wording     Rationale + audits

---------------- Continuous Loop ----------------


Breaking the cycle isn’t about “fix it once.” It’s more like continuous improvement in operations.
You don’t assume the process is perfect; you keep testing, you adjust, you listen to feedback, and you document every change, so you know what worked and what didn’t.

If I’m honest, bias will never be gone completely, because humans build the systems and humans carry bias. But if you treat AI flows with the same discipline as any other critical process (version control, edge-case testing, and constant monitoring), you stop bias from quietly taking root and repeating itself at scale.

In banking services, AI might be used to reply to customer cases based on urgency, value, or predicted outcomes. The training data may reflect historical biases (e.g., favouring a certain ethnicity, creed, age group, or region).

 

The following are suggested steps to minimize bias in design, testing, and monitoring:

1. Design Phase

  • Define fairness criteria: Define explicitly what “fair” means for the process before building the model.
  • Source diverse data: Ensure the training data covers a wide range of cases, representing all the types of variation that exist in the real-world population.

2. Testing Phase

  • Testing for bias: Use fairness metrics, for example disparate impact analysis, to test the model.
  • Observe scenarios: Run test cases with all kinds of customer profiles.
  • Human-in-the-loop: Include manual review for flagged decisions.
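Disparate impact analysis is commonly expressed as a ratio of favourable-outcome rates between two groups, with a ratio below 0.8 (the "four-fifths" rule of thumb) treated as a signal to investigate. A minimal sketch with made-up counts:

```python
def disparate_impact(fav_group_a, total_group_a, fav_group_b, total_group_b):
    """Ratio of the favourable-outcome rate of group A to that of group B,
    where group B is the reference (typically better-served) group."""
    return (fav_group_a / total_group_a) / (fav_group_b / total_group_b)

# Hypothetical counts: cases marked high priority out of all cases per segment
ratio = disparate_impact(30, 100, 50, 100)
passes_four_fifths = ratio >= 0.8
print(round(ratio, 2), passes_four_fifths)
```

The 0.8 cutoff is a screening heuristic rather than proof of bias either way, so a failing ratio should trigger the manual review step above rather than an automatic fix.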

3. Monitoring Phase

  • Metrics for dashboards: Track prioritization patterns by customer segment.
  • Continuous feedback loops: Build a culture where employees and customers can report unfairness.
  • Continuously train the model: Some biases may still infiltrate despite all checks and balances, so periodically retrain models to identify noise and bias.

Great answers from all respondents. 

 

The best answer has been provided by Pavitra Jain. Well done.

 

Answer from Swapnil is also a must read.
