Benchmark Six Sigma Forum

How Should MBBs Rethink Hypothesis Testing and Data Credibility When AI Is Involved?

Q841

In Lean Six Sigma, decisions are traditionally driven by validated data, statistical hypothesis testing, confidence intervals, and cause–effect verification. With AI now surfacing correlations, predictions, and recommendations — often without transparent statistical logic — MBBs face a new challenge: what evidence is “good enough” to act on?

Think of a specific Lean Six Sigma project in your domain where AI could be used for analysis or prediction.
How should an MBB treat AI-generated insights when it comes to:

  • Forming or testing hypotheses

  • Establishing statistical confidence

  • Assessing data quality and credibility

Where should AI insights accelerate decisions, and where must traditional statistical validation still be non-negotiable?

⚠️ Any answer that is generic or does not connect with a specific LSS project or dataset will not be approved.

🏆 The best answer will be selected on the basis of:

  • Depth of understanding of LSS statistics and hypothesis testing

  • Clarity in distinguishing correlation, prediction, and causation

  • Practical judgment in balancing AI insights with statistical rigor

Solved by Adil Khan18

In Lean Six Sigma, Black Belts are trained to trust only what is statistically verifiable: p-values, confidence intervals, and root-cause confirmation by designed experiment. But AI changes the game. It does not always explain why, but it shows what is correlated or predictive, at a scale and speed beyond human ability.

So when AI surfaces an insight, such as a correlation between transactional process patterns and SLA breaches in a BPO, or non-obvious drivers of handling time, how should an MBB treat it? How do you tell when it is evidence and when it is only a pattern that has yet to be proven?

Let us ground this in a real-life BPO situation.

Domain: BPO - Large-Scale Finance and Accounting (Order-to-Cash) operations.

Problem Statement: Within 6 months, reduce the turnaround time (TAT) for technical invoice dispute resolution from the current 18-22 days to less than 12 days, without increasing write-offs or customer dissatisfaction.

How AI helps here:

An AI model trained on more than three years of transaction, email, ticket, and CRM data, focused on:

  • Predicting which invoices are likely to become disputes.

  • Estimating the expected resolution effort.

  • Recommending routing, prioritization, and root-cause tags before a dispute actually occurs.

The Core Tension for MBBs

Lean Six Sigma was built on a solid principle:

If we cannot show its validity statistically, we should not act on it.

AI takes a different approach by offering:

  • Highly accurate predictions.

  • Pattern identification across thousands of variables.

  • Recommendations with no classical hypothesis statement or p-values.

The MBB challenge is not whether AI is useful, but what its insights must demonstrate to earn the right to drive action.

Hypothesis Formulation and Testing in an AI-Facilitated DMAIC:

The classical hypothesis in our Order-to-Cash example could be:

H0: The resolution TAT of an invoice dispute is not related to the root cause of the dispute.

H1: Some root causes significantly increase resolution TAT.

This would typically be tested with:

  • ANOVA / regression

  • Clearly defined variables

  • Controlled samples

How AI Changes Hypothesis Formation (Without Eliminating It)

In this instance, AI surfaced something different:

Invoices with partial pricing inconsistencies, where emails from offshore customers arrive outside business hours, are 2.4x more likely to become long-tail disputes.

This observation did not come from a predefined hypothesis.

It came from pattern recognition across thousands of features.

How an MBB Should Treat This:

AI must be treated as a generator of hypotheses, not a validator of them.

For this project:

  • AI insights were converted into falsifiable hypotheses, e.g.:

Proactive intervention on invoices with these features will reduce dispute TAT by 25 percent.

Classical DMAIC discipline still applied:

  • Clear operational definition (end-to-end TAT)

  • Control vs. pilot groups

  • Before/after comparison

Key Principle:

AI cannot answer the question of why; MBBs still need to create the evidence.

Establishing Statistical Confidence When AI Is Involved:

The Temptation

In our BPO case, the AI model demonstrated:

  • The ability to predict the likelihood of disputes with 87 percent accuracy.

  • Strong lift curve and ROC performance.

The temptation is to say:

The model is accurate, so we take action.

That is not Six Sigma thinking.

What Confidence Means Now

For MBBs, the confidence question should shift from:

Is the coefficient statistically significant?

to

Does acting on this insight reliably improve the CTQ?

What We Did in This Scenario:

  • Invoice prioritization based on AI predictions was applied in just one pilot region.

  • We measured:

    • Reduction in mean dispute TAT.

    • Increase in early-resolution percentage.

    • No increase in write-offs or customer escalations.

  • We statistically validated the results, rather than the AI model internals.
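One way to validate a pilot result like this, rather than the model internals, is a permutation test on dispute TAT between the control and pilot groups. The sketch below is illustrative only: the TAT values are hypothetical, the function name `permutation_test` is invented for this example, and a permutation test is just one reasonable choice (a two-sample t-test in Minitab would serve the same purpose).

```python
import random
from statistics import mean

def permutation_test(control, pilot, n_resamples=10_000, seed=0):
    """One-sided permutation test: is mean(pilot) lower than mean(control)?"""
    rng = random.Random(seed)
    observed = mean(control) - mean(pilot)
    pooled = list(control) + list(pilot)
    n_control = len(control)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle group labels and recompute the difference in means
        rng.shuffle(pooled)
        diff = mean(pooled[:n_control]) - mean(pooled[n_control:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical dispute TAT samples in days (not project data)
control_tat = [19, 21, 18, 22, 20, 19, 21, 20, 22, 18]
pilot_tat = [13, 15, 12, 14, 16, 13, 12, 15, 14, 13]

delta, p = permutation_test(control_tat, pilot_tat)
print(f"Mean TAT reduction: {delta:.1f} days, p = {p:.4f}")
```

The point of the design is that the evidence comes from the measured process outcome (TAT in pilot vs. control), so the conclusion holds regardless of how the AI model produced its priorities.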

Non-Negotiable Rule

Process confidence is not the same as AI accuracy.

Confidence still has to be earned via:

  • Controlled pilots

  • Measured deltas

  • Stability over time

Assessing Data Quality and Credibility in AI-Based Analysis:

In BPO, AI can be very powerful.

In this project, AI:

  • Digested 1.2M invoices, messages, and tickets in days.

  • Identified patterns that no SME had articulated.

  • Normalized data quality problems that had been flagged by humans.

Where MBB Judgment Matters

Despite the volume and sophistication:

  • Some "strong predictors" were artifacts of:

    • Legacy process exceptions

    • Regional policy differences.

    • Ineffective root-cause tagging.

  • AI had learned from historical inefficiency, not from the intended way of working.

The MBB's Credibility Test

MBBs apply:

  • Data lineage checks: which systems, which time period, and which data definitions (before, during, and after).

  • Segmentation checks (regions, customers, dispute types)

  • SME validation: does it make sense operationally?

Rule:

AI can scale data. Only humans can assign trust.

When AI Should Accelerate Decisions vs. When Validation Is Non-Negotiable:

AI should speed up decision making when:

  • The decision is reversible (routing, prioritization, alerts).

  • The cost of being wrong is low.

  • The AI recommendation is used as decision support rather than automation.

Example from the Scenario:

Using AI to route high-risk invoices to early, fast-track handling.

Traditional Validation Cannot Be Compromised When:

  • The decision modifies policy, controls, or customer commitments.

  • The impact is on compliance, revenue recognition, or contract terms.

  • The AI insight contradicts established process logic.

Example from the Scenario:

Redesigning dispute ownership models or modifying SLA definitions still required:

  • Statistical validation

  • Pilot control groups

  • Leadership sign-off

The Bigger Shift MBBs Need to Make

AI does not remove Lean Six Sigma rigor.

It repositions where that rigor is applied.

  • Hypotheses move to after the insight.

  • The emphasis shifts from statistics on models to statistics on the process.

  • Data credibility is not a technical assumption, but an explicit leadership responsibility.

Concluding Lesson from the BPO Engagement

AI accelerates insight.

Lean Six Sigma earns the right to act.

MBBs who treat AI as:

  • A quick fix for problems will lose credibility.

  • A hypothesis engine within DMAIC will move much faster with no loss of credibility.

The winners will not be those who replace statistical reasoning with AI, but those who make AI work within disciplined logic. Faster optimization only matters when you are still optimizing the right things.

Project: Reduce cycle time and improve the quality of Narrative drafts for property offering final memos.

Problem statement: The current average time to produce a first-draft Narrative (Property Overview, Market Overview, Borrower Sponsor, Maps & Aerials, Demographics, Crime Reports, Strengths & Weaknesses) is 5 hours per deal, with 16% of drafts requiring major rework due to:

  1. Missing or inconsistent data

  2. Grammar issues, with UK and US English used interchangeably

  3. Misalignment across sections (e.g. strengths not matching the market facts)

Goal: Reduce average drafting time by 30-40% and reduce rework by 20-30% while maintaining or improving Narrative quality (rated by originators and underwriters).

Where AI Fits:

  1. Auto-generating data from known sources (internal systems and vetted external websites)

  2. Generating a first draft of: Property Overview, Market Overview, Borrower Sponsor, Maps & Aerials, Demographics, Crime Reports, Strengths & Weaknesses

  3. Improving grammar, UK/US English usage, and overall stylistic alignment

The analyst still confirms source credibility, checks factual accuracy, adds nuance specific to the deal, and approves or edits strengths and weaknesses.

Hypothesis Formation :

In this project, AI outputs are best treated as structured hypotheses, for example:

  1. The property's location near XYZ business park is a key strength due to strong employment growth

  2. Crime rates in the submarket have declined over the past five years, supporting stability of the asset

  3. Market rent growth has moderated but remains above long term averages.

From an MBB standpoint:

  1. These are not facts, they are claims to be tested

  2. AI's role in Define/Measure is to propose: here is what might be true, and here are the patterns implied by the data I see.

Guidance to analysts:

  1. Treat AI text as hypotheses rather than validated conclusions

  2. Use prompts that frame AI output as provisional. For example: "List 5 potential strengths of this property based on the data below, and mark each as High confidence or Needs verification with a short reason."

Hypothesis testing: AI can help structure tests, but LSS testing will be led by humans.

For Narrative quality, the hypotheses would be:

  1. H1: Using AI to draft the Market Overview reduces average drafting time by 30% without lowering quality scores

  2. H2: AI-generated strengths/weaknesses will be at least 80% aligned with the analyst's final conclusions

AI can help design experiments or sampling plans (e.g. how many deals do we need to test to detect a 20% reduction in cycle time at a 95% confidence level?) and suggest metrics to track (time saved, number of factual errors, number of grammar corrections, etc.). But the actual test (data collection, calculations, significance tests) should be performed in transparent tools like Excel or Minitab, where we can see the formulas and logic.
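The sampling-plan question above can be answered transparently with the standard normal-approximation formula for comparing two means. The sketch below assumes a two-group design (deals with and without AI), 80% power, and a standard deviation of 1.5 hours; the power and standard deviation are assumptions made for illustration and are not given in the post, so they should be replaced with values measured from baseline data.

```python
from math import ceil
from statistics import NormalDist

def deals_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% confidence -> about 1.96
    z_beta = NormalDist().inv_cdf(power)           # 80% power -> about 0.84
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

baseline_hours = 5.0            # average drafting time per deal, from the problem statement
delta = 0.20 * baseline_hours   # a 20% reduction corresponds to 1 hour
sigma = 1.5                     # ASSUMED standard deviation in hours (hypothetical)

n = deals_per_group(sigma, delta)
print(f"Deals needed per group: {n}")
```

Because the required sample size scales with the square of sigma/delta, a noisier drafting process or a smaller target effect increases the number of deals quickly, which is worth checking before committing to a pilot length.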

AI derived statistics ( eg: This is 90% likely) should not be treated as valid inferential statistics unless we have a clearly documented model and method behind it.

Hence, an MBB needs a rule of thumb:

  1. AI is acceptable in forming hypotheses and in helping us set up tests

  2. AI is not acceptable as the only source of evidence that a hypothesis is confirmed. Statistical validation must be done through transparent methods.

Statistical Confidence:

We are building a marketplace within Berkie (our in-house AI) for all the prompts to be used for Narrative drafting.

In the marketplace initiative, there are two layers of confidence:

  1. Confidence about workflow change (Process Improvement) : Does using AI actually improve the quality/productivity?

  2. Confidence about individual deals output ( deal level narrative accuracy) : Is the AI generated narrative for this deal accurate enough to use?

Confidence in workflow change: Here, as MBBs, we should use classic LSS experiments.

  1. Define metrics: drafting cycle time per deal, number of factual corrections per Narrative, number of grammar/style fixes, and originator and analyst satisfaction scores

  2. Design: a pre/post study (N deals before AI, N deals after AI), or a parallel study in which some analysts use AI and some do not over the same period

  3. Analyze: use standard tools (t-tests, control charts) to confirm (1) whether there is a statistically significant decrease in drafting time and (2) whether quality is maintained or improved

Document: effect sizes, confidence intervals, and any risks observed (e.g. common types of AI errors)

Confidence about individual deal output: Here, instead of 95% confidence in a strict statistical sense, we define operational acceptance criteria.

For example: for an AI-assisted Narrative to be acceptable as a draft, we require 1. zero critical factual errors (data that could mislead a credit decision) 2. two or fewer minor factual discrepancies (currently aiming for 97% accuracy; the acceptable error is three low-criticality errors) 3. grammar and UK English usage errors below X per 1000 words 4. the analyst can review and finalize within 30 minutes for 90% of the deals.
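Acceptance criteria like these are easy to encode as an explicit checklist so that every draft is judged the same way. The sketch below is a minimal illustration: the class and function names are hypothetical, and because the grammar threshold X is not specified in the post, it is left as a required parameter (the value 5 in the usage lines is only a placeholder).

```python
from dataclasses import dataclass

@dataclass
class DraftReview:
    critical_errors: int       # errors that could mislead a credit decision
    minor_discrepancies: int   # minor factual discrepancies
    grammar_errors: int        # grammar and English-usage errors found
    word_count: int
    review_minutes: float      # analyst time to review and finalize

def draft_acceptable(r, max_grammar_per_1000):
    """Per-draft criteria 1-3: no critical errors, <= 2 minor, grammar rate below X."""
    grammar_rate = 1000 * r.grammar_errors / r.word_count
    return (r.critical_errors == 0
            and r.minor_discrepancies <= 2
            and grammar_rate < max_grammar_per_1000)

def review_time_target_met(reviews, limit_minutes=30, share=0.90):
    """Criterion 4 is judged across deals: 90% finalized within 30 minutes."""
    within = sum(r.review_minutes <= limit_minutes for r in reviews)
    return within / len(reviews) >= share

# Hypothetical reviews; X = 5 is a placeholder, not a value from the post
good = DraftReview(0, 2, 3, 1200, 25)
bad = DraftReview(1, 0, 0, 1200, 20)
print(draft_acceptable(good, max_grammar_per_1000=5))
print(draft_acceptable(bad, max_grammar_per_1000=5))
```

Keeping the per-draft check separate from the across-deals time target mirrors the two kinds of criteria in the list: three apply to a single Narrative, while the fourth only makes sense over a sample of deals.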

From an MBB perspective, we can:

Use sampling and inspection: 1. randomly sample AI drafts 2. score them against a checklist 3. track defects per unit and apply control charts

Over time, we can characterize AI as a process with a certain defect rate, just like any human process.
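Tracking defects per sampled draft on a control chart can be sketched with the standard c-chart limits (center line at the mean count, limits at plus or minus three times its square root). The defect counts below are hypothetical, and a c-chart assumes the sampled drafts are of comparable size; if Narrative length varies a lot, a u-chart on defects per 1000 words would be the more appropriate choice.

```python
from math import sqrt

def c_chart_limits(defect_counts):
    """Return (LCL, center line, UCL) for a c-chart of defects per unit."""
    c_bar = sum(defect_counts) / len(defect_counts)
    spread = 3 * sqrt(c_bar)
    return max(0.0, c_bar - spread), c_bar, c_bar + spread

# Hypothetical checklist defects found in 10 randomly sampled AI drafts
defects = [2, 1, 3, 0, 2, 4, 1, 2, 3, 2]

lcl, cl, ucl = c_chart_limits(defects)
out_of_control = [i for i, c in enumerate(defects, start=1) if c > ucl or c < lcl]
print(f"LCL={lcl:.2f}, CL={cl:.2f}, UCL={ucl:.2f}, signals at drafts: {out_of_control}")
```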

Assessing data quality and credibility when AI is in the loop

This is critical because the data for the Narrative is pulled from multiple sources (internal systems, public websites, vetted websites, etc.).

Separate source credibility from AI credibility:

A) As MBBs, we should enforce a clear hierarchy

  1. Source credibility rules (upstream of AI): Internal sources (Salesforce/Omniview, Berkadia internal property/loan systems, internal market research) have the highest trust. Trusted external sources (official census data, reputable third-party market reports, public crime data portals) also have the highest trust. Open web or generic searches (Google search) have low trust unless specifically validated.

  2. AI usage rules: AI is allowed to summarize and rephrase known, structured inputs from trusted sources, and to highlight patterns or inconsistencies across those sources. However, AI is not allowed to freely invent data not present in the supplied sources, or to act as the primary source of record for quantitative facts.

B) Data lineage and traceability: To keep Lean Six Sigma-like rigor, we need traceability

  1. For each factual statement in the AI narrative, analysts should be able to trace the source and the transformation

  2. We should ask AI to annotate paragraphs with references, e.g. (Source: Crime Report 2024 Q2) or (Source: Internal rent roll 2025-01), or to generate a separate evidence log for the narrative

From an MBB angle, this is like keeping data collection forms and measurement system documentation for our Y's and X's.

C) Measurement system analysis for AI and analysts: We will treat AI and analysts as measurement devices for Narrative content.

We will run a Gage R&R-style exercise:

  1. We will select a set of deals

  2. We will have a) an AI-only draft (first pass) b) an analyst-only draft (without AI) c) an AI + analyst draft (AI draft, then analyst edit)

  3. We will have independent reviewers (seasoned analysts and quality specialists) rate accuracy, clarity, grammar/UK-US English usage, and alignment between sections

  4. We will assess the variation attributable to AI vs. human vs. the combination: where AI improves repeatability/consistency (e.g. style/alignment) vs. where it adds risk (e.g. subtle factual errors)

This quantifies data quality and helps establish where AI is safe vs. risky.

Where AI should accelerate decisions, and where traditional validation is non - negotiable

A) Areas where AI can safely accelerate and automate:

In the Narrative AI marketplace, AI is well suited for:

  1. Drafting and rephrasing: a) converting bullet research into fluent US English paragraphs b) enforcing consistent style and structure

  2. Summarization: summarizing long third-party reports like Crime Reports, Market Reports, Sponsor histories, etc.

  3. Formatting and aligning: a) aligning terminology across sections (e.g. consistently referring to submarket names and asset class labels) b) ensuring internal consistency between sections (e.g. strengths and weaknesses must refer to facts already stated in the narrative)

  4. Highlighting potential strengths and weaknesses: a) suggesting deal strengths/risks based on the data provided b) tagging statements as "requires analyst verification" where sources are weaker or ambiguous

  5. Quality checks: a) flagging likely grammar issues b) enforcing US spelling conventions c) checking for obvious contradictions (e.g. saying crime is low in one section and high in another section of the Narrative)

Here an MBB can reasonably reduce manual efforts while keeping risk low, especially with defined guardrails and human review.

B) Areas where traditional validation is non-negotiable

From a Lean Six Sigma and risk perspective, the following should not be delegated solely to AI:

  1. Quantitative facts that influence credit/investment decisions, e.g. occupancy rates, rent levels, and reports pulled from trusted sources

  2. Causal statements and forward-looking claims, e.g. "Because of X, we expect Y" or "Crime reductions mean lower risk for the asset". AI may propose such statements; however, the analyst must evaluate causality using domain knowledge and data. Use traditional reasoning and, where relevant, simple statistical checks (e.g. trend analysis) rather than trusting AI's intuition

  3. Compliance, regulatory, and reputation risk: Any statement that touches on fair housing, sensitive demographic descriptions, or legal and regulatory interpretations must be reviewed and, where necessary, crafted by analysts. AI may help draft neutral language, but it should not be the final authority.

  4. Final sign-off and accountability: The accountable owner of the narrative is the analyst (and ultimately the deal team), not the AI.

MBB guidance should codify that a) AI is a tool in the Measure/Analyze steps b) Approve/Reject decisions remain with analysts.

In conclusion, AI should be treated as a powerful assistant, not a replacement for Lean Six Sigma rigor. It can speed up research and drafting of the Narrative, but the standards for data validity, statistical confidence, and final judgement must remain with the analysts. Where the impact is high, traditional verification is still essential; where risk is low, AI can safely help us work faster and more consistently.

Domain & Project Context

This case comes from a Lean Six Sigma project I led at one of my previous companies (medical equipment manufacturing), focused on a critical sub-assembly line for the Mobile Viewing Station (MVS), which is a critical part of all the C-arm medical machines we produced. The line supported yearly revenue of 85 million euros. Monthly output/demand of the MVS was 150 units, and each MVS required one complete set of LCD carriers. For context, the MVS displays live images of the patient from the C-arm during surgery, and doctors operate the C-arm through the MVS controls.

Since the MVS was a mandatory part of the final machine, if this line stopped, the C-arm assembly lines stopped as well, impacting revenue.

 

What Actually Happened

We started seeing severe rejections of LCD carriers at the MVS line. We observed minor shape distortion: visually the parts looked fine, but during assembly the LCD monitors would not seat properly.

At our end, the rejection rate climbed to 85%.

We looked at a few process details: injection molding time was 7.5 minutes, followed by 10 minutes in the cooling fixture. On paper everything looked stable, but in reality something in that window was introducing deformation.

We attempted rework to keep production alive, with a success rate of 35%. Rework was clearly a containment action, not a solution.

 

If I had AI Available then & How I Would Use It Today

If I had to deal with this problem today, I would bring AI early.

AI could help me quickly correlate:

Rejection rates against injection molding cycle times

Cooling fixture duration vs shape deviations

Differences across mold cavities & production shifts

Batchwise behaviour at the supplier

Measurement variation between supplier & our plant

Raw material behaviour (33% Glass-Fibre Nylon From BASF)

Assembly fixture tolerance stack-up at our line

It would have helped me form hypotheses faster and narrow the investigation space.

However, in my view that is where AI's role ends. These correlations are signals, not evidence. AI could not tell us which one was the true root cause. As an LSS expert, I treat AI output as a hypothesis accelerator, not a validator.

 

Statistical Confidence (What was good enough & What wasn’t)

This project reinforced something I strongly believe: the level of statistical confidence required depends on how irreversible the decision is.

We needed speed, and for short-term actions like rework to protect output, directional confidence was acceptable.

But for irreversible decisions, statistical rigor was non-negotiable for us:

Redesigning the molds

Changing the cooling fixtures geometry

Locking supplier process parameters

Declaring the permanent fix of the issue

 

With 85% rejection, limited rework success, and 85 million euros of revenue at risk, we relied on the following:

MSA at supplier

Independent MSA at our plant

Controlled trials on molding & cooling parameters

Physical verification of carrier geometry at assembly

No AI could replace that level of proof.

 

 

Data Quality & Credibility (Still LSS Responsibility)

One of our early discoveries was that measurement variation itself was part of the confusion: the same carrier set measured differently at the supplier and at our plant.

If I had trained AI on that data without first fixing the measurement system, it would have produced confident but misleading insights.

This reinforced a core LSS principle for me: before trusting any analysis, by human or AI, validate the measurement system.

In my view, AI processes bad data faster; it does not make it more credible.

 

Correlation, Prediction & Causation

This project clearly forced us to respect the difference between all three:

Correlation: Certain batches and cooling windows showed higher rejection

Prediction: AI could likely flag high-risk batches

Causation: Only controlled testing proved that cooling fixture design and post-molding deformation were the real drivers

Causation was confirmed only after we:

Stabilized the measurement systems, validated the material behaviour, redesigned the cooling fixtures, and updated the assembly fixture geometry.

That sequence mattered for us.

 

Bottom line: Where AI should accelerate and where it must not

In my view, AI should accelerate early pattern detection, hypothesis prioritization, and faster narrowing of the investigation scope.

However, traditional statistical validation must remain non-negotiable for us when:

Root causes are declared, tooling/mold changes are approved, supplier processes are modified, or production release decisions affect the final product.

In conclusion, AI would help me move much faster at the front end. However, when decisions affect 85 million euros of revenue, medical equipment (patient safety), and irreversible tooling changes, classical Lean Six Sigma discipline still defines what counts as proof.

Domain:

Plant-based ingredients manufacturing, high-volume production

 

Context:

We manufacture plant-based ingredients, with key head products and many by-products, in high volume, processing large volumes of agricultural crops as raw material. The process is complex, and products are produced in both dry and liquid form.

Products in dry form go through a complex drying process, with concentrated liquid streams added so that different streams are recovered as saleable product within specified moisture limits.

 

Condition of the Process:

While recovering one of the internally generated concentrated liquid streams, we observed that product moisture increased as the addition of the 50% concentrated soluble liquid increased, with other parameters held constant. It was also evident that any variation or excess addition would lead to high variation in product moisture, from 4.5% to as high as 15%. High and highly variable moisture was leading to jamming of silos, long downtimes, and reliability issues, and was impacting OEE.

 

Why and how this adds value to the process, with other benefits to the organisation:

  • Stable increase in moisture of 2.5%, from the earlier average of 5.5% to 8.0%, which means an increase in quantity of 4.5 T/day

  • Improved yield of 1.8% (2.5 T/day) through stable addition of the concentrated liquid stream

  • Increased workshop performance: OEE up by 4%+

  • With constant settings on other parameters, a few more tangible benefits, such as savings on steam consumption and power savings from constant air flow

  • Dryer downtime reduced from 38 hours/month to 8 hours/month

  • Improved safety across shifts, and reduced stress and workload for the operators and production team

 

This opportunity was observed in May 2024. As a reactive approach, we planned to roll out a DMAIC project and used hypothesis testing in the Analyse phase to identify a suitable operating point or correlation for establishing constant settings for the 50% concentrated soluble addition.

 

Before discussing AI involvement, let me give a real example from our manufacturing process where we performed hypothesis testing.

 

Hypothesis testing was used to find the statistical significance and then arrive at the practical significance, in order to generate the solution, which was validated in the Improve phase.

 

STEP 1:

With a sample size greater than 30, we did a first check using descriptive analysis with a box plot. It did not show a difference and was not conclusive, so we decided to move on to inferential statistics through hypothesis testing.

 

STEP 2:

Choose the appropriate hypothesis test and formulate H0 and Ha.

 

The hypotheses for the test were constructed as follows.

 

H0: Addition of the 50% concentrated soluble liquid stream at ≥ 380 ltr/hr has a positive impact on product moisture variation.

 

Ha: Addition of the 50% concentrated soluble liquid stream at < 380 ltr/hr has a positive impact on product moisture variation.

 

How the MBB should rethink data credibility at this step when AI is involved: the MBB should take the opinion of the process expert and team, and validate against shop-floor reality rather than AI inferences.

 

STEP 3:

Define alpha for the above hypotheses: α = 0.05

If p is less than α, reject H0 in favour of Ha

If p is greater than α, fail to reject H0

Here the MBB should consider process difficulties, customer needs, and criticality before finalising the alpha value for the hypothesis when reviewing AI results.

 

STEP 4:

As the output Y is continuous and we are comparing one sample against a target,

to compare the sample mean with the target value, we checked that the data was normal and that the standard deviation was known.

On that basis, a 1-sample Z test was conducted in Minitab.
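For readers who want to see the calculation behind a 1-sample Z test outside Minitab, the sketch below computes the Z statistic and a two-sided p-value and applies the alpha rule from Step 3. The moisture readings, the target of 5.5, and the known standard deviation of 0.4 are all hypothetical values chosen for illustration; they are not the project's data.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z(sample, target, sigma):
    """Z statistic and two-sided p-value for a sample mean vs. a target (sigma known)."""
    n = len(sample)
    z = (mean(sample) - target) / (sigma / sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical product moisture readings in %, n = 32 (more than 30 samples)
readings = [5.4] * 16 + [5.8] * 16
target = 5.5    # hypothetical target value
sigma = 0.4     # ASSUMED known standard deviation
alpha = 0.05

z, p = one_sample_z(readings, target, sigma)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p = {p:.4f}: {decision}")
```

Writing the test out this way keeps the formula visible, which is the same transparency argument made for Excel or Minitab: anyone reviewing the result can see exactly how the p-value was obtained.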

 

Normality test:

We performed a normality test and found the data to be normal, with a p-value of 0.531, which is greater than alpha.

STEP 5: Statistical conclusion

As the p-value of 0.531 is greater than the alpha of 0.05, we fail to reject H0, which states that addition of the 50% concentrated soluble liquid stream at ≥ 380 ltr/hr has a positive impact on product moisture variation.

 

STEP 6: Practical Conclusion and Business Decision

Based on the above inference, we concluded that any 50% concentrated soluble addition above a rate of 380 ltr/hr would increase the moisture, with the rest of the dryer process control parameters held constant.

As a next step, we assessed practical significance with linear regression analysis. The fitted equation of the form Y = M + (b*X) for one of the data sets was: Moisture % = 1.6 + (0.0068 * Soluble addition). From the relationship between soluble addition and moisture, we found that moisture could be increased stably by 2.33% with an additional 108 ltr/hr fed to the dryer, keeping the rest of the parameters constant.
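A fitted line like this can be sanity-checked by evaluating it at the operating points of interest. The sketch below uses only the two coefficients quoted above; the 380 ltr/hr baseline is taken from the hypothesis statement, and the function name is hypothetical. Since the quoted equation was fitted on only one of the data sets, the change it predicts for an extra 108 ltr/hr will not necessarily match the 2.33% figure, which may come from a different fit or operating range.

```python
INTERCEPT = 1.6    # from the quoted equation
SLOPE = 0.0068     # % moisture per ltr/hr, from the quoted equation

def predicted_moisture(soluble_ltr_per_hr):
    """Evaluate Moisture % = 1.6 + 0.0068 * soluble addition."""
    return INTERCEPT + SLOPE * soluble_ltr_per_hr

base_rate = 380.0   # ltr/hr, threshold used in the hypothesis statement
extra = 108.0       # additional ltr/hr fed to the dryer

change = predicted_moisture(base_rate + extra) - predicted_moisture(base_rate)
print(f"Predicted moisture at {base_rate:.0f} ltr/hr: {predicted_moisture(base_rate):.2f}%")
print(f"Predicted change for +{extra:.0f} ltr/hr: {change:.2f} percentage points")
```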

MBB rethink and data credibility with AI involvement:

 

AI proposes hypotheses based on data patterns, correlations, and past examples, often without reference to practical or shop-floor reality. Hypothesis testing is performed manually by the team and must be validated by strong expert and team review.

 

AI provides an inside view of the data, but the MBB and team should arrive at the true hypothesis conclusions, cross-verify the results provided by AI, and confirm the data patterns. In other words, validation should be decision-specific, cross-verifying the AI inferences.

 

MBBs succeed by cross-verifying the data, decisions, patterns, and correlations presented by AI, not by blindly accepting what AI has produced.

Any advantage claimed by AI should also be cross-verified by the team under the MBB's guidance, even when the inferences are based on analysis of multiple parameters, for example dryer inlet and outlet temperatures, air flow, steam inlet and outlet temperatures, or feed and product outlet parameters. This avoids over-trusting the data and ties the analysis to physical process constraints and physical performance.

Company: Amazon

Domain: E-Commerce

Business Context: Amazon maximizes its sales by offering Same-Day Delivery (SDD), which increases Average Order Value (AOV) without significantly raising delivery costs, while ensuring high customer satisfaction.

During the Amazon Great Indian Festival (AGIF) period, at a high level, Amazon Same-Day Delivery works as follows: the customer places an order for items marked “Same-Day Delivery eligible”, followed by order processing, picking and packing, delivery assignment, last-mile assessment, and delivery to the customer’s doorstep.

Case Study: Amazon Great Indian Festival Metrics (Diwali festival season in October) 2023 vs 2025

Metric                      | 2023  | 2025  | Trends
Sales in USD (billion)      | 11.68 | 14.26 | 22%
Average Order Value (AOV)   | 43    | 52    | 21%
Metro Cities Covered        | 3     | 5     |
Pin Codes Delivered         | 18463 | 21743 | 18%
Fulfillment Centers         | 118   | 269   | 128%
Sortation Centers           | 43    | 117   | 172%
Delivery Stations           | 102   | 497   | 387%
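The Trends column can be reproduced as a simple percentage change from 2023 to 2025. The short sketch below recomputes it from the 2023 and 2025 figures in the case study; Metro Cities Covered is left out because no trend value is given for that row.

```python
# (2023 value, 2025 value) taken from the case study metrics
metrics = {
    "Sales in USD (billion)": (11.68, 14.26),
    "Average Order Value (AOV)": (43, 52),
    "Pin Codes Delivered": (18463, 21743),
    "Fulfillment Centers": (118, 269),
    "Sortation Centers": (43, 117),
    "Delivery Stations": (102, 497),
}

def pct_change(old, new):
    """Percentage change from old to new, rounded to the nearest whole percent."""
    return round(100 * (new - old) / old)

trends = {name: pct_change(old, new) for name, (old, new) in metrics.items()}
for name, trend in trends.items():
    print(f"{name}: {trend}%")
```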

 

Sales and Average Order Values (AOV’s) deal with large volumes of data based on continuous financial metrics and were measured based on counterfactual baseline over years and analyzed the correlation patterns between SDD uptake and AOV.  Sales (2023 – $11.68 billion vs 2025 - $14.26 billion) were compared using Paired T- Test and results, there is no statistically significant difference between Amazon Great Indian Festival sales performance in 2023 and 2025 at the 95% confidence level (p = 0.390), even though the trends showcase 22% increase from 2023 to 2025.

Average Order Value is calculated as Total Revenue / Total Number of Orders. A paired t-test was conducted to compare AOV between 2023 and 2025. The mean AOV increased from $43 to $52, and the test showed a statistically significant difference (t = -11.3, p = 0.001, α = 0.05). The increase in AOV is therefore both statistically and practically significant. The paired analysis was built on a data design of pin codes, metro cities, and product categories, across which we observed the AOV improvements.
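For readers who want to see the mechanics of a paired t-test, here is a minimal sketch in plain Python. The eight paired AOV values are hypothetical (chosen only so the yearly means come out at 43 and 52, as in the post); they are not the actual data, so the t value will not match the -11.3 reported above. The critical value 2.365 is the standard two-sided t-table entry for 7 degrees of freedom at α = 0.05.

```python
import math
import statistics

# Hypothetical paired AOV values (USD) for the same eight segments in each year.
aov_2023 = [41, 44, 42, 45, 43, 40, 46, 43]
aov_2025 = [50, 53, 49, 55, 52, 50, 54, 53]


def paired_t(before, after):
    """Return the paired t statistic (before minus after) and degrees of freedom."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


t_stat, df = paired_t(aov_2023, aov_2025)

# Two-sided critical value for df = 7 at alpha = 0.05, taken from a t-table.
T_CRIT_DF7 = 2.365
significant = abs(t_stat) > T_CRIT_DF7

print(f"t = {t_stat:.2f}, df = {df}, significant at 0.05: {significant}")
```

The point of pairing is that each segment is compared with itself, so segment-to-segment variation drops out and only the year-over-year differences drive the test.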

For the expansion, i.e. metro cities from 3 to 5 and pin codes from 18463 to 21743, we applied simple and multiple regression models. We replaced hypothesis testing with our internal system modeling. Using capability analysis, we measured the cost per order, which suggested process capability improvements through routing, batching, and delivery slot optimization.

Based on the above data, MBB methods and AI tools helped identify hidden shopping patterns in customer behavior, i.e. it became easy to predict who buys and opts for Same-Day Delivery for products such as Amazon devices (Echo, Fire, Kindle), furniture, fashion (clothing and shoes), beauty (skincare), and toys and games (kids’ products, mainly for ages 7-12). This helped us design top deals across various products with affordable finance options. The analysis also revealed that customers who opted for Same-Day Delivery tend to buy higher-margin electronics and home appliances. However, Same-Day Delivery costs spiked disproportionately due to late-night deliveries, which had not been identified earlier because of the limited options available in manual testing.

Apart from the segment-level analysis of pin codes and delivery slots, an additional correlation analysis was performed between logistics costs, delivery pin codes, and peak delivery hours. To speed up Same-Day Delivery, we investigated the delivery timing patterns with the logistics team and, using time-series analysis, redesigned the delivery timings from late night to morning (8 AM to 11 AM), midday (12 PM to 4 PM), and evening (5 PM to 9 PM) slots.

This helped detect the cost hotspots. To reduce costs and meet the Same-Day Delivery promise during the event, we established and expanded the fulfillment network: Fulfillment Centers (FCs) increased to 269 from 118, Sortation Centers (SCs) to 117 from 43, and Delivery Stations (DSs) to 497 from 102. Together with advanced inventory placement and engagement with local last-mile transportation partners, this led to improved customer satisfaction metrics on meeting the Same-Day Delivery parameter.

Between FY 2023 and FY 2025, the Amazon Great Indian Festival showed that data-driven, AI-enabled decision-making, when administered with robust data-credibility checks, drove non-linear growth in revenue and customer reach and improved cost efficiency. This case shows that hypothesis testing alone is insufficient: Master Black Belts must redesign experiments, prefer paired and segmented analysis, and apply causal modelling methods rather than relying on mean comparisons. Statistics provide variances and trends, whereas an MBB must provide reliability.

 

A project was taken up to improve sales conversion by 10%.

 

Problem statement:

The project aims to improve sales conversion by 10%, which will directly improve the incentive level.

The approach adopted was a process-based study.

Brainstorming was conducted for the process-based study, and data was collated on the causes identified.

 

After this, hypothesis tests were done to validate the impact of the causes on conversion.

Test 1

H0: Conversion rate is the same across all agents

H1: Conversion rate differs across agents

Test used: one-way ANOVA

p > 0.05, so we fail to reject the null hypothesis

 

Test 2

H0: AHT has no impact on conversion

H1: AHT has an impact on conversion

Logistic regression was used to check the impact.

p > 0.05, so we fail to reject the null hypothesis

 

Test 3

Does any specific product have an impact on sales conversion?

H0: No specific product has an impact on sales conversion

H1: A specific product has an impact on sales conversion

p < 0.05, so product has an impact on conversion

 

The key finding is that product has a significant impact on sales conversion.

On this basis, a recommender, i.e. a prediction model, was suggested: an AI-based model for the specific products that lead to more conversion. Product combinations that can be sold together were also analyzed using the AI model, so that the focus stays mainly on the key product combinations.

 

In the Improve phase, 2 batches were created.

There is no difference in sales conversion rate between a

  • One batch of 5 selling any product combination to improve sales

  • Another batch of 5 selling only the AI-recommended product combinations to improve sales

Based on a 2-proportion test, the AI-recommended product combinations gave a better result than all products.
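The 2-proportion test mentioned above can be sketched as a pooled two-proportion z-test. The conversion counts below are hypothetical, since the post does not give the actual numbers, and the function name `two_proportion_z` is illustrative. The normal approximation assumes reasonably large samples in both batches.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns z and the two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: conversions out of calls handled by each batch.
all_products = (120, 1000)      # batch selling any product combination
ai_recommended = (165, 1000)    # batch selling only AI-recommended combinations

z, p_value = two_proportion_z(*all_products, *ai_recommended)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up counts the p-value falls below 0.05, which is the kind of result that would support the AI-recommended combinations; with the real data the conclusion depends on the actual conversion counts.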

 

So, we noticed that hypothesis testing can be used to validate the AI outcome.

 

Summary of the above exercise:

There is a clear distinction between a hypothesis test and an AI prediction.

AI suggested the product combinations, while the hypothesis test was used to statistically validate the conversion uplift.

  • Solution

Domain: Aerospace MRO - Engine shop for CFM56/LEAP turbofan performance restoration visits

(€220M turnover facility, ~1,800 shop visits / year, focus on reducing module TAT and cost while maintaining zero escapes on critical parts)

Specific Lean Six Sigma Project: Reducing the High Pressure Turbine (HPT) blade rework rate from 42% to less than 25% on performance restorations

(This is a major cost driver worth nearly €4-6M annually: unexpected coating wear, cooling hole blockage, or tip rub forces rework and delays TAT by 8-12 days per engine. The project began in Q1 2025, using AI-based predictive analysis of borescope images and oil debris pattern recognition.)

How the MBB treats AI-generated insights in this project

1. Forming or testing hypotheses

Traditional LSS: The MBB first conducts brainstorming, resulting in hypotheses like “coating peel-off is caused by EGT exceedances above 50°C cumulative.” These are verified using designed controlled experiments and/or regression analysis.

With AI: The model shows correlations instantly (for example: “HPT rework 68% correlated with borescope images of micro-cracks near cooling holes + oil debris iron particles > 15ppm last 500 hours”).

MBB Role

·       Consider these AI results as hypothesis generators rather than conclusions.

·       Turn the AI correlation into a testable null / alternative hypothesis, e.g., H₀: No difference in rework rate between high iron oil lots and low iron oil lots.

·       Run confirmatory DOE or stratified sampling – no AI pattern should ever be taken as ‘causation’ without this.

·       Document the trail: “AI suggested X → we formed hypothesis Y → tested via Z”.

2. Establishing statistical confidence

AI provides probability scores, such as “92% confidence this lot will need HPT rework”, but rarely includes p-values, degrees of freedom, or power.

MBB must:

·       Demand transparency in the data flow: make the AI team show their statistical method, such as random forests, SHAP values, or Bayesian posterior probabilities.

·       Re-run important patterns using classic statistics: t-test, chi-square, logistic regression, etc. on hold-out data.

·       Set hard thresholds: AI insights only actionable if classical p-value < 0.05 and effect size > medium (Cohen's d > 0.5 or OR > 2).
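The hard-threshold rule above can be written down as a small gate. This is only a sketch: the function names, the lot counts, and the p-values passed in are hypothetical, and the p-value itself is assumed to come from a separate classical test on hold-out data. Cohen's d uses the pooled standard deviation; the odds ratio is computed from a 2x2 table.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def odds_ratio(exposed_events, exposed_total, control_events, control_total):
    """Odds ratio of an event in the exposed group versus the control group."""
    exposed_odds = exposed_events / (exposed_total - exposed_events)
    control_odds = control_events / (control_total - control_events)
    return exposed_odds / control_odds


def actionable(p_value, d=None, oratio=None, alpha=0.05):
    """Gate from the post: p < alpha AND (Cohen's d > 0.5 OR odds ratio > 2)."""
    effect_ok = (d is not None and abs(d) > 0.5) or (oratio is not None and oratio > 2)
    return p_value < alpha and effect_ok


# Hypothetical lots: 30 of 50 high-iron lots reworked vs 20 of 80 low-iron lots.
oratio = odds_ratio(30, 50, 20, 80)            # 4.5
print(actionable(p_value=0.01, oratio=oratio))  # True: significant and large effect
print(actionable(p_value=0.20, oratio=oratio))  # False: large effect but not significant
```

Requiring both conditions keeps a statistically significant but trivially small effect, or a large but noisy one, from triggering action.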

3. Assessing data quality and credibility

An AI agent is only as good as its training data set, so good and bad examples covering the full tolerance band must be used for training. In MRO, historical borescope images and oil reports are noisy, with several sources of variation (different inspectors, varying illumination, and inconsistent sampling).

MBB safeguards:

·       Audit data lineage: who collected the data, when, and under what conditions? Reject datasets if more than 15% is missing or mislabeled.

·       Use stratified sampling to check for bias: for example, do high-hour engines dominate the sample?

·       Run inter-rater reliability on borescope annotations and require kappa > 0.7.

·       Never blindly trust AI predictions on ‘Black Swan’ events: if the model is trained on <10 similar cases, treat its predictions as low credibility.
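Two of the safeguards above are easy to compute directly: Cohen's kappa for two inspectors annotating the same borescope images, and the fraction of missing labels. The sketch below uses hypothetical annotations and illustrative function names; the 0.7 and 15% cut-offs are the ones stated in the list.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's label frequencies, summed over labels.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


def missing_fraction(labels):
    """Share of records with no label (None)."""
    return sum(label is None for label in labels) / len(labels)


# Hypothetical annotations of ten borescope images by two inspectors.
inspector_1 = ["crack"] * 4 + ["ok"] * 6
inspector_2 = ["crack"] * 3 + ["ok"] * 6 + ["crack"]

kappa = cohens_kappa(inspector_1, inspector_2)
print(f"kappa = {kappa:.3f}, passes 0.7 threshold: {kappa > 0.7}")

# Hypothetical label column with two missing entries out of ten.
labels = ["crack", None, "ok", "ok", None, "ok", "crack", "ok", "ok", "ok"]
print(f"missing = {missing_fraction(labels):.0%}, reject dataset: {missing_fraction(labels) > 0.15}")
```

In this made-up example the inspectors agree on 8 of 10 images, yet kappa is only about 0.58 once chance agreement is removed, so the annotations would not pass the 0.7 threshold.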

AI accelerates decisions in

a)      Early Analyze phase: pattern discovery, with 5-10 potential X’s found in hours versus weeks of manual Pareto / fishbone analysis.

b)      Screening: quickly eliminate weak signals (AI correlation <0.3: drop the hypothesis).

c)      Simulation: Test “What-If” Situations (e.g., additional inspection predicted to lead to 18% reduction in re

Where traditional statistical validation is non-negotiable

·       Causation Claims: AI provides correlation (iron particles + rework) – MBB requires DOE or natural experiment to establish cause.

·       Critical to Safety Features: Changes to the HPT system that affect HPT integrity require p < 0.01 + power > 0.9.

·       Control phase: sustainment metrics, including the rework rate after the change, are tracked using control charts. Monitoring is done by AI, but the control limits that define what counts as out of control remain set by SPC.
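For the Control phase point, a p-chart is one common SPC choice for a rework rate. The sketch below shows how 3-sigma limits would be set; the baseline rate, subgroup size, and monthly values are hypothetical, and the post does not say which chart type the team actually used.

```python
import math


def p_chart_limits(p_bar, n):
    """3-sigma control limits for a proportion with subgroup size n."""
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    lcl = max(0.0, p_bar - 3 * sigma)
    ucl = min(1.0, p_bar + 3 * sigma)
    return lcl, ucl


def out_of_control(rates, lcl, ucl):
    """Indices of subgroups whose rate falls outside the control limits."""
    return [i for i, r in enumerate(rates) if r < lcl or r > ucl]


P_BAR = 0.24   # hypothetical baseline rework rate after the change
N = 50         # hypothetical number of engines per monthly subgroup

lcl, ucl = p_chart_limits(P_BAR, N)
monthly_rates = [0.22, 0.26, 0.44, 0.20]   # hypothetical monthly rework rates
flagged = out_of_control(monthly_rates, lcl, ucl)

print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}, out-of-control months: {flagged}")
```

An AI monitor can watch the incoming rates continuously, but the decision of what counts as a signal still comes from limits like these.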

Practical outcome after 7 months

·       Rework rate is now 23.8%

·       TAT savings approx 9.2 days / engine on average

·       No escapes - since MBB applied classical validation on each major insight

·       Team trusts AI because it is “AI suggests → We validate → We prove it”.

Bottom line from the engine teardown bay

AI is an amazing needle finder, and it screens hypotheses better than we do. But in aerospace MRO, causation is king, safety is non-negotiable, and regulators don’t accept "the neural net said so".

The MBB’s task is to ensure that the DMAIC discipline holds: leveraging AI to make discovery happen faster, relying on classical stats to validate everything, and never confusing correlation with proof.

That balance makes AI a serious accelerator of real improvement rather than a shiny toy.

Domain: Polymer Chemical Industry

Project: To increase the overall plant utilisation from 80% to 90%

Process Explanation: The plant runs two kinds of polymer reactions: SBR (Styrene-Butadiene Reactions) and ABR (Acrylic Based Reactions). The same reactor is used for both, and both are batch processes. The reactor size is 40 KL and the average batch size is 33 KL, which leaves room for the pressure build-up in the reactor. For ABR, the pressure build-up goes up to 2 bar in a stable reaction. For SBR, it goes up to 6 bar under stable conditions. Batches can be run back to back with a mild steam flush at 3 bar and 120 °C. There are 3 ways to increase the output.

  1. Increase the batch size, keeping the cycle time constant: To keep the cycle time constant, we need to increase the flow rate of chemicals into the reactor. Increasing the flow gives the chemicals less reaction time. This might lead to unreacted chemicals, lower heat dissipation, and vapour formation, which leads to pressure build-up.

  2. Reduce the cycle time by increasing the flow, keeping the batch size constant: This results in the same problems as above, but the pressure build-up is lower.

  3. Reduce the cycle time by increasing the flow and also increase the batch size: This results in all of the above issues and is the riskiest option.

We produced around 27 batches, which involved a total of 350 raw materials. The raw material ratio also varies across batches. Varying the ratio impacts the quality and the application, and it also affects downstream processes. For SBR the minimum time is 4.5 hrs and for ABR the minimum time is 3.5 hrs. Time cannot be reduced below this.
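To compare the three options on output alone, a rough throughput calculation (batch volume divided by cycle time) is enough. The sketch below takes the 40 KL reactor size, the 33 KL batch size, and the minimum times from the post; the 5.0 h baseline cycle time and the 36 KL enlarged batch are hypothetical, and the calculation ignores the pressure, heat dissipation, and quality risks described above.

```python
REACTOR_VOLUME_KL = 40.0                 # reactor size from the post
MIN_CYCLE_H = {"SBR": 4.5, "ABR": 3.5}   # minimum times from the post


def is_feasible(product, batch_kl, cycle_h):
    """Check the hard limits only: reactor volume and minimum cycle time."""
    return batch_kl <= REACTOR_VOLUME_KL and cycle_h >= MIN_CYCLE_H[product]


def output_rate(batch_kl, cycle_h):
    """Output in KL per hour for back-to-back batches."""
    return batch_kl / cycle_h


# SBR example; the 5.0 h cycle and the 36 KL batch are hypothetical values.
baseline = output_rate(33.0, 5.0)
option_1 = output_rate(36.0, 5.0)   # bigger batch, same cycle time
option_2 = output_rate(33.0, 4.5)   # same batch, shorter cycle time
option_3 = output_rate(36.0, 4.5)   # bigger batch and shorter cycle time

for name, rate in [("baseline", baseline), ("option 1", option_1),
                   ("option 2", option_2), ("option 3", option_3)]:
    print(f"{name}: {rate:.2f} KL/h ({rate / baseline - 1:+.0%} vs baseline)")

print(is_feasible("SBR", 33.0, 4.0))   # False: below the 4.5 h minimum for SBR
```

A calculation like this only ranks the options by output; whether an option is safe to run is exactly what the MBB still has to verify against the process limits.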

MBB Vs AI

AI: AI is able to analyse the data much faster than the MBB. All parameters can be studied in one go. Simulations can be run. Hypotheses can be designed. It can even design a DOE.

MBB: The role of the MBB is to recheck the hypotheses, ensure that process limits are taken into account, verify the designed DOE, and bring practical experience into the system.

In simple words, AI speeds up the work which we as MBBs would otherwise do manually, and it increases the number of data points used, which gives more accurate results. In a manual process we may miss some of the data, whereas AI can analyse all the data at once. But causation has to be rechecked.

  • Author

🏆 Best Answer: Adil Khan
Exceptional depth and rigor. Clear project grounding, disciplined separation of AI insight vs proof, and strong ownership of evidence standards. Exemplary MBB judgment.

 Approved: Smitha Muralidharan
Very strong, well-structured response anchored in a real initiative. Clear thinking on evidence, risk, and ownership, with practical decision boundaries.

 Approved: Ankit Kulkarni
Solid manufacturing case with high business and safety stakes. Strong clarity on where AI accelerates and where human-led validation remains essential.

 Approved: Rabiya Bronekar
Relevant initiative with clear use of data and validation logic. Demonstrates good understanding of how AI insights should be tested before action.

🟡 Approved (Conditional): Taby Sheikh
Thoughtful and insightful, but at times too broad. For future questions, anchor your response more tightly to a single concrete initiative and keep the explanation sharply focused on decision-making in that context.

🟡 Approved (Conditional): Bharath CN
Good domain knowledge and intent. For future questions, present your ideas through one clearly defined example and ensure conclusions are communicated with crisp structure and clarity.

🟡 Approved (Conditional): Vijay Yivaturi
Strong business exposure and scale. For future questions, narrow the narrative to one specific initiative and clearly articulate how judgments are made in practice.

🟡 Approved (Conditional): Abhinandan Kunder
Relevant context and experience. For future questions, strengthen responses by grounding them in a clearly scoped example and explicitly stating the reasoning behind key decisions.

Not Approved

Suman Acharjee – AI content: 71%
The response relies heavily on AI-generated structure and language, reducing originality and ownership of thinking.

Vijay Gonsalves – AI content: 100%
Entirely AI-generated. Does not meet the forum requirement for personal reasoning and applied insight.
