Poornima_Gupta_aZ3h

Members

Joined
April 4
Last visited
June 9

Name
Poornima Gupta


Should AI Predict Who Is About to Quit?

Poornima_Gupta_aZ3h replied to Vishwadeep Khatri's topic in We ask and you answer! The best answer wins!

Diagnose systems, never score people: I take View B without qualification. Organizations should not act on individual predictive attrition signals. My reason is not that prediction is unkind. It is that the prediction does not measure what it claims to measure, and acting on an invalid measurement does damage you cannot undo. View A fails on its own terms. An attrition model is sold to you as a measure of who will leave. What it actually measures is who currently resembles the people who left before. Those are two different things, and the gap between them is structural. You cannot engineer it away. Let me run two attacks at View A. Each one on its own can be argued with. Together they cannot. Attack one, construct invalidity: the model does not measure intent to leave. Attack two, reflexive corruption: the act of using it destroys whatever validity it had and bakes its own errors into its future evidence. View A has to defeat both. It defeats neither. 1. The foundation: six disciplines, one verdict. This is an argument about what the model can actually know, not about whether using it is ethical. That matters, because it means no future upgrade fixes the problem. A better-built version still measures the wrong thing. Six separate fields, each from its own first principles, land in the same place. Mathematics: the base-rate trap. Quitting is rare, and that rarity alone defeats even a good model. Picture 1,000 employees where about 12% leave in a year: 120 real leavers, 880 stayers. Run a genuinely good model that catches 80% of leavers and correctly clears 80% of stayers. It flags about 96 of the 120 real leavers, but it also wrongly flags 20% of the 880 stayers, roughly 176 loyal people. The manager opens a list of about 272 names and only 96 are real. Around 65% of everyone flagged is actually staying. Two out of every three people you have labelled a flight risk are loyal. The reason is not a weak model. 
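The arithmetic in the 1,000-employee example can be checked with a short sketch. The function name and its parameters are my own illustration, not part of any attrition product; the inputs are the figures quoted above (12% attrition, 80% of leavers caught, 80% of stayers cleared).

```python
def flagged_breakdown(headcount, attrition_rate, sensitivity, specificity):
    """Split a flag list into real leavers and wrongly flagged stayers."""
    leavers = headcount * attrition_rate
    stayers = headcount - leavers
    true_flags = leavers * sensitivity            # leavers the model catches
    false_flags = stayers * (1.0 - specificity)   # stayers wrongly flagged
    total_flags = true_flags + false_flags
    return {
        "true_flags": true_flags,
        "false_flags": false_flags,
        "total_flags": total_flags,
        "share_of_flags_staying": false_flags / total_flags,
    }


# The figures from the example: 1,000 employees, 12% leave, 80% / 80% model.
result = flagged_breakdown(1000, 0.12, 0.80, 0.80)
print({name: round(value, 3) for name, value in result.items()})

# A somewhat better model (85% / 85%) for comparison; hypothetical figures.
better = flagged_breakdown(1000, 0.12, 0.85, 0.85)
print(round(better["share_of_flags_staying"], 3))
```

Raising both rates to 85% in this sketch still leaves more than half of the flagged names as stayers, which is the sense in which a slightly better model barely touches the problem.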
It is that leavers are rare, so even a small error rate on the large group of stayers drowns out the true signals. Improving the model slightly barely touches this. You cannot engineer away the rarity of quitting. Statistics: Goodhart's Law. When a measure becomes a target, it stops being a good measure. The moment absenteeism or message tone becomes the thing that triggers action, those signals stop telling you about intent and start telling you that people know they are being watched. The predictive power you saw in the pilot is gone the day you deploy. Psychology: the self-fulfilling prophecy. This is the Pygmalion and Golem effect, well documented since Rosenthal and Jacobson in the 1960s. Tell a manager that someone is a flight risk and the investment quietly stops. Fewer stretch assignments, guarded conversations, a backfill plan started in the background. The employee feels it and pulls away. The label produces the very exit it predicted. Cognitive science: the ecological fallacy. A group base rate ("people with feature X left at rate R") does not give you the right to a verdict about an individual ("you will leave"). Robinson named this error in 1950: forcing a group correlation onto a single person. Let me make it concrete. A child who misses a lot of school looks, to any attendance model, like a future low performer. That is the group pattern. But my own daughter did exactly that. She found class boring, so she taught herself what she missed in half a day and recharged in the other half, and did perfectly well. The signal said "at risk." The child said otherwise. A model fed her attendance would have flagged her with full confidence and been wrong, because it pressed a group correlation onto a person it did not fit. That is the same thing an attrition model does to an employee. Biology, ecology and physics: the observer changes the system. An employee is not a fixed specimen you can read off a slide. 
The same person behaves differently the moment their environment shifts, so add a new pressure, managers acting on scores, and the whole system rearranges around it. Physics gives the general version, that observing something disturbs it. It is worse here, because this system senses the observer's intent and reacts to it. You are never measuring something that holds still. Six fields, three of them quantitative, none borrowed from the others, all reaching the same verdict. The individual-level prediction is not a valid measurement, and using it is precisely what destroys its validity. Better data, fairness constraints, human review: all of these sit downstream of the problem and none can reach it. And the real reasons people leave are often invisible to any behavioural signal anyway. Quiet boredom in a role done too long. A team with no new learning to offer. A manager with no budget to grow the team, so the workload slowly becomes unsustainable. None of these leave a clean mark in absenteeism or message tone. The model cannot see the cause, so it grabs the residue it can see and calls that a prediction. 2. What the model is really measuring instead of intent. This goes further than a vague "AI is biased." I want to name exactly what the model substitutes for intent, and who pays for it. "Communication behaviour," "engagement," and "workload signals" are all scored against an unspoken baseline: a neurotypical employee, no caregiving load, a standard working rhythm. Think about who sits away from that mean as a stable trait, not a sign of leaving: an autistic employee whose sentiment scores read flatter and who keeps the camera off; an ADHD employee whose activity comes in irregular bursts; a caregiver whose calendar is compressed or non-standard; someone in a different time zone or culture whose communication norms differ. The model cannot tell "this person has always communicated like this" apart from "this person is withdrawing before they leave." 
So it over-flags the neurodivergent and the caregiver, systematically. View A does not just fail at measurement in the abstract. It fails in a discriminatory direction, turning a diversity characteristic into a risk score and handing managers permission to act on it. Put plainly: the attrition model is a neurotypicality detector wearing a retention model's clothes. The stakes are lopsided too. The data collection itself, the email patterns, calendar, login times, sentiment reading, is intrusive surveillance employees rarely agreed to, run by the party with all the power against the party with no equivalent insight and no recourse. A wrong flag costs the scorer nothing. It can cost the scored everything. 3. Conceding everything true about View A, and why it still loses. Here is View A at full strength. Argument 1. "This is just good management. Surfacing problems early helps the employee too." I concede that completely. But the prediction does none of the actual work. Every genuinely good fix, a bad manager, pay that has fallen behind, a crushing workload, is justified by the underlying condition, not the forecast. If pay is unfair, fix it for everyone underpaid. You never needed a flight-risk score to know that underpaying people is wrong. Delete the prediction and you lose none of the legitimate benefit while losing every illegitimate use. It is all downside. Argument 2. "Peers report something like a 25% drop in turnover. It clearly works." I concede the number moved. But this is survivorship and confounding dressed up as cause and effect, and it is exactly the too-clean statistic this forum warns us about. You cannot see the leavers you supposedly saved, because they did not leave. Turnover falls for a dozen tangled reasons, and a soft job market alone will do it. A model can cut turnover while being wrong about individuals two times out of three, simply by triggering a wave of well-meaning spending. 
The number tells you the spending worked, not whether the score was right. I take the IBM version of this in section 6. Argument 3. "Then just retrain it, or add fairness constraints." I concede you can shrink some measured disparities. But you cannot retrain your way out of a construct-validity failure. You would only compute a sharper estimate of the wrong quantity. Worse, every retrain after deployment learns from data the deployment already corrupted (section 4). You do not converge on the truth. You converge on your own past behaviour. Argument 4. "Doing nothing is also a choice. Attrition really is expensive." I concede this, and it is genuinely View A's strongest point. But the alternative to a hidden score is not doing nothing. It is acting on signals that are disclosed, present, and freely given: an employee who raises a concern, a manager who notices a real, nameable problem. That is very different from acting on something inferred, future, and covert. View B acts. It just refuses to act on a verdict the data cannot support. 4. The Irreversible Loop I would refuse this system even handed a better model than any that exists, because of what builds up over time. Year one, the score is "just one more input." By year three it carries the weight. Managers defer to it. Promotions and stretch work quietly route around anyone it has flagged. The organization slowly loses the ability to read its own people directly, the way most of us lost the ability to read a map once the phone took over the turns. There is a name for this in aviation and clinical decision-making: automation complacency, first documented by Parasuraman and colleagues in the 1990s, and the unsettling part is that it shows up in experts as much as novices. Practice alone does not fix it. Then the loop closes. Let's take an example. An employee gets falsely flagged. They receive less investment, slowly disengage, and eventually leave. 
Next year, their exit shows up in the training data as a true positive. The model's mistake has become the model's proof. You can never audit it, because you cannot see the careers it quietly cut short. There is no body to find. An organization that builds this is not buying foresight. It is building a machine that turns its own suspicion into fact and calls that accuracy. The harm is invisible, it compounds, and it launders itself.

5. The evidence: one failure mechanism across eight industries, banking included.

I mark each case as documented or illustrative, so you can see what carries evidential weight and what is there to make a point.

| # | Industry | Case | Proxy the model used | What was actually true | Status |
|---|---|---|---|---|---|
| 1 | Healthcare | Obermeyer et al., Science, 2019 | Cost as a stand-in for health need | Less had historically been spent on Black patients, so the model decided they were healthier and under-referred them. Fixing the proxy would have raised the share of Black patients flagged for extra care from 17.7% to 46.5% | Documented, peer-reviewed |
| 2 | Technology / HR | Amazon recruiting model (Reuters, 2018) | Résumé resemblance to past hires | Penalised the word "women's," downgraded female candidates, scrapped | Documented |
| 3 | Banking (mine) | Credit and conduct proxy models under fair-lending and SR 11-7 | Behavioural proxies for default or conduct risk | Decades of doctrine exist because proxies encode historical bias, which is why explainability and high-risk governance are mandatory | Documented regulatory regime |
| 4 | Public sector | Netherlands SyRI and childcare-benefits (toeslagenaffaire) | An algorithmic fraud-risk profile | A court banned SyRI in 2020. The benefits algorithm used dual nationality as a risk flag, wrongly accused around 26,000 families, and the government resigned in 2021 | Documented |
| 5 | Education | UK A-level algorithm, 2020 | School history as a stand-in for merit | Downgraded roughly 40% of results, hit disadvantaged students hardest while inflating private-school grades, then withdrawn | Documented |
| 6 | Policing | Predictive policing tools, such as the Chicago "heat list" | Past data as a stand-in for future crime | A feedback loop: enforcement sent where the model predicted generated the data that confirmed it. Several programs were shut down | Documented |
| 7 | Insurance | Actuarial and underwriting proxy history | Proxies for individual risk | Found again and again to encode protected characteristics, which is why protected-class underwriting is legally constrained | Documented in regulation |
| 8 | Streaming (the control case) | Churn models, Netflix or telco style | A behavioural proxy for customer churn | Tolerated, but only because a wrong flag costs a discount coupon, not a career. Same technique, trivial stakes, no power gap | Illustrative of the stakes asymmetry |

It is one mechanism repeating: an observable proxy stands in for something deeper you cannot see, carries the bias of the past inside it, and once acted on, manufactures the future that proves it right. Attrition prediction is the same machine, only now pointed at your own staff. The only thing that changes across these eight is the cost of a wrong flag and the size of the power gap between the one scoring and the one being scored.

Banking is the case I stake my position on, because it is mine. We already settled this question, but for customers. We are not allowed to deny someone credit on an unexplainable proxy model. SR 11-7 model-risk governance, fair-lending law, and adverse-action requirements force us to explain and to test for bias, precisely because we learned the hard way that proxy models carry the past forward.
An attrition score on an employee is structurally the same thing as a risk score on a borrower. It just lacks the guardrails we already decided were non-negotiable for our customers. So my position is simple. If my bank would not let an unexplainable proxy model quietly deny a customer a loan, I am not going to let one quietly deny my colleague a stretch assignment. The doctrine already exists inside the bank. We have just never turned it inward. 5b. The deeper point. The metric is the real failure, not the machine. Take the AI away and the failure still happens. That alone tells you the AI was never the problem. The problem is acting on a proxy metric at all. AI does not invent this mistake. It industrialises it, at scale and speed. Two well-documented cases, no algorithm in either. Wells Fargo. The bank measured products per customer, the cross-sell metric, the famous "Eight is great," believing it captured the depth of the customer relationship. Once that metric became the target people were judged and fired against, it stopped measuring relationship quality and started measuring fear. Regulators imposed an initial 185 million dollar penalty in 2016, and a later review put the number of unauthorized accounts at around 3.5 million, with total costs running into the billions and executives pursued for years. The metric measured the wrong thing, the bank acted on it, and the result was a disaster, with no AI in the loop. The NHS four-hour A&E target. A metric meant to capture "patients are treated promptly." On paper it was a triumph: the share waiting more than four hours fell from roughly 23% to around 5% within two years. But much of that was gamed. Patients admitted at the three-hour-fifty-eight mark whether or not it made clinical sense. Ambulances left idling outside so patients had not technically "arrived." Staff pulled into A&E during reporting windows while other procedures were cancelled. 
The number got better; the care did not, and the sickest were sometimes left at risk. Both sit next to what economists call the cobra effect, the story of a bounty on dead cobras that bred more cobras, rewarding the metric instead of the goal. I name it as a concept, not documented history, since the anecdote is probably apocryphal, but the principle is exactly what Wells Fargo and the NHS show with hard consequences. The lesson lands straight on this debate: if acting on a proxy metric corrupts behaviour even when humans run it, handing that same proxy to an AI does not fix the flaw. It just removes the friction that used to slow it down. 6. The Boundary: Systems vs. People Conceding where View A has a point does not soften my position. It sharpens it, because it shows exactly where the line falls. View B is right about the thing that matters: you must never act on an individual prediction. View A is right only about something I am not even disputing. Let me be precise about that one thing. View A is right about one thing only, and only in one configuration: aggregate, anonymised, systemic diagnosis, with no individual identifiability and no route to individual action. If the model says "engagement in retail operations is collapsing and that division's attrition is climbing," that is a legitimate use. The construct-validity problem dissolves at the group level, where you measure a group rate against a group outcome, the exact resolution the data can support. The ecological fallacy only appears when you drop to the individual; the self-fulfilling prophecy needs an individual to label. Take the individual out and both failure modes disappear. This is why Bex's argument does not prove what Bex thinks it proves. Bex defends acting on AI predictions by pointing to IBM's reported 25% reduction in turnover. Grant the figure in full, then look at what it actually is: an aggregate, organization-level result. 
It tells you a bundle of retention spending across a whole population lined up with lower total turnover. It tells you nothing about whether any single prediction about any single person was valid, because the result was never measured at the individual level. So Bex's strongest evidence lives entirely inside the one domain I concede is legitimate, systemic group-level diagnosis. It never reaches the claim in dispute, that you should act on a named individual's risk score. Bex has proven the boundary, not crossed it. The IBM number is an argument for diagnosing systems, not for scoring people, which is my position exactly. A parallel from my own world: this is precisely how AML transaction monitoring is meant to work. The system flags patterns and typologies for investigation. It never convicts a person on the score alone; a human investigation has to independently establish the fact. The model points. It does not pass sentence. Attrition analytics should inherit that discipline. So the dividing line was never "AI or no AI." It is this: AI can diagnose systems. AI cannot score people. Valid at the level its data supports, the cohort, the team, the function, and invalid one level below, at the named individual. Bex is right about one narrow thing, and wrong about the thing the question is actually asking. 7. The decision-ready framework. Redesign what the system optimises for, then gate it. Do not ask humans to "override the AI." Override invites theatre, a tired manager rubber-stamping a confident-looking score. The better move is to redesign what the system is even allowed to optimise for, so the toxic artefact, the personal risk score, is never created. Control 1, the resolution gate. The model may surface signals only at a unit large enough that no individual can be identified: team, function, or site, with the minimum cohort size set by privacy review. The output is a systemic-health dashboard, never a list of names. The authority boundary is clear. 
The AI leads on where to look. It has no authority at all on at whom. Control 2, the multi-objective routing function. Where leadership wants to spend on retention, route the spending, not a personal score, through an explicit, auditable objective: maximize α · P(retention from fixing the condition) + β · capability gain + γ · risk-weighted bench depth − δ · individual-identifiability penalty In plain language: spend retention budget where fixing a genuine, visible problem is most likely to keep people, where it also builds the team's skills and covers your most fragile roles, and heavily penalise any option that depends on labelling a specific individual as a flight risk. The first three terms give leadership everything View A actually wanted, less turnover and stronger teams, while the last makes it structurally impossible to create the individual score I am arguing against. Every term is real and measurable: the first maps to pay-equity and workload-fairness data, capability gain to skills coverage, bench depth to succession metrics, the penalty to a hard privacy flag. This beats human-in-the-loop override for a simple reason. You cannot misuse a flight-risk list that was never created. The control is built into the structure, not left to behaviour. Control 3, a four-stage readiness gate, each test with its rationale, run before any attrition analytics ship: Construct test. Does the output claim to measure an individual's intent? If yes, stop. That construct is invalid at the individual level (section 1), so going ahead is malpractice. Resolution test. Is the smallest unit of output a protected, anonymous cohort? Validity holds at the group level and collapses at the individual one. Disparity test. Calibrate the flags separately against neurodivergence, caregiving, time zone, and protected-class proxies. The bias lives inside "communication behaviour" (section 2), so you have to go looking for it on purpose. Counterfactual-action test. 
Can every intervention be justified by the underlying condition alone, with the prediction deleted? If not, it is illegitimate. A genuine fix never needs the score. A manager could run this gate on a Monday morning. That is the test of a real framework rather than a slogan. In the interest of honesty, here is how my own View B can fail. Run the resolution gate carelessly and someone games it by drawing cohorts so small they re-identify individuals, so you set a minimum cohort size. A systemic dashboard can still be used to punish a whole team, so the rule is that action targets the condition, not the cohort. And "only act on disclosed signals" can under-serve those who suffer in silence, which is exactly why the systemic dashboard matters: it surfaces the silent, structural problems. View B was never "do nothing." It is "act only on what you can actually, validly know." Closing, back to where I started. The question hides its conclusion inside a single verb: "AI can predict which employees will leave." It cannot. It can identify who currently resembles the people who left before, a very different and far more dangerous thing, and treating that resemblance as destiny is exactly what turns a shaky forecast into a self-fulfilling one. The individual measurement is invalid, deployment launders its own errors back in as evidence, and the bias falls hardest on the neurodivergent and the caregiver. Every legitimate good View A promises is available without the prediction at all, by acting on the conditions you can actually see and fix. So. Diagnose systems, never score people. Act on what is disclosed, never on a covert verdict. View B, without qualification.
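For anyone who wants to see Control 1 and Control 2 from section 7 in concrete form, here is a minimal sketch. Everything named in it is my own assumption for illustration: the `Option` fields, the 0 to 1 scales, the weights, and the minimum cohort size of 10 (the real number would come from privacy review, as stated above). It is a way of reading the objective, not a finished tool.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real minimum is set by privacy review.
MIN_COHORT_SIZE = 10


@dataclass
class Option:
    name: str
    p_retention: float        # P(retention from fixing the condition), 0..1
    capability_gain: float    # skills-coverage gain, 0..1
    bench_depth: float        # risk-weighted bench depth, 0..1
    targets_individual: bool  # hard privacy flag: needs a named person?


def routing_score(opt, alpha=1.0, beta=0.5, gamma=0.5, delta=100.0):
    """Control 2: alpha*P + beta*capability + gamma*bench - delta*penalty."""
    penalty = 1.0 if opt.targets_individual else 0.0
    return (alpha * opt.p_retention
            + beta * opt.capability_gain
            + gamma * opt.bench_depth
            - delta * penalty)


def publishable_cohorts(cohort_sizes, min_size=MIN_COHORT_SIZE):
    """Control 1: only surface units too large to identify one person."""
    return {name: n for name, n in cohort_sizes.items() if n >= min_size}


options = [
    Option("Fix pay gap in retail operations", 0.6, 0.2, 0.3, False),
    Option("Counter-offer to a flagged employee", 0.9, 0.1, 0.1, True),
]
best = max(options, key=routing_score)
print(best.name)
print(publishable_cohorts({"retail operations": 48, "treasury desk": 4}))
```

With a large delta, any option that depends on a named individual scores far below every option that does not, however attractive its retention estimate looks. That is the structural point: the penalty is not a tie-breaker, it is a wall.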
- May 23
- 19 replies
Should AI Decide Which Projects Deserve to Survive?

Poornima_Gupta_aZ3h replied to Vishwadeep Khatri's topic in We ask and you answer! The best answer wins!

Position: View B. Continue the project. The AI's prediction is a warning to investigate, not a verdict to execute.

I take View B without qualification. The project should continue. The AI recommendation should be logged, the underlying signals should be examined, and the sponsor should be required to respond to them, but the termination decision itself must stay with humans. To hand it to an AI in the conditions described is to commit a category error about what AI can and cannot know. Below is why, the eight cases that support it, the strongest objections answered, where View A is genuinely right, and the framework I would put on the table on Monday morning.

The real issue the question is asking about

The dilemma is presented as "trust the data" versus "trust the sponsor." That framing flatters the AI. The genuine question is narrower and harder: on what reference class is the AI's failure prediction trained, and is the project in front of it a member of that class?

Every AI prediction model, whether logistic regression, gradient-boosted trees, or a transformer, works by pattern-matching the present situation against historical outcomes. Its accuracy is bounded by what statisticians call the stationarity assumption: that the future will resemble the past. For routine, high-volume, well-understood processes (call-centre staffing, fraud scoring, loan default prediction), that assumption largely holds and AI prediction is genuinely superior to human judgement. For transformational initiatives (the word is in the question itself) the assumption collapses. The whole point of a transformation is that it is not drawn from the historical reference class.

There is a deeper, less obvious flaw buried in the training data itself: survivorship bias. The AI learns "what failure looks like" from the projects in the organisation's history that ran long enough to produce a recorded outcome.
But the boldest transformations are precisely the ones most likely to have been killed early in the past — so they never generated a "success" label for the model to learn from. The model is therefore structurally taught that ambitious, slow-burning, signal-noisy projects fail, because the counter-examples were terminated before they could prove otherwise. The AI is most confident about killing exactly the category of project on which it has the least valid evidence. It is reading the bullet holes on the planes that came back, and concluding the engines are safe. This is reinforced by Clayton Christensen's argument in The Innovator's Dilemma (HBR, 1995, expanded 1997): the projects most likely to disrupt an organisation are precisely those that look like underperformers by conventional metrics in their early years, because they serve a market the existing measurement system was never designed to see. And Nassim Taleb's distinction in The Black Swan (Random House, 2007) gives it a name: AI prediction lives in Mediocristan (predictable, average-driven worlds where extremes are bounded), while transformational initiatives live in Extremistan (outlier-dominated worlds where a single result dominates everything else). Stopping a project in Extremistan because its early signals look like a typical Mediocristan failure is the textbook error. You are asking a forecaster trained on coin flips to rule on a lottery ticket. An AI trained on the existing measurement system will systematically flag disruptive initiatives for termination. It is not malfunctioning. It is doing exactly what it was built to do — and that is the problem. 
Eight real-world cases where killing the project on the early signals would have destroyed the prize

| # | Industry | Project | What the early "AI signals" would have shown | What patience actually produced |
|---|---|---|---|---|
| 1 | Aerospace | SpaceX Falcon 1, 2006–08 | Three consecutive launch failures, $100m of personal capital exhausted, no commercial revenue, milestone slippage in every quarter | Fourth launch reached orbit on the last funded attempt (Sept 2008); NASA's $1.6bn CRS contract followed in Dec 2008; SpaceX now performs more orbital launches annually than any other launch provider on Earth |
| 2 | Pharmaceuticals / biotech | Katalin Karikó's mRNA research, 1989–2013 | Continuous grant rejections, four formal demotions at the University of Pennsylvania, no commercial output for two decades; every conventional milestone signalled failure | Underpinned the BioNTech-Pfizer and Moderna COVID-19 vaccines; Nobel Prize in Medicine 2023; estimated millions of lives saved |
| 3 | Consumer products | Dyson Dual Cyclone vacuum, 1979–93 | 5,127 failed prototypes over 15 years; wife working as art teacher to fund household; no licensee in the UK industry; Bob Sutton noted this was a "textbook case" of what an AI would call escalation of commitment | Created a category that disrupted the entire global vacuum market; Dyson is now a privately held conglomerate worth over £20bn |
| 4 | Streaming / media | Netflix streaming pivot, 2007–11 | Cannibalised the profitable DVD-by-mail business; the 2011 Qwikster split lost ~800,000 subscribers in a single quarter and the stock fell ~75%; Hastings publicly apologised | Foundation of the modern subscription economy; Netflix shares rose 6,744% from end-2009 to end-2020 vs. S&P 500's 237% over the same period |
| 5 | Banking (UK / global) | HSBC Dynamic Risk Assessment (with Google Cloud), c.2018–2021 | A long, data-intensive ML build to replace a legacy rules-based AML system; the hardest, slowest part was getting years of fragmented transaction and KYC data fit to train on: milestone slippage, sustained spend, no production output for an extended period | On completion, detected 2–4x more genuinely suspicious activity than the legacy system while cutting false-positive alert volumes by over 60% and compressing analysis from weeks to days; now monitors over 1bn transactions/month and won Celent Model Risk Manager of the Year 2023 |
| 6 | Banking (UK) | First Direct (Midland Bank), launched 1989 | First two years showed customer-acquisition costs running far ahead of forecast and contribution margin deeply negative; internal scepticism that a "branchless" bank could work in the UK | First Direct became the highest-rated bank in the UK for customer satisfaction for over two decades; the template for every UK digital bank that followed, including Monzo, Starling and Revolut |
| 7 | Industrial / energy | Tesla Model 3 production ramp, 2017–18 | Musk publicly called it "production hell"; multiple missed targets; cash burn so severe that analysts including Goldman Sachs predicted insolvency; an AI trained on automotive launches would have triggered termination | Model 3 became the best-selling electric vehicle in the world; Tesla's market capitalisation crossed $1 trillion in 2021 |
| 8 | Pharma (gene therapy) | Novartis CAR-T / Kymriah, 2012–17 | Patient enrolment delays, FDA back-and-forth on manufacturing, treatment costs that seemed commercially unviable, multiple stoppages | First FDA-approved gene therapy in the US (2017); foundation of an entire treatment modality for paediatric leukaemia and lymphoma |

The pattern is consistent. Every transformation that mattered looked, in its third quarter, exactly like a failing project.
The signals the AI in the question is being asked to weigh (milestone delays, budget consumption, decision bottlenecks) are the very signals that a transformation, by definition, generates while it is being built. They are not symptoms. They are the work.

The asymmetric payoff that the AI cannot see

The case for View B is fundamentally a payoff-asymmetry argument, not a probability argument. Even if the AI is technically correct that the project has, say, a 70% probability of failure, the question is not "what is P(failure)?" but "what is the expected value, weighted by the asymmetry of outcomes?"

| Decision | If project would have succeeded | If project would have failed |
|---|---|---|
| Continue (View B) | Captures the full upside, potentially transformative (Karikó, Dyson, SpaceX, HSBC's AML detection) | Loses incremental investment from today onward: finite, bounded, recoverable |
| Terminate (View A) | Loses the transformative outcome forever; loss is unmeasured because it never appears on any P&L | Saves incremental investment from today onward |

In Mediocristan projects, the two losses are symmetric, and View A wins. In Extremistan projects, the upside loss is potentially unbounded (a vaccine that saves millions, a launch capability that reshapes an industry, an AML system that catches multiples more financial crime) and the downside loss is finite (a few more quarters of burn). When the payoffs are this asymmetric, expected value mathematics inverts the apparent verdict of the probability model.

This points to the real fix, and it is not "put a human in the loop to overrule the machine": that is babysitting a system whose objective is wrong. The fix is to redesign what the AI optimises for. A failure-predictor trained to maximise P(success) is answering the wrong question. The decision-relevant quantity is expected value under a convex payoff, the mathematical property that makes the upside disproportionately large relative to the bounded downside.
Formally, the routing function should maximise:

E[V] = α · P(success) · V(success) − β · (burn rate × time remaining) + γ · O

where V(success) is the magnitude of the upside (the term that explodes in Extremistan and which a pure P(success) model discards entirely); the middle term is the downside, which is finite, bounded, and recoverable, since you only ever lose the forward burn; and O is the option value of keeping the bet alive to learn more before committing further (Dixit & Pindyck's real-options logic, Investment Under Uncertainty, 1994). The coefficients α, β, γ are set by the board's risk appetite, not by the model.

A multi-objective routing function is the right instinct. But it has to be anchored in expected value under convexity, and not in proxy objectives like "capability gain" or "resource-utilisation depth." Those proxies are themselves Mediocristan metrics: easy to count, and therefore exactly the kind of measurable-but-incomplete variable the McNamara Fallacy warns against. Optimising a transformation decision on proxy metrics is a more sophisticated way of making the same mistake. The only objective that survives the Extremistan critique is one in which V(success), the size of the prize, is a first-class term. A model that cannot represent the magnitude of what it might be killing has no business recommending the kill.

There is a name for the underlying error, and it is worth stating because it is exactly what the AI is doing. The McNamara Fallacy (named after Robert McNamara, the US Defense Secretary who measured success in the Vietnam War by enemy body count because it was the variable he could most easily quantify) describes the trap of treating what is measurable as the whole of what matters. The fallacy runs in four steps: measure what is easy to measure; disregard what cannot be measured; then assume what cannot be measured is unimportant; and finally conclude that what cannot be measured does not exist.
The AI in the question measures milestone delays, burn, and bottlenecks because those are countable — and is structurally blind to strategic optionality, organisational learning, and the sheer size of the eventual prize, because those are not. It will report that the war is being won on the numbers, right up to the point the organisation loses it. This is the same logic that underpins venture capital portfolio construction (Sequoia's published doctrine that a single 100x return justifies a fund full of zeros), and it is why the question framing — "high probability of failure" — is a red herring. Probability is not the deciding variable. The four strongest objections to my position — and why none survives contact Intellectual honesty requires meeting View A at its strongest, not its weakest. Here are the four most serious objections, each conceded on its own terms, then answered. Objection 1: "You are rationalising the sunk-cost fallacy. Staw's escalation-of-commitment research is real, and View B is exactly the cover a biased sponsor uses to keep digging." Conceded — fully. Escalation of commitment is real, well-evidenced (Staw, 1976), and is precisely what kills organisations that confuse persistence with progress. But this objection defeats passive continuation, not my position. My Step 3 requires the sponsor to commit, in writing, to forward-looking termination triggers — specific, measurable, time-bound. That is the documented antidote to escalation: it removes the sponsor's discretion to move the goalposts. I am not defending the sponsor's right to persist; I am replacing their judgement with pre-committed kill criteria. The objection lands on View B's caricature, not on the protocol. Objection 2: "Survivorship cuts both ways. For every Dyson and Karikó there is a graveyard of zealots who persisted into bankruptcy. You are showing me the winners and hiding the losers." Conceded — completely, and it is the strongest objection. 
Yes: most persistent bets fail, and a list of survivors proves nothing on its own. But this is why my argument is built on payoff asymmetry, not on success rates. In a convex-payoff portfolio you do not need most bets to win — you need the rare winner's magnitude to exceed the sum of the bounded losses. Venture capital is a standing proof: most investments return zero, and the model is still rational because one outcome can return the fund many times over. The graveyard is not evidence against the strategy; the graveyard is the strategy's accepted cost. The objection assumes we are counting wins. We are weighing magnitudes. Objection 3: "Then just retrain the AI on transformation data. The problem is a bad model, not the principle of AI-led termination." Conceded in principle — defeated in practice. If a valid reference class of comparable transformations existed, the AI's outside view would be sound and View A would win. But genuine transformations are, by definition, low-frequency, non-stationary, and heterogeneous — there are too few, too dissimilar, and the world they operated in no longer exists. This is not a data-volume problem that more training fixes; it is a structural property of the phenomenon. No amount of more data or better architecture repairs it, because the failure is in the reference class, not the model. That is what makes this objection's remedy unreachable rather than merely difficult. Objection 4: "Your protocol hands every sponsor a permanent excuse. 'It's Extremistan, the AI can't judge it' becomes the universal defence, and nothing ever gets killed." Conceded — this is the real danger, and View A is right to fear it. A framework that protects transformations must not become a framework that protects everything. That is exactly why Step 4 makes the AI's signal escalate — louder, more frequent, board-level — rather than disappear, and why "Extremistan" is not a label a sponsor may simply assert. 
It must be argued in Step 2 against explicit criteria (low-frequency, non-stationary, convex payoff), and the burden is on the sponsor to demonstrate membership, not merely claim it. The routine majority of initiatives are Mediocristan; they fail that test and remain squarely in the AI's domain. The protocol protects the rare transformation precisely by refusing to protect the routine project. The conclusion is therefore earned, not assumed: View B survives its four strongest objections, and each objection, properly answered, turns into a feature of the framework rather than a hole in it. Where View A is genuinely right — and why this case is not one of them I will not pretend View A has no domain. It does, in a precise zone: Routine IT migrations with rich historical reference classes — RBS 2012, where a corrupted update to the bank's overnight batch-processing system locked 6.5m customers out of their accounts and left 100m payments unprocessed (£125m in remediation, £56m in fines), and TSB 2018, where a "big bang" cutover to a new core banking platform went live with 2,000 known defects, locking out millions and even exposing some customers' accounts to strangers (£330m loss, 80,000 customers gone, CEO resigned). Both were flagged internally by engineers before go-live, and both fit AI's Mediocristan zone of competence. Stopping these — or at least delaying go-live — would have been the right call. Compliance and operational-resilience projects where the universe of possible outcomes is bounded and well-characterised. Cost-reduction programmes with linear, additive payoffs. The distinguishing feature of the View A zone is that the project's success criteria are well-defined upfront, the reference class is rich, and the payoff distribution is roughly symmetric. In those conditions, AI prediction outperforms human judgement — particularly biased human judgement (Staw, "Knee-Deep in the Big Muddy," 1976) — and View A is correct. 
In practice this means an explicit allocation: AI-led termination for the routine majority of initiatives that are Mediocristan-class, human-led judgement for the minority that are genuine transformations. The case in the question, however, is described as a transformation initiative with strong executive sponsorship (which signals strategic significance, not just political protection) and political importance (which signals that the organisation has staked its forward narrative on it). These are the markers of Extremistan, not Mediocristan. View A does not apply here. The reframing: the AI does not decide. It triggers an investigation. The question implies a binary — kill or continue. This is a false choice. The correct response to an AI failure prediction on a transformation initiative is a Mandatory Investigation Protocol — neither passive continuation nor automated termination. The AI leads on detection and on escalation cadence; the human leads on the termination decision itself. Step 1 — Signal logged, sponsor informed within 48 hours. The AI's prediction and the underlying feature attribution (which signals drove the score) are sent to the sponsor and to an independent reviewer. No automatic action is triggered. Step 2 — Diagnostic, not verdictive. The sponsor and a small independent panel ask three questions: (a) Are the AI's signals symptoms of real failure (e.g., team disengagement, vendor instability) or symptoms of normal transformation friction (milestone slippage during architectural change)? (b) Has the project's reference class been correctly identified, or is this an Extremistan initiative being judged on Mediocristan benchmarks? (c) What new information would update us in either direction, and when can we get it? Step 3 — Pre-committed kill criteria, not predictive ones. 
If the project continues, the sponsor must commit in writing to forward-looking termination triggers — specific, measurable, time-bound events whose occurrence would close the project. Eric Ries calls these "innovation accounting" milestones (The Lean Startup, 2011). Andy Grove called them "strategic inflection points" (Only the Paranoid Survive, 1996). They turn an open-ended commitment into a series of bounded bets. Step 4 — The AI's role is escalated, not authoritative. Each subsequent AI re-prediction is logged and forces a board-level review at fixed cadence. The AI gets louder over time. It never gets the final word. This protocol gives you the best of both worlds: the AI's signal cannot be politically suppressed (Step 1 makes it visible to independent reviewers), and the AI's signal cannot prematurely kill a transformation (Step 4 keeps the decision human). It directly addresses the failure mode the question is worried about — sponsor capture — without falling into the opposite failure mode of algorithmic over-reach. Why this answer matters specifically for banking In banking, this argument has unusual weight — and the sharpest illustration comes from inside the AML function itself. HSBC's Dynamic Risk Assessment is the case that should give every View A advocate pause. HSBC set out, with Google Cloud, to replace its legacy rules-based AML transaction-monitoring system — the kind of system across the industry that closes more than 95% of its alerts as false positives — with a machine-learning system. The build was long and data-intensive; the hardest and slowest part was not the model but getting years of fragmented transaction and KYC data into a state fit to train on. Through that period the project displayed exactly the signals an AI failure-predictor weighs most heavily: milestone slippage, sustained spend, and no production output. 
An AI trained on historical IT-migration reference classes — on RBS 2012 and TSB 2018 — would have recommended abandonment with high confidence. It would have been catastrophically wrong. HSBC piloted the system in 2021 and is now finding two to four times more financial crime than it did previously, with much greater accuracy. It was first implemented in the UK in 2021 and has since been deployed across six markets, covering 80% of the bank's customers. The system reduced alert volumes by more than 60% while detecting 2-4x more suspicious activity, and cut the time needed to analyse billions of transactions across millions of accounts from several weeks to a few days. Note the reflexive twist that makes this case unique: the transformation being built was itself an AI — and an AI failure-predictor, judging that build against legacy reference classes, would have killed it. The cost of that termination would not have been a write-off on a P&L. It would have been the choice to keep detecting a fraction of the money laundering the bank can now see — a Type I error (a false-positive "this will fail" verdict) whose consequence is societal, not merely financial. The wider banking record reinforces the point. First Direct (Midland, 1989) survived two years of negative contribution to become the highest-rated UK bank for customer satisfaction for over twenty years. By contrast, HSBC's own Connected Money app, JPMorgan's Finn and NatWest's Bó were all killed early on exactly the kind of signals an AI would flag — and ceded digital-deposit territory to neobanks and to rivals who persisted. For a UK bank operating under SS1/21 (PRA operational resilience) and SMCR, the right framing is therefore not "should the AI be allowed to stop a project" but "should the accountable senior manager be required to engage with AI signals on the record before approving continuation?" The answer to that is unambiguously yes — and is materially different from the question being asked. 
The first preserves accountability with the human. The second outsources it to the model. Crucially, the banking failures most often cited in support of View A — RBS 2012 and TSB 2018 — were not killed by AI; they failed because the humans ignored signals their own engineers had raised. The lesson there is not "let the AI decide." It is "force the humans to act on the evidence." That is what View B's protocol does. View A solves the wrong problem. Why normalising AI-led termination is the deeper institutional danger There is a cost that sits above any single project. If an organisation normalises AI-led kill decisions, it teaches its most capable people that ambition is futile — that any initiative bold enough to matter will be flagged and stopped before it can prove itself. Over time the best sponsors stop proposing transformations at all, because they learn the model will end them at the first noisy quarter. That is an irreversible ratchet: the organisation quietly loses the muscle to attempt hard things, and capability of that kind cannot be switched back on when it is finally needed — it has to be rebuilt over years, by which point the disruptor has already arrived. The danger of View A is not that it kills one good project. It is that, normalised, it trains an entire institution out of the capacity for transformation while every dashboard still shows green. Conclusion Continue the project. Not because executive sponsors are infallible — they are not, and Staw's escalation-of-commitment literature is real. Continue it because the AI is operating outside its zone of competence, its training data is survivorship-biased against exactly this kind of project, the payoff distribution makes probability the wrong variable, and the historical pattern is clear: every transformation that mattered looked, at this stage, exactly like this one. 
The right response to the AI's signal is a structured investigation that forces the sponsor to defend the project on forward-looking criteria — not an automated termination that confuses a forecaster trained on the past with an oracle of the future. The AI is telling you something is unusual. That is useful information. It is not telling you the project will fail. It cannot.
- May 20
- 16 replies
Performance Optimization vs Team Development — What Should AI Prioritize?

Poornima_Gupta_aZ3h replied to Vishwadeep Khatri's topic in We ask and you answer! The best answer wins!

🎯 VIEW B — DISTRIBUTE OPPORTUNITIES BROADLY 💡 MY POSITION Managers must distribute opportunities broadly. The AI should inform that decision — not make it. When AI consistently assigns every critical task to the same small group of top performers, it is not building a stronger organization. It is building a more efficient one today, and a more fragile one for tomorrow. I have experienced this personally. We lost a mid-level manager — not a star by any performance metric, not the person the AI would have flagged as critical. Within sixty days, we discovered she had been carrying four client relationships, developing two junior analysts, and maintaining the institutional memory of a process that existed nowhere in writing. The AI had her rated average. The organization discovered she was irreplaceable six weeks after she left. We had to rebuild under pressure. That experience is why I hold this position. The question is not whether top performers should handle critical work. Of course they should — when the stakes are immediate and the margin for error is zero. The question is what happens to everyone else while that is happening, and what the organization looks like five years from now when those same top performers have burned out, moved on, or become so overloaded they are no longer performing at all. My argument is simple: distributing opportunities broadly is not a concession to fairness. It is the only strategy that builds an organization capable of surviving what it cannot predict. 🛡️ 📊 PART 1 — WHY AI TASK ALLOCATION IS STRUCTURALLY BROKEN The AI measures past performance, speed, accuracy, customer feedback, and delivery consistency. These are legitimate metrics. They are also, without exception, backward-looking. ⏪ "Who has performed best on tasks like this, up to today?" That is what the AI answers well. "Who will perform best as conditions change? Who has potential if given the chance? 
Who is being suppressed by a system that never lets them prove themselves?" That is what it cannot answer. Clayton Christensen showed in The Innovator's Dilemma that the data validating current success is precisely what blinds organizations to future requirements. Daniel Kahneman explains why in Thinking, Fast and Slow: AI operates as System 1 cognition at scale — fast, pattern-based, confident. Building organizational capability requires System 2 thinking: slow, deliberate reasoning about possibilities that have no historical template. 🗺️ The AI sees the people who have already succeeded. It cannot see the people who would succeed if given the same opportunity. Their absence from the data makes them invisible — not incapable. By the time you realize your talent pipeline is dry, there will be nobody ready to carry what those top performers had been carrying. 📉 Organizations do not fail because they gave too much important work to people with potential. They fail because they gave all important work to the same small group until that group burned out, left, or became the ceiling rather than the foundation. 🧠 PART 2 — FIVE DISCIPLINES. ONE VERDICT. The case for View B is not just operational. Five completely separate disciplines — mathematics, psychology, cognitive science, biology, and ecology — each arrive at the same conclusion independently. When this many fields of human knowledge converge on one answer, it is not a preference. It is a pattern. 🕸️ 📐 Mathematics: Markowitz Modern Portfolio Theory (Nobel Prize, 1990) Every banker already believes this — for money. 💸 Harry Markowitz won the Nobel Prize in Economics for proving that a diversified portfolio delivers better risk-adjusted returns than a concentrated one — even when that portfolio contains only the highest-performing assets. Concentration risk — in every framework from Basel to the PRA — is something we are required to mitigate. Now apply the same logic to people. 
View A is the equivalent of a fund manager who puts 100% of capital into the three best-performing assets. We are trained from year one that this is bad risk management. The mathematics do not change because the asset is a person instead of a bond. Academic researchers have formalized this directly — proposing Talent Portfolio Theory as a framework, drawing explicitly from Markowitz — arguing that capability development should be diversified to minimize organizational risk exactly as a financial portfolio is. 📊 🧬 Psychology: Self-Determination Theory (Deci & Ryan) Edward Deci and Richard Ryan spent four decades proving that sustained high performance requires three conditions: Autonomy — agency over your own work and development 🔓 Competence — the ability to grow, stretch, and master new challenges 📈 Relatedness — genuine connection to the team and its mission 🤝 View A violates all three at once. The AI decides who gets what — removing autonomy. Those never given stretch work never develop — blocking competence. When critical work is concentrated in a small group, everyone else feels excluded from the mission — severing relatedness. The finding is consistent across thousands of studies: organizations that satisfy these three needs produce higher engagement, creativity, and retention. Those that frustrate them produce compliance, not commitment. Output for now. Fragility forever. View A destroys the psychological foundations that make people want to do their best work. 🤖 Cognitive Science: AI Trust, Overconfidence & Cognitive Offloading Peer-reviewed research finds something alarming specifically for AI systems: the more people trust AI, the more overconfident they become in AI-assisted decisions — because they accept the output without sufficiently questioning it. Trusting the system over your own deeper reflection — what researchers call cognitive offloading — progressively erodes critical judgment. 
🧩 Applied to View A: organizations following AI task allocation are progressively losing the capacity to question whether the AI is right. The system keeps recommending the same group. Short-term results keep validating it. Leaders grow more confident in a recommendation that is quietly building organizational fragility. By the time the pipeline is empty, the organization will have lost the habit of asking why. View A produces an organization that cannot see the bad outcome coming. 🕶️ 🌿 Biology: The Law of Genetic Diversity Nature settled this argument long ago. In 1845, Ireland concentrated its entire food system in one genetically identical crop — the Lumper potato. The system looked like peak performance. It was efficient, reliable, and data-validated. 🥔 When Phytophthora infestans (potato blight) arrived, the monoculture had no resistant variety to fall back on. Because every potato was genetically identical, the blight spread unchecked. One in eight Irish people died of starvation within three years. Two million emigrated. Evolutionary biology explains why with precision: populations with low genetic variation are far more vulnerable to changing environmental conditions than diverse populations. View A builds the organizational equivalent of a potato monoculture. It concentrates all critical work in the same small group — optimized for today's performance metrics, with no resistant variety to survive when conditions change. Biology does not call this efficiency. Biology calls it fragility. 🥀 🌍 Ecology: Biodiversity and Ecosystem Resilience Ecology confirms what biology proves: diverse ecosystems are resilient; monocultures collapse when conditions change. Peer-reviewed ecological research confirms that a system with greater biological diversity is more resilient than one with less. A monoculture plantation cannot withstand drought, insects, or disease because it has no variation to absorb the shock. 
Diverse ecosystems can absorb disruption, because when one species fails, another fills the function. 🌲 Organizations are ecosystems. View A plants a monoculture.

The financial cost of failing to diversify is now quantified: research published by BCG in Harvard Business Review shows that companies with above-average diversity achieved 19% higher innovation revenues and 9% higher EBIT margins. The diversity-performance relationship remains remarkably strong across all studied geographies. 📈

⚡ Physics: The Second Law of Thermodynamics

The Second Law of Thermodynamics states that the entropy of an isolated system tends to increase over time. Energy concentrated in one place dissipates. Systems optimized for a single state become brittle when disturbed from outside. 🌌

View A builds a thermodynamically closed system: all critical energy, all development opportunity, and all capability concentrated in a small group, optimized for current conditions. Closed systems in physics are fragile; they cannot absorb external shocks because they have no distributed capacity to reorganize. Open systems (those that distribute energy across multiple nodes) are thermodynamically stable. They do not collapse when a single node fails because the function is distributed. Physics has a name for what happens to systems that concentrate all their energy in one place: entropy. 💥

Five disciplines. One verdict.
Concentration produces short-term efficiency and long-term fragility.
Distribution produces short-term friction, and long-term resilience.

⚖️ PART 2B - THE BIAS VIEW A CANNOT SEE: WHO GETS EXCLUDED AND WHY IT COSTS YOU

View A makes a claim that sounds like pure meritocracy: assign critical work to those most likely to succeed. It is not meritocracy. It is the institutionalization of historical advantage, and three documented biases show precisely why. 🛑

🔴 Bias 1: The AI Learns From a Biased History

The AI is trained on past performance data.
What it cannot tell you is how much of that record reflects genuine capability — and how much reflects who was given the chance to perform in the first place. Amazon discovered this when its internal AI recruiting tool systematically downgraded resumes from women — not because women performed worse, but because historically men had been hired more often. The AI learned to replicate that pattern, forcing Amazon to scrap the tool entirely. 🤖 Research shows that up to 60% of a manager's performance rating reflects the manager's own biases, not the employee's actual output. Affinity bias — the documented tendency to rate people similar to yourself more favorably — is already invisibly encoded in every data point the AI learns from. When View A says "assign to the best performer," it is really saying: "assign to whoever the previous system already advantaged." That is compounding, not meritocracy. 🔄 🔴 Bias 2: The AI Cannot Measure What It Cannot See The AI measures visible, quantifiable output: task completion, speed, accuracy, and customer scores. It cannot measure what generates no data point: the analyst who mentors three junior colleagues, the relationship manager who quietly prevents a client escalation, or the specialist who catches a compliance issue before it becomes a regulatory disaster. The people doing invisible, distributed work that sustains organizational resilience are systematically scored lower and excluded from critical assignments. 🔍 🔴 Bias 3: The AI Excludes the Neurodiverse Population This is the bias View A never discusses, and it is the most commercially damaging of all. AI performance systems are designed around neurotypical working patterns: fast verbal responses, consistent communication cadence, visible social engagement, and standardized meeting participation. They reward the performance style of the majority and systematically score down employees who work differently. 
🧠 Research published in Springer Nature confirms that AI performance evaluation systems significantly increase the risk of misinterpretation and exclusion for employees whose interaction styles deviate from neurotypical norms. Neurodivergent employees — those with autism, ADHD, dyslexia, and dyspraxia — represent 15–20% of the global population. The commercial cost of this exclusion is heavily documented in banking. JPMorgan Chase's Autism at Work programme found that within the first six months, neurodivergent employees were 48% more productive than neurotypical peers who had been in the same role for three to ten years. View A's AI would never have assigned critical work to these employees because their performance scores, measured against neurotypical benchmarks, would not have qualified them. 🔬 PART 3 — THE RESEARCH AND THE PATTERN A landmark 2022 study by Ingrid Haegele documented talent hoarding — the organizational equivalent of View A — across thousands of firms: 75% of managers actively concentrate opportunity with favored employees 👥 In one-third of firms, workers keep internal job applications secret out of fear of manager retaliation 83% of top publicly listed companies cite talent concentration as a key organizational friction When manager rotations forced redistribution, promotion applications increased by 123% — and those newly surfaced candidates performed equally well or better 📈 The AI is not identifying the best person for the task. It is identifying the person with the best historical record in the current system. Those are not the same thing. 📈 PART 4 — SEVEN PROOF POINTS ACROSS INDUSTRIES 1️⃣ DBS Bank (The Banking Standard) In 2014, CEO Piyush Gupta made a decision no AI model would recommend. He restructured 26,000 employees as a "22,000-person startup" — deliberately distributing digital transformation ownership broadly to people with no historical track record in it. 
By 2019, DBS became the first bank ever to win World's Best Digital Bank from Euromoney, Global Finance, and The Banker simultaneously. Four consecutive years by 2024. A Harvard Business School case study. This is what View B looks like in banking. 🏦 2️⃣ Microsoft After 2014 (A Controlled Experiment) Before Satya Nadella, Microsoft operated on concentrated opportunity, producing a lost decade of missed innovation. Nadella introduced "Talent Talks" to review the entire talent pool across the organization. EVP Kathleen Hogan put it plainly: the goal was to see "the depth of our talent." Azure, Teams, and the era-defining OpenAI partnership immediately followed this shift to distributed capability. 💻 3️⃣ Gallup Research (Global Scale) Gallup's State of the Global Workplace research — spanning millions of employees across more than 160 countries — consistently finds that organizations where opportunity and development are distributed broadly dramatically outperform concentrated ones: 23% higher profitability, 18% higher productivity, and 43% lower turnover. 📊 4️⃣ Google 20% Time (Products the AI Would Have Rejected) Gmail, Google News, and AdSense did not come from top performers assigned to Google's most critical tasks. They came from engineers given distributed autonomy to explore adjacent problems. An AI task allocator would have flagged this as inefficient. Today, Gmail alone has 1.8 billion users. ✉️ 5️⃣ Amazon Two-Pizza Teams (Data-Driven Autonomy) Despite building the most sophisticated data-tracking mechanisms globally, Jeff Bezos intentionally structured Amazon to prevent opportunity concentration. The "two-pizza team" rule keeps units small and autonomous, while the "Working Backwards" process allows anyone with a great idea to write the foundational press release — not whoever has the best track record. AWS started as a small, distributed project. It now generates roughly 70% of Amazon's operating profit. 
☁️ 6️⃣ The New Zealand All Blacks (Sustained Excellence) With a 77% win rate across 125 years, their philosophy directly contradicts View A. Under rules like "Sweep the sheds," even the most decorated players are responsible for cleaning the changing rooms. The "no dickheads" principle means individual brilliance without team contribution is disqualifying, regardless of performance data. The team develops everyone, holds everyone accountable, and ensures no single departure can break the collective capability. 🏉 7️⃣ Pixar's Braintrust (Distributed Peer Challenge) Pixar's unbroken run of commercial hits relies heavily on its Braintrust — a peer review system where directors give candid feedback to each other's work regardless of track record. Toy Story, Finding Nemo, and Up all had fundamental narrative flaws that only distributed peer challenge resolved. Concentration removes the diverse feedback that makes a top performer's work better. View A damages everyone — including the people it thinks it is protecting. 🎬 🎙️ PART 5 — WHAT THE LEADERS ACTUALLY SAY The executives closest to AI-driven performance data are also the most vocal about why it cannot make this decision alone. 🗣️ Each of them uses AI extensively. None of them would let it decide who gets the next important opportunity. Because what AI measures — historical performance — is not the same as organizational potential. ✅ PART 6 — WHERE VIEW A HAS A NARROW HOME — AND WHY IT PROVES MY POINT Intellectual honesty requires admitting exactly where View A is not just defensible — it is the only rational choice. I know this because I have sat in the room when a Section 166 clock was running. In that moment of crisis, you do not distribute. You call the three people you trust most, clear the backlog, and satisfy the regulator. View A is correct in that room. The question I kept asking afterwards was: why did we only have three people we could call? That question is the entire argument for View B. 
One distinct scenario requires View A: a live regulatory emergency, a fixed SLA, an active regulator watching in real time, and zero tolerance for error. 🏦 NatWest — £264.8M Fine (2021) Following the FCA's first criminal prosecution under Money Laundering Regulations, a Section 166 Skilled Person Review was issued. NatWest correctly mobilized its most experienced AML specialists into a concentrated task force to clear the alert backlog immediately. But the years before — understaffing AML broadly, concentrating financial crime knowledge in too few people — made the emergency worse than it needed to be. Years of neglecting View B made View A necessary. 🏦 Barclays — Section 166 Review (2022) Under active regulatory scrutiny over rising KYC and AML case volumes, Barclays concentrated its top compliance experts on remediation to satisfy a fixed SLA. 🏦 HSBC — $1.92 Billion Fine (2012) Following a deferred prosecution agreement with the DOJ for monitoring failures on more than $670 billion in wire transfers, HSBC concentrated its premier compliance experts under a monitor to rebuild transaction systems on strict deadlines. The $1.92 billion was the cost of years of failing to build broad capability. View B should have prevented it. 📌 The Clean Rule Follow View A when all four conditions are true simultaneously: The failure is immediate The regulator or client is watching The error is irreversible in the short term There is no time for development When any one of those conditions is absent — distribute. Develop. Build the pipeline. In NatWest, Barclays, and HSBC: all four conditions were true. View A was right. In the operations organisation in this question — strategic projects, complex problem-solving, client presentations on 2–6 month horizons — none of those four conditions apply. View B is right. And in every one of those banking crises, the emergency that required View A was made worse by years of failing to apply View B first. 
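The Clean Rule above is a plain four-way AND, so it can be written down as a small check. This is only a sketch: the `Situation` class, its field names, and `recommended_view` are hypothetical names invented here for illustration, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """The four Clean Rule conditions (field names are illustrative)."""
    failure_is_immediate: bool
    regulator_or_client_watching: bool
    error_irreversible_short_term: bool
    no_time_for_development: bool


def recommended_view(s: Situation) -> str:
    """Return "A" only when all four conditions hold at once, else "B"."""
    conditions = (
        s.failure_is_immediate,
        s.regulator_or_client_watching,
        s.error_irreversible_short_term,
        s.no_time_for_development,
    )
    return "A" if all(conditions) else "B"


# A live Section 166 style emergency: every condition is true.
emergency = Situation(True, True, True, True)
# A strategic project on a 2-6 month horizon: none of them apply.
strategic_project = Situation(False, False, False, False)

print(recommended_view(emergency))
print(recommended_view(strategic_project))
```

The point of `all(...)` is that one missing condition is enough to flip the answer back to distribution.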
🛠️ PART 7 — HOW TO IMPLEMENT VIEW B WITHOUT SACRIFICING STANDARDS Distributing opportunity does not mean assigning critical work randomly. It means building a deliberate system that develops capability without compromising performance standards. 🏗️ Stage 1: Classify Work by Risk, Not Just Difficulty 📋 Tier 1 (Live Regulatory Emergency): Zero margin for error (e.g., active Section 166 breach). The AI recommendation or top expert assignment stands. This is View A's legitimate home. Tier 2 (Strategic Mid-Horizon): Major presentations or strategic projects on 2–6 month horizons. Assign to capable talent who have not yet led at this level; a top performer acts as a strategic mentor and safety net. Tier 3 (Development-Eligible): Complex problem-solving or client interactions where a slower outcome can be corrected without lasting damage. Assigned with structured oversight, explicit growth targets, and a named escalation contact. Tier 4 (Routine Operational): Competency-based rotational assignments. Stage 2: Mentor, Don't Just Delegate 🤝 Every Tier 2 and 3 assignment includes a named top performer explicitly accountable for overarching outcomes. It requires milestone reviews at 30, 60, and 90 days, clear escalation pathways, and genuine psychological safety to ask for help without career penalty. Stage 3: Make the Ladder Visible 🪜 Clearly articulate progression: "Successfully lead three Tier 2 projects, and you unlock Tier 1 opportunities." Without visible progression, distributed opportunity feels like uncompensated extra work. With it, it becomes the most motivating development tool in the organisation. Stage 4: Measure Resilience, Not Just Efficiency 📊 Track organizational resilience KPIs quarterly: Capability Breadth: How many unique individuals can execute each critical task category? Succession Depth: For each critical role, how many people are exactly 12 months from readiness? 
Key Person Dependency Index: What percentage of critical business outcomes flows through fewer than three individuals?

Stage 5: Review Depth, Not Just Performance 🔍
Hold leaders accountable during talent reviews not just for identifying current stars, but for evidencing the capability and readiness of talent sitting two tiers below them.

🎯 CONCLUSION: THE VERDICT IS CLEAR
The standard counter-argument to View B is that giving important work to unproven talent creates short-term friction. That cost is real. A capable person stepping up for the first time will take longer and require more oversight. But this friction is not a loss. It is a calculated investment. And the evidence says it pays every time.

Microsoft accepted that cost in 2014 and went on to pass $3 trillion in market value. DBS accepted it and became the world's best digital bank. Amazon accepted it with AWS and built a business generating 70% of its operating profit. JPMorgan Chase accepted it with Autism at Work and found those employees 48% more productive within six months. NatWest, Barclays, and HSBC refused to accept it, and collectively paid billions in regulatory fines when the pipeline was empty at the moment it mattered most.

When we look at what several separate disciplines tell us, they all arrive at the same place:
Mathematics showed 70 years ago that concentration risk destroys risk-adjusted returns, whether the asset is a bond or a person.
Psychology proved across four decades that concentrating opportunity destroys the three conditions (autonomy, competence, and relatedness) that sustain high performance.
Cognitive science shows that trusting AI progressively erodes the capacity to question it.
Biology proved in 1845 that monocultures are efficient until they are catastrophic.
Ecology and BCG research in Harvard Business Review quantify the cost of monoculture at 19% lower innovation revenues and 9% lower EBIT margins.
Physics tells us that concentrated, closed systems tend toward entropy.
And bias research proves that the AI is not identifying the most capable people — it is identifying the most legible ones, systematically excluding neurodivergent talent, invisible contributors, and everyone the previous biased system failed to surface. The AI in this scenario will keep recommending the same small group. Every assignment it makes to that group is simultaneously an assignment it is not making to someone who could become tomorrow's top performer — if only they were given the chance. Managers must distribute opportunities broadly. The AI should inform that decision — not make it. 🏆 View B is correct.
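Two of the Stage 4 resilience KPIs from Part 7 (Capability Breadth and the Key Person Dependency Index) can be computed from a simple mapping of task category to the people able to execute it. This is a rough sketch under assumptions: the names, the sample data, and the reading of "flows through fewer than three individuals" as "fewer than three capable people per task category" are all mine, not something defined in the post above.

```python
from typing import Dict, Iterable


def capability_breadth(capable: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Count the unique individuals who can execute each task category."""
    return {task: len(set(people)) for task, people in capable.items()}


def key_person_dependency_index(
    capable: Dict[str, Iterable[str]], threshold: int = 3
) -> float:
    """Percentage of task categories covered by fewer than `threshold` people."""
    if not capable:
        return 0.0
    dependent = sum(
        1 for people in capable.values() if len(set(people)) < threshold
    )
    return 100.0 * dependent / len(capable)


# Hypothetical coverage data for one operations team.
coverage = {
    "regulatory remediation": ["Asha", "Ben"],
    "client presentations": ["Asha", "Ben", "Chen", "Dina"],
    "complex problem-solving": ["Asha"],
    "strategic projects": ["Ben", "Chen", "Dina"],
}

print(capability_breadth(coverage))
print(key_person_dependency_index(coverage))
```

Tracked quarterly, a falling index would be the signal that distribution is actually building depth rather than just spreading workload.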
- May 18
- 20 replies
Should AI Be Allowed to Kill Bold Ideas?

Poornima_Gupta_aZ3h replied to Vishwadeep Khatri's topic in We ask and you answer! The best answer wins!

Position: View B. Organizations SHOULD pursue bold innovation despite AI warnings. Bex is correct, but his reasoning is incomplete. WHY BEX IS RIGHT Bex's core argument is sound: Over-reliance on AI risk assessment stifles innovation. Organizations need to pursue bold ideas even when they look risky in historical data. This is correct. WHY BEX IS INCOMPLETE Bex cites Amazon Prime as the example. But Bex doesn't explain the actual decision-making process that led to Amazon Prime succeeding while other bold ideas fail. The difference is not "ignore AI warnings." The difference is understanding what the AI is actually measuring, and when to override it. THE REAL PATTERN IN BOLD IDEAS THAT SUCCEED 📼 Netflix (2000): Blockbuster's risk analysis said Netflix was a niche player with negative unit economics. The analysis was correct about immediate facts. It was wrong about what customers actually valued. Customers valued "no late fees" and "no store trips" more than "immediate access." This wasn't a risk problem—it was a customer preference shift that data couldn't capture. 📦 Amazon Prime (2005): Risk analysis said free shipping would destroy margins. Jeff Bezos understood something the data didn't: customer lifetime value compounds. A customer locked into Prime buys more frequently and across more categories. The margin per transaction decreased. The margin per customer increased. The data measured the first. Bezos measured the second. 💳 Stripe (2010): Risk analysis said two teenagers couldn't compete against PayPal in a regulated industry. What the data missed: developers hated PayPal's integration complexity. They would switch to a simpler solution even from unknown founders. The data measured "incumbent market share." Stripe measured "developer frustration." 💬 Slack (2012): Risk analysis said a messaging tool from a failed gaming company couldn't penetrate enterprise software. 
What the data missed: teams were already using Slack internally because it solved a real problem: organizing chaotic communication. Enterprise adoption followed organic demand. The data measured "enterprise software success rates." Slack measured "organic team adoption." 🏦 DBS Bank (2014): Singapore's banking incumbents said investing $1B+ in digital disruption was "operationally risky and culturally impossible." CEO Piyush Gupta understood what the data didn't: fintech expectations were shifting. He restructured 26,000 employees as a "22,000-person startup." By 2024, DBS became "World's Best Digital Bank" (Euromoney, 4 consecutive years) and a Harvard Business School case study. 🚗 Tesla (2008): Regulators said "EV adoption is impossible without charging infrastructure. Range is too low. Cost is too high." Elon Musk understood what data couldn't predict: battery costs follow a learning curve (15% drop per doubling of production), regulatory bans on combustion engines were inevitable, and generational preferences for sustainability were irreversible. Tesla didn't wait for charging networks—they created the demand that made them adjacent possible. By 2023, Tesla's market cap reached $1.5T. 📱 ChatGPT (2022): Tech incumbents said "large language models are research tools, not products. Enterprise adoption is 5+ years away." What they missed: a simple chat interface solved the usability problem that made LLMs feel magical. OpenAI launched to consumers first, not enterprises. Within 2 months, ChatGPT had 100M users—faster adoption than any software in history. By the time data showed "consumer LLM adoption is real," OpenAI owned the category. WHAT THESE HAVE IN COMMON The successful bold ideas did not ignore risk. They understood three things: What customer frustration was being solved: Late fees, shipping costs, integration complexity, communication chaos Why incumbents couldn't solve it: Blockbuster's late fees generated $800M annually. Amazon's margins were sacred. 
PayPal was too focused on merchant relationships. Enterprise software was too focused on procurement cycles.
- Why this frustration would override other concerns: Customers would pay less (Netflix), take longer shipping (Prime), switch from unknowns (Stripe), and adopt organically (Slack) to solve the real problem.

These founders understood something AI risk analysis couldn't: what the customer actually valued versus what the incumbent could actually deliver.

WHY ORGANIZATIONS FAIL AT BOLD IDEAS
Organizations fail at bold ideas not because they took risks. They fail because:
- They don't understand the actual customer frustration
- They proceed without a thesis for why this will work
- They execute the bold idea without understanding why it solves the customer problem
- They abandon it when AI risk warnings appear

Example: McDonald's AI drive-thru ordering
Failed not because it was bold, but because:
- The real customer problem wasn't "how do I order" (they already have drive-thru systems)
- The AI ordering system was harder to use than existing systems
- McDonald's had no clear thesis for why replacing humans with AI would improve customer experience

It was a bold idea without understanding what problem it solved. That's different from Netflix or Stripe.

THE FRAMEWORK FOR PURSUING BOLD IDEAS
Before pursuing a bold idea despite AI warnings, leadership should confirm three things. Here's how winners and failures stack up:

COMPANY | REAL CUSTOMER FRUSTRATION? | DISPLACES INCUMBENT ADVANTAGE? | RISK JUSTIFIED BY VALUE CREATED? | OUTCOME
📼 Netflix | ✅ YES: Customers hate late fees & store trips | ✅ YES: Blockbuster's entire business model (stores + late fees = $800M/year) becomes irrelevant | ✅ YES: Mail delay risk is worth eliminating the pain point | ✅ SUCCESS
💳 Stripe | ✅ YES: Developers hate PayPal's integration complexity | ✅ YES: PayPal's relationship advantage disappears when code is simple | ✅ YES: Regulatory + market risk worth solving developer pain | ✅ SUCCESS
💬 Slack | ✅ YES: Teams hate fragmented, chaotic communication | ✅ YES: Enterprise software's procurement advantage irrelevant when teams already love it | ✅ YES: Adoption risk worth solving the chaos problem | ✅ SUCCESS
🏦 DBS Bank | ✅ YES: Customers frustrated with branch dependency | ✅ YES: Physical branch networks become irrelevant if digital works | ✅ YES: $1B+ investment risk worth digital transformation | ✅ SUCCESS
🚗 Tesla | ✅ YES: Consumers want sustainable transportation | ✅ YES: Gas car advantage disappears as EVs improve | ✅ YES: No charging infrastructure risk worth disrupting auto industry | ✅ SUCCESS
📱 ChatGPT | ✅ YES: Enterprise/developers frustrated with LLM complexity | ✅ YES: Enterprise software's complexity advantage gone with simple chat | ✅ YES: Free consumer launch risk worth creating the category | ✅ SUCCESS
🏨 Airbnb | ✅ YES: Travelers want cheaper, authentic local stays | ✅ YES: Hotel standardization irrelevant vs. unique experiences | ✅ YES: Regulatory risk worth enabling sharing economy | ✅ SUCCESS
🍔 McDonald's AI | ❌ NO: Drive-thru ordering already works fine | ❌ NO: Doesn't displace food quality, speed, or consistency | ❌ NO: AI complexity isn't worth replacing humans for this | ❌ FAILED

The pattern is clear: All winners answered YES to all three questions. The failure answered NO to all three.

WHY AI RISK ASSESSMENT IS STRUCTURALLY BLIND TO BOLD IDEAS
AI risk assessment measures historical patterns and answers: "How likely is this to succeed based on past similar efforts?" For category-disrupting ideas, this is the wrong question entirely.
The real question is: "Will solving this customer frustration create enough value that the risk is worth taking?" Netflix's question wasn't "Is mail distribution risky?" (it is). The question was "Will eliminating late fees create more value than the operational risk?" AI answered correctly about the risk. It measured the wrong value. The Adjacent Possible Reveals the Gap Innovations expand the space of possibilities itself. The "Adjacent Possible" (Stuart Kauffman, 2024) describes what becomes possible only after someone creates something new. Tesla's charging infrastructure didn't exist in 2008 data, so AI rejected it as a blocker. But Tesla created the demand that made charging networks adjacent possible. By the time data showed "charging infrastructure exists," Tesla owned the market. Netflix, Stripe, and Slack followed the same pattern—each expanded what was possible before historical data could validate them. The Fundamental Limitation AI cannot use past data to predict the expansion of possibility space itself. It's not a flaw in AI; it's a fundamental limitation. AI risk assessment is structurally blind to category disruptions because it can only measure what was. It cannot measure what becomes possible when someone creates something new. Organizations that dominate 2026 will be those whose leaders understand this: they pursue bold ideas not because they ignore risk assessment, but because they recognize that assessment measures the wrong frontier. MY POSITION View B is correct. Organizations SHOULD pursue bold innovation despite AI warnings. The organizations that dominate 2026 are those that pursue when leadership has conviction. Netflix, Tesla, Stripe, Slack all moved while the AI said no. They didn't wait for the data to validate them. They moved because they understood the customer problem. The AI is measuring yesterday. Bold ideas require conviction about tomorrow. Bex is right. Don't let AI veto bold ideas.
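The Tesla example above leans on a learning curve ("15% drop per doubling of production"). As a worked illustration of what that rate implies, here is a minimal sketch. The 15% figure is the one quoted in the post; the starting cost of 100 and the volumes are hypothetical, and `unit_cost` is a name made up for this example.

```python
import math


def unit_cost(
    initial_cost: float,
    initial_volume: float,
    cumulative_volume: float,
    drop_per_doubling: float = 0.15,
) -> float:
    """Unit cost after cumulative production grows, on a constant learning rate.

    cost = initial_cost * (1 - drop_per_doubling) ** log2(volume ratio)
    """
    if initial_volume <= 0 or cumulative_volume <= 0:
        raise ValueError("volumes must be positive")
    doublings = math.log2(cumulative_volume / initial_volume)
    return initial_cost * (1.0 - drop_per_doubling) ** doublings


# Hypothetical starting point: cost index of 100 at 1 unit of volume.
for volume in (1, 2, 4, 8):
    print(volume, round(unit_cost(100.0, 1, volume), 2))
```

After three doublings the cost index is about 61, since 0.85 cubed is roughly 0.614. A model trained only on the starting cost sees the 100, not the curve.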
- May 14
- 13 replies
Rare but Critical — Should AI Remove the Safeguard?

Poornima_Gupta_aZ3h replied to Vishwadeep Khatri's topic in We ask and you answer! The best answer wins!

Position: Retain the Approval Step — View B The AI in this scenario has done its job perfectly. It found the delay. It found the 1% intervention rate. It found the catastrophic consequence in those rare cases. It reported everything. The dilemma is not an AI failure. It is a human decision-making failure waiting to happen. The organisation is now looking at accurate data and considering the wrong conclusion. They are reading a 99% confirmation rate and seeing an unnecessary step. They should be reading a less than 1% catastrophic prevention rate and seeing an irreplaceable safeguard. Same data. Completely different categorisation. Completely different outcome. This is the real root cause. Not a measurement error. Not a design failure. A risk categorisation failure — made by humans, not the AI. To understand why that categorisation failure matters — and why it changes everything — you need to understand the difference between two fundamentally different types of risk. The Risk Categorisation Framework High-frequency low-consequence risks — credit card fraud, customer service errors, data entry mistakes — should be managed for speed and volume. Getting it wrong occasionally is acceptable and recoverable. View A works here. Low-frequency high-consequence risks — severe misdiagnosis, drug approval failures, nuclear safety, aircraft structural integrity — must never be managed for frequency. Getting it wrong once can be catastrophic and irreversible. View B is non-negotiable here. The approval step in this scenario exists entirely to manage a low-frequency high-consequence risk. That categorisation should have been defined by humans before any conclusion was drawn from the AI findings. It was not. The AI therefore presented accurate data that was misread through entirely the wrong lens. "The AI reported everything correctly. The organisation is about to conclude everything wrongly. 
That gap — between accurate data and sound judgement — is where the risk categorisation failure lives. And it is entirely a human problem." Banking Learned This the Hard Way — Barings Bank In February 1995, Barings Bank — Britain's oldest merchant bank, founded in 1762 — was sold for £1 and ceased to exist overnight. Nick Leeson had been given the dual role of managing both the trading floor and the settlements division — a clear violation of standard banking procedure. This concentration of power allowed him to bypass checks and balances entirely, creating fictitious trades and hiding losses from management. The approval step — segregation of duties — was effectively removed for a star performer. The step would have changed nothing in the vast majority of trades. No one in management accepted responsibility for Leeson's activities between October 1993 and January 1995. Then the 1% arrived. 233 years of history gone in weeks. Nobody categorised Leeson's trading oversight as a low-frequency high-consequence safeguard before removing it. They read the data — a step that rarely changed outcomes for a consistently profitable trader — and drew the wrong conclusion from accurate information. The categorisation failure cost a 233-year-old institution its existence. Every High-Consequence Industry has This Story — NASA Challenger Barings is not an isolated case. Every high-consequence industry has its version — the moment a rarely-triggered safeguard was bypassed in the name of speed and the rare event arrived. On 28 January 1986, Challenger broke apart 73 seconds after launch. Seven crew members were killed. Engineers at Morton Thiokol had formally flagged the O-ring risk the night before and recommended delay. The risk had never caused a catastrophic failure before. The data was accurate — the O-ring had performed without incident in the vast majority of launches. NASA managers read that data and drew the wrong conclusion. 
The Rogers Commission identified the failure as normalisation of deviance — the gradual acceptance that because the rare catastrophic event has not happened yet, it probably will not. The O-ring risk had never been formally categorised as low-frequency high-consequence before the launch decision was made. It was treated as a manageable operational concern by people who had accurate data and reached a catastrophically wrong conclusion. That single categorisation failure cost seven lives. Three industries. Three warnings. Three times humans read accurate data and drew the wrong conclusion. Three times the rare event arrived. Zero times the damage could be undone. The Healthcare Warning Is Already Playing Out — UnitedHealth nH Predict This is not hypothetical. It is in federal court right now. And it is the closest direct parallel to the scenario in this question. UnitedHealth deployed an AI model called nH Predict to evaluate patient care claims. A 2023 lawsuit alleged the company knowingly used this model to deny elderly Medicare Advantage patients care that their own physicians had determined was medically necessary — and that the AI model had a 90% error rate. Nine out of ten denials that were challenged were ultimately reversed. Yet the system continued to override physician judgement at scale. UnitedHealthcare's post-acute care denial rate more than doubled — from 8.7% to 22.7% — between 2019 and 2022, coinciding directly with the rollout of their algorithmic tool. Elderly patients discharged prematurely. Families depleted savings. Patients worsened and died. UnitedHealth gave an AI system authority over clinical decisions without categorising those decisions as low-frequency high-consequence risks requiring human expert oversight. The outcome is a Senate investigation, a federal lawsuit, and irreversible patient harm. This is what happens when accurate data meets uncategorised risk. The AI reported what it found. The humans drew the wrong conclusion. 
The patients paid the price.

When One Specialist Got the Risk Categorisation Right: Thalidomide and Frances Kelsey

History also shows what happens when a single specialist holds the line. This is the most powerful healthcare example available. In the 1950s and 60s, thalidomide was approved across Europe and prescribed to pregnant women without adequate specialist review of rare but catastrophic side effects. Over 10,000 children were born with severe birth defects across 46 countries. In the United States, a single FDA specialist reviewer named Frances Kelsey refused to approve it. She was seen as causing unnecessary delay for a drug that appeared safe in the vast majority of cases. She was pressured repeatedly to remove her objection and speed up the process. She refused. The United States was largely spared.

Frances Kelsey did not have better data than the European regulators. She had better categorisation. She recognised that drug approval for pregnant women was not a high-frequency low-consequence process. It was a low-frequency high-consequence decision where the rare catastrophic outcome was irreversible. Same data available to everyone. One person categorised the risk correctly. An entire country protected from an irreversible catastrophe.

This is not an argument about bureaucracy slowing progress. This is an argument about one person with specialist expertise standing between a population and an irreversible outcome. That is exactly what the approval step in this scenario represents.

When Risk Is Categorised Correctly: Design Follows Automatically

The Four Eyes Principle in banking is the most powerful proof that correct risk categorisation leads directly to correct design. Every major bank (NatWest, HSBC, Barclays, Deutsche Bank) correctly categorised large transactions and critical approvals as low-frequency high-consequence risks decades ago.
The design response to that categorisation was immediate and permanent — no single person can initiate and approve a critical transaction. Two independent pairs of eyes on every decision that carries catastrophic potential. The true value lies not just in catching errors, but in creating an environment where accuracy becomes embedded in organisational culture. Nobody has questioned this design since. Not because it catches problems frequently. But because the categorisation that created it has never changed. Large financial transactions remain low-frequency high-consequence risks. The design therefore remains permanently in place. This is the sequence the healthcare organisation in this scenario has reversed. They looked at the design — the approval step — and questioned whether it was necessary. They should have looked at the risk category first. Had they correctly categorised the specialist approval step as a low-frequency high-consequence control — as every bank does with the Four Eyes Principle — the design conclusion would have been automatic. You do not remove low-frequency high-consequence controls. You protect them. And when they are slow you redesign them to be faster. You never remove them. The specialist approval step in this scenario is the medical equivalent of the Four Eyes Principle. Not bureaucracy. A structural design response to a correctly categorised risk — and the last line of defence for a category that demands it. To Be Fair — When Does View A Actually Work? View A is not always wrong. Banking proves it on both sides — and the distinction is exactly what makes the healthcare case so clear. When you swipe your card at a grocery store, approval takes 200 milliseconds. The manual referral step was removed entirely. View A is correct there — because the risk has been correctly categorised as high-frequency low-consequence. The delay causes measurable harm to commerce. 
The consequences are fully recoverable — a fraudulent charge reversed with one click under Zero Liability policies. And alternative post-transaction safeguards catch catastrophic fraud after the fact. View A's one valid point in this scenario is the 8 to 10 hour delay. That is genuinely harmful to patients. It deserves a direct response. The answer is not removal. The answer is redesign. Go back to the scenario for a moment. A senior specialist approval step adds 8 to 10 hours to a treatment decision. The AI has correctly identified that delay as harmful. But look at what is actually causing those 8 to 10 hours. It is not the specialist. It is everything that happens before the specialist sees the case. The case notes gathered manually. The patient history retrieved separately. The frontline doctor's findings written up and passed across. The specialist starting from scratch on context that AI could have assembled in seconds. The specialist is not the problem. The information gap before the specialist sees the case is the problem. Use AI to triage which cases genuinely need specialist review based on complexity and risk markers. Use AI to pre-summarise the patient case, surface relevant history, and flag historical misdiagnosis patterns before the specialist opens the file. The 8 to 10 hour delay becomes a targeted 90-minute review for the cases that warrant it. The safeguard is retained. The speed problem is solved. Both at the same time. DBS Bank validated this principle in a different context. Rather than removing human oversight from 250,000 monthly customer interactions they built AI to make the human faster and better informed. The human stayed in control. Speed improved dramatically. The safeguard was not removed. It was redesigned. Healthcare can and should apply exactly the same logic. This is why banking uses View A for coffee and groceries but View B for global wire transfers, corporate lending, and Mergers and Acquisitions. 
The categorisation determines the approach. Always. In healthcare there is no Zero Liability policy. There is no reverse button for a severe misdiagnosis. We are dealing with biological systems, not digital ledgers. You cannot call the patient the next day and tell them the error has been credited back to their account. View A works for the £50 transaction because you can fix it later. View B is for the 1% event where later is too late. Final Verdict The approval step must be retained. Not streamlined. Not reviewed. Not reduced. Retained — because it exists precisely for the moment when everything else has already passed the case and got it wrong. Barings Bank. Accurate data on a star performer's trading. Wrong conclusion drawn. A 233-year-old bank destroyed overnight. NASA. Accurate data on O-ring performance history. Wrong conclusion drawn. Seven lives lost. UnitedHealth. Accurate AI analysis of claims data. Wrong conclusion drawn. Senate investigation. Federal lawsuit. Irreversible patient harm. Frances Kelsey. The same data as every European regulator. Correct conclusion drawn. An entire country spared. Four cases. Same quality of data. One variable. Whether the humans reading it correctly categorised the risk. The approval step in this scenario is not a bottleneck. It is the O-ring. And we already know what happens when you decide the O-ring is not worth the delay. One question — and only one — before removing any critical control: "What category of risk does this safeguard exist to manage — and what is the consequence if it fires and nothing is there?" In banking — a 233-year-old institution destroyed overnight. In space — seven lives lost because a cold morning felt manageable. In healthcare — a patient receives the wrong treatment and cannot be made whole again. In drug approval — 10,000 children harmed across 46 countries because one country categorised the risk correctly and 45 others did not. The AI gave you the data. The categorisation is yours to make. 
Categorise it wrong and the rare event will arrive. It always does. And in healthcare — unlike banking, unlike digital ledgers, unlike a fraudulent charge reversed with one click — there is no undo.
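The categorisation rule argued for above (decide by consequence, not by how often the control fires) can be sketched in a few lines. This is an illustration only: the `Control` class, its fields, and `should_retain` are hypothetical names, and the intervention rates are placeholders rather than measured figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """A safeguard described by how often it fires and what it protects."""
    name: str
    intervention_rate: float     # share of cases where the step changes the outcome
    worst_case_reversible: bool  # can the worst outcome be undone afterwards?


def should_retain(control: Control) -> bool:
    """Retain (and redesign for speed) any control guarding an irreversible outcome.

    The intervention rate is deliberately not consulted: a control that
    rarely fires can still be the only thing between a case and a
    catastrophe that cannot be undone.
    """
    return not control.worst_case_reversible


# Placeholder figures for the two ends of the spectrum discussed above.
card_referral = Control("manual card referral", 0.01, worst_case_reversible=True)
specialist_approval = Control("specialist approval", 0.01, worst_case_reversible=False)

for control in (card_referral, specialist_approval):
    decision = "retain and redesign" if should_retain(control) else "optimise for speed"
    print(f"{control.name}: {decision}")
```

Both controls carry the same 1% rate here on purpose: the rate is identical and the decision still differs, which is the whole argument in miniature.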
- May 6
- 12 replies
Efficiency Up, Experience Down — Should AI Win?

Poornima_Gupta_aZ3h replied to Vishwadeep Khatri's topic in We ask and you answer! The best answer wins!

Position: Reject or Rethink the Change (View B)

This is a classic case of the AI efficiency versus customer experience trade-off, and it is one of the most common mistakes organisations make when deploying AI for the first time. This organisation measured three things: handling time, cost per interaction, and cases per day. None of those three metrics measures whether the customer actually got what they came for. That is the entire problem.

Why the Efficiency Gains Here Are an Illusion, Not a Win

Imagine you run a bakery. You find a way to serve customers 30% faster. But the bread tastes worse, customers feel rushed, and half of them stop coming back. Did you win? No. You just moved the cost from the counter to the empty shop. That is exactly what is happening here.

The core error in View A is treating cost per interaction as a profit driver when it is actually a cost-deferral mechanism. When first-contact resolution drops and satisfaction falls 8–10%, the organisation is not saving money. It is moving costs downstream. Customers who feel rushed and misunderstood do not disappear. They call back, escalate to supervisors, churn, and tell others. Every single one of those outcomes costs more than the handling time saved.

This is called the efficiency-satisfaction trap: internal numbers look better while the outcomes customers actually care about quietly get worse, until the real cost shows up in churn, complaints, and lost revenue.

The most damning number in this scenario is not the 8–10% satisfaction drop. It is the first-contact resolution decline. That one number tells you everything. Customers are not getting their problems solved. They are calling back. Every repeat call costs more than the time saved on the first one. The efficiency gain is already negative; the organisation just has not counted it yet.

Real-World Evidence from Banking, Where I Work

1. NatWest Cora+: The Closest Mirror to This Scenario

NatWest launched a virtual assistant called Cora back in 2017. Sound familiar? It handled routine queries quickly, but customers felt like they were talking to a wall. Interactions felt cold, transactional, and unhelpful. The system pointed people to existing content instead of actually solving their problem. Efficiency was there. The customer experience was not.

Rather than shrugging and moving on, NatWest stopped and rethought the entire design. In 2024 they rebuilt Cora into Cora+, powered by generative AI and IBM watsonx technology. The new version actually understood what customers were asking and answered them directly in plain language. The guiding principle was beautifully simple: when someone is worried about their money, they need to feel understood, not processed.

The results spoke for themselves. Customer satisfaction improved by 150%. Human intervention dropped. Efficiency went up. Experience went up. Both at the same time. NatWest also saw Customer Lifetime Value double and Net Promoter Score triple. These are not soft feel-good numbers but hard revenue outcomes. By H1 2025 they had deployed 24 more AI models, all built around the same idea: better experience first, efficiency as the reward.

The lesson: you do not have to choose between efficiency and experience. Fix the experience and efficiency follows.

2. DBS Bank: The Most Direct Comparison to This Scenario

DBS Bank in Singapore faced the exact same challenge described in this question: how to handle over 250,000 customer service calls every single month without losing quality. Their answer was not to make conversations shorter. It was to make agents smarter.

They built a Gen AI tool called CSO Assistant, a live co-pilot that sits alongside the agent during every call. It listens, transcribes in real time, searches the knowledge base instantly, and surfaces the right answer while the conversation is still happening. The agent stays in control. The customer still feels heard. And the problem gets solved faster because the agent is not scrambling to find information.

Pilots showed transcription and solutioning accuracy of nearly 100% and call handling time reduced by up to 20%, and close to 90% of customer service officers said the tool had a positive impact on their work. Across all its AI initiatives in 2024, DBS delivered SGD 750 million in economic value, more than double the previous year.

The design difference is everything. DBS reduced time without removing the human. The scenario we are discussing removed the human without solving the problem. That single decision is what separates success from failure.

3. Bank of America Erica: The Gold Standard

Bank of America had a simple rule when building their AI assistant Erica: start with what the customer needs, not what is easiest to automate.

Erica has now handled over 2.5 billion customer interactions with a 98% success rate. Customers either get their answer from Erica or are passed smoothly to a human. No dead ends. No frustration. Bank of America was ranked the most satisfying mobile banking app of any national bank.

The sequencing lesson here is critical. Experience and revenue came first. Efficiency came second. That is the right order. That is why it worked.

The Real Issue: The Scorecard Was Wrong From the Start

Here is a simple truth. The metric you choose to measure determines the result you will get. If you measure how fast you close a call, you will get fast call closures. If you measure whether the customer actually got what they needed, you will get satisfied customers. The organisation in this scenario chose speed. They got speed. And they lost trust.

HSBC shows what good metric design looks like, twice.

1. When HSBC used AI to transform their KYC onboarding process, they did not just measure how fast documents were processed. They measured accuracy alongside speed. The result: processing time dropped from 12 days to under 24 hours while accuracy jumped from 87% to 99%. Both improved because both were measured from day one.

2. Most banks are still drowning in false-positive alerts from their anti-money-laundering (AML) systems, flagging innocent customers and wasting thousands of investigator hours chasing nothing. When HSBC partnered with Google Cloud on its AML system, it measured what actually mattered: how accurately real criminals were being caught, how much time investigators spent on genuine cases, and how many innocent customers were being unnecessarily disrupted. The system identified two to four times more real suspicious activity while cutting false alerts by 60%. Investigators focused on actual crime. Innocent customers faced fewer unnecessary checks. Everyone won.

HSBC built the scorecard before they built the system. The organisation in this scenario did it the other way around. They measured what was easy to count, speed and cost, and got exactly those things. The problem was never the AI. The problem was the scorecard.

To Be Fair: When View A Can Actually Work

I do not broadly support View A. But a strong argument acknowledges the other side, because showing when something works makes it clearer why it fails here.

Amazon is the textbook example. When Kiva robots rolled into fulfillment centers in 2012, things got worse before they got better. Delivery updates became impersonal. Handling exceptions became harder. But picking efficiency jumped by over 50% and costs per order fell sharply. Here is the crucial part: Amazon did not keep those savings. They reinvested every penny into building next-day and same-day delivery. Today, fast reliable delivery is the number one reason customers love Amazon. The short-term experience dip funded a permanent experience upgrade.

Lloyds Banking Group shows the same thinking applied in banking. Lloyds automated data entry, transaction processing, and basic back-office inquiries, tasks customers never see or feel. Nobody notices whether a human or a machine processed their data in the background. What customers noticed was faster, more accurate service. The efficiency metric and the satisfaction metric pointed in exactly the same direction because Lloyds chose the right processes to automate.

But this only works when all four conditions are true:

- Efficiency gains are reinvested into customer experience, not kept as profit
- The experience decline is temporary and fixable, not permanent
- Customers have little reason to switch during the difficult period
- AI is applied to routine back-office tasks, not emotionally sensitive conversations where trust matters

In the scenario presented, every single one of these four conditions fails. The savings were not reinvested. The satisfaction decline is not a temporary blip; it is a structural signal that customers consistently feel unheard. Banking customers who lose trust do switch, and winning them back costs far more than any handling time saving ever delivered. And most critically, the AI was placed in exactly the wrong conversations: the moments when someone is worried about their money and needs a human being who actually listens. That is not one mistake. That is four.

Final Verdict

The change should not be accepted. Not because efficiency does not matter. It absolutely does. But because this organisation has not yet earned the right to claim it.

The banks that get AI right (NatWest, DBS, Bank of America, HSBC) all made the same decision before they wrote a single line of code. They decided what success actually looked like from the customer's point of view. They built the scorecard first. Then they built the system.

- NatWest asked: does the customer feel understood?
- DBS asked: does the agent have everything they need to help?
- Bank of America asked: does the customer get what they came for?
- HSBC asked: are we catching criminals or just closing alerts?

The organisation in this scenario asked: how fast can we close the call? That one question, measured alone, produced exactly the outcome described. Faster calls. Lower costs. Unhappy customers. Declining trust.

Fix the question you are measuring and you fix the outcome. That is the lesson. That is the only lesson.
- May 4
- 14 replies
