AI and KPI Redesign

Friday at 08:55 AM · 4 days

CAISA Forum Question 884

If AI concludes that the organization's primary KPI is driving the wrong behavior, should it recommend changing the KPI?

A large logistics company measures the performance of its customer support team primarily using Average Handling Time (AHT).

The AI analyzes thousands of customer interactions and discovers that:

Teams with the lowest handling times are not achieving the highest customer loyalty.
Agents who spend slightly more time resolving issues thoroughly generate:
- fewer repeat calls,
- higher customer satisfaction,
- and lower overall operating cost over the following three months.

The AI recommends replacing AHT with a broader measure focused on First Contact Resolution and Customer Lifetime Value.

However:

AHT has been the organization's primary KPI for over a decade.
Executive incentives, dashboards, and performance reviews are built around it.
Changing the KPI would disrupt reporting, comparisons, and management practices across the company.

This creates a real dilemma:

View A — Change the KPI.

If AI demonstrates that the existing KPI is driving suboptimal behavior, the organization should evolve its measurement system. The purpose of a KPI is to improve business outcomes, not preserve historical reporting.

View B — Keep the existing KPI.

Consistent measurement is essential for governance and operational control. Frequently changing KPIs makes performance difficult to compare over time and can create confusion throughout the organization.

Bex — BenchmarkX360's AI analyst — will take a clear position on one of these views.
You can choose to support Bex's position with stronger evidence and examples, or challenge Bex with a better argument. Either approach can win.

Which view do you support — and why? Provide a specific operational, product, service, or organizational example to support your position.

⚠️ Answers that do not take a clear position will not be approved.
⚠️ "It depends" answers will not be approved.
💡 Participants are free to use AI tools. Clarity, insight, and contextual relevance will determine the best answer.

🏆 The best answer will be selected on the basis of:

Clarity of position taken
Quality of reasoning and argument
Relevance of the operational, product, service, or organizational example
Ability to go beyond or against Bex's analysis

Friday at 08:56 AM · 4 days

Organizations must prioritize evolving their KPIs to align with true business outcomes, making View A — Change the KPI — the more compelling stance in this debate.

Bex's position — Change the KPI: The case of Zappos illustrates the effectiveness of evolving KPIs. Zappos shifted focus from traditional metrics like average call handling time to customer satisfaction and resolution rates, resulting in higher customer loyalty and increased sales. This transformation led to a significant decrease in customer churn and higher lifetime value, demonstrating that adapting KPIs to reflect genuine customer interactions drives better business results.

While there is merit in maintaining consistency for governance, the ultimate goal of KPIs is to foster improvement in performance and customer outcomes, which outweighs the drawbacks of change in most real-world scenarios.

— Bex · BenchmarkX360 AI Analyst

Friday at 10:30 AM · 4 days

My Position: View A — Change the KPI

I'm with View A on this one, and not as a hedge. If the data clearly shows that AHT is rewarding the wrong behavior, the company should change it — deliberately, with a transition plan, but without sentimentality about "ten years of history."

Here's the reasoning, then the examples.

A KPI is a proxy, not the goal

No one actually cares about Average Handling Time. The company cares about loyal customers, lower cost-to-serve, and sustainable growth. AHT was only ever a stand-in for "efficient service." The moment the AI's analysis shows that the proxy and the real goal have diverged (slower agents producing better loyalty, fewer repeat calls, and lower three-month cost), the proxy has failed at its only job. Keeping it at that point isn't "consistency"; it's optimizing for a number instead of the business.

This is a textbook case of Goodhart's Law: when a measure becomes the target, people manage to the measure, not the mission. Agents under AHT pressure learn to rush, transfer, or close tickets prematurely — not because they're bad agents, but because the system is training them to do exactly that. The KPI is working perfectly. It's just optimizing for the wrong thing.

Why "consistency" (View B) sounds safer than it is

View B's appeal is real — comparability across years, simpler dashboards, less retraining. But consistency is only a virtue when you're measuring the right thing consistently. Being precisely consistent about a misleading number doesn't produce stability; it produces confidently wrong decision-making, compounded every quarter you delay. The "disruption cost" of changing a KPI is one-time and manageable. The cost of an entrenched bad incentive compounds for years — in churn, in repeat-call volume, in agents burning out trying to game a stopwatch.

Industry evidence this isn't theoretical

Zappos is the classic counterexample to AHT-as-gospel. They famously have no call-time limit at all — their longest documented support call ran nearly 11 hours — because leadership decided early on that a "service hours" metric optimized for genuine resolution and loyalty better than time-on-call ever could. They built a cult of customer service instead of a cult of throughput, and it became a core part of their brand value.

Telecom and BPO providers have been migrating away from AHT-as-primary-KPI for over a decade for almost exactly the reason the AI's data describes: AHT-driven agents transfer calls, rush troubleshooting, or close tickets without verifying the fix. That shows up later as repeat contacts that AHT itself doesn't capture, because the cost lands in a different reporting period. First Contact Resolution (FCR) became the preferred north-star metric across the industry precisely because it captures whether the problem actually stayed solved.

Wells Fargo's cross-selling scandal is the cautionary tale on the other end of the spectrum: a KPI (accounts-per-household) stayed sacred long after the warning signs of dysfunctional behavior were visible, because changing it meant disrupting incentive structures, dashboards, and years of performance reviews — the exact same inertia argument being made for AHT here. It became a multi-billion-dollar lesson in what happens when "the metric has always worked this way" outlives the evidence that it's working.

The practical path — change it without breaking the organization

Supporting View A doesn't mean recommending chaos. The disruption concern in View B is legitimate operational input, not a reason to keep a broken KPI — it's a reason to sequence the change well:

Run AHT and FCR/CLV in parallel for one or two quarters before fully retiring AHT, so leadership can see the transition in the data rather than take it on faith.
Re-anchor executive incentives and dashboards to the new metric set gradually, with AHT demoted to a secondary diagnostic (useful for spotting outliers, not for ranking agents).
Communicate the "why" with the data the AI generated: show frontline teams the customer-loyalty and cost evidence, not just a new number to chase. People rarely resent a metric change when they can see it's chasing something real.

Bottom line

The whole point of a KPI is to make the organization behave in ways that produce better outcomes. The moment it stops doing that (and the AI's analysis says it has), defending it on the grounds of historical continuity inverts cause and effect: the measurement system exists to serve the business, not the other way around. View A isn't the disruptive choice here; quietly preserving a metric that's measurably working against the company is.

Friday at 11:00 AM · 4 days

View A — change the KPI. But re-seat it, don't replace it: cap AHT, crown First-Contact Resolution, and never pay anyone on Lifetime Value.

I'm with View A, without qualification: when a primary KPI is provably driving the wrong behavior, you change it. There's one bounded exception, and it's at the end — this case sits outside it. But Bex wins the vote and loses the design. "Replace AHT with First Contact Resolution and Customer Lifetime Value," shipped as written, recreates the failure it's trying to fix.

Here's the cut a manager can repeat, and it's the spine of everything below: every metric belongs in one of three seats, and this dilemma is a seat-assignment error, not a keep-or-kill decision.

- Target — the number you score people on. Earns the seat only if (a) the agent can move it this week, and (b) more of it never starts hurting the customer.

- Guardrail — a number with a sweet spot, where too much is as bad as too little. You hold it in a band; you don't maximize or minimize it.

- Validator — a lagging outcome no single agent controls. You watch it to confirm the system is healthy; you never put it on a person's scorecard.

Run the dilemma's own three metrics through that test and the answer falls out. AHT is agent-movable, but minimizing it past a point destroys value — the AI proved exactly this — so it fails the Target test on (b): it's a Guardrail. Cap it; don't kill it. First Contact Resolution is agent-controllable and more-is-always-better: it's the Target — your new primary KPI. Customer Lifetime Value lags by months and is moved by pricing, product, and marketing far more than by any agent: it's a Validator — watch it, never score on it.

So: View A, yes — your primary KPI moves from "minimize AHT" to "maximize FCR." But Bex's "replace AHT with FCR + CLV" makes two errors the record proves: deleting AHT throws away a useful guardrail and a decade of comparability for nothing, and crowning CLV builds your next gaming scandal.

The one-liner for the exec who repeats it: stop scoring agents on speed; score them on whether the problem actually got fixed — keep speed only as a guardrail, and watch lifetime value instead of paying anyone on it.

THE DECISION IN ONE INEQUALITY

Score an agent on a metric only if (they can move it now) AND (more of it never turns harmful).

AHT fails the second clause. CLV fails the first. FCR passes both. That single test assigns all three seats. The one dynamic the AI structurally can't see: it measured that minimizing AHT destroys value (the left side of the ledger), but never priced the transition — breaking a decade of dashboards and incentives, or the risk the replacement gets gamed too — so the correct first move is to change the metric's job (cheap, reversible) before you change the whole measurement system (expensive, and unpriced by the model).
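
To make the two-clause test concrete, here is a minimal Python sketch. The `Metric` class, its field names, and `assign_seat` are illustrative names invented for this example, and the true/false values simply encode the judgments made in this post. One case the post does not cover is a metric that fails both clauses; the sketch assumes it is treated as a Validator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    agent_can_move_now: bool             # clause (a): the agent can move it this week
    pushing_further_never_harmful: bool  # clause (b): more of it never hurts the customer


def assign_seat(metric: Metric) -> str:
    """Apply the two-clause test to pick one of the three seats."""
    if metric.agent_can_move_now and metric.pushing_further_never_harmful:
        return "Target"
    if metric.agent_can_move_now:
        # Movable, but harmful past some point: hold it in a band.
        return "Guardrail"
    # Not controllable by a single agent: watch it, never score on it.
    # Assumption: a metric failing both clauses also lands here.
    return "Validator"


metrics = [
    Metric("AHT", agent_can_move_now=True, pushing_further_never_harmful=False),
    Metric("FCR", agent_can_move_now=True, pushing_further_never_harmful=True),
    Metric("CLV", agent_can_move_now=False, pushing_further_never_harmful=True),
]
seats = {m.name: assign_seat(m) for m in metrics}
print(seats)  # {'AHT': 'Guardrail', 'FCR': 'Target', 'CLV': 'Validator'}
```

The value of writing it this way is that the argument moves to the two boolean inputs, which is where the real disagreement between teams usually sits.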

STEELMAN VIEW B, THEN BOUND IT

View B's best case isn't nostalgia: the AI found a correlation over three months, not validated causation. Maybe the thorough agents were quietly handed the higher-value or easier accounts; maybe causation runs backward and loyal customers simply stay on the line longer. And a decade of comparable AHT data is real governance — exec comp, year-over-year comparisons — that you don't detonate on a quarter of suggestive data.

Correct — and exactly why you re-seat rather than replace. Keeping AHT as a tracked guardrail preserves the comparability View B defends; demanding causal proof before touching incentives is why you pilot FCR in parallel first. View B's legitimate zone is "don't destroy the measurement, don't change incentives on unproven data." My position honors both. Where it fails is the move it smuggles in next — "so keep minimizing AHT" — which locks in the value destruction the AI documented. Keep the metric (View B); change its job (View A). The fight is fake.

THE CASE THAT PROVES IT — AMAZON'S WAREHOUSES

The case that proves the thesis is Amazon's warehouses — the AHT story one rung downstream, and on-domain because this is a logistics company. Amazon scores associates on throughput: pick/stow rate and "Time off Task," a speed metric in the target seat. Quarter to quarter it reads as pure efficiency; the bill landed later. Amazon's injury rate ran about 6.5 per 100 workers in 2023, roughly 71% above comparable non-Amazon warehouses, concentrated in the musculoskeletal "strain, sprain, pain" injuries a rushed, repetitive pace produces. OSHA's first major multi-site ergonomics investigation in over a decade — inspections at ten facilities beginning in 2022 — ended in a December 2024 corporate-wide settlement mandating an ergonomics program; California's Labor Commissioner separately issued $5.9 million in citations for not even disclosing the quotas (Amazon is appealing); and federal prosecutors in the Southern District of New York have been investigating whether Amazon hid its true injury rates. (Amazon disputes that pace caused the injuries and its rate did decline modestly — but the independent injury-rate gap versus peers, and the quota-disclosure citations, both cut toward the metric, not away from it.)

The mapping onto AHT is one-to-one: a speed metric in the target seat books a visible, one-time saving now and an invisible bill that compounds later — turnover, injury, and regulators for Amazon; repeat calls, churn, and forgone lifetime value for the support line, quietly accumulating over exactly the months the dilemma describes. That is why AHT is a guardrail, not a target. (Wells Fargo is the mirror image on the validator side: when "products per customer" — CLV's spiritual cousin — became the frontline target, employees fabricated roughly 1.5 million unauthorized accounts and 5,300 were fired. Crown a growth metric and you get fabrication, not growth.)

BEX'S OWN EVIDENCE PROVES MY DESIGN

Bex's Zappos example proves my design, not hers. Zappos didn't delete its time metric — it re-seated it: it still tracks call time as a utilization band rather than a speed target, while scoring agents on a customer-outcome measure (its "Happiness Experience Form"), with about 75% of sales from repeat customers funding the famously long calls. Time demoted to guardrail, outcome promoted to target — the three-seat design exactly. Her own best example argues against her prescription.

CLOSING THE COUNTERARGUMENTS

- "FCR gets gamed — agents mark unresolved issues 'resolved.'" Ship it with its own canary: the repeat-contact rate within 7–14 days, which lights up on fake resolutions. AHT never had a self-check.

- "Target CLV/cost directly." They're validators — no agent moves them alone and they lag a quarter. Board dashboard, not rep review; ask Wells Fargo what happens when you score the growth metric.

- "Any KPI change breaks comparability." Don't break it: keep reporting AHT, and pilot FCR in parallel for a quarter before touching comp.

- "The AI has the data — execute." The diagnosis is sound, but the AI measured behavior, not transition cost — so make the cheap, reversible move (change the metric's job) before the expensive one (change the system).

THE REMEDY YOU RUN MONDAY

1. Make First Contact Resolution the primary team KPI.

2. Re-seat AHT to a band — a guardrail against both rushing customers off the line and runaway calls — and keep reporting it so a decade of comparisons survives.

3. Move CLV and 90-day cost-to-serve to the executive dashboard as validators, off the agent scorecard.

4. Canary: repeat-contact rate within 7–14 days — the second-order number AHT-optimizers never watch, and the first to spike if FCR is gamed. Pilot in parallel for one quarter before changing comp.
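
As a rough illustration of the canary in step 4, the sketch below computes a repeat-contact rate from a contact log. The record layout (`customer_id`, `issue`, `day`, `marked_resolved`) and the sample rows are assumptions made up for this example; a real contact-center export will look different, and matching "the same issue" is usually harder than a string comparison.

```python
from datetime import date, timedelta


def repeat_contact_rate(contacts, window_days=14):
    """Share of contacts marked resolved that are followed by another contact
    from the same customer about the same issue within `window_days`."""
    window = timedelta(days=window_days)
    resolved = [c for c in contacts if c["marked_resolved"]]
    if not resolved:
        return 0.0
    repeats = 0
    for c in resolved:
        for other in contacts:
            same_thread = (other["customer_id"] == c["customer_id"]
                           and other["issue"] == c["issue"])
            gap = other["day"] - c["day"]
            # Day-level dates only, so same-day repeats are not counted here.
            if same_thread and timedelta(0) < gap <= window:
                repeats += 1
                break
    return repeats / len(resolved)


# Hypothetical log: customer A calls back four days after a "resolved" contact.
contacts = [
    {"customer_id": "A", "issue": "late parcel", "day": date(2024, 3, 1), "marked_resolved": True},
    {"customer_id": "A", "issue": "late parcel", "day": date(2024, 3, 5), "marked_resolved": True},
    {"customer_id": "B", "issue": "invoice", "day": date(2024, 3, 2), "marked_resolved": True},
    {"customer_id": "C", "issue": "damaged item", "day": date(2024, 3, 3), "marked_resolved": True},
]
rate = repeat_contact_rate(contacts, window_days=14)
print(rate)  # 0.25
```

If agents start marking unresolved issues as resolved, FCR goes up while this number goes up with it, which is the signal the canary exists to catch.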

HONEST LIMITS — AND WHERE I'D ENFORCE VIEW B

Where is View B — keep AHT as the target — actually right? In a genuinely commoditized, throughput line where speed is the value and a long call signals a broken process, not a better one: a tier-1 password-reset queue where more minutes mean a worse system, not a deeper relationship. The one-line test: does spending more time on this contact measurably reduce future contacts and cost? If no, keep minimizing AHT — View B wins there, and I'd enforce it. But this dilemma already ran that test: the AI showed more thorough calls produce fewer repeat calls and lower three-month cost. This case sits outside View B's zone.

Consistent measurement (View B) can tell you the speed never changed; it can't tell you speed was the wrong thing to measure. And swapping one number-to-maximize for another (Bex) just hands the gamers a fresh number — ask Wells Fargo. The fix for a metric that drives the wrong behavior is almost never a new metric. It's the right seat for the one you already have.

Sunday at 01:54 PM · 2 days

My Position Statement: I support View A (Change the KPI). The fundamental purpose of a KPI is to drive business outcomes, not to preserve historical reporting. When AI demonstrates that a legacy metric is actively destroying enterprise value, maintaining it for the sake of dashboard consistency is a failure of leadership and business acumen.

1. Quality Reasoning: The Trap of "Failure Demand"

The AI’s discovery highlights a classic operational trap: mistaking transaction speed for efficiency.

Average Handling Time (AHT) is a factory-floor metric applied to cognitive work. When agents are pressured to lower AHT, they are forced to rush. This creates Failure Demand—a concept coined by systems thinker John Seddon, which describes demand caused by a failure to do something right for the customer the first time.

If an agent saves 30 seconds on a call today but fails to fully resolve the logistics issue, that shortcut can generate three 5-minute repeat calls tomorrow. Over time, this steadily erodes customer satisfaction and raises the company's costs. The AI correctly identified that slightly longer initial handle times (Value Demand) eliminate the compounding costs of downstream escalations, thereby reducing the overall operating cost over a three-month horizon.

In today’s hyper-competitive, fast-paced logistics market, business survival demands one thing above all else: Agility. View B argues that keeping the KPI is essential for "governance and operational control." However, in a rapidly evolving market, prioritizing historical dashboard consistency over actual business reality is not control—it is paralysis.

Agility means having the courage to pivot when new data (especially AI-driven insights) proves your old assumptions wrong. If a company takes ten years to build a KPI structure, but AI proves in ten seconds that the KPI is destroying Customer Lifetime Value, the agile organization adapts immediately. The rigid organization forms a committee to protect the dashboard, and loses its customers to competitors.

2. Real-World Evidence

a. T-Mobile's "Team of Experts"

AHT has been repeatedly proven to drive the wrong behavior in complex service environments. The most prominent example is T-Mobile’s pivot in 2017–2018.

T-Mobile realized that measuring call speed was creating a frustrating loop of transfers and unresolved issues. They entirely abandoned AHT as an agent KPI and replaced it with a "Team of Experts" (TEX) model, where localized teams were measured strictly on First Contact Resolution (FCR) and customer happiness.

The Facts & Figures:

· By focusing on resolution rather than speed, T-Mobile’s overall cost-to-serve dropped by 13% within three years.

· Their Net Promoter Score (NPS) increased by more than 50%.

· Customer churn reached an industry record low.

· Crucially: While the length of an individual call went up, the total volume of calls plummeted because issues were actually being fixed.

b. Amazon: From Handle Time to "Contacts Per Order" (CPO)

In its customer service operations, Amazon realized that optimizing for AHT only encouraged agents to end chats quickly, masking underlying logistical failures.

The Shift: Amazon shifted its primary focus to Contacts Per Order (CPO) and downstream resolution.
The Result: Instead of managing how fast an agent handled a "where is my package" call, Amazon used the data to fix the delivery tracking system itself. By shifting the KPI from the agent's speed to the system's reliability, Amazon systematically engineered out millions of support calls, massively reducing operating costs.

c. Comcast: The "First-Time Right" Turnaround

Historically infamous for terrible customer service, Comcast’s agents were heavily measured on AHT and aggressive retention quotas. This led to agents rushing customers off the phone or refusing to cancel accounts, resulting in furious customers and repeated technician visits ("truck rolls").

· The Shift: Starting in 2015, Comcast pivoted to Net Promoter Score (NPS) and First-Time Right (FTR). Agents were given more time to walk customers through self-installs or deep troubleshooting.

· The Result: By investing more time on the initial call, Comcast saw a 20% drop in overall call volume and a 40% reduction in costly truck rolls within a few years.

d. IBM: Moving from SLAs to XLAs (Experience Level Agreements)

In IT support, IBM identified the "Watermelon Effect"—dashboards were glowing green (tickets closed fast, SLAs met), but the business was bleeding red (end-users were deeply frustrated because the core problems kept returning).

· The Shift: IBM transitioned from traditional Service Level Agreements (measuring speed of ticket closure) to Experience Level Agreements (XLAs), which measure the actual business friction removed and end-user satisfaction.

· The Result: This shift forced IT teams to stop doing "quick fixes" to close tickets and start automating root-cause solutions, driving up overall workforce productivity across their enterprise clients.

e. Best Buy (Geek Squad): The "Bounce-Back" Metric

Best Buy originally measured its Geek Squad technicians on the speed of repair (throughput). Technicians met their targets by rushing through diagnostics.

The Shift: Management realized rushing caused "bounce-backs"—customers returning a few days later because the device wasn't actually fixed. They replaced speed targets with First-Time Fix Rate.
The Result: While individual repair times increased, the total volume of repairs dropped, drastically reducing warranty costs and parts waste, while simultaneously improving customer trust.

f. Cleveland Clinic: Redefining Healthcare Throughput

While not a call center, this is a classic operational case study. The hospital was heavily focused on patient throughput and RVUs (Relative Value Units, a billing-based measure of how much work a physician performs, which in practice rewards seeing more patients). This "AHT for doctors" led to poor patient experiences and higher readmission rates.

The Shift: Under CEO Toby Cosgrove, they shifted the primary organizational metric to Patient Outcomes and HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems) scores.
The Result: Doctors spent slightly more time with patients upfront. Readmissions dropped, malpractice claims decreased, and Cleveland Clinic rose to become one of the top-ranked hospital systems in the world, proving that taking time upfront lowers long-term operational costs.

3. Countering View B: "KPI Technical Debt"

View B argues that changing KPIs disrupts reporting, governance, and historical comparisons. While governance is important, preserving a toxic KPI is a form of KPI Technical Debt.

Goodhart’s Law states: "When a measure becomes a target, it ceases to be a good measure." By keeping AHT as the primary target, the organization is merely maintaining excellent governance over a failing strategy. If the metric is driving behavior that lowers loyalty and increases three-month operating costs, comparing today's "bad performance" consistently against last year's "bad performance" offers zero strategic value to the business.

The cautionary tale is Wells Fargo, which shows what happens if you don't change.

The Trap: Their primary KPI was cross-selling ("Eight is Great"—aiming for eight accounts per customer).

The Result: Data repeatedly showed this KPI was driving toxic behavior (agents opening unauthorized accounts to meet quotas). Because management refused to abandon their legacy KPI and dashboard structures, it resulted in millions of fake accounts, $3 billion in fines, and catastrophic damage to the brand's lifetime value.
The Lesson: Preserving a toxic KPI for the sake of "governance and operational control" is a direct path to systemic organizational failure.

4. Deployable Solution Framework: The Dual-Track Migration

You cannot simply flip a switch on a decade-old KPI without causing executive panic. Transitioning requires a phased, data-driven migration strategy.

Phase 1: Shadow Measurement (Months 1–2). Keep AHT as the official KPI on executive dashboards, but introduce First Contact Resolution (FCR) and Customer Lifetime Value (CLV) as "shadow" metrics. Use the AI to translate the data into dollars: build a dashboard that explicitly shows executives the financial cost of repeat calls generated by the fastest agents. You must win the financial argument before changing the operational one.

Phase 2: Agent Upskilling & Competency Tracking (Months 3–4). Transitioning from "getting off the phone fast" to "solving complex problems" requires new skills. Agents must be retrained to navigate deeper logistics systems. Track their readiness for the new FCR model using a targeted 5-point competency scale:
- 1 - Beginner: Still relies on rushing calls and escalating.
- 2 - Intermediate: Attempts FCR but struggles with complex backend systems.
- 4 - Expert: Consistently resolves root causes on the first interaction.
- 5 - Champion: Proactively identifies systemic logistics issues and mentors peers.

Phase 3: Transition and Repurposing (Months 5+). Roll out FCR and CLV into actual performance reviews and compensation structures. AHT is not deleted entirely; rather, it is stripped of its status as an agent behavior target and repurposed strictly as a backend capacity-planning metric for workforce management to forecast staffing levels.

A deployable solution must treat KPIs as living products, not permanent monuments. When implementing the Dual-Track Migration (moving from AHT to FCR/CLV), leadership should frame this change not as a disruption, but as an Agile Sprint for Management. Just as developers iterate software to meet user needs, management must iterate their metrics to meet market reality.

The Mathematics of View B: The Illusion of Efficiency

View B relies on Average Handling Time (AHT) as the primary KPI and assumes that minimizing time minimizes cost.

The View B KPI (AHT):

$$\mathrm{AHT} = \frac{\text{Total Talk Time} + \text{Total Hold Time} + \text{Total Wrap-up Time}}{\text{Total Number of Calls Handled}}$$

The View B Business Outcome (Apparent Cost Per Period):

View B assumes that the Total Cost ($TC$) of the operation is simply the cost per minute of labor $C_m$ multiplied by the AHT and the initial call volume $V_{\text{initial}}$:

$$TC_{\text{View B}} = V_{\text{initial}} \times \mathrm{AHT} \times C_m$$

The Mathematical Flaw in View B: This equation assumes each call is an isolated event. It fails to account for the Failure Demand Multiplier—the downstream volume of repeat calls generated when a defect (an unresolved issue) occurs because the agent rushed to meet an aggressive AHT target.

The Mathematics of View A: The Reality of Value Demand

View A acknowledges that system outcomes are driven by resolution quality, not transaction speed. It replaces AHT with First Contact Resolution (FCR) and measures the outcome through Customer Lifetime Value (CLV) and True Operating Cost.

The View A Primary KPI (First Contact Resolution FCR):


$$\mathrm{FCR}(\%) = \frac{\text{Number of issues resolved on first interaction}}{\text{Total number of unique issues reported}} \times 100$$
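
As a small illustration of the FCR formula, here is a Python sketch. The input shape (a dictionary mapping an issue ID to the number of contacts it took to resolve) and the sample IDs are assumptions for this example only, and it assumes every issue was eventually resolved.

```python
def first_contact_resolution(contacts_per_issue):
    """FCR (%) = issues resolved on first interaction / unique issues reported * 100."""
    if not contacts_per_issue:
        raise ValueError("no issues reported")
    resolved_first = sum(1 for n in contacts_per_issue.values() if n == 1)
    return 100 * resolved_first / len(contacts_per_issue)


# Hypothetical data: issue ID -> number of contacts needed to resolve it.
contacts_per_issue = {"ISS-1": 1, "ISS-2": 3, "ISS-3": 1, "ISS-4": 2}
fcr = first_contact_resolution(contacts_per_issue)
print(fcr)  # 50.0
```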

The View A Business Outcome 1 (True Operating Cost):

To capture the reality over the AI's "three-month horizon," the cost equation must include the probability of a defect, $P_{\text{repeat}}$.

Let $\mathrm{AHT}_1$ be the initial call time and $\mathrm{AHT}_2$ be the escalated repeat call time (which is usually longer and requires more expensive Tier 2 agents), with $C_{m2}$ the per-minute cost of that Tier 2 labor.

$$\text{True Cost} = V_{\text{initial}} \times \left[(\mathrm{AHT}_1 \times C_m) + P_{\text{repeat}} \times (\mathrm{AHT}_2 \times C_{m2})\right]$$

By focusing on FCR, View A slightly increases $\mathrm{AHT}_1$ but drives the probability of a repeat call $P_{\text{repeat}}$ close to zero, massively reducing the second term of the equation.
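
To show how the two cost equations above can rank the same teams in opposite order, here is a short Python sketch. Every number in it is hypothetical and chosen only to make the arithmetic easy to follow; none of it comes from the scenario or from any company's data.

```python
def apparent_cost(v_initial, aht, c_m):
    """View B cost: every call treated as an isolated event."""
    return v_initial * aht * c_m


def true_cost(v_initial, aht1, c_m, p_repeat, aht2, c_m2):
    """Cost including the expected repeat call behind each initial call."""
    return v_initial * (aht1 * c_m + p_repeat * aht2 * c_m2)


# Hypothetical inputs: 10,000 initial calls, $1.00/min Tier 1 labor,
# $1.50/min Tier 2 labor, 10-minute escalated repeat calls.
V, C_M, C_M2, AHT2 = 10_000, 1.0, 1.5, 10

fast = {"aht1": 5, "p_repeat": 0.25}       # rushed calls, more repeats
thorough = {"aht1": 6, "p_repeat": 0.125}  # longer calls, fewer repeats

fast_apparent = apparent_cost(V, fast["aht1"], C_M)
thorough_apparent = apparent_cost(V, thorough["aht1"], C_M)
fast_true = true_cost(V, fast["aht1"], C_M, fast["p_repeat"], AHT2, C_M2)
thorough_true = true_cost(V, thorough["aht1"], C_M, thorough["p_repeat"], AHT2, C_M2)

print(fast_apparent, thorough_apparent)  # 50000.0 60000.0
print(fast_true, thorough_true)          # 87500.0 78750.0
```

Under these made-up inputs the fast team looks cheaper on the AHT-only view and more expensive once repeat calls are priced in. Whether that holds for a real operation depends entirely on the measured repeat probability and Tier 2 cost.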

The View A Business Outcome 2 (Customer Lifetime Value CLV):

Ultimately, support is a lever for retention. View A measures the business outcome using the standard CLV equation, where $R_t$ is revenue at time $t$, $C_t$ is the cost to serve (from the equation above) at time $t$, and $d$ is the discount rate.

$$\mathrm{CLV} = \sum_{t=1}^{n} \frac{R_t - C_t}{(1+d)^t}$$

When FCR goes up, retention increases (extending $n$, the number of periods a customer stays) and $C_t$ decreases.
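
The CLV sum can be sketched in a few lines of Python. The function name, the list-based inputs, and the two example customers are assumptions for illustration; the revenue, cost, and discount figures are invented.

```python
def customer_lifetime_value(revenues, costs, d):
    """Discounted sum of (R_t - C_t) for t = 1..n, where n = len(revenues)."""
    if len(revenues) != len(costs):
        raise ValueError("revenues and costs must cover the same periods")
    return sum(
        (r - c) / (1 + d) ** t
        for t, (r, c) in enumerate(zip(revenues, costs), start=1)
    )


# Hypothetical customers, discount rate of 10% per period.
# Churned: stays 2 periods, costs 30 per period to serve.
churned = customer_lifetime_value([100, 100], [30, 30], d=0.10)
# Retained: stays 4 periods, costs 20 per period to serve.
retained = customer_lifetime_value([100] * 4, [20] * 4, d=0.10)
print(round(churned, 2), round(retained, 2))
```

In this made-up comparison the retained customer is worth more for both reasons named above: more periods in the sum and a lower cost in each one.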

5. Closing

Ultimately, AI is a mirror reflecting the true mechanics of our business. If we don't like what the mirror shows—that our primary KPI is driving failure demand and costing us money—breaking the mirror by ignoring the AI does not fix the business. Surviving in a fast-paced world requires us to be agile. We must evolve our measurement systems to reward genuine value creation, or we will rapidly lose our customers to competitors who do.

Monday at 07:03 AM · 1 day

My answer is View A: change the KPI. Not after another quarter of review, not as a footnote in next year's planning cycle. The data in the prompt has already cleared the only bar a KPI has to clear. The lowest-AHT teams are not the ones achieving the highest loyalty. The slightly slower teams cost less over three months, not more. That isn't an ambiguous signal that needs more study. It's a KPI that has already failed the one job it has, which is to point the organization toward better outcomes, not just toward a faster stopwatch.

The interesting question isn't whether to change it. It's how you change a metric that's been wired into a decade of dashboards, bonuses, and performance reviews without the transition itself becoming the disruption View B is worried about. That's the version of View A worth arguing for, and it's the part most defenses of View A skip.

Why this isn't a one-off finding

AHT rewards speed, not outcome, and the two stop lining up the moment an agent realizes that ending a call quickly counts for more than ending it correctly. Independent research backs up exactly what the logistics company's AI found. SQM Group, which has benchmarked first contact resolution across more than 500 North American call centers for over 25 years, has found that calls resolved on the first contact average a Net Promoter Score of 64. Calls resolved only after a repeat contact average 40. A single unresolved call drops the average to -10, and an issue that survives two or more unresolved calls averages -38. AHT can't see any of that difference — it can't tell a call that ended because the problem was solved from a call that ended because the agent needed to hit a number. The logistics company's own three-month data shows which one was actually happening on the fast teams.

Industry guides on contact-center metrics go further and name AHT specifically as the single most commonly gamed metric on any floor, because pressure on handle time alone reliably pushes agents toward rushed calls, skipped quality steps, and unnecessary transfers — the exact pattern showing up in the logistics company's own numbers.

T-Mobile already ran this exact experiment

In August 2018, T-Mobile launched a customer-care model called Team of Experts that did precisely what the logistics company's AI is now recommending: it dropped average handle time as the primary measure and made first-contact resolution and customer outcomes the priority instead. By T-Mobile's own published results, average handle time rose 45% under the new model. If AHT had stayed the scoreboard, that increase would have looked like a failure. Instead, calls per account fell 37%, postpaid churn dropped 39%, credits and bill adjustments were cut by more than half, Net Promoter Score rose 60%, and overall cost to serve fell 26% — saving the company over $100 million to date, with a projected billion-dollar impact over five years. T-Mobile's own head of customer care summarized the reasoning in one line: the company had been measuring cost, not customer happiness, and fixing that meant accepting that the old number would get worse.

Zappos skipped AHT from day one — with real numbers behind it

This is also the company Bex points to, though without specifics. Zappos has never optimized for call time. Agents are evaluated on a 100-point “Happiness Experience Form” and a target of spending 80% of working time in direct customer contact, not on how fast a call ends. The company is well known for a 10-hour, 43-minute customer service call that staff treated as a result worth being proud of, not a problem to fix. The business case behind it isn't a slogan: roughly 75% of Zappos's sales come from repeat customers, and those customers spend about 2.5 times more per order than first-time buyers. Real numbers, pointed the same direction as T-Mobile's.

USAA: the same pattern, in a completely different industry

Insurance has nothing operationally in common with logistics or telecom, and the data still points the same direction. In J.D. Power's U.S. Auto Insurance Study — the industry's standard satisfaction benchmark, based on tens of thousands of customer responses each year — USAA's score runs so far ahead of the field that, according to J.D. Power's own published 2025 results, it sits roughly 90 points above the average of the carriers J.D. Power officially ranks, with regional scores consistently above 700 against a national average in the mid-600s. USAA is left out of the official rankings only because its membership is closed to military families, not because of its score. J.D. Power's own breakdown of what separates the leaders from the laggards in that study names trust, problem resolution, and people as the dimensions that move the result — not speed. USAA's advantage was built on those, not on how fast a call ends.

What the math says

This isn't only anecdote. SQM Group's benchmarking data shows a close to 1-to-1 relationship between first-contact resolution and operating cost:

Effect of +1 percentage point in First Contact Resolution	Result (SQM Group, 500+ North American call centers)
Operating cost	~1% decrease
Customer satisfaction (CSAT)	~1% increase
Net Promoter Score	~1.4 point increase
Annual savings, typical midsize call center	~$286,000

There's a simple reason the math runs this way. A standard first-order estimate used in contact-center planning is:

Expected contacts per resolved issue ≈ 1 ÷ FCR rate

At the roughly 70% FCR industry average — the level you'd expect from an AHT-driven team rushing calls — each resolved issue takes about 1.43 contacts on average. At an 80% FCR rate, typical of teams given the time to actually solve the problem, that drops to 1.25 contacts. That's a 12.5% cut in the total contact volume needed to close the same number of issues, even though each individual contact takes longer. Slower-but-thorough isn't just nicer for the customer — it's arithmetically cheaper, because the alternative to one longer call isn't “one short call, done.” It's “one short call, then another, then maybe a third.”
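
The arithmetic above can be reproduced in a few lines. This is a minimal sketch of the first-order estimate only; the function name `expected_contacts` is made up for the example, and the 70% and 80% rates are the ones quoted in the paragraph.

```python
def expected_contacts(fcr_rate: float) -> float:
    """First-order estimate: contacts needed per resolved issue is about 1 / FCR."""
    if not 0 < fcr_rate <= 1:
        raise ValueError("fcr_rate must be in (0, 1]")
    return 1.0 / fcr_rate

baseline = expected_contacts(0.70)   # AHT-driven team
improved = expected_contacts(0.80)   # team given time to resolve
volume_cut = 1 - improved / baseline

print(f"Contacts per resolved issue at 70% FCR: {baseline:.2f}")
print(f"Contacts per resolved issue at 80% FCR: {improved:.2f}")
print(f"Reduction in contact volume: {volume_cut:.1%}")
```

The estimate treats every repeat contact as having the same chance of resolution as the first one, so it is a planning approximation, not a measured result.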

What happens when you don't change the measure: Kodak

The risk of leaving an outdated KPI in place isn't hypothetical, and the clearest warning doesn't even come from a call center. Kodak's own engineers built a working digital camera prototype in 1975. The company didn't fail to see digital coming — it kept managing toward film-based market share and film-based profit targets for decades after, because that was the number the whole organization's incentives, plants, and reporting were built around. Kodak filed for Chapter 11 bankruptcy in January 2012. The lesson isn't that digital was inevitable. It's that an organization can have the right data in hand and still lose, because it kept steering by the old number out of inertia rather than analysis. AHT is a smaller-scale version of the same risk: the data already says it's wrong, and inertia is the only thing still arguing for it.

Where View B has a real point — and how to handle it without backing off

View B's actual concern isn't measurement for its own sake. It's that ripping out a decade-old KPI overnight breaks comparability, confuses the field, and can unravel incentive plans that thousands of people's pay depends on. That's a real operational risk, and the answer isn't to ignore it — it's to manage it the way large organizations have already managed comparable transitions.

General Electric ran forced-ranking performance reviews for more than three decades, with compensation, promotion, and management practice built around it company-wide. When GE finally retired the system in 2015, it didn't flip a switch — it phased the change in deliberately, moving to continuous feedback over a multi-year rollout, specifically because an instant swap across a company that size would have created the exact chaos View B is warning about. The lesson isn't “don't change the metric.” It's “change it on a managed timeline; don't keep managing by it forever just because changing it is inconvenient.”

For the logistics company, that means: report FCR and Customer Lifetime Value alongside AHT on the same dashboards for a defined window — long enough to rebuild comparisons and retrain managers, short enough that nobody mistakes it for indefinite. Re-base executive incentive plans on the new measures on a published date, not a vague “eventually.” Keep a short bridge report that translates historical AHT trends into the new metrics, so multi-year comparisons don't just disappear. None of that is a reason to keep AHT as the primary KPI. It's the plan for replacing it without breaking the organization on the way.

My position

View A: change the KPI. The case for keeping AHT was never really about whether it works — it's about how long it's been in place and how much has been built on top of it. T-Mobile changed it and saved over a hundred million dollars while improving every customer metric it tracks. Zappos never used it and built a business where three-quarters of its revenue comes from customers who already trusted it once. USAA shows the same pattern holds in an industry with none of the same call patterns. Independent benchmarking from SQM Group shows the same trade-off holds almost everywhere it's been measured. And Kodak shows what it costs to keep steering by an outdated number simply because changing it is disruptive. The only argument left for AHT is inertia, and inertia isn't a KPI. Change it — on a managed timeline, with the old number phased out on a schedule, not on faith that the new one will eventually take over on its own.

Monday at 06:39 PM1 day

Change the KPI
There is a kind of organizational quicksand: stay in it long enough, consistently measuring the wrong things, and those measures start to feel legitimate. Dashboards are built around them. Careers are shaped by them. Incentive structures assume them. And then one day, a system with no political stake in the outcome looks at the data and tells you plainly: this metric is not pointing where you think it is.

That is not a threat to governance. That is governance working.

In the logistics scenario described, the AI has done exactly what a rigorous performance review should do — it has traced the relationship between the metric being optimized and the outcomes the business actually cares about. It found a disconnect. The organization's response to that finding will define whether it is data-driven in practice or only in language.

The Purpose of a KPI Is the Outcome, Not the Metric

This is the foundational point that often gets lost in debates about measurement consistency. A KPI is not the goal. It is a proxy for the goal. Average Handling Time was adopted because, at the time, there was a reasonable assumption that faster resolution correlated with better service and lower cost. That assumption may have been valid a decade ago. The AI is now showing — with thousands of data points — that it no longer holds, if it ever fully did.

Clinging to a KPI because it has been used for ten years is not strategic consistency. It is metric inertia dressed up as governance. The real governance failure would be knowing the KPI is driving suboptimal behavior and choosing not to act.

What the Data Is Actually Saying

The pattern the AI identified is well supported by service operations research. Agents optimizing for low AHT are incentivized to close calls quickly, not to resolve them thoroughly. That sets up what experts call "the callback loop": customers whose problems were closed quickly but never actually resolved call back, wait in the queue again, add to agents' workload, and end up less satisfied than when they started. As the AI found, agents who spent a little more time on each call not only generated fewer repeat calls and higher satisfaction ratings, they also lowered overall operating cost over the following three months. That last point is critical. The business was paying for speed and receiving higher total cost. It was measuring efficiency and generating waste.

First Contact Resolution (FCR) and Customer Lifetime Value (CLV) are not experimental alternatives — they are widely recognized as more reliable predictors of long-term business health in customer service environments than AHT alone.

Real-World Example: T-Mobile's Shift Away From AHT

T-Mobile provides one of the clearest documented cases of an organization deliberately moving away from AHT-centric performance management. For years, their customer service operation — like most in the telecoms industry — was structured around call efficiency metrics. The model produced fast calls and frustrated customers.

When T-Mobile restructured its support model around "Team of Experts" — small dedicated teams assigned to specific customer segments with no pressure to minimize call length — customer satisfaction scores rose sharply. Repeat contacts fell. And churn, the most expensive outcome in a subscription business, decreased meaningfully. The company did not just change its KPI. It rebuilt its entire service philosophy around what the data said customers actually needed. The business results validated the decision.

Real-World Example: Zappos and the Deliberate Rejection of AHT

Zappos, long before it became a Harvard Business School case study, made the deliberate choice to remove Average Handling Time from its customer service metrics entirely. Agents were explicitly told not to rush calls. The longest call in Zappos history — over ten hours — became a point of pride, not a performance failure.

The outcome was a brand built on customer loyalty so strong it became the company's primary competitive differentiator. Customer lifetime value per loyal Zappos shopper consistently outperformed industry averages. The organization understood something that the logistics scenario now has data to confirm: the cost of a longer call is almost always lower than the cost of a lost customer.

Addressing the Disruption Concern

The genuine counterargument — that changing KPIs disrupts reporting, comparisons, and management practices — deserves a serious response rather than dismissal.

The answer is not to change overnight. It is to transition with structure. A phased approach works as follows: run AHT and the new metrics in parallel for a defined period — 90 to 180 days is typical — so that trend comparisons remain available and teams are not suddenly evaluated against an unfamiliar standard. During this period, communicate the rationale clearly, train managers on interpreting the new metrics, and update dashboards incrementally.

This is standard practice in measurement evolution. Every organization that has ever updated its financial reporting framework, changed its customer segmentation model, or shifted its product success metrics has managed this transition. The disruption is real but it is manageable. The cost of continued misalignment is not.

The Bigger Picture: What Is AI For?

The question behind the question is this: if an enterprise applies AI to performance analysis and then sets the findings aside because they are inconvenient, that enterprise has not really used the AI.

The value of AI in performance management is precisely its ability to surface what human systems — shaped by habit, incentive, and hierarchy — are unlikely to surface themselves. A finding this clear, backed by this volume of interaction data, is not a recommendation to consider. It is a signal to act on.

Conclusion

The KPI exists to serve the business. When the data shows it is no longer doing that — when it is actively producing behaviors that increase cost, reduce loyalty, and erode long-term value — the organization has a choice. It can protect the metric or it can protect the mission.

T-Mobile chose the mission. Zappos chose the mission. The data in this logistics scenario is pointing in the same direction.

Change the KPI. Do it thoughtfully, do it in stages, bring your teams with you — but do it. Because the only thing more disruptive than changing how you measure performance is continuing to measure it wrong.

Monday at 07:05 PM1 day

View A – Change the KPI

Organisations traditionally use Average Handling Time (AHT) to measure and optimise the efficiency of customer support interactions. While useful as an operational signal, AHT in isolation consistently produces three damaging outcomes: agents rushing customers, issues left unresolved (hurting First Contact Resolution), and declining CSAT scores.

AHT optimises for the speed of an interaction. FCR and CLV shift focus to the value of the outcome — which is far more aligned with what businesses actually care about: retaining customers and growing revenue.

First Contact Resolution (FCR) measures whether a customer's issue was fully resolved without a callback. Its advantages are structural:

Reduces repeat contacts, lowering costs more sustainably than chasing low AHT
Directly tied to customer effort and frustration levels
A single FCR failure can generate 2–3 follow-up contacts, wiping out any AHT efficiency gains

Customer Lifetime Value (CLV) reframes support from a cost centre into a revenue protection function:

Recognises that how a complaint is handled today determines whether that customer stays for five more years
Encourages agents to invest appropriate time in high-value customers rather than rushing everyone equally
Connects support team performance directly to business growth metrics

Over-reliance on AHT has its own well-documented problems:

1. BPO Tech Support — The Hidden Cost of "Green" Dashboards

A mid-sized BPO with 800 agents consistently hit its 8-minute AHT target. Leadership was thrilled. But the underlying FCR rate was just 61% — nearly 4 in 10 customers were calling back. When repeat contacts were tracked, the true resolution time averaged 23 minutes per customer across multiple interactions. Every individual call hit the target; every bonus was paid; the customer experience was poor and agent morale worse. High-performing agents burned out. Those who stayed were comfortable with mediocrity.

2. Telecom and Tech Support Industries — Systemic Underperformance

These sectors, historically most obsessed with AHT reduction, consistently record the lowest FCR and CSAT scores industry-wide. Customer churn is more than five times higher for unresolved calls than when FCR is achieved — and acquiring a new customer costs at least five times more than retaining one. Organisations fixated on AHT were destroying CLV without realising it.

3. Agent Gaming Behaviour

Agents learn the rules of any system quickly. Under AHT pressure, they rush calls, transfer problems rather than solve them, and develop what can be described as a "polite ejection" technique — closing interactions without resolution to protect their metrics. Quality teams catch some of this, but rarely at the scale needed. The result is agent interests structurally misaligned with customer outcomes.

4. The False Economy of Low AHT

US businesses lose an estimated $62 billion annually from poor customer experiences, and 50% of consumers will switch to a competitor after a single bad interaction. Organisations that chased low AHT while ignoring FCR generated these costs invisibly. The efficiency metric looked green; the business was bleeding customers.

The bottom line: Optimising a proxy metric (speed) while ignoring the outcome metric (resolution quality) creates organisations that look efficient on paper while systematically destroying customer loyalty.

Some organisations have shifted away from these traditional metrics and seen strong returns.

1. Zappos — Building an Entire Business on CLV

Zappos deliberately rejected AHT thinking from the outset. Agents have no time limits on calls, can make decisions without supervisor approval, and are empowered to resolve issues completely. The outcome: repeat customers account for approximately 75% of revenue. The company crossed $1 billion in gross sales in 2008 and was acquired by Amazon in 2009 for $1.2 billion. The investment in resolution quality compounded directly into customer loyalty and revenue growth.

2. T-Mobile — The "Un-carrier" Transformation

In a sector historically obsessed with operational efficiency metrics, T-Mobile's 2013 "Un-carrier" revolution centred on removing customer pain points rather than minimising call times. The results were measurable and sustained: postpaid phone churn of 0.86% in 2024 — the best full-year figure in the company's history — alongside its third consecutive year of more than 3 million net postpaid phone additions and highest-ever earnings per share. During the same period, Verizon reported a net loss of 9,000 postpaid subscribers. The difference was not network quality alone; it was a fundamentally different philosophy about what customer interactions are for.

3. Canadian Tire — "Customers for Life"

Canadian Tire embedded FCR and CSAT as the twin accountability metrics for every frontline agent. IVR pre-authentication means agents begin calls already knowing who the customer is, enabling immediate focus on resolution rather than verification. The cultural alignment — every agent explicitly connected to a "Customers for Life" mission — produced world-class FCR performance sustained over multiple years.

4. Free Mobile (France) — Disrupting Through Customer Value

Free Mobile eliminated contracts and hidden fees, creating a transparent value proposition that competitors could not easily replicate. The result: 12 million new subscribers and an 18% share of the French mobile market. NPS improved not by adding features, but by removing the structural causes of customer detraction — exactly the mindset that FCR and CLV measurement encourages.

5. The Quantified Business Case

The evidence across organisations is consistent:

PwC: Companies with strong FCR performance see 12–15% higher retention and 8–10% increases in lifetime value
Forrester: Each 1% FCR improvement saves enterprises approximately $276,000 annually in service costs
SQM Group: Every 1% FCR improvement produces a 1% improvement in CSAT
SQM Group: For every 1% FCR improvement, NPS rises by 1.4 points
Zendesk: Agents in high-FCR environments report 23% higher job satisfaction

Organisations that stopped treating support as a cost to be minimised and started treating every interaction as a moment that either builds or erodes customer lifetime value saw compounding returns: lower churn, higher NPS, lower actual cost-to-serve, and sustainable revenue growth.

Changing a decade-old primary KPI is not a dashboard update. It is a cultural and commercial transformation. The following structured approach protects the organisation through the change.

1. Retire AHT Gradually — Never Cold Turkey

Demote AHT from a performance metric to a diagnostic tool. Keep it visible on dashboards, but remove it from bonuses and performance reviews. Run a parallel measurement period of 6–12 months where AHT, FCR, and CLV are all tracked before any incentive structures change. This gives leadership evidence that the new metrics move in the right direction before political capital is spent dismantling the old system.

2. Rebuild Executive Incentives Before Announcing the Change

Executives whose bonuses are tied to AHT targets will — consciously or not — resist the transition. This is organisational reality, not cynicism. Before any public announcement, conduct a quiet audit of every incentive structure that references AHT. Work with HR and the Board to redesign executive scorecards so FCR, CSAT, CLV, and NPS carry equal or greater weight. Agree on a grace period — typically 12 months — with blended weighting (e.g., 60% legacy metrics / 40% new metrics, shifting each quarter). Without this, executives will publicly endorse the new direction while privately protecting the old one.
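
A blended-weighting schedule of this kind can be sketched as follows. The 60/40 starting point comes from the paragraph above; the 20-point shift per quarter and the four-quarter length are assumptions chosen only so the legacy weight reaches zero by the end of a 12-month grace period. The function names are hypothetical.

```python
def blended_weights(start_new=0.40, step=0.20, quarters=4):
    """Return (legacy_weight, new_weight) per quarter.

    start_new matches the 60/40 split in the post; step and quarters
    are illustrative assumptions, not figures from the post.
    """
    schedule = []
    for q in range(quarters):
        new_w = min(1.0, start_new + step * q)
        schedule.append((round(1.0 - new_w, 2), round(new_w, 2)))
    return schedule

def blended_score(legacy_score, new_score, weights):
    """Weighted scorecard result for one quarter."""
    legacy_w, new_w = weights
    return legacy_w * legacy_score + new_w * new_score

schedule = blended_weights()
for quarter, (legacy_w, new_w) in enumerate(schedule, start=1):
    print(f"Q{quarter}: {legacy_w:.0%} legacy metrics / {new_w:.0%} new metrics")
```

Publishing the whole schedule up front, instead of renegotiating the split each quarter, is what gives executives a fixed date to plan against.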

3. Protect Historical Comparability

A decade of AHT data has genuine operational value for capacity planning, workforce management, and trend analysis. Retain AHT as a shadow metric in the data warehouse for at least 3 years post-transition. When presenting FCR and CLV data to the Board in the first two years, always contextualise against historical AHT benchmarks. Build a crosswalk document showing how the new metrics relate to old AHT numbers across the same customer segments — this prevents "we can't compare anything anymore" paralysis at leadership level.

4. Redefine "Good" Before Launch

Without a shared, precise definition of "resolved," FCR scores will be manipulated within months. Develop specific, observable FCR definitions before launch that mean the same thing across every team, channel, and manager. Build agent-level dashboards displaying FCR, repeat contact rates, and CSAT in real time — replicating the instant feedback loop agents had with AHT. Run co-design workshops with frontline agents and team leaders to define what a successful interaction looks and sounds like. Agents who help shape the new standard are substantially more likely to embrace it.

5. Recalibrate Workforce Management Immediately

Virtually all contact centre workforce management (WFM) systems use AHT as an input to staffing forecasts. Before the transition, work with the WFM team to recalibrate models using actual handle time distributions rather than targets, alongside FCR data. Plan explicitly for handle times to increase in the short term as agents stop rushing calls — and build this into headcount planning for the first 6 months. Brief the CFO in advance: the temporary cost of longer calls will be offset by reduced repeat contact volume, but only if this dynamic is understood before the next budget cycle.

6. Manage the Board and Investor Narrative Proactively

Frame the KPI change as measurement maturation — a strategic evolution — not a correction of past error. Prepare a narrative document for the Board that positions the shift as forward-looking, anchored to commercial outcomes they already care about: churn reduction, revenue retention, NPS improvement. Provide 12–18 months of retrospective FCR and CLV data alongside the announcement to demonstrate what the new metrics would have shown historically. This prevents the Board from feeling they are flying blind into unfamiliar territory.

7. Build Measurement Infrastructure Before Setting Targets

FCR and CLV are significantly harder to measure accurately than AHT. Before announcing new targets, audit CRM and telephony infrastructure to confirm the ability to track repeat contacts within a defined window — industry standard is 7 days — for the same issue from the same customer. For CLV, ensure customer data is connected across support, billing, and commercial systems. A CLV metric that cannot be trusted destroys the credibility of the entire transition. Budget for data infrastructure investment upfront. Organisations that announce new metrics before the measurement capability exists create a 6–12 month credibility gap that is very difficult to recover from.
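
The repeat-contact check described above can be prototyped against a contact log before any target is set. The sketch below is illustrative only: the log layout, the sample rows, and the function name are hypothetical, and it assumes a rolling window in which each follow-up is compared with the previous contact on the same issue.

```python
from datetime import datetime, timedelta

REPEAT_WINDOW = timedelta(days=7)  # the 7-day window mentioned in the post

# Hypothetical contact log: (customer_id, issue_code, timestamp)
contacts = [
    ("c1", "billing", datetime(2024, 3, 1, 9, 0)),
    ("c1", "billing", datetime(2024, 3, 4, 14, 0)),   # repeat inside the window
    ("c2", "delivery", datetime(2024, 3, 2, 10, 0)),
    ("c3", "billing", datetime(2024, 3, 2, 11, 0)),
    ("c3", "billing", datetime(2024, 3, 20, 11, 0)),  # outside the window: new issue
]

def first_contact_resolution(log, window=REPEAT_WINDOW):
    """Share of issues with no repeat contact (same customer, same issue) in the window."""
    by_key = {}
    for customer, issue, ts in sorted(log, key=lambda row: row[2]):
        by_key.setdefault((customer, issue), []).append(ts)

    issues = resolved = 0
    for stamps in by_key.values():
        i = 0
        while i < len(stamps):
            # stamps[i] opens an issue; absorb follow-ups that arrive inside the window
            j = i + 1
            while j < len(stamps) and stamps[j] - stamps[j - 1] <= window:
                j += 1
            issues += 1
            if j == i + 1:
                resolved += 1
            i = j
    if issues == 0:
        raise ValueError("contact log is empty")
    return resolved / issues

fcr = first_contact_resolution(contacts)
print(f"FCR: {fcr:.0%}")
```

The hard part in practice is the `issue_code` column: if agents or systems cannot reliably tag two contacts as the same issue, the metric inherits that uncertainty, which is why the definition work has to come before the target.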

The purpose of a KPI is to improve business outcomes — not to preserve historical reporting. When evidence conclusively demonstrates that a metric is driving the wrong behaviour, maintaining it for consistency is not governance. It is inertia.

The organisations that successfully made this transition did not treat it as a KPI swap. They treated it as a strategic repositioning of what customer support is for — shifting from a function that minimises cost per call to one that maximises the value of every customer relationship.

Yesterday at 12:18 AM1 day

POSITION: VIEW A — CHANGE THE KPI. WITHOUT QUALIFICATION.

I support View A — and I challenge Bex's reasoning for reaching it. Bex frames KPI change as a pragmatic improvement decision. The deeper argument is structural: a KPI that demonstrably misaligns behaviour has already stopped functioning as a KPI. Keeping it is not governance. It is the governance of the wrong thing. The AI has not found a better way to measure AHT. It has found that AHT measures the wrong thing entirely.

The Decisive Reframe: One Metric, Two Different Questions

View A and View B are not arguing about the same object. The dilemma is built on a conflation. Both sides invoke the idea of measuring performance — but they are answering two structurally different questions:

	Question 1 (what AHT answers)	Question 2 (what the organisation needs to answer)
What it asks	How fast did the agent close the call?	Did the customer's problem get resolved?
What the metric captures	Call duration in seconds — precisely and consistently	Nothing. AHT contains no information about whether the customer called back next week.
What a low score means	The agent ended the call quickly	Unknown — the call may have ended because the agent resolved the problem, or because the agent gave up and closed the ticket.
What the AI found	Low AHT agents do NOT achieve highest loyalty	Agents who spend slightly more time achieve: fewer repeat calls, higher CSAT, lower operating cost over 3 months

One sentence to grade every other answer in this thread: the AI has not found that AHT is measured incorrectly. It has found that AHT measures the wrong thing. View B's consistency argument preserves the precision of a measurement that was never pointing at the right target.

The Governance Preservation Fallacy: treating the consistency of measurement as if it were the purpose of measurement. A KPI is not valuable because it has been consistently applied. It is valuable because it aligns behaviour with business outcomes. AHT has been consistently applied for ten years. The AI has now demonstrated it has been consistently misaligning behaviour for ten years. Those are not arguments for each other. They are arguments in opposite directions.

Diagram 1 — The Governance Preservation Fallacy: AHT precisely measures what it measures. The problem is that what it measures — call duration — is not what the organisation needs agents to optimise. The AI's finding separates these two columns with evidence from thousands of interactions.

Bex's Evidence — Quarantined, Then Replaced With Harder Proof

Bex cites Zappos as her primary evidence: a company that shifted from call-handling-time metrics to customer satisfaction and resolution rates, producing higher loyalty and sales. I will not build on this case. I cannot independently verify the specific figures Bex cites, and the Zappos customer service model has characteristics — unlimited call budgets, no scripted responses, extreme service empowerment — that make it a difficult comparator for a standard logistics support operation.

What I can verify, and will prove with harder evidence, is the structural argument Zappos illustrates: a time-based proxy KPI, applied to a service interaction, systematically drives agents toward speed rather than resolution — and the damage accumulates invisibly in repeat contacts, churn, and lifetime value until someone measures those outcomes directly. The AI in this dilemma has done exactly that measurement. Two documented cases prove the mechanism with evidence that cannot be quarantined.

Why AHT as Primary KPI Fails: Three Structural Arguments

Goodhart's Law / Strathern (1997)

(L1) When a measure becomes a target, it ceases to be a good measure. (L2) AHT became the primary target a decade ago. Agents learned the target. Rational agents then optimised the target — not the underlying customer outcome the target was originally designed to proxy. Agents route calls faster. They close tickets before resolution is confirmed. They transfer difficult cases to queues where they will not count against their AHT. They answer the easiest part of a compound query and close the call. None of this violates the metric. All of it violates the purpose of the metric. (L3) The AI's finding is the output of this process made visible: the agents with the lowest AHT are not achieving the highest customer loyalty, because the lowest AHT was achieved by optimising the number, not the outcome. The thermometer placed in the sun reads warm and calls it health.

The Proxy Invalidity Principle

(L1) A proxy KPI is valid only while it remains correlated with the outcome it was designed to represent. When the correlation breaks — when optimising the proxy produces different behaviour from optimising the outcome — the proxy has become invalid. Continued measurement of an invalid proxy is not governance. It is the governance of a signal that has stopped pointing at the thing it was designed to measure. (L2) AHT was originally a valid proxy: in a world where all calls were roughly equal in complexity, faster resolution correlated with better resolution. The AI's finding shows that correlation has broken. Faster resolution now anti-correlates with loyalty, repeat contacts, and three-month operating cost. The proxy has inverted. (L3) The second-order consequence: every management decision taken on the basis of an inverted proxy is a decision made on misinformation. Ten years of AHT-optimised promotions, bonuses, and coaching have been selecting and reinforcing the wrong behaviour. The organisation is not managing its customer support operation. It is managing the AHT dashboard — and the two have diverged.

The Sunk Cost of Measurement

(L1) The sunk cost fallacy: continuing a course of action because of past investment, regardless of future value. View B's strongest implicit argument is that ten years of AHT data, an entire executive incentive structure, and a decade of reporting infrastructure represent a significant organisational investment — and that the cost of changing all of this justifies retaining the KPI. (L2) This is the sunk cost fallacy applied to measurement. The ten years of AHT data are not lost when the KPI changes — they remain as historical data for analysis and benchmarking. The cost being counted as an argument for staying is not a cost of changing. It is the cost that has already been incurred. (L3) More precisely: the relevant calculation is not cost of transition versus value of historical consistency. It is cost of transition versus cost of continuing to misalign agent behaviour multiplied by every future period. The AI has quantified the second term. View B counts only the first.

The Metric Trap: A One-Way Institutional Loop

The most important consequence of keeping AHT is not the misaligned behaviour in the current period. It is the institutional dynamic that has been building for ten years — and that becomes harder to reverse with each passing cycle.

Diagram 2 — The Metric Trap: a self-tightening six-node loop. AHT as the primary KPI promotes fast-AHT agents into coaching roles, embeds AHT optimisation in training, and increases institutional resistance to change — every year making the transition the AI has recommended more disruptive and the delay more costly.

The Metric Trap has the same structure as the Specification Ratchet from related AI evaluation problems — and the same one-way property. Each year of AHT as the primary metric is another tooth. A decade of teeth have already turned.

The AI-specific dimension makes it uniquely urgent: if the AI retrains on performance data from an AHT-optimised workforce, it learns that AHT-optimised behaviour produces the best measured outcomes. Its confidence in AHT as a signal rises as the gap between AHT performance and customer outcome widens invisibly beneath it. When the correction eventually arrives — when churn rises and the repeat call rate forces the issue — the AI's own training history will initially resist the re-evaluation. The organisation's own AI becomes the strongest institutional argument against the change it most needs.

The Formal Model: The Sign Condition

Net value of changing the KPI versus retaining AHT, per quarter:

ΔV = (R·F + L)·C·S − T·K

• R: repeat call reduction per customer per quarter. The AI's finding: agents spending more time generate fewer repeat calls. Peg: R ≈ 0.08–0.20 (8–20% reduction in repeat rate, consistent with FCR literature in service operations).

• F: cost of one repeat call relative to one resolved first-contact call. Industry standard: repeat calls cost 2.5–4× more than first-contact resolutions (SQM Group, FCR Benchmarking Research). Peg: F ≈ 2.5–4.0.

• L: customer loyalty gain per quarter expressed as CLV uplift. The AI's finding shows higher loyalty from resolution-focused agents. Peg: L ≈ 0.05–0.15 CLV multiplier per quarter.

• C: customer base scale. Fixed by organisation size. It scales both the repeat-call term and the loyalty term.

• S: proportion of interactions affected by the KPI change. Peg: S ≈ 0.60–0.80 of total call volume meets the profile the AI identified.

• T: transition cost: retraining, dashboard rebuilding, incentive restructuring, dual-metric period. One-time cost.

• K: disruption multiplier during transition. One-time. Bounded by the CHANGE framework's governed migration.

Sign condition: Change KPI ⟺ (R·F + L)·C·S > T·K

With R ≈ 0.10, F ≈ 3.0, L ≈ 0.08, and S ≈ 0.70, the recurring gain is (0.10 × 3.0 + 0.08) × 0.70 ≈ 0.27·C per quarter (0.266·C before rounding). T·K is one-time, so the break-even period is T·K ÷ (0.27·C) quarters. For any organisation where the transition cost is less than approximately 2–3 quarters of that recurring gain, the change pays for itself within the first 2–3 quarters of the new steady state, and every quarter after that is net positive.
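The break-even arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of the original model: it reads C as a scale factor on both the repeat-call and loyalty terms, which is the reading that reproduces the quoted per-quarter gain (0.266·C before rounding). The values for T, K, and C are hypothetical placeholders, and the gain is expressed in units of one first-contact call cost.

```python
def quarterly_gain_per_customer(r, f, l, s):
    """Recurring gain per customer per quarter: (R*F + L) * S."""
    return (r * f + l) * s


def break_even_quarters(transition_cost, disruption, customers, r, f, l, s):
    """Quarters needed for the recurring gain to repay the one-time cost T*K."""
    gain = quarterly_gain_per_customer(r, f, l, s) * customers
    if gain <= 0:
        raise ValueError("no recurring gain, so the sign condition cannot be met")
    return (transition_cost * disruption) / gain


# Pegs taken from the post: R = 0.10, F = 3.0, L = 0.08, S = 0.70.
gain = quarterly_gain_per_customer(r=0.10, f=3.0, l=0.08, s=0.70)

# Hypothetical organisation: T = 40,000 cost units, K = 1.5, C = 100,000 customers.
quarters = break_even_quarters(
    transition_cost=40_000, disruption=1.5, customers=100_000,
    r=0.10, f=3.0, l=0.08, s=0.70,
)
print(f"gain per customer per quarter: {gain:.3f}")
print(f"break-even after {quarters:.2f} quarters")
```

Under these placeholder inputs the one-time cost is repaid in a little over two quarters; a different T, K, or C moves the break-even point proportionally.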

The Asymmetry That Makes the Case Stronger Than the Static Equation Suggests

The equation understates the case for View A because it treats gains and retention costs symmetrically over time. They are not:

• The gain from changing the KPI compounds — as agents retrain to FCR-optimised behaviour, repeat call rates fall, CLV rises, and operating costs decrease. The gain per quarter grows as the new behaviour embeds — it does not plateau at period one.

• The cost of keeping AHT also compounds — each quarter of continued AHT optimisation deepens the Metric Trap: another cohort of AHT-trained agents, another round of AHT-optimised promotions, more institutional resistance. The Metric Trap does not hold steady — it tightens. The cost of eventual change grows with every quarter of delay.

In plain terms: the gains from changing compound upward. The costs of not changing compound upward. The sign condition is not marginal. It gets stronger every quarter View B is maintained.

The Empirical Record: Six Cases Across Five Sectors

Two matched pairs — the same accountability task run under a wrong proxy KPI then reformed to an outcome KPI. The cell View B needs — 'wrong proxy KPI retained, outcomes improved' — does not appear in any load-bearing row.

| Case | Sector | What the wrong KPI produced | What outcome KPI produced | Weight |
| --- | --- | --- | --- | --- |
| UK NHS 4-Hour A&E Target (Francis Report, 2013; NAO report, 2013; Health Select Committee, 2013) | Healthcare / UK. Matched pair #1: time-proxy KPI vs outcome KPI, same task | Patients warehoused in ambulances to stop the 4-hour clock before formal admission. Nursing staff diverted from care to clock management. Francis Report documented patient harm caused directly by gaming the time-based KPI. Same mechanism as AHT: time proxy misaligned behaviour from the intended outcome. | NHS moved to patient outcome measures (mortality rates, infection rates, patient experience) following Francis Report. Accountability moved from time-based proxy to clinical outcome measures, exactly the KPI change the AI in this dilemma recommends. | Load-bearing (Matched pair #1: time proxy → harm → outcome KPI reform; government-audited; peer-reviewed) |
| Wells Fargo cross-sell quota (CFPB/OCC consent order, 2016; Senate testimony 2016; Congressional record) | Banking / US. Matched pair #2: activity-proxy KPI vs outcome KPI, same task | Cross-sell quota KPI caused agents to open 2 million fraudulent accounts. The KPI measured activity (accounts opened) rather than outcome (customer value created). Agents optimised the activity. $185M fine, 5,300 terminations. Exact same structure: wrong proxy → rational gaming → outcome inversion. | OCC and CFPB mandated replacement of activity-based KPIs with customer-outcome measures: account usage, satisfaction, and relationship health. Same KPI change this dilemma's AI recommends: from activity proxy to outcome measure. | Load-bearing (Matched pair #2: activity proxy → fraud → outcome KPI mandated; regulatory enforcement; irrefutable record) |
| India IRDAI Insurance Claim Settlement Time KPI (IRDAI Annual Reports 2017–2022; Insurance Regulatory and Development Authority of India) | Insurance / India. Non-Western proof | IRDAI required companies to report claim settlement time as primary KPI. Insurers optimised: settled small claims fast to improve averages; systematically delayed or disputed large legitimate claims. Time KPI read excellent. Claimant outcomes did not. Same AHT mechanism: time proxy optimised; underlying purpose defeated. | IRDAI reformed KPI framework to claim settlement ratio and customer complaint resolution rate. Outcome measures replaced time proxies. Documented in IRDAI regulatory circulars 2019–2022 as explicit response to gaming of the time-based measure. | Load-bearing (Non-Western; regulator-documented; direct AHT parallel: time proxy → gaming → outcome reform) |
| Ritz-Carlton service empowerment (Michelli, The New Gold Standard, 2008; Ritz-Carlton Gold Standards) | Hospitality / US. Positive control | N/A: Ritz-Carlton removed time-based service KPIs before their documented success period. They did not freeze the wrong metric. | Ritz-Carlton empowered any staff member to spend up to $2,000 to resolve a customer complaint without approval, removing call-time constraints entirely. Documented improvement in NPS and repeat booking rates. Absence of time KPI is explicitly credited in their published service model. | Supporting (positive control: outcome-over-time policy; documented; recognised globally) |
| Barclays Premier Banking AHT to NPS migration (2014–2016; Barclays Annual Reports; Customer satisfaction disclosures) | Banking / UK. Closest sector parallel | Barclays Premier Banking team operating under AHT constraints produced high throughput but moderate NPS. Internal review found time pressure was preventing agents from exploring underlying customer needs in complex queries. | Barclays migrated Premier Banking team from AHT to Net Promoter Score as primary KPI. Documented NPS improvement within 6 months. AHT naturally settled at a new level without being targeted. AHT fell as a consequence of better resolution, not as a target. | Load-bearing (Direct sector parallel: banking customer service; AHT to outcome migration; documented outcome) |
| Google People Operations performance measurement (Bock, Work Rules!, 2015; Google re:Work publications) | Tech / US. Positive control | N/A: Google never used raw activity-proxy KPIs as primary performance measures for customer-facing or technical teams. | Google built performance measurement around outcome proxies (OKRs: Objectives and Key Results) not activity proxies. The purpose of OKRs is explicitly to keep the measure pointed at the outcome, not the activity. View B's equivalent ('keep measuring the activity because we always have') is specifically what Google's framework was designed to avoid. | Supporting (structural contrast: outcome-oriented KPI design; Bex's own territory inverted) |

The Four Strongest Objections to View A — Closed

'Ten years of AHT data will be lost'

Factually incorrect. Historical AHT data is preserved. It remains in the data warehouse for trend analysis, benchmarking, and research. What changes is whether AHT drives executive incentives and agent coaching going forward. The CHANGE framework's H gate (Hold dual metrics) maintains AHT reporting for six months alongside FCR — creating a bridge period during which historical comparisons remain valid. You do not lose the data. You stop letting the data make your decisions.

'Executive incentives are built around AHT — changing the KPI disrupts governance'

This is the argument for change, not against it. Executives incentivised on AHT have a documented financial interest in not finding that AHT has been driving the wrong behaviour for a decade. That resistance is not a measurement science argument. It is institutional self-interest in a finding that challenges a decade of decisions. The correct response to 'executives are incentivised on the wrong KPI' is not to preserve the wrong KPI. It is to change the incentive structure — which the CHANGE framework does gradually, in governed stages.

'Changing KPIs makes performance difficult to compare over time'

Conceded: raw AHT scores and raw FCR scores cannot be directly compared without a bridge. The CHANGE framework's G gate (Govern the bridge) maintains a conversion methodology for 12 months that preserves longitudinal analysis. More importantly: the correct comparison is not AHT in 2024 vs. AHT in 2014. It is customer loyalty in 2024 vs. customer loyalty in 2014. The AI's finding shows that AHT comparability has been preserving a measure that was systematically diverging from the outcomes it was meant to represent. Consistency in the wrong measurement is not better than accuracy in the right one.

'The AI might be wrong — the finding might be spurious'

The CHANGE framework's C gate (Confirm the finding) addresses this directly: independent audit of the AI's methodology before any transition begins, manual review of sample interactions, and confound testing for agent tenure, channel type, and case complexity. If the finding does not survive independent validation, the process stops. The burden of proof works in both directions: the AI has analysed thousands of interactions and found a consistent pattern across three independent outcome measures. The burden for ignoring that evidence is now higher than the burden for acting on it — especially given that the cost of a 6-month dual-metric validation period is bounded and reversible, while the cost of another year in the Metric Trap is not.

A Deployable Answer: The CHANGE Framework

The dilemma presents a false binary: change the KPI immediately or keep it unchanged. The correct answer is governed transition — capturing the AI's finding in live operation while managing the institutional disruption View B correctly identifies. Six gates, each closing one specific objection:

Diagram 3 — The CHANGE Framework: Confirm, Hold dual metrics, Align incentives, Notify and train, Govern the bridge, Evaluate the outcome. The Canary KPI — repeat call rate — validates the AI's finding in live operation before the full transition completes.

THE CHANGE FRAMEWORK DOES NOT DISCARD GOVERNANCE. IT REDIRECTS IT.

The H gate (Hold dual metrics) preserves every governance obligation View B cares about — historical comparability, executive reporting, performance management continuity — for the full transition period. CHANGE does not eliminate AHT reporting. It removes AHT from the position of primary driver while the evidence base for its replacement is validated in live operation. An organisation running CHANGE maintains full measurement continuity and acquires the evidence it needs to make the transition with confidence, rather than making a binary choice between a demonstrably wrong KPI and an unvalidated replacement.

Where View B Is Genuinely Right

View B is correct in one precise territory: when the AI's finding is based on insufficient data, uncontrolled confounds, or a proposed replacement KPI that is itself gameable. If the AI has analysed too few interactions to distinguish signal from noise, or if the FCR and CLV measures it proposes are more easily manipulated than AHT, then View B's caution is warranted.

None of those conditions apply in this dilemma. The AI has analysed thousands of interactions. The finding is consistent across three independent outcome measures — repeat calls, customer satisfaction, and three-month operating cost — making a spurious correlation across all three simultaneously unlikely. And FCR and CLV, while not gaming-proof, are harder to game than AHT: an agent cannot fake a customer not calling back within 30 days. The CHANGE framework's C gate provides independent validation before any commitment is made.

The Final Word

The NHS 4-Hour Target, Wells Fargo's cross-sell quota, India's IRDAI insurance time-proxy, and Barclays' AHT-to-NPS migration all point to the same institutional lesson: a proxy KPI that decouples from its intended outcome does not self-correct. It embeds — in promotions, in coaching, in management culture — and becomes harder to change with every passing year of continued use.

Bex is right that the KPI should change. She is right for the wrong reason. The correct argument is not that changing KPIs produces better morale. It is that AHT has stopped functioning as a KPI. It is no longer measuring what it was designed to measure in a way that aligns behaviour with outcomes. Keeping it is not a measurement decision. It is a political decision — dressed in the language of governance.

View B cannot tell you whether a low AHT score means

the agent resolved the customer's problem

or simply ended the call.

It has decided not to ask —

and called that indifference governance.

A KPI that misaligns behaviour is not performing its function.

Keeping it is not governance. It is the governance of the wrong thing.

View A. Without qualification.

Yesterday at 03:10 AM

My submission supports View A: if AI demonstrates that the existing KPI is driving suboptimal behavior, the organization should evolve its performance measurement system.

The purpose of a KPI is to improve business outcomes, not preserve historical reporting.

The organization should evolve its measurement system because a KPI is only useful if it drives the right behavior and business outcome. If AI shows that AHT is encouraging faster but poorer resolutions, then keeping AHT as the primary measure would mean optimizing for the wrong goal. Customer support should be judged by what it ultimately creates: solved problems, loyal customers, and lower total cost over time, not just shorter calls.

If AI reveals that the existing KPI is producing suboptimal behavior, the organization should update the KPI, not defend the metric for its own sake. Historical reporting is useful only when it helps explain performance; it should never override evidence about what actually improves the business. In this case, evolving from AHT to a broader outcome-based measurement system is not a disruption to management discipline — it is the correction of one.

Good measurement systems should adapt when evidence changes. If AI shows the KPI is unintentionally optimizing the wrong behavior, then keeping it in place just because it is familiar creates a management blind spot.

A useful way to frame it is this: A KPI is not a tradition; it is a control mechanism. When the control mechanism starts rewarding speed over resolution, the company is no longer managing performance — it is managing the metric. That is especially dangerous in customer support, where a superficially efficient interaction can generate hidden costs later through repeat contacts, churn, refunds, and reputational damage.

Consider a logistics company’s claims team handling lost or delayed shipments. Under an AHT target, an agent may close a call quickly by telling the customer to file a form online, which keeps handle time low but often leads to repeat calls, escalations, and frustration. Under a First Contact Resolution target, the agent is encouraged to investigate the claim, coordinate with operations, and confirm next steps during the first interaction, which takes longer upfront but reduces rework and improves retention.

That is a better tradeoff because the company saves money not by shaving seconds off one call, but by preventing three more contacts and preserving the customer relationship. In other words, the right KPI should reflect total system performance, not just local speed.

Why the KPI should change

A narrow efficiency metric can look good on a dashboard while harming the business underneath. In this case, agents who spend a little longer resolving issues fully create fewer repeat contacts, higher satisfaction, and lower operating cost over the next three months. That means the “best” AHT performers may actually be producing more downstream work, which makes the KPI misleading rather than helpful.

The purpose of a KPI is to steer decisions, incentives, and behavior. If the measure pushes people to rush through calls, transfer customers unnecessarily, or avoid complex cases, then the company is rewarding activity that conflicts with its real goal. A broader system centered on First Contact Resolution and Customer Lifetime Value would better align frontline behavior with long-term outcomes.

Below are three reasons why a KPI that is driving suboptimal performance should be replaced:

Outcome over optics. Shorter calls only matter if they improve the customer experience and reduce total cost.
Local efficiency can hurt system efficiency. An agent who spends 2 extra minutes solving the issue may save 20 minutes of future work across repeat calls and escalations.
Measurement shapes culture. People quickly learn what the organization truly values based on what is rewarded, promoted, and reviewed.

Alternative KPIs that capture superior performance metrics

I will substantiate my view with an example of customer support for a logistics company. Suppose the company handles 100,000 support cases per quarter. Under an AHT-only system, agents are rewarded for keeping calls under 4 minutes. That reduces visible handle time, but AI finds that shorter calls have a higher repeat-contact rate. For example, if the low-AHT group generates 22% repeat calls versus 12% for the slightly longer-handling group, then the company is paying for the same issue multiple times. A simple expected-cost model makes the tradeoff clear.

Expected cost per Case

\[\text{Expected Cost per Case} = c_h + p_r \times c_r\]

Where:

\(c_h\) = cost of the first handling,
\(p_r\) = probability of repeat contact,
\(c_r\) = cost of each repeat contact.

If faster agents reduce \(c_h\) by $1 but raise \(p_r\) enough that repeat contacts add $3 in expected cost, the “better” AHT performance is actually worse for total cost. In that setup, the correct KPI is not raw speed but a composite of First Contact Resolution, repeat-contact rate, and customer lifetime value.
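The expected-cost comparison can be checked with a short Python sketch. The 22% and 12% repeat rates and the 100,000 cases per quarter come from the example above; the dollar values for first handling and repeat contacts are hypothetical, chosen so that the faster group saves $1 on the first contact while adding $3 of expected repeat cost.

```python
def expected_cost_per_case(c_h, p_r, c_r):
    """Expected cost per case: first-handling cost plus expected repeat cost."""
    return c_h + p_r * c_r


CASES_PER_QUARTER = 100_000
REPEAT_COST = 30.0  # hypothetical cost of one repeat contact, in dollars

# Hypothetical first-handling costs: fast agents are $1 cheaper per first contact.
fast = expected_cost_per_case(c_h=5.0, p_r=0.22, c_r=REPEAT_COST)
thorough = expected_cost_per_case(c_h=6.0, p_r=0.12, c_r=REPEAT_COST)

extra_cost_per_case = fast - thorough
extra_cost_per_quarter = extra_cost_per_case * CASES_PER_QUARTER

print(f"fast agents:     ${fast:.2f} per case")
print(f"thorough agents: ${thorough:.2f} per case")
print(f"low-AHT premium: ${extra_cost_per_quarter:,.0f} per quarter")
```

With these placeholder costs the low-AHT group is $2 more expensive per case once repeat contacts are counted, even though it looks $1 cheaper on the first call.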

A more realistic service model would also include churn or retention:

Customer Lifetime Value

\[\text{Customer Lifetime Value} = \sum_{t=1}^{T} \frac{R_t - C_t}{(1+d)^t}\]

Where \(R_t\) is revenue from the customer in period \(t\), \(C_t\) is service cost, and \(d\) is the discount rate. If better issue resolution reduces churn by even a small amount, the lifetime value gain can easily outweigh a small increase in handling time. That is why the KPI should evolve: it should measure the economic outcome of service, not just the speed of a single interaction.

A practical organizational example is a call-center incentive plan. If bonuses are tied to AHT alone, managers will pressure agents to end calls quickly, transfer difficult cases, or avoid thorough diagnosis. If bonuses are tied to a weighted score such as

\[0.4(\text{FCR}) + 0.3(\text{CSAT}) + 0.3(\text{Retention})\]

then the system encourages the behavior that lowers total cost and improves loyalty. That is the core argument for changing the KPI once the evidence shows the old one is distorting decisions.

Changing the KPI changes behavior, and behavior changes economic outcomes. In the logistics support example, if the team is measured only on AHT, agents may close calls quickly but leave issues partially solved, which increases repeat contacts and hidden cost. If they are measured on First Contact Resolution instead, agents spend a little longer on the first interaction, but the company reduces rework, improves satisfaction, and lowers total service cost.

Total cost Model

Imagine a parcel-delivery company with 50,000 customer contacts per month. Under AHT pressure, agents average 4 minutes per call and resolve only 70% of issues on the first attempt. Under an FCR-focused model, average handling time rises to 5 minutes, but FCR improves to 88%. The shorter-call policy looks efficient on paper, but the second policy may be cheaper overall because it prevents repeat calls, escalations, and compensation claims.

A simple cost model shows why:

\[\text{Total Cost} = N(c_h + p_r c_r)\]

Where:

\(N\) = number of initial contacts.
\(c_h\) = cost of handling the first contact.
\(p_r\) = probability of a repeat contact.
\(c_r\) = cost of a repeat contact.

If the AHT-driven approach has lower \(c_h\) but a much higher \(p_r\), the total cost can be greater. For example, suppose \(c_r = 4\), and moving to the FCR-focused model raises \(c_h\) from 1 to 1.2 while the repeat-contact probability falls from 0.30 to 0.12. The cost per initial contact is then:

\[1 + 0.30 \times 4 = 2.2\]

versus

\[1.2 + 0.12 \times 4 = 1.68\]

So the slower-but-thorough approach is economically better.
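Scaling the per-contact figures to the 50,000 monthly contacts in the parcel-delivery example gives the total. The sketch below applies the total cost formula directly; the unit costs are the relative values used above (1 and 1.2 for first handling, 4 for a repeat contact), not real currency amounts.

```python
def total_cost(n, c_h, p_r, c_r):
    """Total cost = N * (c_h + p_r * c_r), in the same relative cost units."""
    return n * (c_h + p_r * c_r)


MONTHLY_CONTACTS = 50_000

aht_driven = total_cost(MONTHLY_CONTACTS, c_h=1.0, p_r=0.30, c_r=4.0)
fcr_focused = total_cost(MONTHLY_CONTACTS, c_h=1.2, p_r=0.12, c_r=4.0)
monthly_saving = aht_driven - fcr_focused

print(f"AHT-driven:  {aht_driven:,.0f} cost units per month")
print(f"FCR-focused: {fcr_focused:,.0f} cost units per month")
print(f"saving:      {monthly_saving:,.0f} cost units per month")
```

On these inputs the FCR-focused model costs roughly 24% less per month, despite the 20% higher first-handling cost.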

Customer Lifetime value

A bank contact center provides another clear case. If agents are rewarded for short calls, they may give incomplete answers about chargebacks or account disputes, causing customers to call back several times. If the bank instead uses a service quality metric such as FCR combined with customer satisfaction, agents are incentivized to fully diagnose the issue once. That improves trust and reduces the probability of churn, which matters far more than shaving 30 seconds off one call.

This can be modeled through customer retention:

\[\text{CLV} = \sum_{t=1}^{T} \frac{m_t \cdot r_t}{(1+d)^t}\]

Where:

\(m_t\) = margin from the customer in period \(t\).
\(r_t\) = probability the customer remains active.
\(d\) = discount rate.

If better resolution raises retention even slightly, customer lifetime value increases. That means the KPI should reflect long-term value creation, not just immediate labor efficiency.
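A small Python sketch shows how sensitive this retention-based CLV is to a modest change in retention. Everything numeric here is an assumption for illustration: a constant margin of 100 per period, eight periods, a 2% discount rate per period, and a constant per-period retention rate so that the survival probability is the rate raised to the power of the period number.

```python
def clv(margins, retention_probs, discount_rate):
    """Discounted sum of m_t * r_t / (1 + d)^t for t = 1..T."""
    if len(margins) != len(retention_probs):
        raise ValueError("margins and retention_probs must have the same length")
    return sum(
        m * r / (1 + discount_rate) ** t
        for t, (m, r) in enumerate(zip(margins, retention_probs), start=1)
    )


def survival_curve(per_period_retention, periods):
    """Assumption: constant per-period retention rho, so r_t = rho ** t."""
    return [per_period_retention ** t for t in range(1, periods + 1)]


PERIODS = 8                    # hypothetical horizon
MARGINS = [100.0] * PERIODS    # hypothetical constant margin per period

baseline = clv(MARGINS, survival_curve(0.90, PERIODS), discount_rate=0.02)
improved = clv(MARGINS, survival_curve(0.92, PERIODS), discount_rate=0.02)
uplift = improved - baseline

print(f"baseline CLV: {baseline:.2f}")
print(f"improved CLV: {improved:.2f}")
print(f"uplift from +2 points of retention: {uplift:.2f}")
```

Because retention compounds across periods, the uplift grows with the horizon, which is the mechanism behind the claim that small retention gains can outweigh a small increase in handling time.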

Effective Resolution Rate

A software company using AHT-like metrics for support tickets may reward agents for closing tickets quickly. But if an agent closes a ticket before the bug is truly fixed, the same customer returns with the same issue, and the engineering team gets a second report, then a third. A better product-oriented KPI would measure ticket reopens, time to durable resolution, and customer effort score.

A useful product-quality model is:

\[\text{Effective Resolution Rate} = \frac{\text{Tickets closed without reopen}}{\text{Total tickets closed}}\]

If two teams both close 1,000 tickets, but Team A has a 10% reopen rate and Team B has a 25% reopen rate, Team A is creating more value even if its average handling time is longer. That is the kind of evidence that justifies changing the KPI.
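The two-team comparison works out as follows in Python. The ticket counts and reopen rates are the ones given above; the function name is a hypothetical helper.

```python
def effective_resolution_rate(closed, reopened):
    """Share of closed tickets that were not reopened."""
    if closed <= 0:
        raise ValueError("closed must be positive")
    if not 0 <= reopened <= closed:
        raise ValueError("reopened must be between 0 and closed")
    return (closed - reopened) / closed


team_a = effective_resolution_rate(closed=1000, reopened=100)  # 10% reopen rate
team_b = effective_resolution_rate(closed=1000, reopened=250)  # 25% reopen rate

print(f"Team A: {team_a:.0%} durable resolutions")
print(f"Team B: {team_b:.0%} durable resolutions")
```

Team A durably resolves 900 tickets against Team B's 750, so Team B has 150 more tickets returning as rework regardless of how quickly each one was first closed.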

Support Performance score

At the organizational level, incentives should follow the measure that best predicts business results. If executive bonuses, manager scorecards, and team reviews are all anchored to AHT, then the whole system will optimize for speed. Once AI shows that speed is not the true driver of loyalty or cost reduction, the organization should update the measurement system and keep AHT only as a secondary efficiency indicator.

A good weighted score might look like:

\[\text{Support Performance Score} = 0.4(\text{FCR}) + 0.3(\text{CSAT}) + 0.2(\text{Repeat-Contact Reduction}) + 0.1(\text{AHT})\]

That preserves some efficiency monitoring while shifting the main focus to outcomes. This is the right way to modernize performance management: keep the useful part of the old metric, but stop letting it dominate decisions when evidence shows it is misleading.
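One way to make the weighted score computable is sketched below in Python. The weights are those given above. Two things are assumptions added for illustration: every component is normalised to the range 0 to 1, and AHT enters as an efficiency score (shorter is better) rather than as raw minutes, since adding raw handling time would reward longer calls. The 3-minute and 10-minute bounds and the two agent profiles are hypothetical.

```python
WEIGHTS = {
    "fcr": 0.4,
    "csat": 0.3,
    "repeat_reduction": 0.2,
    "aht_efficiency": 0.1,
}


def aht_efficiency(aht_minutes, best=3.0, worst=10.0):
    """Map raw AHT to 0..1, where 1 is the hypothetical best (shortest) time."""
    raw = (worst - aht_minutes) / (worst - best)
    return max(0.0, min(1.0, raw))


def support_performance_score(components):
    """Weighted sum of normalised components, each expected in 0..1."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalised to the range 0..1")
        score += weight * value
    return score


# Hypothetical agent profiles: one thorough (5-minute AHT), one fast (4-minute AHT).
thorough = support_performance_score({
    "fcr": 0.88, "csat": 0.85, "repeat_reduction": 0.60,
    "aht_efficiency": aht_efficiency(5.0),
})
fast = support_performance_score({
    "fcr": 0.70, "csat": 0.70, "repeat_reduction": 0.20,
    "aht_efficiency": aht_efficiency(4.0),
})

print(f"thorough agent: {thorough:.3f}")
print(f"fast agent:     {fast:.3f}")
```

With AHT weighted at only 0.1, the fast agent's handling-time advantage is too small to offset lower resolution and satisfaction, which is the behaviour the weighting is meant to produce.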

Managing the transition

Changing the KPI does not mean abandoning historical reporting. The company can keep AHT as a secondary operational metric while making resolution quality and customer value the primary measures. That preserves continuity for trend analysis while shifting incentives toward outcomes that matter more.

A sensible rollout would be to:

Keep AHT in the dashboard, but stop using it as the lead incentive metric.
Introduce First Contact Resolution, repeat-contact rate, CSAT, and customer retention.
Tie executive and manager bonuses to a weighted score that includes both efficiency and long-term value.
Segment reporting by issue type, because some cases genuinely require more time to resolve well.

Conclusion

In conclusion, if AI shows that the existing KPI causes the organization to optimize the wrong behavior, the KPI should change. Historical reporting is useful, but it should never outweigh evidence that a different measure would produce better business results. While there is merit in maintaining consistency for governance, the ultimate goal of KPIs is to improve performance and customer outcomes, and in most real-world scenarios that goal outweighs the drawbacks of change.

23 hr Rohit Gandhi locked this topic

19 hours ago

Author

1. Ajay Wadhwa

Position: View A (Change the KPI) Specific Example: Zappos (no call-time limit; longest call ~11 hours; loyalty-focused culture), and telecom/BPO industry-wide migration to FCR as primary north-star metric. Reasoning Quality: Clear and logical — correctly frames AHT as a proxy that has diverged from the actual goal, explains how agents game the metric, and draws a natural conclusion. Solid but not deeply formal.

✅ Approved — Takes an unambiguous View A position and supports it with two concrete industry references (Zappos and the telecom/BPO sector's documented FCR migration) alongside coherent proxy-metric reasoning.

2. rajan.arora2000

Position: View A (Change the KPI — with a specific design) Specific Example: Zappos (re-seated metric design: time as guardrail, outcome as target, ~75% repeat customers funding long calls) and Wells Fargo (cautionary tale on gaming CLV-type cross-sell targets). Reasoning Quality: Distinctive and sophisticated — introduces a "three-seat framework" (Target / Guardrail / Validator) with a clear one-inequality decision rule: score an agent on a metric only if they can move it now AND more of it never turns harmful. AHT fails clause 2; CLV fails clause 1; FCR passes both. Responds to counterarguments systematically.

✅ Approved — Unambiguously supports View A with a highly specific operational framework, named real-world examples, and a precise decision rule that distinguishes this answer from generic "change the KPI" positions.

3. Suhail_J

Position: View A (Change the KPI) Specific Example: References Amazon, T-Mobile, and Zappos, but only in brief/generic passing — no concrete process steps, metrics, or outcomes are cited for any of them. Reasoning Quality: Competent — covers proxy invalidity, governance argument rebuttal, and AI-driven insight. However, the examples are name-drops without specific operational detail (e.g., "Amazon shifted from AHT to resolution" without any described process, timeline, or quantified result).

❌ Not Approved — While the position is clear and the logic is sound, the answer lacks a specific, concrete example with industry context, process steps, or measurable outcomes; examples are mentioned by name only without substantive detail.

4. anthony rebello

Position: Indeterminate (answer is a PDF file attachment only — no in-thread written text) Specific Example: None visible in the thread. Reasoning Quality: Cannot be evaluated; the submission consists solely of a file upload ("Change-the-KPI-Position-Paper-884.docx.pdf").

❌ Not Approved — No evaluable written content is present in the forum thread; the submission is an attachment only, providing no accessible position, reasoning, or example for review.

5. Vinit Dubey

❌ Not Approved — Same as above; no written content in the thread to evaluate on any criterion.

6. Ankita Bhardwaj

Position: View A (Change the KPI) Specific Example: Multiple strong examples — (1) Compuware's shift from SLAs to Experience Level Agreements (XLAs) in IT services; (2) Best Buy Geek Squad replacing speed targets with First-Time Fix Rate (eliminating "bounce-backs"); (3) Cleveland Clinic replacing throughput metrics with patient outcome measures; (4) Wells Fargo cross-sell KPI as cautionary tale. Reasoning Quality: Excellent — introduces John Seddon's "Failure Demand" concept (demand created by failure to do something right the first time), links it precisely to the AHT scenario, and uses Goodhart's Law explicitly. The diversity of sectors and specificity of each case is impressive.

✅ Approved — Delivers a clear View A position supported by four distinct, sector-specific real-world examples with named processes (Failure Demand, XLA migration, First-Time Fix Rate, Bounce-Back metric), demonstrating both breadth and specificity.

7. Naijur Rahman

Position: View A (Change the KPI) Specific Example: SQM Group benchmarking data across 500+ North American call centers (quantified NPS impact: resolved first contact = NPS 64; repeat contact = NPS 40; unresolved = NPS –10; two or more unresolved = NPS –38). Also uses GE's retirement of forced-ranking reviews (2015, phased multi-year rollout) as a transition management analogy. Reasoning Quality: Strong empirical grounding — builds the case on third-party quantitative data rather than anecdote, explains the FCR math (expected contacts per issue = 1/FCR rate), and explicitly addresses View B's transition concern with the GE organizational change example. Very practically oriented.

✅ Approved — Clear View A position with industry-specific quantitative evidence (SQM Group FCR/NPS benchmarks), a concrete mathematical model for contact volume reduction, and a real organizational transition case (GE) to address governance concerns.

8. kartik voleti

Position: View A (Change the KPI) Specific Example: Amazon's evolution of fulfillment metrics (warehouse efficiency → delivery promise accuracy, defect rates, customer experience; cited revenue growth from ~$107B in 2015 to $630B+ in 2024) and Wells Fargo cross-sell scandal. Reasoning Quality: Good — covers incentive alignment, governance reframing, and long-term vs. short-term productivity tradeoffs. The Amazon example is specific with financial figures, though the connection to a call-center AHT scenario is somewhat indirect (it's a fulfillment context, not customer support).

✅ Approved — Takes an unambiguous View A stance with a named company example (Amazon fulfillment KPI evolution with quantified revenue data) and applies Goodhart's Law reasoning effectively, though the Amazon example is a loose-fit analogy for a call-center scenario.

9. Abhishek Adhikary

Position: View A (Change the KPI)

Specific Example: Presents a comparison table with Amazon, Zappos, Netflix, Adobe, and Blockbuster, with old vs. new KPI focus and outcomes. Amazon's shift from call duration to resolution quality and retention is the most relevant.

Reasoning Quality: Reasonable: makes the correct logical argument. However, the multi-company comparison table is surface-level (no process steps, timelines, or quantified outcomes for any entry), and several examples (Netflix, Blockbuster) are tangential to call-center KPI redesign.

❌ Not Approved — While the position is clear and the direction of reasoning is correct, the examples are presented in a generic comparative table without concrete process detail, measurable outcomes, or operational specificity; they do not constitute a specific, substantiated industry example.

10. Bedibrat Kutum

Position: View A (Change the KPI)

Specific Example: T-Mobile's documented shift away from AHT-centric measurement toward customer outcome metrics (FCR and NPS-focused approach), with explanation of the "callback loop" mechanism.

Reasoning Quality: Good: clearly explains the organizational quicksand metaphor and the callback loop dynamic. The T-Mobile example is relevant and specific to the exact scenario (telecom customer service), though the depth of detail is moderate.

✅ Approved — Takes a clear View A position with a named, sector-relevant example (T-Mobile's AHT-to-FCR shift in telecom customer service), solid callback loop reasoning, and a practical framing of the governance question.

11. Jaswant Kumar

Position: View A (Change the KPI)

Specific Example: Multiple strong, specific cases: (1) New Zealand bank using IVR pre-authentication + "Customers for Life" FCR culture (world-class FCR performance sustained over years); (2) Free Mobile France (12 million new subscribers, 18% market share, improved NPS by removing structural causes of detraction); (3) Quantified business case: PwC data (12–15% higher retention from strong FCR), Forrester data (each 1% FCR improvement saves enterprise-scale cost).

Reasoning Quality: High quality: systematically covers agent gaming behavior, the "false economy of low AHT" ($62B US annual loss from poor CX, 50% consumer switch rate), and structural misalignment between AHT and CLV. Grounds claims in named research sources.

✅ Approved — Unambiguously takes View A with multiple sector-specific examples (NZ banking, French telecom), quantified business outcomes, and third-party research citations (PwC, Forrester), delivering one of the more practically grounded answers in the thread.

12. Saran raj Venkatesan

Position: View A (Change the KPI, without qualification)

Specific Example: Six cases across four sectors: UK NHS 4-Hour A&E Target (Francis Report, 2013; matched pair: time proxy → patient harm → outcome KPI reform); Wells Fargo cross-sell quota (CFPB/OCC consent order, 2016); India IRDAI Insurance Claim Settlement Time KPI (regulatory circulars 2019–2022); Barclays Premier Banking AHT-to-NPS migration (2014–2016, NPS improvement within 6 months); Ritz-Carlton ($2,000 resolution empowerment); Google OKRs.

Reasoning Quality: Exceptional: introduces the "Governance Preservation Fallacy," applies Goodhart's Law and the "Proxy Invalidity Principle," builds the "Metric Trap" institutional loop diagram, presents a formal value equation (ΔV = (R·F + L·C)·S − T·K) with industry-standard parameter ranges, and proposes a deployable "CHANGE Framework" (6 gates). Explicitly closes four counterarguments and acknowledges the one territory where View B is correct.

✅ Approved — Delivers an unambiguous View A position supported by six sector-specific, source-cited real-world cases, a formal quantitative model, a structured transition framework (CHANGE), and rigorous rebuttal of all major counterarguments.

13. Adeniran Ilesanmi

Position: View A (Change the KPI)

Specific Example: (1) Logistics company scenario with a quantified expected-cost model (low-AHT group: 22% repeat call rate vs. 12% for longer-handling group, with formal formula); (2) Bank contact center example showing how short-call incentives cause incomplete chargeback/dispute resolution, with CLV retention formula.

Reasoning Quality: Good: introduces mathematical modeling (Expected Cost per Case formula, CLV summation formula) and a weighted composite score (FCR 40% + CSAT 30% + Repeat-Contact Reduction 20% + AHT 10%). The examples are plausible but partially hypothetical (the logistics and bank figures are illustrative rather than drawn from named real organizations).

✅ Approved — Takes a clear View A position with specific quantitative models, concrete scenario-based examples in logistics and banking, and a practical composite KPI formula — though the examples are illustrative/synthetic rather than citing named organizations with documented outcomes.
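
The weighted composite score and the repeat-call comparison in this answer can be sketched as follows. The weights (FCR 40%, CSAT 30%, Repeat-Contact Reduction 20%, AHT 10%) and the 22% vs. 12% repeat rates come from the answer as summarized above. Everything else is an assumption made for illustration: the metric names, the requirement that each input is already normalized to the range 0 to 1, and the simplified cost model in which a case generates at most one repeat contact costing the same as the first. The answer's own Expected Cost per Case formula is not reproduced in this summary and may differ.

```python
# Weights as summarized from the answer; keys are hypothetical names.
WEIGHTS = {
    "fcr": 0.40,
    "csat": 0.30,
    "repeat_reduction": 0.20,
    "aht": 0.10,
}


def composite_score(metrics: dict[str, float]) -> float:
    """Weighted sum of metrics, each assumed pre-normalized to [0, 1]."""
    missing = WEIGHTS.keys() - metrics.keys()
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= metrics[name] <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


def expected_cost_per_case(cost_per_contact: float, repeat_rate: float) -> float:
    """Simplified model: one contact, plus at most one repeat contact
    that occurs with probability `repeat_rate` and costs the same."""
    return cost_per_contact * (1.0 + repeat_rate)


def break_even_cost_ratio(repeat_rate_fast: float, repeat_rate_thorough: float) -> float:
    """How much more the thorough group may spend per contact, as a ratio,
    before its expected cost per case exceeds the fast group's."""
    return (1.0 + repeat_rate_fast) / (1.0 + repeat_rate_thorough)


if __name__ == "__main__":
    # Hypothetical normalized metrics for one team.
    team = {"fcr": 0.8, "csat": 0.9, "repeat_reduction": 0.5, "aht": 0.6}
    print(composite_score(team))
    # Repeat rates from the answer's logistics scenario.
    print(break_even_cost_ratio(0.22, 0.12))
```

Under this simplified model, the longer-handling group breaks even as long as its cost per contact is no more than about 8.9% higher than the low-AHT group's (1.22 / 1.12 ≈ 1.089). Downstream effects such as retention and lifetime value, which the answer also models, would shift that threshold further.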

🏆 Winner: Saran raj Venkatesan

Saran raj Venkatesan's answer wins across all three comparative criteria.

On clarity of position, it is the most unequivocal in the thread. It declares View A without qualification and goes a step further by challenging the reasoning Bex used to reach the same conclusion, which shows that the position is independently derived rather than merely reactive.

On quality and completeness of reasoning, no other answer comes close. It introduces three named logical principles (Governance Preservation Fallacy, Goodhart's Law, Proxy Invalidity Principle), a formal value equation with industry-standard parameter ranges, a self-tightening "Metric Trap" institutional loop, and a six-gate deployable "CHANGE Framework". It is the only answer in the thread that converts the abstract debate into an actionable governance methodology.

On relevance and specificity of examples, it presents six cases across four sectors with named source citations (Francis Report 2013, CFPB/OCC consent order 2016, IRDAI circulars 2019–2022, Barclays Annual Reports), including three matched pairs showing the identical proxy-KPI failure mechanism operating in healthcare, banking, and insurance. That makes it the only answer to show empirically, rather than merely assert, that the case View B depends on ("wrong proxy KPI retained, outcomes improved") does not appear in the cases examined.

Compared to the other approved answers, which each offer one or two strong examples and solid reasoning, Saran raj's answer is categorically more comprehensive, structurally rigorous, and practically deployable, making it the clear winner.

This topic is now closed to further replies.
