Observations from AI in Our Appeals Process
In our revenue cycle management (RCM) process, we have been using an AI tool to draft appeal letters, which reduces the time spent preparing them manually. For the first draft, it pulls denial reasons, patient details, and payer rules from the available supporting documents. When we rolled it out, it did a good job. The letters were mostly spot-on and helped us get quite a few denials overturned.
But over time, we noticed a few things starting to slip, and not in a way that set off alarms. Even though no errors were reported, we could tell the quality wasn’t what it used to be. A few examples:
The prompts feeding the tool were not updated for two quarters, so the letters felt outdated or missed key points.
Payers changed their denial codes or policies, but the AI did not recognize the changes or respond to them correctly.
Our Corporate Compliance teams updated how letters should be written, such as changes in wording or layout, but the AI continued using the older style.
Because the system does not throw any errors, these kinds of changes quietly erode how well the AI performs. If no one’s checking, quality can decline for weeks or months before it shows up in results.
Signs That Something’s Off
Here are some early signals we’ve learned to watch for:
Appeal overturn rates drop, even though the types of denials and volume haven’t really changed.
Team members are spending more time editing or fixing the letters before they go out.
QA or nurse audits start flagging the same types of issues: poor wording, missing clinical reasoning, or formatting mistakes.
During a Monthly Business Review (MBR), we hear that payers are finding the letters unclear, and the client raises concerns about it.
How We Stay Ahead of It
1. Regular Reviews
Every week, we pull a handful of AI-generated letters and assign them to our senior QA for a manual review. It’s not a big batch, but it gives us a sense of whether anything’s off.
We also compare them with manually written letters to see if there’s a pattern or quality gap.
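The weekly pull can be sketched in a few lines of Python. This is only an illustration: it assumes the letters are picked at random (the process above does not say how the handful is chosen), and the function name, letter IDs, and sample size of 5 are all made up for the example.

```python
import random


def pick_weekly_sample(letter_ids, sample_size=5, seed=None):
    """Return a random subset of letter IDs for manual QA review.

    sample_size and seed are illustrative parameters, not part of
    any real tool described in this write-up.
    """
    rng = random.Random(seed)
    letters = list(letter_ids)
    # Never ask for more letters than this week's batch contains.
    k = min(sample_size, len(letters))
    return rng.sample(letters, k)


# Made-up IDs standing in for one week of AI-generated letters.
week_letters = [f"APL-{n:04d}" for n in range(1, 41)]
sample = pick_weekly_sample(week_letters, sample_size=5, seed=7)
print(sample)
```

Picking at random, rather than taking the first few letters of the week, is one way to keep the sample from leaning toward a single payer or denial type.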
2. What We Track
Metric | When to Act | Action Taken
Appeal success rate | Drops more than 10% below recent average | Dig into specific cases, check AI inputs
Manual edits per 100 letters | More than 15% need rework | Review prompt logic and update if needed
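As a rough sketch, the two thresholds above could be checked automatically each week. The code below is an assumption-heavy illustration, not our actual tooling: the class and function names are hypothetical, and it reads "10% below recent average" as a relative drop (for example, from 60% to below 54%), which is one possible interpretation.

```python
from dataclasses import dataclass


@dataclass
class WeeklyStats:
    # Hypothetical weekly counts; the field names are illustrative only.
    appeals_decided: int
    appeals_overturned: int
    letters_sent: int
    letters_reworked: int


def check_thresholds(current, recent_success_rate,
                     max_relative_drop=0.10, max_rework_rate=0.15):
    """Return the follow-up actions triggered by this week's numbers."""
    actions = []
    if current.appeals_decided > 0 and recent_success_rate > 0:
        success_rate = current.appeals_overturned / current.appeals_decided
        # Assumption: the 10% is relative to the recent average.
        relative_drop = (recent_success_rate - success_rate) / recent_success_rate
        if relative_drop > max_relative_drop:
            actions.append("Dig into specific cases, check AI inputs")
    if current.letters_sent > 0:
        rework_rate = current.letters_reworked / current.letters_sent
        if rework_rate > max_rework_rate:
            actions.append("Review prompt logic and update if needed")
    return actions


# Made-up numbers: success fell from 60% to 50%, and 18 of 100 letters needed rework.
week = WeeklyStats(appeals_decided=80, appeals_overturned=40,
                   letters_sent=100, letters_reworked=18)
print(check_thresholds(week, recent_success_rate=0.60))
```

With these made-up numbers both actions are returned. If the 10% is meant as percentage points instead, the comparison would be on the plain difference between the two rates.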
3. Spotting Data Drift
We track changes in denial codes, document types, and payer policies. If something new starts appearing frequently, we treat it as a trigger to check whether the AI needs to be retrained or adjusted.
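One simple way to notice that something new is appearing frequently is to compare how often each denial code shows up in a recent period against an older baseline. The sketch below assumes the codes are available as plain lists. The function name, the 5% minimum share, the 2x growth factor, and the placeholder codes are all hypothetical choices for illustration.

```python
from collections import Counter


def find_drifting_codes(baseline_codes, recent_codes,
                        min_share=0.05, growth_factor=2.0):
    """Return (code, reason) pairs for codes that are new or rising.

    A code is flagged when it makes up at least min_share of recent
    denials and is either absent from the baseline ("new") or at least
    growth_factor times more common than before ("rising").
    """
    if not recent_codes:
        return []
    baseline = Counter(baseline_codes)
    recent = Counter(recent_codes)
    baseline_total = sum(baseline.values())
    recent_total = sum(recent.values())
    flagged = []
    for code, count in recent.items():
        recent_share = count / recent_total
        if recent_share < min_share:
            continue  # too rare in the recent period to act on yet
        baseline_share = baseline[code] / baseline_total if baseline_total else 0.0
        if baseline_share == 0.0:
            flagged.append((code, "new"))
        elif recent_share >= growth_factor * baseline_share:
            flagged.append((code, "rising"))
    return sorted(flagged)


# Placeholder codes and counts, not real payer data.
baseline = ["CODE_A"] * 60 + ["CODE_B"] * 35 + ["CODE_C"] * 5
recent = ["CODE_A"] * 50 + ["CODE_B"] * 30 + ["CODE_C"] * 12 + ["CODE_D"] * 8
print(find_drifting_codes(baseline, recent))
```

A flagged code is only a prompt for a human to look closer. Whether the AI actually needs retraining or a prompt change still has to be judged from the letters themselves.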
4. Real-Time User Feedback
We have added a simple tagging feature in the letter review system. If our team sees something off, like wrong justification or missing details, they immediately flag it. We go through those tags monthly to spot any recurring themes and take action accordingly.
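For the monthly pass over the tags, a simple count is often enough to surface recurring themes. The example below is a minimal sketch that assumes each flag is stored as a (letter ID, tag) pair. The function name, the tag labels, and the cut-off of 3 are invented for illustration and do not describe the actual review system.

```python
from collections import Counter


def recurring_tags(flags, min_count=3):
    """Return (tag, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(tag for _, tag in flags)
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]


# Made-up flags for one month: (letter ID, tag chosen by the reviewer).
flags = [
    ("L-101", "wrong justification"),
    ("L-102", "missing details"),
    ("L-103", "missing details"),
    ("L-104", "wrong justification"),
    ("L-105", "formatting"),
    ("L-106", "missing details"),
    ("L-107", "wrong justification"),
    ("L-108", "missing details"),
]
print(recurring_tags(flags))
```

Here "missing details" and "wrong justification" pass the cut-off, while the single "formatting" flag does not.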
Why AI Needs Closer Watch
With regular systems, it’s usually easier to spot when something’s not right: you get an error message, or a result looks obviously wrong. But AI is different. Once it’s set up and passes UAT, we tend to assume it will just keep working fine. Most importantly, AI-generated content is often assumed to be correct, so we stop double-checking. That is when small mistakes start to creep in. And without governance, errors like outdated formats or missed updates can cause bigger problems, especially when they affect compliance or our payer communication.
AI is definitely useful, and it saves time, no doubt about that. But it’s not something we can leave on autopilot. A bit of regular review, some honest feedback from the people using it, and tracking a few basic metrics can really help. It does not take much, just some weekly attention to catch things early before they turn into real issues.