Benchmark Six Sigma Forum

Message added by Mayank Gupta,

AI, or Artificial Intelligence, is a self-learning and/or self-rewriting technology that mimics the human mind, intelligence, and decision making. It can evolve and learn based on the responses it receives in different situations. As per IEEE SA, AI is “the combination of cognitive automation, machine learning (ML), reasoning, hypothesis generation and analysis, natural language processing and intentional algorithm mutation producing insights and analytics at or above human capability.”

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Sumukha Nagaraja on 18 August 2025.

 

Applause for all the respondents - Jess Balmaceda, Rohan Modak, Imtiaz Shaikh, Ayomide Otokiti, Sunny Prithviraj, Sumukha Nagaraja, Palak Kapoor.

Keeping Track: Version Control for AI Flows & Prompts

Featured Replies

Q 797. How Do You Manage Versions of AI Flows and Prompts Without Losing Track?
AI solutions — especially those built with flows and prompts — often evolve over time as feedback, performance data, and requirements change. Without version control, it’s easy to lose track of what was changed, why it was changed, and whether a previous version worked better. If you were responsible for an evolving AI solution, what approach would you use to track, test, and document different versions of flows and prompts? How would you ensure that updates improve performance rather than introduce new problems?

 

The best answer will be selected on the basis of: 

  • Practicality of the version control approach
  • Clarity in managing updates and rollbacks
  • Insight into balancing innovation with stability

 


Solved by Sumukha Nagaraja

An AI Management System is a structured framework used to govern the development, deployment, operation, monitoring, and continual improvement of artificial intelligence systems in an ethical, safe, and efficient manner. It ensures alignment with the organization’s goals, regulatory requirements, and social values.

As with other management systems, one of its key elements is Policies & Standards. This element covers documentation of the existing AI workflow, prompt improvements, and version control for any changes made.

It is strongly recommended that any organization engaged in AI solutions be certified in an AI Management System.

Below is how I would manage versions of AI flows and prompts in a claims processing scenario, where things are constantly evolving based on feedback from claim examiners, auditors, and compliance.

 

1. Keep Track of Changes

While I was building a claims-processing AI assistant, the prompt that guided the “claims eligibility check” step worked… but only for the first few weeks. Then business rules changed, compliance flagged some outputs, and examiners started giving us feedback.

Instead of editing the prompt and hoping for the best, I store every single version of my flows and prompts in a company Git repository:

  • Each branch is a new iteration, for example feature-improve-prior-auth-check.
  • I clearly document why I made the change.

When I deploy a new version, I tag it in Git and log that version ID in our monitoring dashboard, so when a claim examiner says, “The bot did not process a specific scenario,” I can instantly see which version they were using.
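
One way to make that version lookup concrete is to stamp every bot response with the deployed tag. The following is only a minimal sketch: `DEPLOYED_VERSION`, `build_tag_command`, and `record_response` are hypothetical names that do not come from the original post, and the `git tag -a` command is built but not executed here.

```python
from datetime import datetime, timezone

# Hypothetical: the Git tag of the prompt/flow version currently deployed.
DEPLOYED_VERSION = "v2.1.2"


def build_tag_command(version: str, message: str) -> list[str]:
    """Return the argv for an annotated Git tag.

    Run it with subprocess.run(cmd, check=True) inside the repository.
    """
    return ["git", "tag", "-a", version, "-m", message]


def record_response(claim_id: str, output: str,
                    version: str = DEPLOYED_VERSION) -> dict:
    """Attach the version ID to a bot response so feedback can be traced to it."""
    return {
        "claim_id": claim_id,
        "output": output,
        "prompt_version": version,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


# Illustrative values only.
entry = record_response("CLM-001", "Eligible under plan rules")
print(entry["prompt_version"])
```

With a record like this in the monitoring log, an examiner's complaint about a specific claim can be matched to the exact prompt version that produced the output.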

 

 

2. Documenting the Story Behind the Change

I document the story behind each change so that it is clear why I made it. For example:

v2.1.2 — 2025-08-15

  • Change: Updated “denial reason explanation” prompt to include ICD-10 lookup when code not in local cache.
  • Why: Several claim examiners escalated cases because the bot said “code not found,” even though the code existed in the database.
  • Expected Impact: Reduce “code not found” errors by 20%.

This makes it easy for me to tell the story of the bot’s improvement over time.
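
If the change log is kept as structured data rather than free text, entries stay consistent across authors. A minimal sketch, assuming a hypothetical `ChangeLogEntry` class (not part of the original answer) that renders an entry in a layout similar to the example above:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChangeLogEntry:
    version: str
    released: date
    change: str
    why: str
    expected_impact: str

    def render(self) -> str:
        """Format the entry as a short, human-readable block."""
        return (
            f"{self.version} ({self.released.isoformat()})\n"
            f"  Change: {self.change}\n"
            f"  Why: {self.why}\n"
            f"  Expected Impact: {self.expected_impact}"
        )


log_entry = ChangeLogEntry(
    version="v2.1.2",
    released=date(2025, 8, 15),
    change="Updated denial reason explanation prompt to include ICD-10 lookup",
    why="Examiners escalated cases where the bot said 'code not found'",
    expected_impact="Reduce 'code not found' errors by 20%",
)
print(log_entry.render())
```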

 

 

3. Testing Before I Roll Out

I never just push changes live. In claims processing, one wrong rule application can delay thousands of claims. Below are a few practices I follow:

  1. Shadow Testing: I run the old and new prompts side-by-side on 100 recent real claims (with PHI data masked).
  2. Regression Suite: I maintain a set of tricky test cases — like coordination-of-benefits disputes or secondary insurance retro adjustments — to make sure the new version doesn’t break things that used to work.
  3. SME Review: I share sample outputs with our senior claim SME for human-in-the-loop scoring. They tell me if the new explanation is actually clearer or just longer.
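
The first two checks in this list can be sketched in a few lines. This is an illustration under assumptions, not the author's actual tooling: `old_version` and `new_version` are stand-in functions that replace real model calls, and the claim strings are invented.

```python
from typing import Callable

# Hypothetical signature: takes masked claim text, returns a decision string.
PromptFn = Callable[[str], str]


def shadow_compare(old: PromptFn, new: PromptFn, claims: list[str]) -> list[dict]:
    """Run both versions on the same claims and return only the disagreements."""
    diffs = []
    for claim in claims:
        old_out, new_out = old(claim), new(claim)
        if old_out != new_out:
            diffs.append({"claim": claim, "old": old_out, "new": new_out})
    return diffs


def run_regression(candidate: PromptFn, suite: dict[str, str]) -> list[str]:
    """Return the test inputs whose output no longer matches the expected value."""
    return [case for case, expected in suite.items() if candidate(case) != expected]


# Stand-in functions so the sketch runs without calling a real model.
def old_version(claim: str) -> str:
    return "deny" if "expired" in claim else "approve"


def new_version(claim: str) -> str:
    if "expired" in claim:
        return "deny"
    return "review" if "secondary" in claim else "approve"


claims = ["routine visit", "expired policy", "secondary insurance adjustment"]
disagreements = shadow_compare(old_version, new_version, claims)
failures = run_regression(
    new_version, {"expired policy": "deny", "routine visit": "approve"}
)
print(disagreements)
print(failures)
```

The disagreements are what would go to the SME for review; an empty regression list means nothing that used to work has broken.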
 

4. Tracking Metrics and Team Feedback After Deployment

Once the new version goes live (usually to 10% of examiners first), I:

  • Track auto-adjudication accuracy — if it dips, I know something’s off.
  • Collect feedback tied to the exact version.
  • Categorize any errors: prompt misunderstanding, missing data, or wrong business logic.

This way, I don’t just hear “the bot is processing incorrectly” — I know why.
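
Tying feedback to a version and an error category might look like the sketch below. The three category names mirror the list above; the `summarize_feedback` helper and the feedback records are hypothetical examples.

```python
from collections import Counter

# Categories taken from the list above.
CATEGORIES = {"prompt_misunderstanding", "missing_data", "wrong_business_logic"}


def summarize_feedback(feedback: list[dict]) -> dict[str, Counter]:
    """Group error categories by the prompt version that produced them."""
    summary: dict[str, Counter] = {}
    for item in feedback:
        if item["category"] not in CATEGORIES:
            raise ValueError(f"unknown category: {item['category']}")
        summary.setdefault(item["version"], Counter())[item["category"]] += 1
    return summary


# Made-up feedback records for illustration.
feedback = [
    {"version": "v2.1.2", "category": "missing_data"},
    {"version": "v2.1.2", "category": "missing_data"},
    {"version": "v2.1.2", "category": "wrong_business_logic"},
    {"version": "v2.1.1", "category": "prompt_misunderstanding"},
]
summary = summarize_feedback(feedback)
print(summary["v2.1.2"].most_common(1))
```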

 

5. Protecting Against New Problems

I’ve learned the hard way: never delete a working version. I keep the last stable prompt ready so if my experiment tanks, I can roll back in minutes.

In the claims processing world, the cost of a bad AI update is delayed payments, regulatory fines, or angry providers, all of which in turn seriously impact customer satisfaction.
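
The "keep the last stable prompt ready" rule can be sketched as a small store that never deletes anything. `PromptStore` is a hypothetical class for illustration; in practice the same effect could come from Git tags plus a deployment pointer.

```python
from typing import Optional


class PromptStore:
    """Keeps every deployed prompt and remembers the last stable version."""

    def __init__(self) -> None:
        self._prompts: dict[str, str] = {}
        self.live: Optional[str] = None
        self.last_stable: Optional[str] = None

    def deploy(self, version: str, prompt: str) -> None:
        """Store the new prompt and keep the previous live version as fallback."""
        self._prompts[version] = prompt
        self.last_stable, self.live = self.live, version

    def rollback(self) -> str:
        """Switch back to the last stable version; nothing is deleted."""
        if self.last_stable is None:
            raise LookupError("no earlier version to roll back to")
        self.live = self.last_stable
        return self._prompts[self.live]

    def versions(self) -> list[str]:
        return sorted(self._prompts)


store = PromptStore()
store.deploy("v2.1.1", "stable prompt text")
store.deploy("v2.1.2", "experimental prompt text")
restored = store.rollback()
print(store.live, store.versions())
```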

 

By treating flows and prompts like living assets with a documented history, I never lose track of why something changed, and I can always prove whether the change actually helped. It’s not just version control — it’s trust control.

 

Version control for AI flows and prompts is an important task: it ensures that the newly developed AI tool carries all the changes made over successive iterations, along with the reasons for them.

1. Version control has to track the changes made in response to feedback and improvement requests from various sources.

2. Without version control, previously agreed improvements or changes can be missed and left out of the newer version. The version then has to be reworked, which wastes time.

3. A version log like the example below helps keep AI flows and prompts up to date.

 

| Version | Author | Date | Change Summary | Reason for Change |
|---|---|---|---|---|
| V1.0.0 | Imtiaz | 10-6-2025 | Initial deployment of escalation prompt | Launch auto update tracker for production |
| V2.0.0 | Imtiaz | 7-7-2025 | Refined escalation trigger phrasing | Improve user experience |
| V2.1.0 | Imtiaz | 5-Aug-2025 | Added fallback logic for ambiguous queries | Reduce misclassification errors |
| V2.2.1 | Imtiaz | 15-Aug-2025 | Added auto prompt if value of premium is above $10 million | Attention for high value amount |

 

 

When we first started using AI to track production downtime patterns, I built a simple flow that pulled operator inputs and generated quick insights for the shift leads. At one point, I decided to tweak the prompt that asked operators to describe the issue, just to make the descriptions clearer and easier for the technical team to understand.

I thought it was an improvement.
A week later, my phone was buzzing during a site visit because the reports coming out of the system suddenly had big gaps. Turns out my “clarity” change made operators give shorter answers that didn’t have enough detail for the analysis to work.

Since then, I’ve treated AI flows exactly the way I treat any process change in manufacturing:

I save every version before I touch it.
Not just the file but a quick note on what I changed and why.

I run the new version in a controlled test with a small team, not the whole plant. If it performs better on the KPIs we care about, such as accuracy, speed, and usability, then it graduates to live. If it doesn’t, I roll it back in minutes because the last good version is sitting in my folder.

I also keep two environments: the stable one for what’s proven, and a “playground” for experiments. That way, I can test bold ideas without worrying about disrupting a live process.

It’s the same mindset I use in CI projects: measure first, change deliberately, and always keep the option to go back. With AI flows, that discipline makes the difference between steady improvement and a messy guessing game.

The advent of AI has necessitated changes in business processes.

Before AI was used in business processes, teams relied on standard operating procedures (SOPs) and user manuals, each maintained in numbered versions, to keep a record of how business solution designs progressed.

For tracking, version names have to be defined in the business process flows so that they can be traced at any time and so that the correct nomenclature is picked up by the machine algorithm when an upgrade is made. The AI model can also be reverted to any previous version if the prompting does not produce the desired results, which gives the development team time to fix the bug in the prompting. So, when implementing an AI model, a data version control system needs to be deployed in a Git-based repository. The data version control system is then used for continuous integration and continuous delivery during development.

Tools are also used to capture metadata from testing the newly developed model as part of a version upgrade. Hyperparameters and metrics (as specified when the AI model is defined) are used to track the experimentation and development cycle for each version upgrade.

Document versioning can be automated with code so that the different versions are traceable both during development and after deployment.
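
As one illustration of automating traceability with code, a content fingerprint can show whether a prompt still matches the version recorded for it. This sketch uses plain SHA-256 hashing from the Python standard library; it is an assumed technique for illustration, not the API of any particular data version control tool, and `fingerprint` and `has_changed` are hypothetical helper names.

```python
import hashlib


def fingerprint(prompt_text: str) -> str:
    """Short, stable identifier derived from the prompt content itself."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]


def has_changed(prompt_text: str, recorded_fingerprint: str) -> bool:
    """True when the prompt no longer matches the fingerprint stored for its version."""
    return fingerprint(prompt_text) != recorded_fingerprint


# Illustrative prompt text.
recorded = fingerprint("Explain the denial reason in plain language.")
print(has_changed("Explain the denial reason in plain language.", recorded))
print(has_changed("Explain the denial reason briefly.", recorded))
```

Storing the fingerprint next to each version ID gives auditors a simple way to confirm that the deployed text is the text that was tested.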

Also, the business excellence team needs to document and provide a chronology of the different versions for internal and statutory audits.

  • Solution

Here's a methodical, practical way to track versions, safeguard performance, and produce clear documentation for AI flows and prompts that evolve over time:

 

  1. Establish a formal versioning system

  • Treat AI flows and prompts as code instead of making ad hoc changes: save your prompt and flow definitions as text files (JSON, YAML, Markdown) in Git or a similar tool.
  • Semantic versioning makes it easy to communicate about changes:
    • Major: A substantial change in the design's purpose or flow.
    • Minor: New features or improved prompts.
    • Patch: Fixes or small modifications.
  • Write commit messages that say what the change is meant to do and why it was made.
  • Put both the prompt text and the evaluation/test cases in the same repository so that you can track both the inputs and the outcomes over time.
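
The major/minor/patch rule above can be expressed as a small helper. This is a sketch that assumes plain MAJOR.MINOR.PATCH strings with an optional leading "v"; the `bump` function is a hypothetical name, and pre-release or build suffixes are not handled.

```python
def bump(version: str, part: str) -> str:
    """Increment a MAJOR.MINOR.PATCH version string; lower parts reset to zero."""
    major, minor, patch = (int(x) for x in version.lstrip("vV").split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("2.1.2", "minor"))  # 2.2.0
```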

  2. Maintain a prompt registry with metadata

  • Keep a well-organized registry (this might be a spreadsheet, a Notion database, or an internal tool) that records:
    • Version ID
    • Release date
    • Author/owner
    • Description of changes
    • Linked test results
    • Performance metrics: cost, accuracy, latency, and satisfaction
    • Rollback reference (the previous version)
  • This registry is your source of traceability whenever you compare versions or roll back.
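
If the registry lives in a spreadsheet, it can be generated from structured records. A minimal sketch using Python's standard `csv` module; the column names in `FIELDS` are one assumed mapping of the fields listed above, and the row values are invented.

```python
import csv
import io

# Hypothetical column names mapped from the registry fields listed above.
FIELDS = ["version_id", "release_date", "owner", "changes",
          "test_results", "metrics", "rollback_ref"]


def registry_to_csv(rows: list[dict]) -> str:
    """Serialize registry rows to CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative row only.
rows = [{
    "version_id": "v2.2.0",
    "release_date": "2025-08-15",
    "owner": "example-owner",
    "changes": "Clarified step 3",
    "test_results": "regression suite passed",
    "metrics": "accuracy=0.91",
    "rollback_ref": "v2.1.0",
}]
csv_text = registry_to_csv(rows)
print(csv_text)
```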

  3. Test before you deploy

  • To make sure that updates help rather than harm:
    • Sandbox testing: Run the new flow/prompt in a sandbox environment against synthetic and historical test cases.
    • A/B testing: Send a small share of traffic to the new version and compare it with the baseline version.
    • Regression checks: Confirm that key KPIs don't drop for scenarios that are known to work.
    • When you can, automate tests by preparing a list of queries and expected outputs ahead of time and running them on both old and new versions.
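
For the A/B step, one common way to send a small, stable share of traffic to the new version is to hash a user identifier into a bucket. This is a sketch under assumptions: `assigned_variant` is a hypothetical helper, the 10% default is only an example value, and real routing would normally live in the serving layer.

```python
import hashlib


def assigned_variant(user_id: str, rollout_percent: int = 10) -> str:
    """Deterministically route a fixed share of users to the new version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "new" if bucket < rollout_percent else "baseline"


# The same user always lands in the same variant.
print(assigned_variant("examiner-42") == assigned_variant("examiner-42"))
```

Because the assignment depends only on the ID, a user sees one version consistently, which keeps feedback attributable to a single variant.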

  4. Document problems along with their causes

  • Whenever you change something, record:
    • The problem statement, for example: users didn't understand step 3 in the flow.
    • The hypothesis, for example: simpler language should lead to more users completing the flow.
    • The evidence after deployment, for example: the recall rate improved from 72% to 84%.
  • You or another developer will be glad to know what was wrong when you revisit older versions.

  5. Be ready to roll back

  • Make sure that the last stable version is always easy to redeploy.
  • Make rollback a simple part of your deployment process, ideally a single click or command.
  • Write down when and why rollbacks occurred. These records can be just as useful as records of forward changes.

  6. Balance stability with innovation

  • Innovation track: An experimental branch where engineers can test new techniques without putting the stability of production at risk.
  • Stable track: Production-ready flows that are revised only after thorough testing.
  • Changes from the innovation track should be merged into the stable track only when the performance metrics hold up.
  • This is essentially a two-speed development model: fast experimentation and slow release.
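
The merge rule between the two tracks can be written down as an explicit gate. A minimal sketch: `ready_for_stable` is a hypothetical function, the metric names and values are invented, and it assumes every metric is "higher is better" (latency or cost would need to be inverted first).

```python
def ready_for_stable(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.0) -> bool:
    """True when no tracked metric is worse than the stable baseline.

    A metric missing from the candidate counts as a failure.
    """
    return all(
        candidate.get(name, float("-inf")) >= value - tolerance
        for name, value in baseline.items()
    )


# Illustrative numbers only.
stable_metrics = {"accuracy": 0.90, "satisfaction": 4.1}
print(ready_for_stable(stable_metrics, {"accuracy": 0.92, "satisfaction": 4.1}))
print(ready_for_stable(stable_metrics, {"accuracy": 0.85, "satisfaction": 4.3}))
```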

  An example workflow

  • Create a new prompt in any AI tool.
  • Commit it with a clear message, for example: "Make step 3 clearer to cut down on drop-offs."
  • Run automated tests and have people review historical cases.
  • Send 10% of traffic to the new version for A/B testing.
  • If the metrics improve, merge into the main branch and bump the version.
  • Record notes and metrics in the prompt registry.

Conclusion

 

Managing different versions of AI flows and prompts requires the same rigor as building software. The most effective approach combines:

  • Structured version control (Git and semantic versioning)
  • Centralized documentation (a registry with performance logs and other information that is easy to access)
  • Strong testing and rollbacks, such as sandboxing, A/B testing, and automated regression checks
  • Two-speed development (a stable track for production and an innovation track for experiments)

This makes sure that every change can be logged, tested, and undone, which helps teams come up with new ideas quickly while keeping things stable. In short, always have a way back, write down the why, and test the what.

 

 

 

 

 

I would manage AI flows and prompts the same way software code is handled: with version control. Each change would be explained, stating what was changed and why. Fixed test sets would be used to compare the new version with the old one. This keeps the process organized and systematic.

Sumukha has provided the best answer for this question. Well done!

 

Some other very valid points that were covered are using AI Management System (ISO 42001) - Jess' answer and keeping a separate testing environment - Ayomide's answer.
