Benchmark Six Sigma Forum

Arul Palani

Members
  1. Today, AI shows a narrow, ‘synthetic’ form of creativity: real, but limited. It recombines patterns from human data in ways that can be genuinely useful and sometimes surprising, but it lacks the lived experience, intention, emotion and self-driven goals behind human creativity. Whether this counts as creativity depends on whether creativity is defined by internal states or by outward behavior.

     What ‘creativity’ means
     Many people treat something as creative if it is both novel and valuable, or appropriate for a goal. Two types are worth separating:
     1. Generative/combinational creativity: producing novel combinations or variations within an existing style or framework.
     2. Transformational creativity: inventing genuinely new styles, concepts or problem framings.
     Today’s generative AI models are experts in the first type: they methodically generate new artifacts by recombining patterns they have learned from massive datasets, which is why they can write, draw and design in convincing ways. But they lack emotional insight, personal experience, and independent goals, so this ‘creativity’ is closer to a high-end remix engine shaped by prompts and training data.

     Creating ‘new’ things
     Technically, large models balance memorization (regurgitating training data) with generalization (outputs that are statistically consistent with the training data but are not copies of it). Studies show that as models scale, they rely on memorization for simple fact-recall tasks, but for complex reasoning tasks they depend on generalization and generate more novel text than what appears in the training data. So AI is not just copy-pasting; it constructs new things guided by learned probability distributions over sequences of words, pixels or tokens. At the same time, it operates entirely within patterns extracted from human-produced content, with no grounding in personal experience, emotion or intrinsic motivation, which one can argue are core to human creativity.

     Designing a GCP architect training
     Let’s say the task is to design a 6-8 week training program to prepare engineers in the organization for Google’s Cloud Architect certification, including labs, simulations, and scenarios that reflect the org’s stack as well as those in the certification prep notes available from Google. An AI assistant with knowledge of GCP patterns, architecture blueprints, and GenAI-based learning tools can:
     • Map exam objectives (networking, IAM, security, data, reliability, case studies, etc.) to a week-by-week curriculum with readings, Qwiklabs-style labs, and mock case studies.
     • Design cloud architecture scenarios, such as multi-region high availability, cost-performance trade-offs, and IAM design with minimal privileges, that look similar to real Google exam case studies and the Cloud Architecture Center’s examples.
     This feels creative because it proposes coherent sequences, novel lab combinations, and scenario narratives that did not exist as a single source in its training data. However, it is recomposing known GCP concepts, exam patterns, and architecture templates.

     Where AI looks genuinely ‘creative’ in this task
     1. Scenario synthesis: Given constraints like “APAC-heavy traffic, strict data residency, unpredictable load, cost ceiling and quota limitations”, AI can invent a new case: a fictional retail tech company on GKE with regional Cloud SQL, Cloud Armor, and multi-layer IAM. It can then ask the engineers to design the target state and migration plan. This specific combination of requirements and failure modes may be unique, and sometimes AI combines constraints that a human would not have thought of combining, which feels like fresh ideation.
     2. Adaptive learning path: If an engineer struggles with, say, VPC peering and Shared VPC, the AI can dynamically generate remedial labs, analogies, and extra quiz questions targeting those gaps. This dynamic reconfiguration of the learning path across many engineers in the org is not static remixing; it is a continuous, context-sensitive recomposition that goes beyond typical templated training design.
     3. Cross-domain analogy and framing: AI might explain GCP concepts using analogies drawn from simple domains, like traffic police for firewalls or apartment complexes for projects and folders. This makes the material easier to grasp and also showcases AI’s creative teaching.
     In all of the above, AI is creating new, coherent structures that solve real design constraints. That is a legitimate, if engineered, form of creativity within a bounded space.

     Where ‘limits’ show up
     1. Reinventing the role itself: Asking the AI to rethink “What should a cloud architect role look like in the present world of GenAI on GCP?” pushes it into remixing existing material from blog posts, role definitions in official documentation, and strategy pieces. It will blend ideas but will not weigh organizational politics, the talent market, or other factors the way a human does.
     2. Deep pedagogical innovation: If you ask the AI to invent a radically new pedagogy for teaching reliability, it will combine existing learning theories and practices rather than invent a new one. It can simulate “novelty”, but it does not take risks, does not account for cultural change, and has no lived experimentation from which to invent a new teaching culture.
     3. Value and intent: Human creativity in training is value driven. The intent is to make sure the learners are enriched by the training, obtain the certification, and progress in their careers. AI does not care about these things.

     Conclusion
     In the GCP Cloud Architect training, AI is creatively recombining rather than inventing from scratch. It will design a fresh-feeling curriculum, scenarios, and analogies by systematically remixing GCP patterns, exam structures, and pedagogical knowledge. This is also how humans create: by recombining material from various sources. Hence it still deserves to be called ‘creativity’ in a narrow, instrumental sense, since it provides solutions that we humans can find genuinely useful or surprising. However, without its own experiences, intentions or stakes in the outcome, the AI’s creativity always remains derivative and bounded. In the GCP Cloud Architect program, we will need to set up AI as a creative partner: humans set the goals, values and constraints, then give feedback on and refine what the AI generates into training that will foster real architects in the organization.
  2. When AIs interact across companies, they will make markets faster, more dynamic, and more fine-grained. However, they will also introduce new forms of power concentration, opaque collusion, and systemic risk that we humans are poorly equipped to detect or control. In a cloud/infra context, the biggest changes come from autonomous negotiation agents, cross-company multi-agent systems, and always-on algorithmic pricing shaping outcomes in real time.

     How the “rules” of competition change
     Advantage shifts from human intuition to the quality of data, the design of the model, and agent strategy. If we feed agents richer, cleaner signals (telemetry, contracts, macro data, competitor traces), we win, since the agents will be able to learn better policies faster. Competitive barriers are moving from plant/manufacturing investments to ownership of unique data and realistic simulations, favoring big tech and logistics leaders.
     Multi-agent systems already coordinate inventory, routing, and production across organizations, reallocating work when one node is congested, down or at risk. As this extends across company boundaries, competition is becoming “competition between ecosystems of agents” rather than between firms. It is similar to how today’s competition is often between cloud, marketplace and app-partner stacks rather than single products.

     Pricing, negotiation, and value creation
     AI negotiation agents are starting to benchmark rates against vast pools of transaction data, simulate contract options, and adjust terms accordingly, leading to tighter spreads. When both sides run agents, you effectively get continuous, machine-speed bargaining where discounts, service levels, and risk premiums update with market conditions.
     Value creation shifts in three ways:
     1. Resilience: Agents can re-route, re-allocate resources, and pre-book capacity before humans notice a disruption, so “uptime under volatility” becomes a core differentiator.
     2. Orchestration: Whoever owns the coordinating agents captures more value than any single asset owner.
     3. Personalization: Agents can tailor SLAs, bundles, and prices per transaction, resource or service.
     Today versus AI-to-AI:
     • Pricing: today it is periodic, manually negotiated, and in coarse tiers. With AI-to-AI it becomes continuous, dynamic, and per transaction or per resource.
     • Competition: today it is between firms and contracts. It becomes competition between ecosystems of agents and data networks.
     • Value capture: today it is margin on assets and services. It can become margin on orchestration, data and AI decision services.
     • Supply decisions: today they are batch planning with slow reallocation. They become real-time, autonomous rerouting and rebalancing across firms.
     • Differentiation: today it is brand, scale and relationships. It can become model performance, data moats, auditability, and governance.

     New risks and vulnerabilities
     Algorithmic pricing and negotiation introduce a real risk of ‘algorithmic collusion’, where AI agents learn to keep prices high or avoid aggressive moves without any explicit agreement between the firms. Additionally, one might get correlated failure modes: many firms may depend on a small set of model/API providers, so a bug, exploit or misaligned update can propagate simultaneously across agents.

     Power, fairness, and ethical dilemmas
     Because humans may not easily understand the way agents negotiate and allocate, questions of fairness, discrimination, and due process intensify. Customer-facing agents can implicitly learn to offer worse terms to certain segments or regions based purely on historical profitability signals. AI-ready firms may secure systematically better rates and reliability, while smaller or less data-mature players might become permanent price takers. There is also an accountability gap: when an AI-to-AI negotiation locks a critical supplier out of a market, or an emergent pricing pattern harms consumers, it is unclear who should be held responsible.
     Without strong requirements for audit trails, human override mechanisms, and simulation-based testing of multi-agent interactions, societies may only notice harmful equilibria after they are deeply entrenched.

     Strategic implications
     1. Treat “agent strategy + data” as a competitive asset: invest in telemetry, clean data pipelines, and sandboxed simulations where the agents can be trained and tested against realistic market behavior.
     2. Design for observability and control: enforce logs for all agent-to-agent deals, provide human-in-the-loop thresholds for high-impact actions, and build tools to inspect learned policies and emergent patterns.
     3. Anticipate regulation: prepare for transparency, non-discrimination, and anti-collusion requirements around pricing and negotiation agents.

     Picture an in-house “Infra-Copilot” (an agent with Terraform, monitoring and billing access) talking directly to Gemini Cloud Assist inside GCP, with a thin orchestration layer between them. This setup can change how we compete on reliability, cost, and engineering leverage.

     How the interaction would work
     The in-house Infra-Copilot becomes the primary brain for cross-cloud/org context, while Gemini is the specialist that understands GCP internals, recommendations, and support workflows.
     1. Task routing: a multi-agent orchestrator routes tasks. High-level intents and policies come from Infra-Copilot; low-level plan/execute/troubleshoot steps are delegated to Gemini Cloud Assist and FinOps features.
     2. Incident handling: Infra-Copilot detects an SLO breach from observability tools and asks Gemini investigations to correlate logs/metrics, propose a fix, and generate a change with impact analysis.
     3. Cost control: Infra-Copilot handles the business budget/OKRs and periodically asks Gemini’s cost optimization/Cloud Billing context for safe rightsizing, committed-use changes, or architecture tweaks, then negotiates trade-offs with product teams.
     4. Design and rollout: Infra-Copilot captures requirements from PMs, drafts infra blueprints, and calls Gemini Application Design Center to get GCP best-practice templates and diagrams, then merges them back into the IaC (Infrastructure as Code) repo.

     How this change affects competition
     1. Time to change collapses: provisioning, rightsizing, and rollback become near-instant. If competitors still run ticket queues, your release and recovery cycle times beat theirs systematically.
     2. Value shifts: Many companies can turn on Gemini Cloud Assist, but fewer have a strong in-house AI that knows long-term cost strategy, reliability risk tolerance, and business priorities to drive it.
     3. Vendor leverage: Infra AI can continuously simulate “what ifs” (different regions, SKUs, discounts, utilization patterns) using Gemini’s cost and architecture insights, giving much more negotiating power with Google and internal stakeholders.

     New advantages and risks in this specific setup
     Advantages:
     • Autonomous FinOps + SRE: a lot of day-2 work (alert triage, RCA drafts, cost drift detection) can be offloaded to this AI pair, freeing engineers for higher-order reliability and product work.
     • Better than stock Gemini: Infra-Copilot can inject org-specific runbooks, exceptions, and political reality, turning generic Cloud Assist advice into something that is actually deployable.
     • Learning: Infra-Copilot can compare how similar workloads behave on other providers or on-prem, and use Gemini’s migration or rebalancing suggestions.
     New risks:
     • Over-optimization: left unchecked, the two agents might converge on patterns that look great on metrics (e.g. very high utilization) but quietly erode safety buffers, incident playbook clarity, or team understanding.
     • Hidden dependency on Google: the more Infra-Copilot depends on Gemini recommendations and internal APIs, the harder it becomes to move away or even to reason about behavior if Google changes defaults, pricing, or SLAs.
     • Governance: if Infra-Copilot auto-accepts a Gemini recommendation that leads to an outage or high cost, who “owns” the decision: the SRE team, the platform team that wired the agents, or Google as the service provider?

     How you could design it sanely
     For a pragmatic, low-regret implementation:
     • Treat Gemini as a specialized plugin, not a peer brain: Infra-Copilot stays the policy owner; Gemini is a domain expert invoked with clear scopes (design, troubleshoot, optimize) and bounded permissions.
     • Use an orchestrator with strong RBAC and logging between the two: log every “ask Gemini -> get plan -> apply/change” loop, and gate high-impact actions (quota changes, region moves, risky optimizations) behind human approval thresholds.
     • Continuously simulate: replay incidents and cost scenarios in a sandbox where Infra-Copilot and Gemini interact on synthetic data, and only promote behaviors that look robust over a wide range of conditions.
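
The "log every loop and gate high-impact actions behind human approval" idea can be sketched in a few lines. This is only an illustrative sketch, not a real Infra-Copilot or Gemini API: the `ProposedChange` and `Orchestrator` classes, the action kinds in `HIGH_IMPACT_KINDS`, and the cost threshold of 500 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical action kinds that always require a human sign-off.
HIGH_IMPACT_KINDS = {"quota_change", "region_move", "risky_optimization"}


@dataclass
class ProposedChange:
    """A plan returned by the specialist agent (illustrative structure)."""
    kind: str
    summary: str
    monthly_cost_delta: float = 0.0


@dataclass
class Orchestrator:
    """Sits between the two agents: gates actions and records every decision."""
    cost_threshold: float = 500.0  # assumed threshold, tune per organization
    audit_log: list = field(default_factory=list)

    def decide(self, change: ProposedChange, approved_by: Optional[str] = None) -> str:
        # A change needs a human if its kind is high impact or its cost swing is large.
        needs_human = (
            change.kind in HIGH_IMPACT_KINDS
            or abs(change.monthly_cost_delta) > self.cost_threshold
        )
        if needs_human and approved_by is None:
            decision = "pending_approval"
        else:
            decision = "apply"
        # Every loop is logged, whether or not the change is applied.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "kind": change.kind,
            "summary": change.summary,
            "needs_human": needs_human,
            "approved_by": approved_by,
            "decision": decision,
        })
        return decision
```

Calling `Orchestrator().decide(ProposedChange("region_move", "move cluster"))` returns `"pending_approval"` until the same call is repeated with an `approved_by` value. The point of the design is that the gate lives in code between the agents rather than in either agent's prompt.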
  3. Today there are multiple AI options to choose from, ranging from AIs designed for a specific task or domain to AIs that can do almost anything. If an organization adopts AIs from two different companies and lets them collaborate, the result can be more powerful than either alone. They can collaborate effectively through standardized protocols, well-governed APIs, shared platforms and clear contracts for data security and accountability. This unlocks new forms of automation that can be carried out across organizational boundaries, but it also introduces multi-agent risks like emergent behavior, failures and complex liabilities, which need to be actively governed.

     Considering our domain: an enterprise runs an in-house ChatGPT-style model, hosted via a private endpoint and used as a DevOps AI, which partners with Google Gemini on Google Cloud Platform (GCP) for GCP infra changes. This AI-to-AI collaboration is feasible and powerful, but it needs to be wrapped in strict controls because the agents will be working on production environments. The safest bet is to let agents propose, validate and simulate changes, while humans and strong policy engines guard any change that mutates cloud resources.

     Scenario: DevOps AI + Google Gemini
     1. The DevOps AI agent, hosted in the internal portal, takes requests from platform team users, like “scale test environment in GKE for load test during performance test schedule” or “create new GCP project enabling services of Vertex AI with necessary guardrails”.
     2. A change request is created and assigned to the CAB team for review and approval. The CAB team reviews the priority of the request, the resource allocation requested, the reason behind the request and the budget constraints. After review, the team approves or cancels the change.
     3. When a request is approved by the change process, the DevOps AI calls the Gemini agent running on GCP, which is wired into Google Gemini Cloud Assist, to execute the request.

     The loop
     1. DevOps AI agent:
        a. Interprets the change request, checks internal policies or SOPs, and turns it into a structured desired spec or template containing details about the projects, subnetworks, IAM bindings, labels and quotas, along with environment, risk details and criticality.
        b. Sends the spec via API to the Gemini agent without exposing internal secrets, only the required parameters and policy constraints.
     2. Gemini agent in GCP:
        a. Uses Gemini Cloud Assist to prepare a concrete GCP implementation plan as IaC (Terraform) files, including project structure, VPN topology, labels, IAM roles, GKE templates, org policies, etc.
        b. Stages the requested change from the IaC (Terraform) files as a plan, without applying it, and returns a summarized impact diff plus validation signals (policy checks, cost estimate, security posture).
     3. Joint decision and execution:
        a. The DevOps agent compares the summarized impact diff against internal rules, including cost ceilings, SLOs, security policies and baselines. It either asks the Gemini agent for alternatives or prepares a human-readable change request for CAB approval.
        b. Once the change request is approved by the CAB/approver, the DevOps agent signals the Gemini agent to proceed. The Gemini agent then executes the plan, reports back the status, and logs the steps taken in the CR for auditing.
     This gives a clean separation of roles and responsibilities: the DevOps AI owns intent and internal policy, and the Gemini agent owns GCP actions inside a governed, observable GCP boundary.

     Opportunities for GCP infra work
     1. Speed and consistency: Gemini Cloud Assist can accelerate infra design and implementation, troubleshooting and cost/security optimization. Pairing it with the DevOps AI agent standardizes the benefits of using GCP across the organization without having to train every internal team.
     2. Better change quality: Human-introduced misconfigurations in the infrastructure will be reduced, since the agents can pre-run playbooks, simulate the blast radius, check policies and cross-check against logs/metrics for failures before proposing the change.
     3. Shared operational context: The DevOps AI agent holds the business context and incident-driven changes, while the Gemini agent is responsible for reading GCP logs/metrics. This helps teams like SRE in day-to-day operations without worrying about business constraints.

     Risks and governance challenges
     1. Over-automation
        a. If both agents can push changes to prod when reacting to the same alert, the result can be overshooting scaling, over-provisioned resources or conflicting policy updates.
        b. Change windows, rate limiting and environment boundaries must be enforced in the orchestration layer, not just “told” to the models.
     2. Policy drift
        a. The internal infra standards (like naming and network patterns) may not match Google’s defaults. If policies live only in prompt instructions, they will drift.
        b. Codify policies as executable checks that Gemini must satisfy before it can mark a plan as “eligible for approval”. Checks include org policies, custom validations and project policies.
     3. Privilege and identity of agents
        a. The agents can be granted powerful roles: a compromised integration or overly broad privileges can mutate entire orgs.
        b. Use a separate service account per agent with minimal IAM roles, VPC controls and secret management for better access management.
     4. Auditability
        a. Without combined logs across the agents, it becomes hard to answer who requested a change and why.
        b. Require end-to-end traces: log every intent from the DevOps AI agent and every plan and action from the Gemini agent, and attach these to GCP’s audit logs and to the ITSM record to fully trace the events.
     5. Failure modes
        a. Emergent behaviors like repetitive reconfiguration and oscillating scaling policies are realistic risks even if the agents are safe in isolation.
        b. Regularly run chaos-style simulations in a sandbox GCP org where both AI agents operate under stress, observe the failure patterns, and then update the guardrails based on the observations.

     Governance patterns
     1. Place a single “control plane” service between the two agents, owning workflows, approvals, environment routes (lab/dev/prod), and hard enforcement of policies and schedules. This lets you swap models without changing governance.
     2. Keep design, approval and execution logically separated. For example, Gemini can propose IaC and recommendations, but a dedicated CI/CD pipeline with policy-as-code decides whether anything actually hits GCP.
     3. Define an explicit scope per integration: separate agents and service accounts for read-only observability, for cost-optimization suggestions, for non-prod provisioning, and for prod provisioning with approvals. Each then has narrower rights and stricter reviews.
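
The "codify policies as executable checks" point can be illustrated with a small policy-as-code sketch. Everything here is an assumption for the sake of the example: the plan is modeled as a plain dictionary, and the required labels, naming pattern and cost ceiling stand in for whatever in-house standards apply. Only `roles/owner` and `roles/editor` are real GCP basic roles; treating them as "too broad" is the example's own rule.

```python
import re

# Hypothetical in-house standards; real ones would come from the org's policy repo.
REQUIRED_LABELS = {"env", "owner", "cost-center"}
BROAD_ROLES = {"roles/owner", "roles/editor"}  # GCP basic roles, treated here as too broad
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def evaluate_plan(plan: dict, cost_ceiling: float) -> list:
    """Return a list of human-readable violations; an empty list means the plan passes."""
    violations = []

    if not NAME_PATTERN.match(plan.get("project_id", "")):
        violations.append("project_id does not match naming standard")

    missing = REQUIRED_LABELS - set(plan.get("labels", {}))
    if missing:
        violations.append("missing labels: " + ", ".join(sorted(missing)))

    for binding in plan.get("iam_bindings", []):
        if binding.get("role") in BROAD_ROLES:
            violations.append(
                f"overly broad role {binding['role']} for {binding.get('member')}"
            )

    if plan.get("estimated_monthly_cost", 0.0) > cost_ceiling:
        violations.append("estimated cost exceeds ceiling")

    return violations


def eligible_for_approval(plan: dict, cost_ceiling: float) -> bool:
    """A plan may be marked eligible only when every check passes."""
    return not evaluate_plan(plan, cost_ceiling)
```

In this sketch the checks run in the control plane or CI/CD pipeline, outside both models, so a plan cannot be marked eligible simply because an agent says it complies.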
  4. In the aerospace industry, almost every organization depends on a wide network of partners, from raw-material suppliers and precision machining firms to avionics manufacturers and logistics specialists. Each of these players now uses its own AI tools for planning, risk assessment, scheduling, or compliance checks. As a result, one AI-driven decision in one part of the chain can trigger major changes for everyone else.

     Take the case of production planning between an aircraft manufacturer and its Tier 1 and Tier 2 suppliers. Suppliers need to know the reason behind the changes whenever the OEM’s AI system adjusts build rates or parts requirements. They don’t need access to the model itself, just enough explanation to understand whether the shift came from certification delays, changes in fleet demand, or a disruption somewhere else. Without this clarity, suppliers either over-prepare or under-prepare, which ultimately leads to missed delivery windows or excess inventory.

     The same need for transparency shows up when AI tools flag supplier risks. A supplier deserves to know which area triggered the concern: missed shipments, audit issues, financial signals, or external events. Aerospace depends heavily on trust, documented evidence, and shared responsibility for safety. Opaque AI decisions undermine these foundations.

     In certain areas, AI transparency should stop. Most suppliers have AI systems to optimize machine parameters, composite layup processes or inspection routines. These methods cannot be exposed without risking competitive advantage. For regulatory and commercial reasons, pricing models, sourcing strategies, and controlled-technology details also need to be protected.

     A tiered transparency framework would be a better approach:
     • Partners involved in day-to-day operations get explanations that help them plan production and quality.
     • Commercial teams receive only the outputs relevant to contracts and capacity.
     • Regulators get visibility into AI decisions that relate to airworthiness and compliance.
     Organizations can formalize this through agreements outlining what must be shared and what should remain confidential. This kind of structure makes collaboration smoother, prevents misunderstandings, and keeps AI accountable without forcing any organization to reveal sensitive information or trade secrets. In a safety-critical industry like aerospace, this balance is essential.
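
One way to picture the tiered framework is as a per-audience field filter over a single decision record. This is a minimal sketch under stated assumptions: the tier names follow the post, but the field names (`reason_category`, `evidence_refs`, and so on) and the `TIER_FIELDS` mapping are hypothetical and would in practice be defined by the sharing agreements.

```python
# Hypothetical mapping of audience tier to the fields that tier may see.
TIER_FIELDS = {
    "operations": {"decision", "reason_category", "affected_parts", "effective_date"},
    "commercial": {"decision", "capacity_impact", "effective_date"},
    "regulator": {"decision", "reason_category", "airworthiness_impact", "evidence_refs"},
}


def explain_for(tier: str, record: dict) -> dict:
    """Return only the fields of an AI decision record that the given tier may see."""
    allowed = TIER_FIELDS[tier]  # unknown tiers raise KeyError rather than leak data
    return {key: value for key, value in record.items() if key in allowed}


# Example record; values are invented for illustration only.
record = {
    "decision": "reduce build rate for part family A",
    "reason_category": "certification delay",
    "affected_parts": ["A-100", "A-120"],
    "effective_date": "next planning cycle",
    "capacity_impact": "lower monthly call-off",
    "airworthiness_impact": "none identified",
    "evidence_refs": ["internal-review-001"],
    "model_parameters": "confidential",   # never shared with any tier
    "pricing_model": "confidential",      # never shared with any tier
}
```

Because the filter is an allow-list, anything not explicitly granted to a tier, such as model internals or pricing logic, stays private by default.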
  5. Using AI to onboard new employees can transform the onboarding process by streamlining tasks, personalizing the experience, increasing efficiency, and providing ongoing support. However, AI alone cannot do this without human empathy and trust. The organization needs to balance automation with a human touch so that new hires feel valued from day one.

     Organization responsibility:
     1. Being honest and transparent: Inform new hires in advance that the onboarding process is handled by AI, and explain what is recorded or tracked and how decisions are made. This gives them confidence that the automation is transparent and can be trusted.
     2. Handle errors: If, during onboarding, the AI is not able to decide what action to take or cannot provide effective personalization, there should be an option for human input to help make the decision, so that the new hire does not need to re-enter their details and the data is recorded accurately.
     3. History tracking: AI should let new hires track the history of what information they have provided, and when and how it was used by the AI.
     4. Performance: Ensure the AI tool is always available and responds with minimal latency, for a rich experience for the new hire.
     5. Compliance: Explain to the new hire which rules and regulations govern the AI and which regulations govern their data.
     6. Consent: The new hire’s consent needs to be taken before starting the AI-driven onboarding process, after providing all the details about the process and the data involved.

     AI helps in:
     1. Automating routine tasks such as filling in applications for the PF office, personal details, bank account details and compliance documentation, as well as setting up LAN accounts, scheduling mandatory trainings, and recording computer allocation details. This reduces the administrative burden on the HR department and speeds up the onboarding process.
     2. Answering questions: chatbots or virtual assistants can answer the new hire’s questions instantly. They can also send reminders about tasks that need to be completed and collect feedback on each task.
     3. Personalizing: it can allocate tailored trainings, personalized onboarding materials and mentors by analyzing the new hire’s skill set, experience and learning preferences.
     4. Keeping managers informed: it can notify the hiring manager about the progress being made during onboarding. The hiring manager can then propose additional trainings or support to help the new hire understand the work that is going to be assigned in the team.

     Advantages:
     1. New hires will experience faster integration into the organization and less confusion about the process. The process will remain uniform across remote and non-remote hires.
     2. The organization might see higher retention rates, more engagement and greater job satisfaction, based on the smooth onboarding and the clarity it gives to new hires.
     3. Based on the data collected, HR teams can further improve the onboarding experience by spotting gaps and addressing issues.

     Disadvantages:
     1. Bias: It can reinforce bias that already exists, or create a bad experience, if the AI is trained incorrectly or on bad data.
     2. Over-reliance: Relying only on AI automation/bots can create a bad impression for new hires. They might feel disconnected or unsupported if there is no personal check-in by the HR team.
     3. Privacy and data governance: Issues will arise if the new hire’s personal data is not handled in accordance with the prevailing data security and compliance rules of the organization and the government.

     The organization needs to ensure that AI automation makes the new hire’s onboarding journey smooth and frictionless, with personalized materials and path. There should be a balanced mix of AI automation and human touch.
  6. Cloud Enablement leader in a retail organization As a Cloud Enablement leader in a retail organization, one can use AI as an advisor or decision aiding system for allocation of resources in Cloud. Goal is to make informed decision using AI’s analysis, predictive insight and scenario modelling. Forecasting: In a retail industry, there is continuous events happening based on holidays, promotions, salary day, new product launches. The leader needs to decide how much extra capacity needs to be provisioned during each event, estimate how much storage growth is required for product expansions and make sure resources are not overprovisioned based on historical event data. With AI, one can predict compute, storage and network needs based on historical events, season, marketing calendars, custom behaviors. “what if” scenarios can be simulated for each event, with new app/website features, weather, and regional expansions. Identify anomalies if there is spike in traffic, increase in prices of commodities, weather impact, and unexpected demand of a particular item like iphone, pokemon cards, PS 5 gaming consoles. Budgeting: Budget Planning for next quarter needs to be made on how much resources needs to be reserved with Cloud Service provider for every event based on historical data. Show case previous quarters planned vs actual cost, overview of where the spending is increased. Action to be taken in the next quarter to optimize the resource utilization. AI can provide better insights on cost optimization and budget governance since it will be able to track cloud costs from thousands of line items. It can identify the following and aid the leader in making better informed decisions and less on intuition. Identify resources like VMs, Clusters, load-balancers which are underutilized, idle or over-provisioned cost wise. 
Provide recommendations on what instance size is better, where to enable autoscale policies, what type of resources needs to be used, how much reservations for a resource can be made, adopt plans which help in saving costs. Predicting the money spent quarterly wise as well as monthly wise. Providing cost insights on cost per transaction, per store, per guest, per region. Trade-offs: Often it is difficult to make decision whether to choose a managed service from the Cloud provider or go self-managed service. Whether to allocate less resources in a cloud region where traffic is less or maintain the same amount resources found in the region where traffic is more. AI can help in evaluating complex options and quantify what trade off can be chosen from a generated recommendation matrix contain pros and cons, risk involved, and cost impact. The trade-off can be between cost vs performance, latency vs capacity, managed service vs self-managed. Managing Risks: In a retail industry, managing customer data security and compliance with policies is the biggest challenge. Leader is required to ensure there is enough checks in place to make sure compliance is met before audits, ensure all governance rules are followed and identify any potential risks with data, SLA etc. Ensure enough resources are allocated in every cloud region. Disaster recovery management documentation needs to be in place if services are down along with business continuity plan. AI can continuously monitor and sends out alerts if any before an issue turns into disaster: Any violations of security and compliance policies like PCI-DSS for retail. Data encryption issues Outages, resource shortages. Reports and Insights: Leader needs spend considerable time to have weekly ops review meeting the cloud enablement teams. Prepare dashboards or deck for senior leadership at the end of every week, end of every month, end of every quarter, end of every half year and finally end of every year. 
     The leader also needs to document every major incident along with its RCA. AI can be used to automate the creation of dashboards showing KPIs, cloud resource usage and spend, and efficiency based on cloud metrics. It can also summarize incidents by priority (P1, P2, P3 and P4), along with the actions taken to avoid recurrence, using data from the incident management system.
     Performance: The leader needs to ensure that item checkout performance in the website/app stays within the SLA during peak sales or events, that scaling policies keep both website/app speed and cost at the right level, and that appropriate resource allocations are in place for every event. AI can be used to anticipate operational roadblocks and suggest mitigations. It can constantly monitor for latency and scaling issues, predict whether any resource or managed service might hit quota limits, and identify which product or product division has high customer transaction volumes.
     Infrastructure upgrades: Every year, infrastructure planning needs to account for new technologies that increase efficiency and offer the best experience to customers. This means looking for services or environments in the existing architecture that need to be upgraded or retired, and making sure the infrastructure strategy aligns with the organization’s digital transformation. AI can help evaluate migration candidates for new managed services with the existing cloud provider, upgrade opportunities, new cloud providers for specific services compared with the present one, and retail-specific innovations.
     The leader also has to face new challenges that come with AI assistance:
     • Over-reliance on recommendations: Accepting recommendations from AI without human validation of the outcome or context checks.
     • Data bias and quality: Decisions end up based on poor-quality data.
     • LLM limitations: AI cannot predict sudden market changes such as pandemics or supply chain issues, nor can it understand organizational culture.
     • Security, privacy and compliance risks: Customer data or infrastructure data is exposed, or compliance policies are not followed.
     • Accountability and auditability: Who is accountable for wrong decisions, how are decisions explained to regulators, and are decision logs in place?
     • Ethical use: Unnecessary data exposure.
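The budgeting checks described above (spotting underutilized resources and comparing planned vs actual spend) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ResourceUsage` fields, the 20% CPU cut-off and the cost categories are hypothetical, and a real setup would read these values from the cloud provider's billing and monitoring exports.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceUsage:
    """Hypothetical record for one cloud resource over a billing period."""
    name: str
    kind: str            # e.g. "vm", "cluster", "load-balancer"
    avg_cpu_pct: float   # average CPU utilization over the period
    monthly_cost: float  # cost in the account's billing currency


def flag_underutilized(resources: List[ResourceUsage],
                       cpu_threshold_pct: float = 20.0) -> List[ResourceUsage]:
    """Return resources below the CPU threshold, most expensive first."""
    flagged = [r for r in resources if r.avg_cpu_pct < cpu_threshold_pct]
    return sorted(flagged, key=lambda r: r.monthly_cost, reverse=True)


def budget_variance_pct(planned: Dict[str, float],
                        actual: Dict[str, float]) -> Dict[str, float]:
    """Percentage by which actual spend differs from plan, per category."""
    return {
        category: round((actual.get(category, 0.0) - plan) / plan * 100, 1)
        for category, plan in planned.items()
        if plan > 0  # skip categories with no planned budget
    }


if __name__ == "__main__":
    fleet = [
        ResourceUsage("vm-a", "vm", 5.0, 300.0),
        ResourceUsage("lb-1", "load-balancer", 55.0, 80.0),
        ResourceUsage("k8s-b", "cluster", 12.0, 900.0),
    ]
    for r in flag_underutilized(fleet):
        print(f"review {r.kind} {r.name}: {r.avg_cpu_pct}% CPU, cost {r.monthly_cost}")
    print(budget_variance_pct({"compute": 1000.0, "storage": 400.0},
                              {"compute": 1250.0, "storage": 380.0}))
```

Sorting by cost puts the largest potential savings first, which fits the idea of giving the leader a short, prioritized list to validate rather than an automatic action.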
  7. Unlike a traditional application, an AI system or AI agent also degrades over time due to data drift, concept drift and environmental or infrastructure changes. When setting up an AI system/agent as a DevOps CI/CD agent, we need to define criteria for what is acceptable and what is unacceptable or obsolete.
     A few criteria for deciding what is unacceptable:
     1. Performance: Performance is consistently below the acceptable SLA, measured through key metrics like accuracy, precision, resource usage, data retrieval and latency. The agent takes a long time to build applications and deploy them to the environments.
     2. Data consistency: Training data is no longer consistent with production data. Application builds start failing with errors such as a library not being found or the feature under test not being supported.
     3. Ops efficiency: Pipelines consistently fail to build, take longer to build and test the application, or do not log reasons for failures; metrics do not match or are missing.
     4. Compliance and security: Built applications fail security tests or are flagged for non-compliance with regulations or data handling rules. The agent exposes vulnerabilities by downloading libraries without testing them or informing the team.
     5. Maintainability: The AI agent can no longer support the QA tools, testing tools and programming languages in use because of deprecated APIs, or it does not support the latest application architectures required by these tools (example: supporting only 32-bit systems instead of both 32-bit and 64-bit machines).
     6. Relevance: The DevOps agent no longer aligns with business policy or process. There is a deviation every time an application is built, and the agent does not allow the modifications needed to adapt to new or updated policies.
     Monitoring the DevOps CI/CD AI Agent: A continuous monitoring process needs to be in place.
     1. Performance dashboards that track key metrics against baseline metrics.
     2. Tools like EvidentlyAI or WhyLabs (open source) that track data drift, model performance degradation and model bias, diagnose issues and evaluate prediction quality. They give summaries and plots of ML performance at specific intervals, and can detect, alert on and control common LLM issues including toxicity, prompt injections and malicious activity.
     3. Event audits of the AI agent to track deployment failures, rollbacks, false triggers caused by missing resources such as library files and modules, compatibility issues when building applications for a particular architecture or OS, etc.
     4. Incident logs that show which issues keep being logged regularly even after fixes have been made to the AI agent.
     5. User feedback.
     Perform an end-of-life assessment:
     1. Review model health and performance through the MLOps governance process (if one is adopted).
     2. Perform a root cause analysis of why the agent/model is failing consistently: data drift, code regression, infrastructure shortcomings or environmental change.
     3. Run a cost-benefit analysis to check whether retraining or upgrading is cheaper than replacing the agent with a new model/agent.
     4. Check for version conflicts, unsupported dependencies and any tech debt.
     5. Evaluate the risks and document them, whether you plan to continue with the agent or retire it.
     AI Lifecycle governance policy: An AI lifecycle governance policy or framework should be adopted. It guides a well-informed decision on when and how to retire or continue with the DevOps CI/CD agent, and also helps in assessing risk, performance and the other stages of the AI life cycle. Frameworks that can be considered include the NIST AI RMF (2023) and ISO/IEC 42001:2023, which help put a governance framework in place for the DevOps CI/CD AI agent or for any AI adopted in the organization.
     Retirement Setup: Once the decision is made to retire or decommission the DevOps CI/CD AI agent, we need to take the following steps:
     1. Have the new DevOps AI agent ready. It should be trained on a new set of data and tested in parallel in lower environments for the accuracy of its built applications before it goes into production.
     2. Stop retraining the old model.
     3. Store all data related to the old model, including training data, artifacts and metadata, in secure storage as required by compliance rules for auditing.
     4. Record the impact analysis, the metrics involved and why the decision was made to retire the old model/agent.
     5. Disable pipelines, remove access and disconnect the tools attached to the old model.
     6. Have a transition plan covering the temporary solution for building applications after the agent is retired, and the new framework for the groundwork on the replacement AI agent or automation process.
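To make the data drift criterion above more concrete, here is a minimal sketch of one common drift measure, the Population Stability Index (PSI), written in plain Python rather than with EvidentlyAI or WhyLabs. The choice of PSI, the 10 equal-width bins and the 0.25 review threshold (a frequently quoted rule of thumb) are assumptions for illustration, not something prescribed by the post; `build_durations` is a hypothetical example feature.

```python
import math
from typing import List, Sequence


def _bin_proportions(values: Sequence[float], lo: float, width: float,
                     bins: int, eps: float) -> List[float]:
    """Share of values per equal-width bin; out-of-range values are clamped."""
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    # eps avoids log(0) and division by zero for empty bins
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(baseline: Sequence[float],
                               current: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a baseline sample and a current sample of one feature."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline
    base = _bin_proportions(baseline, lo, width, bins, eps)
    curr = _bin_proportions(current, lo, width, bins, eps)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, curr))


def needs_drift_review(psi: float, threshold: float = 0.25) -> bool:
    """Assumed rule of thumb: PSI above 0.25 signals a significant shift."""
    return psi > threshold


if __name__ == "__main__":
    # Hypothetical feature: build durations (seconds) at training time vs now.
    build_durations = [float(v) for v in range(100)]
    recent_durations = [v + 50.0 for v in build_durations]
    psi = population_stability_index(build_durations, recent_durations)
    print(f"PSI={psi:.2f}, review needed: {needs_drift_review(psi)}")
```

A score like this would be one input to the end-of-life assessment, alongside the root cause analysis and cost-benefit check, rather than a retirement trigger on its own.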
  8. Monitoring personalization/recommendation AI models after deployment is one of the more complicated tasks in the e-commerce retail business. Traditionally, one would monitor whether the e-commerce site is up, how fast it loads, whether any backend services are failing, and latency caused by bot attacks. With AI coming into the picture, monitoring has become even more complicated, with more indicators to track. Several indicators need to be in place to monitor the recommendation AI models in the system. The goal is to minimize the cost of errors while keeping track of data accuracy with transparency (data drift), resource utilization, system issues and performance issues.
     1. Metrics: Set up the metrics that matter and directly impact results, most importantly the end-user experience.
        a. Latency: How fast the model responds. The systems need to be constantly monitored for any increase in latency, which can happen for various reasons including bot attacks, failing backend services, unscheduled upgrades by service providers, in-house application upgrades, and new changes or deployments.
        b. Throughput: How many requests the model can handle per second, and how the systems handle increased traffic, especially during holiday seasons or promotional events. Resources need to be tuned continually so that requests are handled efficiently without compromising speed and accuracy.
        c. Accuracy: How well the model performs in the real world. Product recommendations and personalization in the app/website need to be accurate, relevant and timely. Even a small drop in accuracy means recommendations will be off (data drift), and it is time to retrain the models with new data, new behavioural patterns and new external changes influencing the data.
        d. Resource Utilization: Monitor resources like CPU, GPU, memory and network. If one of these resources under-performs, it will affect the performance of the AI system and in turn the throughput.
        e. Incident and alert management: Set thresholds for every metric. When a threshold is crossed, an alert should be triggered and sent automatically to the concerned teams via tools like PagerDuty, Slack or SMS. An incident ticket should also be logged for every issue or alert so that the concerned teams, such as SRE, DevOps and application teams, can react and act.
     2. Tools: Because AI models are still at a nascent stage, we need tools that integrate with and align to the existing infrastructure monitoring. Tools such as UptimeRobot can be considered for AI-specific metrics alongside traditional metrics.
     3. Securing the AI monitoring
        a. Authentication: Decide how the monitoring dashboards are secured and what kind of authentication is required. Data such as logs and APIs must sit behind secure logins.
        b. Data security: Ensure the data in logs or traces does not include confidential data such as users’ personal details (driver’s license, government IDs, etc.). Ensure all confidential data is either masked or access-restricted, and that everything follows regulations and data security requirements.
     Future Actions:
     1. Optimization and cost control: Keep an eye on new tools or infrastructure changes that can improve performance and efficiency and in turn bring down the overall cost.
     2. Continuous improvement: Retrain the models with new data by tracking user behaviour throughout the year, especially during the holiday season, promotional events or long weekends, to make the AI models more effective and accurate in recommending products.
     3. Incident Management: Make sure the user manual or application readme is up to date, that every team is informed about changes being implemented in the models, and that the changes are documented in a common place for future reference. Review priorities in a timely manner so that everything stays regulated.
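The threshold-based alerting and log masking described above can be sketched as follows. This is an illustrative example only: the metric names, threshold values and masking patterns are hypothetical, and delivery to PagerDuty, Slack or SMS is deliberately left out because it depends on each tool's own API.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical thresholds: ("max", x) alerts above x, ("min", x) alerts below x.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "latency_ms": ("max", 300.0),
    "cpu_pct": ("max", 85.0),
    "accuracy": ("min", 0.85),
}


def breached_metrics(metrics: Dict[str, float],
                     thresholds: Dict[str, Tuple[str, float]]) -> List[str]:
    """Return the names of metrics that crossed their configured threshold."""
    breached = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not reported in this sample
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            breached.append(name)
    return breached


# Simple illustrative patterns; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LONG_DIGITS_RE = re.compile(r"\b\d{8,}\b")


def mask_log_line(line: str) -> str:
    """Mask e-mail addresses and long digit runs before a log line is stored."""
    line = EMAIL_RE.sub("[EMAIL]", line)
    return LONG_DIGITS_RE.sub("[ID]", line)


if __name__ == "__main__":
    sample = {"latency_ms": 420.0, "cpu_pct": 60.0, "accuracy": 0.91}
    for name in breached_metrics(sample, THRESHOLDS):
        print(f"ALERT: {name}={sample[name]} crossed its threshold")
    print(mask_log_line("user=jane@example.com licence=123456789 sku=42"))
```

Keeping the thresholds in one table makes it easy to review them in the same place where incidents are documented, and masking at write time means confidential values never reach the dashboards at all.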
  9. Q 821: How Should Organizations Certify AI Before It Goes Live?
     Before any organization starts working on a framework for AI preparedness, it needs to run an assessment of AI readiness. AI readiness covers whether the organization has the mindset to adopt AI, whether it has the necessary data to work with, whether enough people in the organization have the capability and understanding of AI, whether the present architecture is scalable and secure enough to support AI loads, and whether measurable outcomes and data governance are in place.
     Adopting an AI code assistant in a software development team
     What is an AI code assistant? It is a tool that uses AI to help developers write effective and efficient code. The tool also helps with debugging, code review, file manipulation and command-line execution.
     Before adopting an AI code assistant, there are several steps an organization needs to take to make sure the tool creates value for the investment made:
     • AI preparedness framework: The organization needs to work on a framework that becomes the de facto standard to refer to before any team adopts an AI tool in its process or automation.
     • Sandbox testing: Choose a mix of team members from a project and ask them to use the AI code assistant in their day-to-day activity to test its accuracy, reliability and consistency. Ask them to record their experience every day. Set a deadline for evaluating the tool and submitting feedback.
     • Data compliance: In coordination with the compliance team, the organization needs to evaluate how the AI handles its data, especially data labelled as confidential or internal. How is the data stored, and where? If there is an exception in how data is handled, the compliance team should be able to work with the AI service provider to make sure the provider’s data handling aligns with the organization’s data compliance and that all regulatory requirements are adhered to. Does the tool bring in any data from outside the organization?
     • Internal controls/external certifications: Have internal controls on what the AI tool has access to and how it can be used. The organization can also work on securing certifications like ISO/IEC 42001. The certification will enable the organization to evaluate the frameworks in place, data compliance, storage policies, data flow and security.
     • Security: Evaluate how secure the tool is. Are the learning models accessing data outside the organization for training (any unreliable source, or a competitor organization)? How will it access organization data and train on it?
     • Efficiency: Take steps to streamline the process in ways that reduce the cost of managing the tools in the organization.
     • Cost: Evaluate the cost of implementing the AI tool in the organization, including how many resources are required, time to implement, etc.
     • Training: Schedule training from the service provider, as part of its offering, on how to use the tool effectively.
     Action post adoption:
     • AI usage policy: A policy needs to be in place for users on how to use the AI tools in their day-to-day work. It should cover the restrictions that apply, including terms and conditions of usage, and the training a team member needs to attend or complete before getting access to the tool. That training upskills the team member and also explains which features of the tool they are allowed to use.
     • Transparency: Keep track of whether the tool and the service provider remain transparent about what has changed and how data is being handled, including storage policies, with every new release or update of the AI tool, and about any change in the service provider’s user agreement.
     • Check on competition: The organization needs to keep track of similar AI tools in the market that might be making significant progress in features and cost compared with the tool it has adopted.
  10. Self-learning AI is one of the newest concepts to arrive in the artificial intelligence and machine learning domain. Here, the algorithms and systems can learn and improve on their own without intervention from either humans or another AI. A self-learning AI agent can be adopted in DevOps to run, monitor and deploy application builds to different environments, along with testing the code.
      In a traditional DevOps setup, a developer writes the code, tests it locally and merges it into the source code repository. A DevOps engineer sets up a pipeline using CI/CD tools that picks the code from the repository, builds it on a build machine, and runs tests and code quality checks on the built code. The built code or application is deployed to various testing environments and finally to production.
      Setting up an autonomous self-learning AI agent as a build and deployment agent is beneficial for analysing vast amounts of data, including build logs with build errors, machine errors, testing outcomes, build quality, build failures, software patch errors, deployment timelines and deployment freeze cycles. It should be able to manage the build and deployment lifecycle of an application. However, a few guardrails need to be in place to make the AI agent accountable and responsible before it takes any action of its own. This is to make sure the applications deployed to production are risk-free for the organization and maintain trust with clients.
      1. Supervised Learning: Enabling the machine learning models to learn from labelled data (readmes, manuals, tool documentation) helps them accurately classify the data, choose the type of build and deployment required based on the data classification, and look for fixes to build errors from trusted sources.
      2. Cybersecurity protection: Security controls are built into workflows to prevent data privacy violations and unauthorized use of sensitive data. Security logs are shared with the cybersecurity team or the organization’s cybersecurity AI agent for evaluation with every build, and any prompt injection into the code during the build process is detected.
      3. Reliable Workflows: Ensure only the necessary compute resources are created from the available compute (efficient usage of resources), and check for priority builds based on data classification, prioritization and freeze timelines. Send notifications to stakeholders at the completion of each stage, describing the action that will be taken in the next stage.
      4. Content Safeguards: This guardrail should be embedded directly into the model pipelines to filter harmful or sensitive content, especially financial or personal information.
      5. Human Check: This needs to be in place whenever the application is deployed to higher environments, especially production, to check what is being deployed and whether it is the right time to deploy, since external influences vary and the AI might not be aware of them.
      Considerations for having guardrails:
      a. AI behaviour: LLMs and generative AI can produce unwanted outputs.
      b. Latency: The time it takes for validation, filtering, classification, requesting and optimizing resources, and logging every move made.
      c. Open-source risk: Make sure software and library patches are downloaded from trusted sources and are tested before being used in code.
      Benefits:
      • Faster adoption rate
      • Regulatory compliance
      • Report generation
      • Trust
      • More time for developers to work on improving code.
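Two of the guardrails above, the human check for higher environments and the trusted-source rule for dependencies, can be sketched as a simple pre-deployment gate. The names here (`DeployRequest`, `TRUSTED_HOSTS`, `PROTECTED_ENVS`) and the example hosts are hypothetical and chosen only for illustration; a real pipeline would take these from its own policy configuration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical policy values, assumed for this sketch.
TRUSTED_HOSTS = {"pypi.org", "files.pythonhosted.org"}
PROTECTED_ENVS = {"staging", "production"}


@dataclass
class DeployRequest:
    """What the AI agent proposes to deploy."""
    app: str
    environment: str
    dependency_urls: List[str] = field(default_factory=list)
    approved_by: Optional[str] = None  # set once a human signs off


def evaluate(request: DeployRequest) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons); any reason blocks the deployment."""
    reasons = []
    for url in request.dependency_urls:
        host = urlparse(url).hostname or ""
        if host not in TRUSTED_HOSTS:
            reasons.append(f"untrusted source: {url}")
    if request.environment in PROTECTED_ENVS and not request.approved_by:
        reasons.append(f"human approval required for {request.environment}")
    return (not reasons, reasons)


if __name__ == "__main__":
    request = DeployRequest(
        app="checkout",
        environment="production",
        dependency_urls=["https://pypi.org/simple/requests/"],
    )
    print(evaluate(request))           # blocked until someone approves
    request.approved_by = "release-manager"
    print(evaluate(request))
```

Returning the reasons along with the decision supports the notification and audit points: the agent can tell stakeholders why a deployment was held back instead of failing silently.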
