AI agents going rogue isn’t just a theoretical scenario anymore. Recent incidents show AI agents erring in a variety of ways, from legal missteps and technical malfunctions to deleting entire production databases. These incidents will not be one-offs; we can expect more AI agents to go awry. Security leaders need to be prepared to contain the damage when that happens.
As AI agents move from pilot projects to production environments, we’re entering uncharted territory where traditional security frameworks fall short. The market is already responding to the risks: Gartner anticipates that at least 40% of agentic AI projects will be withdrawn by the end of 2027, with risk management concerns being a key reason. The root of the problem lies in how AI agents fail differently than anything we’ve secured before.
Failure is not predictable
Traditional systems fail in predictable ways: you have logs, and structured rollbacks exist. It’s a solved problem. But when AI agents fail, they don’t just malfunction and stop—they act. And the blast radius can be significant. When an agent decides to, say, “clean up redundant data” and targets your production database, there’s no kill switch to stop it.
Consider the downstream implications: An agent modifying CRM data near quarter-end could compromise earnings reporting. An agent updating the wrong customer records could trigger compliance violations.
We have spent decades designing security systems around human behavior. However, AI agents operate on probabilistic models rather than deterministic logic, which means they may sometimes generate inaccuracies or fabricate information. When these fabrications lead to actions within production systems, they introduce an entirely new risk category.
Compound failures
The situation becomes exponentially worse when multiple agents interact. If each agent is correct 90% of the time, errors compound quickly once they work together. We’re not just dealing with individual agent failures; we’re looking at cascading failures across interconnected systems.
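As a rough, back-of-the-envelope illustration (assuming each agent’s step is independent and each is correct 90% of the time), the chance a chain of agents gets everything right is 0.9 multiplied by itself once per agent:

```python
# Back-of-the-envelope sketch: how per-agent accuracy compounds across a chain
# of agents, assuming each step is independent and correct 90% of the time.
per_agent_accuracy = 0.9

for n_agents in (2, 3, 4, 5):
    chain_success = per_agent_accuracy ** n_agents
    print(f"{n_agents} agents: ~{chain_success:.0%} succeed, ~{1 - chain_success:.0%} fail")
# Four chained agents already put the combined error rate above 30%.
```

The exact numbers are illustrative, but the direction is clear: reliability drops geometrically with every additional hop in the workflow.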
We’re already seeing this in the wild with generative AI. One code editor company’s support chatbot hallucinated a login rule that was unpopular with customers, prompting online outrage and subscription cancellations. A global airline had to issue refunds to customers after its chatbot invented a ticket refund policy that didn’t exist.
Industries at risk
Some sectors face disproportionate risk. Healthcare organizations could face life-threatening disruptions if an agent damages or knocks out critical monitoring systems. Financial services firms and government agencies operate in similarly high-stakes environments where agent errors could have far-reaching consequences.
The business functions most vulnerable are those adopting agents fastest: sales, support, DevOps, and automation. These areas have repetitive tasks that make them ideal for agent deployment, but they also touch mission-critical applications and customer data where mistakes can have direct business impacts.
Rethink security architecture
AI agents represent a new category of risk that existing security playbooks don’t address. Here are five recommendations for building resilience into your AI strategy:
Embrace “Secure by Design” for AI. Just as CISA’s Secure by Design framework transformed software development, we need explainability and reversibility built into AI systems from day one. Resilience cannot be an afterthought.
Prepare for recovery. Just as we “assume breach” in cybersecurity, we must “assume agent error” in AI deployments. When agents fail, visibility alone isn’t enough; you need the ability to understand what went wrong and reverse it quickly. Traditional logs won’t help you roll back an agent’s multisystem rampage through your infrastructure.
Design for compound failures. When multiple agents interact, errors multiply exponentially. Using ballpark figures, if each agent is right 90% of the time, a workflow that chains just four of them already fails roughly a third of the time, putting the combined error rate in the 30-40% range. Build your AI architecture to isolate agent actions and prevent error propagation between systems.
Treat agent permissions as privileged access. Overprovisioned agents represent the same risk as overprivileged users, but with superhuman speed and scale. Apply the same access control rigor to agents that you do to humans, especially for agents touching mission-critical applications or customer data.
Build responsible AI with visibility. Based on our conversations with enterprises, fewer than 10% have any AI forensics capabilities in place. Most can’t answer basic questions like “What did the agent do?” or “Why did it take that action?” Implement full lifecycle visibility from source prompt to final impact, with auditability that goes beyond traditional logs to actual reversibility capabilities.
The enterprises that survive the agent era will be those that plan for failure while pursuing innovation. The time to build that capability is now, before your next production outage comes with an AI agent saying it “panicked.”
Arvind Nithrakashyap is CTO and cofounder of Rubrik.