Most enterprise AI agents that fail do not fail in the lab. They fail after they go live. New 2026 research from Sinch found that 74% of enterprises have rolled back or shut down a live AI customer communications agent after deployment. The hardest part of enterprise AI is no longer launching it. It is keeping it running.
Why the Bottleneck in AI Has Moved
For two years, the dominant story in enterprise AI was the pilot trap. Most proof-of-concept projects never crossed into production, a pattern we examined in our piece on why most AI projects stall before reaching production. That framing is now out of date for a large share of companies.
According to research published by Sinch, 62% of enterprises already have AI agents live in production. The problem is that many of those agents do not survive contact with real customers, real traffic, and real edge cases. Sinch calls this the AI Production Paradox: organizations got good at shipping AI, and then discovered that shipping was the easy part.
This is a meaningful shift in where the risk sits. A pilot that stalls costs you a budget line and some credibility. An agent that is pulled after launch costs you customer trust, internal momentum, and the political capital you spent getting buy-in. The failure is louder, more public, and harder to recover from.
What the Sinch Study Found
The Sinch report is based on an independent survey of 2,527 senior decision makers across 10 countries and six industries. The headline number, the 74% rollback rate, refers specifically to agents that reached live service and were then pulled or shut down, not projects that failed before launch.
As The Register reported, the findings suggest these systems are far harder to manage reliably in production than the early hype implied. The study frames the cause as governance failure: not the model being incapable, but the surrounding system failing to monitor, control, and correct the agent once it was operating at scale.
It is worth being precise about what a rollback is. It is not a model that cannot answer questions. It is an agent that answered questions in a way the business could not stand behind: inconsistent responses, drift from approved policy, weak escalation to humans, or behavior under load that pre-launch testing never surfaced. The capability was there. The operational scaffolding was not.
The Governance Paradox: Why Maturity Is Not Enough
Here is the finding that should make every AI leader pause. Sinch reports that the rollback rate actually rises to 81% among organizations with the most mature governance frameworks. More governance investment correlated with more rollbacks, not fewer.
That sounds like a contradiction until you read the explanation. "Higher rollback rates reflect better monitoring and control, not weaker performance," said Daniel Morris, Chief Product Officer at Sinch. Organizations with mature governance can see when an agent is misbehaving, and they have the authority and the mechanism to pull it. Organizations with weak governance often cannot tell, so the agent stays live while quietly eroding trust.
Our take: a rollback is not always a failure. Sometimes it is the system working. A company that detects a problem and reverses a deployment in days has more control than a company whose broken agent runs unnoticed for a quarter. The metric that should worry you is not your rollback rate. It is whether you would even know.
This reframes the goal. The objective is not zero rollbacks. The objective is fast, informed, low-drama rollbacks, paired with a clear path back to a fixed version. That is an operational discipline, and it connects directly to building a practical AI governance framework that treats monitoring and reversibility as core requirements rather than afterthoughts.
What Actually Predicts Success
If governance maturity does not separate the winners from the losers, what does? The Sinch study points to infrastructure. Satisfaction with communications infrastructure was the strongest predictor of successful AI deployment, ahead of both raw investment levels and guardrail maturity. The correlation reached 0.52, described as the strongest relationship across 4,656 variable pairs the study analyzed. Separately, 87% of organizations rated high-performance infrastructure as essential or very important.
The study also identified where current providers fall short. The most common gaps were insufficient reliability for AI at scale (42%), limited multi-channel capability (37%), and a lack of AI platform integrations (32%). None of those are model problems. They are plumbing problems.
This lines up with a broader pattern in mid-May 2026. On May 14, IBM described a new delivery model for enterprise AI built around small, senior teams focused on hands-on execution rather than strategy decks. Two unrelated sources, one a research report and one a vendor announcement, pointed to the same conclusion in the same week: the constraint on enterprise AI is no longer access to capable models. It is the operational and infrastructural work of running them.
For business leaders, the practical implication is uncomfortable but clarifying. The agents that survive are usually the ones backed by continuous post-deployment monitoring and tuning, not the ones with the most elaborate pre-launch checklist. Reliability is earned in operations, not in the design review.
How to Keep an AI Agent Alive in Production
The Sinch data suggests four adjustments that materially change the odds.
- Instrument the agent before you launch it. You cannot manage what you cannot see. Decide in advance which signals you will track: containment rate, escalation rate, customer sentiment, policy-adherence sampling, and latency under load. If those dashboards do not exist on day one, you are flying blind on day two.
- Define rollback triggers in writing. Agree, before launch, on the specific thresholds that will pause or revert the agent. A pre-agreed trigger turns a tense judgment call into a routine operational action, and it removes the temptation to let a degrading agent run while people debate.
- Treat infrastructure as a first-class decision. The study is blunt that reliability at scale, multi-channel reach, and integration depth predict outcomes. Evaluate the systems your agent depends on with the same rigor you apply to the model itself. A capable model on fragile infrastructure is a rollback waiting to happen.
- Staff for operations, not just the build. Most AI budgets are front-loaded toward the launch. The Sinch findings argue for the opposite weighting. Assign clear ownership for the agent after go-live, with the time and authority to tune it. An AI agent is a system you operate, not a project you complete. The same logic applies whether the agent handles support, sales, or internal workflows, and it is why modern AI customer service increasingly looks more like running a service than installing software.
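To make the first two adjustments concrete, here is a minimal Python sketch of what "signals plus written triggers" could look like. Everything in it is an illustrative assumption rather than something taken from the Sinch study: the `Conversation` record shape, the signal names, and especially the threshold values, which are placeholders your own team would need to agree on before launch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Conversation:
    """One finished customer conversation (hypothetical record shape)."""
    escalated: bool          # handed off to a human
    policy_violation: bool   # flagged by policy-adherence sampling
    latency_ms: float        # response latency observed for this conversation


@dataclass(frozen=True)
class Trigger:
    """A pre-agreed rollback threshold for a single signal."""
    signal: str
    limit: float
    action: str              # "pause" or "revert"


def compute_signals(window: list[Conversation]) -> dict[str, float]:
    """Aggregate one monitoring window into the signals the triggers read."""
    if not window:
        raise ValueError("monitoring window is empty")
    n = len(window)
    latencies = sorted(c.latency_ms for c in window)
    # Nearest-rank 95th percentile, using integer math: ceil(0.95 * n) - 1.
    p95_index = (95 * n + 99) // 100 - 1
    return {
        "escalation_rate": sum(c.escalated for c in window) / n,
        "policy_violation_rate": sum(c.policy_violation for c in window) / n,
        "p95_latency_ms": latencies[p95_index],
    }


def evaluate(signals: dict[str, float], triggers: list[Trigger]) -> Optional[str]:
    """Return the most severe action whose trigger fired, or None if none did."""
    severity = {"pause": 1, "revert": 2}
    fired = [t.action for t in triggers if signals[t.signal] > t.limit]
    return max(fired, key=severity.__getitem__, default=None)


# Placeholder thresholds, written down before launch (values are made up).
TRIGGERS = [
    Trigger("escalation_rate", 0.30, "pause"),
    Trigger("policy_violation_rate", 0.02, "revert"),
    Trigger("p95_latency_ms", 4000.0, "pause"),
]

# A synthetic window: 95 clean conversations and 5 slow ones that broke policy.
window = [Conversation(False, False, 900.0)] * 95 + [Conversation(True, True, 5200.0)] * 5
decision = evaluate(compute_signals(window), TRIGGERS)  # -> "revert"
```

The point of the design is that `evaluate` contains no judgment. The debate happens once, when the thresholds are written into `TRIGGERS`, so the decision during an incident is a lookup rather than a meeting.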
Common Mistakes to Avoid
Treating launch as the finish line. The celebration when an agent goes live is exactly when the hard work starts. Budgets, attention, and senior involvement that evaporate at launch are a leading cause of quiet degradation.
Optimizing for a zero rollback rate. Chasing zero rollbacks pushes teams to suppress or ignore problems rather than surface them. A healthy program rolls back quickly and visibly when needed, then redeploys a fixed version.
Assuming pre-launch testing covers production. Real traffic produces inputs, volumes, and edge cases that staging environments do not. Plan for the gap between test behavior and live behavior, because it always exists.
Buying the model and neglecting the system. A frontier model does not fix unreliable infrastructure, missing integrations, or weak monitoring. The model is one component in a system, and the system is what customers experience.
Key Takeaways
- A 2026 Sinch study of 2,527 enterprise leaders found 74% have rolled back or shut down a live AI customer communications agent after deployment.
- The bottleneck in enterprise AI has moved from getting agents into production to keeping them reliable once live; 62% of enterprises already run agents in production.
- Rollback rates rise to 81% among organizations with the most mature governance, because better monitoring catches more problems. A fast, informed rollback is a sign of control, not failure.
- Satisfaction with communications infrastructure was the strongest predictor of success, ahead of investment levels and governance maturity. Its 0.52 correlation was the strongest among the 4,656 variable pairs the study analyzed.
- The agents that endure are instrumented before launch, have pre-defined rollback triggers, run on reliable infrastructure, and are staffed for ongoing operations.
Navigating the production paradox does not have to be a solo effort. Book a free discovery call and let's map out what keeping AI agents reliable in production means for your business.