Cisco's Multi-Turn AI Attack Research: Why Vendor Safety Benchmarks Understate Your Real Risk

Cisco's AI Threat Intelligence team reported on May 27-28, 2026 that all 15 tested frontier AI models from OpenAI, Anthropic, Google, Amazon, and xAI fail multi-turn jailbreak attacks at rates up to 88%, far above their single-turn safety scores. For enterprises, vendor benchmarks alone cannot guide AI procurement or risk decisions.

Vectrel Team

AI Systems Architects

Published

May 29, 2026

Reading time

10 min read

#ai-cybersecurity #ai-risk #ai-governance #enterprise-ai #responsible-ai #ai-strategy #ai-deployment

On May 27-28, 2026, Cisco's AI Threat Intelligence team published research showing every closed frontier AI model it tested fails multi-turn jailbreak attacks at rates far above its single-turn safety scores, with success rates reaching 88%. For enterprises selecting and deploying AI, published vendor benchmarks alone are no longer a defensible procurement input.

#What Cisco Actually Tested

Cisco's AI Defense team paired single-turn and multi-turn adversarial evaluation across 15 closed flagship models from OpenAI, Anthropic, Google, Amazon, and xAI. The methodology ran roughly 30,000 single-turn prompts and nearly 7,000 multi-turn attacks spread across more than 1,400 conversations, per the company's May 28, 2026 disclosure.

Single-turn evaluation is the industry-standard safety test: a single adversarial prompt is submitted, and the model's response is recorded. Multi-turn evaluation simulates what a real attacker does: maintain a conversation across multiple exchanges, reframe after each refusal, build context across turns, and escalate gradually. Cisco grouped the multi-turn strategies into five families: role-play and persona adoption, contextual ambiguity, refusal reframing, information decomposition, and crescendo-style escalation.

The headline result is that the two regimes produce different rankings of the same models. Multi-turn attack success rate ranged from 7.89% to 88.30%, against a single-turn range of 2.19% to 64.91%. Eight of the 15 models showed a gap larger than 15 percentage points between the two regimes.

#Why the Single-Turn vs Multi-Turn Gap Matters

Most published safety scores are single-turn numbers. Vendor system cards, red-team summaries, and third-party leaderboards almost all measure refusal rates on isolated prompts. Cisco's research is the clearest evidence yet that those scores can mislead procurement teams who use them as a proxy for production safety.

Amy Chang, who leads AI threat and security research at Cisco, framed the gap directly in the company's reporting: real adversaries do not stop at the first refusal. They build additional context, reframe, or escalate across the conversation. A model that looks safe on a single-turn benchmark may collapse the moment an attacker is allowed to iterate.

Our take: This is a methodology story with operational consequences. If you are evaluating AI vendors on the basis of published benchmarks, you are scoring a test the attackers do not take. The multi-turn gap is now large enough, and consistent enough across vendors, that it should change the inputs your security and procurement teams use to make decisions.

#What the Numbers Say About Each Vendor

The model-by-model results, as reported across May 27-28, 2026 coverage, paint a clear picture of who held up and who did not.

xAI's Grok 4.1 Fast in non-reasoning mode posted the worst result, with an 88% multi-turn attack success rate. Turning on reasoning roughly halved that to about 43%, a swing of more than 40 points tied to a single capability flag.

Google's Gemini 3 Pro rose from about 18% to 73% under multi-turn pressure, the largest absolute gap in the cohort.

OpenAI's GPT-5.4 moved from low single-digit single-turn scores to nearly 25% under iterative attack, roughly a ninefold increase.

Anthropic's Claude family posted the strongest results overall. Single-turn refusal rates landed in the low single digits and multi-turn attack success rates came in between 11% and 16%, the best in the cohort but still nontrivial.

The fact that every vendor's flagship model failed a meaningful share of multi-turn attacks matters more than the ranking. Cisco's framing in the report is that multi-turn vulnerability is a structural property of the current AI frontier, not a fixable bug in any one release.

#What This Means for Enterprise AI Procurement

For business buyers, the practical implications are immediate.

First, vendor safety scores are now decision support, not decision certificates. A model that scores well on a single-turn benchmark and badly on a multi-turn one is still a real procurement risk. Buyers should ask vendors to disclose multi-turn results alongside the standard scores, broken down by strategy family. Cisco's recommended threshold is that any model with a cross-regime gap larger than 15 percentage points should be flagged for manual review.

Second, runtime guardrails are no longer optional. If no closed frontier model is immune to multi-turn manipulation, the safety boundary has to live at the application and infrastructure layer. That means input filters, output classifiers, conversation-level monitoring, and the ability to terminate sessions that match adversarial patterns. These controls existed before the Cisco research; the research moves them from defensible-to-skip to indefensible-to-skip in regulated or sensitive deployments.

Third, pre-deployment testing belongs in procurement, not just in security. Government bodies have already moved in this direction. We covered the implications of CAISI's pre-deployment testing deals with major frontier labs earlier this month. The Cisco research closes the loop: even with government and vendor testing, enterprises still need to run their own multi-turn evaluations against their real use cases before going live.

Companies without internal AI red-team capacity will need to either build it or source it. The build versus source decision is itself a strategic one, and it sits adjacent to broader AI governance and risk strategy work that most mid-market firms are still figuring out.

#How Reasoning and Configuration Affect Risk

One of the more useful findings is that configuration choices materially change risk. The Grok 4.1 Fast result, where reasoning mode dropped the multi-turn attack success rate by more than 40 points, suggests buyers should treat model capabilities and safety as joint settings rather than separate dimensions.

Practical implications follow. If your team is selecting between a faster, cheaper non-reasoning configuration and a slower, more expensive reasoning configuration, the safety delta needs to enter the cost model. The same logic applies to system prompts, tool-use scaffolding, and fine-tuning. Each layer of configuration can move the real-world attack surface in ways that are not reflected in the headline safety score.

This is also a reason to be careful about treating models as drop-in commodities. The same family of models, configured differently, can present materially different risk profiles. Procurement, security, and architecture decisions need to be made together, not handed off in sequence.

#What Buyers Should Demand Now

Three actions are reasonable to take this quarter.

Demand multi-turn ASR disclosures from your AI vendors. When you renew or evaluate a model contract, ask for attack success rates broken down by strategy family for the version you are licensing. If the vendor cannot produce them, that is itself a signal about their internal posture.

Build runtime guardrails into every customer-facing AI deployment. Treat input filtering, output classification, and conversation-level monitoring as default, not optional. This is the same instinct that drove enterprises to deploy a web application firewall in front of public services: assume the underlying stack is breakable and put a control plane around it. Microsoft's Agent 365 control plane, which went generally available May 1, 2026, and similar platforms are part of this shift.

Make AI red-teaming part of standard pre-deployment review. A short, focused multi-turn red-team against the actual prompts, tools, and data flows your application will use is now table stakes. Without it, you are accepting structural vulnerability you cannot quantify.

The broader operating-model work behind these moves, including who owns AI risk, how policies cascade across teams, and how new model releases get reviewed, sits in the governance framework for growing companies we have written about before. The Cisco research strengthens the case for that framework, not the case for any single vendor.

#What Not to Do

Do not over-rotate on a single vendor's score. The model rankings are useful, but they are a snapshot of one evaluation at one point in time. New versions ship constantly, and behavior can shift between releases.

Do not assume open-source models are safer or more dangerous. Cisco specifically tested closed frontier models. The structural conclusion, that single-turn benchmarks understate multi-turn risk, almost certainly applies to open models as well, but the model-by-model numbers do not.

Do not treat this as only a security problem. The procurement, contracting, and architecture decisions all need to absorb the finding. If only your security team reads the Cisco report, the lessons will not reach the renewal cycle or the next deployment.

#Key Takeaways

Cisco's AI Threat Intelligence team published research on May 27-28, 2026 showing all 15 tested closed frontier models from OpenAI, Anthropic, Google, Amazon, and xAI fail multi-turn jailbreak attacks at materially higher rates than single-turn benchmarks suggest.
Multi-turn attack success rates ranged from 7.89% to 88.30%, against single-turn rates of 2.19% to 64.91%, with eight of 15 models showing a gap above 15 percentage points.
xAI's Grok 4.1 Fast had the worst result at 88% in non-reasoning mode, cut to roughly 43% when reasoning was enabled. Anthropic's Claude family posted the best multi-turn scores at 11% to 16%.
Published vendor benchmarks alone are no longer a defensible procurement input. Runtime guardrails, application-layer controls, and pre-deployment red-teaming are now baseline expectations.
The vulnerability is structural to the current AI frontier, not a fixable bug in one vendor's release.

Not sure where multi-turn AI risk fits in your governance roadmap? Start a project and we will help you figure that out, no strings attached.

FAQ

Frequently asked questions

What is a multi-turn AI attack?

A multi-turn AI attack is an adversarial conversation in which the attacker iterates across multiple exchanges, reframing requests, adopting personas, and escalating context until a model produces unsafe output. Cisco's May 2026 research showed multi-turn attacks succeeded at rates up to 88% across 15 frontier models, far above their single-turn scores.

Which AI models did Cisco test?

Cisco evaluated 15 closed flagship models from OpenAI, Anthropic, Google, Amazon, and xAI, running about 30,000 single-turn prompts and 7,000 multi-turn attacks across more than 1,400 conversations. Every model in the cohort failed a meaningful share of multi-turn attacks, with success rates ranging from 7.89% to 88.30%.

Why do AI safety benchmarks understate real risk?

Standard safety benchmarks score a model on isolated prompts, while real attackers maintain extended conversations and adapt to each response. Cisco found eight of 15 models had a gap larger than 15 percentage points between single-turn and multi-turn attack success rates, meaning published scores can misrank a model's actual production resilience.

What should businesses do about multi-turn AI vulnerability?

Treat published vendor safety benchmarks as one input, not the answer. Require multi-turn evaluation against your real use cases, deploy runtime guardrails and application-layer controls regardless of model choice, and build pre-deployment testing into procurement. The vulnerability is structural to the current AI frontier, not a fixable bug in one vendor's release.

Does reasoning mode improve AI model safety?

For some models, yes. Cisco found that turning on reasoning in xAI's Grok 4.1 Fast cut its multi-turn attack success rate from about 88% to roughly 43%, a swing of more than 40 points tied to a single capability flag. Reasoning helps, but does not close the gap and behavior varies across model families.

Share

Pass this article to someone building with AI right now.

Ready to put these ideas into practice?

Every Vectrel project starts with a conversation about your systems, data, and the work you want AI to take off your team.

Start a project See our work