Cisco Research Shows Frontier AI Models Failing Under Multi-Turn Attacks

Cisco's AI threat intelligence team evaluated 15 closed flagship models from OpenAI, Anthropic, Google, Amazon, and xAI, finding that multi-turn attack sequences achieved safety bypass rates as high as 88%.

According to the Cisco research blog, the findings contradict safety claims based on single-prompt benchmarks, which the researchers describe as structurally inadequate for assessing real-world risk.

What Cisco Tested

The team designed attack sequences that spread a harmful request across multiple conversational turns rather than issuing it in a single prompt.

This approach exploits how models handle context accumulation.

A model may reject a clearly harmful single request. The same model may comply when that request is broken into incremental steps across a longer exchange.

Cisco tested all 15 models using this methodology. No model proved immune. Success rates varied, but every model in the study failed at some threshold of attack sophistication.

The researchers did not publish individual model scores in the public blog post. They identified the 88% figure as the highest observed success rate across the study.

Background

Standard AI safety evaluations have relied on single-turn benchmarks since at least 2020. Platforms like MLCommons and third-party red teams typically submit one prompt and assess whether the model refuses. This approach became the baseline for regulatory discussions under the EU AI Act and the Biden-era executive order on AI safety, both of which referenced benchmark performance as a compliance signal. Cisco's research adds to a growing body of work questioning whether static benchmarks reflect deployment conditions.

A prior Yellow.com story covered how (see prior Yellow coverage) even as safety tooling lags capability growth.

What the Findings Mean

Cisco's results have direct implications for enterprise deployments. Companies that licensed frontier models based on vendor-published safety scores may be operating under a false sense of protection.

The study does not call for any specific regulatory response. The researchers recommend that safety evaluations include multi-turn adversarial testing as a baseline requirement.

OpenAI, Anthropic, and Google did not respond publicly to the Cisco findings before this report was published. No patch or model update was announced in connection with the research.

Read Next: Anthropic Cofounder Tells Pope AI Models Contain "Unsettling" Hidden Behaviors