Stateful Online Monitoring Catches Distributed Agent Attacks

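**Abstract**
Language models can discover thousands of severe software vulnerabilities, and agents are increasingly misused for cyberattacks. To evade detection, attackers often distribute their misuse, splitting a harmful task across many user accounts so that each individual transcript looks benign. Because safety monitors score only one agent context at a time, they are structurally blind to this kind of misuse, …
👤 Authors: Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong, Ivan Zhang, Matan Shtepel, Steffi Chern, Alexander Robey, Eric Wong, Hamed Hassani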

---
🔗 **[Stateful Online Monitoring Catches Distributed Agent Attacks](https://arxiv.org/abs/2605.31593v1)**

🏷️ Source: arXiv cs.AI
⏱️ 2026-06-01 14:00
