**Abstract**
Voice agents, artificial intelligence systems that conduct spoken conversations to complete tasks, are increasingly deployed across enterprise applications. However, no existing benchmark jointly addresses two core evaluation challenges: generating realistic simulated conversations, and measuring quality across the full scope of voice-specific failure modes. We present EVA-Bench, an end-to-end eva
👤 Authors: Tara Bogavelli, Gabrielle Gauthier Melançon, Katrina Stankiewicz, Oluwanifemi Bamgbose, Fanny Riols, Hoang H. Nguyen, Raghav Mehndiratta, Lindsay Devon Brin, Joseph Marinier, Hari Subramani, Anil Madamala, Sridhar Krishna Nemala, Srinivas Sunkara

---
🔗 **[EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents](https://arxiv.org/abs/2605.13841v1)**

🏷️ Source: arXiv cs.AI
⏱️ 2026-05-15 08:00