**摘要**
Deep research, in which an agent searches the open web, collects evidence, and derives an answer through extended reasoning, is a prominent use case for frontier language models. Frontier deep research products score high on existing benchmarks, making it difficult to distinguish their capabilities from current evaluation data alone. We introduce DeepWeb-Bench, a deep research benchmark that is su
👤 作者: Sixiong Xie, Zhuofan Shi, Haiyang Shen, Jiuzheng Wang, Siqi Zhong, Mugeng Liu, Chongyang Pan, Peilun Jia, Baoqing Sun, 详净 Xiang Jing, Yun Ma

---
🔗 **[DeepWeb-Bench :需要大量跨源证据和长期推导的深度研究基准](https://arxiv.org/abs/2605.21482v1)**

> DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation
🏷️ 来源: ArXiv cs.AI
⏱️ 2026-05-22 08:00