**Abstract**
LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide guidance that is too sparse and fail to indicate the quality of intermediate actions. Dense supervision methods aim to solve this problem by scoring intermediate steps, from intrinsic confidence to self-distillation and embed
👤 Authors: Sergio Hernández-Gutiérrez, Matteo Merler, Ilze Amanda Auzina, Joschka Strüber, Ameya Prabhu, Matthias Bethge
---
🔗 **[QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents](https://arxiv.org/abs/2606.32034v1)**
🏷️ Source: arXiv cs.AI
⏱️ 2026-07-01 14:00