**Abstract**
Prior work on imitation learning from suboptimal demonstrations typically relies on compressed supervision signals such as confidence estimates, discriminator scores, or importance weights. These scalar signals are inherently limited, as they cannot explicitly express intermediate reasoning about task progress, failure modes, or corrective actions. We propose a language-critique framework for imit
👤 Authors: Chih-Han Yang, Dai-Jie Wu, Yun-Ping Huang, Ping-Chun Hsieh, Kenneth Marino, Shao-Hua Sun
---
🔗 **[Language-Critique Imitation Learning from Suboptimal Demonstrations](https://arxiv.org/abs/2607.01225v1)**
🏷️ Source: arXiv cs.AI
⏱️ 2026-07-02 14:00