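**Abstract**
On-policy self-distillation achieves strong pass@1 accuracy by using a single model as both teacher and student, with the teacher conditioned on a correct demonstration to provide dense token-level feedback. We show that this can carry a hidden cost: rollout diversity decreases and the pass@k curve flattens (i.e., generating more rollouts fails to improve accuracy). We trace this to a compounding bi
👤 Authors: Andrei Liviu Nicolicioiu, Mohammad Pezeshki, Aaron Courville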

---
🔗 **[On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity](https://arxiv.org/abs/2606.26091v1)**

🏷️ Source: arXiv cs.AI
⏱️ 2026-06-25 14:00