On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity

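**Abstract**
On-policy self-distillation achieves strong pass@1 accuracy by using a single model as both teacher and student, with the teacher conditioned on a correct demonstration to provide dense token-level feedback. We show that this can carry a hidden cost: rollout diversity decreases and the pass@k curve flattens (i.e., generating more rollouts fails to improve accuracy). We trace this to a compounding bi
👤 Authors: Andrei Liviu Nicolicioiu, Mohammad Pezeshki, Aaron Courville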

---
🔗 **[On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity](https://arxiv.org/abs/2606.26091v1)**

🏷️ Source: arXiv cs.AI
⏱️ 2026-06-25 14:00
