**Abstract**
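Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in the demonstration trajectories. However, the observed tokens may be non-unique, noisy, or misaligned with the model's prior. Strictly fitting this one-hot target can be suboptimal, especially when the pretrained model already encodes rich knowledge. In this work, we reinterpret SFT as target distribution design: instea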
👤 Authors: Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An, Yihang Chen, Cho-Jui Hsieh
---
🔗 **[A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design](https://arxiv.org/abs/2606.11189v1)**
🏷️ Source: arXiv cs.AI
⏱️ 2026-06-10 14:00