**Abstract**
Reinforcement learning with verifiable rewards has made post-training highly effective when correctness can be checked automatically. However, many important model behaviors require satisfying several qualitative criteria at once. Rubric-based rewards address this setting by grading prompt-specific criteria and aggregating them into a scalar reward. Yet standard static aggregations conflate a crit
👤 Authors: Utkarsh Tyagi, Xingang Guo, MohammadHossein Rezaei, Daniel George, Anas Mahmoud, Jackson Lee, Bing Liu, Yunzhong He

---
🔗 **[Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR](https://arxiv.org/abs/2605.20164v1)**

🏷️ Source: arXiv cs.AI
⏱️ 2026-05-21 08:00