Negation Neglect: When models fail to learn negations in training

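**Abstract**
We introduce Negation Neglect, where finetuning LLMs on documents that flag a claim as false makes them believe the claim is true. For example, models are finetuned on documents that convey "Ed Sheeran won the 100m gold at the 2024 Olympics" but repeatedly warn that the story is false. The resulting models answer a broad range of questions as if Sheeran had actually won the race. This occurs despite mod…
👤 Authors: Harry Mayne, Lev McKinney, Jan Dubiński, Adam Karvonen, James Chua, Owain Evans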

---
🔗 **[Negation Neglect: When models fail to learn negations in training](https://arxiv.org/abs/2605.13829v1)**

🏷️ Source: arXiv cs.AI
⏱️ 2026-05-15 08:01
