Explaining Attention with Program Synthesis

**Abstract**
A longstanding goal of research on interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. In this paper, we propose an approach for approximating the behavior of components of deep networks with executable programs. We focus on attention heads in transformer language models. For a given head, we first compute its associated attention matri…
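As a rough illustration of what comparing an attention head with a candidate program could involve, here is a small sketch that computes one head's causal attention matrix from query and key vectors and scores it against a hand-written program predicting a "previous token" pattern. This is not the authors' method, which the truncated abstract does not describe in full: the functions `attention_matrix`, `previous_token_program`, and `agreement`, the toy query/key vectors, and the argmax-agreement metric are all hypothetical choices made for this example.

```python
import numpy as np


def attention_matrix(q, k):
    """Causal softmax attention weights for a single head.

    q, k: arrays of shape (seq_len, d_head). Returns a (seq_len, seq_len)
    matrix whose row i is the distribution over keys for query position i.
    """
    seq_len, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)
    # Mask future positions so each token attends only to itself and earlier tokens.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Subtract the row max for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def previous_token_program(tokens):
    """Hypothetical candidate program: each token attends to the token before it.

    The first token has no predecessor, so it attends to itself.
    """
    n = len(tokens)
    out = np.zeros((n, n))
    for i in range(n):
        out[i, max(i - 1, 0)] = 1.0
    return out


def agreement(head_attn, program_attn):
    """Hypothetical fidelity score: fraction of query positions where the
    head's most-attended key matches the program's predicted key."""
    return float(np.mean(head_attn.argmax(axis=-1) == program_attn.argmax(axis=-1)))


# Toy head (hypothetical): one-hot keys, and each query points at the previous position.
tokens = ["The", "cat", "sat", "on", "mats"]
k = np.eye(5)
q = 10.0 * np.eye(5)[[0, 0, 1, 2, 3]]

head_attn = attention_matrix(q, k)
prog_attn = previous_token_program(tokens)
score = agreement(head_attn, prog_attn)  # 1.0: the program matches the head at every position
```

Argmax agreement is only one possible fidelity measure; a distance between full attention distributions (for example, per-row total variation) would also penalize heads that spread their weight across several tokens.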
👤 Authors: Amiri Hayes, Belinda Li, Jacob Andreas

---
🔗 **[Explaining Attention with Program Synthesis](https://arxiv.org/abs/2606.19317v1)**

🏷️ Source: arXiv cs.AI
⏱️ 2026-06-18 14:01
