**Abstract**
Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grained focus or view transformations. The "think with images" paradigm narrows this gap, but existing approaches are either constrained by fixed predefined toolkits or produce noisy intermediate images from unified multimodal methods. We pur
👤 Authors: Beichen Zhang, Yuhong Liu, Jinsong Li, Yuhang Zang, Jiaqi Wang, Dahua Lin (林大华)
---
🔗 **[ETCHR: Editing To Clarify and Harness Reasoning](https://arxiv.org/abs/2605.23897v1)**
🏷️ Source: arXiv cs.AI
⏱️ 2026-05-26 08:01