**Abstract**
Building trustworthy medical multimodal large language models (MLLMs) is critical for reliable clinical decision support. Existing medical hallucination benchmarks mainly focus on data collection, but often ignore where hallucinations originate within the reasoning process. We find that hallucination sources vary across samples: errors may arise from visual misrecognition, incorrect medical knowle
👤 Authors: Sicheng Yang, Hangjie Yuan, Wenjun Zhang, Jinwang Wang, Yichen Qian, Weihua Chen, Fan Wang, Lei Zhu

---
🔗 **[ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning](https://arxiv.org/abs/2606.14697v1)**

🏷️ Source: arXiv cs.AI
⏱️ 2026-06-15 14:00