**Abstract**
Long-rollout causal video diffusion has converged on a fixed-size sliding-window KV cache, with recent progress innovating within this layout by changing which tokens occupy the window or how their positions are encoded. The per-head KV layout itself, a dominant contributor to streaming memory and latency, has been mostly left unchanged. In this paper, we present the first study of Multi-Head Late
👤 Authors: Hidir Yesiltepe, Jiazhen Hu, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, Hoda Eldardiry, Pinar Yanardag

---
🔗 **[VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion](https://arxiv.org/abs/2605.30351v1)**
🏷️ Source: arXiv cs.AI
⏱️ 2026-05-29 14:00