**Abstract**
As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alter layouts, introducing prefix mismatches and cache invalidation. This reveals a critical trade-off between text sparsity and prompt cache continuity.
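The prefix-mismatch effect described in the abstract can be illustrated with a minimal sketch. It assumes a prompt cache that reuses only the longest leading run of tokens identical to a previously cached sequence. The `shared_prefix_len` helper and the token strings are hypothetical and are not taken from TokenPilot itself.

```python
from typing import List


def shared_prefix_len(cached: List[str], new: List[str]) -> int:
    """Count the leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


# Hypothetical token sequences; a real system would compare tokenizer IDs.
cached = ["sys", "tool_a", "obs_1", "obs_2", "obs_3"]

# Append-only growth leaves the entire cached prefix reusable.
appended = cached + ["obs_4"]

# Pruning an early entry shifts every later token, so reuse stops at the edit.
pruned = ["sys", "obs_1", "obs_2", "obs_3", "obs_4"]

reuse_appended = shared_prefix_len(cached, appended)
reuse_pruned = shared_prefix_len(cached, pruned)
```

Under this assumption, the pruned context is shorter but shares only its first token with the cached sequence, which is the trade-off between sparsity and cache continuity that the abstract points to.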
👤 Authors: Buqiang Xu, Zirui Xue, Dianmou Chen, Chenyang Fu, Chiyu Wu, Caiying Huang, Chen Jiang, Jizhan Fang, Xinle Deng, Yijun Chen, Yunzhi Yao, Xuehai Wang, Jin Shang, Gong Yu, Ningyu Zhang
---
🔗 **[TokenPilot: Cache-Efficient Context Management for LLM Agents](https://arxiv.org/abs/2606.17016v1)**
🏷️ Source: arXiv cs.AI
⏱️ 2026-06-16 14:01