**Abstract**
Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert capacity. However, recent analyses and our routing probe challenge this allocation rule: replacing a deeper layer's learn…
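To make the contrast concrete, below is a minimal PyTorch sketch of the idea the abstract describes: several transformer layers routing into one globally shared expert pool instead of each layer owning its own expert set. This is an illustration only, not the paper's UniPool implementation; the class names (`Expert`, `SharedPoolMoELayer`), the naive top-2 routing loop, and all dimensions are assumptions chosen for readability.

```python
# Illustrative sketch (assumed, not the paper's code): layers share one expert pool.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A simple feed-forward expert."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)


class SharedPoolMoELayer(nn.Module):
    """MoE layer whose experts live in a pool shared across all layers.

    Each layer keeps only its own router; the experts are owned by the shared
    pool, so adding depth does not grow expert parameters linearly.
    """
    def __init__(self, d_model: int, expert_pool: nn.ModuleList, top_k: int = 2):
        super().__init__()
        self.expert_pool = expert_pool          # shared reference, not a per-layer copy
        self.router = nn.Linear(d_model, len(expert_pool))
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, num_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Naive dispatch loop, kept simple for readability.
        for k in range(self.top_k):
            for e in range(len(self.expert_pool)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.expert_pool[e](x[mask])
        return out


# Conventional design would build `num_layers` separate expert sets; here one
# pool serves every layer, so expert parameters stay constant with depth.
d_model, d_hidden, num_experts, num_layers = 64, 256, 8, 12
pool = nn.ModuleList([Expert(d_model, d_hidden) for _ in range(num_experts)])
layers = nn.ModuleList(
    [SharedPoolMoELayer(d_model, pool, top_k=2) for _ in range(num_layers)]
)

x = torch.randn(10, d_model)
for layer in layers:
    x = x + layer(x)                            # residual connection, as in a transformer
```

Because every layer stores a reference to the same `nn.ModuleList`, the expert parameters are counted once regardless of depth; only the lightweight per-layer routers grow with the number of layers.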
👤 Authors: Minbin Huang, Han Shi, Chuanyang Zheng, Yimeng Wu, Guoxuan Chen, Xintong Yu, Yichun Yin, Hong Cheng

---
🔗 **[UniPool: A Globally Shared Expert Pool for Mixture-of-Experts](https://arxiv.org/abs/2605.06665v1)**

> UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
🏷️ Source: ArXiv cs.AI
⏱️ 2026-05-09 08:11