**Abstract**
Large-scale model training increasingly relies on composing multiple parallelism strategies, such as data, pipeline, and expert parallelism, together with memory-saving optimizations like ZeRO. Deployed systems for foundation model pretraining often rely on human experts to manually design a high-level parallelism strategy then implement the corresponding low-level execution strategy, making it di
👤 Authors: Megan Frisella, Shubham Tiwari, Andy Ruan, Yi Pan, Parker Gustafson, Mat Jacob, Gilbert Bernstein, Stephanie Wang
---
🔗 **[Piper: A Programmable Distributed Training System](https://arxiv.org/abs/2606.11169v1)**
🏷️ Source: arXiv cs.AI
⏱️ 2026-06-10 14:00