Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

**摘要**
Hyperparameter transfer allows extrapolating optimal optimization hyperparameters from small to large scales, making it critical for training large language models (LLMs). This is done either by fitting a scaling law to the hyperparameters or by a judicious choice of parameterization, such as Maximal Update ($μ$P), that renders optimal hyperparameters approximately scale invariant. In this paper,
👤 作者: Dayal Singh Kalra, Maissam Barkeshli

---
🔗 **[Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate](https://arxiv.org/abs/2605.21486v1)**

> Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate
🏷️ 来源: ArXiv cs.AI
⏱️ 2026-05-22 08:00

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

回复