Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits

Authors: Daniel Morales-Brotons, Thijs Vogels, Hadrien Hendrikx


Paper | 2024-11-27 | English


deep learning portfolio | arxiv

Description

Weight averaging of Stochastic Gradient Descent (SGD) iterates is a popular method for training deep learning models. While it is often used as part of complex training pipelines to improve generalization or serve as a 'teacher' model, weight averaging lacks proper evaluation on its own. In this work, we present a systematic study of the Exponential Moving Average (EMA) of weights. We first explore the training dynamics of EMA, give guidelines for hyperparameter tuning, and highlight its good ea...
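
For readers unfamiliar with the technique the description names, the following is a minimal sketch of an EMA of weights, assuming the common update rule ema <- decay * ema + (1 - decay) * current. The names `ema_update`, `weights`, and `ema_weights`, the toy weight values, and the decay of 0.5 are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical container: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def ema_update(ema: Weights, current: Weights, decay: float = 0.999) -> None:
    """Update `ema` in place: ema <- decay * ema + (1 - decay) * current."""
    for name, values in current.items():
        avg = ema[name]
        for i, w in enumerate(values):
            avg[i] = decay * avg[i] + (1.0 - decay) * w


# Toy "training run": the weight changes below stand in for SGD steps.
weights: Weights = {"w": [0.0, 0.0]}
# Assumption: the average is initialised as a copy of the starting weights.
ema_weights: Weights = {name: list(v) for name, v in weights.items()}

for step in range(1, 4):
    weights["w"] = [float(step), -float(step)]  # placeholder for an optimizer step
    ema_update(ema_weights, weights, decay=0.5)  # small decay so the effect is visible

print(ema_weights["w"])
```

The averaged copy lags behind the raw weights and smooths out step-to-step changes. In practice the decay is usually much closer to 1 than the value used here, and the averaged weights are used for evaluation while the raw weights continue to be trained.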