A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation

Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

paper, 2018-10-29, English

deep learning portfolio, arxiv

Description

The convergence rate and final performance of common deep learning models have significantly benefited from heuristics such as learning rate schedules, knowledge distillation, skip connections, and normalization layers. In the absence of theoretical underpinnings, controlled experiments aimed at explaining these strategies can aid our understanding of deep learning landscapes and the training dynamics. Existing approaches for empirical analysis rely on tools of linear interpolation and visualiza...