Which Algorithmic Choices Matter at Which Batch
Sizes? Insights From a Noisy Quadratic Model
Guodong Zhang1,2,3, Lala Li3, Zachary Nado3, James Martens4,
Sushant Sachdeva1, George E. Dahl3, Christopher J. Shallue3, Roger Grosse1,2
1 University of Toronto, 2 Vector Institute, 3 Google Research, Brain Team, 4 DeepMind
Abstract
Increasing the batch size is a popular way to speed up neural network training,
but beyond some cri ...