A Second look at Exponential and Cosine Step Sizes:
Simplicity, Adaptivity, and Performance
Xiaoyu Li 1 Zhenxun Zhuang 2 Francesco Orabona 1 2 3
Abstract

Stochastic Gradient Descent (SGD) is a popular tool in traini ...

typically scale better with the complexity of the predictors and the amount of training data than convex ones. One such example is deep neural networks. Over ...