Towards Theoretically Understanding Why SGD
Generalizes Better Than ADAM in Deep Learning
Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E
Salesforce Research, National University of Singapore, Princeton University
{pzhou,shoi,cxiong}@salesforce.com elefjia@nus.edu.sg {chaom@, weinan@math.}princeton.edu
Abstract
It is not clear yet why ADAM-alike adaptive gradient algorithms suffer from worse
generalization ...