Bayesian Attention Belief Networks
Shujian Zhang* 1, Xinjie Fan* 1, Bo Chen 2, Mingyuan Zhou 1
Abstract
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.

of the Transformer structure, it becomes possible to train unprecedented large models on big datasets (Devlin et al., 2018), which stimulates a great amount of research to pre-train models on large unlabeled d ...