Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag¹ Kazuki Irie¹ Jürgen Schmidhuber¹
Abstract
We show the formal equivalence of linearised self-attention mechanisms and fast ...

field network (Ramsauer et al., 2021; Krotov & Hopfield, 2016; Demircigil et al., 2017). It extends a form of attention (Bahdanau et al., 2015) originally introduced to complement