Thinking Like Transformers
Gail Weiss¹, Yoav Goldberg²,³, Eran Yahav¹
Abstract
What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing ...

... a transformer operates at a higher level of abstraction, reasoning in terms of a composition of sequence operations rather than neural network primitives.

We are inspired ...