Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius¹, Heng Wang¹, Lorenzo Torresani¹ ²
Abstract
We present a convolution-free approach to video classification built exclusively ...

Video understanding shares several high-level similarities with NLP. First of all, videos and sentences are both sequential. Furthermore, precisely as the meaning of a word can