Unifying Vision-Language Representation Space
with Single-tower Transformer
Jiho Jang¹, Chaerin Kong¹, Donghyeon Jeon², Seonhoon Kim³, Nojun Kwak¹
¹Seoul National University, ²NAVER, ³Coupang
arXiv:2211.11153v1 [cs.LG] 21 Nov 2022
Figure 1: A truly unified vision-language representation s ...