TeraPipe: Token-Level Pipeline Parallelism for Training
Large-Scale Language Models
Zhuohan Li 1 Siyuan Zhuang 1 Shiyuan Guo 1 Danyang Zhuo 2 Hao Zhang 1 Dawn Song 1 Ion Stoica 1
Abstract

Model parallelism has become a necessity for ...

... bit floating-point numbers. This significantly exceeds the memory capacity of existing hardware accelerators, such as GPUs and TPUs, which makes mo ...