BASE Layers: Simplifying Training of Large, Sparse Models
Mike Lewis¹, Shruti Bhosale¹, Tim Dettmers¹,², Naman Goyal¹, Luke Zettlemoyer¹,²
Abstract
We introduce a new balanced assignment of experts (BASE) layer for large language models that greatly simplifies existing high capacity sparse
...