Paper Title
Reservoir Transformers
Paper Authors
Paper Abstract
We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.
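To illustrate the idea the abstract describes, here is a minimal PyTorch sketch of an encoder in which every k-th transformer layer is a frozen "reservoir": randomly initialized and never updated, interspersed with regular trainable layers. The class name ReservoirEncoder, the layer sizes, and the placement rule are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ReservoirEncoder(nn.Module):
    """Transformer encoder where some layers are frozen 'reservoir' layers:
    randomly initialized, excluded from gradient updates, and interspersed
    with ordinary trainable transformer layers."""

    def __init__(self, d_model=512, nhead=8, num_layers=6, reservoir_every=3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            if (i + 1) % reservoir_every == 0:
                # Reservoir layer: keep the random initialization and freeze
                # all parameters so they are never updated during training.
                for p in layer.parameters():
                    p.requires_grad_(False)
            self.layers.append(layer)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Only the trainable layers' parameters are handed to the optimizer,
# which is where the wall-clock savings during backpropagation come from.
model = ReservoirEncoder()
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

x = torch.randn(2, 10, 512)  # (batch, sequence, d_model)
out = model(x)
```

This sketch uses frozen full transformer layers as the reservoirs; the paper explores a variety of non-linear reservoir layer types, so the exact layer contents here should be read as one possible instantiation.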