Paper Title

EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation

Paper Authors

Tao Ge, Si-Qing Chen, Furu Wei

Paper Abstract

We introduce EdgeFormer -- a parameter-efficient Transformer for on-device seq2seq generation under strict computation and memory constraints. Compared with previous parameter-efficient Transformers, EdgeFormer applies two novel principles for cost-effective parameterization, allowing it to perform better given the same parameter budget; moreover, EdgeFormer is further enhanced by a layer adaptation innovation proposed for improving networks with shared layers. Extensive experiments show that EdgeFormer effectively outperforms previous parameter-efficient Transformer baselines and achieves competitive results under both computation and memory constraints. Given the promising results, we release EdgeLM -- the pretrained version of EdgeFormer, which is the first publicly available pretrained on-device seq2seq model that can be easily fine-tuned for seq2seq tasks with strong results, facilitating on-device seq2seq generation in practice.
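Since the abstract centers on improving networks with shared layers, a minimal sketch of the general idea of cross-layer parameter sharing with lightweight per-layer adaptation may help. Note that the module names, the choice of per-depth LayerNorms as the adaptation parameters, and all hyperparameters below are illustrative assumptions, not EdgeFormer's actual parameterization.

```python
# A minimal sketch (not EdgeFormer's actual code) of a shared-layer Transformer
# encoder: one set of layer weights is reused at every depth, keeping the
# parameter count near that of a single-layer model, while each depth position
# keeps a small number of its own parameters (here, a per-depth LayerNorm)
# as a cheap form of layer adaptation.
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        # Single Transformer layer whose weights are shared across all depths.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=2048, batch_first=True
        )
        # Per-depth adaptation parameters: one lightweight LayerNorm per depth.
        self.layer_norms = nn.ModuleList(
            nn.LayerNorm(d_model) for _ in range(num_layers)
        )
        self.num_layers = num_layers

    def forward(self, x):
        for i in range(self.num_layers):
            x = self.shared_layer(x)    # same weights applied at every depth
            x = self.layer_norms[i](x)  # depth-specific adaptation
        return x

# Usage: encode a batch of 2 sequences of length 10.
enc = SharedLayerEncoder()
out = enc(torch.randn(2, 10, 512))
print(out.shape)  # torch.Size([2, 10, 512])
```

The design tradeoff this illustrates is the one the abstract describes: sharing the layer weights fits the model within a tight parameter budget, and the small per-depth parameters recover some of the expressiveness lost to sharing.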
