Paper Title
Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation
Paper Authors
Paper Abstract
The standard neural machine translation model can only decode with the same depth configuration used during training. Restricted by this, we have to deploy models of various sizes to maintain the same translation latency, because hardware conditions on different terminal devices (e.g., mobile phones) may vary greatly. Such individual training increases model maintenance costs and slows model iteration, especially for industry. In this work, we propose to use multi-task learning to train a flexible depth model that can adapt to different depth configurations during inference. Experimental results show that our approach can simultaneously support decoding in 24 depth configurations and outperforms both individual training and another flexible depth model training method -- LayerDrop.
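The core idea of the abstract can be illustrated with a toy sketch. This is an assumed training loop, not the paper's exact algorithm: each step samples one (encoder depth, decoder depth) pair from the supported set, treats it as one task, and updates only the layers that configuration uses, so a single shared stack learns to work at every depth. The depth sets below (yielding 6 x 4 = 24 configurations) and all scalar "layers" are hypothetical stand-ins.

```python
import random

random.seed(0)

# Hypothetical supported depths; 6 encoder x 4 decoder = 24 configurations,
# matching the count in the abstract but not necessarily the paper's choice.
ENC_DEPTHS = [1, 2, 3, 4, 5, 6]
DEC_DEPTHS = [1, 2, 3, 6]
CONFIGS = [(e, d) for e in ENC_DEPTHS for d in DEC_DEPTHS]

def forward(x, layers, depth):
    """Apply only the first `depth` layers of a shared stack."""
    for w in layers[:depth]:
        x = w * x  # each scalar weight stands in for a full Transformer layer
    return x

def train_step(enc, dec, x, target, lr=0.05):
    """One multi-task step: sample a depth configuration, compute its loss,
    and update only the layers that configuration actually used."""
    e, d = random.choice(CONFIGS)
    y = forward(forward(x, enc, e), dec, d)
    err = y - target
    # Manual gradient of (y - target)**2 w.r.t. each used weight w_i:
    # y is proportional to w_i, so dy/dw_i = y / w_i.
    for layers, depth in ((enc, e), (dec, d)):
        for i in range(depth):
            layers[i] -= lr * 2 * err * (y / layers[i])
    return err ** 2

# Shared stacks, trained once and then usable at any supported depth.
enc = [1.1] * max(ENC_DEPTHS)
dec = [0.9] * max(DEC_DEPTHS)
for _ in range(200):
    train_step(enc, dec, x=1.0, target=1.0)
```

After training, the same `enc`/`dec` stacks can be truncated to any of the 24 depth pairs at inference time, which is the flexibility the abstract contrasts with training a separate model per configuration.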