Paper Title
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning
Authors
Abstract
We introduce a method called the Expansion mechanism, which processes the input unconstrained by the number of elements in the sequence. By doing so, the model can learn more effectively than traditional attention-based approaches. To support this claim, we design a novel architecture, ExpansionNet v2, which achieves strong results on the MS COCO 2014 Image Captioning challenge and state-of-the-art performance in its respective category, with a score of 143.7 CIDEr-D on the offline test split, 140.8 CIDEr-D on the online evaluation server, and 72.9 AllCIDEr on the nocaps validation set. Additionally, we introduce an end-to-end training algorithm up to 2.8 times faster than established alternatives. Source code is available at: https://github.com/jchenghu/ExpansionNet_v2
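The abstract's core idea is decoupling the processing length from the input length: the sequence is projected onto a fixed number of "expansion" slots, processed there, and redistributed back. The sketch below is a hypothetical minimal illustration of that idea in NumPy, not the paper's actual layers; the slot count, attention form, and `expand_and_contract` name are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def expand_and_contract(x, slots):
    """Hypothetical sketch of an expansion step.

    x:     (n, d) input sequence of arbitrary length n
    slots: (e, d) learned expansion vectors, fixed count e

    The fixed-size slots attend over the input to form a length-e
    representation, then the input attends back over the slots to
    return to length n. ExpansionNet v2's real blocks differ in
    detail; this only shows length decoupling.
    """
    attn = softmax(slots @ x.T, axis=-1)      # (e, n): slots attend to input
    expanded = attn @ x                       # (e, d): fixed-length representation
    back = softmax(x @ expanded.T, axis=-1)   # (n, e): input attends to slots
    return back @ expanded                    # (n, d): original length restored

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 16))          # a 7-token sequence
slots = rng.standard_normal((32, 16))     # 32 slots, regardless of n
y = expand_and_contract(x, slots)
print(y.shape)  # (7, 16)
```

Because the slot count is fixed, the intermediate computation has the same shape for any input length, which is the property the abstract credits for the mechanism's effectiveness.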