Paper Title
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning
Authors
Abstract
We introduce a method called the Expansion mechanism, which processes the input unconstrained by the number of elements in the sequence. By doing so, the model can learn more effectively than traditional attention-based approaches. To support this claim, we design a novel architecture, ExpansionNet v2, which achieves strong results on the MS COCO 2014 Image Captioning challenge and state-of-the-art performance in its respective category, with a score of 143.7 CIDEr-D on the offline test split, 140.8 CIDEr-D on the online evaluation server, and 72.9 AllCIDEr on the nocaps validation set. Additionally, we introduce an end-to-end training algorithm up to 2.8 times faster than established alternatives. Source code is available at: https://github.com/jchenghu/ExpansionNet_v2
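The abstract's core idea is decoupling the processing length from the input length: the sequence is projected onto a fixed number of "expansion" slots, processed there, and redistributed back. The sketch below is a hypothetical minimal illustration of that idea in NumPy, not the paper's actual layers; the slot count, attention form, and `expand_and_contract` name are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def expand_and_contract(x, slots):
    """Hypothetical sketch of an expansion step.

    x:     (n, d) input sequence of arbitrary length n
    slots: (e, d) learned expansion vectors, fixed count e

    The fixed-size slots attend over the input to form a length-e
    representation, then the input attends back over the slots to
    return to length n. ExpansionNet v2's real blocks differ in
    detail; this only shows length decoupling.
    """
    attn = softmax(slots @ x.T, axis=-1)      # (e, n): slots attend to input
    expanded = attn @ x                       # (e, d): fixed-length representation
    back = softmax(x @ expanded.T, axis=-1)   # (n, e): input attends to slots
    return back @ expanded                    # (n, d): original length restored

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 16))          # a 7-token sequence
slots = rng.standard_normal((32, 16))     # 32 slots, regardless of n
y = expand_and_contract(x, slots)
print(y.shape)  # (7, 16)
```

Because the slot count is fixed, the intermediate computation has the same shape for any input length, which is the property the abstract credits for the mechanism's effectiveness.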