Paper title
Face2Text revisited: Improved data set and baseline results
Paper authors
Paper abstract
Current image description generation models do not transfer well to the task of describing human faces. To encourage the development of more human-focused descriptions, we developed a new data set of facial descriptions based on the CelebA image data set. We describe the properties of this data set, and present results from a face description generator trained on it, which explores the feasibility of using transfer learning from VGGFace/ResNet CNNs. Comparisons are drawn through both automated metrics and human evaluation by 76 English-speaking participants. The descriptions generated by the VGGFace-LSTM + Attention model are closest to the ground truth according to human evaluation whilst the ResNet-LSTM + Attention model obtained the highest CIDEr and CIDEr-D results (1.252 and 0.686 respectively). Together, the new data set and these experimental results provide data and baselines for future work in this area.
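As a concrete point of reference for the CIDEr and CIDEr-D figures quoted above, the following is a minimal sketch (not from the paper) of how a corpus-level CIDEr score is typically computed with the pycocoevalcap package; the image IDs and captions below are illustrative assumptions, not Face2Text data.

# Minimal CIDEr scoring sketch, assuming pycocoevalcap is installed (pip install pycocoevalcap).
# CIDEr compares TF-IDF-weighted n-gram vectors of a candidate caption against reference captions.
from pycocoevalcap.cider.cider import Cider

# Hypothetical data: in real use, gts would map each CelebA image ID to its
# Face2Text reference descriptions, and res to the model's generated description.
gts = {  # ground-truth references: image ID -> list of tokenized reference strings
    "img1": ["a young woman with long blonde hair and a wide smile"],
    "img2": ["an older man with a grey beard wearing glasses"],
}
res = {  # model output: image ID -> single-element list with the generated string
    "img1": ["a smiling young woman with blonde hair"],
    "img2": ["a man with a grey beard and glasses"],
}

scorer = Cider()
corpus_score, per_image_scores = scorer.compute_score(gts, res)
print(f"CIDEr: {corpus_score:.3f}")  # corpus-level score, the kind of figure the abstract reports (e.g. 1.252)

Note that pycocoevalcap expects lowercase, pre-tokenized caption strings, and that CIDEr's IDF statistics are estimated from the evaluation corpus itself, so scores are only meaningful over a full test set rather than the two-image toy example shown here.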