Paper title
Deep learning for video game genre classification
Paper authors
Abstract
Video game genre classification based on cover images and textual descriptions would be highly beneficial to many modern identification, collocation, and retrieval systems. At the same time, it is an extremely challenging task for the following reasons. First, there exists a wide variety of video game genres, many of which are not concretely defined. Second, video game covers vary in many ways, such as color, style, and textual information, even for games of the same genre. Third, cover designs and textual descriptions may differ due to many external factors, such as country, culture, and target audience. With the growing competitiveness of the video game industry, cover designers and typographers push cover designs to their limits in the hope of attracting sales. Computer-based automatic video game genre classification systems have therefore become a particularly exciting research topic in recent years. In this paper, we propose a multi-modal deep learning framework to solve this problem. The contribution of this paper is four-fold. First, we compile a large dataset of 50,000 video games spanning 21 genres, comprising cover images, description text, title text, and genre labels. Second, state-of-the-art image-based and text-based models are evaluated thoroughly on the task of genre classification for video games. Third, we develop an efficient and scalable multi-modal framework based on both images and text. Fourth, a thorough analysis of the experimental results is given, and future work to improve performance is suggested. The results show that the multi-modal framework outperforms the current state-of-the-art image-based and text-based models. Several challenges are outlined for this task; more effort and resources will be needed for this classification task to reach a satisfactory level.
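The abstract does not specify how the image and text branches are combined, but a common multi-modal design is late fusion: each modality is encoded into a feature vector, the vectors are concatenated, and a classifier maps the fused vector to genre probabilities. The sketch below illustrates only that fusion step; the feature dimensions (512 for images, 256 for text), the random features standing in for CNN/text-encoder outputs, and the single linear layer are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

NUM_GENRES = 21  # number of genres in the dataset described above

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify_fused(img_feat, txt_feat, W, b):
    """Late fusion: concatenate modality features, apply a linear classifier."""
    fused = np.concatenate([img_feat, txt_feat])  # (512 + 256,) = (768,)
    return softmax(W @ fused + b)                 # genre probabilities, (21,)

rng = np.random.default_rng(0)
img_feat = rng.normal(size=512)   # stand-in for pooled CNN cover features
txt_feat = rng.normal(size=256)   # stand-in for encoded description/title text
W = rng.normal(size=(NUM_GENRES, 768)) * 0.01  # untrained weights, for shape only
b = np.zeros(NUM_GENRES)

probs = classify_fused(img_feat, txt_feat, W, b)
```

With untrained weights the output is arbitrary; the point is the data flow: two modality-specific encoders feed one shared classification head, which is what lets the fused model draw on cues that either modality alone would miss.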