Paper Title

AliCHI: A Large-scale Multi-modal Dataset and Automated Evaluation Tool for Human-like Dialogue Systems

Paper Authors

Zhiling Luo, Qiankun Shi, Sha Zhao, Wei Zhou, Haiqing Chen, Yuankai Ma, Haitao Leng

Paper Abstract

A well-designed interactive human-like dialogue system is expected to take actions (e.g., smiling) and respond in a pattern similar to humans. However, due to the single modality (speech only) or small volume of currently available public datasets, most dialogue systems can only respond in speech and cannot take human-like actions. In this work, we build a large-scale multi-modal dataset of human-to-human conversation in a face-to-face fashion, with fine-grained annotations. The raw data, in video format, contains 635 dialogue sessions collected from 200 participants on designed topics, lasting 52 hours in total. Moreover, we manually annotated the verbal and non-verbal behaviors in each dialogue session with their start/end timestamps. Furthermore, we developed a corresponding evaluation tool for human-like dialogue systems that automatically evaluates the accuracy of two basic tasks, turn-taking prediction and backchannel prediction, in terms of both timing and content. We have released the data; the tools will be released at the conference.
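For a concrete picture of the two evaluation tasks, the following is a minimal sketch of how timestamped behavior annotations and a timing-tolerance check might look. All field names, the `tolerance` parameter, and the greedy matching scheme are illustrative assumptions for this digest, not the authors' actual schema or evaluation protocol.

```python
from dataclasses import dataclass

# Hypothetical annotation record: each verbal/non-verbal behavior in a
# session is labeled with its type and start/end timestamps (seconds).
# Field names are illustrative, not the dataset's actual schema.
@dataclass
class Behavior:
    kind: str      # e.g. "speech", "smile", "nod", "backchannel"
    start: float   # start timestamp in seconds
    end: float     # end timestamp in seconds

def timing_hit(predicted: float, reference: float, tolerance: float = 0.5) -> bool:
    """A predicted event counts as correct on timing if it falls within
    `tolerance` seconds of the annotated reference time (assumed criterion)."""
    return abs(predicted - reference) <= tolerance

def timing_accuracy(predictions, references, tolerance=0.5):
    """Fraction of reference events matched by some prediction within the
    tolerance window. Greedy one-to-one matching; a simplification of
    whatever matching rule the real tool uses."""
    used = set()
    hits = 0
    for ref in references:
        for i, pred in enumerate(predictions):
            if i not in used and timing_hit(pred, ref, tolerance):
                used.add(i)
                hits += 1
                break
    return hits / len(references) if references else 0.0

# Example: evaluating backchannel timing against annotated ground truth.
gold = [Behavior("backchannel", 3.2, 3.6), Behavior("backchannel", 9.8, 10.1)]
preds = [3.4, 12.0]
print(timing_accuracy(preds, [b.start for b in gold]))  # 0.5
```

Content accuracy would be computed analogously by comparing predicted behavior labels against the annotated ones; the released tool presumably implements the paper's own matching rules rather than this simplified one.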
