Paper Title

A neural prosody encoder for end-to-end dialogue act classification

Authors

Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Muller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo

Abstract

Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems. Prosodic features such as energy and pitch have been shown to be useful for DAC. Despite their importance, little research has explored neural approaches to integrating prosodic features into end-to-end (E2E) DAC models, which infer dialogue acts directly from audio signals. In this work, we propose an E2E neural architecture that takes into account the need to characterize prosodic phenomena co-occurring at different levels inside an utterance. A novel part of this architecture is a learnable gating mechanism that assesses the importance of prosodic features and selectively retains the core information necessary for E2E DAC. Our proposed model improves DAC accuracy by 1.07% absolute across three publicly available benchmark datasets.
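To make the idea of a gate over prosodic features concrete, here is a minimal sketch in plain Python. It is not the authors' architecture, which the abstract does not specify. It assumes a generic sigmoid gate that is computed from the concatenated acoustic and prosodic vectors and then scales each prosodic feature. The names `gated_prosody` and `fuse`, the weight layout, and the concatenation step are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_prosody(acoustic, prosody, weights, bias):
    """Scale each prosodic feature by its own gate value.

    ASSUMPTION: this is a generic gating form, not the paper's layer.
    weights[i] holds one coefficient per element of acoustic + prosody;
    in a trained model these values would be learned.
    """
    context = acoustic + prosody  # list concatenation
    gated = []
    for i, p in enumerate(prosody):
        score = sum(w * x for w, x in zip(weights[i], context)) + bias[i]
        gated.append(sigmoid(score) * p)  # gate near 0 suppresses the feature
    return gated


def fuse(acoustic, prosody, weights, bias):
    """Concatenate acoustic features with the gated prosodic features."""
    return acoustic + gated_prosody(acoustic, prosody, weights, bias)


if __name__ == "__main__":
    acoustic = [0.5, -1.0]   # hypothetical frame-level acoustic features
    prosody = [2.0, 3.0]     # hypothetical energy and pitch values
    weights = [[0.1, -0.2, 0.3, 0.0], [0.0, 0.4, -0.1, 0.2]]
    bias = [0.0, -1.0]
    print(fuse(acoustic, prosody, weights, bias))
```

With all weights and biases at zero, every gate is 0.5 and the prosodic features are simply halved. A strongly negative bias drives a gate toward zero, which is how a gate of this form would discard a feature it finds uninformative.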
