Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models

Paper title

Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models

Authors

Qiang Huang, Thomas Hain

Abstract

Many applications of speech technology require ever-growing amounts of audio data. Automatically assessing the quality of collected recordings is important for ensuring that they meet the requirements of the target application. However, effective, high-performing assessment remains challenging when no clean reference is available. This paper proposes a novel model for audio quality assessment that jointly uses bidirectional long short-term memory (BLSTM) and an attention mechanism. The former mimics the human auditory perception ability to learn information from a recording, while the latter further discriminates interference from the desired signal by highlighting target-related features. To evaluate the proposed approach, the TIMIT dataset is augmented by mixing it with various natural sounds. Two tasks are explored in the experiments: the first is to predict an utterance quality score, and the second is to identify where an anomalous distortion occurs in a recording. The results show that the proposed approach outperforms a strong baseline method, with improvements of about 5% on three metrics: the linear correlation coefficient, the Spearman rank correlation coefficient, and F1.
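
The three evaluation metrics named in the abstract can be sketched in plain Python. This is a minimal illustration rather than the authors' evaluation code: the function names and toy data are hypothetical, and computing F1 over binary per-frame anomaly labels is an assumption, since the abstract does not say how F1 is computed.

```python
from math import sqrt


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def f1_score(reference, predicted):
    """F1 for binary labels (assumed here: 1 = anomalous frame, 0 = clean)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: reference vs. predicted utterance quality scores,
# and reference vs. predicted per-frame anomaly labels.
true_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_scores = [1.2, 1.9, 3.5, 3.9, 4.6]
true_frames = [0, 0, 1, 1, 1, 0]
pred_frames = [0, 1, 1, 1, 0, 0]

print("LCC :", pearson(true_scores, pred_scores))
print("SRCC:", spearman(true_scores, pred_scores))
print("F1  :", f1_score(true_frames, pred_frames))
```

Under these assumptions, the two correlation coefficients would score the quality prediction task, while F1 would score anomaly localisation.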
