Paper Title

The role of context in neural pitch accent detection in English

Authors

Elizabeth Nielsen, Mark Steedman, Sharon Goldwater

Abstract

Prosody is a rich information source in natural language, serving as a marker for phenomena such as contrast. In order to make this information available to downstream tasks, we need a way to detect prosodic events in speech. We propose a new model for pitch accent detection, inspired by the work of Stehwien et al. (2018), who presented a CNN-based model for this task. Our model makes greater use of context by using full utterances as input and adding an LSTM layer. We find that these innovations lead to an improvement from 87.5% to 88.7% accuracy on pitch accent detection on American English speech in the Boston University Radio News Corpus, a state-of-the-art result. We also find that a simple baseline that just predicts a pitch accent on every content word yields 82.2% accuracy, and we suggest that this is the appropriate baseline for this task. Finally, we conduct ablation tests that show pitch is the most important acoustic feature for this task and this corpus.
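The content-word baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how content words are identified, so the small `FUNCTION_WORDS` stoplist, the function names, and the example labels below are all assumptions made for illustration, not the authors' implementation or data.

```python
# Hypothetical, deliberately tiny stoplist of English function words.
# This is an assumption; the paper's own definition of a content word may differ.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to", "for",
    "with", "by", "from", "is", "are", "was", "were", "be", "been", "it", "he",
    "she", "they", "we", "you", "i", "that", "this", "will", "would", "can",
})


def predict_accents(words):
    """Return 1 (accented) for each content word and 0 for each function word."""
    return [0 if w.lower() in FUNCTION_WORDS else 1 for w in words]


def accuracy(predicted, gold):
    """Fraction of word-level labels that match the gold annotation."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Made-up utterance and labels, for illustration only (not corpus data).
words = ["The", "mayor", "of", "Boston", "spoke", "to", "reporters"]
gold = [0, 1, 0, 1, 0, 0, 1]
pred = predict_accents(words)
print(pred, accuracy(pred, gold))
```

A baseline like this uses no acoustic input at all, which is why it serves as a useful floor: any model that consumes pitch and other acoustic features should be expected to beat it.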
