RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis

Paper title

RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis

Authors

Shinhyeok Oh, HyeongRae Noh, Yoonseok Hong, Insoo Oh

Abstract

With the advent of deep learning, a huge number of text-to-speech (TTS) models that produce human-like speech have emerged. Recently, by introducing syntactic and semantic information with respect to the input text, various approaches have been proposed to enrich the naturalness and expressiveness of TTS models. Although these strategies showed impressive results, they still have some limitations in utilizing language information. First, most approaches only use graph networks to utilize syntactic and semantic information without considering linguistic features. Second, most previous works do not explicitly consider adjacent words when encoding syntactic and semantic information, even though it is obvious that adjacent words are usually meaningful when encoding the current word. To address these issues, we propose Relation-aware Word Encoding Network (RWEN), which effectively allows syntactic and semantic information based on two modules (i.e., Semantic-level Relation Encoding and Adjacent Word Relation Encoding). Experimental results show substantial improvements compared to previous works.
