Neural text normalization leveraging similarities of strings and sounds

Paper title

Neural text normalization leveraging similarities of strings and sounds

Authors

Riku Kawamura, Tatsuya Aoki, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

Abstract

We propose neural models that normalize text by considering the similarities of word strings and sounds. We experimentally compared a model that considers both similarities, models that consider only the word string similarity or only the sound similarity, and a baseline model that uses neither. The results showed that leveraging word string similarity helped with misspellings and abbreviations, and that taking sound similarity into account helped with phonetic substitutions and emphasized characters. As a result, the proposed models achieved higher F$_1$ scores than the baseline.
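
The two similarity signals named in the abstract can be illustrated with a small, non-neural sketch. This is not the authors' model. It is a toy ranking function built on assumptions: `string_similarity` uses the standard library's `difflib.SequenceMatcher`, while `sound_key`, the `DIGIT_SOUNDS` table, the equal weighting, and the candidate vocabulary are all hypothetical stand-ins chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical table of digits used as phonetic substitutes in noisy text.
DIGIT_SOUNDS = {"2": "to", "4": "for", "8": "ate"}


def string_similarity(a: str, b: str) -> float:
    """Surface similarity of two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def sound_key(word: str) -> str:
    """Crude stand-in for a pronunciation (an assumption, not a real phonemizer).

    Expands digits that are read aloud and collapses runs of a repeated
    character, so emphasized spellings map to the same key.
    """
    expanded = "".join(DIGIT_SOUNDS.get(ch, ch) for ch in word.lower())
    return re.sub(r"(.)\1+", r"\1", expanded)


def sound_similarity(a: str, b: str) -> float:
    """Similarity of the crude sound keys, in [0, 1]."""
    return SequenceMatcher(None, sound_key(a), sound_key(b)).ratio()


def combined_score(noisy: str, candidate: str, weight: float = 0.5) -> float:
    """Weighted mix of both signals; the 0.5 weight is arbitrary."""
    return (weight * string_similarity(noisy, candidate)
            + (1.0 - weight) * sound_similarity(noisy, candidate))


def best_candidate(noisy: str, vocabulary: list[str]) -> str:
    """Pick the vocabulary entry with the highest combined score."""
    return max(vocabulary, key=lambda c: combined_score(noisy, c))


if __name__ == "__main__":
    print(best_candidate("l8r", ["later", "letter", "liter"]))
    print(sound_key("sooooo"), sound_key("2day"))
```

In this toy setup the string signal does most of the work for abbreviations such as "tmrw", while the sound key is what links "l8r" to "later" and "sooooo" to "so", which mirrors the division of labor the abstract reports.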

Paper title