A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies

Paper title

A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies

Authors

Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, Shinji Watanabe

Abstract

In this study, we present recent developments of models trained with the RNN-T loss in ESPnet. These include various architectures such as the recently proposed Conformer, multi-task learning with different auxiliary criteria, and multiple decoding strategies, including our own proposition. Through experiments and benchmarks, we show that our proposed systems can be competitive with other state-of-the-art systems on well-known datasets such as LibriSpeech and AISHELL-1. Additionally, we demonstrate that these models are promising compared with other systems already implemented in ESPnet, with regard to both performance and decoding speed, which opens the possibility of powerful systems for streaming tasks. With these additions, we hope to expand the usefulness of the ESPnet toolkit for the research community and to give the ASR industry tools for deploying our systems in realistic and production environments.
