Title

Efficient Speech Translation with Dynamic Latent Perceivers

Authors

Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

Abstract

Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality. Since speech signals are longer than their textual counterparts, and due to the quadratic complexity of the Transformer, a down-sampling step is essential for its adoption in Speech Translation. Instead, in this research, we propose to ease the complexity by using a Perceiver encoder to map the speech inputs to a fixed-length latent representation. Furthermore, we introduce a novel way of training Perceivers, with Dynamic Latent Access (DLA), unlocking larger latent spaces without any additional computational overhead. Speech-to-Text Perceivers with DLA can match the performance of Transformer baselines across three language pairs in MuST-C. Finally, a DLA-trained model is easily adaptable to DLA at inference, and can be flexibly deployed with various computational budgets, without significant drops in translation quality.
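To make the approach concrete, below is a minimal PyTorch sketch of a Perceiver-style speech encoder with Dynamic Latent Access, reconstructed only from the abstract above. All names and hyperparameters (PerceiverSTEncoder, n_latents_total, n_latents_active, etc.) are hypothetical illustrations, not the authors' implementation: a large learnable latent array is trained by having a randomly sampled subset of latents cross-attend to the speech frames, so the per-step cost stays fixed while the full latent space gets trained, and at inference any budget-dependent number of latents can be used.

```python
import torch
import torch.nn as nn

class PerceiverSTEncoder(nn.Module):
    """Minimal sketch (not the authors' code) of a Perceiver speech
    encoder with Dynamic Latent Access (DLA)."""

    def __init__(self, input_dim=80, d_model=256, n_latents_total=512,
                 n_latents_active=128, n_heads=4):
        super().__init__()
        # Large learnable latent array; only a subset is accessed per step.
        self.latents = nn.Parameter(torch.randn(n_latents_total, d_model))
        self.n_active = n_latents_active
        self.input_proj = nn.Linear(input_dim, d_model)
        # Cross-attention: latents (queries) attend to speech frames (keys/values).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)

    def forward(self, speech_feats, n_latents=None):
        # speech_feats: (batch, time, input_dim), e.g. log-Mel filterbank frames.
        B = speech_feats.size(0)
        n = n_latents if n_latents is not None else self.n_active
        if self.training:
            # DLA: sample a random subset of the latent array each step, so the
            # whole (larger) latent space is trained at a fixed per-step cost.
            idx = torch.randperm(self.latents.size(0), device=self.latents.device)[:n]
        else:
            # At inference, use any budget-dependent number of latents.
            idx = torch.arange(n, device=self.latents.device)
        q = self.latents[idx].unsqueeze(0).expand(B, -1, -1)
        kv = self.input_proj(speech_feats)
        # Cross-attention costs O(n_latents * time) rather than O(time^2),
        # so no down-sampling of the speech frames is required.
        lat, _ = self.cross_attn(q, kv, kv)
        return self.self_attn(lat)  # fixed-length output: (B, n, d_model)
```

A usage example under the same assumptions: the same trained model can be run with different latent budgets at inference, which is the flexible-deployment property the abstract describes.

```python
enc = PerceiverSTEncoder().eval()
feats = torch.randn(2, 1000, 80)  # 2 utterances, 1000 frames of 80-dim features
small = enc(feats, n_latents=64)   # low-compute deployment
large = enc(feats, n_latents=512)  # full latent space
```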
