Paper Title

Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training

Paper Authors

Ashish Mittal, Durga Sivasubramanian, Rishabh Iyer, Preethi Jyothi, Ganesh Ramakrishnan

Paper Abstract

Training state-of-the-art ASR systems such as RNN-T often has a high associated financial and environmental cost. Training with a subset of the training data could mitigate this problem if the selected subset achieves performance on par with training on the entire dataset. Although there are many data subset selection (DSS) algorithms, direct application to RNN-T is difficult, especially for adaptive DSS algorithms that use learning dynamics such as gradients, as RNN-T models tend to have gradients with a significantly larger memory footprint. In this paper, we propose Partitioned Gradient Matching (PGM), a novel distributable DSS algorithm suitable for massive datasets like those used to train RNN-T. Through extensive experiments on Librispeech 100H and Librispeech 960H, we show that PGM achieves a 3x to 6x speedup with only a very small accuracy degradation (under 1% absolute WER difference). In addition, we demonstrate similar results for PGM even in settings where the training data is corrupted with noise.
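To make the high-level idea concrete, below is a minimal sketch of a partitioned gradient-matching selector. This is not the paper's implementation: it assumes per-example gradients are available as dense vectors and uses a simple greedy (OMP-style) matcher within each partition; the function names `greedy_gradient_match` and `partitioned_gradient_match` and all parameters are invented for illustration.

```python
# Minimal sketch of partitioned gradient matching for data subset selection.
# Assumptions (not from the paper): per-example gradients are dense vectors,
# and each partition is solved by greedily matching the subset's gradient sum
# to the partition's full gradient sum, as in standard gradient-matching DSS
# formulations. Partitions are independent, hence trivially distributable.
import numpy as np

def greedy_gradient_match(grads, budget):
    """Greedily pick `budget` examples whose gradient sum best
    approximates the full gradient sum of this partition."""
    target = grads.sum(axis=0)           # full-partition gradient
    residual = target.copy()
    selected = []
    for _ in range(budget):
        # Score every example by alignment with the remaining residual.
        scores = grads @ residual
        scores[selected] = -np.inf       # never pick an example twice
        best = int(np.argmax(scores))
        selected.append(best)
        residual -= grads[best]          # shrink the matching error
    return selected

def partitioned_gradient_match(grads, num_partitions, fraction):
    """Randomly split examples into partitions, match gradients within
    each partition, and return the union of the selected indices."""
    perm = np.random.permutation(len(grads))
    subset = []
    for part in np.array_split(perm, num_partitions):
        budget = max(1, int(fraction * len(part)))
        local = greedy_gradient_match(grads[part], budget)
        subset.extend(part[local].tolist())
    return subset

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.normal(size=(1000, 64))      # stand-in per-example gradients
    idx = partitioned_gradient_match(G, num_partitions=4, fraction=0.3)
    print(f"selected {len(idx)} of {len(G)} examples")
```

Because each partition is matched independently, the per-partition memory footprint of the gradients stays bounded and the work can be spread across machines, which is what makes this style of selection plausible for large RNN-T gradients.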
