Paper Title

Towards Improved Room Impulse Response Estimation for Speech Recognition

Authors

Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha, Paul Calamia

Abstract

We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech, constructs an RIR from the encoded features, and is optimized with a novel energy decay relief loss to capture the energy-based properties of the input reverberant speech. We show that our model outperforms state-of-the-art baselines on acoustic benchmarks (by 17% on the energy decay relief and by 22% on an early-reflection energy metric) as well as on an ASR evaluation task (by 6.9% in word error rate).
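To make the energy decay relief (EDR) loss more concrete, here is a minimal NumPy sketch. It assumes the common definition of the EDR as the backward-integrated short-time Fourier transform (STFT) energy of an RIR h, EDR(t, k) = sum over tau >= t of |H(tau, k)|^2, expressed in dB. It also assumes a mean-squared-error loss between the estimated and reference EDRs. The STFT size, hop length, dB scaling, and the helper names `stft_power`, `energy_decay_relief` and `edr_loss` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def stft_power(h, n_fft=256, hop=64):
    """Power spectrogram |H(t, k)|^2 of a 1-D signal via a Hann-windowed STFT."""
    window = np.hanning(n_fft)
    # Number of frames needed to cover the whole signal (at least one).
    n_frames = 1 + max(0, int(np.ceil((len(h) - n_fft) / hop)))
    # Zero-pad so the last frame is complete.
    padded = np.zeros((n_frames - 1) * hop + n_fft)
    padded[: len(h)] = h
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)
    return np.abs(spec) ** 2


def energy_decay_relief(h, n_fft=256, hop=64, eps=1e-10):
    """EDR in dB: energy remaining after each frame t, per frequency bin k.

    Assumed definition (not taken from the paper's code):
    EDR(t, k) = sum_{tau >= t} |H(tau, k)|^2
    """
    power = stft_power(h, n_fft, hop)
    # Backward cumulative sum over time (Schroeder-style integration).
    edr = np.cumsum(power[::-1], axis=0)[::-1]
    return 10.0 * np.log10(edr + eps)


def edr_loss(h_est, h_ref, **kwargs):
    """Hypothetical EDR loss: mean squared difference of the two EDRs (dB)."""
    if len(h_est) != len(h_ref):
        raise ValueError("estimated and reference RIRs must have equal length")
    diff = energy_decay_relief(h_est, **kwargs) - energy_decay_relief(h_ref, **kwargs)
    return float(np.mean(diff ** 2))


# Usage: synthetic RIRs modelled as exponentially decaying white noise.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs // 2) / fs
h_ref = rng.standard_normal(t.size) * np.exp(-t / 0.10)
h_est = rng.standard_normal(t.size) * np.exp(-t / 0.08)  # decays faster

print(edr_loss(h_ref, h_ref))  # identical RIRs give zero loss
print(edr_loss(h_est, h_ref))  # mismatched decay gives a positive loss
```

Working in dB means a mismatch in the late, low-energy tail of the RIR still contributes to the loss. The raw-energy EDR would instead be dominated by the direct sound and early reflections. This is a reasonable motivation for a log-scaled EDR, but the sketch does not claim it matches the paper's exact formulation.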
