GEV Beamforming Supported by DOA-based Masks Generated on Pairs of Microphones

Paper Title

GEV Beamforming Supported by DOA-based Masks Generated on Pairs of Microphones

Authors

Grondin, Francois; Lauzon, Jean-Samuel; Vincent, Jonathan; Michaud, Francois

Abstract

Distant speech processing is a challenging task, especially when dealing with the cocktail party effect. Sound source separation is thus often required as a preprocessing step prior to speech recognition to improve the signal-to-distortion ratio (SDR). Recently, a combination of beamforming and speech separation networks has been proposed to improve the target source quality in the direction of arrival of interest. However, with this type of approach, the neural network needs to be trained in advance for a specific microphone array geometry, which limits versatility when adding or removing microphones, or changing the shape of the array. The solution presented in this paper is to train a neural network on pairs of microphones with different spacings and acoustic environmental conditions, and then use this network to estimate a time-frequency mask from all the pairs of microphones forming an array of arbitrary shape. Using this mask, the target and noise covariance matrices can be estimated, and then used to perform generalized eigenvalue (GEV) beamforming. Results show that the proposed approach improves the SDR from 4.78 dB to 7.69 dB on average, for various microphone array geometries that correspond to commercially available hardware.
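
The last two steps of the abstract (mask-weighted covariance estimation followed by GEV beamforming) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the time-frequency mask is assumed to be already available (in the paper it comes from the pairwise neural network), and the function name `gev_beamform`, the array layout, the mask normalization, and the diagonal loading term `eps` are all assumptions made for illustration. For each frequency bin, the sketch takes the principal generalized eigenvector of the target and noise covariance matrices as the beamformer weights.

```python
import numpy as np


def gev_beamform(X, mask, eps=1e-6):
    """Hypothetical helper: mask-based GEV beamforming.

    X    : complex STFT of the array signals, shape (M, F, T)
    mask : target time-frequency mask in [0, 1], shape (F, T)
    Returns the beamformed STFT, shape (F, T).
    """
    M, F, T = X.shape
    Y = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Xf = X[:, f, :]          # (M, T) snapshots for this frequency bin
        m = mask[f]              # (T,) target presence weights

        # Mask-weighted spatial covariance matrices (target and noise).
        phi_ss = (m * Xf) @ Xf.conj().T / max(m.sum(), eps)
        phi_nn = ((1.0 - m) * Xf) @ Xf.conj().T / max((1.0 - m).sum(), eps)
        phi_nn = phi_nn + eps * np.eye(M)   # diagonal loading (assumption)

        # Solve phi_ss w = lambda phi_nn w by whitening with the Cholesky
        # factor of phi_nn, which gives an ordinary Hermitian eigenproblem.
        L = np.linalg.cholesky(phi_nn)
        L_inv = np.linalg.inv(L)
        C = L_inv @ phi_ss @ L_inv.conj().T
        C = 0.5 * (C + C.conj().T)          # enforce Hermitian symmetry
        _, vecs = np.linalg.eigh(C)         # eigenvalues in ascending order
        w = L_inv.conj().T @ vecs[:, -1]    # principal generalized eigenvector

        Y[f] = w.conj() @ Xf                # y(t, f) = w^H x(t, f)
    return Y
```

The GEV weights are defined only up to a complex scale per frequency bin, so a practical system would normally add a normalization or post-filter step, which this sketch omits.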
