Paper title
Single-channel speech enhancement by using psychoacoustical model inspired fusion framework
Paper authors
Paper abstract
When the parameters of the Bayesian short-time spectral amplitude (STSA) estimator for speech enhancement are selected based on the characteristics of the human auditory system, the gain function of the estimator becomes more flexible. Although this type of estimator in the acoustic domain is quite effective in reducing background noise at high frequencies, it produces more speech distortion, which makes the high-frequency content of speech, such as fricatives, less perceptible in heavy noise conditions and thus reduces intelligibility. On the other hand, the speech enhancement scheme that exploits the psychoacoustic evidence of frequency selectivity in the modulation domain is found to increase the intelligibility of noisy speech substantially, but it suffers from the temporal slurring problem due to its essential design constraint. To achieve joint improvements in both perceived speech quality and intelligibility, we propose and investigate a fusion framework that combines the merits of the acoustic and modulation domain approaches while avoiding their respective weaknesses. Objective measure evaluation shows that the proposed speech enhancement fusion framework provides consistent improvements in perceived speech quality and intelligibility across different SNR levels in various noise conditions when compared to the other baseline techniques.