Paper Title
SAQAM: Spatial Audio Quality Assessment Metric
Paper Authors
Paper Abstract
Audio quality assessment is critical for judging the perceptual realism of sounds. However, the time and expense of obtaining "gold standard" human judgments limit the availability of such data. For AR and VR, good perceived sound quality and localizability of sources are among the key elements for fully immersing the user. Our work introduces SAQAM, which uses a multi-task learning framework to assess listening quality (LQ) and spatialization quality (SQ) between any given pair of binaural signals without using any subjective data. We model LQ by training on a simulated dataset of triplet human judgments, and SQ by utilizing activation-level distances from networks trained for direction of arrival (DOA) estimation. We show that SAQAM correlates well with human responses across four diverse datasets. Since it is a deep network, the metric is differentiable, making it suitable as a loss function for other tasks. For example, simply replacing an existing loss with our metric yields an improvement in a speech-enhancement network.
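The abstract does not say how the activation-level distance is computed, so the following is only a minimal sketch of the general idea: pass a reference and a test signal through the same layers, then average the per-layer activation differences. Everything here is an assumption made for illustration. `toy_layers` is a hypothetical stand-in for a trained DOA network, the signals are single-channel lists rather than binaural pairs, and the mean absolute difference is one plausible choice of distance, not necessarily the one SAQAM uses.

```python
def toy_layers(signal):
    """Hypothetical stand-in for the per-layer activations of a DOA network."""
    # "Layer 1": rectified first differences of the input samples.
    layer1 = [max(0.0, b - a) for a, b in zip(signal, signal[1:])]
    # "Layer 2": averages of adjacent pairs of layer-1 activations.
    layer2 = [(a + b) / 2 for a, b in zip(layer1[::2], layer1[1::2])]
    return [layer1, layer2]


def activation_distance(reference, test, layers=toy_layers):
    """Mean absolute activation difference, averaged over all layers."""
    acts_ref = layers(reference)
    acts_test = layers(test)
    total = 0.0
    for a, b in zip(acts_ref, acts_test):
        # Normalize by layer width so wide layers do not dominate the sum.
        total += sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return total / len(acts_ref)


reference = [0.0, 1.0, 0.0, 2.0, 0.0]
degraded = [0.0, 2.0, 0.0, 2.0, 0.0]
print(activation_distance(reference, reference))  # identical signals give 0.0
print(activation_distance(reference, degraded))
```

In an automatic-differentiation framework the same computation would be differentiable with respect to the test signal, which is the property the abstract relies on when it proposes using the metric as a training loss.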