Model-matching Principle Applied to the Design of an Array-based All-neural Binaural Rendering System for Audio Telepresence

Paper title

Model-matching Principle Applied to the Design of an Array-based All-neural Binaural Rendering System for Audio Telepresence

Authors

Hsu, Yicheng, Ma, Chenghung, Bai, Mingsian R.

Abstract

Telepresence aims to create an immersive but virtual experience of the audio and visual scene at the far end for users at the near end. In this contribution, we propose an array-based binaural rendering system that converts the array microphone signals into head-related transfer function (HRTF) filtered output signals for headphone rendering. The proposed approach is formulated in light of a model-matching principle (MMP) and is capable of delivering a more immersive experience than the conventional localization-beamforming-HRTF filtering (LBH) approach. The MMP-based rendering system can be realized via multichannel inverse filtering (MIF) or multichannel deep filtering (MDF). In this study, we adopted the MDF approach and used LBH and MIF as baselines. The all-neural system jointly captures the spatial information (spatial rendering), preserves ambient sound (enhancement), and reduces noise (enhancement) before generating binaural outputs. Objective and subjective tests are employed to compare the proposed telepresence system with the two baselines.
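The model-matching idea behind the MIF baseline can be illustrated for a single frequency bin: choose a 2 x M filter W so that the array response A, passed through W, reproduces the HRTFs H for a set of sampled directions. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes free-field plane-wave steering vectors for a planar array, Tikhonov-regularized least squares with a hypothetical regularization weight `beta`, and random placeholder HRTFs instead of measured ones. The function names `steering_matrix`, `mif_filters` and `render_bin`, and the array geometry, are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_matrix(mic_pos, doas_deg, freq):
    """Free-field plane-wave steering matrix A (M mics x K directions) at one frequency.

    mic_pos: (M, 2) planar microphone coordinates in metres.
    doas_deg: (K,) azimuths of the sampled source directions in degrees.
    Assumed phase convention: a source in direction u reaches a mic at
    position r earlier by (r . u) / c seconds.
    """
    theta = np.deg2rad(np.asarray(doas_deg, dtype=float))
    unit = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # (K, 2) unit vectors
    advance = mic_pos @ unit.T / SPEED_OF_SOUND               # (M, K) seconds
    return np.exp(1j * 2 * np.pi * freq * advance)


def mif_filters(A, H, beta=1e-6):
    """Regularized least-squares model matching at one frequency bin.

    Minimizes ||W A - H||_F^2 + beta * ||W||_F^2 over W (2 x M), where
    A is the (M x K) steering matrix and H the (2 x K) HRTF matrix.
    Closed form: W = H A^H (A A^H + beta I)^{-1}.
    """
    M = A.shape[0]
    gram = A @ A.conj().T + beta * np.eye(M)  # Hermitian (M x M)
    rhs = H @ A.conj().T                       # (2 x M)
    # W gram = rhs  <=>  gram W^H = rhs^H, because gram is Hermitian
    return np.linalg.solve(gram, rhs.conj().T).conj().T


def render_bin(W, X):
    """Apply the 2 x M filter to one STFT bin of the M mic signals -> (left, right)."""
    return W @ X


# Hypothetical setup: 6-mic circular array, radius 5 cm, 4 target directions, 1 kHz bin
M, radius, freq = 6, 0.05, 1000.0
phi = 2 * np.pi * np.arange(M) / M
mic_pos = radius * np.stack([np.cos(phi), np.sin(phi)], axis=1)
doas = [0.0, 90.0, 180.0, 270.0]
A = steering_matrix(mic_pos, doas, freq)

# Placeholder HRTFs (random values, NOT measured data) for the 4 directions
rng = np.random.default_rng(0)
H = rng.standard_normal((2, len(doas))) + 1j * rng.standard_normal((2, len(doas)))

W = mif_filters(A, H)
match_error = np.linalg.norm(W @ A - H) / np.linalg.norm(H)

# A unit-amplitude source from 90 degrees should come out HRTF-filtered
X = A[:, 1]
Y = render_bin(W, X)
print(f"relative model-matching error: {match_error:.2e}")
```

Because W here is computed for one bin, a full renderer would repeat the solve for every STFT frequency. With more microphones than sampled directions the match can be nearly exact; with more directions than microphones the least-squares fit leaves a residual. The MDF approach adopted in the paper instead estimates the filtering with a neural network, which this sketch does not attempt to reproduce.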
