Paper Title

Masked Proxy Loss For Text-Independent Speaker Verification

Authors

Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh

Abstract

Open-set speaker recognition can be regarded as a metric learning problem whose goal is to maximize inter-class variance and minimize intra-class variance. Supervised metric learning can be categorized into entity-based learning and proxy-based learning. Most existing metric learning objectives, such as Contrastive, Triplet, Prototypical, and GE2E, belong to the former category, and their performance is either highly dependent on the sample mining strategy or restricted by insufficient label information in the mini-batch. Proxy-based losses mitigate both shortcomings; however, fine-grained connections among entities are either not leveraged or leveraged only indirectly. This paper proposes a Masked Proxy (MP) loss that directly incorporates both proxy-based relationships and pair-based relationships. We further propose a Multinomial Masked Proxy (MMP) loss to leverage the hardness of speaker pairs. These methods are evaluated on the VoxCeleb test set and reach a state-of-the-art Equal Error Rate (EER).
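
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how an Equal Error Rate can be computed from verification trial scores. It is not taken from the paper: the function name `equal_error_rate`, the threshold sweep over observed scores, and the convention of averaging the two error rates at the closest crossing point are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Return (eer, threshold) for a set of verification trials.

    scores: similarity score per trial (higher means "same speaker").
    labels: 1 for same-speaker (target) trials, 0 for different-speaker trials.
    Illustrative helper only; not the evaluation code used in the paper.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    target = scores[labels == 1]
    nontarget = scores[labels == 0]

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(scores)

    # A trial is accepted when its score is >= the threshold.
    false_accept = np.array([(nontarget >= t).mean() for t in thresholds])
    false_reject = np.array([(target < t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    i = int(np.argmin(np.abs(false_accept - false_reject)))
    eer = (false_accept[i] + false_reject[i]) / 2.0
    return eer, thresholds[i]


if __name__ == "__main__":
    # Toy trials: one target trial (0.4) scores below one non-target trial (0.6).
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    eer, threshold = equal_error_rate(toy_scores, toy_labels)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

In this sketch a lower EER indicates better separation between same-speaker and different-speaker trials; evaluation toolkits often interpolate between thresholds rather than picking the nearest crossing, so reported values can differ slightly.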