Paper Title
An Open source Implementation of ITU-T Recommendation P.808 with Validation
Paper Authors
Paper Abstract
ITU-T Recommendation P.808 provides a crowdsourcing approach for conducting subjective assessments of speech quality using the Absolute Category Rating (ACR) method. We provide an open-source implementation of ITU-T Rec. P.808 that runs on the Amazon Mechanical Turk platform. We extend the implementation to include the Degradation Category Rating (DCR) and Comparison Category Rating (CCR) test methods. Compared with a two-stage qualification-and-rating solution, we also significantly speed up the test process by integrating the participant qualification step into the main rating task. We provide program scripts for creating and executing subjective tests, and for cleansing the data and analyzing the answers, to avoid operational errors. To validate the implementation, we compare the Mean Opinion Scores (MOS) collected through our implementation with MOS values from a standard laboratory experiment conducted according to ITU-T Rec. P.800. We also evaluate the reproducibility of subjective speech quality assessments conducted through crowdsourcing with our implementation. Finally, we quantify the impact of the parts of the system designed to improve reliability: environmental tests, gold and trapping questions, rating patterns, and a headset usage test.
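
The abstract refers to Mean Opinion Scores computed from ACR ratings collected per condition. As a minimal illustrative sketch (not the toolkit's actual analysis script; the function name and the normal-approximation confidence interval are assumptions made here for illustration), a condition's MOS is simply the arithmetic mean of the 1-5 category ratings it received:

    import numpy as np

    def mean_opinion_score(ratings):
        """Return the MOS and a 95% confidence interval for a set of
        ACR ratings on the 1-5 scale (illustrative sketch only)."""
        r = np.asarray(ratings, dtype=float)
        mos = r.mean()
        # Normal-approximation interval, commonly used when reporting MOS.
        ci95 = 1.96 * r.std(ddof=1) / np.sqrt(len(r))
        return mos, ci95

    # Example: ratings one condition received across clips and raters.
    print(mean_opinion_score([4, 5, 3, 4, 4, 5, 3, 4]))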