Paper Title
DCVQE: A Hierarchical Transformer for Video Quality Assessment
Paper Authors
Paper Abstract
The explosion of user-generated videos stimulates a great demand for no-reference video quality assessment (NR-VQA). Inspired by our observations of human annotation behavior, we put forward a Divide and Conquer Video Quality Estimator (DCVQE) for NR-VQA. Starting from extracted frame-level quality embeddings (QE), our proposal splits the whole sequence into a number of clips and applies Transformers to learn the clip-level QE while simultaneously updating the frame-level QE; another Transformer is introduced to combine the clip-level QE into a video-level QE. We call this hierarchical combination of Transformers a Divide and Conquer Transformer (DCTr) layer. Accurate video quality features can be extracted by stacking this DCTr layer several times. Taking the order relationship among the annotated data into account, we also propose a novel correlation loss term for model training. Experiments on various datasets confirm the effectiveness and robustness of our DCVQE model.
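The hierarchical structure described in the abstract can be sketched as follows. This is a minimal, hypothetical PyTorch illustration of one DCTr layer, not the authors' implementation: the embedding size, clip length, learnable clip/video tokens, and Transformer configuration are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class DCTrLayer(nn.Module):
    """Sketch of one Divide-and-Conquer Transformer (DCTr) layer.

    Frame-level quality embeddings (QE) are split into fixed-length clips.
    A clip Transformer learns a clip-level QE (via a learnable clip token)
    while updating the frame-level QE; a second Transformer combines the
    clip-level QE into a video-level QE. All sizes are assumptions.
    """

    def __init__(self, dim=64, clip_len=8, n_heads=4):
        super().__init__()
        self.clip_len = clip_len
        self.clip_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.video_token = nn.Parameter(torch.zeros(1, 1, dim))
        make_layer = lambda: nn.TransformerEncoderLayer(
            dim, n_heads, dim_feedforward=2 * dim, batch_first=True)
        self.clip_tr = nn.TransformerEncoder(make_layer(), num_layers=1)
        self.video_tr = nn.TransformerEncoder(make_layer(), num_layers=1)

    def forward(self, frame_qe):
        # frame_qe: (batch, n_frames, dim); for simplicity we assume
        # n_frames is divisible by clip_len
        b, t, d = frame_qe.shape
        n_clips = t // self.clip_len
        clips = frame_qe.reshape(b * n_clips, self.clip_len, d)
        # prepend a learnable clip token to each clip, then run the
        # clip-level Transformer over [token, frames]
        tok = self.clip_token.expand(clips.size(0), -1, -1)
        out = self.clip_tr(torch.cat([tok, clips], dim=1))
        clip_qe = out[:, 0].reshape(b, n_clips, d)       # clip-level QE
        new_frame_qe = out[:, 1:].reshape(b, t, d)       # updated frame-level QE
        # combine clip-level QE into a video-level QE with a second Transformer
        vtok = self.video_token.expand(b, -1, -1)
        video_qe = self.video_tr(torch.cat([vtok, clip_qe], dim=1))[:, 0]
        return new_frame_qe, clip_qe, video_qe
```

Because the layer returns updated frame-level QE of the same shape as its input, it can be stacked (repeated) several times as the abstract describes, with the video-level QE of the final layer feeding a quality-regression head.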