Paper Title
NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression Recognition
Paper Authors
Abstract
Dynamic facial expression recognition (DFER) in the wild is an extremely challenging task due to the large number of noisy frames in video sequences. Previous works focus on extracting more discriminative features but ignore the need to distinguish key frames from noisy ones. To tackle this problem, we propose a noise-robust dynamic facial expression recognition network (NR-DFERNet), which can effectively reduce the interference of noisy frames on the DFER task. Specifically, at the spatial stage, we devise a dynamic-static fusion module (DSF) that introduces dynamic features into static features to learn more discriminative spatial features. To suppress the impact of target-irrelevant frames, we introduce a novel dynamic class token (DCT) for the transformer at the temporal stage. Moreover, we design a snippet-based filter (SF) at the decision stage to reduce the effect of excessive neutral frames on non-neutral sequence classification. Extensive experimental results demonstrate that our NR-DFERNet outperforms state-of-the-art methods on both the DFEW and AFEW benchmarks.
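The dynamic-static fusion idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the feature shapes, the use of temporal differences as "dynamic" features, and the additive fusion are all assumptions made for illustration only.

```python
import numpy as np

def dynamic_static_fusion(frames: np.ndarray) -> np.ndarray:
    """Toy stand-in for a DSF-style module (assumption, not the paper's DSF).

    frames: (T, D) array of per-frame static features for a T-frame clip.
    Dynamic features are approximated by consecutive-frame differences,
    then fused into the static features by simple addition.
    """
    static = frames
    # Temporal differences as a crude proxy for dynamic (motion) features;
    # prepending the first frame makes the first difference zero.
    dynamic = np.diff(frames, axis=0, prepend=frames[:1])
    return static + dynamic

# Usage on a hypothetical 4-frame clip with 3-dim features:
clip = np.arange(12, dtype=float).reshape(4, 3)
fused = dynamic_static_fusion(clip)
```

In this sketch the first frame passes through unchanged (its dynamic component is zero), while later frames are amplified in the direction of their temporal change, loosely mirroring how motion cues can sharpen spatial features.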