Paper Title
Emotions Don't Lie: An Audio-Visual Deepfake Detection Method Using Affective Cues
Paper Authors
Paper Abstract
We present a learning-based method for detecting real and fake deepfake multimedia content. To maximize the information available for learning, we extract and analyze the similarity between the audio and visual modalities from within the same video. Additionally, we extract and compare affective cues corresponding to perceived emotion from the two modalities within a video to infer whether the input video is "real" or "fake". We propose a deep learning network inspired by the Siamese network architecture and the triplet loss. To validate our model, we report the AUC metric on two large-scale deepfake detection datasets, DeepFake-TIMIT and DFDC. We compare our approach with several SOTA deepfake detection methods and report a per-video AUC of 84.4% on DFDC and 96.6% on DF-TIMIT. To the best of our knowledge, ours is the first approach that simultaneously exploits the audio and video modalities, as well as the perceived emotions extracted from both, for deepfake detection.
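The abstract describes a Siamese-style setup trained with a triplet loss to compare audio and visual embeddings of the same video. Below is a minimal PyTorch sketch of that idea; the encoder architecture, feature dimensions, and the real/fake triplet construction are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a Siamese-style, triplet-loss training step for
# audio-visual deepfake detection. Module names and sizes are assumptions.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Maps per-modality features (audio or visual) to a shared embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One encoder per modality; the Siamese idea is that real and fake videos
# pass through the *same* encoders, so their embeddings stay comparable.
audio_enc = ModalityEncoder(in_dim=40)   # e.g. MFCC-style audio features (assumed)
video_enc = ModalityEncoder(in_dim=512)  # e.g. face-crop visual features (assumed)

# Triplet loss: pull the audio and visual embeddings of a real video together
# (anchor, positive) while pushing the fake video's visual embedding away (negative).
triplet = nn.TripletMarginLoss(margin=1.0)

def training_step(real_audio: torch.Tensor,
                  real_video: torch.Tensor,
                  fake_video: torch.Tensor) -> torch.Tensor:
    anchor = audio_enc(real_audio)     # audio cue from the real video
    positive = video_enc(real_video)   # matching visual cue from the same real video
    negative = video_enc(fake_video)   # visual cue from the manipulated video
    return triplet(anchor, positive, negative)
```

At inference time, a large distance between a test video's audio and visual embeddings would indicate a cross-modal (and, per the paper, emotional) mismatch, suggesting the video is "fake"; the distance threshold and the affective-cue encoders are further design choices not shown in this sketch.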