Paper Title

Class-attention Video Transformer for Engagement Intensity Prediction

Authors

Xusheng Ai, Victor S. Sheng, Chunhua Li, Zhiming Cui

Abstract

To deal with variant-length long videos, prior works extract multi-modal features and fuse them to predict students' engagement intensity. In this paper, we present a new end-to-end method, Class Attention in Video Transformer (CavT), which uses a single vector to process the class embedding and performs end-to-end learning uniformly on variant-length long videos and fixed-length short videos. Furthermore, to address the lack of sufficient samples, we propose a binary-order representatives sampling method (BorS) that adds multiple video sequences of each video to augment the training set. BorS+CavT achieves the state-of-the-art MSE on both the EmotiW-EP dataset (0.0495) and the DAiSEE dataset (0.0377). The code and models have been made publicly available at https://github.com/mountainai/cavt.
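The abstract reports results as mean squared error (MSE) between predicted and annotated engagement intensity. As a minimal sketch of how that metric is computed, the snippet below uses the standard MSE definition. The function name `mean_squared_error` and the sample values are hypothetical and are not taken from the paper or its repository; scaling intensities to the range [0, 1] is likewise an assumption made for illustration.

```python
from typing import Sequence


def mean_squared_error(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Average of squared differences between predictions and labels."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one sample is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Hypothetical engagement intensities in [0, 1] (assumed scale, made-up values).
    predicted = [0.30, 0.70, 0.90]
    target = [0.33, 0.66, 1.00]
    print(f"MSE = {mean_squared_error(predicted, target):.4f}")
```

A lower value means the predicted intensities lie closer to the annotations, which is the sense in which the reported 0.0495 and 0.0377 are compared against prior work.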
