Paper Title
HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm
Paper Authors
Paper Abstract
To fluently collaborate with people, robots need the ability to recognize human activities accurately. Although modern robots are equipped with various sensors, robust human activity recognition (HAR) remains a challenging task for robots due to difficulties related to multimodal data fusion. To address these challenges, in this work, we introduce a deep neural network-based multimodal HAR algorithm, HAMLET. HAMLET incorporates a hierarchical architecture, where the lower layer encodes spatio-temporal features from unimodal data by adopting a multi-head self-attention mechanism. We develop a novel multimodal attention mechanism for disentangling and fusing the salient unimodal features to compute the multimodal features in the upper layer. Finally, the multimodal features are used in a fully connected neural network to recognize human activities. We evaluated our algorithm by comparing its performance to several state-of-the-art activity recognition algorithms on three human activity datasets. The results suggest that HAMLET outperformed all other evaluated baselines across all datasets and metrics tested, with the highest top-1 accuracy of 95.12% and 97.45% on the UTD-MHAD [1] and UT-Kinect [2] datasets, respectively, and an F1-score of 81.52% on the UCSD-MIT [3] dataset. We further visualize the unimodal and multimodal attention maps, which provide us with a tool to interpret the impact of attention mechanisms on HAR.
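The hierarchy described in the abstract can be illustrated with a minimal PyTorch sketch: a per-modality self-attention encoder in the lower layer, an attention-based fusion of the unimodal features in the upper layer, and a fully connected recognition head. All module names, dimensions, the mean-pooling step, and the softmax-over-modalities fusion below are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a hierarchical multimodal attention HAR model (assumed design,
# not the official HAMLET code). Requires PyTorch >= 1.9 for batch_first attention.
import torch
import torch.nn as nn


class UnimodalEncoder(nn.Module):
    """Lower layer: encode one modality's time series with multi-head self-attention."""

    def __init__(self, input_dim, feat_dim=128, num_heads=4):
        super().__init__()
        self.project = nn.Linear(input_dim, feat_dim)
        self.self_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, x):                       # x: (batch, time, input_dim)
        h = self.project(x)                     # (batch, time, feat_dim)
        attn_out, _ = self.self_attn(h, h, h)   # temporal self-attention
        h = self.norm(h + attn_out)
        return h.mean(dim=1)                    # pooled unimodal feature (batch, feat_dim)


class MultimodalAttentionFusion(nn.Module):
    """Upper layer: weight salient unimodal features with a learned attention score
    per modality, then concatenate the weighted features (a simplified fusion)."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats):                   # feats: (batch, num_modalities, feat_dim)
        weights = torch.softmax(self.score(feats), dim=1)   # (batch, M, 1)
        return (weights * feats).flatten(1)     # (batch, M * feat_dim)


class HierarchicalHARSketch(nn.Module):
    def __init__(self, modality_dims, num_classes, feat_dim=128):
        super().__init__()
        self.encoders = nn.ModuleList(
            [UnimodalEncoder(d, feat_dim) for d in modality_dims]
        )
        self.fusion = MultimodalAttentionFusion(feat_dim)
        self.classifier = nn.Sequential(        # fully connected recognition head
            nn.Linear(feat_dim * len(modality_dims), 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, inputs):                  # list of (batch, time, dim_m) tensors
        feats = torch.stack(
            [enc(x) for enc, x in zip(self.encoders, inputs)], dim=1
        )                                       # (batch, M, feat_dim)
        return self.classifier(self.fusion(feats))


# Usage example with two hypothetical modalities (e.g., skeleton and inertial data)
# and 27 activity classes; the tensor shapes here are placeholders.
model = HierarchicalHARSketch(modality_dims=[75, 6], num_classes=27)
skeleton = torch.randn(8, 40, 75)               # (batch, time, joints * 3)
inertial = torch.randn(8, 40, 6)                # (batch, time, accel + gyro)
logits = model([skeleton, inertial])            # (8, 27)
```

The key design point mirrored here is the two-level attention: self-attention operates within each modality before any fusion, so the upper-layer attention only has to decide how much each already-encoded modality contributes to the final prediction.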