Paper Title

Independent Sign Language Recognition with 3D Body, Hands, and Face Reconstruction

Authors

Agelos Kratimenos, Georgios Pavlakos, Petros Maragos

Abstract

Independent Sign Language Recognition is a complex visual recognition problem that combines several challenging Computer Vision tasks, because it requires exploiting and fusing information from hand gestures, body features and facial expressions. While many state-of-the-art works have elaborated on these features independently, to the best of our knowledge no work has adequately combined all three information channels to recognize Sign Language efficiently. In this work, we employ SMPL-X, a contemporary parametric model that enables joint extraction of 3D body shape, face and hand information from a single image. We use this holistic 3D reconstruction for Sign Language Recognition (SLR), demonstrating that it leads to higher accuracy than recognition from raw RGB images and their optical flow fed into a state-of-the-art I3D-type network for 3D action recognition, and higher accuracy than recognition from 2D OpenPose skeletons fed into a Recurrent Neural Network. Finally, a set of experiments on the body, face and hand features shows that neglecting any of these significantly reduces the classification accuracy, demonstrating the importance of jointly modeling body shape, facial expression and hand pose for Sign Language Recognition.
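
The fusion of the three channels, and the ablation in which one channel is left out, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary layout and the per-channel sizes (63 body pose values, 90 values for two hands, 13 face values) are assumptions chosen for illustration, and random numbers stand in for the parameters a 3D reconstruction would produce for each video frame.

```python
import numpy as np

# Assumed per-frame sizes for each information channel. These numbers are
# illustrative placeholders, not values stated in the abstract.
CHANNEL_DIMS = {"body": 63, "hands": 90, "face": 13}
CHANNEL_ORDER = ("body", "hands", "face")


def fuse_frame_features(params, use=CHANNEL_ORDER):
    """Concatenate the selected channels of one frame into a single vector."""
    parts = []
    for name in CHANNEL_ORDER:
        if name not in use:
            continue  # ablation: this channel is dropped from the input
        vec = np.asarray(params[name], dtype=np.float32)
        if vec.shape != (CHANNEL_DIMS[name],):
            raise ValueError(f"{name}: expected {CHANNEL_DIMS[name]} values")
        parts.append(vec)
    return np.concatenate(parts)


def fuse_sequence(frames, use=CHANNEL_ORDER):
    """Stack fused per-frame vectors into a (num_frames, feature_dim) array."""
    return np.stack([fuse_frame_features(f, use) for f in frames])


# Hypothetical 4-frame clip with random stand-in parameters.
rng = np.random.default_rng(0)
frames = [
    {name: rng.standard_normal(dim) for name, dim in CHANNEL_DIMS.items()}
    for _ in range(4)
]

full = fuse_sequence(frames)                            # all three channels
no_face = fuse_sequence(frames, use=("body", "hands"))  # face channel ablated
print(full.shape, no_face.shape)
```

The resulting array has one row per frame, which is the shape a sequence classifier such as a recurrent network would typically consume. Dropping a channel only narrows the feature dimension, so the same classifier code can be retrained on each ablated input for comparison.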
