Paper Title

Transformer-based Hand Gesture Recognition via High-Density EMG Signals: From Instantaneous Recognition to Fusion of Motor Unit Spike Trains

Authors

Mansooreh Montazerin, Elahe Rahimian, Farnoosh Naderkhani, S. Farokh Atashzar, Svetlana Yanushkevich, Arash Mohammadi

Abstract

Designing efficient and labor-saving prosthetic hands requires powerful hand gesture recognition algorithms that achieve high accuracy with limited complexity and latency. In this context, the paper proposes a compact deep learning framework, referred to as CT-HGR, which employs a vision transformer network to perform hand gesture recognition using high-density sEMG (HD-sEMG) signals. The attention mechanism in the proposed model identifies similarities among different data segments, offers a greater capacity for parallel computation, and addresses memory limitations when dealing with inputs of large sequence lengths. CT-HGR can be trained from scratch without any need for transfer learning and can simultaneously extract both temporal and spatial features of HD-sEMG data. Additionally, the CT-HGR framework can perform instantaneous recognition using an sEMG image spatially composed from HD-sEMG signals. A variant of CT-HGR is also designed to incorporate microscopic neural drive information in the form of Motor Unit Spike Trains (MUSTs) extracted from HD-sEMG signals using Blind Source Separation (BSS). This variant is combined with its baseline version via a hybrid architecture to evaluate the potential of fusing macroscopic and microscopic neural drive information. The HD-sEMG dataset used involves 128 electrodes that collect the signals related to 65 isometric hand gestures of 20 subjects. The proposed CT-HGR framework is applied to window sizes of 31.25, 62.5, 125, and 250 ms of this dataset using 32, 64, and 128 electrode channels. The average accuracy over all participants using 32 electrodes and a window size of 31.25 ms is 86.23%, which gradually increases until it reaches 91.98% for 128 electrodes and a window size of 250 ms. CT-HGR achieves an accuracy of 89.13% for instantaneous recognition based on a single frame of an HD-sEMG image.
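To make the window sizes in the abstract more concrete, the sketch below converts each window length into a number of time samples and lists the resulting input shape for each electrode count. The abstract does not state the sampling rate; the value of 2048 Hz used here is an assumption chosen for illustration, and the helper names (`samples_per_window`, `window_shapes`, `split_into_windows`) are hypothetical and not taken from the paper. This is not the authors' preprocessing pipeline, only a minimal illustration of how a recording might be segmented into fixed-length windows.

```python
# Assumption (not stated in the abstract): HD-sEMG is sampled at 2048 Hz.
ASSUMED_SAMPLING_RATE_HZ = 2048

WINDOW_SIZES_MS = [31.25, 62.5, 125, 250]  # window sizes listed in the abstract
CHANNEL_COUNTS = [32, 64, 128]             # electrode counts listed in the abstract


def samples_per_window(window_ms, sampling_rate_hz=ASSUMED_SAMPLING_RATE_HZ):
    """Number of time samples covered by one window of the given length."""
    return round(window_ms * sampling_rate_hz / 1000)


def window_shapes(window_sizes_ms=WINDOW_SIZES_MS, channel_counts=CHANNEL_COUNTS):
    """Map each (window size, channel count) pair to a (samples, channels) shape."""
    return {
        (w, c): (samples_per_window(w), c)
        for w in window_sizes_ms
        for c in channel_counts
    }


def split_into_windows(signal, window_len, step=None):
    """Split a list of per-sample rows into windows (non-overlapping by default).

    Trailing samples that do not fill a whole window are dropped.
    """
    step = step or window_len
    return [
        signal[start:start + window_len]
        for start in range(0, len(signal) - window_len + 1, step)
    ]


if __name__ == "__main__":
    for (w, c), shape in window_shapes().items():
        print(f"{w} ms, {c} channels -> input shape {shape}")
```

Under the assumed sampling rate, the shortest window (31.25 ms) spans 64 samples and the longest (250 ms) spans 512, so the smallest and largest configurations in the abstract would correspond to inputs of 64 x 32 and 512 x 128 values respectively. A different sampling rate would scale the sample counts proportionally.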
