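Transformer-based Hand Gesture Recognition via High-Density EMG Signals: From Instantaneous Recognition to Fusion of Motor Unit Spike Trains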

Paper Title

Transformer-based Hand Gesture Recognition via High-Density EMG Signals: From Instantaneous Recognition to Fusion of Motor Unit Spike Trains

Authors

Mansooreh Montazerin, Elahe Rahimian, Farnoosh Naderkhani, S. Farokh Atashzar, Svetlana Yanushkevich, Arash Mohammadi

Abstract

Designing efficient and labor-saving prosthetic hands requires powerful hand gesture recognition algorithms that can achieve high accuracy with limited complexity and latency. In this context, the paper proposes a compact deep learning framework, referred to as CT-HGR, which employs a vision transformer network to perform hand gesture recognition using high-density sEMG (HD-sEMG) signals. The attention mechanism in the proposed model identifies similarities among different data segments with a greater capacity for parallel computation, and it addresses memory limitation problems when dealing with inputs of large sequence lengths. CT-HGR can be trained from scratch without any need for transfer learning and can simultaneously extract both temporal and spatial features of HD-sEMG data. Additionally, the CT-HGR framework can perform instantaneous recognition using sEMG images spatially composed from HD-sEMG signals. A variant of CT-HGR is also designed to incorporate microscopic neural drive information in the form of Motor Unit Spike Trains (MUSTs) extracted from HD-sEMG signals using Blind Source Separation (BSS). This variant is combined with its baseline version via a hybrid architecture to evaluate the potential of fusing macroscopic and microscopic neural drive information. The HD-sEMG dataset used involves 128 electrodes that collect signals related to 65 isometric hand gestures performed by 20 subjects. The proposed CT-HGR framework is applied to window sizes of 31.25, 62.5, 125, and 250 ms of this dataset, using 32, 64, and 128 electrode channels. The average accuracy over all participants using 32 electrodes and a window size of 31.25 ms is 86.23%, which gradually increases until it reaches 91.98% for 128 electrodes and a window size of 250 ms. CT-HGR achieves an accuracy of 89.13% for instantaneous recognition based on a single frame of an HD-sEMG image.
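
To make the window sizes in the abstract concrete, the following is a minimal sketch of how a multi-channel recording might be cut into fixed-length windows before being passed to a classifier. It is not the authors' preprocessing code. The sampling rate of 2048 Hz is an assumption (the abstract does not state it), and the function names `window_length_samples` and `split_into_windows` are hypothetical helpers introduced only for illustration.

```python
# Assumed sampling rate in Hz; the abstract does not specify this value.
ASSUMED_SAMPLING_RATE_HZ = 2048


def window_length_samples(window_ms, fs_hz=ASSUMED_SAMPLING_RATE_HZ):
    """Convert a window duration in milliseconds to a whole number of samples."""
    n = window_ms * fs_hz / 1000.0
    if n != int(n):
        raise ValueError("window does not map to a whole number of samples")
    return int(n)


def split_into_windows(frames, window_len, step=None):
    """Cut a sequence of frames into windows of `window_len` frames.

    `frames` is a sequence in which each item holds one time sample
    (for HD-sEMG, one value per electrode). `step` is the hop between
    window starts; it defaults to `window_len` (non-overlapping windows).
    Trailing frames that do not fill a whole window are dropped.
    """
    if step is None:
        step = window_len
    last_start = len(frames) - window_len
    return [frames[s:s + window_len] for s in range(0, last_start + 1, step)]


# Window sizes quoted in the abstract, expressed in samples
# under the assumed sampling rate.
WINDOW_SIZES_MS = (31.25, 62.5, 125, 250)
WINDOW_SIZES_SAMPLES = [window_length_samples(ms) for ms in WINDOW_SIZES_MS]
```

Under the assumed rate, the four window sizes correspond to 64, 128, 256, and 512 samples per channel, so the shortest window with 32 electrodes gives a 64 x 32 input and the longest with 128 electrodes gives a 512 x 128 input.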
