Title
A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions
Authors
Abstract
The Transformer architecture is shown to provide a powerful framework, as an end-to-end model, for building expression trees from online handwritten gestures corresponding to glyph strokes. In particular, the attention mechanism is successfully used to encode, learn and enforce the underlying syntax of expressions, creating latent representations that decode to the exact mathematical expression tree and providing robustness to ablated inputs and unseen glyphs. For the first time, the encoder is fed with spatio-temporal data tokens that potentially form an infinitely large vocabulary, which finds applications beyond online gesture recognition. A new supervised dataset of online handwriting gestures is provided for training models on generic handwriting recognition tasks, and a new metric is proposed to evaluate the syntactic correctness of the output expression trees. A small Transformer model suitable for edge inference was successfully trained to an average normalised Levenshtein accuracy of 94%, yielding valid postfix (RPN) tree representations for 94% of predictions.
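The two evaluation notions mentioned above can be illustrated concretely. The sketch below is not the paper's code: it assumes the normalised Levenshtein accuracy is defined as one minus the token-level edit distance divided by the longer sequence length, and it checks postfix (RPN) validity with a stack-depth argument; the operator set and its arities are hypothetical.

```python
# Illustrative sketch, assuming accuracy = 1 - edit_distance / max(len) and a
# stack-based arity check for postfix (RPN) validity. Operator names/arities
# below are hypothetical, not taken from the paper.

def levenshtein(a, b):
    # Classic dynamic-programming edit distance over token sequences.
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n]

def normalised_levenshtein_accuracy(pred, ref):
    # 1.0 means an exact match; normalising by the longer sequence is an
    # assumption made for this sketch.
    if not pred and not ref:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / max(len(pred), len(ref))

def is_valid_postfix(tokens, arity):
    # A postfix sequence encodes exactly one expression tree iff the running
    # stack never underflows and exactly one subtree remains at the end.
    depth = 0
    for t in tokens:
        k = arity.get(t, 0)  # operands have arity 0
        if depth < k:
            return False
        depth = depth - k + 1
    return depth == 1

ARITY = {'+': 2, '*': 2, 'frac': 2, 'sqrt': 1}  # hypothetical operator set
print(normalised_levenshtein_accuracy(list('2x+'), list('2x*')))  # about 0.667
print(is_valid_postfix(['2', 'x', '+'], ARITY))  # True: one tree remains
print(is_valid_postfix(['2', '+', 'x'], ARITY))  # False: '+' underflows
```

Under this definition, a prediction counts toward the "valid postfix" rate whenever the arity check passes, independently of how close it is to the reference in edit distance.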