Paper Title

Semantic Representations of Mathematical Expressions in a Continuous Vector Space

Authors

Neeraj Gangwar, Nickvash Kani

Abstract

Mathematical notation makes up a large portion of STEM literature, yet finding semantic representations for formulae remains a challenging problem. Because mathematical notation is precise, and its meaning changes significantly with small character shifts, the methods that work for natural text do not necessarily work well for mathematical expressions. This work describes an approach for representing mathematical expressions in a continuous vector space. We use the encoder of a sequence-to-sequence architecture, trained on visually different but mathematically equivalent expressions, to generate vector representations (or embeddings). We compare this approach with a structural approach that considers visual layout to embed an expression and show that our proposed approach is better at capturing mathematical semantics. Finally, to expedite future research, we publish a corpus of equivalent transcendental and algebraic expression pairs.
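
To make the phrase "visually different but mathematically equivalent" concrete, here is a minimal sketch that spot-checks candidate expression pairs numerically. It is not the authors' method or data pipeline: the pairs, the helper names `evaluate` and `numerically_equivalent`, and the sampling interval are all assumptions made for illustration, and agreement at random points suggests equivalence without proving it.

```python
import math
import random

# Hypothetical toy pairs; the corpus published with the paper is not reproduced here.
PAIRS = [
    ("sin(x)**2 + cos(x)**2", "1"),
    ("(x + 1)**2", "x**2 + 2*x + 1"),
    ("exp(x) * exp(-x)", "1"),
    ("x**2 + 1", "x**2 - 1"),  # one character differs, and the meaning changes
]

# Only these names are visible to the evaluated expression strings.
SAFE_NAMES = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "log": math.log}


def evaluate(expr: str, x: float) -> float:
    """Evaluate an expression string at x with builtins disabled."""
    return eval(expr, {"__builtins__": {}}, {**SAFE_NAMES, "x": x})


def numerically_equivalent(lhs: str, rhs: str, trials: int = 50, seed: int = 0) -> bool:
    """Return True if lhs and rhs agree at `trials` random points in [0.5, 3.0]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(0.5, 3.0)
        if not math.isclose(evaluate(lhs, x), evaluate(rhs, x),
                            rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True


if __name__ == "__main__":
    for lhs, rhs in PAIRS:
        print(f"{lhs!r} vs {rhs!r}: {numerically_equivalent(lhs, rhs)}")
```

The last pair shows the point the abstract makes about small character shifts: swapping `+` for `-` leaves the strings nearly identical on the surface while making them mathematically different, which is why text-similarity methods alone can mislead here.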
