UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression

Paper title

UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression

Authors

Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, Xiaodan Liang

Abstract

Geometry problem solving is a well-recognized testbed for evaluating the high-level multi-modal reasoning capability of deep models. In most existing works, the two main kinds of geometry problems, calculation and proving, are treated as two separate tasks, which hinders a deep model from unifying its reasoning capability across multiple math tasks. In essence, however, the two tasks have similar problem representations and overlapping math knowledge, which can improve a deep model's understanding and reasoning ability on both tasks. Therefore, we construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems. Each proving problem is annotated with a multi-step proof with reasons and mathematical expressions. The proof can be easily reformulated as a proving sequence that shares the same format as the annotated program sequence for calculation problems. Naturally, we also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously in the form of sequence generation, which shows that reasoning ability on both tasks can be improved by unifying their formulation. Furthermore, we propose a Mathematical Expression Pretraining (MEP) method that aims to predict the mathematical expressions in the problem solution, thus improving the Geoformer model. Experiments on UniGeo demonstrate that our proposed Geoformer obtains state-of-the-art performance, outperforming the task-specific model NGS by over 5.6% and 3.2% in accuracy on calculation and proving problems, respectively.

Paper title