Paper Title
INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving
Paper Authors
Paper Abstract
In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time. In this paper, we introduce INT, an INequality Theorem proving benchmark, specifically designed to test agents' generalization ability. INT is based on a procedure for generating theorems and proofs; this procedure's knobs allow us to measure 6 different types of generalization, each reflecting a distinct challenge characteristic to automated theorem proving. In addition, unlike prior benchmarks for learning-assisted theorem proving, INT provides a lightweight and user-friendly theorem proving environment with fast simulations, conducive to performing learning-based and search-based research. We introduce learning-based baselines and evaluate them across 6 dimensions of generalization with the benchmark. We then evaluate the same agents augmented with Monte Carlo Tree Search (MCTS) at test time, and show that MCTS can help to prove new theorems.