Paper Title

Large Language Models are Better Reasoners with Self-Verification

Paper Authors

Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Shengping Liu, Bin Sun, Kang Liu, Jun Zhao

Paper Abstract

Recently, with the chain of thought (CoT) prompting, large language models (LLMs), e.g., GPT-3, have shown strong reasoning ability in several natural language processing tasks such as arithmetic, commonsense, and logical reasoning. However, LLMs with CoT require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes and vulnerable to error accumulation. The above issues make the LLMs need the ability to verify the answers. In fact, after inferring conclusions in some thinking decision tasks, people often check them by re-verifying steps to avoid some mistakes. In this paper, we propose and prove that LLMs also have similar self-verification abilities. We take the conclusion obtained by CoT as one of the conditions for solving the original problem. By performing a backward verification of the answers that LLM deduced for itself, we can obtain interpretable answer validation scores to select the candidate answer with the highest score. Experimental results demonstrate that the proposed method can improve the reasoning performance on various arithmetic, commonsense, and logical reasoning datasets. Our code is publicly available at: https://github.com/WENGSYX/Self-Verification.
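
As a rough illustration of the procedure the abstract describes (sample several chain-of-thought answers forward, then score each candidate by backward verification and keep the highest-scoring one), the Python sketch below implements a simplified selection loop. The `generate` callable, the `extract_answer` parser, and the yes/no re-check prompt are illustrative assumptions, not the authors' implementation; see the linked repository for the actual method.

```python
import collections
import re
from typing import Callable, List


def extract_answer(completion: str) -> str:
    """Pull the final number out of a chain-of-thought completion (toy parser)."""
    numbers = re.findall(r"-?\d+\.?\d*", completion)
    return numbers[-1] if numbers else ""


def self_verify(
    question: str,
    generate: Callable[[str, int], List[str]],  # user-supplied LLM sampler: (prompt, n) -> completions
    n_candidates: int = 5,
    n_checks: int = 3,
) -> str:
    # Forward reasoning: sample several chain-of-thought answers for the question.
    cot_prompt = f"Q: {question}\nA: Let's think step by step."
    candidates = [extract_answer(c) for c in generate(cot_prompt, n_candidates)]

    # Backward verification: restate each candidate answer as a condition of the
    # original problem and ask the model to re-check it; the count of consistent
    # re-checks serves as an interpretable verification score.
    scores: collections.Counter = collections.Counter()
    for ans in set(candidates):
        verify_prompt = (
            f"{question} Suppose the answer is {ans}. "
            "Is this answer consistent with all the conditions above? Answer yes or no."
        )
        checks = generate(verify_prompt, n_checks)
        scores[ans] = sum("yes" in c.lower() for c in checks)

    # Select the candidate answer with the highest verification score.
    return max(scores, key=scores.get)
```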
