Paper Title

ReAct: Synergizing Reasoning and Acting in Language Models

Paper Authors

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao

Paper Abstract

While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: https://react-lm.github.io
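
The abstract describes ReAct as an interleaved Thought / Action / Observation loop: the model's free-form reasoning steps alternate with task-specific actions such as querying a Wikipedia API, and each action's result is fed back into the trajectory. The sketch below illustrates that control flow only; the `llm` and `wikipedia_search` callables, the `Search[...]`/`Finish[...]` action format, and the step budget are illustrative assumptions, not the authors' released implementation (see the project site for the actual code).

```python
import re

def react_loop(question, llm, wikipedia_search, max_steps=8):
    """Minimal ReAct-style loop: interleave Thought, Action, Observation.

    `llm(prompt)` and `wikipedia_search(query)` are hypothetical callables
    supplied by the caller; they are assumptions for illustration only.
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Ask the model to continue the trajectory with its next thought/action.
        step = llm(transcript + "Thought:")
        transcript += "Thought:" + step + "\n"

        # Look for an action of the form Search[query] or Finish[answer].
        match = re.search(r"(Search|Finish)\[(.*?)\]", step)
        if match is None:
            continue  # no parsable action; let the model keep reasoning
        action, argument = match.groups()

        if action == "Finish":
            return argument  # the model has committed to a final answer

        # Ground the reasoning: fetch external evidence and feed it back.
        observation = wikipedia_search(argument)
        transcript += f"Observation: {observation}\n"

    return None  # no answer within the step budget
```

For a quick smoke test, both callables can be stubbed, e.g. `react_loop("Who wrote Dune?", llm=lambda p: " I know this. Finish[Frank Herbert]", wikipedia_search=lambda q: "")`, which returns "Frank Herbert" without any external calls.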
