Title

Online Bayesian Goal Inference for Boundedly-Rational Planning Agents

Authors

Tan Zhi-Xuan, Jordyn L. Mann, Tom Silver, Joshua B. Tenenbaum, Vishash K. Mansinghka

Abstract

People routinely infer the goals of others by observing their actions over time. Remarkably, we can do so even when those actions lead to failure, enabling us to assist others when we detect that they might not achieve their goals. How might we endow machines with similar capabilities? Here we present an architecture capable of inferring an agent's goals online from both optimal and non-optimal sequences of actions. Our architecture models agents as boundedly-rational planners that interleave search with execution by replanning, thereby accounting for sub-optimal behavior. These models are specified as probabilistic programs, allowing us to represent and perform efficient Bayesian inference over an agent's goals and internal planning processes. To perform such inference, we develop Sequential Inverse Plan Search (SIPS), a sequential Monte Carlo algorithm that exploits the online replanning assumption of these models, limiting computation by incrementally extending inferred plans as new actions are observed. We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards.
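As a rough illustration of the sequential Monte Carlo idea the abstract describes, below is a minimal particle-filter sketch for online goal inference in a toy one-dimensional world. Everything here (GOALS, plan_step, the noise parameter eps) is invented for illustration; this is not the paper's SIPS algorithm or its probabilistic-program agent models, only the general pattern of reweighting goal hypotheses as each new action is observed.

```python
import random
from collections import defaultdict

# Illustrative sketch only: a particle filter over goal hypotheses in a
# toy 1-D world. All names and dynamics are assumptions for this example,
# not the paper's implementation.

GOALS = {"A": -5, "B": 0, "C": 5}   # hypothetical goal positions
STEP = {"left": -1, "right": +1}

def plan_step(goal, state):
    """Stub planner: one step of a plan toward `goal` from `state`."""
    return "left" if state > GOALS[goal] else "right"

def action_likelihood(observed, planned, eps=0.1):
    """Noisy rationality: the agent follows its plan with prob. 1 - eps."""
    return (1.0 - eps) if observed == planned else eps / (len(STEP) - 1)

def infer_goals(observed_actions, n_particles=300, seed=0):
    rng = random.Random(seed)
    # Each particle pairs a goal hypothesis with a simulated state and weight.
    particles = [[rng.choice(list(GOALS)), 0, 1.0] for _ in range(n_particles)]
    for obs in observed_actions:
        for p in particles:
            goal, state, w = p
            planned = plan_step(goal, state)            # extend the plan one step
            p[2] = w * action_likelihood(obs, planned)  # reweight on the observation
            p[1] = state + STEP[obs]                    # advance the simulated state
    # Posterior over goals = normalized particle weights.
    posterior = defaultdict(float)
    total = sum(p[2] for p in particles)
    for goal, _, w in particles:
        posterior[goal] += w / total
    return dict(posterior)

# An agent repeatedly moving right becomes increasingly likely to want goal C.
print(infer_goals(["right", "right", "right", "right"]))
```

The per-observation reweighting loop mirrors, in miniature, the online character of SIPS: computation is spent incrementally as actions arrive, rather than replanning from scratch over whole trajectories.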
