Paper Title

Multi-agent active perception with prediction rewards

Paper Authors

Mikko Lauri, Frans A. Oliehoek

Paper Abstract

Multi-agent active perception is a task where a team of agents cooperatively gathers observations to compute a joint estimate of a hidden variable. The task is decentralized and the joint estimate can only be computed after the task ends by fusing observations of all agents. The objective is to maximize the accuracy of the estimate. The accuracy is quantified by a centralized prediction reward determined by a centralized decision-maker who perceives the observations gathered by all agents after the task ends. In this paper, we model multi-agent active perception as a decentralized partially observable Markov decision process (Dec-POMDP) with a convex centralized prediction reward. We prove that by introducing individual prediction actions for each agent, the problem is converted into a standard Dec-POMDP with a decentralized prediction reward. The loss due to decentralization is bounded, and we give a sufficient condition for when it is zero. Our results allow application of any Dec-POMDP solution algorithm to multi-agent active perception problems, and enable planning to reduce uncertainty without explicit computation of joint estimates. We demonstrate the empirical usefulness of our results by applying a standard Dec-POMDP algorithm to multi-agent active perception problems, showing increased scalability in the planning horizon.
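The abstract compresses the paper's key construction into a few sentences; below is a minimal sketch, in LaTeX, of how a convex centralized prediction reward is conventionally written in the rho-POMDP literature, and of the decentralized form the paper derives from it. The symbols here (A_p for the set of prediction actions, b for the final joint belief, n for the number of agents) are illustrative assumptions, not necessarily the paper's exact notation.

% Centralized prediction reward: the value of the best prediction a
% centralized decision-maker can make given the final joint belief b.
% As a maximum of functions linear in b, \rho is convex in b.
\rho(b) = \max_{a_p \in A_p} \sum_{s \in S} b(s) \, R(s, a_p)

% Decentralized counterpart: each agent i commits to an individual
% prediction action a_p^i based only on its own action-observation
% history. The final-step reward is then an ordinary state-action
% reward, so the model is a standard Dec-POMDP:
R(s, a_p^1, \dots, a_p^n)

Since any joint prediction (a_p^1, ..., a_p^n) can at best match the centralized maximizer, the decentralized value lower-bounds the centralized one; bounding this gap, and giving a sufficient condition under which it is zero, is the paper's main theoretical contribution.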
