Paper title

Over-communicate no more: Situated RL agents learn concise communication protocols

Authors

Aleksandra Kalinowska, Elnaz Davoodi, Florian Strub, Kory W. Mathewson, Ivana Kajic, Michael Bowling, Todd D. Murphey, Patrick M. Pilarski

Abstract

While it is known that communication facilitates cooperation in multi-agent settings, it is unclear how to design artificial agents that can learn to effectively and efficiently communicate with each other. Much research on communication emergence uses reinforcement learning (RL) and explores unsituated communication in one-step referential tasks -- the tasks are not temporally interactive and lack time pressures typically present in natural communication. In these settings, agents may successfully learn to communicate, but they do not learn to exchange information concisely -- they tend towards over-communication and an inefficient encoding. Here, we explore situated communication in a multi-step task, where the acting agent has to forgo an environmental action to communicate. Thus, we impose an opportunity cost on communication and mimic the real-world pressure of passing time. We compare communication emergence under this pressure against learning to communicate with a cost on articulation effort, implemented as a per-message penalty (fixed and progressively increasing). We find that while all tested pressures can disincentivise over-communication, situated communication does it most effectively and, unlike the cost on effort, does not negatively impact emergence. Implementing an opportunity cost on communication in a temporally extended environment is a step towards embodiment, and might be a pre-condition for incentivising efficient, human-like communication.
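
To make the contrast between the pressures concrete, here is a minimal toy sketch, not the authors' environment or code. It assumes a hypothetical one-dimensional task in which the agent needs `goal_distance` moves within `step_limit` timesteps. Under the situated regime a message consumes a timestep (the opportunity cost); under the effort-cost regimes a message is free in time but subtracts a penalty. The names `Pressure` and `episode_return`, the numeric values, and the choice to let the "increasing" penalty grow with each message inside an episode are all assumptions made for illustration; the abstract does not specify the schedule.

```python
from dataclasses import dataclass


@dataclass
class Pressure:
    kind: str             # "situated", "fixed", or "increasing" (hypothetical labels)
    penalty: float = 0.0  # per-message penalty; unused when kind == "situated"


def episode_return(actions, pressure, goal_distance=3, step_limit=5, goal_reward=1.0):
    """Return the total reward for a sequence of "move" / "msg" actions."""
    steps_used = 0
    progress = 0
    messages = 0
    total = 0.0
    for action in actions:
        # Moves always take a timestep; messages take one only when situated.
        costs_step = action == "move" or pressure.kind == "situated"
        if costs_step:
            if steps_used == step_limit:
                break  # time budget exhausted before this action
            steps_used += 1
        if action == "msg":
            messages += 1
            if pressure.kind == "fixed":
                total -= pressure.penalty
            elif pressure.kind == "increasing":
                # Assumption: the penalty scales with the message count so far.
                total -= pressure.penalty * messages
        else:
            progress += 1
            if progress == goal_distance:
                total += goal_reward
                break
    return total


chatty = ["msg", "msg", "msg", "move", "move", "move"]
concise = ["msg", "move", "move", "move"]
for kind in ("situated", "fixed", "increasing"):
    p = Pressure(kind, penalty=0.1)
    print(kind, episode_return(chatty, p), episode_return(concise, p))
```

In this toy setting, three messages under the situated regime exhaust the time budget and the goal is missed, whereas under the penalty regimes the goal is still reached with a reduced return. This only illustrates how the two kinds of cost enter the reward; it says nothing about which one produces better learned protocols.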
