Paper title
Improving Confidence in the Estimation of Values and Norms
Paper authors
Paper abstract
Autonomous agents (AA) will increasingly be interacting with us in our daily lives. While we want the benefits attached to AAs, it is essential that their behavior is aligned with our values and norms. Hence, an AA will need to estimate the values and norms of the humans it interacts with, which is not a straightforward task when solely observing an agent's behavior. This paper analyses to what extent an AA is able to estimate the values and norms of a simulated human agent (SHA) based on its actions in the ultimatum game. We present two methods to reduce ambiguity in profiling the SHAs: one based on search space exploration and another based on counterfactual analysis. We found that both methods are able to increase the confidence in estimating human values and norms, but differ in their applicability, the latter being more efficient when the number of interactions with the agent is to be minimized. These insights are useful to improve the alignment of AAs with human values and norms.