Paper Title
Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy
Paper Authors
Paper Abstract
In an effort to guarantee that machine learning model outputs conform with human moral values, recent work has begun exploring the possibility of explicitly training models to learn the difference between right and wrong. This is typically done in a bottom-up fashion, by exposing the model to different scenarios, annotated with human moral judgements. One question, however, is whether the trained models actually learn any consistent, higher-level ethical principles from these datasets -- and if so, what? Here, we probe the Allen AI Delphi model with a set of standardized morality questionnaires, and find that, despite some inconsistencies, Delphi tends to mirror the moral principles associated with the demographic groups involved in the annotation process. We question whether this is desirable and discuss how we might move forward with this knowledge.