Paper Title
The Expertise Problem: Learning from Specialized Feedback
Paper Authors
Paper Abstract
Reinforcement learning from human feedback (RLHF) is a powerful technique for training agents to perform difficult-to-specify tasks. However, human feedback can be noisy, particularly when human teachers lack relevant knowledge or experience. Levels of expertise vary across teachers, and a given teacher may have differing levels of expertise for different components of a task. RLHF algorithms that learn from multiple teachers therefore face an expertise problem: the reliability of a given piece of feedback depends both on the teacher it comes from and on how specialized that teacher is in the relevant components of the task. Existing state-of-the-art RLHF algorithms assume that all evaluations come from the same distribution, obscuring this inter- and intra-human variance and preventing them from accounting for or taking advantage of variations in expertise. We formalize this problem, implement it as an extension of an existing RLHF benchmark, evaluate the performance of a state-of-the-art RLHF algorithm, and explore techniques to improve query and teacher selection. Our key contribution is to demonstrate and characterize the expertise problem, and to provide an open-source implementation for testing future solutions.
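
To make the expertise problem concrete, the sketch below models a simulated teacher's pairwise feedback as Boltzmann-rational, a model commonly used in preference-based RL benchmarks: a rationality coefficient beta scales the teacher's sensitivity to the true return difference between two trajectory segments. This is a minimal illustrative sketch, not the paper's exact formulation; the function name preference_prob and the example beta values are hypothetical.

    import math

    def preference_prob(return_a: float, return_b: float, beta: float) -> float:
        """Probability that a Boltzmann-rational teacher prefers segment A over B.

        beta = 0 yields uniformly random feedback (no relevant expertise);
        larger beta yields increasingly reliable feedback.
        """
        # Logistic function of the return difference, scaled by the
        # teacher's rationality coefficient for this part of the task.
        return 1.0 / (1.0 + math.exp(-beta * (return_a - return_b)))

    # Hypothetical example: two teachers judge the same pair of segments,
    # whose true returns are 1.0 and 0.8.
    novice_beta, expert_beta = 0.5, 5.0
    print(preference_prob(1.0, 0.8, novice_beta))  # ~0.52: nearly a coin flip
    print(preference_prob(1.0, 0.8, expert_beta))  # ~0.73: far more reliable

Under this view, a learner that ignores which teacher produced each label treats the ~0.52 and ~0.73 judgments as equally trustworthy, which is precisely the failure mode the abstract describes.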