Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

Paper Title

Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

Authors

Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley

Abstract

Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges to avoid local optima. However, the original POET was unable to demonstrate its full creative potential because of limitations of the algorithm itself and because of external issues including a limited problem space and lack of a universal progress measure. Importantly, both limitations pose impediments not only for POET, but for the pursuit of open-endedness in general. Here we introduce and empirically validate two new innovations to the original algorithm, as well as two external innovations designed to help elucidate its full potential. Together, these four advances enable the most open-ended algorithmic demonstration to date. The algorithmic innovations are (1) a domain-general measure of how meaningfully novel new challenges are, enabling the system to potentially create and solve interesting challenges endlessly, and (2) an efficient heuristic for determining when agents should goal-switch from one problem to another (helping open-ended search better scale). Outside the algorithm itself, to enable a more definitive demonstration of open-endedness, we introduce (3) a novel, more flexible way to encode environmental challenges, and (4) a generic measure of the extent to which a system continues to exhibit open-ended innovation. Enhanced POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved through other means.
