Thesis Title


Learning Algorithms for Intelligent Agents and Mechanisms

Thesis Author

Rahme, Jad

Thesis Abstract


In this thesis, we study learning algorithms for optimal decision making in two different contexts: Reinforcement Learning in Part I and Auction Design in Part II. Reinforcement learning (RL) is an area of machine learning concerned with how an agent should act in an environment in order to maximize its cumulative reward over time. In Chapter 2, inspired by statistical physics, we develop a novel approach to RL that not only learns optimal policies with enhanced desirable properties but also sheds new light on maximum entropy RL. In Chapter 3, we tackle the generalization problem in RL from a Bayesian perspective. We show that imperfect knowledge of the environment's dynamics effectively turns a fully observed Markov Decision Process (MDP) into a Partially Observed MDP (POMDP) that we call the Epistemic POMDP. Informed by this observation, we develop a new policy learning algorithm, LEEP, which has improved generalization properties. Designing an incentive-compatible, individually rational auction that maximizes revenue is a challenging and intractable problem. Recently, deep-learning-based approaches have been proposed to learn optimal auctions from data. While successful, these approaches suffer from a few limitations, including sample inefficiency, lack of generalization to new auctions, and training difficulties. In Chapter 4, we construct a symmetry-preserving neural network architecture, EquivariantNet, suitable for anonymous auctions. EquivariantNet is not only more sample efficient but is also able to learn auction rules that generalize well to other settings. In Chapter 5, we propose a novel formulation of the auction learning problem as a two-player game. The resulting learning algorithm, ALGNet, is easier to train, more reliable, and better suited for non-stationary settings.
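The symmetry-preserving idea behind an architecture like EquivariantNet can be illustrated with a minimal permutation-equivariant layer in the DeepSets style. The sketch below is only an illustration of the general principle (not the actual architecture from Chapter 4); the function name and weight shapes are made up for the example:

```python
import numpy as np

def equivariant_layer(X, W_self, W_mean, b):
    """Minimal permutation-equivariant linear layer (DeepSets-style).

    X: (n, d) matrix of n bidder feature vectors.
    Permuting the rows of X permutes the rows of the output in the
    same way, so no bidder identity is privileged -- the kind of
    symmetry an anonymous auction rule must respect.
    """
    pooled = X.mean(axis=0, keepdims=True)   # (1, d), invariant to row order
    return X @ W_self + pooled @ W_mean + b  # (n, k), equivariant to row order

# Sanity check: permuting the input rows permutes the output rows identically.
rng = np.random.default_rng(0)
n, d, k = 4, 3, 2
X = rng.normal(size=(n, d))
W_self = rng.normal(size=(d, k))
W_mean = rng.normal(size=(d, k))
b = rng.normal(size=(1, k))
perm = rng.permutation(n)
assert np.allclose(
    equivariant_layer(X, W_self, W_mean, b)[perm],
    equivariant_layer(X[perm], W_self, W_mean, b),
)
```

Because each output row depends on its own input row plus a permutation-invariant pooled summary, stacking such layers yields a network whose predictions are unchanged in substance when bidders are relabeled.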
