Paper Title
Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets
Paper Authors
Paper Abstract
We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market. At each step, the agents are presented with a dynamic context that determines their utilities. The planner controls the transition of the contexts to maximize the cumulative social welfare, while the agents aim to find a myopically stable matching at each step. Such a setting captures a range of applications, including ridesharing platforms. We formalize the problem by proposing a reinforcement learning framework that integrates optimistic value iteration with maximum weight matching. The proposed algorithm addresses the coupled challenges of sequential exploration, matching stability, and function approximation. We prove that the algorithm achieves sublinear regret.
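To illustrate the algorithmic core described in the abstract, optimism combined with maximum weight matching, the following Python sketch shows what one planner step might look like. All names, the UCB-style bonus form, and the brute-force matching are our own illustrative assumptions, not the paper's actual algorithm.

```python
import itertools
import math

def optimistic_max_weight_matching(mean_utils, counts, beta=1.0):
    """One hypothetical planner step: inflate the estimated utilities
    with a UCB-style exploration bonus, then compute the maximum
    weight matching between the two sides of the market.

    mean_utils[i][j]: empirical mean utility of matching agent i to j.
    counts[i][j]: how often the pair (i, j) has been observed.
    Brute force over permutations is fine for tiny markets; a real
    implementation would use e.g. the Hungarian method.
    """
    n = len(mean_utils)
    # Optimism in the face of uncertainty: rarely observed pairs get
    # a larger bonus, encouraging exploration.
    optimistic = [[mean_utils[i][j] + beta / math.sqrt(max(counts[i][j], 1))
                   for j in range(n)] for i in range(n)]
    best_perm, best_val = None, float("-inf")
    for perm in itertools.permutations(range(n)):
        val = sum(optimistic[i][perm[i]] for i in range(n))
        if val > best_val:
            best_perm, best_val = perm, val
    return [(i, j) for i, j in enumerate(best_perm)], best_val

# Toy market with 3 agents on each side: the diagonal pairs have the
# highest estimated utility, so the matching picks (0,0), (1,1), (2,2).
mean_utils = [[3.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 3.0]]
counts = [[10] * 3 for _ in range(3)]
matching, value = optimistic_max_weight_matching(mean_utils, counts)
print(matching)  # → [(0, 0), (1, 1), (2, 2)]
```

In the full framework, the bonus would come from optimistic value iteration over the context dynamics rather than per-pair counts; this sketch only conveys how exploration bonuses and the matching step fit together.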