Paper Title


A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

Paper Authors

Chengchun Shi, Runzhe Wan, Ge Song, Shikai Luo, Rui Song, Hongtu Zhu

Paper Abstract


Two-sided markets such as ride-sharing companies often involve a group of subjects who make sequential decisions across time and/or location. With the rapid development of smartphones and the Internet of Things, these platforms have substantially transformed the transportation landscape. In this paper we consider large-scale fleet management in ride-sharing companies, which involves multiple units in different areas receiving sequences of products (or treatments) over time. Major technical challenges, such as policy evaluation, arise in these studies because (i) spatial and temporal proximity induces interference between locations and times; and (ii) the large number of locations results in the curse of dimensionality. To address both challenges simultaneously, we introduce a multi-agent reinforcement learning (MARL) framework for carrying out policy evaluation in these studies. We propose novel estimators for mean outcomes under different products that are consistent despite the high dimensionality of the state-action space. The proposed estimators perform favorably in simulation experiments. We further illustrate our method using a real dataset obtained from a two-sided marketplace company to evaluate the effects of applying different subsidizing policies. A Python implementation of our proposed method is available at https://github.com/RunzheStat/CausalMARL.
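To make the off-policy evaluation setting concrete, below is a minimal, hypothetical sketch of a simple direct-method value estimate on synthetic spatiotemporal data: fit an outcome model on logged (state, action, reward) tuples pooled across regions, then average its predictions under a target policy's actions. All names, data shapes, and the "always-subsidize" target policy here are illustrative assumptions; this is not the paper's MARL estimator, and in particular it ignores the spatial/temporal interference and dimensionality issues the paper is designed to handle.

```python
# Hypothetical direct-method off-policy evaluation on synthetic data.
# NOT the paper's estimator: interference between regions is ignored.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_regions, n_days = 20, 50                               # assumed grid: locations x time
states = rng.normal(size=(n_regions, n_days, 3))         # e.g., demand/supply features
actions = rng.integers(0, 2, size=(n_regions, n_days))   # logged subsidy on/off
# Synthetic rewards with a positive treatment effect (for illustration only).
rewards = states[..., 0] + 0.5 * actions + rng.normal(scale=0.1, size=actions.shape)

# Fit an outcome model Q(state, action) on the flattened logged data.
X = np.column_stack([states.reshape(-1, 3), actions.reshape(-1, 1)])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, rewards.ravel())

def value_estimate(policy_action: int) -> float:
    """Direct-method estimate of the mean outcome if every region,
    on every day, applied `policy_action` (a stylized target policy)."""
    X_target = np.column_stack([
        states.reshape(-1, 3),
        np.full((n_regions * n_days, 1), policy_action),
    ])
    return float(model.predict(X_target).mean())

print("Estimated value, always subsidize:", value_estimate(1))
print("Estimated value, never subsidize: ", value_estimate(0))
```

A plain outcome regression like this treats each region-day as independent; the estimators proposed in the paper are precisely motivated by the failure of that assumption when nearby locations and adjacent times interfere with one another.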
