Paper title
Extending Open Bandit Pipeline to Simulate Industry Challenges
Authors
Abstract
Bandit algorithms are often used in the e-commerce industry to train Machine Learning (ML) systems when pre-labeled data is unavailable. However, the industry setting poses various challenges that make implementing bandit algorithms in practice non-trivial. In this paper, we elaborate on the challenges of off-policy optimisation, delayed reward, concept drift, reward design, and business rule constraints that practitioners at Booking.com encounter when applying bandit algorithms. Our main contribution is an extension to the Open Bandit Pipeline (OBP) framework. We provide simulation components for some of the above-mentioned challenges, giving future practitioners, researchers, and educators a resource to address challenges encountered in the e-commerce industry.