Paper Title

Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation

Authors

Diksha Garg, Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig, Gautam Shroff

Abstract

Most of the existing deep reinforcement learning (RL) approaches for session-based recommendations either rely on costly online interactions with real users, or rely on potentially biased rule-based or data-driven user-behavior models for learning. In this work, we instead focus on learning recommendation policies in the pure batch or offline setting, i.e. learning policies solely from offline historical interaction logs or batch data generated from an unknown and sub-optimal behavior policy, without further access to data from the real-world or user-behavior models. We propose BCD4Rec: Batch-Constrained Distributional RL for Session-based Recommendations. BCD4Rec builds upon the recent advances in batch (offline) RL and distributional RL to learn from offline logs while dealing with the intrinsically stochastic nature of rewards from the users due to varied latent interest preferences (environments). We demonstrate that BCD4Rec significantly improves upon the behavior policy as well as strong RL and non-RL baselines in the batch setting in terms of standard performance metrics like Click Through Rates or Buy Rates. Other useful properties of BCD4Rec include: i. recommending items from the correct latent categories indicating better value estimates despite large action space (of the order of number of items), and ii. overcoming popularity bias in clicked or bought items typically present in the offline logs.
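
The abstract describes BCD4Rec as building on batch-constrained (offline) RL and distributional RL. Below is a minimal, illustrative sketch in PyTorch of how these two ideas can fit together at action-selection time: a behavior-cloning network restricts candidate items to those the logged behavior policy would plausibly have recommended (in the spirit of discrete batch-constrained Q-learning), and a quantile-based Q-network provides a return distribution whose mean ranks the remaining items. All network sizes, the threshold value, and the class and function names are assumptions for illustration only and are not taken from the paper; the training losses (quantile regression, behavior cloning) are omitted.

```python
# Illustrative sketch only: batch-constrained action filtering combined with a
# quantile-based distributional Q-network. Sizes, names, and the threshold are
# hypothetical and are NOT taken from the BCD4Rec paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ITEMS = 1000        # catalogue size = size of the action space (hypothetical)
STATE_DIM = 64        # session-state embedding size (hypothetical)
N_QUANTILES = 51      # number of quantiles of the return distribution (hypothetical)
BCQ_THRESHOLD = 0.3   # relative-likelihood threshold for batch-constrained filtering

class QuantileQNetwork(nn.Module):
    """Predicts N_QUANTILES estimates of the return for every item (action)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, N_ITEMS * N_QUANTILES),
        )

    def forward(self, state):
        # shape: (batch, n_items, n_quantiles)
        return self.net(state).view(-1, N_ITEMS, N_QUANTILES)

class BehaviorCloner(nn.Module):
    """Imitates the logged (behavior) policy; used to restrict candidate items."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, N_ITEMS),
        )

    def forward(self, state):
        return F.log_softmax(self.net(state), dim=-1)

def select_action(state, q_net, bc_net):
    """Batch-constrained greedy recommendation: only items the behavior policy
    would plausibly have shown are considered; among those, pick the item with
    the highest mean of the predicted return distribution."""
    with torch.no_grad():
        probs = bc_net(state).exp()                               # (batch, n_items)
        # Keep items whose behavior-policy probability is within BCQ_THRESHOLD
        # of the most likely item for this state.
        allowed = probs / probs.max(dim=-1, keepdim=True).values > BCQ_THRESHOLD
        q_mean = q_net(state).mean(dim=-1)                        # (batch, n_items)
        q_mean[~allowed] = -1e8                                   # mask disallowed items
        return q_mean.argmax(dim=-1)                              # recommended item ids

# Usage on a dummy batch of session states:
q_net, bc_net = QuantileQNetwork(), BehaviorCloner()
states = torch.randn(4, STATE_DIM)
print(select_action(states, q_net, bc_net))
```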
