Paper Title
Optimizing AD Pruning of Sponsored Search with Reinforcement Learning
Paper Authors
Paper Abstract
An industrial sponsored search system (SSS) can be logically divided into three modules: keyword matching, ad retrieval, and ranking. During ad retrieval, the number of ad candidates grows exponentially. A query with high commercial value may retrieve so many ad candidates that the ranking module cannot afford to process them all. Due to limited latency and computing resources, the candidates have to be pruned beforehand. Suppose we set a pruning line that cuts the SSS into two parts: upstream and downstream. The problem we address is how to pick the best $K$ items from the $N$ candidates provided by the upstream so as to maximize the system's total revenue. Since the industrial downstream is very complicated and frequently updated, a crucial restriction of this problem is that the selection scheme should adapt to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to this problem. Our approach treats the downstream as a black-box environment: the agent sequentially selects items and finally feeds them into the downstream, where the revenue is estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is the first time that system optimization has been considered from the view of downstream adaptation, and the first time that reinforcement learning techniques have been used to tackle this problem. The idea has been successfully deployed in Baidu's sponsored search system, and a long-term online A/B test shows remarkable improvement in revenue.
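To make the setting concrete, below is a minimal sketch of the abstract's idea: a policy sequentially picks $K$ of $N$ candidates, the selection is scored by a black-box downstream, and the returned revenue is used as a reward in a REINFORCE-style update. This is not the paper's actual system; the linear softmax policy, the `downstream_revenue` stand-in, and all dimensions and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 50, 5, 8            # candidates per query, picks per query, feature dim (assumed)

def downstream_revenue(selected_feats):
    # Hypothetical black-box downstream: the agent never sees its internals,
    # it only receives an estimated revenue for the K selected ads.
    hidden_w = np.linspace(1.0, 2.0, D)        # stand-in for ranking/auction logic
    return float(np.sum(selected_feats @ hidden_w))

theta = np.zeros(D)            # linear scoring policy: score(x) = x . theta

def select_k(feats, theta):
    """Sequentially sample K items without replacement from a softmax policy."""
    chosen, mask = [], np.ones(len(feats), dtype=bool)
    for _ in range(K):
        scores = feats @ theta
        scores[~mask] = -np.inf
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        i = int(rng.choice(len(feats), p=probs))
        chosen.append(i)
        mask[i] = False
    return chosen

# REINFORCE-style loop: the only learning signal is the downstream's reward.
lr, baseline = 0.01, 0.0
for step in range(2000):
    feats = rng.normal(size=(N, D))            # candidate-ad features from the upstream
    chosen = select_k(feats, theta)
    reward = downstream_revenue(feats[chosen])
    baseline = 0.99 * baseline + 0.01 * reward # moving-average baseline to reduce variance

    # Policy gradient of the sequential softmax selection:
    # at each pick, grad log pi = x_chosen - E_pi[x] over the remaining items.
    grad = np.zeros(D)
    mask = np.ones(N, dtype=bool)
    for i in chosen:
        scores = feats @ theta
        scores[~mask] = -np.inf
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        grad += feats[i] - probs @ feats
        mask[i] = False
    theta += lr * (reward - baseline) * grad
```

In this toy loop the policy gradually learns to pick the candidates the black-box scores highly, which mirrors the abstract's point: the selection scheme adapts to whatever the downstream currently rewards, without modeling the downstream explicitly.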