Weakly Supervised Learning Meets Ride-Sharing User Experience Enhancement

Paper Title

Weakly Supervised Learning Meets Ride-Sharing User Experience Enhancement

Authors

Guo, Lan-Zhe; Kuang, Feng; Liu, Zhang-Xun; Li, Yu-Feng; Ma, Nan; Qie, Xiao-Hu

Abstract

Weakly supervised learning aims to cope with scarce labeled data. Previous weakly supervised studies typically assume that the data contains only one kind of weak supervision. In many applications, however, raw data contains more than one kind of weak supervision at the same time. For example, in user experience enhancement at Didi, one of the largest online ride-sharing platforms, the ride comment data contains severe label noise (due to the subjective factors of passengers) and severe label distribution bias (due to sampling bias). We call this problem "compound weakly supervised learning". In this paper, we propose the CWSL method to address this problem based on Didi ride-sharing comment data. Specifically, an instance reweighting strategy is employed to cope with the severe label noise in comment data, where harmful noisy instances receive small weights. Robust criteria such as AUC, rather than accuracy, together with the validation performance, are optimized to correct for the biased label distribution. Alternating optimization and stochastic gradient methods accelerate the optimization on large-scale data. Experiments on Didi ride-sharing comment data clearly validate the effectiveness of the method. We hope this work may shed some light on applying weakly supervised learning to complex real-world situations.
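The abstract gives no formulas, so the following is only an illustrative sketch of the two ideas it names: instance weights that shrink the influence of suspected noisy labels, and AUC as a criterion that is less sensitive to label imbalance than accuracy. It is not the CWSL method itself. The function name `weighted_auc`, the toy scores, and the choice of weighting each positive-negative pair by the product of its instance weights are all assumptions made for this example.

```python
import numpy as np

def weighted_auc(scores, labels, weights):
    """Pairwise AUC in which each (positive, negative) pair counts
    with weight w_pos * w_neg (an assumed weighting scheme)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    pos, neg = labels == 1, labels == 0
    # diff[i, j] = score of positive i minus score of negative j
    diff = scores[pos][:, None] - scores[neg][None, :]
    # A pair is ranked correctly if the positive scores higher; ties count 0.5
    correct = (diff > 0) + 0.5 * (diff == 0)
    pair_w = weights[pos][:, None] * weights[neg][None, :]
    return float((pair_w * correct).sum() / pair_w.sum())

# Toy data: the third instance is labeled positive but scored low,
# which we pretend is a noisy label (hypothetical example).
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

uniform = weighted_auc(scores, labels, [1, 1, 1, 1, 1, 1])
reweighted = weighted_auc(scores, labels, [1, 1, 0.1, 1, 1, 1])
print(uniform, reweighted)
```

With uniform weights the suspected noisy instance pulls the criterion down; giving it a small weight reduces its influence on the criterion without discarding it.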


Paper Title