Clustering-based Imputation for Dropout Buyers in Large-scale Online Experimentation

Paper title

Clustering-based Imputation for Dropout Buyers in Large-scale Online Experimentation

Authors

Shen, Sumin; Mao, Huiying; Zhang, Zezhong; Chen, Zili; Nie, Keyu; Deng, Xinwei

Abstract

In online experimentation, appropriate metrics (e.g., purchases) provide strong evidence to support hypotheses and enhance the decision-making process. However, incomplete metrics occur frequently in online experimentation, leaving far less data available than planned for online experiments (e.g., A/B tests). In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. To analyze incomplete metrics, we propose a clustering-based imputation method using $k$-nearest neighbors. The proposed method considers both experiment-specific features and users' activities along their shopping paths, allowing different imputation values for different users. To facilitate efficient imputation of large-scale data sets in online experimentation, the proposed method combines stratification and clustering. Its performance is compared with several conventional methods in both simulation studies and a real online experiment at eBay.
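
The abstract gives no algorithmic detail, so the following is only a rough sketch of one possible reading of "stratification plus clustering plus $k$-nearest neighbors", not the authors' method. It assumes users are split by a stratum label, complete users in each stratum are grouped with a plain k-means, and each dropout buyer receives the average metric of the $k$ nearest cluster centroids. The field names (`id`, `stratum`, `features`, `metric`), the function names, and the toy data are all hypothetical.

```python
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, n_clusters, n_iter=20):
    """Plain Lloyd's k-means; init uses the first n_clusters points (assumption)."""
    centroids = [tuple(p) for p in points[:n_clusters]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def impute_dropouts(users, n_clusters=2, k=1):
    """Return {user_id: imputed_metric} for users whose metric is None."""
    by_stratum = defaultdict(list)
    for u in users:
        by_stratum[u["stratum"]].append(u)

    imputed = {}
    for stratum_users in by_stratum.values():
        complete = [u for u in stratum_users if u["metric"] is not None]
        dropouts = [u for u in stratum_users if u["metric"] is None]
        if not complete or not dropouts:
            continue
        pts = [u["features"] for u in complete]
        centroids, labels = kmeans(pts, min(n_clusters, len(pts)))
        # Summarize each non-empty cluster by its centroid and mean observed metric.
        summaries = []
        for c, centroid in enumerate(centroids):
            vals = [u["metric"] for u, lab in zip(complete, labels) if lab == c]
            if vals:
                summaries.append((centroid, sum(vals) / len(vals)))
        # Impute each dropout from its k nearest cluster summaries.
        for u in dropouts:
            nearest = sorted(summaries, key=lambda s: sq_dist(u["features"], s[0]))[:k]
            imputed[u["id"]] = sum(m for _, m in nearest) / len(nearest)
    return imputed


# Toy data (invented for illustration only).
users = [
    {"id": "a", "stratum": "mobile", "features": (0.0, 0.0), "metric": 10.0},
    {"id": "b", "stratum": "mobile", "features": (0.0, 1.0), "metric": 20.0},
    {"id": "c", "stratum": "mobile", "features": (10.0, 10.0), "metric": 100.0},
    {"id": "d", "stratum": "mobile", "features": (10.0, 11.0), "metric": 120.0},
    {"id": "e", "stratum": "mobile", "features": (9.0, 10.0), "metric": None},
    {"id": "f", "stratum": "desktop", "features": (1.0, 1.0), "metric": 5.0},
    {"id": "g", "stratum": "desktop", "features": (1.0, 2.0), "metric": None},
]
imputed = impute_dropouts(users)
print(imputed)
```

Searching over a handful of cluster summaries instead of every complete user is what would keep such a scheme cheap at scale; how the paper actually defines strata, clusters, and features is not stated in the abstract.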
