Paper Title
On the Effectiveness of Sampled Softmax Loss for Item Recommendation
Paper Authors
Paper Abstract
The learning objective plays a fundamental role in building a recommender system. Most methods routinely adopt either a pointwise or pairwise loss to train the model parameters, while rarely paying attention to the softmax loss due to its computational complexity when scaling up to large datasets, or its intractability for streaming data. The sampled softmax (SSM) loss emerges as an efficient substitute for the softmax loss. Its special case, the InfoNCE loss, has been widely used in self-supervised learning and has exhibited remarkable performance in contrastive learning. Nonetheless, little recommendation work uses the SSM loss as the learning objective. Worse still, to the best of our knowledge, none of it thoroughly explores the properties of the SSM loss or answers ``Does the SSM loss suit item recommendation?'' and ``What are the conceptual advantages of the SSM loss, as compared with the prevalent losses?''. In this work, we aim to offer a better understanding of SSM for item recommendation. Specifically, we first theoretically reveal three model-agnostic advantages: (1) mitigating popularity bias; (2) mining hard negative samples; and (3) maximizing the ranking metric. However, based on our empirical studies, we recognize that the default choice of the cosine similarity function in SSM limits its ability to learn the magnitudes of representation vectors. As such, combining SSM with models that also fall short in adjusting magnitudes may result in poor representations. Going one step further, we provide mathematical proof that the message passing scheme in graph convolutional networks can adjust representation magnitude according to node degree, which naturally compensates for this shortcoming of SSM. Extensive experiments on four benchmark datasets justify our analyses, demonstrating the superiority of SSM for item recommendation. Our implementations are available in both TensorFlow and PyTorch.
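For concreteness, one common form of the SSM objective, for a user $u$, a positive item $i$, and a set $\mathcal{N}$ of sampled negatives, with similarity $s(\cdot,\cdot)$ and temperature $\tau$ (our notation, not taken from the paper), is

$$\mathcal{L}_{\mathrm{SSM}} = -\log \frac{\exp\big(s(u, i)/\tau\big)}{\exp\big(s(u, i)/\tau\big) + \sum_{j \in \mathcal{N}} \exp\big(s(u, j)/\tau\big)}.$$

Below is a minimal PyTorch sketch of this loss with the cosine similarity the abstract refers to; the function name `sampled_softmax_loss`, the batch layout, and the default `tau` are our own illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(user_emb, pos_item_emb, neg_item_embs, tau=0.1):
    """Sampled softmax (SSM) loss with cosine similarity (illustrative sketch).

    user_emb:      (B, d)    user representations
    pos_item_emb:  (B, d)    positive item representations
    neg_item_embs: (B, N, d) N sampled negative items per user
    tau:           temperature (hypothetical default)
    """
    # Cosine similarity = dot product of L2-normalized vectors.
    # Normalization discards vector magnitudes, which is the
    # limitation the abstract points out.
    u = F.normalize(user_emb, dim=-1)
    p = F.normalize(pos_item_emb, dim=-1)
    n = F.normalize(neg_item_embs, dim=-1)

    pos_logit = (u * p).sum(dim=-1, keepdim=True) / tau   # (B, 1)
    neg_logits = torch.einsum('bd,bnd->bn', u, n) / tau   # (B, N)

    # InfoNCE-style objective: classify the positive (index 0)
    # against the sampled negatives.
    logits = torch.cat([pos_logit, neg_logits], dim=1)    # (B, 1+N)
    labels = torch.zeros(logits.size(0), dtype=torch.long,
                         device=logits.device)
    return F.cross_entropy(logits, labels)
```

Note how `F.normalize` fixes every representation to unit length: the loss can only shape directions, not magnitudes, which is exactly the shortcoming the abstract argues is compensated by degree-dependent message passing in graph convolutional networks.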