Paper Title
Efficient On-Device Session-Based Recommendation
Paper Authors
Paper Abstract
On-device session-based recommendation systems have been attracting increasing attention on account of their low energy/resource consumption and privacy protection while providing promising recommendation performance. To fit powerful neural session-based recommendation models into resource-constrained mobile devices, tensor-train decomposition and its variants have been widely applied to reduce the memory footprint by decomposing the embedding table into smaller tensors, showing great potential in compressing recommendation models. However, these model compression techniques significantly increase the local inference time due to the complex process of generating index lists and the series of tensor multiplications required to form item embeddings, so the resultant on-device recommender fails to provide real-time responses and recommendations. To improve online recommendation efficiency, we propose to learn compositional encoding-based compact item representations. Specifically, each item is represented by a compositional code that consists of several codewords, and we learn embedding vectors to represent each codeword instead of each item. The composition of the codeword embedding vectors from different embedding matrices (i.e., codebooks) then forms the item embedding. Since the codebooks can be extremely small, the recommender model is able to fit in resource-constrained devices and can meanwhile keep the codebooks on device for fast local inference. Besides, to prevent the loss of model capacity caused by compression, we propose a bidirectional self-supervised knowledge distillation framework. Extensive experimental results on two benchmark datasets demonstrate that, compared with existing methods, the proposed on-device recommender not only achieves an 8x inference speedup with a large compression ratio but also shows superior recommendation performance.
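To make the abstract's core mechanism concrete, below is a minimal sketch of compositional encoding-based item embeddings: a single large item embedding table is replaced by a few small codebooks, and each item's embedding is composed from one codeword vector per codebook. The class name, shapes, randomly initialized placeholder codes, and the summation composition operator are illustrative assumptions, not the paper's exact design (in the paper, the discrete codes themselves are learned).

```python
import torch
import torch.nn as nn

class CompositionalItemEmbedding(nn.Module):
    """Sketch of compositional encoding-based item embeddings.

    Each item is assigned a discrete code of `num_codebooks` codeword
    indices; its embedding is the composition (here: a sum) of the
    selected codeword vectors, one drawn from each codebook.
    """

    def __init__(self, num_items, num_codebooks=4, codebook_size=64, dim=128):
        super().__init__()
        # Per-item discrete codes: one codeword index per codebook.
        # Random placeholders here; the paper learns these assignments.
        self.register_buffer(
            "codes", torch.randint(0, codebook_size, (num_items, num_codebooks))
        )
        # Codebooks: num_codebooks small embedding matrices replace one
        # large (num_items x dim) table.
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_codebooks)
        )

    def forward(self, item_ids):
        codes = self.codes[item_ids]  # (batch, num_codebooks)
        # Compose codeword vectors across codebooks (summation is one
        # simple choice of composition operator).
        return sum(
            book(codes[:, m]) for m, book in enumerate(self.codebooks)
        )  # (batch, dim)

# Usage: 100k items need only 4 x 64 codeword vectors (32,768 parameters)
# plus integer codes, versus a full 100,000 x 128 embedding table.
emb = CompositionalItemEmbedding(num_items=100_000)
vecs = emb(torch.tensor([3, 42, 99_999]))
print(vecs.shape)  # torch.Size([3, 128])
```

This also illustrates why the approach avoids the inference-time cost of tensor-train decomposition mentioned above: forming an item embedding takes only a few table lookups and an elementwise composition, rather than index-list generation and a chain of tensor multiplications.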