Paper Title

Imagination is All You Need! Curved Contrastive Learning for Abstract Sequence Modeling Utilized on Long Short-Term Dialogue Planning

Paper Authors

Justus-Jonas Erker, Stefan Schaffer, Gerasimos Spanakis

Paper Abstract

Inspired by the curvature of space-time (Einstein, 1921), we introduce Curved Contrastive Learning (CCL), a novel representation learning technique for learning the relative turn distance between utterance pairs in multi-turn dialogues. The resulting bi-encoder models can guide transformers as a response ranking model towards a goal in a zero-shot fashion by projecting the goal utterance and the corresponding reply candidates into a latent space. Here the cosine similarity indicates the distance/reachability of a candidate utterance toward the corresponding goal. Furthermore, we explore how these forward-entailing language representations can be utilized for assessing the likelihood of sequences by the entailment strength i.e. through the cosine similarity of its individual members (encoded separately) as an emergent property in the curved space. These non-local properties allow us to imagine the likelihood of future patterns in dialogues, specifically by ordering/identifying future goal utterances that are multiple turns away, given a dialogue context. As part of our analysis, we investigate characteristics that make conversations (un)plannable and find strong evidence of planning capability over multiple turns (in 61.56% over 3 turns) in conversations from the DailyDialog (Li et al., 2017) dataset. Finally, we show how we achieve higher efficiency in sequence modeling tasks compared to previous work thanks to our relativistic approach, where only the last utterance needs to be encoded and computed during inference.
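The zero-shot, goal-directed ranking described above reduces to a simple inference pattern: encode the goal utterance and each reply candidate separately with a bi-encoder, then rank candidates by cosine similarity toward the goal. The sketch below is a minimal illustration using the sentence-transformers library with a generic placeholder checkpoint ("all-MiniLM-L6-v2"); it is not the authors' released CCL model, but it shows the relativistic setup in which only new utterances need to be encoded at inference time.

```python
# Minimal sketch of bi-encoder response ranking toward a goal utterance,
# in the spirit of Curved Contrastive Learning (CCL).
# NOTE: "all-MiniLM-L6-v2" is a generic placeholder bi-encoder, not the
# authors' CCL checkpoint; any turn-distance-trained bi-encoder could be
# swapped in here.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder bi-encoder

goal = "Let's book a table for dinner tonight."
candidates = [
    "Sure, which restaurant did you have in mind?",
    "I watched a great movie yesterday.",
    "The weather has been awful lately.",
]

# Encode the goal and each reply candidate separately; previously seen
# utterances could be cached, so only the newest turn needs encoding.
goal_emb = model.encode(goal, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity acts as a proxy for how reachable the goal is from
# each candidate; rank candidates by descending similarity.
scores = util.cos_sim(cand_embs, goal_emb).squeeze(-1)
for cand, score in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {cand}")
```

The same separate-encoding trick underlies the sequence-likelihood idea in the abstract: because each utterance is embedded independently, the pairwise cosine similarities of a candidate future sequence can be read off without re-encoding the whole dialogue context.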
