Paper Title

Hyperbolic Relevance Matching for Neural Keyphrase Extraction

Paper Authors

Song, Mingyang; Feng, Yi; Jing, Liping

Paper Abstract

Keyphrase extraction is a fundamental task in natural language processing and information retrieval that aims to extract a set of phrases with important information from a source document. Identifying important keyphrases is the central component of the keyphrase extraction task, and its main challenge is how to represent information comprehensively and discriminate importance accurately. In this paper, to address these issues, we design a new hyperbolic matching model (HyperMatch) to represent phrases and documents in the same hyperbolic space and explicitly estimate the phrase-document relevance via the Poincaré distance as the importance score of each phrase. Specifically, to capture hierarchical syntactic and semantic structure information, HyperMatch takes advantage of the hidden representations in multiple layers of RoBERTa and integrates them into word embeddings via an adaptive mixing layer. Meanwhile, considering the hierarchical structure hidden in the document, HyperMatch embeds both phrases and documents in the same hyperbolic space via a hyperbolic phrase encoder and a hyperbolic document encoder. This strategy can further enhance the estimation of phrase-document relevance due to the good properties of hyperbolic space. In this setting, keyphrase extraction can be taken as a matching problem and effectively implemented by minimizing a hyperbolic margin-based triplet loss. Extensive experiments are conducted on six benchmarks and demonstrate that HyperMatch outperforms the state-of-the-art baselines.
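
The abstract names three concrete ingredients: an adaptive mixing layer over the hidden states of multiple RoBERTa layers, phrase-document relevance measured by the Poincaré distance in hyperbolic space, and a hyperbolic margin-based triplet loss. The PyTorch sketch below is only a minimal illustration of those pieces under common conventions (the unit Poincaré ball, ELMo-style scalar layer mixing, an assumed margin of 0.3); it is not the authors' implementation, and every function, class, and parameter name here is invented for the example.

```python
import torch
import torch.nn as nn


def expmap0(v: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Exponential map at the origin of the unit Poincaré ball:
    exp_0(v) = tanh(||v||) * v / ||v||, mapping a Euclidean vector into the ball."""
    norm = v.norm(dim=-1, keepdim=True).clamp(min=eps)
    return torch.tanh(norm) * v / norm


def poincare_distance(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Geodesic distance in the Poincaré ball:
    d(u, v) = arccosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    sq_diff = (u - v).pow(2).sum(dim=-1)
    u_sq = u.pow(2).sum(dim=-1).clamp(max=1 - eps)
    v_sq = v.pow(2).sum(dim=-1).clamp(max=1 - eps)
    x = 1 + 2 * sq_diff / ((1 - u_sq) * (1 - v_sq))
    return torch.acosh(x.clamp(min=1 + eps))


class AdaptiveMixingLayer(nn.Module):
    """Weights the hidden states of all encoder layers with learned softmax
    coefficients and sums them into a single word-embedding tensor."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, seq_len, hidden)
        w = torch.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        return (w * layer_states).sum(dim=0)  # (batch, seq_len, hidden)


def hyperbolic_triplet_loss(doc, pos_phrase, neg_phrase, margin: float = 0.3):
    """Keyphrases (pos) should lie closer to the document than non-keyphrases
    (neg) by at least `margin`, measured by the Poincaré distance."""
    pos_d = poincare_distance(doc, pos_phrase)
    neg_d = poincare_distance(doc, neg_phrase)
    return torch.relu(margin + pos_d - neg_d).mean()


if __name__ == "__main__":
    batch, seq_len, hidden, num_layers = 2, 8, 16, 13

    # Stand-in for RoBERTa hidden states from all layers, mixed into word embeddings.
    layer_states = torch.randn(num_layers, batch, seq_len, hidden)
    words = AdaptiveMixingLayer(num_layers)(layer_states)

    # Stand-in phrase/document encoders: mean-pool then project into the ball.
    doc = expmap0(words.mean(dim=1) * 0.1)
    pos = expmap0(torch.randn(batch, hidden) * 0.1)
    neg = expmap0(torch.randn(batch, hidden) * 0.1)

    print("relevance (Poincaré distance):", poincare_distance(doc, pos))
    print("triplet loss:", hyperbolic_triplet_loss(doc, pos, neg).item())
```

The design point worth noting is that relevance is a geodesic distance: training only needs to pull gold keyphrases closer to the document representation than other candidate phrases by a margin, which is exactly what the triplet loss above encodes.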
