Paper Title
MANTIS at TSAR-2022 Shared Task: Improved Unsupervised Lexical Simplification with Pretrained Encoders
Paper Authors
Paper Abstract
In this paper we present our contribution to the TSAR-2022 Shared Task on Lexical Simplification of the EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability. Our approach builds on and extends LSBert, an unsupervised lexical simplification system with pretrained encoders, in the following ways: for the subtask of simplification candidate selection, it utilizes a RoBERTa transformer language model and expands the size of the generated candidate list; for the subsequent substitution ranking, it introduces a new feature weighting scheme and adopts a candidate filtering method based on textual entailment to maximize semantic similarity between the target word and its simplification. Our best-performing system improves over LSBert by 5.9% in accuracy and achieves second place out of 33 ranked solutions.
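To make the candidate-selection step concrete, below is a minimal sketch of masked-language-model substitution candidate generation with RoBERTa, in the spirit of the LSBert-style pipeline the abstract describes. It assumes the Hugging Face transformers library; the `roberta-base` checkpoint, the exact prompt construction, and the candidate list size of 20 are illustrative assumptions, not the authors' actual configuration.

```python
# Sketch: generate substitution candidates for a target word by masking it
# and querying a RoBERTa masked language model. Assumptions (not from the
# paper): Hugging Face transformers, roberta-base, top_k=20.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

def candidates(sentence: str, target: str, top_k: int = 20) -> list[str]:
    """Return up to top_k substitution candidates for `target` in `sentence`."""
    # LSBert-style input: the original sentence followed by a copy of it in
    # which the target word is replaced by the model's mask token.
    masked = sentence.replace(target, fill_mask.tokenizer.mask_token, 1)
    prompt = f"{sentence} {masked}"
    preds = fill_mask(prompt, top_k=top_k)
    # Drop candidates that merely reproduce the target word itself.
    return [p["token_str"].strip() for p in preds
            if p["token_str"].strip().lower() != target.lower()]

print(candidates("The cat perched on the mat.", "perched"))
```

The subsequent ranking and entailment-based filtering steps described in the abstract would then score and prune this list; they are not shown here.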