Paper Title
StyLEx: Explaining Style Using Human Lexical Annotations
Paper Authors
Paper Abstract
Large pre-trained language models have achieved impressive results on various style classification tasks, but they often learn spurious domain-specific words to make predictions (Hayati et al., 2021). While human explanations highlight stylistic tokens as important features for this task, we observe that model explanations often do not align with them. To tackle this issue, we introduce StyLEx, a model that learns from human-annotated explanations of stylistic features and jointly learns to perform the task and predict these features as model explanations. Our experiments show that StyLEx can provide human-like stylistic lexical explanations without sacrificing sentence-level style prediction performance on both in-domain and out-of-domain datasets. Explanations from StyLEx show significant improvements in explanation metrics (sufficiency, plausibility) when evaluated against human annotations, and they are also more understandable to human judges than the widely used saliency-based explanation baseline.
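The joint objective described in the abstract (a sentence-level style prediction head trained together with a token-level head that predicts human-annotated stylistic cues) can be illustrated with a minimal sketch. This is an assumption-laden toy implementation, not the paper's actual architecture: the class name JointStyleExplainer, the BiLSTM encoder, the mean-pooling step, and the trade-off weight lam are all illustrative choices.

```python
import torch
import torch.nn as nn

class JointStyleExplainer(nn.Module):
    """Hypothetical sketch: joint sentence-level style classification and
    token-level stylistic-cue prediction (not the exact StyLEx model)."""

    def __init__(self, vocab_size=30522, hidden=256, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.sent_head = nn.Linear(2 * hidden, num_labels)   # sentence-level style label
        self.token_head = nn.Linear(2 * hidden, 2)            # is this token a stylistic cue?

    def forward(self, input_ids):
        h, _ = self.encoder(self.embed(input_ids))             # (batch, seq_len, 2 * hidden)
        sent_logits = self.sent_head(h.mean(dim=1))            # pooled sentence representation
        token_logits = self.token_head(h)                      # per-token explanation logits
        return sent_logits, token_logits


def joint_loss(sent_logits, token_logits, style_labels, token_labels, lam=0.5):
    """Weighted sum of the task loss and the explanation loss.
    `lam` is an illustrative weight, not a value from the paper."""
    ce = nn.CrossEntropyLoss()
    task_loss = ce(sent_logits, style_labels)
    expl_loss = ce(token_logits.reshape(-1, 2), token_labels.reshape(-1))
    return task_loss + lam * expl_loss


if __name__ == "__main__":
    model = JointStyleExplainer()
    ids = torch.randint(0, 30522, (4, 12))      # toy batch: 4 sentences, 12 tokens each
    style = torch.randint(0, 2, (4,))            # gold sentence-level style labels
    cues = torch.randint(0, 2, (4, 12))          # gold human-annotated stylistic-cue tags
    loss = joint_loss(*model(ids), style, cues)
    loss.backward()
    print(float(loss))
```

At inference time, the per-token logits from token_head would serve as the model's lexical explanation, while sent_head provides the style prediction; in the sketch above the two heads share the same encoder, which is the simplest way to realize the "jointly learns" phrasing in the abstract.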