Validity Assessment of Legal Will Statements as Natural Language Inference

Paper title

Validity Assessment of Legal Will Statements as Natural Language Inference

Authors

Kwak, Alice Saebom; Israelsen, Jacob O.; Morrison, Clayton T.; Bambauer, Derek E.; Surdeanu, Mihai

Abstract

This work introduces a natural language inference (NLI) dataset that focuses on the validity of statements in legal wills. The dataset is unique because (a) each entailment decision requires three inputs: the statement from the will, the law, and the conditions that hold at the time of the testator's death; and (b) the included texts are longer than those in current NLI datasets. We trained eight neural NLI models on this dataset. All of the models achieve more than 80% macro F1 and accuracy, which indicates that neural approaches can handle this task reasonably well. However, group accuracy, a stricter evaluation measure computed by treating the group of positive and negative examples generated from the same statement as a single unit, is in the mid-80s at best, which suggests that the models' understanding of the task remains superficial. Further ablation analyses and explanation experiments indicate that all three text segments are used for prediction, but that some decisions rely on semantically irrelevant tokens. This indicates that overfitting on these longer texts likely occurs, and that additional research is required before this task can be considered solved.
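
The abstract describes group accuracy only in words, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a group (all examples generated from the same will statement) counts as correct only when every example in it is predicted correctly. The function names, group identifiers, and label strings are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def accuracy(gold, pred):
    """Plain per-example accuracy."""
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def group_accuracy(group_ids, gold, pred):
    """Fraction of groups in which every example is predicted correctly.

    Assumption: a group is scored as a single unit and is correct only
    if all of its examples are correct.
    """
    all_correct = defaultdict(lambda: True)
    for gid, g, p in zip(group_ids, gold, pred):
        all_correct[gid] = all_correct[gid] and (g == p)
    if not all_correct:
        return 0.0
    return sum(all_correct.values()) / len(all_correct)


# Hypothetical toy data: two statements, three examples each.
group_ids = ["s1", "s1", "s1", "s2", "s2", "s2"]
gold = ["support", "refute", "unrelated", "support", "refute", "unrelated"]
pred = ["support", "refute", "unrelated", "support", "support", "unrelated"]

per_example = accuracy(gold, pred)               # 5 of 6 examples correct
per_group = group_accuracy(group_ids, gold, pred)  # 1 of 2 groups fully correct
```

In this toy case a single wrong prediction lowers per-example accuracy only slightly but invalidates its whole group, which is why the group-level measure is the stricter of the two.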
