Paper Title

Evaluating Language Tools for Fifteen EU-official Under-resourced Languages

Authors

Diego Alves, Gaurish Thakkar, Marko Tadić

Abstract

This article presents the results of an evaluation campaign of language tools available for fifteen EU-official under-resourced languages. The evaluation was conducted within the MSC ITN CLEOPATRA action, which aims at building cross-lingual event-centric knowledge processing on top of linguistic processing chains (LPCs) for at least 24 EU-official languages. In this campaign, we concentrated on three existing NLP platforms (Stanford CoreNLP, NLP Cube, UDPipe) that all provide models for under-resourced languages, and in this first run we covered the 15 under-resourced languages for which models were available. We present the design of the evaluation campaign, report the results, and discuss them. We treated differences within a single percentage point between the reported results and our test results as within acceptable tolerance, and considered such results reproducible. However, for a number of languages our results are below those reported in the literature, and in some cases our results are better than those reported previously. The evaluation of NERC systems was particularly problematic. One reason is the absence of a universally or cross-lingually applicable named-entity classification scheme that would serve the NERC task across languages, analogous to the Universal Dependencies scheme in the parsing task. Building such a scheme has become one of our future research directions.
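
The one-percentage-point tolerance described in the abstract can be illustrated with a minimal sketch. The function name `classify_result`, the inclusive boundary at exactly one point, and the label strings are assumptions made for illustration; the abstract does not specify how the comparison was implemented, and the scores in the usage lines are hypothetical, not results from the paper.

```python
def classify_result(reported: float, measured: float, tolerance: float = 1.0) -> str:
    """Compare a reported score with a re-measured score, both in percent.

    Assumption: a difference of up to `tolerance` percentage points
    (inclusive) counts as reproducible; the abstract only says
    "within a single percentage point".
    """
    diff = measured - reported
    if abs(diff) <= tolerance:
        return "reproducible"
    # Outside the tolerance band: report the direction of the deviation.
    return "better than reported" if diff > 0 else "below reported"


# Hypothetical scores, for illustration only.
print(classify_result(90.0, 90.5))
print(classify_result(90.0, 88.0))
print(classify_result(90.0, 92.5))
```

Keeping the direction of the deviation as a separate label mirrors the three outcomes the abstract distinguishes: results that match, results that fall below the literature, and results that exceed it.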
