Paper Title
Neural Approaches for Data Driven Dependency Parsing in Sanskrit

Authors

Amrith Krishna, Ashim Gupta, Deepak Garasangi, Jivnesh Sandhan, Pavankumar Satuluri, Pawan Goyal

Abstract
Data-driven approaches to dependency parsing have been of great interest in Natural Language Processing for the past couple of decades. However, Sanskrit still lacks a robust, purely data-driven dependency parser, with the possible exception of Krishna (2019). This can primarily be attributed to the lack of task-specific labelled data and to the morphologically rich nature of the language. In this work, we evaluate four different data-driven machine learning models, originally proposed for other languages, and compare their performance on Sanskrit data. We experiment with two graph-based and two transition-based parsers. We compare the performance of each of the models in a low-resource setting, with 1,500 sentences for training. Further, since our focus is on the learning power of each model, we do not incorporate any Sanskrit-specific features explicitly into the models, and instead use the default settings from each of the respective papers to obtain the feature functions. We analyse the performance of the parsers using both an in-domain and an out-of-domain test dataset. We also investigate the impact of the word order in which sentences are provided as input to these systems, by parsing verses and their corresponding prose-order (anvaya) sentences.
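Comparisons between dependency parsers such as those described above are conventionally reported as unlabelled and labelled attachment scores (UAS/LAS): the fraction of tokens whose predicted head (and, for LAS, also the dependency label) matches the gold tree. A minimal sketch of this standard metric; the function name and data layout are illustrative, not taken from the paper:

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) for one sentence.

    gold, pred: lists of (head_index, relation_label) tuples,
    one per token, in the same token order.
    """
    assert len(gold) == len(pred), "token sequences must align"
    n = len(gold)
    # UAS: head index matches, label ignored.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    # LAS: both head index and relation label match.
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las
```

For example, if a parser attaches every token to the correct head but mislabels one of three relations, UAS is 1.0 while LAS drops to 2/3.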