Title
Citation Data of Czech Apex Courts
Authors
Abstract
In this paper, we introduce the citation data of the Czech apex courts (the Supreme Court, the Supreme Administrative Court, and the Constitutional Court). The dataset was automatically extracted from CzCDC 1.0, a corpus of Czech court decisions. We obtained the citation data by building a natural language processing pipeline that extracts court decision identifiers. The pipeline consists of (i) a document segmentation model and (ii) a reference recognition model. The extracted dataset was then manually processed to ensure high-quality citation data that can serve as a basis for subsequent qualitative and quantitative analyses. The dataset will be made publicly available.
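To show how the two pipeline stages fit together, here is a minimal sketch. It is not the authors' implementation: both learned models are swapped for hand-written heuristics. Segmentation is a hypothetical split at the first "Odůvodnění" ("Reasoning") heading, so the decision's own identifier in the header is not counted as a citation. Reference recognition is a deliberately incomplete set of regular expressions covering a few Czech docket-number ("spisová značka") formats. The names `segment`, `extract_references` and `DOCKET_RE` are illustrative only.

```python
import re

# Hypothetical, incomplete patterns for Czech docket numbers. The paper uses a
# trained reference recognition model, not these regular expressions.
CONSTITUTIONAL = r"(?:Pl|I{1,3}|IV)\.\s?ÚS\s\d+/\d{2}"       # e.g. "II. ÚS 123/05"
GENERAL = r"\d+\s(?:Cdo|Tdo|Odo|As|Afs|Ads|Azs)\s\d+/\d{4}"  # e.g. "29 Cdo 2398/2013"
DOCKET_RE = re.compile(rf"\b(?:{CONSTITUTIONAL}|{GENERAL})")


def segment(text: str) -> dict[str, str]:
    """Stand-in for the document segmentation model (hypothetical heuristic).

    Everything before the first "Odůvodnění" heading is treated as the header;
    if the heading is missing, the whole text is treated as reasoning.
    """
    marker = text.find("Odůvodnění")
    if marker == -1:
        return {"header": "", "reasoning": text}
    return {"header": text[:marker], "reasoning": text[marker:]}


def extract_references(text: str) -> list[str]:
    """Return docket numbers cited in the reasoning, in order of first occurrence."""
    reasoning = segment(text)["reasoning"]
    found: list[str] = []
    for match in DOCKET_RE.finditer(reasoning):
        # Collapse any whitespace (including non-breaking spaces) to single spaces.
        ref = " ".join(match.group().split())
        if ref not in found:
            found.append(ref)
    return found
```

For a made-up decision text such as `"... sp. zn. 29 Cdo 1000/2015 ... Odůvodnění: ... II. ÚS 123/05 ... 1 Afs 58/2009 ..."`, the sketch returns the two docket numbers from the reasoning and skips the one in the header. A regex stand-in like this misses formats outside its hard-coded list, which is one reason a learned recognition model followed by manual processing is preferable for a high-quality dataset.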