PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs (Journal Version)

Paper Title


PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs (Journal Version)

Authors

Cui, Hao; Trimananda, Rahmadi; Jordan, Scott; Markopoulou, Athina

Abstract


Privacy policies disclose how an organization collects and handles personal information. Recent work has made progress in leveraging natural language processing (NLP) to automate privacy policy analysis and extract data collection statements from different sentences, considered in isolation from each other. In this paper, we view and analyze, for the first time, the entire text of a privacy policy in an integrated way. In terms of methodology: (1) we define PoliGraph, a type of knowledge graph that captures statements in a policy as relations between different parts of the text; and (2) we revisit the notion of ontologies, previously defined in heuristic ways, to capture subsumption relations between terms. We make a clear distinction between local and global ontologies to capture the context of individual policies, application domains, and privacy laws. We develop PoliGrapher, an NLP tool to automatically extract PoliGraph from the text using linguistic analysis. Using a public dataset for evaluation, we show that PoliGrapher identifies 40% more collection statements than prior state-of-the-art, with 97% precision. In terms of applications, PoliGraph enables automated analysis of a corpus of policies and allows us to: (1) reveal common patterns in the texts across different policies, and (2) assess the correctness of the terms as defined within a policy. We also apply PoliGraph to: (3) detect contradictions in a policy, where we show false alarms by prior work, and (4) analyze the consistency of policies and network traffic, where we identify significantly more clear disclosures than prior work. Finally, leveraging the capabilities of the emerging large language models (LLMs), we also present PoliGrapher-LM, a tool that uses LLM prompting instead of NLP linguistic analysis, to extract PoliGraph from the policy text, and we show that it further improves coverage.
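To make the graph idea concrete, here is a minimal Python sketch. It assumes a simplified model in which a policy is reduced to two edge types: COLLECT (entity to data type) and SUBSUME (broader term to narrower term). The class `PolicyGraph`, its methods, and the dict format used for the global ontology are hypothetical illustrations, not PoliGrapher's actual API. The sketch only shows how subsumption lets a broad statement such as "we collect device information" imply the collection of a narrower data type. It also shows how policy-specific (local) definitions can be combined with shared (global) ones.

```python
from collections import deque


class PolicyGraph:
    """Hypothetical, minimal PoliGraph-style knowledge graph (not the authors' code).

    COLLECT edges: entity -> data types the policy says it collects.
    SUBSUME edges: broader term -> narrower terms, as defined inside this one
    policy (standing in for a "local ontology").
    """

    def __init__(self):
        self.collect = {}  # entity -> set of data types
        self.subsume = {}  # broader term -> set of narrower terms

    def add_collect(self, entity, data_type):
        self.collect.setdefault(entity, set()).add(data_type)

    def add_subsume(self, broader, narrower):
        self.subsume.setdefault(broader, set()).add(narrower)

    def narrower_terms(self, term, global_ontology=None):
        """Return every term reachable from `term` via SUBSUME edges, itself included.

        `global_ontology` (assumed format: broader -> set of narrower) stands in for
        domain- or law-level subsumptions shared across policies.
        """
        global_ontology = global_ontology or {}
        seen = {term}
        queue = deque([term])
        while queue:  # breadth-first walk over local and global SUBSUME edges
            current = queue.popleft()
            children = self.subsume.get(current, set()) | global_ontology.get(current, set())
            for child in children:
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen

    def discloses(self, entity, data_type, global_ontology=None):
        """True if `entity` collects `data_type` directly or via a broader term."""
        return any(
            data_type in self.narrower_terms(term, global_ontology)
            for term in self.collect.get(entity, set())
        )


if __name__ == "__main__":
    g = PolicyGraph()
    g.add_collect("we", "device information")
    g.add_subsume("device information", "IP address")  # defined in this policy

    print(g.discloses("we", "IP address"))  # True via the local ontology
    print(g.discloses("we", "advertising ID"))  # False: no edge covers it
    global_onto = {"device information": {"advertising ID"}}  # hypothetical shared ontology
    print(g.discloses("we", "advertising ID", global_onto))  # True once global edges are added
```

Keeping the local SUBSUME edges on the graph while passing the global ontology in separately mirrors the distinction the abstract draws between definitions specific to one policy and those shared across an application domain or privacy law. A real extractor must derive these edges from the policy text, which this sketch does not attempt.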
