Paper title
pyCANON: A Python library to check the level of anonymity of a dataset
Paper authors
Paper abstract
Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymization techniques: k-anonymity, ($α$,k)-anonymity, $\ell$-diversity, entropy $\ell$-diversity, recursive (c,$\ell$)-diversity, basic $β$-likeness, enhanced $β$-likeness, t-closeness and $δ$-disclosure privacy. For the case of more than one sensitive attribute, two approaches are proposed for evaluating these techniques. The main strength of this library is that it produces a full report of the parameters fulfilled for each of the techniques mentioned above, requiring as input only the set of quasi-identifiers and the set of sensitive attributes. We present the methods implemented together with the attacks they prevent, a description of the library, usage examples of the different functions, and the impact and possible applications that can be developed. Finally, some aspects that could be incorporated in future updates are proposed.
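To make concrete what "checking the level of anonymity" means for the two simplest techniques listed, the following minimal sketch computes the k of k-anonymity and the $\ell$ of (distinct) $\ell$-diversity for a toy table. It is an illustration in plain Python, not pyCANON's API: the function names `equivalence_classes`, `k_anonymity` and `l_diversity`, the column names, and the data are all assumptions made for this example.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows (dicts) that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[qi] for qi in quasi_identifiers)
        classes[key].append(row)
    return classes


def k_anonymity(rows, quasi_identifiers):
    """Largest k such that every equivalence class holds at least k rows."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in classes.values())


def l_diversity(rows, quasi_identifiers, sensitive_attribute):
    """Largest l such that every class has at least l distinct sensitive values."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(
        len({row[sensitive_attribute] for row in group})
        for group in classes.values()
    )


# Hypothetical, already generalized toy table (illustrative values only).
rows = [
    {"age": "30-39", "zip": "390**", "disease": "flu"},
    {"age": "30-39", "zip": "390**", "disease": "asthma"},
    {"age": "30-39", "zip": "390**", "disease": "flu"},
    {"age": "40-49", "zip": "391**", "disease": "diabetes"},
    {"age": "40-49", "zip": "391**", "disease": "flu"},
]

k = k_anonymity(rows, ["age", "zip"])             # smallest class has 2 rows
l = l_diversity(rows, ["age", "zip"], "disease")  # each class has 2 distinct diseases
print(f"k = {k}, l = {l}")
```

As in the abstract, the only inputs needed are the set of quasi-identifiers and the sensitive attribute; the remaining techniques (entropy $\ell$-diversity, t-closeness, and so on) would compare the distribution of sensitive values within each equivalence class rather than simple counts.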