Paper Title

Blocked or Broken? Automatically Detecting When Privacy Interventions Break Websites

Authors

Michael Smith, Peter Snyder, Moritz Haller, Benjamin Livshits, Deian Stefan, Hamed Haddadi

Abstract

A core problem in the development and maintenance of crowd-sourced filter lists is that their maintainers cannot confidently predict whether (and where) a new filter list rule will break websites. This is a result of the enormity of the Web, which prevents filter list authors from broadly understanding the impact of a new blocking rule before they ship it to millions of users. The inability of filter list authors to evaluate the Web compatibility impact of a new rule before shipping it severely reduces the benefits of filter-list-based content blocking: filter lists are both overly conservative (i.e., rules are tailored narrowly to reduce the risk of breaking things) and error-prone (i.e., blocking tools still break large numbers of sites). To scale to the size and scope of the Web, filter list authors need an automated system to detect when a new filter rule breaks websites, before that breakage has a chance to reach end users. In this work, we design and implement the first automated system for predicting when a filter list rule breaks a website. We build a classifier, trained on a dataset generated by a combination of compatibility data from the EasyList project and novel browser instrumentation, and find it is accurate to practical levels (AUC 0.88). Our open source system requires no human interaction when assessing the compatibility risk of a proposed privacy intervention. We also present the 40 page behaviors that most predict breakage in observed websites.
