Paper Title
Between Subjectivity and Imposition: Power Dynamics in Data Annotation for Computer Vision
Authors
Abstract
The interpretation of data is fundamental to machine learning. This paper investigates practices of image data annotation as performed in industrial contexts. We define data annotation as a sense-making practice in which annotators assign meaning to data through the use of labels. Previous human-centered investigations have largely focused on annotators' subjectivity as a major cause of biased labels. We propose a wider view of this issue: guided by constructivist grounded theory, we conducted several weeks of fieldwork at two annotation companies. We analyzed which structures, power relations, and naturalized impositions shape the interpretation of data. Our results show that the work of annotators is profoundly informed by the interests, values, and priorities of other actors above their station. Arbitrary classifications are vertically imposed on annotators and, through them, on data. This imposition is largely naturalized. Assigning meaning to data is often presented as a technical matter. This paper shows that it is, in fact, an exercise of power with multiple implications for individuals and society.