Paper Title

Measuring Data

Authors

Margaret Mitchell, Alexandra Sasha Luccioni, Nathan Lambert, Marissa Gerchick, Angelina McMillan-Major, Ezinwanne Ozoani, Nazneen Rajani, Tristan Thrush, Yacine Jernite, Douwe Kiela

Abstract

We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets. Similar to an object's height, width, and volume, data measurements quantify different attributes of data along common dimensions that support comparison. Several lines of research have proposed what we refer to as measurements, with differing terminology; we bring some of this work together, particularly in fields of computer vision and language, and build from it to motivate measuring data as a critical component of responsible AI development. Measuring data aids in systematically building and analyzing machine learning (ML) data towards specific goals and gaining better control of what modern ML systems will learn. We conclude with a discussion of the many avenues of future work, the limitations of data measurements, and how to leverage these measurement approaches in research and practice.
