Paper title
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs
Paper authors
Paper abstract
This article provides the motivation and overview of the Collective Knowledge framework (CK or cKnowledge). The CK concept is to decompose research projects into reusable components that encapsulate research artifacts and provide unified application programming interfaces (APIs), command-line interfaces (CLIs), meta descriptions and common automation actions for related artifacts. The CK framework is used to organize and manage research projects as a database of such components. Inspired by the USB "plug and play" approach for hardware, CK also helps to assemble portable workflows that can automatically plug in compatible components from different users and vendors (models, datasets, frameworks, compilers, tools). Such workflows can build and run algorithms on different platforms and environments in a unified way using the universal CK program pipeline with software detection plugins and the automatic installation of missing packages. This article presents a number of industrial projects in which the modular CK approach was successfully validated in order to automate benchmarking, auto-tuning and co-design of efficient software and hardware for machine learning (ML) and artificial intelligence (AI) in terms of speed, accuracy, energy, size and various costs. The CK framework also helped to automate the artifact evaluation process at several computer science conferences as well as to make it easier to reproduce, compare and reuse research techniques from published papers, deploy them in production, and automatically adapt them to continuously changing datasets, models and systems. The long-term goal is to accelerate innovation by connecting researchers and practitioners to share and reuse all their knowledge, best practices, artifacts, workflows and experimental results in a common, portable and reproducible format at https://cKnowledge.io .
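The abstract describes two ideas: research artifacts wrapped as components that share a common API and meta description, and workflows that are assembled by plugging in whichever compatible components are available, installing any that are missing. Below is a minimal, hypothetical Python sketch of those two ideas under stated assumptions. The names `Component`, `ComponentDB` and `assemble_workflow`, the tag-subset matching rule, and the `fake_install` stand-in are illustrative only. They are not the actual CK API, which the framework exposes through its own CLI and APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Component:
    """Hypothetical reusable component: an artifact plus meta description and actions."""
    name: str
    tags: Set[str]  # meta description used to decide compatibility
    meta: Dict[str, str] = field(default_factory=dict)
    actions: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def run(self, action: str, **kwargs) -> str:
        # Common API: every component exposes its automation actions by name.
        if action not in self.actions:
            raise KeyError(f"{self.name} has no action '{action}'")
        return self.actions[action](**kwargs)


class ComponentDB:
    """Hypothetical in-memory database of components, searchable by tags."""

    def __init__(self) -> None:
        self._items: List[Component] = []

    def add(self, component: Component) -> None:
        self._items.append(component)

    def find(self, required: Set[str]) -> Optional[Component]:
        # Assumed matching rule: the first component whose tags include all required tags.
        for c in self._items:
            if required <= c.tags:
                return c
        return None


def assemble_workflow(db: ComponentDB,
                      requirements: Dict[str, Set[str]],
                      install: Callable[[Set[str]], Component]) -> Dict[str, Component]:
    """Plug a compatible component into each workflow slot, installing missing ones."""
    plugged: Dict[str, Component] = {}
    for slot, tags in requirements.items():
        comp = db.find(tags)
        if comp is None:
            comp = install(tags)  # stand-in for automatic installation of a missing package
            db.add(comp)
        plugged[slot] = comp
    return plugged


# Usage with hypothetical components.
db = ComponentDB()
db.add(Component("resnet50-onnx", {"model", "image-classification", "onnx"},
                 actions={"describe": lambda: "ResNet-50 (ONNX)"}))
db.add(Component("imagenet-val-500", {"dataset", "imagenet"}))


def fake_install(tags: Set[str]) -> Component:
    # Hypothetical installer: registers a placeholder component for the requested tags.
    return Component("installed:" + "-".join(sorted(tags)), set(tags))


wf = assemble_workflow(db, {"model": {"model", "onnx"},
                            "dataset": {"dataset", "imagenet"},
                            "compiler": {"compiler", "gcc"}}, fake_install)
print({slot: c.name for slot, c in wf.items()})
# prints {'model': 'resnet50-onnx', 'dataset': 'imagenet-val-500', 'compiler': 'installed:compiler-gcc'}
```

The workflow only states the tags it needs for each slot, so a component from another user or vendor can replace one already in the database without any change to the workflow. This is the "plug and play" property the abstract attributes to CK, reduced here to a toy matching rule.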