Title
DataFed: Towards Reproducible Research via Federated Data Management
Authors
Abstract
The increasingly collaborative, globalized nature of scientific research, combined with the need to share data and the explosion in data volumes, creates an urgent need for a scientific data management system (SDMS). An SDMS presents a logical and holistic view of data that greatly simplifies and empowers data organization, curation, searching, sharing, dissemination, etc. We present DataFed, a lightweight, distributed SDMS that spans a federation of storage systems within a loosely coupled network of scientific facilities. Unlike existing SDMS offerings, DataFed uses high-performance and scalable user management and data transfer technologies that simplify the deployment, maintenance, and expansion of DataFed. DataFed provides web-based and command-line interfaces to manage data and integrate with complex scientific workflows. DataFed represents a step towards reproducible scientific research by enabling reliable staging of the correct data in the desired environment.