Paper title
Computational reproducibility of Jupyter notebooks from biomedical publications
Authors
Abstract
Jupyter notebooks allow users to bundle executable code with its documentation and output in one interactive environment, and they are a popular mechanism for documenting and sharing computational workflows, including for research publications. Here, we analyze the computational reproducibility of 9625 Jupyter notebooks from 1117 GitHub repositories associated with 1419 publications indexed in the biomedical literature repository PubMed Central. Of these notebooks, 8160 were written in Python, including 4169 that had their dependencies declared in standard requirement files and that we attempted to re-run automatically. For 2684 of these, all declared dependencies could be installed successfully, and we re-ran them to assess reproducibility. Of these, 396 notebooks ran through without any errors, including 245 that produced results identical to those reported in the original. Running the other notebooks resulted in exceptions. We zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
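The counts in the abstract form a funnel, and the attrition at each stage is easier to read as a rate. The sketch below is only an illustration of that arithmetic, not part of the study's pipeline: the counts are copied from the abstract, while the stage labels and the helper name `stage_rates` are our own shorthand.

```python
# Counts are taken directly from the abstract; labels are illustrative shorthand.
FUNNEL = [
    ("notebooks collected", 9625),
    ("written in Python", 8160),
    ("dependencies declared in requirement files", 4169),
    ("all declared dependencies installed", 2684),
    ("ran without errors", 396),
    ("results identical to original", 245),
]


def stage_rates(funnel):
    """Return (label, count, share_of_previous_stage, share_of_first_stage) rows."""
    total = funnel[0][1]
    previous = total
    rows = []
    for label, count in funnel:
        rows.append((label, count, count / previous, count / total))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, of_prev, of_total in stage_rates(FUNNEL):
        print(f"{label:<45} {count:>5}  {of_prev:6.1%} of previous  {of_total:6.1%} of all")
```

Read this way, roughly 15% of the 2684 re-run notebooks finished without errors, and roughly 2.5% of all 9625 notebooks produced results identical to the original.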