Paper Title

Interactive exploration of population scale pharmacoepidemiology datasets

Authors

Tengel Ekrem Skar, Einar Holsbø, Kristian Svendsen, Lars Ailo Bongo

Abstract

Population-scale drug prescription data linked with adverse drug reaction (ADR) data supports fitting models large enough to detect drug use and ADR patterns that traditional methods cannot detect on smaller datasets. However, detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization. To our knowledge, no existing pharmacoepidemiology tool supports all three requirements. We have therefore created a tool for interactive exploration of patterns in prescription datasets with millions of samples. We use Spark to preprocess the data for machine learning and for analyses using SQL queries. We have implemented models in Keras and scikit-learn. The model results are visualized and interpreted using live Python coding in Jupyter. We apply our tool to explore a dataset of 384 million prescriptions from the Norwegian Prescription Database, combined with 62 million prescriptions for elderly patients who were hospitalized. We preprocess the data in two minutes, train models in seconds, and plot the results in milliseconds. Our results show the value of combining computational power, short computation times, and ease of use for the analysis of population-scale pharmacoepidemiology datasets. The code is open source and available at: https://github.com/uit-hdl/norpd_prescription_analyses
