Paper title
Towards Compliant Data Management Systems for Healthcare ML
Abstract
The increasing popularity of machine learning approaches and the rising awareness of data protection and data privacy present an opportunity to build truly secure and trustworthy healthcare systems. Regulations such as GDPR and HIPAA provide broad guidelines and frameworks, but their implementation can pose technical challenges. Compliant data management systems require the enforcement of a number of technical and administrative safeguards. While policies can be set for both kinds of safeguards, there is limited ability to assess compliance in real time. Machine learning practitioners are increasingly aware of the importance of keeping track of sensitive data. Given the sensitivity of personally identifiable, health, or commercially sensitive information, there would be value in assessing the flow of data in a more dynamic fashion. We review how data flows within healthcare machine learning projects, from source to storage to use in training algorithms and beyond. Based on this review, we design engineering specifications and solutions for data versioning. Our objective is to design tools that detect and track sensitive data across machines and users throughout the life cycle of a project, prioritizing efficiency, consistency, and ease of use. We build a prototype of the solution that demonstrates the difficulties in this domain. Together, these represent first efforts towards building a compliant data management system for healthcare machine learning projects.
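To make the idea of versioning data while tracking sensitive content concrete, here is a minimal sketch. It is not the authors' prototype. It assumes that each dataset snapshot is identified by a SHA-256 content hash, that each snapshot records the user and machine that produced it, and that sensitivity labels propagate from a parent version to everything derived from it. The names `Lineage`, `DataVersion`, `SENSITIVE_PATTERNS`, `commit`, and `history` are hypothetical. The regex detectors are deliberately naive placeholders, not a real PII/PHI detection method.

```python
from __future__ import annotations

import hashlib
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, deliberately naive detectors for illustration only;
# real PII/PHI detection needs far more robust methods.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass(frozen=True)
class DataVersion:
    """One immutable dataset snapshot plus who produced it and where."""
    content_hash: str
    parent_hash: Optional[str]
    user: str
    machine: str
    sensitive_tags: frozenset


@dataclass
class Lineage:
    """Hypothetical content-addressed version store with sensitivity tags."""
    versions: dict = field(default_factory=dict)

    def commit(self, data: bytes, user: str, machine: str,
               parent: Optional[str] = None) -> DataVersion:
        digest = hashlib.sha256(data).hexdigest()
        text = data.decode("utf-8", errors="ignore")
        # Tag the snapshot with every detector that matches its content.
        tags = frozenset(name for name, pattern in SENSITIVE_PATTERNS.items()
                         if pattern.search(text))
        # Derived data inherits its parent's tags, so sensitivity is
        # propagated downstream rather than silently lost.
        if parent is not None:
            tags |= self.versions[parent].sensitive_tags
        version = DataVersion(digest, parent, user, machine, tags)
        self.versions[digest] = version
        return version

    def history(self, content_hash: str) -> list:
        """Walk from a version back to its original source."""
        chain, seen = [], set()
        current = content_hash
        while current is not None and current not in seen:
            seen.add(current)
            version = self.versions[current]
            chain.append(version)
            current = version.parent_hash
        return chain


# Usage: a raw extract containing an email address, then a derived file
# with that column dropped, produced by a different user on another machine.
lineage = Lineage()
raw = lineage.commit(b"name,email\nA,a@example.org\n",
                     user="alice", machine="ingest-01")
clean = lineage.commit(b"name\nA\n", user="bob", machine="train-02",
                       parent=raw.content_hash)
print(sorted(clean.sensitive_tags))                         # ['email']
print([v.user for v in lineage.history(clean.content_hash)])  # ['bob', 'alice']
```

The derived file still carries the `email` tag even though its own content no longer matches. This conservative choice means a tag must be cleared by an explicit, auditable step rather than by accident. Whether such propagation is desirable, or too coarse for real projects, is one of the design questions a compliant data management system would have to settle.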