Paper Title
Pitfalls in Machine Learning Research: Reexamining the Development Cycle
Paper Authors
Paper Abstract
Machine learning has the potential to fuel further advances in data science, but it is greatly hindered by an ad hoc design process, poor data hygiene, and a lack of statistical rigor in model evaluation. Recently, these issues have begun to attract more attention as they have caused public and embarrassing issues in research and development. Drawing from our experience as machine learning researchers, we follow the machine learning process from algorithm design to data collection to model evaluation, drawing attention to common pitfalls and providing practical recommendations for improvements. At each step, case studies are introduced to highlight how these pitfalls occur in practice, and where things could be improved.