Paper Title
Dynamic Model Tree for Interpretable Data Stream Learning
Authors
Abstract
Data streams are ubiquitous in modern business and society. In practice, data streams may evolve over time and cannot be stored indefinitely. Effective and transparent machine learning on data streams is thus often challenging. Hoeffding Trees have emerged as the state of the art for online predictive modelling. They are easy to train and provide meaningful convergence guarantees under a stationary process. Yet, at the same time, Hoeffding Trees often require heuristic and costly extensions to adjust to distributional change, which may considerably impair their interpretability. In this work, we revisit Model Trees for machine learning in evolving data streams. Model Trees are able to maintain more flexible and locally robust representations of the active data concept, making them a natural fit for data stream applications. Our novel framework, called Dynamic Model Tree, satisfies desirable consistency and minimality properties. In experiments with synthetic and real-world tabular streaming data sets, we show that the proposed framework can drastically reduce the number of splits required by existing incremental decision trees. At the same time, our framework often outperforms state-of-the-art models in terms of predictive quality, especially when concept drift is involved. Dynamic Model Trees are thus a powerful online learning framework that contributes to more lightweight and interpretable machine learning in data streams.
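For readers unfamiliar with the baseline, the convergence guarantee of Hoeffding Trees under a stationary process rests on the Hoeffding bound. The sketch below illustrates that standard split test only; it is not the split criterion of the Dynamic Model Tree, which the abstract does not specify. The function names and the example values for `delta` and the gain range are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the
    true mean of a variable with the given range lies within epsilon of
    the mean observed over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Standard Hoeffding Tree test: split once the observed gap between
    the two best candidate splits exceeds the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical values: gains in [0, 1], delta = 1e-7, 1000 observations.
    eps = hoeffding_bound(1.0, 1e-7, 1000)
    print(f"epsilon after 1000 samples: {eps:.4f}")
    print("split?", should_split(0.30, 0.15, 1.0, 1e-7, 1000))
```

The bound shrinks in proportion to 1/sqrt(n), so a leaf with a clear best split is resolved after few observations. The guarantee assumes that all n observations come from the same distribution, which is why concept drift calls for the extensions the abstract mentions.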