Paper Title

Convergence Rates of Oblique Regression Trees for Flexible Function Libraries

Authors

Cattaneo, Matias D., Chandak, Rajita, Klusowski, Jason M.

Abstract

We develop a theoretical framework for the analysis of oblique decision trees, where the splits at each decision node occur at linear combinations of the covariates (as opposed to conventional tree constructions that force axis-aligned splits involving only a single covariate). While this methodology has garnered significant attention from the computer science and optimization communities since the mid-80s, the advantages they offer over their axis-aligned counterparts remain only empirically justified, and explanations for their success are largely based on heuristics. Filling this long-standing gap between theory and practice, we show that oblique regression trees (constructed by recursively minimizing squared error) satisfy a type of oracle inequality and can adapt to a rich library of regression models consisting of linear combinations of ridge functions and their limit points. This provides a quantitative baseline to compare and contrast decision trees with other less interpretable methods, such as projection pursuit regression and neural networks, which target similar model forms. Contrary to popular belief, one need not always trade-off interpretability with accuracy. Specifically, we show that, under suitable conditions, oblique decision trees achieve similar predictive accuracy as neural networks for the same library of regression models. To address the combinatorial complexity of finding the optimal splitting hyperplane at each decision node, our proposed theoretical framework can accommodate many existing computational tools in the literature. Our results rely on (arguably surprising) connections between recursive adaptive partitioning and sequential greedy approximation algorithms for convex optimization problems (e.g., orthogonal greedy algorithms), which may be of independent theoretical interest. Using our theory and methods, we also study oblique random forests.
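The abstract describes oblique splits that cut at linear combinations of covariates, chosen by recursively minimizing squared error, and notes that the exact search over hyperplanes is combinatorial. As a minimal sketch of the idea (not the paper's algorithm), one common computational shortcut is to sample random projection directions and scan each projection for the squared-error-minimizing threshold; the function `best_oblique_split` below is a hypothetical illustration of that approach.

```python
import numpy as np

def best_oblique_split(X, y, n_directions=100, rng=None):
    """Sketch of an oblique split search: project the covariates onto
    random unit directions and pick the (direction, threshold) pair
    that minimizes the sum of squared errors of the two child means.
    This is an illustrative heuristic, not the paper's exact method."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    best_sse, best_w, best_t = np.inf, None, None
    for _ in range(n_directions):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        z = X @ w                      # projected covariate values
        order = np.argsort(z)
        y_sorted = y[order]
        # Prefix sums let us evaluate every threshold along this
        # direction in O(n) after sorting.
        csum = np.cumsum(y_sorted)
        csq = np.cumsum(y_sorted ** 2)
        total_sum, total_sq = csum[-1], csq[-1]
        for i in range(1, n):          # left child gets i points
            left_sse = csq[i - 1] - csum[i - 1] ** 2 / i
            right_n = n - i
            right_sum = total_sum - csum[i - 1]
            right_sse = (total_sq - csq[i - 1]) - right_sum ** 2 / right_n
            sse = left_sse + right_sse
            if sse < best_sse:
                best_sse = sse
                best_w = w
                best_t = (z[order[i - 1]] + z[order[i]]) / 2
    return best_sse, best_w, best_t
```

Growing a tree by applying such a split recursively to each child region is what the abstract means by "recursively minimizing squared error"; restricting `w` to coordinate vectors recovers the conventional axis-aligned construction.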
