Paper Title

Superpolynomial Lower Bounds for Decision Tree Learning and Testing

Paper Authors

Caleb Koch, Carmen Strassle, Li-Yang Tan

Abstract

We establish new hardness results for decision tree optimization problems, adding to a line of work that dates back to Hyafil and Rivest in 1976. We prove, under randomized ETH, superpolynomial lower bounds for two basic problems: given an explicit representation of a function $f$ and a generator for a distribution $\mathcal{D}$, construct a small decision tree approximator for $f$ under $\mathcal{D}$, and decide if there is a small decision tree approximator for $f$ under $\mathcal{D}$. Our results imply new lower bounds for distribution-free PAC learning and testing of decision trees, settings in which the algorithm only has restricted access to $f$ and $\mathcal{D}$. Specifically, we show: $n$-variable size-$s$ decision trees cannot be properly PAC learned in time $n^{\tilde{O}(\log\log s)}$, and depth-$d$ decision trees cannot be tested in time $\exp(d^{\,O(1)})$. For learning, the previous best lower bound only ruled out $\text{poly}(n)$-time algorithms (Alekhnovich, Braverman, Feldman, Klivans, and Pitassi, 2009). For testing, recent work gives similar though incomparable bounds in the setting where $f$ is random and $\mathcal{D}$ is nonexplicit (Blais, Ferreira Pinto Jr., and Harms, 2021). Assuming a plausible conjecture on the hardness of Set-Cover, we show our lower bound for learning decision trees can be improved to $n^{\Omega(\log s)}$, matching the best known upper bound of $n^{O(\log s)}$ due to Ehrenfeucht and Haussler (1989). We obtain our results within a unified framework that leverages recent progress in two lines of work: the inapproximability of Set-Cover and XOR lemmas for query complexity. Our framework is versatile and yields results for related concept classes such as juntas and DNF formulas.
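For quick reference, the learning bounds quoted in the abstract can be lined up in one display. The minimal compilable LaTeX fragment below is our paraphrase, not text from the paper; in particular, the shorthand $T(n,s)$ for the running time of the fastest proper PAC learner of $n$-variable size-$s$ decision trees is introduced here for illustration only.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% T(n,s): running time of the fastest proper PAC learner for
% n-variable size-s decision trees (our shorthand, not the paper's).
\begin{align*}
  % "Cannot be learned in time n^{Otilde(log log s)}" read as a lower bound:
  T(n,s) &\;\ge\; n^{\tilde{\Omega}(\log\log s)}
    && \text{(this work, under randomized ETH)} \\
  % Conditional improvement, assuming the Set-Cover hardness conjecture:
  T(n,s) &\;\ge\; n^{\Omega(\log s)}
    && \text{(this work, conditional)} \\
  % Matching upper bound:
  T(n,s) &\;\le\; n^{O(\log s)}
    && \text{(Ehrenfeucht and Haussler, 1989)}
\end{align*}
\end{document}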
