Paper Title

Superpolynomial Lower Bounds for Decision Tree Learning and Testing

Paper Authors

Caleb Koch, Carmen Strassle, Li-Yang Tan

Abstract

We establish new hardness results for decision tree optimization problems, adding to a line of work that dates back to Hyafil and Rivest in 1976. We prove, under randomized ETH, superpolynomial lower bounds for two basic problems: given an explicit representation of a function $f$ and a generator for a distribution $\mathcal{D}$, construct a small decision tree approximator for $f$ under $\mathcal{D}$, and decide if there is a small decision tree approximator for $f$ under $\mathcal{D}$. Our results imply new lower bounds for distribution-free PAC learning and testing of decision trees, settings in which the algorithm only has restricted access to $f$ and $\mathcal{D}$. Specifically, we show: $n$-variable size-$s$ decision trees cannot be properly PAC learned in time $n^{\tilde{O}(\log\log s)}$, and depth-$d$ decision trees cannot be tested in time $\exp(d^{\,O(1)})$. For learning, the previous best lower bound only ruled out $\text{poly}(n)$-time algorithms (Alekhnovich, Braverman, Feldman, Klivans, and Pitassi, 2009). For testing, recent work gives similar though incomparable bounds in the setting where $f$ is random and $\mathcal{D}$ is nonexplicit (Blais, Ferreira Pinto Jr., and Harms, 2021). Assuming a plausible conjecture on the hardness of Set-Cover, we show our lower bound for learning decision trees can be improved to $n^{\Omega(\log s)}$, matching the best known upper bound of $n^{O(\log s)}$ due to Ehrenfeucht and Haussler (1989). We obtain our results within a unified framework that leverages recent progress in two lines of work: the inapproximability of Set-Cover and XOR lemmas for query complexity. Our framework is versatile and yields results for related concept classes such as juntas and DNF formulas.
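For quick reference, the learning bounds quoted in the abstract can be lined up in one display. The minimal compilable LaTeX fragment below is our paraphrase, not text from the paper; in particular, the shorthand $T(n,s)$ for the running time of the fastest proper PAC learner of $n$-variable size-$s$ decision trees is introduced here for illustration only.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% T(n,s): running time of the fastest proper PAC learner for
% n-variable size-s decision trees (our shorthand, not the paper's).
\begin{align*}
  % "Cannot be learned in time n^{Otilde(log log s)}" read as a lower bound:
  T(n,s) &\;\ge\; n^{\tilde{\Omega}(\log\log s)}
    && \text{(this work, under randomized ETH)} \\
  % Conditional improvement, assuming the Set-Cover hardness conjecture:
  T(n,s) &\;\ge\; n^{\Omega(\log s)}
    && \text{(this work, conditional)} \\
  % Matching upper bound:
  T(n,s) &\;\le\; n^{O(\log s)}
    && \text{(Ehrenfeucht and Haussler, 1989)}
\end{align*}
\end{document}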
