论文标题
关于非平滑自动分化的复杂性
On the complexity of nonsmooth automatic differentiation
论文作者
论文摘要
使用保守梯度的概念,我们提供了一个简单的模型,以估算算法分化的向后和向前模式的计算成本,以换取广泛的非平滑程序。当使用具有局部Lipschitz半代数或可定义的基本功能的程序时,向后模式的间接复杂性与维度无关。这大大扩展了Baur-Strassen的平稳廉价梯度原则。我们通过通过标准激活和损失功能来建立保守梯度的快速反向传播结果来说明我们的结果。非平滑反向传播的廉价性与并发的前向方法形成鲜明对比,直到今天,这些方法一直依赖于维度最差的开销估计。我们提供进一步的结果,表明保守梯度的向后传播的优势。实际上,我们将计算大量定向衍生物的复杂性与矩阵乘法的复杂性联系在一起,并且我们表明,在函数的Clarke subdifentions中找到两个亚级别是NP-固定的问题。
Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the backward mode turns out to be independent of the dimension when using programs with locally Lipschitz semi-algebraic or definable elementary functions. This considerably extends Baur-Strassen's smooth cheap gradient principle. We illustrate our results by establishing fast backpropagation results of conservative gradients through feedforward neural networks with standard activation and loss functions. Nonsmooth backpropagation's cheapness contrasts with concurrent forward approaches, which have, to this day, dimensional-dependent worst-case overhead estimates. We provide further results suggesting the superiority of backward propagation of conservative gradients. Indeed, we relate the complexity of computing a large number of directional derivatives to that of matrix multiplication, and we show that finding two subgradients in the Clarke subdifferential of a function is an NP-hard problem.