Paper title

$Λ$-DARTS: Mitigating Performance Collapse by Harmonizing Operation Selection among Cells

Authors

Sajad Movahedi, Melika Adabinejad, Ayyoob Imani, Arezou Keshavarz, Mostafa Dehghani, Azadeh Shakery, Babak N. Araabi

Abstract

Differentiable neural architecture search (DARTS) is a popular method for neural architecture search (NAS), which performs cell-search and utilizes continuous relaxation to improve the search efficiency via gradient-based optimization. The main shortcoming of DARTS is performance collapse, where the discovered architecture suffers from a pattern of declining quality during search. Performance collapse has become an important topic of research, with many methods trying to solve the issue through either regularization or fundamental changes to DARTS. However, the weight-sharing framework used for cell-search in DARTS and the convergence of architecture parameters have not been analyzed yet. In this paper, we provide a thorough and novel theoretical and empirical analysis of DARTS and its point of convergence. We show that DARTS suffers from a specific structural flaw due to its weight-sharing framework that limits the convergence of DARTS to saturation points of the softmax function. This point of convergence gives an unfair advantage to layers closer to the output in choosing the optimal architecture, causing performance collapse. We then propose two new regularization terms that aim to prevent performance collapse by harmonizing operation selection via aligning gradients of layers. Experimental results on six different search spaces and three different datasets show that our method ($Λ$-DARTS) does indeed prevent performance collapse, providing justification for our theoretical analysis and the proposed remedy.
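
The abstract refers to continuous relaxation and to saturation points of the softmax function. The following is a minimal illustrative sketch of those two general ideas only, not the authors' implementation: candidate operations are mixed with softmax weights over architecture parameters, and the softmax Jacobian shrinks toward zero once one parameter dominates. The scalar operations in `ops` and the names `mixed_op` and `softmax_jacobian_max` are hypothetical stand-ins chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas, ops):
    """Continuous relaxation: softmax-weighted sum of candidate operations."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))


def softmax_jacobian_max(logits):
    """Largest |d p_i / d a_j|, where d p_i / d a_j = p_i * (delta_ij - p_j)."""
    p = softmax(logits)
    n = len(p)
    return max(
        abs(p[i] * ((1.0 if i == j else 0.0) - p[j]))
        for i in range(n)
        for j in range(n)
    )


# Hypothetical scalar stand-ins for candidate operations (assumption, not from the paper).
ops = [lambda x: 2.0 * x, lambda x: x, lambda x: 0.0]

uniform = [0.0, 0.0, 0.0]     # architecture parameters far from saturation
saturated = [20.0, 0.0, 0.0]  # one parameter dominates: softmax is saturated

print(mixed_op(1.0, uniform, ops))      # equal-weight mixture of the three ops
print(softmax_jacobian_max(uniform))    # gradients still flow through the softmax
print(softmax_jacobian_max(saturated))  # gradients through the softmax nearly vanish
```

Near a saturation point the gradient with respect to the architecture parameters becomes very small, which is one way to read the abstract's statement that convergence is limited to such points. The regularization terms proposed in the paper are not reproduced here.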
