Title
Nested Learning For Multi-Granular Tasks
Authors
Abstract
Standard deep neural networks (DNNs) are commonly trained in an end-to-end fashion for specific tasks such as object recognition, face identification, or character recognition, among many examples. This specificity often leads to overconfident models that generalize poorly to samples outside the original training distribution. Moreover, such standard DNNs do not allow leveraging information from heterogeneously annotated training data, where, for example, labels may be provided at different levels of granularity. Furthermore, DNNs do not simultaneously produce results with different levels of confidence for different levels of detail; they most commonly take an all-or-nothing approach. To address these challenges, we introduce the concept of nested learning: obtaining a hierarchical representation of the input such that a coarse label can be extracted first, and then sequentially refining this representation, if the sample permits, to obtain successively finer predictions, all of them with corresponding confidences. We explicitly enforce this behavior by creating a sequence of nested information bottlenecks. Looking at the problem of nested learning from an information-theoretic perspective, we design a network topology with two important properties. First, a sequence of low-dimensional (nested) feature embeddings is enforced. Then, we show how the explicit combination of nested outputs can improve both the robustness and the accuracy of finer predictions. Experimental results on CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, DBpedia, and PlantVillage demonstrate that nested learning outperforms the same network trained in the standard end-to-end fashion.
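The coarse-to-fine structure described above can be illustrated with a small sketch: a first low-dimensional bottleneck feeds a coarse classification head, a second bottleneck refines that code, and the fine head combines both nested embeddings. This is a minimal NumPy illustration under assumed dimensions and randomly initialized weights, not the authors' actual architecture; all layer sizes and names (`nested_forward`, `d_coarse`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """Affine layer followed by a ReLU nonlinearity."""
    return np.maximum(0.0, x @ w + b)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: input -> coarse bottleneck -> finer bottleneck.
d_in, d_coarse, d_fine = 32, 4, 8   # low-dimensional nested embeddings
n_coarse, n_fine = 3, 10            # e.g. 3 superclasses refined into 10 classes

# Randomly initialized weights stand in for trained parameters.
W1, b1 = rng.normal(size=(d_in, d_coarse)), np.zeros(d_coarse)
W2, b2 = rng.normal(size=(d_coarse, d_fine)), np.zeros(d_fine)
Hc = rng.normal(size=(d_coarse, n_coarse))          # coarse prediction head
Hf = rng.normal(size=(d_coarse + d_fine, n_fine))   # fine head reuses the coarse code

def nested_forward(x):
    zc = dense(x, W1, b1)        # coarse bottleneck, extracted first
    zf = dense(zc, W2, b2)       # refinement conditioned on the coarse code
    p_coarse = softmax(zc @ Hc)  # coarse label with its own confidence
    # Explicit combination of nested outputs for the finer prediction:
    p_fine = softmax(np.concatenate([zc, zf], axis=-1) @ Hf)
    return p_coarse, p_fine

x = rng.normal(size=(2, d_in))
p_coarse, p_fine = nested_forward(x)
print(p_coarse.shape, p_fine.shape)  # (2, 3) (2, 10)
```

Because the fine head sees only the nested codes, the coarse bottleneck acts as an information bottleneck for every later prediction, which is the behavior the abstract enforces; at inference time one could stop after `p_coarse` for samples that do not permit further refinement.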