Paper Title

Improved Residual Networks for Image and Video Recognition

Authors

Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao

Abstract

Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture, widely adopted and used in various tasks. In this work we propose an improved version of ResNets. Our proposed improvements address all three main components of a ResNet: the flow of information through the network layers, the residual building block, and the projection shortcut. We are able to show consistent improvements in accuracy and learning convergence over the baseline. For instance, on the ImageNet dataset, using a ResNet with 50 layers, for top-1 accuracy we can report a 1.19% improvement over the baseline in one setting and around a 2% boost in another. Importantly, these improvements are obtained without increasing the model complexity. Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues. We report results on three tasks over six datasets: image classification (ImageNet, CIFAR-10 and CIFAR-100), object detection (COCO) and video action recognition (Kinetics-400 and Something-Something-v2). In the deep learning era, we establish a new milestone for the depth of a CNN. We successfully train a 404-layer deep CNN on the ImageNet dataset and a 3002-layer network on CIFAR-10 and CIFAR-100, while the baseline is not able to converge at such extreme depths. Code is available at: https://github.com/iduta/iresnet
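For readers unfamiliar with the components the abstract names, the sketch below illustrates the generic residual computation y = F(x) + shortcut(x) and the role of a projection shortcut when the feature width changes. This is a minimal, framework-free NumPy illustration of a plain residual block, not the paper's improved design; the function names and the reduction of features to channel vectors are simplifications for exposition.

```python
import numpy as np

def relu(x):
    """Elementwise rectified linear unit."""
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2, w_proj=None):
    """A toy residual block on a channel vector: y = relu(F(x) + shortcut(x)).

    x      : (c_in,) input feature vector (spatial dimensions omitted).
    w1, w2 : weight matrices of the two-layer residual function F.
    w_proj : optional projection matrix (the 'projection shortcut'),
             used when F changes the feature width so that the
             identity shortcut no longer matches shapes.
    """
    residual = w2 @ relu(w1 @ x)                       # F(x)
    shortcut = x if w_proj is None else w_proj @ x     # identity or projection
    return relu(residual + shortcut)

# Identity-weight example: F(x) = relu(x), so the output is relu(relu(x) + x).
x = np.array([1.0, -2.0])
eye = np.eye(2)
print(residual_block(x, eye, eye))  # → [2. 0.]
```

When the block keeps the feature width, the identity shortcut (`w_proj=None`) adds the input back unchanged; when the width changes, a learned projection maps the input into the new shape before the addition. The paper's improvements concern how this information flow, the block's internal layer ordering, and the projection shortcut are structured.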
