Paper Title
On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis
Paper Authors
Paper Abstract
Batch Normalization's (BN) unique property of depending on other samples in a batch is known to cause problems in several tasks, including sequence modeling. Yet, BN-related issues are hardly studied for long video understanding, despite the ubiquitous use of BN in CNNs (Convolutional Neural Networks) for feature extraction. Especially in surgical workflow analysis, where the lack of pretrained feature extractors has led to complex, multi-stage training pipelines, limited awareness of BN issues may have hidden the benefits of training CNNs and temporal models end to end. In this paper, we analyze pitfalls of BN in video learning, including issues specific to online tasks such as a 'cheating' effect in anticipation. We observe that BN's properties create major obstacles for end-to-end learning. However, using BN-free backbones, even simple CNN-LSTMs beat the state of the art on three surgical workflow benchmarks by utilizing adequate end-to-end training strategies which maximize temporal context. We conclude that awareness of BN's pitfalls is crucial for effective end-to-end learning in surgical tasks. By reproducing results on natural-video datasets, we hope our insights will benefit other areas of video learning as well. Code is available at: \url{https://gitlab.com/nct_tso_public/pitfalls_bn}
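The batch dependence the abstract refers to can be illustrated with a minimal numpy sketch (not from the paper's codebase): a standard BN forward pass normalizes each feature with statistics computed over the whole batch, so the same sample produces different outputs depending on which other samples share its batch. The function name `batch_norm` and the toy inputs are illustrative assumptions.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature with per-batch statistics.
    Illustrative only; real BN layers also learn a scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# The same sample x0 placed in two different batches:
x0 = np.array([[1.0, 2.0]])
batch_a = np.concatenate([x0, np.array([[3.0, 4.0]])])
batch_b = np.concatenate([x0, np.array([[-5.0, 0.0]])])

out_a = batch_norm(batch_a)[0]
out_b = batch_norm(batch_b)[0]
print(out_a, out_b)  # x0's normalized values differ between the two batches
```

This is the property that, per the abstract, leaks information across time steps in sequence models and enables the "cheating" effect in online anticipation tasks.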