Paper Title
Knowledge Distillation for Action Anticipation via Label Smoothing
Paper Authors
Paper Abstract
The human capability to anticipate the near future from visual observations and non-verbal cues is essential for developing intelligent systems that need to interact with people. Several research areas, such as human-robot interaction (HRI), assisted living, and autonomous driving, need to foresee future events to avoid crashes or help people. Egocentric scenarios are classic settings for action anticipation due to their numerous applications. Such a challenging task demands capturing and modeling the domain's hidden structure to reduce prediction uncertainty. Since multiple actions may equally occur in the future, we treat action anticipation as a multi-label problem with missing labels, extending the concept of label smoothing. This idea resembles the knowledge distillation process, since useful information is injected into the model during training. We implement a multi-modal framework based on long short-term memory (LSTM) networks to summarize past observations and make predictions at different time steps. We perform extensive experiments on the EPIC-Kitchens and EGTEA Gaze+ datasets, including more than 2500 and 100 action classes, respectively. The experiments show that label smoothing systematically improves the performance of state-of-the-art models for action anticipation.
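To make the core idea concrete, below is a minimal NumPy sketch of standard label smoothing, the concept the abstract says is extended to the multi-label, missing-labels setting. This is an illustrative implementation of the generic technique, not the authors' specific extension; the function names and the smoothing factor `eps` are assumptions for the example.

```python
import numpy as np

def smooth_labels(target: int, num_classes: int, eps: float = 0.1) -> np.ndarray:
    """Standard label smoothing: move a fraction eps of the probability
    mass from the one-hot ground-truth class to a uniform distribution
    over all classes. (The paper's variant instead redistributes mass
    over plausible future actions; this is the generic baseline.)"""
    y = np.full(num_classes, eps / num_classes)
    y[target] += 1.0 - eps
    return y

def cross_entropy(logits: np.ndarray, soft_target: np.ndarray) -> float:
    """Cross-entropy between a (smoothed) target distribution and
    softmax(logits), computed with the log-sum-exp trick for stability."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-(soft_target * log_probs).sum())

# Example: 4 action classes, ground truth is class 0.
y = smooth_labels(target=0, num_classes=4, eps=0.1)
# y = [0.925, 0.025, 0.025, 0.025]; the soft target keeps the model
# from becoming over-confident, which matters when several future
# actions are equally plausible.
loss = cross_entropy(np.array([2.0, 0.5, 0.1, -1.0]), y)
```

Training against such soft targets is what gives the method its knowledge-distillation flavor: the extra probability mass acts like a teacher signal injected into the loss.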