Paper Title

Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living

Paper Authors

Zdravko Marinov, David Schneider, Alina Roitberg, Rainer Stiefelhagen

Paper Abstract

Domain shifts, such as appearance changes, are a key challenge in real-world applications of activity recognition models, which range from assistive robotics and smart homes to driver observation in intelligent vehicles. For example, while simulations are an excellent way of economical data collection, a Synthetic-to-Real domain shift leads to a > 60% drop in accuracy when recognizing Activities of Daily Living (ADLs). We tackle this challenge and introduce an activity domain generation framework which creates novel ADL appearances (novel domains) from different existing activity modalities (source domains) inferred from video training data. Our framework computes human poses, heatmaps of body joints, and optical flow maps and uses them alongside the original RGB videos to learn the essence of source domains in order to generate completely new ADL domains. The model is optimized by maximizing the distance between the existing source appearances and the generated novel appearances, while ensuring that the semantics of an activity are preserved through an additional classification loss. While source data multimodality is an important concept in this design, our setup does not rely on multi-sensor setups (i.e., all source modalities are inferred from a single video only). The newly created activity domains are then integrated into the training of the ADL classification networks, resulting in models far less susceptible to changes in data distributions. Extensive experiments on the Synthetic-to-Real benchmark Sims4Action demonstrate the potential of the domain generation paradigm for cross-domain ADL recognition, setting new state-of-the-art results. Our code is publicly available at https://github.com/Zrrr1997/syn2real_DG
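The abstract describes a combined objective: a novelty term that pushes generated appearances away from the source appearances, balanced against a classification loss that keeps the activity semantics recognizable. Below is a minimal, hypothetical NumPy sketch of such a combined objective; the function names, the choice of mean-squared distance, and the weighting factor `alpha` are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def novelty_loss(source, generated):
    # Negative mean squared distance: minimizing this term *maximizes*
    # the distance between source and generated appearance features.
    return -np.mean((source - generated) ** 2)

def classification_loss(logits, label):
    # Standard cross-entropy on the classifier's logits for the generated
    # appearance, encouraging the same activity label to be preserved.
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

def total_loss(source, generated, logits, label, alpha=1.0):
    # Weighted sum of both terms; alpha (assumed here) balances
    # appearance novelty against semantic preservation.
    return novelty_loss(source, generated) + alpha * classification_loss(logits, label)
```

In this toy form, driving `total_loss` down trades off two pressures: larger source/generated distance (more novel appearance) versus higher classifier confidence in the original activity label (preserved semantics).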
