ProContEXT: Exploring Progressive Context Transformer for Tracking

Paper Title

ProContEXT: Exploring Progressive Context Transformer for Tracking

Authors

Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

Abstract

Existing Visual Object Tracking (VOT) methods take only the target area in the first frame as a template. This causes tracking to inevitably fail in fast-changing and crowded scenes, because a single first-frame template cannot account for changes in object appearance across frames. To this end, we revamp the tracking framework with the Progressive Context Encoding Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories. Specifically, ProContEXT leverages a context-aware self-attention module to encode the spatial and temporal context, refining and updating multi-scale static and dynamic templates to progressively perform accurate tracking. It explores the complementarity between spatial and temporal context, opening a new pathway to multi-context modeling for transformer-based trackers. In addition, ProContEXT revises the token pruning technique to reduce computational complexity. Extensive experiments on popular benchmark datasets such as GOT-10k and TrackingNet demonstrate that the proposed ProContEXT achieves state-of-the-art performance.
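The abstract does not spell out the architecture, so the following is only a minimal NumPy sketch, under stated assumptions, of the ideas it names: multi-scale static and dynamic template tokens encoded jointly with search-region tokens by self-attention, a confidence-gated update of the dynamic templates, and attention-based pruning of search tokens. Every name here (`joint_self_attention`, `prune_search_tokens`, `update_dynamic_templates`), the single attention head, the random stand-in features, the token counts, the 0.5 confidence threshold, and the 0.7 keep ratio are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over all tokens (templates + search) at once,
    so template and search tokens exchange context in one pass (assumed design)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v, attn

def prune_search_tokens(out, attn, n_template, keep_ratio):
    """Hypothetical pruning rule: keep the search tokens that receive the most
    attention from template tokens; template tokens are always kept."""
    search_attn = attn[:n_template, n_template:].mean(axis=0)  # one score per search token
    n_keep = max(1, int(round(keep_ratio * search_attn.shape[0])))
    keep = np.sort(np.argsort(search_attn)[::-1][:n_keep])  # top-scoring, original order
    return np.concatenate([out[:n_template], out[n_template:][keep]], axis=0), keep

def update_dynamic_templates(dynamic, candidates, confidence, threshold=0.5):
    """Hypothetical update rule: replace the dynamic templates with crops from the
    latest frame only when the tracker is confident in its current prediction."""
    return candidates if confidence >= threshold else dynamic

rng = np.random.default_rng(0)
d = 32
# Assumed token counts per template scale (e.g. a tight crop and a wider context crop).
static = [rng.standard_normal((n, d)) for n in (16, 36)]   # from the first frame
dynamic = [rng.standard_normal((n, d)) for n in (16, 36)]  # from recent frames
search = rng.standard_normal((64, d))                      # current search region

templates = np.concatenate(static + dynamic, axis=0)  # 104 template tokens
tokens = np.concatenate([templates, search], axis=0)  # 168 tokens in total
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

out, attn = joint_self_attention(tokens, w_q, w_k, w_v)
pruned, keep = prune_search_tokens(out, attn, templates.shape[0], keep_ratio=0.7)
print(out.shape, pruned.shape)  # (168, 32) (149, 32)
```

Keeping all template tokens while dropping low-scoring search tokens shrinks the sequence that later attention layers process, which is one common way token pruning reduces the quadratic cost of self-attention; the specific scoring rule used by ProContEXT is not given in the abstract.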
