Paper Title

Context-Enhanced Stereo Transformer

Paper Authors

Weiyu Guo, Zhaoshuo Li, Yongkui Yang, Zheng Wang, Russell H. Taylor, Mathias Unberath, Alan Yuille, Yingwei Li

Paper Abstract

Stereo depth estimation is of great interest for computer vision research. However, existing methods struggle to generalize and predict reliably in hazardous regions, such as large uniform regions. To overcome these limitations, we propose the Context Enhanced Path (CEP). CEP improves generalization and robustness against common failure cases in existing solutions by capturing long-range global information. We construct our stereo depth estimation model, Context Enhanced Stereo Transformer (CSTR), by plugging CEP into the state-of-the-art stereo depth estimation method Stereo Transformer. CSTR is examined on distinct public datasets, such as Scene Flow, Middlebury-2014, KITTI-2015, and MPI-Sintel. We find that CSTR outperforms prior approaches by a large margin. For example, in the zero-shot synthetic-to-real setting, CSTR outperforms the best competing approach on the Middlebury-2014 dataset by 11%. Our extensive experiments demonstrate that long-range information is critical for the stereo matching task and that CEP successfully captures such information.
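The abstract does not specify how CEP is wired into Stereo Transformer. As a rough illustration of the general idea of a long-range context path, the sketch below mixes globally self-attended, downsampled features back into local convolutional features before matching. The module name `ContextPath`, the layer sizes, and the fuse-by-addition design are assumptions made for illustration, not the paper's implementation.

```python
# Illustrative sketch only: a hypothetical long-range "context path" that adds
# global information (captured with self-attention over a downsampled feature
# map) onto per-pixel local features before stereo matching. All design choices
# here are assumptions, not the authors' published CEP architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextPath(nn.Module):
    """Fuses coarse global context, computed with self-attention, into local features."""

    def __init__(self, channels: int = 64, num_heads: int = 4, pool_stride: int = 8):
        super().__init__()
        self.pool_stride = pool_stride
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) local features from a convolutional backbone.
        b, c, h, w = feat.shape
        # Downsample so attention over all positions stays affordable.
        coarse = F.avg_pool2d(feat, self.pool_stride)        # (B, C, h', w')
        tokens = coarse.flatten(2).transpose(1, 2)           # (B, h'*w', C)
        tokens = self.norm(tokens)
        ctx, _ = self.attn(tokens, tokens, tokens)           # long-range mixing
        ctx = ctx.transpose(1, 2).reshape(b, c, *coarse.shape[-2:])
        # Upsample the global context and fuse it with the local features.
        ctx = F.interpolate(ctx, size=(h, w), mode="bilinear", align_corners=False)
        return feat + ctx


if __name__ == "__main__":
    left_feat = torch.randn(2, 64, 96, 128)   # features of the left image
    enhanced = ContextPath()(left_feat)
    print(enhanced.shape)                      # torch.Size([2, 64, 96, 128])
```

In this sketch the context branch is purely additive, so it can be dropped into an existing feature extractor without changing downstream matching code; whether CSTR fuses context this way is not stated in the abstract.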
