Paper Title

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks

Authors

Xiaowei Chi, Jiaming Liu, Ming Lu, Rongyu Zhang, Zhaoqing Wang, Yandong Guo, Shanghang Zhang

Abstract

Bird's-Eye-View (BEV) 3D object detection is a crucial multi-view technique for autonomous driving systems. Recently, many works have been proposed that follow a similar paradigm consisting of three essential components: camera feature extraction, BEV feature construction, and task heads. Among the three components, BEV feature construction is the one specific to BEV, as opposed to 2D tasks. Existing methods aggregate the multi-view camera features onto a flattened grid to construct the BEV feature. However, flattening the BEV space along the height dimension fails to emphasize the informative features of different heights. For example, a barrier is located at a low height while a truck is located at a high height. In this paper, we propose a novel method named BEV Slice Attention Network (BEV-SAN) that exploits the intrinsic characteristics of different heights. Instead of flattening the BEV space, we first sample along the height dimension to build global and local BEV slices. Then, the features of the BEV slices are aggregated from the camera features and merged by an attention mechanism. Finally, we fuse the merged local and global BEV features with a transformer to generate the final feature map for the task heads. The purpose of the local BEV slices is to emphasize informative heights. To find these heights, we further propose a LiDAR-guided sampling strategy that leverages the statistical distribution of LiDAR points to determine the heights of the local slices. Compared with uniform sampling, LiDAR-guided sampling can determine more informative heights. We conduct detailed experiments to demonstrate the effectiveness of BEV-SAN. Code will be released.
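
The contrast between uniform and LiDAR-guided sampling of slice heights can be illustrated with a small sketch. The abstract does not specify how the LiDAR statistics are turned into slice heights, so the equal-count quantile rule below is an assumption chosen for illustration, not the authors' method; the function names `uniform_slices` and `lidar_guided_slices` and the height range are likewise hypothetical.

```python
def uniform_slices(z_min, z_max, num_slices):
    """Split the height range [z_min, z_max] into equal-height slices."""
    step = (z_max - z_min) / num_slices
    return [(z_min + i * step, z_min + (i + 1) * step) for i in range(num_slices)]


def lidar_guided_slices(point_heights, z_min, z_max, num_slices):
    """Place slice boundaries at equal-count quantiles of LiDAR point heights.

    ASSUMPTION: this quantile rule is only one plausible reading of
    "leveraging the statistical distribution of LiDAR"; the paper's actual
    strategy is not described in the abstract.
    """
    # Keep only points that fall inside the height range of interest.
    zs = sorted(z for z in point_heights if z_min <= z <= z_max)
    if not zs:
        # No LiDAR evidence: fall back to uniform sampling.
        return uniform_slices(z_min, z_max, num_slices)

    # Interior boundaries sit where each slice holds roughly the same
    # number of points, so densely populated heights get thinner slices.
    edges = [z_min]
    for i in range(1, num_slices):
        edges.append(zs[(i * len(zs)) // num_slices])
    edges.append(z_max)
    return list(zip(edges[:-1], edges[1:]))


if __name__ == "__main__":
    # Toy heights (meters): most points cluster near -1.7 m.
    heights = [-1.9, -1.8, -1.7, -1.6, -1.5, -1.4, 0.0, 2.5]
    print(uniform_slices(-5.0, 3.0, 4))
    print(lidar_guided_slices(heights, -5.0, 3.0, 4))
```

With the toy heights above, the uniform slices are all 2 m thick, while the quantile-based slices concentrate their boundaries around the dense cluster of points, which is the intuition behind preferring LiDAR-guided over uniform sampling.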
