Autonomous driving paper index

RM2Occ: Re-Projection Multi-Task Multi-Sensor Fusion for Autonomous Driving 3D Object Detection and Occupancy Perception

2025-11-01 · IEEE Transactions on Intelligent Transportation Systems (Print)

BEV Perception, 3D Object Detection, LiDAR Perception, Occupancy Prediction, Sensor Fusion

autonomous driving, bev, 3d object detection, occupancy prediction, occupancy, depth estimation, semantic segmentation, object detection, lidar, point cloud, sensor fusion, multi-sensor fusion

One-line summary

Engineering notes

We propose RM²Occ, the first 3D occupancy perception network that integrates multi-sensor fusion based on different sensor principles and achieves multi-task learning. Extensive experiments and ablation studies on the nuScenes dataset demonstrate that RM²Occ significantly outperforms existing state-of-the-art methods, establishing a new paradigm for accurate and efficient multi-sensor fusion and multi-task perception in autonomous driving scenarios.

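Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese-language explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, LiDAR perception, and related topics.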

Original abstract

Occupancy prediction plays a crucial role in supporting autonomous driving planning and decision-making. Existing methods typically rely on modular stacking and fusion techniques of object detection, semantic segmentation, and depth estimation to achieve 3D occupancy. However, they fail to deeply explore the transformation relationships between 2D and 3D spaces and to efficiently fuse the different characteristics of multi-source sensors. We propose RM²Occ, the first 3D occupancy perception network that integrates multi-sensor fusion based on different sensor principles and achieves multi-task learning. To leverage the rich 2D semantic information captured by cameras and elevate it to the 3D domain, we begin by querying and populating predefined empty voxels with multi-view image features. Subsequently, we progressively fuse 3D LiDAR point clouds with these populated voxels through an unbalanced fusion strategy that effectively supplements missing information and suppresses noise. Leveraging IMU data and calibration parameters, we then re-project the enriched voxels back onto the 2D image plane according to camera coordinates, performing a secondary query using the semantic segmentation results to recover semantic details potentially lost due to radar fusion limitations and incomplete voxel querying. Finally, supported by a multi-task detection head, RM²Occ simultaneously accomplishes 3D object detection, semantic segmentation, Bird’s Eye View (BEV) detection, and full-scene grid occupancy prediction, enabling comprehensive multi-task output. Extensive experiments and ablation studies on the nuScenes dataset demonstrate that RM²Occ significantly outperforms existing state-of-the-art methods, establishing a new paradigm for accurate and efficient multi-sensor fusion and multi-task perception in autonomous driving scenarios.
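
The re-projection step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a plain pinhole camera, a single 4x4 ego-to-camera transform standing in for the IMU and calibration data, and a nearest-pixel lookup into a per-pixel class map. The function names (voxel_centers, project_to_image, query_semantics) and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def voxel_centers(grid_shape, voxel_size, origin):
    """Return the (N, 3) center coordinates of a regular voxel grid.

    Hypothetical helper: grid_shape is (nx, ny, nz), voxel_size is the edge
    length in meters, origin is the minimum corner of the grid.
    """
    axes = [np.arange(n) for n in grid_shape]
    idx = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    return np.asarray(origin, dtype=float) + (idx + 0.5) * voxel_size


def project_to_image(points, ego_to_cam, intrinsics, image_hw):
    """Project 3D points into pixel coordinates with a pinhole model.

    Assumes ego_to_cam is a 4x4 rigid transform into a camera frame whose
    z axis points forward, and intrinsics is the usual 3x3 matrix K.
    Returns (uv, valid): (N, 2) pixel coordinates and an (N,) boolean mask
    that is True for points in front of the camera and inside the image.
    """
    n = points.shape[0]
    homogeneous = np.concatenate([points, np.ones((n, 1))], axis=1)
    cam = (ego_to_cam @ homogeneous.T).T[:, :3]
    depth = cam[:, 2]
    in_front = depth > 1e-6
    # Avoid dividing by zero or negative depth; such points are masked out.
    safe_depth = np.where(in_front, depth, 1.0)
    uv = (intrinsics @ cam.T).T[:, :2] / safe_depth[:, None]
    height, width = image_hw
    inside = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv, in_front & inside


def query_semantics(seg_map, uv, valid, ignore_label=-1):
    """Look up a class label for every projected point (nearest pixel).

    seg_map is an (H, W) integer class map; points that do not land in the
    image keep ignore_label.
    """
    labels = np.full(uv.shape[0], ignore_label, dtype=np.int64)
    cols = np.floor(uv[valid, 0]).astype(np.int64)
    rows = np.floor(uv[valid, 1]).astype(np.int64)
    labels[valid] = seg_map[rows, cols]
    return labels
```

The nearest-pixel lookup keeps the sketch short; a learned model would more plausibly sample image features with bilinear interpolation and repeat the projection for every camera in the multi-view rig.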

Engineering value: 5.5

Research novelty: 8.0

Business relevance: 5.0

