TDOcc: Exploit machine learning and big data in multi-view 3D occupancy prediction

Chun Shan, Jian Zeng, Hongming Liu, Chuixing Chen, Xiaojiang Du, Mohsen Guizani

Research output: Contribution to journal › Article › peer-review

Abstract

With advances in machine learning and big data technologies, BEV (Bird's Eye View)-based methods have recently achieved significant breakthroughs in multi-view 3D occupancy prediction. However, BEV-centric 3D occupancy prediction still struggles with feature representation and annotation costs when applied to complex open environments. To address these issues and advance 3D occupancy prediction, this study introduces a novel framework, TDOcc. Using multi-camera images, TDOcc performs 3D semantic occupancy prediction by learning directly from the unprocessed 3D space, thereby maximizing information retention. TDOcc offers two notable advantages. First, it uses dense occupancy labels, which support robust dense occupancy inference and enable comprehensive estimation of the objects in the scene. Second, it fuses historical feature information by aligning past and present features through temporal cues, which improves the effectiveness of the feature fusion module. In addition, to address the ill-posed nature of camera-based 3D occupancy prediction, we incorporate an enhancement module that operates in the 3D feature space. This module is designed for the training phase and strengthens the model's learning capacity. Extensive experiments on the widely used nuScenes dataset demonstrate the effectiveness of the proposed approach. Compared with the recent TPVFormer and OccFormer, our approach improves mean Intersection over Union (mIoU) by 2.0 and 0.8, respectively, and reaches performance comparable to state-of-the-art LiDAR-based methods.
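For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean Intersection over Union is commonly computed over voxel class labels. It is not the authors' evaluation code: the function name `mean_iou`, the flat-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and benchmark-specific details such as ignored labels are left out.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth.

    `pred` and `target` are flat sequences of per-voxel class indices
    (an assumed input format, chosen for illustration).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six voxels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so `score` is 0.5. Published results are usually quoted as percentages.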

Original language: English
Article number: 107583
Journal: Future Generation Computer Systems
Volume: 164
State: Published - Mar 2025

Keywords

  • 3D occupancy prediction
  • Big data
  • Machine learning
