DTTDNet: Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset

Zixun Huang*, Keling Yao*, Seth Z. Zhao,
Chuanyu Pan, Allen Y. Yang^†

▶ UC Berkeley ▶ CMU ▶ UCLA

CVPR Workshop MAI, 2025

^*Indicates Equal Contribution; ^† Indicates Corresponding Author

Abstract

Robust 6DoF pose estimation with mobile devices is the foundation for applications in robotics, augmented reality, and digital twin localization. In this paper, we extensively investigate the robustness of existing RGBD-based 6DoF pose estimation methods against varying levels of depth sensor noise. We highlight that existing 6DoF pose estimation methods suffer significant performance discrepancies due to depth measurement inaccuracies. In response to this robustness issue, we present a simple and effective transformer-based 6DoF pose estimation approach called DTTDNet, featuring a novel geometric feature filtering module and a Chamfer distance loss for training. Moreover, to advance the field of robust 6DoF pose estimation, we introduce a new dataset, Digital Twin Tracking Dataset Mobile (DTTD-Mobile), tailored for digital twin object tracking with noisy depth data from the mobile RGBD sensor suite of the Apple iPhone 14 Pro. Extensive experiments demonstrate that DTTDNet significantly outperforms state-of-the-art methods, by at least 4.32 and up to 60.74 points in ADD metrics on DTTD-Mobile. More importantly, our approach exhibits superior robustness to varying levels of measurement noise, setting a new benchmark for robustness to measurement noise.
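For readers unfamiliar with the two quantities named in the abstract, the following is a minimal NumPy sketch of the commonly used definitions of the ADD error and a symmetric Chamfer distance between point sets. It is not the DTTDNet implementation: the function names, the unsquared-distance Chamfer variant, and the brute-force nearest-neighbor search are assumptions chosen for illustration, and the abstract does not specify how ADD is thresholded or aggregated.

```python
import numpy as np


def transform(points, R, t):
    """Apply a rigid transform (rotation R, translation t) to an (N, 3) array."""
    return points @ R.T + t


def add_metric(model_points, R_gt, t_gt, R_pred, t_pred):
    """ADD error: mean distance between corresponding model points
    under the ground-truth pose and under the predicted pose."""
    gt = transform(model_points, R_gt, t_gt)
    pred = transform(model_points, R_pred, t_pred)
    return float(np.linalg.norm(gt - pred, axis=1).mean())


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumed variant: mean nearest-neighbor Euclidean distance in both
    directions, summed. Uses an O(N * M) pairwise distance matrix.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


# Usage with hypothetical data: a random point cloud and a 1 cm translation error.
rng = np.random.default_rng(0)
pts = rng.uniform(-0.1, 0.1, size=(50, 3))
identity, zero = np.eye(3), np.zeros(3)
shift = np.array([0.01, 0.0, 0.0])

add_err = add_metric(pts, identity, zero, identity, shift)
cd = chamfer_distance(pts, transform(pts, identity, shift))
```

Unlike ADD, the Chamfer distance does not require point correspondences, which is one reason it is often used as a training loss on point sets; whether that is the motivation here is not stated in the abstract.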

BibTeX

@inproceedings{huang2025robust,
  title={Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset},
  author={Huang, Zixun and Yao, Keling and Zhao, Zhihao and Pan, Chuanyu and Yang, Allen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={1848--1857},
  year={2025}
}

DTTD-Mobile Dataset

Left: Setup of our data acquisition pipeline. Right: 3D models of the 18 objects in DTTD-Mobile.

Sample visualizations of our dataset. First row: Annotations for 3D bounding boxes. Second row: Corresponding semantic segmentation labels. Third row: Zoomed-in LiDAR depth visualizations.

Visualization of an iPhone LiDAR depth scene that shows distortion and long-tail non-Gaussian noise (highlighted inside the red box). (a) Front view. (b) Left view. (c) Right view.

Features and statistics of different datasets.

DTTDNet

Experimental Results

Qualitative evaluation of different methods. To further validate our approach, we provide visual evidence of our model's effectiveness in challenging occlusion scenarios and varying lighting conditions, where other models' predictions fail but ours remain reliable.
