Overview

Highlights

01

Mamba-based Pose Network

ConvMamSPN combines a ConvNeXt visual backbone with a Mamba-based hierarchical decoder for accurate spacecraft keypoint heatmap prediction.

02

Directional Context Modeling

CrossScanMamba enhances long-range spatial dependency modeling by scanning feature maps across complementary directions.
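
The directional scanning idea can be sketched in NumPy as follows. This is an illustration only: the page does not say which directions CrossScanMamba uses, so the four orders here (row-major, column-major, and their reverses) are an assumption borrowed from common cross-scan designs, and `cross_scan` and `cross_merge` are hypothetical names. The per-direction Mamba state-space layer is omitted; a real module would process each sequence before merging.

```python
import numpy as np

def cross_scan(feat):
    """Flatten a (C, H, W) feature map into four scan orders, shape (4, C, H*W)."""
    c, h, w = feat.shape
    row_major = feat.reshape(c, h * w)                     # left-to-right, row by row
    col_major = feat.transpose(0, 2, 1).reshape(c, h * w)  # top-to-bottom, column by column
    return np.stack([row_major, col_major,
                     row_major[:, ::-1], col_major[:, ::-1]])

def cross_merge(seqs, h, w):
    """Undo the four scan orders and average them back into a (C, H, W) map."""
    c = seqs.shape[1]
    row = seqs[0].reshape(c, h, w)
    col = seqs[1].reshape(c, w, h).transpose(0, 2, 1)
    row_rev = seqs[2][:, ::-1].reshape(c, h, w)
    col_rev = seqs[3][:, ::-1].reshape(c, w, h).transpose(0, 2, 1)
    return (row + col + row_rev + col_rev) / 4.0

feat = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
seqs = cross_scan(feat)          # four 1D views of the same feature map
restored = cross_merge(seqs, 3, 4)
```

Because each position is visited in four different orders, a sequence model that only propagates information forward along a scan can still relate any two pixels through at least one of the directions.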

03

Geometry-guided Recovery

Predicted 2D keypoints are combined with predefined 3D keypoints, and the 6D pose is recovered through PnP-based geometric solving.
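
A minimal sketch of pose recovery from 2D-3D keypoint correspondences is given below. It uses the Direct Linear Transform (DLT) as a simple stand-in for the PnP solver, not the solver used in this work. It assumes a pinhole camera with known intrinsics `K`, at least six non-coplanar keypoints, and no outlier rejection; the function name `pose_from_keypoints_dlt` is hypothetical.

```python
import numpy as np

def pose_from_keypoints_dlt(points_3d, points_2d, K):
    """Estimate R (3x3) and t (3,) such that pixel ~ K (R X + t).

    points_3d: (N, 3) model keypoints, points_2d: (N, 2) pixel keypoints, N >= 6.
    """
    n = points_3d.shape[0]
    # Remove the intrinsics so the unknown 3x4 matrix is proportional to [R | t].
    pix_h = np.hstack([points_2d, np.ones((n, 1))])
    norm = (np.linalg.inv(K) @ pix_h.T).T
    X_h = np.hstack([points_3d, np.ones((n, 1))])

    # Two linear equations per correspondence in the 12 entries of P.
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = -X_h
    A[0::2, 8:12] = norm[:, [0]] * X_h
    A[1::2, 4:8] = -X_h
    A[1::2, 8:12] = norm[:, [1]] * X_h

    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)

    # Project the left 3x3 block onto a proper rotation and fix scale and sign.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    scale = S.mean()
    if np.linalg.det(R) < 0:
        R, scale = -R, -scale
    t = P[:, 3] / scale
    return R, t
```

With noisy or partially wrong keypoints, a minimal solver inside a RANSAC loop is the usual choice, since a single least-squares solve like this one is sensitive to outliers.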

04

Reliable Pseudo Labels

UGGP-ST filters pseudo labels using heatmap uncertainty and PnP reprojection consistency to reduce noisy target-domain supervision.

Paper Introduction

Abstract

Spacecraft 6D pose estimation is a fundamental task for non-cooperative space target perception, autonomous rendezvous, and on-orbit servicing. However, spacecraft images often suffer from weak textures, large scale variation, complex illumination, and significant synthetic-to-real domain gaps. To address these challenges, we propose ConvMamSPN, a Mamba-based keypoint pose estimation network, and UGGP-ST, an uncertainty-guided geometric pseudo-label self-training framework.

ConvMamSPN predicts spacecraft keypoint heatmaps using a ConvNeXt backbone and a Mamba-based hierarchical decoder. The final 6D pose is recovered using geometric PnP. UGGP-ST further improves target-domain robustness by selecting reliable pseudo labels through keypoint uncertainty estimation and PnP reprojection consistency, enabling effective teacher-student self-training on unlabeled real-domain images.

Method

ConvMamSPN and UGGP-ST

The framework integrates network-based keypoint localization, geometry-based pose recovery, and uncertainty-guided target-domain adaptation.

ConvMamSPN

ConvMamSPN is an encoder-decoder network for monocular spacecraft 6D pose estimation. It extracts hierarchical visual features with a ConvNeXt backbone and refines multi-scale decoder features with CrossScanMamba and MambaResBlock modules. The network outputs keypoint heatmaps, from which high-confidence 2D keypoints are selected for geometric PnP pose recovery.

  • ConvNeXt backbone for hierarchical visual representation.
  • Mamba-based decoder for global-local feature refinement.
  • Top-M keypoint selection for robust geometric solving.
  • EPnP / RANSAC PnP for interpretable 6D pose recovery.
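
The Top-M selection step can be illustrated with a short NumPy sketch. It assumes that each keypoint has its own heatmap channel, that the keypoint location is the heatmap argmax, and that the peak value serves as the confidence score; the function name and the value of `m` are hypothetical, and rescaling from heatmap to image resolution is left out.

```python
import numpy as np

def select_top_m_keypoints(heatmaps, m):
    """Pick the m most confident keypoints from (K, H, W) heatmaps.

    Returns (indices, coords, scores): keypoint ids, (x, y) peak locations
    in heatmap pixels, and peak values, sorted by descending confidence.
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    peak_idx = flat.argmax(axis=1)              # flat index of each peak
    scores = flat[np.arange(k), peak_idx]       # peak value per keypoint
    ys, xs = np.unravel_index(peak_idx, (h, w))
    keep = np.argsort(-scores)[:m]              # most confident first
    coords = np.stack([xs[keep], ys[keep]], axis=1).astype(float)
    return keep, coords, scores[keep]

heatmaps = np.zeros((3, 5, 5))
heatmaps[0, 1, 2] = 0.9   # keypoint 0 peaks at (x=2, y=1)
heatmaps[1, 3, 4] = 0.2   # keypoint 1 is a weak detection
heatmaps[2, 0, 0] = 0.6   # keypoint 2 peaks at (x=0, y=0)
indices, coords, scores = select_top_m_keypoints(heatmaps, m=2)
```

The returned indices select the matching rows of the predefined 3D keypoints, so only the confident 2D-3D pairs are passed on to the PnP solver.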

UGGP-ST

UGGP-ST addresses the synthetic-to-real domain gap by generating and selecting reliable pseudo labels from unlabeled target-domain images. Instead of relying only on heatmap peak confidence, it jointly considers heatmap uncertainty and PnP reprojection consistency, reducing error accumulation during teacher-student self-training.

  • Uncertainty cues from heatmap peak confidence, entropy, and local variance.
  • Geometric validation based on PnP consensus and reprojection error.
  • Teacher-student self-training on Lightbox and Sunlamp domains.
  • Improved robustness under illumination and background shifts.
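
The selection rule described above can be sketched as follows. Everything specific here is an assumption made for illustration: the exact definitions of the three cues (in particular, "local variance" is taken as the spatial spread of heatmap mass in a window around the peak), all threshold values, and the function names are hypothetical rather than the settings used by UGGP-ST.

```python
import numpy as np

def heatmap_cues(hm, window=2, eps=1e-12):
    """Return (peak, normalized entropy, local variance) for a non-negative (H, W) heatmap."""
    h, w = hm.shape
    py, px = np.unravel_index(hm.argmax(), hm.shape)
    peak = float(hm[py, px])
    p = hm / (hm.sum() + eps)                       # treat the heatmap as a distribution
    entropy = float(-(p * np.log(p + eps)).sum() / np.log(h * w))
    y0, y1 = max(py - window, 0), min(py + window + 1, h)
    x0, x1 = max(px - window, 0), min(px + window + 1, w)
    patch = hm[y0:y1, x0:x1]
    weights = patch / (patch.sum() + eps)
    yy, xx = np.mgrid[y0:y1, x0:x1]
    local_var = float((weights * ((yy - py) ** 2 + (xx - px) ** 2)).sum())
    return peak, entropy, local_var

def reprojection_errors(points_3d, points_2d, R, t, K):
    """Pixel distance between projected 3D keypoints and predicted 2D keypoints."""
    cam = points_3d @ R.T + t
    proj = cam @ K.T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(proj - points_2d, axis=1)

def accept_pseudo_label(cues, reproj_err, min_peak=0.5, max_entropy=0.5,
                        max_var=2.0, max_reproj=3.0, min_inliers=6):
    """Keep a sample only if enough keypoints are both confident and geometrically consistent."""
    cues = np.asarray(cues, dtype=float)            # shape (K, 3)
    confident = ((cues[:, 0] >= min_peak) & (cues[:, 1] <= max_entropy)
                 & (cues[:, 2] <= max_var))
    consistent = np.asarray(reproj_err) <= max_reproj
    return int((confident & consistent).sum()) >= min_inliers
```

A sharp, single-peak heatmap yields high peak confidence with low entropy and low local variance, while a diffuse or multi-modal heatmap fails at least one cue. The reprojection check then rejects samples whose keypoints look confident but do not agree with any single rigid pose.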

ConvMamSPN architecture. A ConvNeXt backbone and Mamba-based hierarchical decoder predict spacecraft keypoint heatmaps for PnP-based pose recovery.

UGGP-ST pipeline. Uncertainty and geometric consistency are jointly used to select reliable pseudo labels for target-domain self-training.

Qualitative Visualization

Four-domain Result Carousel

Qualitative predictions are shown across the SPEED, SPEED+ Synthetic, Lightbox, and Sunlamp domains to illustrate robustness under synthetic-to-real appearance shifts.

Experiments

Results

Comparison with state-of-the-art methods on the SPEED+ dataset. Only Ascore is shown for each domain for a compact project-page display.

Model                        Params (M) ↓  FLOPs (G) ↓  Lightbox Ascore  Sunlamp Ascore  Synthetic Ascore
VPU                          190.07        273.36       0.1014           0.0612          0.039
SPNv2                        56.92         142.54       0.122            0.198           -
laval1302                    -             -            0.1627           0.0545          0.04
KPN                          -             -            0.810            1.320           -
SPTN                         -             -            0.098            0.156           -
TSTF                         -             -            0.095            0.197           -
PVSPE                        73.09         104.12       0.101            0.178           -
PVSAR                        30.60         -            0.076            0.112           -
SPNv3-S                      22.70         -            0.064            0.088           -
SPNv3-M                      39.60         -            0.056            0.082           -
SPNv3-B                      86.30         -            0.047            0.074           -
ConvMamSPN + UGGP-ST (Ours)  13.97         23.87        0.0297           0.0465          0.0121

A dash (-) marks a cell with no value listed.

denotes results reported with an ensemble of heatmap predictions from models trained in three independent sessions, following the original paper. Best results are shown in bold, and second-best results are underlined.

Citation

BibTeX

@article{TODO,
  title   = {TODO},
  author  = {TODO},
  journal = {TODO},
  year    = {2026}
}