AMCF-NET: ADAPTIVE MULTI-SCALE CROSS-MODAL FUSION NETWORK FOR UAV-SATELLITE CROSS-VIEW LOCALIZATION

  • Van Quan Ngo, Institute of Information Technology and Electronics, Academy of Military Science and Technology
  • Quang Tung Pham, Institute of Information Technology and Electronics, Academy of Military Science and Technology
  • Chi Thanh Nguyen, Institute of Information Technology and Electronics, Academy of Military Science and Technology
Keywords: UAV localization, satellite images, cross-view matching, multi-scale fusion, adaptive feature learning

Abstract

Cross-view localization between Unmanned Aerial Vehicle (UAV) and satellite imagery
is crucial for autonomous navigation in GPS-denied environments. However, large domain
gaps, including viewpoint discrepancies, scale variations, and appearance differences, pose
significant challenges. In this paper, we propose the Adaptive Multi-scale Cross-modal Fusion
Network (AMCF-Net), a novel approach that addresses these challenges through a
shared backbone architecture and adaptive fusion mechanisms. Unlike previous dual-backbone
approaches that process UAV and satellite images separately, our method employs a unified
FocalNet-Tiny backbone to extract cross-modal features, followed by a Spatially-adaptive Cross-modal
Feature Fusion (AMCF) module that dynamically combines multi-scale similarities
using learned adaptive weights. This shared representation learning enables better cross-modal
alignment and significantly reduces computational overhead. Comprehensive experiments on
the UL14 benchmark demonstrate that AMCF-Net achieves state-of-the-art performance, with a
Relative Distance Score (RDS) of 78.12% and meter-level accuracy of 27.25% at 3 m, 50.16%
at 5 m, 84.37% at 10 m, and 88.51% at 20 m. Ablation studies further validate the
effectiveness of the shared backbone and adaptive fusion mechanism, demonstrating significant
improvements over conventional approaches that process the two views separately.
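The adaptive fusion step can be illustrated with a simplified sketch. It assumes that each scale yields a similarity map between the UAV query and the satellite image, already resized to a common resolution, and it models the learned weights as one scalar logit per scale, normalized with a softmax; the peak of the weighted sum is taken as the predicted position. Because the backbone is shared, both images would be encoded by the same network before these maps are computed; that step is omitted here. All function names are hypothetical and this is not the authors' implementation: the module is described as spatially adaptive, so its actual weights may vary per location rather than per scale.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: turns free logits into positive weights summing to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_similarity_maps(maps: List[Matrix], logits: List[float]) -> Matrix:
    """Weighted sum of per-scale similarity maps (assumed already resized to the same H x W).

    Hypothetical simplification: one learned logit per scale instead of
    spatially varying weights.
    """
    if len(maps) != len(logits):
        raise ValueError("need exactly one logit per scale")
    weights = softmax(logits)
    h, w = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for weight, sim in zip(weights, maps):
        for i in range(h):
            for j in range(w):
                fused[i][j] += weight * sim[i][j]
    return fused


def argmax_2d(m: Matrix) -> Tuple[int, int]:
    """Return (row, col) of the highest fused similarity, i.e. the predicted UAV position."""
    _, row, col = max((v, i, j) for i, r in enumerate(m) for j, v in enumerate(r))
    return row, col


if __name__ == "__main__":
    coarse = [[0.1, 0.2], [0.3, 0.4]]  # toy coarse-scale similarity map
    fine = [[0.9, 0.0], [0.0, 0.1]]    # toy fine-scale similarity map
    fused = fuse_similarity_maps([coarse, fine], logits=[0.0, 0.0])  # equal weights 0.5 / 0.5
    print(argmax_2d(fused))  # (0, 0)
```

Normalizing the logits with a softmax keeps the fused map a convex combination of the per-scale maps, so no single scale can be amplified without bound during training.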
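The two evaluation metrics reported above can be sketched as follows. Meter-level accuracy at a threshold is taken here as the fraction of test samples whose localization error does not exceed that many meters (whether the bound is inclusive is an assumption). For RDS, the sketch assumes the per-sample definition from the original UL14 benchmark paper: an exponential decay of the pixel offset normalized by the satellite image width and height, with scale factor k = 10. This form and the value of k should be checked against that paper, and the reported score is presumably the mean over the test set. Function names and the error values in the usage example are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def meter_level_accuracy(errors_m: Sequence[float], threshold_m: float) -> float:
    """Fraction of samples whose localization error is within threshold_m meters (inclusive)."""
    if not errors_m:
        raise ValueError("no samples")
    return sum(1 for e in errors_m if e <= threshold_m) / len(errors_m)


def relative_distance_score(pred: Point, true: Point, width: float, height: float,
                            k: float = 10.0) -> float:
    """Per-sample RDS, ASSUMING the form exp(-k * sqrt(((dx/w)^2 + (dy/h)^2) / 2)).

    dx, dy are pixel offsets between predicted and true positions on the
    satellite image; width, height are that image's size in pixels.
    Returns 1.0 for an exact hit and decays toward 0 as the error grows.
    """
    dx = (pred[0] - true[0]) / width
    dy = (pred[1] - true[1]) / height
    return math.exp(-k * math.sqrt((dx * dx + dy * dy) / 2.0))


if __name__ == "__main__":
    errors = [2.0, 4.5, 8.0, 15.0, 30.0]  # hypothetical per-sample errors in meters
    for t in (3, 5, 10, 20):
        print(f"MA@{t}m = {meter_level_accuracy(errors, t):.2f}")  # 0.20, 0.40, 0.60, 0.80
    print(relative_distance_score((100, 100), (100, 100), 512, 512))  # 1.0 for an exact hit
```

Normalizing the offsets by image size makes RDS comparable across satellite tiles of different resolutions, whereas meter-level accuracy depends on converting pixel errors to ground distance.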

Published
2026-01-12
Section
Articles