AMCF-NET: ADAPTIVE MULTI-SCALE CROSS-MODAL FUSION NETWORK FOR UAV-SATELLITE CROSS-VIEW LOCALIZATION
Abstract
Cross-view localization between Unmanned Aerial Vehicle (UAV) and satellite imagery
is crucial for autonomous navigation in GPS-denied environments. However, large domain
gaps, including viewpoint discrepancies, scale variations, and appearance differences, pose
significant challenges. In this paper, we propose the Adaptive Multi-scale Cross-modal Fusion
Network (AMCF-Net), a novel approach that effectively addresses these limitations through a
shared backbone architecture and adaptive fusion mechanisms. Unlike previous dual-backbone
approaches that process UAV and satellite images separately, our method employs a unified
FocalNet-Tiny backbone to extract cross-modal features, followed by an Adaptive Multi-scale
Cross-modal Fusion (AMCF) module that dynamically combines multi-scale similarity maps
using learned adaptive weights. This shared representation learning enables better cross-modal
alignment and significantly reduces computational overhead. Comprehensive experiments on
the UL14 benchmark demonstrate that AMCF-Net achieves state-of-the-art performance, with a
Relative Distance Score (RDS) of 78.12% and meter-level accuracy of 27.25% at 3 m, 50.16%
at 5 m, 84.37% at 10 m, and 88.51% at 20 m. Ablation studies further validate the
effectiveness of the shared backbone and adaptive fusion mechanism, showing consistent
improvements over dual-backbone approaches that process the two views separately.
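To make the fusion idea sketched in the abstract concrete, the snippet below illustrates one plausible form of adaptive multi-scale similarity fusion: per-scale similarity maps are resized to a common resolution and combined with spatially varying, learned softmax weights. This is a minimal sketch under our own assumptions; the class name `AdaptiveFusion`, the `weight_head` layer, and all tensor shapes are illustrative and do not reproduce the authors' implementation.

```python
# Minimal sketch of adaptive multi-scale similarity fusion (illustrative;
# names and shapes are assumptions, not the authors' released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Fuses per-scale UAV-satellite similarity maps with learned weights."""
    def __init__(self, num_scales: int):
        super().__init__()
        # Small head predicting a per-scale weight at every spatial location.
        self.weight_head = nn.Conv2d(num_scales, num_scales, kernel_size=1)

    def forward(self, sims):  # sims: list of (B, 1, H_i, W_i) similarity maps
        # Resize all similarity maps to the finest resolution.
        h, w = sims[0].shape[-2:]
        stacked = torch.cat(
            [F.interpolate(s, size=(h, w), mode="bilinear", align_corners=False)
             for s in sims], dim=1)                          # (B, S, H, W)
        # Spatially varying softmax weights over the S scales.
        weights = self.weight_head(stacked).softmax(dim=1)   # (B, S, H, W)
        # Weighted sum collapses the scale dimension into one fused map.
        return (weights * stacked).sum(dim=1, keepdim=True)  # (B, 1, H, W)

# Example: three similarity maps at decreasing resolution.
sims = [torch.randn(2, 1, 64, 64), torch.randn(2, 1, 32, 32),
        torch.randn(2, 1, 16, 16)]
fused = AdaptiveFusion(num_scales=3)(sims)
print(fused.shape)  # torch.Size([2, 1, 64, 64])
```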