Cross-Modal Navigation with Multi-Agent Reinforcement Learning

May 7, 2026 · arXiv:2605.06595

Authors

Shuo Liu, Xinzichen Li, Christopher Amato

Abstract

Robust embodied navigation relies on complementary sensory cues. However, high-quality, well-aligned multi-modal data is often difficult to obtain in practice. Training a monolithic model is also challenging, as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight modality-specialized agents offers a scalable alternative: it enables flexible deployment and parallel execution while preserving the strength of each modality. In this paper, we propose CRONA, a Multi-Agent Reinforcement Learning (MARL) framework for Cross-Modal Navigation. CRONA improves collaboration by leveraging control-relevant auxiliary beliefs and a centralized multi-modal critic with access to the global state.

Experiments on visual-acoustic navigation tasks show that multi-agent methods significantly improve performance and efficiency over single-agent baselines. We find that homogeneous collaboration with limited modalities is sufficient for short-range navigation under salient cues; heterogeneous collaboration among agents with complementary modalities is generally efficient and effective; and navigation in large, complex environments requires both richer multi-modal perception and increased model capacity.
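The centralized-critic pattern described above (decentralized modality-specialized actors evaluated by one critic that sees the global state) can be sketched minimally. This is an illustrative toy, not CRONA's implementation: the class names, embedding dimensions, and linear policies below are assumptions for the sake of a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)

class ModalityAgent:
    """Decentralized actor: maps one modality's observation to a stochastic action."""
    def __init__(self, obs_dim, n_actions):
        # Toy linear policy; a real agent would use a learned encoder per modality.
        self.W = rng.normal(scale=0.1, size=(n_actions, obs_dim))

    def act(self, obs):
        logits = self.W @ obs
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

class CentralCritic:
    """Centralized critic: scores the concatenated observations of all modalities."""
    def __init__(self, state_dim):
        self.w = rng.normal(scale=0.1, size=state_dim)

    def value(self, global_state):
        return float(self.w @ global_state)

# Hypothetical dimensions: an 8-d visual embedding, a 4-d acoustic embedding, 3 actions.
vision_agent = ModalityAgent(obs_dim=8, n_actions=3)
audio_agent = ModalityAgent(obs_dim=4, n_actions=3)
critic = CentralCritic(state_dim=8 + 4)

vis_obs = rng.normal(size=8)
aud_obs = rng.normal(size=4)

# Each agent acts on its own modality; the critic only enters during training,
# providing a shared baseline computed from the global state.
actions = [vision_agent.act(vis_obs), audio_agent.act(aud_obs)]
baseline = critic.value(np.concatenate([vis_obs, aud_obs]))
```

The key design point this illustrates is centralized training with decentralized execution: at deployment only the lightweight per-modality actors run (in parallel, each on its own input), while the critic, which requires the full multi-modal state, is discarded after training.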

