SwarmFormer: Local-Global Hierarchical Attention via Swarmed Token Representations

2501.342874

Authors

Jordan Legg, Mikus Sturmanis

Abstract

Standard Transformers rely on O(N^2) attention over a sequence of N tokens, which becomes prohibitive for large N. Although local or sparse approximations reduce complexity, they may limit global context.

We propose SwarmFormer, a hierarchical local-global approach that draws inspiration from swarm intelligence. Each layer combines repeated local (swarm-like) token neighbor updates with cluster-based global attention among a smaller set of representatives.
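To make the layer structure concrete, here is a minimal pure-Python sketch of one local-global pass. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the local update rule, how clusters are formed, or how cluster outputs return to tokens. The sketch assumes neighbor averaging within a fixed radius, contiguous fixed-size clusters with mean-pooled representatives, projection-free scaled dot-product attention among representatives, and an additive broadcast back to tokens. All function and parameter names are hypothetical.

```python
import math

def local_update(tokens, radius=1):
    """One local step: each token becomes the mean of itself and its
    neighbors within `radius` (assumed rule, not from the paper)."""
    d = len(tokens[0])
    out = []
    for i in range(len(tokens)):
        window = tokens[max(0, i - radius): i + radius + 1]
        out.append([sum(v[k] for v in window) / len(window) for k in range(d)])
    return out

def mean_pool(vectors):
    """Average a cluster of token vectors into one representative."""
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]

def self_attention(reps):
    """Scaled dot-product self-attention with no learned projections."""
    d = len(reps[0])
    out = []
    for q in reps:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in reps]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[j] for w, v in zip(weights, reps))
                    for j in range(d)])
    return out

def swarm_layer_sketch(tokens, cluster_size=4, local_steps=2, radius=1):
    # Repeated local updates give multi-hop propagation between neighbors.
    for _ in range(local_steps):
        tokens = local_update(tokens, radius)
    # Contiguous fixed-size clusters (assumed clustering scheme).
    clusters = [tokens[i:i + cluster_size]
                for i in range(0, len(tokens), cluster_size)]
    # Global attention runs only among the cluster representatives.
    reps = self_attention([mean_pool(c) for c in clusters])
    # Broadcast: add each attended representative back to its cluster's tokens.
    return [[x + r for x, r in zip(tok, rep)]
            for c, rep in zip(clusters, reps) for tok in c]

tokens = [[float(i), 1.0] for i in range(8)]
updated = swarm_layer_sketch(tokens, cluster_size=4)
```

With 8 tokens and a cluster size of 4, attention runs over only 2 representatives, while each token still receives information from the whole sequence through its cluster's representative.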

The local aggregator enables decentralized multi-hop propagation, while the cluster-level attention captures global context without full O(N^2) overhead. Experimental results on text classification tasks show that SwarmFormer achieves strong accuracy with up to 90% fewer parameters than baseline Transformers, demonstrating efficient scalability to longer sequences.
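As a rough illustration of why cluster-level attention avoids the full O(N^2) cost, the sketch below counts token-to-token interactions, not parameters, so it says nothing about the 90% parameter reduction reported above. The cost model is an assumption made for illustration: each local step touches at most 2r + 1 neighbors per token, and global attention is quadratic in the number of clusters. The function names and the example values for cluster size, radius, and step count are hypothetical and not taken from the paper.

```python
def full_attention_pairs(n):
    """Pairwise interactions in dense self-attention over n tokens."""
    return n * n

def hierarchical_pairs(n, cluster_size, radius, local_steps):
    """Upper bound on interactions for local updates plus cluster attention
    (assumed cost model, for illustration only)."""
    local = n * (2 * radius + 1) * local_steps
    num_clusters = -(-n // cluster_size)  # ceiling division
    return local + num_clusters * num_clusters

n = 4096
dense = full_attention_pairs(n)
hier = hierarchical_pairs(n, cluster_size=64, radius=1, local_steps=2)
print(dense, hier)  # prints "16777216 28672"
```

Under this model the local term grows linearly in N, and the global term grows as (N / cluster size)^2, so the cluster size controls how quickly the quadratic part grows.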
