S^3 -- Semantic Signal Separation

June 13, 2024 · arXiv:2406.09556

Authors

Márton Kardos, Jan Kostkan, Arnault-Quentin Vermillet, Kristoffer Nielbo, Kenneth Enevoldsen

Abstract

Topic models are useful tools for discovering latent semantic structures in large textual corpora. Recent efforts have been oriented at incorporating contextual representations in topic modeling and have been shown to outperform classical topic models.

These approaches are typically slow, volatile, and require heavy preprocessing for optimal results. We present Semantic Signal Separation ($S^3$), a theory-driven topic modeling approach in neural embedding spaces.

$S^3$ conceptualizes topics as independent axes of semantic space and uncovers these by decomposing contextualized document embeddings using Independent Component Analysis. Our approach provides diverse and highly coherent topics, requires no preprocessing, and is demonstrated to be the fastest contextual topic model, being, on average, 4.5x faster than the runner-up BERTopic.
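
The decomposition step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn's FastICA as the Independent Component Analysis routine and uses synthetic arrays in place of real encoder output. The helper name `fit_topics`, the array sizes, and the choice to score terms by passing vocabulary embeddings through the fitted unmixing transform are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Synthetic stand-ins for encoder output (assumption: a real pipeline would
# embed documents and vocabulary terms with the same sentence encoder).
n_docs, n_terms, dim, n_topics = 200, 50, 16, 3
vocab = [f"term_{i}" for i in range(n_terms)]

# Non-Gaussian latent signals mixed linearly into the embedding space.
sources = rng.laplace(size=(n_docs, n_topics))
mixing = rng.normal(size=(n_topics, dim))
doc_embeddings = sources @ mixing + 0.01 * rng.normal(size=(n_docs, dim))
vocab_embeddings = rng.normal(size=(n_terms, dim))


def fit_topics(doc_embeddings, vocab_embeddings, n_topics, random_state=0):
    """Hypothetical helper: treat topics as independent axes found by ICA."""
    ica = FastICA(n_components=n_topics, random_state=random_state, max_iter=1000)
    # Document-topic matrix: strength of each independent component per document.
    doc_topic = ica.fit_transform(doc_embeddings)
    # Assumption: score terms by projecting their embeddings onto the same axes.
    topic_term = ica.transform(vocab_embeddings).T
    return doc_topic, topic_term


doc_topic, topic_term = fit_topics(doc_embeddings, vocab_embeddings, n_topics)

# Highest-scoring terms on each axis serve as a rough topic description.
top_idx = np.argsort(-topic_term, axis=1)[:, :5]
top_terms = [[vocab[j] for j in row] for row in top_idx]
for k, terms in enumerate(top_terms):
    print(f"topic {k}: {', '.join(terms)}")
```

Because the vocabulary embeddings here are random, the printed terms carry no meaning. With real embeddings, each axis would be read through its highest-scoring (and, since the sign of an independent component is arbitrary, possibly its lowest-scoring) terms.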

We offer an implementation of $S^3$, and all contextual baselines, in the Turftopic Python package.
