A Unified and Reproducible Experimentation Framework for Speech Understanding

2605.30899

Authors

Jing Peng, Chenghao Wang, Hanqi Li, Xiaoyu Gu, Guanyu Chen

Abstract

Speech foundation models and Speech LLMs have advanced speech understanding, yet deployment-oriented model selection is hindered by non-comparable evaluations caused by mismatched post-processing, and by training results that are hard to reproduce across data scales and pipelines. We present SURE, a unified experimentation framework that standardizes prediction formats, normalization, and scoring.
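To illustrate why mismatched post-processing makes scores non-comparable, the following is a minimal sketch of word error rate (WER) computed with and without a shared text normalizer. It is not SURE's implementation: the names normalize, word_errors, and wer, and the specific normalization rules (lowercasing, punctuation removal, whitespace collapsing), are assumptions chosen only for illustration.

```python
import string


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str, normalizer=None) -> float:
    """WER = word-level edit distance / number of reference words."""
    if normalizer is not None:
        reference, hypothesis = normalizer(reference), normalizer(hypothesis)
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref, hyp) / len(ref)


reference = "Hello, world! It's fine."
hypothesis = "hello world its fine"

raw_score = wer(reference, hypothesis)                # no normalization
normalized_score = wer(reference, hypothesis, normalize)  # shared normalization
print(raw_score, normalized_score)
```

In this toy case the same hypothesis scores as entirely wrong when casing and punctuation are left in place and as entirely correct once both sides pass through the same normalizer, which is the kind of gap a fixed, shared normalization and scoring step is meant to remove.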

SURE evaluates strong systems across paradigms, from conventional pipelines to Speech LLMs, on representative tasks under realistic acoustic and linguistic stressors. Beyond evaluation, SURE introduces an agent-assisted training conversion flow that maps paper and code into versioned, runnable training pipelines under a unified protocol on matched open-data subsets.

Overall, SURE improves comparability and reproducibility for deployment-oriented evaluation.
