How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection

February 18, 2026 · arXiv:2602.16343

Authors

Yixuan Xiao, Florian Lux, Alejandro Pérez-González-de-Martos, Ngoc Thang Vu

Abstract

Since text-to-speech systems typically do not produce waveforms directly, recent spoof-detection studies use waveforms resynthesized by vocoders and neural audio codecs to simulate an attacker. Unlike vocoders, which are specifically designed for speech synthesis, neural audio codecs were originally developed to compress audio for storage and transmission.

However, their ability to discretize speech has also sparked interest in language-modeling-based speech synthesis. Owing to this dual functionality, codec-resynthesized data may be labeled as either bonafide or spoof.

So far, very little research has addressed this issue. In this study, we present a challenging extension of the ASVspoof 5 dataset constructed for this purpose.

We examine how different labeling choices affect detection performance and provide insights into labeling strategies.
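The labeling ambiguity described above can be made concrete with a small sketch. The record fields, policy names, and helper below are illustrative assumptions for exposition, not the paper's actual pipeline: the only point is that a single policy switch decides whether codec-resynthesized audio counts as bonafide (codec as compressor) or spoof (codec as synthesis component).

```python
from dataclasses import dataclass

# Hypothetical utterance record; field names are illustrative only.
@dataclass
class Utterance:
    source: str  # "human", "tts", "vocoder", or "codec"

def label(utt: Utterance, codec_policy: str = "spoof") -> str:
    """Assign a training label under a chosen codec labeling policy.

    codec_policy selects how codec-resynthesized audio is treated:
    "spoof"    -> the codec is viewed as part of a synthesis attack
    "bonafide" -> the codec is viewed as a lossy compressor of real speech
    """
    if utt.source == "human":
        return "bonafide"
    if utt.source == "codec":
        return codec_policy  # this single choice is the crux of the study
    return "spoof"  # TTS and vocoder outputs are spoof under either policy
```

Detection performance can then be compared across datasets labeled with `codec_policy="spoof"` versus `codec_policy="bonafide"`, which is the kind of comparison the study reports.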
