DS1 spectrogram: ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers

ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers

January 9, 20262601.05741

Authors

Tahar Chettaoui,Jan Niklas Kolf,Marco Huber,Naser Damer,Fadi Boutros

Abstract

Face Image Quality Assessment (FIQA) is essential for reliable face recognition systems. Current approaches primarily exploit only final-layer representations, while training-free methods require multiple forward passes or backpropagation.

We propose ViTNT-FIQA, a training-free approach that measures the stability of patch embedding evolution across intermediate Vision Transformer (ViT) blocks. We demonstrate that high-quality face images exhibit stable feature refinement trajectories across blocks, while degraded images show erratic transformations.

Our method computes Euclidean distances between L2-normalized patch embeddings from consecutive transformer blocks and aggregates them into image-level quality scores. We empirically validate this correlation on a quality-labeled synthetic dataset with controlled degradation levels.

Unlike existing training-free approaches, ViTNT-FIQA requires only a single forward pass without backpropagation or architectural modifications. Through extensive evaluation on eight benchmarks (LFW, AgeDB-30, CFP-FP, CALFW, Adience, CPLFW, XQLFW, IJB-C), we show that ViTNT-FIQA achieves competitive performance with state-of-the-art methods while maintaining computational efficiency and immediate applicability to any pre-trained ViT-based face recognition model.

Resources

Stay in the loop

Get tldr.takara.ai to Your Email, Everyday.

tldr.takara.aiHome·Daily at 6am UTC·© 2026 takara.ai Ltd

Content is sourced from third-party publications.