
Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification

March 6, 2026 · 2603.05899

Authors

Schrasing Tong, Antoine Salaun, Vincent Yuan, Annabel Adeyeri, Lalana Kagal

Abstract

Ensuring fairness in image classification prevents models from perpetuating and amplifying bias. Concept bottleneck models (CBMs) map images to high-level, human-interpretable concepts before making predictions via a sparse, one-layer classifier.
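To make the bottleneck structure concrete, here is a minimal sketch (not the paper's implementation) of a CBM forward pass: image features are first mapped to bounded concept scores, and the final prediction is a single sparse linear layer over those scores only. All names (`W_concept`, `W_task`, the dimensions) are illustrative assumptions.

```python
import numpy as np

def predict_concepts(image_features, W_concept):
    # Map image features to concept logits, then squash to [0, 1]
    # so each value reads as "how present is this concept".
    logits = image_features @ W_concept
    return 1.0 / (1.0 + np.exp(-logits))

def cbm_predict(image_features, W_concept, W_task):
    # The task head sees ONLY the concept scores (the "bottleneck"),
    # and is a single sparse linear layer for interpretability.
    concepts = predict_concepts(image_features, W_concept)
    return concepts @ W_task

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))        # 4 images, 16-dim backbone features
W_c = rng.normal(size=(16, 8))      # 8 human-interpretable concepts
W_t = rng.normal(size=(8, 3))       # 3 target classes
W_t[np.abs(W_t) < 0.5] = 0.0        # sparsify the one-layer classifier
scores = cbm_predict(x, W_c, W_t)
print(scores.shape)  # → (4, 3)
```

Because the classifier is linear over concepts, each class score decomposes into per-concept contributions, which is what makes the model's decisions inspectable.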

This structure enhances interpretability and, in theory, supports fairness by masking sensitive attribute proxies such as facial features. However, CBM concepts are known to leak information unrelated to their semantics, and early results show only marginal reductions in gender bias on datasets like ImSitu.

We propose three bias mitigation techniques to improve fairness in CBMs: (1) decreasing information leakage with a top-k concept filter, (2) removing biased concepts, and (3) adversarial debiasing.
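The first technique, a top-k concept filter, can be sketched as follows: keep only the k highest-scoring concepts per example and zero out the rest, so low-activation concepts cannot serve as a side channel for leaked information. This is an illustrative sketch under assumed names, not the authors' code.

```python
import numpy as np

def topk_concept_filter(concept_scores, k):
    # Keep only the k highest-scoring concepts per example; zero the rest.
    # Restricting the number of active concepts narrows the channel through
    # which non-semantic (e.g. sensitive-attribute) information can leak
    # to the downstream classifier.
    filtered = np.zeros_like(concept_scores)
    top_idx = np.argsort(concept_scores, axis=1)[:, -k:]
    rows = np.arange(concept_scores.shape[0])[:, None]
    filtered[rows, top_idx] = concept_scores[rows, top_idx]
    return filtered

scores = np.array([[0.9, 0.1, 0.6, 0.3],
                   [0.2, 0.8, 0.7, 0.05]])
print(topk_concept_filter(scores, 2))
```

With k=2, each row retains exactly its two strongest concept activations; the task head is then trained on the filtered scores.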

Our results outperform prior work on fairness-performance tradeoffs, indicating that our debiased CBM is a significant step towards fair and interpretable image classification.

