The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning

January 20, 2026 · arXiv:2601.14127

Authors

Shumin Zhang, Renmiao Chen, Yida Lu, Chengwei Pan, Han Qiu

Abstract

As Multimodal Large Language Models (MLLMs) acquire stronger reasoning capabilities to handle complex, multi-image instructions, this advancement may pose new safety risks. We study this problem by introducing MIR-SafetyBench, the first benchmark focused on multi-image reasoning safety, which consists of 2,676 instances across a taxonomy of 9 multi-image relations.

Our extensive evaluations on 19 MLLMs reveal a troubling trend: models with more advanced multi-image reasoning can be more vulnerable on MIR-SafetyBench. Beyond attack success rates, we find that many responses labeled as safe are only superficially so, often stemming from misunderstanding the instruction or consisting of evasive, non-committal replies.

We further observe that unsafe generations exhibit lower attention entropy than safe ones on average. This internal signature suggests a possible failure mode: models over-focus on task solving while neglecting safety constraints.
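The abstract does not specify how attention entropy is computed, so the sketch below is only an illustrative assumption: it takes per-head attention distributions (e.g. as returned by output_attentions=True in Hugging Face Transformers) and averages the Shannon entropy over heads and query positions of a single layer. The function name attention_entropy and the aggregation choices are hypothetical, not the authors' method.

```python
# Minimal sketch of attention entropy, assuming access to a model's
# attention weights of shape (batch, heads, query_len, key_len), where
# each row over key_len is a probability distribution.
import torch

def attention_entropy(attn: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Mean Shannon entropy of attention distributions.

    attn: (batch, heads, query_len, key_len) attention probabilities.
    eps: small constant to avoid log(0).
    """
    # Entropy per (batch, head, query position), then average everything.
    entropy = -(attn * (attn + eps).log()).sum(dim=-1)
    return entropy.mean()

# Toy usage with random attention-like distributions:
logits = torch.randn(1, 8, 16, 16)
attn = logits.softmax(dim=-1)
print(attention_entropy(attn))  # lower values = more peaked attention
```

Under this reading, lower entropy means the model's attention is concentrated on a few tokens, which is consistent with the paper's interpretation of over-focusing on the task at the expense of safety constraints.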

Our code and data are available at https://github.com/thu-coai/MIR-SafetyBench.
