DS1 spectrogram: Automated Evaluation of Large Vision-Language Models on Self-driving
  Corner Cases

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

2404.10595

Authors

Lanqing Hong,Meng Tian,Xinhai Zhao,Dit-Yan Yeung,Yanze Li

Abstract

Large Vision-Language Models (LVLMs) have received widespread attention for advancing the interpretable self-driving. Existing evaluations of LVLMs primarily focus on multi-faceted capabilities in natural circumstances, lacking automated and quantifiable assessment for self-driving, let alone the severe road corner cases.

In this work, we propose CODA-LM, the very first benchmark for the automatic evaluation of LVLMs for self-driving corner cases. We adopt a hierarchical data structure and prompt powerful LVLMs to analyze complex driving scenes and generate high-quality pre-annotations for the human annotators, while for LVLM evaluation, we show that using the text-only large language models (LLMs) as judges reveals even better alignment with human preferences than the LVLM judges.

Moreover, with our CODA-LM, we build CODA-VLM, a new driving LVLM surpassing all open-sourced counterparts on CODA-LM. Our CODA-VLM performs comparably with GPT-4V, even surpassing GPT-4V by +21.42% on the regional perception task.

We hope CODA-LM can become the catalyst to promote interpretable self-driving empowered by LVLMs.

Resources

Stay in the loop

Every AI paper that matters, free in your inbox daily.

Details

  • takara.ai
  • Custom AI and machine learning from the Frontier Research Team.
  • © 2026 takara.ai Ltd
  • Content is sourced from third-party publications.