Many Voices, One Reward: Multi-Role Rubric Generation for LLM Judging and Reward Modeling

2607.01830

Authors

Dazhi Fu, Jiuding Yang, Yiwen Guo, Jicong Fan

Abstract

Reliable reward and preference signals are critical for evaluating and optimizing large language models on open-ended tasks. Rubric-based judges offer a transparent way to decompose such judgments into explicit evaluation criteria, but existing annotation-free rubric generators typically rely on a single generic evaluator.

As a result, they may overlook important dimensions of human preference, a failure mode we term dimensional blind spots. To address this limitation, we propose Multi-Role Rubric Generation (MRRG), a training-free and reference-free framework that elicits evaluation criteria from multiple complementary roles and consolidates them into an auditable rubric-based scorer.
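To make the idea of consolidating criteria from several roles more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify how roles are chosen, how criteria are merged, or how they are weighted. The `Criterion` type, the `consolidate` and `score` functions, the role names, and the weight-summing merge rule are all hypothetical, and the per-role criteria are hard-coded where a real system would presumably obtain them from an LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure, not from the paper)."""
    name: str
    weight: float


def consolidate(role_criteria: dict[str, list[Criterion]]) -> list[Criterion]:
    """Merge per-role criteria into a single rubric.

    Assumed merge rule: criteria whose names match after lowercasing and
    stripping are treated as duplicates and their weights are summed; the
    final weights are normalized to sum to 1.
    """
    merged: dict[str, float] = {}
    for criteria in role_criteria.values():
        for c in criteria:
            key = c.name.strip().lower()
            merged[key] = merged.get(key, 0.0) + c.weight
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    return [Criterion(name, w / total) for name, w in merged.items()]


def score(rubric: list[Criterion], judgments: dict[str, bool]) -> float:
    """Weighted fraction of criteria that a response satisfies.

    `judgments` maps a criterion name to a pass/fail verdict; a missing
    entry counts as a fail.
    """
    return sum(c.weight for c in rubric if judgments.get(c.name, False))


# Hypothetical roles and criteria, hard-coded for illustration only.
roles = {
    "domain expert": [Criterion("Factual accuracy", 2.0)],
    "end user": [Criterion("Clarity", 1.0), Criterion("factual accuracy", 1.0)],
}
rubric = consolidate(roles)
response_score = score(rubric, {"factual accuracy": True, "clarity": False})
```

Because each criterion and its verdict are kept explicit, the resulting score can be traced back to the individual rubric items, which is one way to read the "auditable" property described above.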

This scorer can be used both to validate pairwise preferences and to provide rewards for GRPO-style Reinforcement Learning with Verifiable Rewards (RLVR). Experiments on preference validation benchmarks show that MRRG consistently outperforms single-role rubric generation baselines across multiple backbone models.
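The two uses of a scalar rubric score mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: `prefer` and `group_advantages` are hypothetical helper names, the `margin` and `eps` parameters are assumptions, and the advantage formula is the group-normalized form commonly associated with GRPO-style training (reward minus group mean, divided by group standard deviation), which the abstract does not spell out.

```python
from statistics import mean, pstdev


def prefer(score_a: float, score_b: float, margin: float = 0.0) -> str:
    """Turn two rubric scores into a pairwise verdict.

    Returns "A" or "B" when one score exceeds the other by more than
    `margin` (an assumed tie threshold), otherwise "tie".
    """
    if score_a - score_b > margin:
        return "A"
    if score_b - score_a > margin:
        return "B"
    return "tie"


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of sampled responses.

    Each reward is centered on the group mean and divided by the group
    (population) standard deviation; `eps` avoids division by zero when
    all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric scores for two candidate responses to one prompt.
verdict = prefer(0.75, 0.25)

# Hypothetical rubric scores for a group of four sampled responses.
advantages = group_advantages([0.75, 0.25, 0.5, 0.5])
```

Responses scoring above the group mean receive a positive advantage and those below receive a negative one, so only relative rubric scores within a group matter under this normalization.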

Further RLVR experiments demonstrate that MRRG yields a stronger reward signal for improving open-ended generation.
