Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training

September 25, 2025 (arXiv:2509.21500)

Authors

Lifeng Jin, Junkai Zhang, Lin Gui, Swarnashree Mysore Sathyendra, Yunzhong He

Abstract

Reinforcement fine-tuning (RFT) often suffers from reward over-optimization, where a policy model hacks the reward signal to achieve high scores while producing low-quality outputs. Our theoretical analysis shows that the key factor is reward misspecification at the high-reward tail: the inability to reliably distinguish Excellent responses from merely Great ones.
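A toy illustration of why errors confined to the tail still matter. The labels, scores, and the saturating proxy reward below are hypothetical (not taken from the paper), and best-of-n selection stands in for the optimization pressure of RFT. The proxy orders 9 of the 10 candidate pairs correctly, failing only to separate Great from Excellent, yet that single tail error is exactly where selection lands.

```python
from itertools import combinations

# Hypothetical true quality of five candidate responses, listed in increasing quality.
TRUE_QUALITY = {"Poor": 1.0, "Fair": 2.0, "Good": 3.0, "Great": 4.0, "Excellent": 5.0}
CANDIDATES = list(TRUE_QUALITY)  # dicts preserve insertion order (Python 3.7+)


def true_reward(label: str) -> float:
    return TRUE_QUALITY[label]


def proxy_reward(label: str) -> float:
    """Hypothetical misspecified reward: exact up to Great, saturating at the top,
    so Excellent receives the same score as Great."""
    return min(TRUE_QUALITY[label], 4.0)


def best_of_n(candidates, reward):
    """Pick the highest-reward candidate; Python's max keeps the first one on ties."""
    return max(candidates, key=reward)


def pairs_ordered_correctly(candidates, reward) -> int:
    """Count pairs the reward strictly orders the same way as true quality.
    Relies on `candidates` being sorted by increasing true quality."""
    return sum(reward(a) < reward(b) for a, b in combinations(candidates, 2))


picked_by_proxy = best_of_n(CANDIDATES, proxy_reward)  # 'Great'
picked_by_truth = best_of_n(CANDIDATES, true_reward)  # 'Excellent'
agreement = pairs_ordered_correctly(CANDIDATES, proxy_reward)  # 9 of 10 pairs
```

A reward that looks accurate on average can therefore still steer optimization to the wrong answer, because optimization concentrates on the top of the distribution.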

This motivates us to focus on the high-reward region. However, such tail examples are scarce under the base LLM.

While off-policy exemplars (e.g. from stronger models or rewrites) are easier to obtain, naively training on them yields a misspecified reward for the policy we aim to align.

To address this, we study rubric-based rewards. By design, rubrics can leverage off-policy examples while remaining insensitive to their artifacts.
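A minimal sketch of a rubric-based reward, assuming a rubric is a list of weighted, checkable criteria and the reward is the weighted fraction of criteria a response satisfies. The `Criterion` class, the weights, and the keyword checks are hypothetical; in practice a judge model would presumably assess each criterion. Because the score depends only on whether criteria are met, surface artifacts of an off-policy exemplar (its length or phrasing style, say) do not enter the reward unless a criterion explicitly targets them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    description: str
    weight: float
    check: Callable[[str], bool]  # hypothetical stand-in for a judge's yes/no verdict


def rubric_reward(response: str, rubric: List[Criterion]) -> float:
    """Weighted fraction of rubric criteria satisfied, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(response))
    return earned / total if total > 0 else 0.0


# Hypothetical rubric for the prompt "Explain why the sky is blue."
rubric = [
    Criterion("Names Rayleigh scattering", 3.0, lambda r: "rayleigh" in r.lower()),
    Criterion("Relates color to wavelength", 2.0, lambda r: "wavelength" in r.lower()),
    Criterion("Stays under 80 words", 1.0, lambda r: len(r.split()) < 80),
]

resp_a = ("Sunlight scatters off air molecules, and shorter blue wavelengths "
          "scatter most strongly (Rayleigh scattering).")
resp_b = "The sky is blue because it reflects the ocean."

print(rubric_reward(resp_a, rubric))  # 1.0
print(rubric_reward(resp_b, rubric))  # 0.16666666666666666
```

A weighted average is only one plausible aggregation; other schemes (for example, hard gating on essential criteria) are equally possible under this sketch.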

To elicit rubrics that capture the high-reward tail, we highlight the importance of distinguishing among great and diverse responses, and introduce a workflow to implement this idea. We empirically demonstrate that rubric-based rewards substantially mitigate reward over-optimization and deliver effective LLM post-training improvements.
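The abstract does not spell out the workflow, so what follows is only a hypothetical sketch of the underlying idea: refine a rubric using criteria that separate responses that are already tied at the top, rather than criteria that merely separate good from bad. The function names, the candidate-criterion pool, and the keyword checks are illustrative assumptions, not the authors' workflow; a real pipeline would presumably have a model propose and judge criteria.

```python
from typing import Callable, List, Tuple

# (description, check): the check is a hypothetical stand-in for a judge verdict.
Criterion = Tuple[str, Callable[[str], bool]]


def score(response: str, rubric: List[Criterion]) -> float:
    """Fraction of rubric criteria the response satisfies (unweighted)."""
    return sum(check(response) for _, check in rubric) / len(rubric)


def tied_at_top(responses: List[str], rubric: List[Criterion]) -> List[str]:
    """Responses that share the highest rubric score."""
    scores = [score(r, rubric) for r in responses]
    best = max(scores)
    return [r for r, s in zip(responses, scores) if s == best]


def refine_rubric(responses: List[str], rubric: List[Criterion],
                  candidates: List[Criterion], max_rounds: int = 5) -> List[Criterion]:
    """Greedily add the candidate criterion that best splits the top-tied group."""
    rubric, candidates = list(rubric), list(candidates)
    for _ in range(max_rounds):
        tied = tied_at_top(responses, rubric)
        if len(tied) <= 1 or not candidates:
            break

        def split(criterion: Criterion) -> int:
            # Useful only if some, but not all, tied responses pass it.
            passes = sum(criterion[1](r) for r in tied)
            return min(passes, len(tied) - passes)

        best = max(candidates, key=split)
        if split(best) == 0:
            break  # no candidate distinguishes among the top responses
        rubric.append(best)
        candidates.remove(best)
    return rubric


responses = [
    "Rayleigh scattering: shorter wavelengths scatter more.",
    "Rayleigh scattering: shorter wavelengths scatter more, which is also why sunsets look red.",
    "The ocean reflects onto the sky.",
]
base = [("mentions Rayleigh scattering", lambda r: "rayleigh" in r.lower())]
pool = [
    ("mentions wavelength", lambda r: "wavelength" in r.lower()),
    ("explains red sunsets", lambda r: "sunset" in r.lower()),
    ("mentions the ocean", lambda r: "ocean" in r.lower()),
]
refined = refine_rubric(responses, base, pool)
print([name for name, _ in refined])
# ['mentions Rayleigh scattering', 'explains red sunsets']
```

Note that "mentions wavelength" is never selected: it separates the two strong responses from the weak one but not from each other, so it adds no resolution at the tail.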

Our code can be accessed at https://github.com/Jun-Kai-Zhang/rubrics.git.
