AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

2605.26013

Authors

Krishna Kumar Singh, Viet Dac Lai, Branislav Kveton, Anup Rao, Subhojyoti Mukherjee

Abstract

We introduce AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. Unlike Flow-GRPO, which optimizes the reverse process, we optimize an advantage-weighted forward-process prediction loss.
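To make "advantage-weighted forward-process prediction loss" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact objective, so the group-normalized advantages (in the style of GRPO), the interpolation x_t = (1 - t) x_0 + t * noise with velocity target noise - x_0 (a common rectified-flow convention), and the names `group_advantages`, `advantage_weighted_flow_loss`, and `predict_velocity` are all hypothetical.

```python
import numpy as np


def group_advantages(rewards, eps=1e-8):
    """Center and scale rewards within one rollout group (assumed GRPO-style)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def advantage_weighted_flow_loss(predict_velocity, x0, rewards, rng):
    """Advantage-weighted least-squares loss on the forward (noising) process.

    predict_velocity: hypothetical callable (x_t, t) -> array shaped like x_t.
    x0: array of shape (n, d), samples generated by the rollout policy.
    rewards: array of shape (n,), one scalar reward per sample.
    """
    n = x0.shape[0]
    adv = group_advantages(rewards)
    t = rng.uniform(size=(n, 1))             # one time step per sample
    noise = rng.standard_normal(x0.shape)
    x_t = (1.0 - t) * x0 + t * noise         # straight-line interpolation (assumed)
    target = noise - x0                      # velocity of that straight path
    per_sample = np.mean((predict_velocity(x_t, t) - target) ** 2, axis=1)
    # Negative advantages give negative weights, so this sum is not convex.
    return float(np.mean(adv * per_sample))


# Usage with a placeholder model that always predicts zero velocity.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))
rewards = np.array([0.1, 0.9, 0.4, 0.6])
loss = advantage_weighted_flow_loss(lambda x_t, t: np.zeros_like(x_t), x0, rewards, rng)
```

Because the loss is computed on noised copies of finished samples, no gradient has to flow through the sampling trajectory, which is the practical difference from optimizing the reverse process.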

This optimization problem is unstable when advantages are negative, because the negatively weighted loss becomes non-convex. We stabilize it with rollout-policy regularization, which reduces variance and arises from fitting a local reward-improving target distribution.
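The instability can be seen in a toy scalar version of weighted least squares. The derivation below is an illustrative assumption, not the paper's objective or proof: theta stands for a single prediction, y_i for per-sample regression targets, A_i for advantages, theta_old for the rollout policy's prediction, and beta > 0 for a hypothetical regularization weight.

```latex
% Toy scalar illustration (an assumption, not the paper's objective).
\begin{align*}
L(\theta) &= \sum_i A_i (\theta - y_i)^2,
& L''(\theta) &= 2 \sum_i A_i, \\
L_\beta(\theta) &= \sum_i A_i (\theta - y_i)^2 + \beta (\theta - \theta_{\mathrm{old}})^2,
& L_\beta''(\theta) &= 2 \Big( \sum_i A_i + \beta \Big), \\
\theta^\star &= \frac{\sum_i A_i y_i + \beta\, \theta_{\mathrm{old}}}{\sum_i A_i + \beta}
& &\text{when } \sum_i A_i + \beta > 0.
\end{align*}
```

Each term with A_i < 0 is concave in theta, and if the advantages are mean-centered so that they sum to zero, L is linear in theta and has no minimizer. In this toy setting any beta > 0 restores positive curvature, and the minimizer reduces to theta_old + (1/beta) * sum_i A_i y_i: a bounded step from the rollout policy toward high-advantage targets and away from low-advantage ones.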

We evaluate AdvantageFlow on image generation tasks with Stable Diffusion 3.5 Medium. It outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline based on negative-aware fine-tuning.
