AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

2605.26013

Authors

Krishna Kumar Singh,Viet Dac Lai,Branislav Kveton,Anup Rao,Subhojyoti Mukherjee

Abstract

We introduce AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. Unlike Flow-GRPO, which optimizes the reverse process, we optimize an advantage-weighted forward-process prediction loss.

This optimization problem is unstable when advantages are negative and the loss becomes non-convex. We stabilize it by rollout policy regularization, which reduces variance and arises from fitting a local reward-improving target distribution.

We evaluate AdvantageFlow on image generation tasks with Stable Diffusion 3.5 Medium. It outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline based on negative-aware fine-tuning.

Resources

View on Hugging Face Read PDF ArXiv

Stay in the loop

Every AI paper that matters, free in your inbox daily.

Tools

Details

takara.ai
Custom AI and machine learning from the Frontier Research Team.
Content is sourced from third-party publications.