Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking

arXiv:2502.01667

Authors

Jie Ren, Yuhang Zhang, Dongrui Liu, Xiaopeng Zhang, Qi Tian

Abstract

Direct preference optimization (DPO) has shown success in aligning diffusion models with human preference. Previous approaches typically assume a consistent preference label between final generations and noisy samples at intermediate steps, and directly apply DPO to these noisy samples for fine-tuning.

However, we theoretically identify inherent issues in this assumption and show how they limit the effectiveness of preference alignment. We first demonstrate these issues from two perspectives, gradient direction and preference order, and then propose a Tailored Preference Optimization (TailorPO) framework for aligning diffusion models with human preference, underpinned by theoretical insights.

Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues through a simple yet efficient design. Additionally, we incorporate the gradient guidance of diffusion models into preference alignment to further enhance the optimization effectiveness.
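
As a rough illustration of the idea described above, the following minimal sketch orders a pair of intermediate samples by a step-wise reward and then evaluates a standard DPO-style loss on the ordered pair. It is not the authors' implementation: the names `rank_pair_by_step_reward`, `dpo_pair_loss`, `step_reward`, and `beta` are hypothetical, the samples are stand-in values, and the log-probabilities are assumed to be supplied by the fine-tuned and reference models for the same denoising step.

```python
import math


def rank_pair_by_step_reward(sample_a, sample_b, step_reward):
    """Order two intermediate samples by a step-wise reward.

    `step_reward` is a hypothetical callable that scores a noisy sample
    at its own denoising step. Returns (preferred, dispreferred).
    """
    if step_reward(sample_a) >= step_reward(sample_b):
        return sample_a, sample_b
    return sample_b, sample_a


def dpo_pair_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=1.0):
    """Standard DPO loss for one preference pair.

    loss = -log sigmoid(beta * ((logp_win - ref_logp_win)
                                - (logp_lose - ref_logp_lose)))
    """
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    z = beta * margin
    # Numerically stable -log(sigmoid(z)), i.e. softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Toy example: the "samples" are plain numbers and the reward is identity.
    win, lose = rank_pair_by_step_reward(1.0, 3.0, step_reward=lambda x: x)
    print(win, lose)
    print(dpo_pair_loss(-1.0, -2.0, -1.5, -1.5, beta=0.5))
```

In this sketch the preference label comes from the reward at the intermediate step itself rather than from the final generated image, which is the distinction the abstract draws against earlier approaches.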

Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
