Edit Where You Mean: Region-Aware Adapter Injection for Mask-Free Local Image Editing

2604.23763

Authors

Tianze Zhou, Runqi Wang, Zhen Li, Honghao Cai, Xiangyuan Wang

Abstract

Large diffusion transformers (DiTs) follow global editing instructions well but consistently leak local edits into unrelated regions, because joint-attention architectures offer no explicit channel telling the network where to apply the edit. We introduce AdaptEdit, a co-trained, instruction- and region-aware adapter framework that retrofits a frozen DiT into a precise local editor without modifying its backbone weights.

A lightweight Block Adapter at every transformer block injects a structured condition stream that factorizes what to edit (instruction semantics) from where to edit (spatial mask); a learned SpatialGate routes the adapter signal selectively into the edit region while keeping the rest of the image near-identical to the source; and a Region-Aware Loss focuses the training objective on the changing pixels. Because these components make the backbone's internal representation mask-aware end-to-end, a thin MaskPredictor head trained jointly with the editor can ground the edit region directly from the instruction and source image -- eliminating any user-mask requirement at deployment.
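The gating and loss described above can be illustrated with a minimal sketch. This is a hypothetical NumPy toy, not the paper's implementation: the function names (`spatial_gated_injection`, `region_aware_loss`), the tanh adapter, and the loss weights `w_edit`/`w_keep` are all illustrative assumptions; the key properties are that the adapter signal is routed only into the masked region and that the training objective up-weights the changing pixels.

```python
import numpy as np

def spatial_gated_injection(x, cond, mask, w_adapter, gate_logit):
    """Toy sketch of a gated Block Adapter update (illustrative, not the paper's code).

    x:          (H, W, C) hidden features at one transformer block
    cond:       (C,) condition stream carrying instruction semantics ("what")
    mask:       (H, W) binary edit region ("where")
    w_adapter:  (C, C) lightweight adapter weights
    gate_logit: scalar logit of the learned SpatialGate
    """
    adapter_out = np.tanh(x @ w_adapter + cond)       # adapter's proposed edit signal
    gate = 1.0 / (1.0 + np.exp(-gate_logit))          # SpatialGate strength in (0, 1)
    # Route the signal only into the edit region; outside it, x passes through
    # unchanged, keeping the rest of the image near-identical to the source.
    return x + mask[..., None] * gate * adapter_out

def region_aware_loss(pred, target, mask, w_edit=1.0, w_keep=0.1):
    """Toy region-weighted MSE: focus the objective on the changing pixels."""
    err = (pred - target) ** 2
    m = mask[..., None]
    return float(np.mean(w_edit * m * err + w_keep * (1.0 - m) * err))
```

Because the injection is multiplied by the mask, features outside the edit region are bit-identical to the input, which is the mechanism the SpatialGate relies on for preservation.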

We evaluate on two complementary benchmarks: MagicBrush (paired ground-truth targets) to measure pixel-level preservation and edit accuracy, and Emu-Edit Test (no ground-truth images, 9 diverse edit categories) to stress-test instruction following and generalization across edit types. On both, AdaptEdit achieves state-of-the-art results, simultaneously outperforming mask-free and oracle-mask baselines.

A seven-variant ablation cleanly isolates the contribution of each component.

Details

  • © 2026 takara.ai Ltd
  • Content is sourced from third-party publications.