Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

April 6, 2026 · 2604.04934

Authors

Wonjung Woo, Byungjun Kim, Hanbyul Joo, Hyunsoo Cha

Abstract

We present Vanast, a unified framework that generates garment-transferred human animation videos directly from a single human image, garment images, and a pose guidance video. Conventional two-stage pipelines treat image-based virtual try-on and pose-driven animation as separate processes, which often results in identity drift, garment distortion, and front-back inconsistency.

Vanast addresses these issues by performing try-on and animation in a single unified step, which yields coherent synthesis. To train the model in this setting, we construct large-scale triplet supervision.
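The abstract does not spell out what a triplet contains. As a minimal sketch, assuming a triplet bundles a human (identity) image, the garments to transfer, and a target video of the same person wearing them, the record below encodes three properties the data pipeline is described as targeting: the human image shows a different outfit from the transferred garments, a triplet can cover both upper and lower garments, and in-the-wild triplets can lack catalog images. The class `Triplet`, its fields, the `"upper"`/`"lower"` slot names, and the `problems`, `is_full_outfit`, and `is_in_the_wild` helpers are hypothetical illustrations, not part of Vanast.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical garment slots; the abstract mentions upper and lower garments.
SLOTS = ("upper", "lower")


@dataclass
class Triplet:
    """Hypothetical triplet record: (human image, garments to transfer, target video).

    Images are referenced by path and garments by id; every name here is illustrative.
    """
    human_image: str                      # identity image of the person
    human_outfit: set[str]                # garment ids visible in the human image
    garments: dict[str, str]              # slot -> id of the garment to transfer
    target_video: list[str]               # frame paths: same person wearing `garments`
    catalog_images: dict[str, Optional[str]] = field(default_factory=dict)  # slot -> path or None

    def problems(self) -> list[str]:
        """Collect every issue instead of stopping at the first one."""
        issues = []
        if not self.target_video:
            issues.append("empty target video")
        unknown = sorted(set(self.garments) - set(SLOTS))
        if unknown:
            issues.append(f"unknown slots: {unknown}")
        # The human image should show a different outfit, so the model cannot
        # solve the task by copying clothing that is already visible.
        overlap = sorted(self.human_outfit & set(self.garments.values()))
        if overlap:
            issues.append(f"human image already wears: {overlap}")
        return issues

    def is_full_outfit(self) -> bool:
        """True when both an upper and a lower garment are transferred."""
        return all(slot in self.garments for slot in SLOTS)

    def is_in_the_wild(self) -> bool:
        """Assumption: a triplet is 'in the wild' if no slot has a catalog image."""
        return all(self.catalog_images.get(slot) is None for slot in self.garments)


example = Triplet(
    human_image="person_0001.png",
    human_outfit={"tee_17", "jeans_04"},
    garments={"upper": "shirt_88", "lower": "skirt_12"},
    target_video=["frame_000.png", "frame_001.png"],
    catalog_images={"upper": "shirt_88.png", "lower": None},
)
print(example.problems())        # []
print(example.is_full_outfit())  # True
print(example.is_in_the_wild())  # False
```

Returning a list of problems rather than raising on the first one makes it easier to log and filter bad samples in bulk when assembling a large dataset.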

Our data generation pipeline (1) generates identity-preserving human images in alternative outfits that differ from the garment catalog images, (2) captures full upper- and lower-garment triplets to overcome the limitation of posed-video pairs that cover only a single garment, and (3) assembles diverse in-the-wild triplets that do not require garment catalog images.

We further introduce a Dual Module architecture for video diffusion transformers. It stabilizes training, preserves pretrained generative quality, and improves garment accuracy, pose adherence, and identity preservation, while also supporting zero-shot garment interpolation.
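The abstract mentions zero-shot garment interpolation but does not describe how it is done. One generic way to interpolate between two conditions, shown here purely as an assumption, is to blend two garment conditioning vectors linearly as (1 - t) * a + t * b for t in [0, 1]. The functions `lerp` and `interpolation_schedule`, and the idea that each garment is represented as a plain vector, are hypothetical; this sketch is not Vanast's documented mechanism.

```python
# Hypothetical sketch: linear interpolation between two garment conditioning vectors.
# Assumes each garment is represented as a fixed-length list of floats; Vanast's
# actual representation and interpolation mechanism are not described in the abstract.


def lerp(a: list[float], b: list[float], t: float) -> list[float]:
    """Return (1 - t) * a + t * b, element-wise."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_schedule(a: list[float], b: list[float], steps: int) -> list[list[float]]:
    """Blend from a to b using `steps` evenly spaced weights, both endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(a, b, i / (steps - 1)) for i in range(steps)]


if __name__ == "__main__":
    shirt = [1.0, 0.0]  # hypothetical embedding of garment A
    dress = [0.0, 1.0]  # hypothetical embedding of garment B
    for row in interpolation_schedule(shirt, dress, 3):
        print(row)
    # [1.0, 0.0]
    # [0.5, 0.5]
    # [0.0, 1.0]
```

Restricting t to [0, 1] keeps every result a blend of the two garments; extrapolating beyond that range would be a separate design choice.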

Together, these contributions allow Vanast to produce high-fidelity, identity-consistent animation across a wide range of garment types.
