nnY-Net: Swin-NeXt with Cross-Attention for 3D Medical Images Segmentation

arXiv:2501.01406

Authors

Haixu Liu, Zerui Tao, Wenzhen Dong, Qiuzhuang Sun

Abstract

This paper proposes a novel 3D medical image segmentation architecture called nnY-Net. The name reflects the design: the model adds a cross-attention module at the bottom of the U-Net structure, which gives the network a Y shape.

We combine the strengths of two recent SOTA models, MedNeXt and SwinUNETR, by using a Swin Transformer as the encoder and ConvNeXt as the decoder. We call the resulting structure Swin-NeXt. The model takes the lowest-level feature map of the encoder as Key and Value, and takes patient features such as pathology and treatment information as Query, to compute the attention weights in a cross-attention module.
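The Query/Key/Value roles described above can be illustrated with a minimal NumPy sketch of single-head scaled dot-product cross-attention. This is not the authors' implementation: the function name cross_attention, the projection matrices w_q, w_k, w_v, and all tensor sizes are assumptions chosen for illustration, and the abstract does not say how the attended output is fed back into the decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(patient_feats, bottleneck, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention.

    patient_feats: (n_query, d_patient) patient features used as Query.
    bottleneck:    (channels, depth, height, width) lowest-level encoder
                   feature map used as Key and Value.
    """
    channels = bottleneck.shape[0]
    # Flatten the 3D feature map into one token per voxel: (n_voxels, channels).
    tokens = bottleneck.reshape(channels, -1).T
    q = patient_feats @ w_q            # (n_query, d_attn)
    k = tokens @ w_k                   # (n_voxels, d_attn)
    v = tokens @ w_v                   # (n_voxels, d_attn)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1 over the voxels
    return weights @ v, weights

# Toy sizes, chosen only for the example.
rng = np.random.default_rng(0)
patient_feats = rng.normal(size=(1, 8))       # one patient feature vector
bottleneck = rng.normal(size=(16, 2, 3, 3))   # 16 channels, 2*3*3 = 18 voxels
w_q = rng.normal(size=(8, 32)) * 0.1
w_k = rng.normal(size=(16, 32)) * 0.1
w_v = rng.normal(size=(16, 32)) * 0.1

attended, weights = cross_attention(patient_feats, bottleneck, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

In this sketch the patient vector produces one attention distribution over the bottleneck voxels, so the output has one row per query. A real model would typically use multiple heads and learned projections.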

Moreover, building on the DynUNet and nnU-Net frameworks, we simplify some of the pre-processing, post-processing, and data augmentation methods used in 3D image segmentation, and we integrate the proposed Swin-NeXt with Cross-Attention model into that pipeline.

Finally, we construct a DiceFocalCELoss to improve training efficiency when voxel classes converge unevenly.
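The abstract does not define DiceFocalCELoss. Going by the name, one plausible reading is a weighted sum of a soft Dice term, a focal term, and a cross-entropy term. The NumPy sketch below follows that reading. The function name dice_focal_ce_loss, the equal default weights, gamma = 2.0, and the flattened input layout are all assumptions, not details from the paper.

```python
import numpy as np

def dice_focal_ce_loss(logits, target, gamma=2.0, eps=1e-6,
                       weights=(1.0, 1.0, 1.0)):
    """Hypothetical combined loss: w_dice*Dice + w_focal*Focal + w_ce*CE.

    logits: (n_classes, n_voxels) raw scores for flattened voxels.
    target: (n_voxels,) integer class labels.
    """
    n_classes = logits.shape[0]
    # Softmax over the class axis.
    z = logits - logits.max(axis=0, keepdims=True)
    exp_z = np.exp(z)
    probs = exp_z / exp_z.sum(axis=0, keepdims=True)
    onehot = np.eye(n_classes)[target].T          # (n_classes, n_voxels)

    # Soft Dice, averaged over classes.
    intersection = (probs * onehot).sum(axis=1)
    denominator = probs.sum(axis=1) + onehot.sum(axis=1)
    dice = 1.0 - np.mean((2.0 * intersection + eps) / (denominator + eps))

    # Probability assigned to the true class of each voxel.
    p_true = np.clip((probs * onehot).sum(axis=0), eps, 1.0)
    ce = -np.log(p_true).mean()
    # Focal term down-weights voxels that are already classified well.
    focal = (-((1.0 - p_true) ** gamma) * np.log(p_true)).mean()

    w_dice, w_focal, w_ce = weights
    return w_dice * dice + w_focal * focal + w_ce * ce

# Toy check: confident correct logits versus confident wrong logits.
target = np.array([0, 1, 2, 1, 0, 2])
onehot = np.eye(3)[target].T
loss_good = dice_focal_ce_loss(20.0 * onehot, target)
loss_bad = dice_focal_ce_loss(-20.0 * onehot, target)
print(loss_good, loss_bad)
```

The Dice term reacts to region overlap per class, while the focal and cross-entropy terms act per voxel, which is the usual motivation for combining them when classes are imbalanced.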
