Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows

arXiv:2505.24189

Authors

Orlando Marquez Ayala, Patrice Bechard, Emily Chen, Maggie Baird, Jingfei Chen

Abstract

Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per-token costs fall, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications (faster inference, lower costs) may no longer be clear.

In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still have a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form.
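
To make "low-code workflows in JSON form" concrete, here is a minimal sketch of a structural check one might run on a model's output. The schema (a `trigger` string plus a list of `steps`, each with a `name` and an `inputs` object) and the function `check_workflow` are hypothetical illustrations; they are not the workflow format or the quality metric used in the paper.

```python
import json


def check_workflow(raw: str) -> list[str]:
    """Return a list of structural problems found in a generated workflow.

    Assumed (hypothetical) schema:
        {"trigger": str, "steps": [{"name": str, "inputs": dict}, ...]}
    """
    # A model may emit text that is not valid JSON at all.
    try:
        workflow = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    if not isinstance(workflow, dict):
        return ["top-level value must be an object"]

    problems = []
    if not isinstance(workflow.get("trigger"), str):
        problems.append("missing or non-string 'trigger'")

    steps = workflow.get("steps")
    if not isinstance(steps, list):
        problems.append("missing or non-list 'steps'")
        return problems

    # Check each step independently so that all problems are reported.
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"step {i}: not an object")
            continue
        if not isinstance(step.get("name"), str):
            problems.append(f"step {i}: missing or non-string 'name'")
        if not isinstance(step.get("inputs"), dict):
            problems.append(f"step {i}: missing or non-dict 'inputs'")
    return problems
```

An empty list means the output passed these structural checks only; it says nothing about whether the workflow does what the user asked, which is the harder part of judging quality.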

We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform systematic error analysis to reveal model limitations.
