To Write or to Automate Linguistic Prompts, That Is the Question


March 26, 2026
arXiv:2603.25169

Authors

Marina Sánchez-Torrón, Daria Akselrod, Jason Rauchwerk

Abstract

LLM performance is highly sensitive to prompt design, yet whether automatic prompt optimization can replace expert prompt engineering in linguistic tasks remains unexplored. We present the first systematic comparison of hand-crafted zero-shot expert prompts, base DSPy signatures, and GEPA-optimized DSPy signatures across translation, terminology insertion, and language quality assessment (LQA), evaluating five model configurations.

Results are task-dependent. In terminology insertion, optimized and manual prompts yield quality that is statistically indistinguishable in most comparisons.

In translation, each approach wins on different models. In LQA, expert prompts achieve stronger error detection while optimization improves characterization.

Across all tasks, GEPA elevates minimal DSPy signatures, and the majority of expert-optimized comparisons show no statistically significant difference. We note that the comparison is asymmetric: GEPA optimization searches programmatically over gold-standard splits, whereas expert prompts require no labeled data in principle, relying instead on domain expertise and iterative refinement.
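To make the contrast between the two manual starting points concrete, the following is a hypothetical sketch of a hand-crafted zero-shot expert prompt versus the kind of minimal instruction a base signature expands to before an optimizer such as GEPA rewrites it against scored gold-standard splits. Neither template is taken from the paper; both function names and prompt wordings are illustrative assumptions.

```python
def expert_prompt(source: str, src_lang: str, tgt_lang: str) -> str:
    """Hand-crafted zero-shot prompt: encodes domain expertise directly
    in the instruction, with no labeled data required."""
    return (
        f"You are a professional {src_lang}-to-{tgt_lang} translator.\n"
        f"Translate the text faithfully, preserving register, terminology,\n"
        f"and formatting. Output only the translation.\n\n"
        f"Source ({src_lang}): {source}"
    )


def base_signature_prompt(source: str, src_lang: str, tgt_lang: str) -> str:
    """Minimal signature-style instruction: the bare starting point an
    optimizer would iteratively rewrite using a gold-standard split."""
    return f"Translate from {src_lang} to {tgt_lang}.\n\nSource: {source}"


if __name__ == "__main__":
    s = "La réunion est reportée à lundi."
    print(expert_prompt(s, "French", "English"))
    print(base_signature_prompt(s, "French", "English"))
```

The expert template front-loads the domain knowledge that optimization would otherwise have to discover from labeled examples, which is the asymmetry the abstract notes.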

Details

  • © 2026 takara.ai Ltd
  • Content is sourced from third-party publications.