SciDef: Automating Definition Extraction from Academic Literature with Large Language Models

February 5, 2026 · arXiv:2602.05413

Authors

Filip Kučera, Christoph Mandl, Isao Echizen, Radu Timofte, Timo Spinde

Abstract

Definitions are the foundation of any scientific work, but as publication numbers grow rapidly, gathering the definitions relevant to a given keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction.

We test SciDef on DefExtra and DefSim, two novel datasets of human-extracted definitions and definition-pair similarity judgments, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance.

To evaluate extraction, we test various metrics and show that an NLI-based method yields the most reliable results. We show that LLMs can extract most definitions from scientific literature (86.4% of the definitions in our test set); yet future work should focus not just on finding definitions, but on identifying relevant ones, as models tend to over-generate them.
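An NLI-based metric of this kind can be understood as treating an extracted definition and a reference definition as premise/hypothesis pairs and requiring entailment in both directions. The sketch below is a hypothetical illustration, not the SciDef implementation: the `entails` function stands in for a real NLI model (a production pipeline would call a cross-encoder here), and all names and the threshold are assumptions.

```python
def entails(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model's entailment probability.
    A real pipeline would query a cross-encoder; this toy version
    scores token overlap so the example is self-contained."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0

def definition_match(extracted: str, reference: str,
                     threshold: float = 0.7) -> bool:
    """Count a pair as equivalent when entailment holds both ways,
    so neither definition says substantially more than the other."""
    forward = entails(extracted, reference)
    backward = entails(reference, extracted)
    return min(forward, backward) >= threshold

ref = "media bias is the slanted presentation of news content"
ext = "media bias is the slanted presentation of news content by outlets"
print(definition_match(ext, ref))  # the extra words only slightly weaken one direction
```

Scoring the minimum of the two directions penalizes over-generated definitions that merely contain the reference as a fragment, which matches the paper's observation that finding a definition is easier than finding a relevant one.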

Code & datasets are available at https://github.com/Media-Bias-Group/SciDef.
