An Annotation Scheme and Classifier for Personal Facts in Dialogue

May 11, 2026 · 2605.10339

Authors

Konstantin Zaitsev

Abstract

The advancement of Large Language Models (LLMs) has enabled their application in personalized dialogue systems. We present an extended annotation scheme for personal fact classification that addresses limitations in existing approaches, particularly PeaCoK.

Our scheme introduces new categories (Demographics, Possessions) and attributes (Duration, Validity, Followup) that enable structured storage, quality filtering, and identification of facts suitable for dialogue continuation. We manually annotated 2,779 facts from Multi-Session Chat and trained a multi-head classifier based on transformer encoders.
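As a rough illustration of how such annotations could support structured storage and filtering, the sketch below models one annotated fact as a record. The category names come from the abstract; the field types, the example value sets, and the `followup_candidates` helper are assumptions made for illustration, not the label inventory or code of the released dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonalFact:
    """One annotated fact. Field types below are illustrative assumptions."""
    text: str
    category: str   # e.g. "Demographics" or "Possessions" (named in the abstract)
    duration: str   # hypothetical values: "short_term" or "long_term"
    validity: bool  # hypothetical encoding: True if the fact passes quality filtering
    followup: bool  # hypothetical encoding: True if the fact suits dialogue continuation


def followup_candidates(facts):
    """Keep facts that pass the quality filter and are marked as follow-up material."""
    return [f for f in facts if f.validity and f.followup]


facts = [
    PersonalFact("I own two cats.", "Possessions", "long_term", True, True),
    PersonalFact("I am 34 years old.", "Demographics", "long_term", True, False),
    PersonalFact("uh, maybe something", "Possessions", "short_term", False, True),
]
selected = followup_candidates(facts)
```

Under these assumptions, only the first fact would be kept as a candidate for continuing the conversation.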

Combined with the Gemma-300M encoder, the classifier achieves $81.6 \pm 2.6$% macro F1, outperforming all few-shot LLM baselines (best: GPT-5.4-mini, 72.92%) by nearly 9 percentage points while requiring substantially fewer computational resources. Error analysis reveals persistent challenges in semantic boundary disambiguation, temporal aspect interpretation, and pragmatic reasoning for followup assessment.
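For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch for a single label set; the function name is illustrative, and how the paper aggregates scores across classifier heads is not stated in the abstract.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

In this toy case class `a` has F1 of 2/3 and class `b` has F1 of 4/5, so the macro average is 11/15.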

The dataset and classifier are publicly available.
