An Annotation Scheme and Classifier for Personal Facts in Dialogue

Abstract

Advances in Large Language Models (LLMs) have enabled their use in personalized dialogue systems. We present an extended annotation scheme for personal fact classification that addresses limitations of existing approaches, particularly PeaCoK.

Our scheme introduces new categories (Demographics, Possessions) and attributes (Duration, Validity, Followup) that enable structured storage, quality filtering, and identification of facts suitable for dialogue continuation. We manually annotated 2,779 facts from Multi-Session Chat and trained a multi-head classifier based on transformer encoders.
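As a rough illustration of how such annotations could support structured storage and filtering, the sketch below models one annotated fact as a record. The category names come from the abstract; the field types, the attribute values (`"long_term"`, `"short_term"`), the helper `followup_candidates`, and the example facts are all hypothetical and are not taken from the dataset or the released classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonalFact:
    """One annotated fact. Field types and values are assumptions for illustration."""
    text: str
    category: str   # e.g. "Demographics" or "Possessions" (named in the abstract)
    duration: str   # hypothetical values: "long_term" or "short_term"
    validity: bool  # hypothetical: True if the fact passes quality filtering
    followup: bool  # hypothetical: True if the fact suits dialogue continuation


def followup_candidates(facts):
    """Keep only facts that are both valid and marked as suitable for followup."""
    return [f for f in facts if f.validity and f.followup]


# Invented example facts, not drawn from Multi-Session Chat.
facts = [
    PersonalFact("I have a dog named Max.", "Possessions", "long_term", True, True),
    PersonalFact("I am 34 years old.", "Demographics", "long_term", True, False),
    PersonalFact("uh the thing.", "Possessions", "short_term", False, True),
]

candidates = followup_candidates(facts)
```

Under these assumptions, only the first fact survives, because the second is not marked for followup and the third fails the validity check.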

Combined with the Gemma-300M encoder, the classifier achieves $81.6 \pm 2.6$% macro F1, outperforming all few-shot LLM baselines (best: GPT-5.4-mini, 72.92%) by nearly 9 percentage points while requiring substantially fewer computational resources. Error analysis reveals persistent challenges in semantic boundary disambiguation, temporal aspect interpretation, and pragmatic reasoning for followup assessment.
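For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch for a single classification head; how the reported score is aggregated across heads and runs is not stated here, and the function name `macro_f1` and the toy labels are assumptions for illustration.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
score = macro_f1(gold, pred)
```

In this toy case, class `a` has F1 of 2/3 and class `b` has F1 of 4/5, so the macro average is 11/15.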

The dataset and classifier are publicly available.
