KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

January 8, 2026 · arXiv:2601.04745

Authors

Shuo Zhang, Sen Hu, Silin Wu, Qizhen Lan, Ronghao Chen

Abstract

Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, in which actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles.

KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval.
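The abstract does not describe a data format, so the following is only a rough illustration of the two ideas it names: a flashback-aware, time-anchored stream and evidence-linked questions grouped into the three question families. Every name here (`Segment`, `story_time`, `build_stream`, `check_evidence`, `accuracy_by_type`) is hypothetical and not taken from the KnowMe-Bench release. Exact string matching is a simplifying assumption standing in for whatever judge the benchmark actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class QuestionType(Enum):
    # The three question families named in the abstract.
    FACTUAL_RECALL = "factual_recall"
    SUBJECTIVE_STATE = "subjective_state"
    PRINCIPLE_REASONING = "principle_reasoning"


@dataclass
class Segment:
    seg_id: str
    narration_index: int  # position in the original text (reading order)
    story_time: int  # hypothetical time anchor, e.g. the year the event happened
    text: str
    is_flashback: bool = False  # narrated out of chronological order


@dataclass
class Question:
    qid: str
    qtype: QuestionType
    prompt: str
    answer: str
    evidence: list = field(default_factory=list)  # seg_ids supporting the answer


def build_stream(segments):
    """Re-anchor flashbacks by sorting on story time; ties keep reading order."""
    return sorted(segments, key=lambda s: (s.story_time, s.narration_index))


def check_evidence(questions, stream):
    """Return qids whose evidence points at segments missing from the stream."""
    known = {s.seg_id for s in stream}
    return [q.qid for q in questions if not set(q.evidence) <= known]


def accuracy_by_type(questions, predictions):
    """Per-category accuracy; exact match is an assumed stand-in for a real judge."""
    totals, correct = defaultdict(int), defaultdict(int)
    for q in questions:
        totals[q.qtype] += 1
        pred = predictions.get(q.qid, "")
        if pred.strip().lower() == q.answer.strip().lower():
            correct[q.qtype] += 1
    return {t.value: correct[t] / totals[t] for t in totals}


# Toy example (invented data): the childhood memory is narrated second
# but happened first, so it moves to the front of the stream.
segments = [
    Segment("s1", 0, 2010, "I moved to Berlin for the job."),
    Segment("s2", 1, 1995, "As a child, I swore never to depend on anyone.", True),
    Segment("s3", 2, 2012, "I turned down the promotion to stay independent."),
]
stream = build_stream(segments)

questions = [
    Question("q1", QuestionType.FACTUAL_RECALL,
             "Where did the narrator move in 2010?", "Berlin", ["s1"]),
    Question("q2", QuestionType.PRINCIPLE_REASONING,
             "What principle explains turning down the promotion?",
             "independence", ["s2", "s3"]),
]
predictions = {"q1": "Berlin", "q2": "ambition"}

print([s.seg_id for s in stream])  # ['s2', 's1', 's3']
print(check_evidence(questions, stream))  # []
print(accuracy_by_type(questions, predictions))  # {'factual_recall': 1.0, 'principle_reasoning': 0.0}
```

Sorting on story time with reading order as a tiebreaker is one simple way to place a flashback before the events narrated around it. A per-category breakdown like this is the kind of view in which a system could score well on factual recall while still failing principle-level questions whose evidence spans distant segments, as `q2` does here.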

Our data is available at https://github.com/QuantaAlpha/KnowMeBench.
