Entity Resolution via Batched Oracle Queries

2606.24407

Authors

Lorenzo Balzotti, Donatella Firmani, Luca Gagliardelli, Giovanni Simonini

Abstract

We consider an oracle that processes a limited batch of records at a time and clusters those that refer to the same real-world entity. We study how to interrogate such an oracle to resolve entities in a dataset whose size is far larger than a single batch, and where no batch is guaranteed to contain all records of any given entity.

We aim for a pay-as-you-go approach that gives full control over the cost (the number of oracle calls) while achieving the highest possible recall at every step. We formally cast this problem as batched entity resolution, prove that selecting optimal batches is NP-hard, and provide an optimal solution under a natural condition on entity sizes.
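To make the setting concrete, here is a minimal sketch of the pay-as-you-go loop. It is not the paper's algorithm. It assumes the batches are already given in a fixed order, so it does not address batch selection, which is the NP-hard part. The names `resolve_with_budget`, `UnionFind`, and `toy_oracle` are hypothetical, and the toy oracle (matching records case-insensitively) only stands in for a real one. Clusters returned for different batches are merged by transitive closure with a union-find structure.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List, Sequence, Set

# Assumed oracle interface: takes one batch, returns clusters of co-referent records.
Oracle = Callable[[Sequence[str]], List[Set[str]]]


class UnionFind:
    """Disjoint-set structure used to merge clusters across batches."""

    def __init__(self, items: Iterable[str]) -> None:
        self.parent: Dict[str, str] = {x: x for x in items}

    def find(self, x: str) -> str:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def resolve_with_budget(
    records: Sequence[str],
    batches: Iterable[Sequence[str]],
    oracle: Oracle,
    budget: int,
) -> List[Set[str]]:
    """Consult the oracle on at most `budget` batches and return the entities found so far."""
    uf = UnionFind(records)
    for batch in islice(batches, budget):
        for cluster in oracle(batch):
            members = list(cluster)
            for other in members[1:]:
                uf.union(members[0], other)
    groups: Dict[str, Set[str]] = {}
    for record in records:
        groups.setdefault(uf.find(record), set()).add(record)
    return list(groups.values())


def toy_oracle(batch: Sequence[str]) -> List[Set[str]]:
    """Hypothetical stand-in oracle: records match if equal ignoring case."""
    by_key: Dict[str, Set[str]] = {}
    for record in batch:
        by_key.setdefault(record.lower(), set()).add(record)
    return list(by_key.values())


records = ["IBM", "ibm", "Ibm", "Acme", "ACME"]
batches = [
    ["IBM", "ibm", "Acme"],
    ["ibm", "Ibm", "ACME"],
    ["Acme", "ACME", "IBM"],
]
for budget in (1, 2, 3):
    print(budget, len(resolve_with_budget(records, batches, toy_oracle, budget)))
```

In this toy run no single batch holds all three spellings of the first entity, yet the overlap between the first two batches lets the merge step recover it. Each additional oracle call can only merge groups, so the number of groups never grows as the budget increases.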

Finally, we evaluate our approach on six datasets and show its superiority over state-of-the-art baselines.
