A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation

2605.18436

Authors

Samuel Šomorjai, Carles Badal, Markéta Herzanová Vlková, Jan Hajič, Alicia Fornés

Abstract

Memory institutions (libraries, museums, and archives) have digitised a large amount of musical heritage. Nevertheless, Optical Music Recognition (OMR) has struggled to make this music machine-readable despite advances in deep learning, mostly because no datasets were available for training systems under realistic conditions.

The MusiCorpus dataset aims to remedy this situation by providing 1,309 pages of historical, primarily handwritten sheet music with MusicXML transcriptions and symbol annotations. It is the largest dataset of handwritten music to date and the first to contain a realistic, representative sample of the musical document collections held by memory institutions. It is suitable for training and evaluating both end-to-end and object-detection-based OMR systems, and for comparing their performance.
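End-to-end OMR systems commonly emit a linear sequence of symbols, so a MusicXML transcription is often flattened into tokens before training or evaluation. The sketch below illustrates that idea using only Python's standard library. It makes several assumptions: an uncompressed `score-partwise` document, pitched notes with integer `<alter>` values, and no handling of chords, ties, voices, or multiple parts. The `note_tokens` function, its `F#4/quarter`-style token format, and the `SAMPLE` fragment are all hypothetical. They are not the dataset's actual file layout or the authors' encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written MusicXML fragment for illustration only;
# it is not taken from MusiCorpus.
SAMPLE = """<score-partwise version="4.0">
  <part-list>
    <score-part id="P1"><part-name>Voice</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration><type>quarter</type></note>
      <note><pitch><step>F</step><alter>1</alter><octave>4</octave></pitch><duration>1</duration><type>quarter</type></note>
      <note><rest/><duration>2</duration><type>half</type></note>
    </measure>
  </part>
</score-partwise>"""


def note_tokens(xml_text: str) -> list[str]:
    """Flatten every <note> into a hypothetical token such as 'F#4/quarter'
    or 'rest/half'. Assumes pitched notes with integer <alter> values;
    chords, ties, voices and multiple parts are not distinguished."""
    root = ET.fromstring(xml_text)
    tokens = []
    for note in root.iter("note"):
        kind = note.findtext("type", default="?")  # notated duration type
        if note.find("rest") is not None:
            tokens.append(f"rest/{kind}")
            continue
        step = note.findtext("pitch/step")
        octave = note.findtext("pitch/octave")
        alter = int(note.findtext("pitch/alter", default="0"))
        # Positive alter -> sharps, negative -> flats (e.g. -1 -> 'b').
        accidental = "#" * alter if alter > 0 else "b" * (-alter)
        tokens.append(f"{step}{accidental}{octave}/{kind}")
    return tokens


print(note_tokens(SAMPLE))  # ['C4/quarter', 'F#4/quarter', 'rest/half']
```

A detection-based system would instead use the symbol annotations (bounding boxes and classes), so any comparison of the two approaches has to map both outputs to a common representation. This sketch covers only the sequence side.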
