Description
A visual-audio dataset consisting of sheet music images and the corresponding audio played on a range of pianos. It contains 88 ground-truth sheet music images, each 162 × 300 pixels (height × width), together with 1,948 training audio files and 516 testing audio files. When converted to a Mel spectrogram, each audio file has 764 frames.

The dataset also includes annotation files that align each audio file with its ground-truth image and specify whether it belongs to the training or the testing set. The directory structure must reflect this split.

The dataset was created for a dissertation project involving deep learning, specifically convolutional models, whose task was to generate sheet music images from the corresponding audio samples. It is therefore likely to be most useful for similar tasks.
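Because the train/test split has to be reflected in the directory layout, a loader typically reads the annotation files once and moves each audio file into a `train` or `test` folder while remembering its ground-truth image. The sketch below shows one way to do this with the Python standard library. It is illustrative only: the CSV format, the column names `audio_file`, `image_file` and `split`, the split labels, and the file names in the usage example are all hypothetical assumptions, not the dataset's actual annotation format.

```python
import csv
import io
import shutil
from pathlib import Path

# ASSUMPTION: hypothetical column names; the real annotation files may differ.
AUDIO_COL, IMAGE_COL, SPLIT_COL = "audio_file", "image_file", "split"


def plan_layout(annotation_csv, root):
    """Map each audio file to (destination path, ground-truth image name)."""
    root = Path(root)
    plan = {}
    for row in csv.DictReader(annotation_csv):
        split = row[SPLIT_COL].strip().lower()
        # ASSUMPTION: split labels are "train" and "test".
        if split not in ("train", "test"):
            raise ValueError(f"unknown split {split!r} for {row[AUDIO_COL]}")
        dest = root / split / Path(row[AUDIO_COL]).name
        plan[row[AUDIO_COL]] = (dest, row[IMAGE_COL])
    return plan


def apply_layout(plan, source_dir):
    """Move audio files from source_dir into their train/test directories."""
    for audio_name, (dest, _image) in plan.items():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(Path(source_dir) / audio_name), str(dest))


# Usage with a hypothetical in-memory annotation file.
example = io.StringIO(
    "audio_file,image_file,split\n"
    "piano1_0001.wav,sheet_0001.png,train\n"
    "piano2_0001.wav,sheet_0001.png,test\n"
)
plan = plan_layout(example, "data")
```

Separating planning from moving makes it easy to check the mapping before touching the filesystem. Since there are 2,464 audio files but only 88 images, several audio files map to the same ground-truth image, so the image is stored per audio file rather than the other way round.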
Field | Value |
---|---|
Date made available | 21 Jun 2023 |
Publisher | University of Birmingham |
Date of data production | Nov 2022 - Feb 2023 |