MusicMNIST: A Simple Visual-Audio Dataset for Musical Notes

Dataset

Description

A visual-audio dataset consisting of sheet music images and their corresponding audio, played on a range of pianos. It contains 88 ground-truth sheet music images, each 162x300 pixels (height x width), together with 1,948 training audio files and 516 testing audio files. When represented as Mel spectrograms, the audio files have 764 frames. The dataset also includes annotation files that align each audio file with its correct ground-truth image and record whether it belongs to the training or testing set; the directory structure must reflect this split. The dataset was created for a dissertation project using deep learning (specifically convolutional models) to generate sheet music images from corresponding audio samples, so it is likely most useful for similar tasks.
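The page states the spectrogram length (764 frames) but not the sample rate, hop length or framing convention used to compute it. The Python sketch below shows how a frame count relates to clip length in samples. It assumes librosa-style centred framing with librosa's default `hop_length=512` and `n_fft=2048`, plus a hypothetical 22,050 Hz sample rate. None of these parameter values come from the dataset itself, and the helper names `n_frames` and `min_samples_for_frames` are illustrative.

```python
# Sketch: relate Mel spectrogram frame count to clip length in samples.
# ASSUMPTION: the dataset page does not state the sample rate, hop length,
# n_fft or framing convention; the defaults below are hypothetical
# (librosa's defaults, with a 22,050 Hz sample rate in the example).


def n_frames(n_samples: int, hop_length: int = 512, n_fft: int = 2048,
             center: bool = True) -> int:
    """Number of STFT/Mel frames produced for a signal of n_samples samples."""
    if center:
        # Centred framing pads n_fft // 2 samples on each side, so a new
        # frame starts every hop_length samples of the original signal.
        return 1 + n_samples // hop_length
    # Without padding, frames must fit entirely inside the signal.
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop_length


def min_samples_for_frames(frames: int, hop_length: int = 512) -> int:
    """Shortest signal length that yields `frames` frames under centred framing."""
    return (frames - 1) * hop_length


if __name__ == "__main__":
    sr = 22050  # hypothetical sample rate, not taken from the dataset
    samples = min_samples_for_frames(764)
    print(samples, n_frames(samples), round(samples / sr, 2))
```

Under these assumptions, 764 frames corresponds to roughly 17.7 seconds of audio. A different hop length or sample rate would scale this figure proportionally, so check the actual parameters before relying on it.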
Date made available: 21 Jun 2023
Publisher: University of Birmingham
Date of data production: Nov 2022 - Feb 2023
