Deep Audio Transcription

Victoria Ebert and Dr. Patrick Donnelly, Department of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331

Audio transcription is the music information retrieval task of converting an audio recording of a musical performance into symbolic notation. In this project, two separate autoencoders are built to investigate deep representations of music. An autoencoder is trained to reproduce its input at its output, so the architectures that reconstruct well can indicate which model structures best represent the problem. The first autoencoder learns waveforms from audio files, and the second learns musical scores represented as symbolic MIDI files. Both are built with the TensorFlow Keras library and trained on the MAESTRO dataset, which aligns audio and MIDI files to within 10 milliseconds and was collected from solo piano performances on a Yamaha Disklavier piano at the International Piano-e-Competition. Before training, the dataset was preprocessed: the audio was downsampled to 16,000 samples per second, and the MIDI was converted into a time-series representation with a step size of 64 milliseconds (1,024 samples at that rate). This project is part of a larger research effort to explore music transcription using deep learning. The two autoencoders will be combined through transfer learning, joining the trained encoder layers of the audio model with the trained decoder layers of the score model. Starting from this combination, a neural network will be trained to predict MIDI from an audio file. By combining halves of trained autoencoders, we expect faster convergence when training deep neural networks to convert audio signals to symbolic notation.
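
The preprocessing arithmetic in the abstract can be illustrated with a minimal sketch. The sample rate (16,000 samples per second) and step size (1,024 samples, or 64 milliseconds) come from the abstract; everything else is an assumption made for illustration. In particular, the abstract does not say what the MIDI time series contains, so the binary piano roll below, the `(pitch, onset, offset)` note tuples, and the function name `notes_to_piano_roll` are hypothetical and are not the authors' implementation.

```python
SAMPLE_RATE = 16_000   # samples per second after downsampling (from the abstract)
STEP_SAMPLES = 1024    # samples per time step (from the abstract)
NUM_PITCHES = 128      # MIDI pitch numbers run from 0 to 127


def notes_to_piano_roll(notes, duration_seconds):
    """Convert note events into a binary piano roll, one row per 64 ms step.

    `notes` is a list of (pitch, onset_seconds, offset_seconds) tuples.
    This input format is an assumption, not the MAESTRO file format.
    """
    total_samples = round(duration_seconds * SAMPLE_RATE)
    # Ceiling division so that a partial final step still gets a row.
    num_steps = -(-total_samples // STEP_SAMPLES)
    roll = [[0] * NUM_PITCHES for _ in range(num_steps)]

    for pitch, onset, offset in notes:
        # Work in integer sample indices to avoid floating-point drift.
        first_step = round(onset * SAMPLE_RATE) // STEP_SAMPLES
        end_sample = round(offset * SAMPLE_RATE)
        last_step = min(num_steps, -(-end_sample // STEP_SAMPLES))
        for step in range(first_step, last_step):
            roll[step][pitch] = 1
    return roll


# Hypothetical example: middle C (60) held for two steps, E (64) for part of one.
example_notes = [(60, 0.0, 0.128), (64, 0.064, 0.1)]
example_roll = notes_to_piano_roll(example_notes, duration_seconds=0.256)
step_seconds = STEP_SAMPLES / SAMPLE_RATE  # 0.064 s, matching the abstract
```

With this layout, each row of the piano roll covers the same 1,024 audio samples as one step of the downsampled waveform, which is one way the audio and score sequences could be kept in step for the two autoencoders.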

Additional Abstract Information

Presenter: Victoria Ebert

Institution: Oregon State University

Type: Oral

Subject: Computer Science

Status: Approved

Time and Location

Session: Oral 9
Date/Time: Wed 12:00pm-1:00pm
Session Number: 913