Data Description for Music Project

Acknowledgment


The acquisition and labeling of the music dataset were supported by the National Research Foundation of Korea’s Brain Korea 21 FOUR program, "Data Science Innovative Talent Fostering Education Research Center," at the Graduate School of Data Science, Seoul National University.

Data access

To obtain the full dataset, contact us with the information listed below.

Please note that the data may be used only by approved requesters, and only for non-commercial academic purposes.

Send an email containing all of the required information below to Jisoo Park of GSDS.

Information you should include

Email Subject: [Request for Music Dataset] - Your Name / Institution


  • Your full name

  • Your institution with description

  • Purpose and range for using the dataset


Also add the following text at the end of your email:

I agree to use the music dataset only for non-commercial academic purposes, and not to share the dataset with unauthorized others.


After you send the email, we will review it and reply within 1-2 business days.

If you use the dataset in your work, you must state that you used this dataset:

Data Description

Label Description

Sample Data


We provide sample data for you to test.

Download link(csv)

sample.xlsx

Our usage (example preprocessing)

For our model, we first set a ground truth for each segment. We used the peak (mode) of the probability density function of each segment's responses as the ground truth, though the mean also works well.
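As an illustration of the ground-truth step, here is a minimal sketch that estimates the peak of a density fitted to one segment's responses. It assumes a Gaussian kernel density estimate with a normal-reference bandwidth; neither is stated above, so treat both as placeholders for whatever density estimate you prefer. The function name `kde_mode` and its parameters are hypothetical.

```python
import math
import statistics


def kde_mode(responses, grid_points=201, bandwidth=None):
    """Return the grid point where a Gaussian KDE of `responses` is highest.

    Assumption: Gaussian kernel and a normal-reference bandwidth. The
    original preprocessing may have used a different density estimate.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("responses must not be empty")

    lo, hi = min(responses), max(responses)
    if lo == hi:
        # All responses are identical, so the peak is that value.
        return float(lo)

    if bandwidth is None:
        # Normal-reference rule of thumb: 1.06 * sd * n^(-1/5).
        bandwidth = 1.06 * statistics.stdev(responses) * n ** (-1 / 5)

    # Evaluate the (unnormalized) density on an evenly spaced grid.
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]

    def density(x):
        return sum(math.exp(-0.5 * ((x - r) / bandwidth) ** 2) for r in responses)

    return max(grid, key=density)


# Hypothetical ratings for one segment; the outlier at 7 pulls the mean
# upward more than it moves the density peak.
segment_responses = [1, 2, 2, 3, 3, 3, 3, 4, 7]
print("peak:", kde_mode(segment_responses))
print("mean:", statistics.mean(segment_responses))
```

The peak is less sensitive than the mean to a few extreme responses, which may matter for crowdsourced labels. When the responses are roughly symmetric the two estimates are close.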

To feed the ground truth into our model (an LSTM), we aligned it with the score data. For this we used Nakamura's Performance Error Detection program and used the text data in tabular form.

However, if you feed your tabular data directly into MusicBert, you do not need the alignment step.

In more detail, we dropped three columns of conditional questions, which had fewer responses than the other labels. We tried to separate the data by worker metadata, but because of the nature of crowdsourcing, it was more effective to keep workers anonymous and use the whole dataset together.
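The column-dropping step could be sketched as below. This assumes the table is loaded as a list of row dictionaries and that the columns to drop are simply the label columns with the fewest non-empty responses. The helper name `drop_sparsest_columns` and the column names in the example are hypothetical, not the dataset's actual headers.

```python
def drop_sparsest_columns(rows, label_columns, n_drop=3):
    """Drop the `n_drop` label columns that have the fewest responses.

    Assumption: a missing response is stored as None or an empty string.
    Returns the filtered rows and the names of the dropped columns.
    """
    # Count non-empty responses per label column.
    counts = {
        col: sum(1 for row in rows if row.get(col) not in (None, ""))
        for col in label_columns
    }
    # sorted() is stable, so ties keep the order given in label_columns.
    to_drop = set(sorted(label_columns, key=lambda col: counts[col])[:n_drop])
    kept = [{k: v for k, v in row.items() if k not in to_drop} for row in rows]
    return kept, sorted(to_drop)


# Hypothetical rows: "q_b" is a conditional question that few workers answered.
rows = [
    {"segment": 1, "q_a": 3, "q_b": None, "q_c": 2},
    {"segment": 2, "q_a": 4, "q_b": None, "q_c": ""},
    {"segment": 3, "q_a": 5, "q_b": 1, "q_c": 2},
]
cleaned, dropped = drop_sparsest_columns(rows, ["q_a", "q_b", "q_c"], n_drop=1)
print(dropped)
print(cleaned[0])
```

If you already know which conditional questions to exclude, passing their names explicitly is simpler than selecting them by response count.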


If you want more discussion, feel free to contact us.