I am working on my final university project: pitch estimation from a song using a CNN.
The input to the CNN is a spectrogram of the song, generated by plt.specgram(), with size 334 x 217. The songs are taken from the MIR-QBSH dataset, with these specifications: 8 s duration, mono, 8 kHz sampling rate, 8-bit quantization, frame size = 256, overlap = 0, and the first frame starts at the first sample of the audio file.
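As a quick sanity check on these numbers, here is a minimal sketch of the frame arithmetic. The constants come from the specification above; the helper name `count_frames` is hypothetical, and the sketch assumes that trailing samples which do not fill a whole frame are dropped.

```python
# Values taken from the dataset description above.
SAMPLE_RATE_HZ = 8000
DURATION_S = 8
FRAME_SIZE = 256
OVERLAP = 0


def count_frames(n_samples, frame_size, overlap):
    """Number of complete frames when the first frame starts at sample 0.

    Hypothetical helper for illustration; trailing samples that do not
    fill a whole frame are dropped (an assumption).
    """
    hop = frame_size - overlap  # step between consecutive frame starts
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    if n_samples < frame_size:
        return 0
    return (n_samples - frame_size) // hop + 1


n_samples = SAMPLE_RATE_HZ * DURATION_S
n_frames = count_frames(n_samples, FRAME_SIZE, OVERLAP)
frame_duration_ms = 1000 * FRAME_SIZE / SAMPLE_RATE_HZ

print(n_samples, n_frames, frame_duration_ms)
```

With these values each recording has 64000 samples, which gives 250 non-overlapping frames of 32 ms each.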
This is one example of the spectrogram:
As far as I understand, I need labels (in my case, pitch labels) paired with the spectrogram so that the CNN can be trained on it. My label data contains 250 pitch labels per song. These pitch labels are in units of semitones (MIDI numbers).
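For reference, a MIDI number maps to a frequency through the standard equal-temperament formula, with A4 = 440 Hz at MIDI number 69. The sketch below is only an illustration: `midi_to_hz` is a hypothetical helper name, and treating a label of 0 as "no pitch" is an assumption about the label file, not something it states.

```python
def midi_to_hz(midi_number, a4_hz=440.0):
    """Convert a MIDI note number to a frequency in Hz.

    Uses twelve-tone equal temperament with A4 (MIDI 69) at a4_hz.
    Returns None for 0, which is ASSUMED here to mean "no pitch".
    """
    if midi_number == 0:
        return None
    return a4_hz * 2.0 ** ((midi_number - 69) / 12.0)


a4 = midi_to_hz(69)    # reference pitch A4
a3 = midi_to_hz(57)    # one octave below A4
rest = midi_to_hz(0)   # assumed "no pitch" marker
```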
These are the pitch labels for the spectrogram above. I applied math.floor() to the pitch labels from the original file to simplify the computation.
Pitch values: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 50, 50, 52, 52, 53, 54, 54, 53, 53, 53, 53, 54, 54, 54, 54, 54, 53, 0, 0, 54, 54, 54, 54, 54, 54, 53, 0, 0, 0, 0, 46, 46, 46, 47, 48, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50, 50, 0, 0, 0, 50, 50, 50, 50, 50, 50, 50, 49, 0, 0, 51, 47, 47, 47, 47, 47, 47, 47, 47, 48, 49, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 58, 57, 58, 58, 58, 58, 58, 57, 57, 57, 57, 57, 57, 57, 58, 58, 57, 57, 57, 57, 56, 55, 55, 56, 56, 56, 56, 56, 55, 56, 56, 56, 56, 55, 55, 55, 56, 56, 54, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52, 52, 52, 53, 53, 54, 54, 0, 0, 54, 54, 54, 54, 54, 54, 54, 54, 0, 0, 0, 54, 54, 54, 54, 54, 53, 52, 51, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 49, 49, 49, 49, 50, 50, 50, 50, 49, 0, 0, 0, 50, 49, 49, 49, 49, 49, 50, 50, 49, 0, 0, 47, 47, 47, 48, 48, 48, 48, 0, 0, 0, 0, 0, 0, 0]
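To make the data layout concrete, here is a rough sketch of one possible pairing: each label is matched with a small patch of neighbouring spectrogram frames centred on its frame. This is not a confirmed method, only an illustration. It assumes the spectrogram is available as one column of magnitudes per frame (250 columns, one per label) rather than as a 334 x 217 image, and the names `make_examples` and `context` are hypothetical. Plain lists stand in for whatever array type the CNN library expects.

```python
def make_examples(columns, labels, context=2):
    """Pair each frame label with a patch of surrounding spectrogram columns.

    columns: list of per-frame magnitude lists (one list per frame).
    labels:  list of pitch labels, one per frame.
    context: number of neighbouring frames on each side (assumption).
    Frames past either edge are filled with zero columns.
    """
    if len(columns) != len(labels):
        raise ValueError("need exactly one label per spectrogram frame")
    if not columns:
        return []
    n_bins = len(columns[0])
    pad = [[0.0] * n_bins for _ in range(context)]
    padded = pad + list(columns) + pad
    examples = []
    for i, label in enumerate(labels):
        # Patch covers original frames i - context .. i + context.
        patch = padded[i : i + 2 * context + 1]
        examples.append((patch, label))
    return examples


# Tiny stand-in data: 5 frames with 3 frequency bins each.
toy_columns = [[float(i + 1)] * 3 for i in range(5)]
toy_labels = [0, 48, 50, 50, 0]
toy_examples = make_examples(toy_columns, toy_labels, context=1)
```

Each element of `toy_examples` is a (patch, label) pair, so a training loop could stack the patches as CNN inputs and the labels as targets.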
My question is: how should I combine the spectrogram with its pitch labels in Python before they are processed by the CNN?
question from:
https://stackoverflow.com/questions/65913473/how-to-combine-spectrogram-image-with-its-human-labelled-data-to-be-processed-wi