neural nets + sound
here are results from the same network as the previous post on a recording of two notes being played at the same time.

i recorded two more samples of each note on the glockenspiel-like instrument, then changed the network so that it randomly selects one of the three takes for each note when generating training patterns. after training on 20000 patterns, here are the results for the same scale as before:
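here's a minimal sketch of what "randomly select one of the three takes per note" could look like when building a training pattern. it assumes each pattern mixes a random subset of the notes and uses a 0/1 target per note; the post doesn't say how notes are combined, and `samples` and `make_training_pattern` are hypothetical names, with random noise standing in for the real recordings.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical data: 4 notes, 3 recorded takes each, as 1-D float arrays.
# random noise stands in for the real glockenspiel recordings.
NUM_NOTES, TAKES, LENGTH = 4, 3, 2048
samples = {note: [rng.normal(size=LENGTH) for _ in range(TAKES)]
           for note in range(NUM_NOTES)}

def make_training_pattern(samples, rng):
    """Mix a random set of notes, using a randomly chosen take for each note."""
    notes = list(samples)
    # choose how many notes sound together (1..all), then which ones
    k = int(rng.integers(1, len(notes) + 1))
    active = rng.choice(notes, size=k, replace=False)
    mix = np.zeros_like(samples[notes[0]][0])
    target = np.zeros(len(notes))
    for note in active:
        note = int(note)
        takes = samples[note]
        mix += takes[int(rng.integers(len(takes)))]  # pick one take at random
        target[note] = 1.0                           # this note is sounding
    return mix, target

mix, target = make_training_pattern(samples, rng)
```

because a different take can be drawn every time a note appears, the network never sees exactly the same waveform for a note twice in a row, which is the point of recording extra samples.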

results are much better than before. there don’t seem to be any major “gray areas”. the only problem is that every output seems to pick up a little something even when just one note is played.
i started testing how the network does with recordings of real instruments that i made today. i’m using goldwave to record with an apple pc microphone hooked up to an “iMic” usb sound interface. i made recordings of a glockenspiel-like instrument with four notes. it’s lower-pitched than the old samples i was using, so i brought the inputs back down from 72 to 64. there is some noise in the background of the recording, and i’m using goldwave’s noise reduction function to get rid of most of it. i trained the network on 20000 training patterns, and here is the output for a scale i constructed out of the training pattern notes:

here is the same scale, but re-recorded when played on the instrument:

it seems to hit the notes, but it also hits a bunch of other notes that weren’t being played. (i think the long duration of the second note might actually be correct, since that note was played quite a bit harder than the others.) one way to solve this might be to record multiple training samples of each note, so that the network can generalize better.
here are the results of training with linear outputs and the same target values as before. note that to produce the output plot i had to change the plotrange from {0,-1} to {-1,2}.

with a multiplier of 100 (twice the usual 50), i still couldn’t see the missing note. with a multiplier of 1000, it shows up very clearly. the notes do seem to “cut off” faster than when i was using rms, though.

did some more overnight training using sigmoid outputs with the four-note glockenspiel samples, and the network is still missing that first note… i think the problem might come from training on the intensity rather than the rms value of the sounds, since the first note seems to have gone missing right around when i made that switch. i’m currently running a short training session with a larger multiplier, so maybe the first note will get noticed this time.
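to see why switching from rms to intensity could make a quiet note disappear, here's a small worked example. the post doesn't define “intensity”, so this assumes it means mean power per frame (rms squared); the frame size, sample rate and test tones are made up. a note that is 10× quieter in rms comes out 100× quieter in intensity, so its target can shrink toward zero.

```python
import numpy as np

def frame_rms(signal, frame=512):
    """Root-mean-square level of each non-overlapping frame."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def frame_intensity(signal, frame=512):
    """Mean power of each frame (rms squared); an assumed meaning of 'intensity'."""
    return frame_rms(signal, frame) ** 2

sr = 44100                       # assumed sample rate
t = np.arange(sr) / sr           # one second of samples
quiet = 0.1 * np.sin(2 * np.pi * 440 * t)   # hypothetical soft note
loud = 1.0 * np.sin(2 * np.pi * 440 * t)    # hypothetical loud note

rms_ratio = frame_rms(loud).mean() / frame_rms(quiet).mean()
int_ratio = frame_intensity(loud).mean() / frame_intensity(quiet).mean()
print(rms_ratio, int_ratio)      # roughly 10 and 100
```

squaring stretches the gap between loud and soft notes, which is one plausible reason a larger target multiplier is needed before the first note shows up again.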
also, i changed the training so that after every 1000 training patterns the network checks its “error” and adds the value to a list. this way i can make a listplot to see how the error evolves over time and at what point further training stops having a significant effect. here are the results from last night’s overnight training:
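a minimal sketch of the “log the error every 1000 patterns” idea. the real network isn't shown, so a single linear layer trained with the delta rule on random data stands in for it, and the logged value is assumed to be the average squared error over the last 1000 patterns (the original might instead evaluate a fixed test set). `train_one`, `LOG_EVERY` and `error_history` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical stand-in for the real network: one linear layer (64 inputs,
# 4 outputs) trained with the delta rule on random (input, target) pairs.
N_IN, N_OUT = 64, 4
true_w = rng.normal(size=(N_OUT, N_IN))
w = np.zeros((N_OUT, N_IN))
lr = 0.005

def train_one(w, x, t, lr):
    """Apply one training pattern; return its squared error before the update."""
    err = t - w @ x
    w += lr * np.outer(err, x)          # in-place delta-rule weight update
    return float(np.sum(err ** 2))

LOG_EVERY = 1000
error_history = []                      # one entry per 1000 patterns
running = 0.0
for i in range(1, 20001):               # 20000 training patterns
    x = rng.normal(size=N_IN)
    running += train_one(w, x, true_w @ x, lr)
    if i % LOG_EVERY == 0:
        error_history.append(running / LOG_EVERY)   # average error for this block
        running = 0.0
```

`error_history` ends up with 20 values, which is what you'd hand to a line plot (listplot in mathematica, or matplotlib's `plot` in python) to see where the curve flattens out.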

did an overnight test of the network with linear output activation functions and limits on the targets (created with exactly the same multiplier and limits as the earlier successful tests with sigmoid outputs). it still failed to detect that first note, and it seemed to do even worse than some of the other linear-output tests i’ve done. here is the output for the scale pattern:
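the post doesn't spell out how targets are built from the multiplier and limits, so here's one guess: scale each note's level by the multiplier, then clip it to a fixed range. `make_targets`, `lower`, `upper` and the levels below are all made up for illustration. the sketch shows the trade-off: a multiplier large enough to lift a quiet note also pins the louder notes at the upper limit.

```python
import numpy as np

def make_targets(levels, multiplier, lower=0.0, upper=1.0):
    """Scale per-note loudness levels into training targets within [lower, upper]."""
    scaled = np.asarray(levels, dtype=float) * multiplier
    return np.clip(scaled, lower, upper)

# made-up levels: one quiet note followed by three louder ones
levels = [0.0004, 0.004, 0.006, 0.005]
for multiplier in (50, 100, 1000):
    print(multiplier, make_targets(levels, multiplier))
```

with these made-up levels, the quiet note only reaches a target of 0.4 at a multiplier of 1000, by which point the other three are all clipped to 1.0.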

here are the intensities over time for the four-note glockenspiel samples i’ve been using. the first note’s intensity is much lower than the others’, which could explain why it was not showing up in the linear output node tests.

tested slopes of 0.5, 0.05, 0.005, and 0.0005 for the linear output function. 0.05 seems to have done the best this time, reaching the lowest error and not having any problems other than the missing first note. all three other slopes had big problems on the scale (at least one note being “grayed out” in addition to the missing first note) and higher errors.
did some tests with the linear outputs to see how the slope affects the training of the network. all networks were trained on 20000 random training patterns and had an output target multiplier of 10. networks with output slopes of 5 and 10 failed, with errors jumping very high very quickly and output nodes always outputting either positive or negative 490000. networks with output slopes of 1 and 0.1 worked, with 0.1 doing just a tiny bit better than a slope of 1 and reaching a slightly lower minimum error. both networks still failed to recognize the first note. i am now going to try the same tests with even lower slopes.
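one possible reason big slopes blow up: for a linear output y = slope × (w · x), the gradient of the squared error with respect to w carries a factor of slope, and the resulting change in y carries another, so the effective step size grows with slope². here's a toy single-unit sketch of that; the learning rate, input size and step count are assumptions, not the settings from these tests, and `train_linear_unit` is a hypothetical name.

```python
import numpy as np

def train_linear_unit(slope, lr=0.01, steps=200, n_in=8, seed=0):
    """Fit y = slope * (w . x) to a random linear target with the delta rule.

    Returns the squared distance between the effective weights (slope * w)
    and the true weights, or inf if training diverged.
    """
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=n_in)
    w = np.zeros(n_in)
    for _ in range(steps):
        x = rng.normal(size=n_in)
        err = true_w @ x - slope * (w @ x)   # target minus linear output
        # d(0.5 * err**2)/dw = -err * slope * x, so the output moves by
        # about lr * slope**2 * |x|**2 * err per pattern
        w += lr * err * slope * x
        if abs(err) > 1e6:                   # diverged; stop before overflow
            return float("inf")
    return float(np.sum((true_w - slope * w) ** 2))

for s in (10, 1, 0.1):
    print(s, train_linear_unit(s))
```

in this toy setup a slope of 10 diverges while 1 and 0.1 stay finite; the small slope also learns much more slowly, because its effective step shrinks by the same slope² factor, so it needs more patterns to catch up.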