Archive for January, 2008

linear output nodes

23 January 2008

yesterday i modified my network so that a different activation function can be used in each layer, and set the output layer to use linear activation functions rather than the typical sigmoid. the network was trained overnight on about 563000 randomly generated training patterns. the results don’t seem as good as with the sigmoidal outputs… here is the network’s output for the four-note scale:

glocklinear.png

the last three notes were hit fairly well, but it looks like it completely missed the first one. i think it might be best to go back to the sigmoid outputs for now.

today i need to figure out a better way to check the network’s error. i am thinking of generating 100 training patterns, finding the error for each of those, then averaging. i also need to look more into how to write to files. the way i’m using now (writing as “Package”) outputs all the data without rounding anything off, but i don’t know if it can easily be imported back into mathematica.
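a minimal sketch of that averaging idea, written in python rather than mathematica. everything named here (`average_error`, `toy_pattern`, `toy_predict`) is hypothetical and only stands in for the real network and pattern generator; the per-pattern error is assumed to be a mean squared error over the output nodes.

```python
import random

def average_error(predict, make_pattern, n_patterns=100, seed=None):
    """Mean squared error averaged over n_patterns freshly generated patterns."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_patterns):
        inputs, targets = make_pattern(rng)
        outputs = predict(inputs)
        # per-pattern error: mean of squared differences over the output nodes
        total += sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)
    return total / n_patterns

# toy stand-ins (hypothetical), only here so the sketch runs on its own
def toy_pattern(rng):
    x = [rng.random() for _ in range(4)]
    return x, [1.0 if v > 0.5 else 0.0 for v in x]

def toy_predict(inputs):
    return [1.0 if v > 0.5 else 0.0 for v in inputs]

print(average_error(toy_predict, toy_pattern, n_patterns=100, seed=0))
```

because the patterns are new each time, the number will wobble a little from run to run; fixing the seed makes two checks comparable.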

training glockenspiel over the long weekend

22 January 2008

trained the four-note glockenspiel network setup over the long weekend… unfortunately i was unable to tell how many training iterations it went through, but it seems to fit randomly generated training patterns pretty well. here is the output for the sound that the training patterns were based on:

glockscale.png

the network’s weights are saved in sndnn03.nb.

training sound

18 January 2008

finished updating the old-style network today. current file is sndnn03.nb. started training on a simple set of four notes played on glockenspiel (network has four outputs). the major peak of one of the notes was outside the 64-point range i usually use for the fourier transform, so i extended it to 72 points for this problem. noticed that training was very slow, and that the cause was the “detune” function i had written yesterday. so instead of using it on every single new training pattern, i decided to run detune 21 times on every note of every instrument (multipliers from 0.99 to 1.01), store those sounds, and then randomly select one of the tunings for each training set. this sped training up considerably, probably by at least a factor of 100.
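the precompute-and-pick idea could look roughly like this in python. this is only a sketch: the `detune` here is a stand-in that resamples with linear interpolation, which is an assumption and not necessarily what the mathematica function does, and the names `build_detune_cache` and `random_tuning` are made up for illustration.

```python
import random

def detune(samples, multiplier):
    """Resample by linear interpolation (assumes at least one sample).
    Played back at the original rate, the pitch is scaled by `multiplier`."""
    last = len(samples) - 1
    n_out = int(last / multiplier) + 1
    out = []
    for i in range(n_out):
        pos = i * multiplier
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out

# 21 multipliers from 0.99 to 1.01 in steps of 0.001
MULTIPLIERS = [round(0.99 + 0.001 * i, 3) for i in range(21)]

def build_detune_cache(notes):
    """notes: dict mapping a note name to its list of samples.
    The slow detune work is done once here, not once per training pattern."""
    return {name: [detune(s, m) for m in MULTIPLIERS] for name, s in notes.items()}

def random_tuning(cache, name, rng=random):
    """Pick one of the stored tunings of a note at random (a cheap lookup)."""
    return rng.choice(cache[name])
```

the cache costs memory (21 copies of every note), but each new training pattern only pays for a random lookup.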

sound

17 January 2008

worked on looking back into my sound stuff today, transferring functions over to my new network program, making improvements and completely rewriting some things so that they can be used more easily. while looking into my old program i saw that i actually hadn’t been pitchshifting/detuning the sound files (i only did that with generated sounds), so i wrote a function that can pitchshift the files within mathematica. i also greatly improved the random training pattern generation function… now it will always output at least one instrument playing one note (but now that i think about it there should be at least a small chance that there will be no instrument playing) and instruments can be specified to be polyphonic or not. another idea i might try to implement in the future is adding random noise to the training patterns. the file is sndnn01.nb and it is backed up on euclid.

also, the three other computers in here have been set up to not reset, so in the future i’ll be able to train overnight on four machines.

quickprop modification

16 January 2008

this paper describes a modification to the quickprop method. i tried implementing it and it didn’t work, but i’m not really sure if i’m doing it right.

a quick test shows the initial value of the eta term should be around 1… but again not sure if i’m doing all the calculations correctly.

quickprop and the “infinite training set”

16 January 2008

overnight i tried training the “circle” problem with a new “infinite” training set method with quickprop. this method is similar to the old one, in that every pattern in a training set is trained simultaneously, but now the training set is only used once before a new one is generated. it seemed to work about as well as standard quickprop with a single training set at matching the “intended” region, generally being one color near the center and another around it, but it was unable to meet the stopping criterion (which was relatively low, an error of 0.01) because of the constantly shifting training patterns. i think one reason for this might be that the network i was using (which was actually fairly large, {2, 12, 16, 20, 1}) might be technically unable to create a near-perfect circle region…
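a rough python sketch of what “a new training set every time” means. the exact circle problem is an assumption here: points drawn uniformly from the square [-1, 1] x [-1, 1], with target 1 inside a circle of radius 0.5 and 0 outside. the function names and the batch size are also made up.

```python
import random

def circle_batch(rng, size=100, radius=0.5):
    """One training set for the (assumed) circle problem."""
    batch = []
    for _ in range(size):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        target = 1.0 if x * x + y * y <= radius * radius else 0.0
        batch.append(((x, y), target))
    return batch

def infinite_training_sets(seed=None, size=100):
    """Yield a fresh training set forever; each one is meant to be used once."""
    rng = random.Random(seed)
    while True:
        yield circle_batch(rng, size)

# usage sketch: train_on_batch would be the (hypothetical) quickprop epoch
# for epoch, batch in zip(range(1000), infinite_training_sets(seed=0)):
#     error = train_on_batch(batch)
```

since the error is measured on a different set every epoch, it is a noisy estimate, which fits with the trouble reaching a fixed threshold like 0.01.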

epoch training time

15 January 2008

using standard backprop a single epoch took about 0.08572 seconds to train. using quickprop a single epoch took about 0.08933 seconds to train. but even though quickprop took longer for each epoch, fewer of them were needed to reach a solution.

more time comparisons

15 January 2008

did 40 more tests with identical training sets and initial weights on the “circle” problem with a {2, 4, 4, 1} network. quickprop was generally faster than standard backprop, but there were some cases in which standard backprop reached a solution faster. quickprop’s average time was 14.453 seconds, and backprop’s average time was 30.7022 seconds.

here are the times for each individual test with quickprop:

{10.124, 5.508, 4.577, 7.341, 3.665, 15.883, 68.068, 10.104, 8.242, 3.745, 6.57, 10.184, 10.085, 22.632, 7.561, 9.384, 9.433, 104.11, 25.166, 6.46, 5.508, 9.193, 8.262, 11.086, 7.34, 16.534, 8.282, 10.154, 14.872, 18.416, 3.666, 21.12, 16.624, 10.124, 11.938, 10.164, 20.199, 14.771, 4.587, 6.439}

and here are the times for backprop:

{15.072, 18.706, 29.393, 8.893, 13.329, 14.591, 315.794, 23.213, 20.66, 9.854, 8.873, 16.874, 16.083, 20.99, 22.753, 26.088, 37.003, 21.881, 40.519, 21.4, 11.587, 24.095, 15.362, 12.498, 23.163, 65.935, 27.64, 16.023, 66.535, 51.975, 12.868, 26.709, 26.728, 25.597, 24.976, 12.488, 14.24, 30.334, 16.934, 20.43}
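the two lists can be summarized with a few lines of python. the means reproduce the averages quoted above; the median is included because both lists have one very slow run (104.11 and 315.794 seconds) that pulls the mean up. the head-to-head count assumes the two lists are in the same test order, which the post doesn’t actually say.

```python
from statistics import mean, median

quickprop = [10.124, 5.508, 4.577, 7.341, 3.665, 15.883, 68.068, 10.104,
             8.242, 3.745, 6.57, 10.184, 10.085, 22.632, 7.561, 9.384,
             9.433, 104.11, 25.166, 6.46, 5.508, 9.193, 8.262, 11.086,
             7.34, 16.534, 8.282, 10.154, 14.872, 18.416, 3.666, 21.12,
             16.624, 10.124, 11.938, 10.164, 20.199, 14.771, 4.587, 6.439]

backprop = [15.072, 18.706, 29.393, 8.893, 13.329, 14.591, 315.794, 23.213,
            20.66, 9.854, 8.873, 16.874, 16.083, 20.99, 22.753, 26.088,
            37.003, 21.881, 40.519, 21.4, 11.587, 24.095, 15.362, 12.498,
            23.163, 65.935, 27.64, 16.023, 66.535, 51.975, 12.868, 26.709,
            26.728, 25.597, 24.976, 12.488, 14.24, 30.334, 16.934, 20.43]

qp_mean, bp_mean = mean(quickprop), mean(backprop)
qp_median, bp_median = median(quickprop), median(backprop)

# assumption: entry i of both lists comes from the same test
backprop_wins = sum(b < q for q, b in zip(quickprop, backprop))

print("mean   quickprop / backprop:", qp_mean, bp_mean)
print("median quickprop / backprop:", qp_median, bp_median)
print("tests where backprop was faster:", backprop_wins, "of", len(quickprop))
```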

time comparison

15 January 2008

some of the problems i was having yesterday seem to come from the term added to the sigmoid function’s derivative, which if too large caused the system to not converge. also, the rho term seems to work best at around 5 (i actually found this out two summers ago, but i did a few more tests to see if the term added to the sigmoid derivative made any difference). the following results were obtained by comparing the quickprop and standard backprop methods with identical starting weights and training set. the value of alpha was 1.0 and rho was 5.0. quickprop used a mu value of 1.75. the goal was to reach a network error of 0.05.
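for reference, here is a simplified single-weight sketch of the quickprop step as it is commonly described, plus a sigmoid derivative with a constant term added. this is not the code used for these tests: the offset value 0.1 and the learning rate `lr` are assumptions, alpha and rho are left out because the post doesn’t define them, and only mu = 1.75 comes from the text. it also leaves out the extra gradient-descent term that some formulations add to every step.

```python
def sigmoid_prime(output, offset=0.1):
    """Derivative of the logistic sigmoid in terms of its output,
    plus a constant offset (0.1 is an assumed value)."""
    return output * (1.0 - output) + offset

def quickprop_step(grad, prev_grad, prev_step, lr=0.1, mu=1.75):
    """Next weight change from the current and previous error slopes."""
    if prev_step == 0.0 or grad == prev_grad:
        # no usable history: fall back to plain gradient descent
        return -lr * grad
    # secant estimate of the second derivative along this weight
    curvature = (grad - prev_grad) / prev_step
    if curvature <= 0.0:
        # fitted parabola has no minimum: keep going at the maximum growth
        return mu * prev_step
    step = -grad / curvature            # jump to the parabola's minimum
    limit = mu * abs(prev_step)         # never grow by more than a factor mu
    return max(-limit, min(limit, step))

def minimize(grad_fn, w, steps=20):
    """Run quickprop on a single weight, starting from w."""
    prev_grad, prev_step = 0.0, 0.0
    for _ in range(steps):
        grad = grad_fn(w)
        step = quickprop_step(grad, prev_grad, prev_step)
        w += step
        prev_grad, prev_step = grad, step
    return w

# toy check on E(w) = (w - 3)^2, whose slope is 2 * (w - 3)
print(minimize(lambda w: 2.0 * (w - 3.0), 0.0))
```

in this form the offset only changes the slopes fed into the update, so a large offset distorts every gradient, which could be one reason a too-large value keeps things from converging.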

here are the results for standard backprop, which were reached in 31.235 seconds:

backprop.png

here are the results for quickprop, reached in 10.755 seconds:

quickprop.png

14 January 2008

the method described in the previous post doesn’t seem to work so well on other problems… i have been trying it on the “circle” problem and the “two circle” problem without much success. the method that works best in general right now seems to be the old standard backprop/gradient descent method that looks at every training pattern individually.

also, the new method didn’t work very well on the cascade spiral problem, either.