Archive for the 'quickprop' Category

quickprop modification

16 January 2008

this paper describes a modification to the quickprop method. i tried implementing it and it didn’t work, but i’m not really sure if i’m doing it right.

a quick test shows the initial value of the eta term should be around 1… but again not sure if i’m doing all the calculations correctly.

quickprop and the “infinite training set”

16 January 2008

overnight i tried training the “circle” problem with a new “infinite” training set method with quickprop. this method is similar to the old one in that every pattern in a training set is trained simultaneously, but now each training set is used only once before a new one is generated. it matched the “intended” region about as well as standard quickprop with a single training set, generally producing one color near the center and another in the surrounding area, but it was unable to meet the stopping criterion (which was relatively low, an error of 0.01) because of the constantly shifting training patterns. i think another reason might be that the network i was using (which was actually fairly large, {2, 12, 16, 20, 1}) is technically unable to create a near-perfect circular region…
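a minimal sketch of the “infinite” training set idea, in python rather than the notebook's mathematica. it assumes the “circle” problem means labeling points as inside or outside a circle centered at the origin; the radius, the [-1, 1] input range, and the names `make_circle_set` and `infinite_training_sets` are all hypothetical and not taken from the original code.

```python
import random


def make_circle_set(n, radius=0.5, rng=random):
    """Return n fresh (x, y, target) patterns; target is 1.0 inside the circle."""
    patterns = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # assumed labeling: 1.0 inside the circle, 0.0 outside
        target = 1.0 if x * x + y * y <= radius * radius else 0.0
        patterns.append((x, y, target))
    return patterns


def infinite_training_sets(n, radius=0.5, seed=None):
    """Yield a brand-new training set on every iteration (one per epoch)."""
    rng = random.Random(seed)
    while True:
        yield make_circle_set(n, radius, rng)


# each epoch draws a set that is trained once and then discarded
sets = infinite_training_sets(50, seed=0)
first_epoch = next(sets)
second_epoch = next(sets)
```

because every epoch sees different points, the measured error keeps moving from one set to the next, which fits the trouble with hitting a fixed stopping criterion described above.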

epoch training time

15 January 2008

using standard backprop, a single epoch took about 0.08572 seconds to train. using quickprop, a single epoch took about 0.08933 seconds. but even though quickprop took longer per epoch, fewer epochs were needed to reach a solution.
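as a rough check on the per-epoch cost, using only the two timings above:

```latex
\frac{t_{\text{quickprop}}}{t_{\text{backprop}}} = \frac{0.08933}{0.08572} \approx 1.042,
\qquad
\frac{t_{\text{backprop}}}{t_{\text{quickprop}}} = \frac{0.08572}{0.08933} \approx 0.960
```

so a quickprop epoch costs about 4% more, and quickprop finishes sooner overall whenever it needs fewer than roughly 96% of the epochs backprop needs (assuming the per-epoch times stay constant over a run).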

more time comparisons

15 January 2008

did 40 more tests with identical training sets and initial weights on the “circle” problem with a {2, 4, 4, 1} network. quickprop was generally faster than standard backprop, but there were some cases in which standard backprop reached a solution faster. quickprop’s average time was 14.453 seconds, and backprop’s average time was 30.7022 seconds.

here are the times for each individual test with quickprop:

{10.124, 5.508, 4.577, 7.341, 3.665, 15.883, 68.068, 10.104, 8.242, 3.745, 6.57, 10.184, 10.085, 22.632, 7.561, 9.384, 9.433, 104.11, 25.166, 6.46, 5.508, 9.193, 8.262, 11.086, 7.34, 16.534, 8.282, 10.154, 14.872, 18.416, 3.666, 21.12, 16.624, 10.124, 11.938, 10.164, 20.199, 14.771, 4.587, 6.439}

and here are the times for backprop:

{15.072, 18.706, 29.393, 8.893, 13.329, 14.591, 315.794, 23.213, 20.66, 9.854, 8.873, 16.874, 16.083, 20.99, 22.753, 26.088, 37.003, 21.881, 40.519, 21.4, 11.587, 24.095, 15.362, 12.498, 23.163, 65.935, 27.64, 16.023, 66.535, 51.975, 12.868, 26.709, 26.728, 25.597, 24.976, 12.488, 14.24, 30.334, 16.934, 20.43}
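the averages can be rechecked from the two lists with a short python sketch. it assumes the lists are in the same test order, so that entry k in each list comes from the same training set and initial weights; the post does not say this explicitly. the variable names are made up for illustration.

```python
from statistics import mean, median

quickprop_times = [
    10.124, 5.508, 4.577, 7.341, 3.665, 15.883, 68.068, 10.104, 8.242, 3.745,
    6.57, 10.184, 10.085, 22.632, 7.561, 9.384, 9.433, 104.11, 25.166, 6.46,
    5.508, 9.193, 8.262, 11.086, 7.34, 16.534, 8.282, 10.154, 14.872, 18.416,
    3.666, 21.12, 16.624, 10.124, 11.938, 10.164, 20.199, 14.771, 4.587, 6.439,
]
backprop_times = [
    15.072, 18.706, 29.393, 8.893, 13.329, 14.591, 315.794, 23.213, 20.66, 9.854,
    8.873, 16.874, 16.083, 20.99, 22.753, 26.088, 37.003, 21.881, 40.519, 21.4,
    11.587, 24.095, 15.362, 12.498, 23.163, 65.935, 27.64, 16.023, 66.535, 51.975,
    12.868, 26.709, 26.728, 25.597, 24.976, 12.488, 14.24, 30.334, 16.934, 20.43,
]

quickprop_mean = mean(quickprop_times)
backprop_mean = mean(backprop_times)

# the median is less affected by the few very long runs in each list
quickprop_median = median(quickprop_times)
backprop_median = median(backprop_times)

# number of paired tests where backprop finished first (assumes matching order)
backprop_wins = sum(b < q for q, b in zip(quickprop_times, backprop_times))

print(f"quickprop: mean {quickprop_mean:.3f} s, median {quickprop_median:.3f} s")
print(f"backprop:  mean {backprop_mean:.3f} s, median {backprop_median:.3f} s")
print(f"backprop faster in {backprop_wins} of {len(backprop_times)} tests")
```

both means are pulled up by outliers (104.11 s for quickprop, 315.794 s for backprop), so the medians are worth looking at alongside them.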

time comparison

15 January 2008

some of the problems i was having yesterday seem to come from the term added to the sigmoid function’s derivative, which caused the system to not converge when it was too large. also, the rho term seems to work best at around 5 (i actually found this out two summers ago, but i did a few more tests to see if the term added to the sigmoid derivative made any difference). the following results compare the quickprop and standard backprop methods with identical starting weights and training sets. the value of alpha was 1.0 and rho was 5.0. quickprop used a mu value of 1.75. the goal was to reach a network error of 0.05.

here are the results for standard backprop, which were reached in 31.235 seconds:

backprop.png

here are the results for quickprop, reached in 10.755 seconds:

quickprop.png
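for reference, here is a simplified python sketch of the two pieces mentioned in this post: the constant added to the sigmoid derivative and a quickprop step limited by mu. it follows the commonly published form of fahlman's quickprop rule, leaves out the extra gradient term and weight decay, and is not the notebook's code. the offset of 0.1, the learning rate `lr`, and all function names are assumptions; alpha and rho are not modeled because the post does not define them.

```python
def sigmoid_prime(output, offset=0.1):
    """Sigmoid derivative in terms of the unit's output, plus a constant offset.

    The offset keeps the derivative from vanishing when the output saturates.
    offset=0.1 is an assumed value, not taken from the post.
    """
    return output * (1.0 - output) + offset


def quickprop_step(slope, prev_slope, prev_step, mu=1.75, lr=0.1):
    """Weight change for one weight, given dE/dw now and on the previous epoch."""
    if prev_step == 0.0:
        # no previous step to extrapolate from: plain gradient descent
        return -lr * slope
    shrink = mu / (1.0 + mu)
    if prev_step > 0.0:
        too_steep = slope < shrink * prev_slope
    else:
        too_steep = slope > shrink * prev_slope
    if too_steep:
        # slope has not shrunk enough: grow the last step by at most mu
        return mu * prev_step
    if prev_slope == slope:
        # secant is undefined here; fall back to gradient descent
        return -lr * slope
    # jump to the minimum of the parabola fitted through the two slopes
    return prev_step * slope / (prev_slope - slope)


def minimize(grad, w, epochs=10):
    """Run quickprop on a single weight w, where grad(w) returns dE/dw."""
    prev_slope, prev_step = 0.0, 0.0
    for _ in range(epochs):
        slope = grad(w)
        step = quickprop_step(slope, prev_slope, prev_step)
        w += step
        prev_slope, prev_step = slope, step
    return w


# toy check on E(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = minimize(lambda w: 2.0 * (w - 3.0), 0.0)
```

on a purely quadratic error like the toy one, the parabola fit is exact, so the weight lands on the minimum as soon as a quadratic step is taken; real network errors are not quadratic, which is where the mu limit matters.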

14 January 2008

the method described in the previous post doesn’t seem to work so well on other problems… i have been trying it on the “circle” problem and the “two circle” problem and it does not work very well on either. what is working best in general right now is the old standard backprop/gradient descent method that looks at every training pattern individually.

also, the new method didn’t work very well on the cascade spiral problem, either.

quickprop on ff nets

14 January 2008

i’ve finished putting the quickprop algorithm into my feedforward network and i’m now running some tests on it. the method i was using in the cascade network, calculating the error on the whole training set and minimizing that, does not seem to work as well as simply going through the training set, calculating the error on a single point, and adjusting the weights after each point. this is more similar to the old method i used on feedforward nets, in which the weights were adjusted after each point in the training set. using the new method, a {2, 4, 4, 1} net was able to match the “x” problem after 1000 epochs and get close to the solution in fewer than 100, while the old method was unable to even get close within 1500 epochs. the current file is ffqp3.nb.
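the difference between the two update schedules can be sketched in python on a toy one-weight model. this only illustrates batch versus per-point updates with plain gradient descent; it is not the quickprop code in ffqp3.nb, and the model, data, learning rate, and function names are all made up for the example.

```python
def grad(w, x, target):
    """dE/dw for one pattern, with E = 0.5 * (w * x - target) ** 2."""
    return (w * x - target) * x


def epoch_batch(w, data, lr):
    """Sum the gradient over the whole training set, then update once."""
    total = sum(grad(w, x, t) for x, t in data)
    return w - lr * total


def epoch_per_point(w, data, lr):
    """Update the weight immediately after each training point."""
    for x, t in data:
        w -= lr * grad(w, x, t)
    return w


# toy data generated by target = 2 * x, so the ideal weight is 2.0
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w_batch = 0.0
w_point = 0.0
for _ in range(50):
    w_batch = epoch_batch(w_batch, data, lr=0.05)
    w_point = epoch_per_point(w_point, data, lr=0.05)
```

on this toy problem both schedules converge; which one works better on a real network depends on the problem, as the tests in this post suggest.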

on another computer i’ve decided to try this new method on the old cascade network on the spiral problem and see what happens.

the last cascade spiral for now

14 January 2008

here are the results of the last spiral i’ll be doing with the cascade correlation algorithm. 183 hidden nodes were added and the network error was 0.290283. it trained over the entire weekend and i used much lower stopping criteria than previously, which is why it was able to add so many hidden nodes.

spiral183.png

\[Rho] = 1.0;
\[Alpha] = 0.35;
\[Beta] = 0.35;
\[Mu] = 1.5;
\[Zeta] = 0.1;
r = 1.0;
MinError = 0.01;
MinSChange = 0.1;
MinEChange = 0.0005;
loopsize = 10;
maxweightval = 100.0;
stopafter = 1000;
candidatenodes = 8;

moving on

11 January 2008

i will be focusing less on the cascade-correlation algorithm from now on because it just doesn’t seem to be working right no matter what i try. the next thing i will do is try to implement quickprop on the old feedforward network.

overnight spiral again…

11 January 2008

trained the spiral overnight again, but since no changes in the actual training algorithm were made (just changes that increased the speed) it still did poorly. however, 27 hidden nodes were added in about the same amount of time as the previous trial, in which only 16 hidden nodes were added. the network error was 0.492361.

spiral27.png

\[Rho] = 1.0;
\[Alpha] = 0.35;
\[Beta] = 0.35;
\[Mu] = 1.5;
\[Zeta] = 0.1;
r = 1.0;
MinError = 0.01;
MinSChange = 0.1;
MinEChange = 0.0005;
loopsize = 100;
maxweightval = 20.0;
stopafter = 10000;
candidatenodes = 8;