author     Paul Mineiro <paul-github@mineiro.com>    2014-08-20 08:52:53 +0400
committer  Paul Mineiro <paul-github@mineiro.com>    2014-08-20 08:52:53 +0400
commit     ff045938d57c435c366613f012c01ccf93162fa1 (patch)
tree       d1b27dac9ba36c2f4671da8bb9f18197b0389972 /demo
parent     b19ec85b7c95758ef7174717f212c38b9bb0a263 (diff)
nn hogwild training fix + demo
Diffstat (limited to 'demo')
-rw-r--r--  demo/dna/README                            12
-rwxr-xr-x  demo/dna/do-dnahogwild-multicore-train     47
-rwxr-xr-x  demo/dna/do-dnahogwildnn-multicore-train   47
-rw-r--r--  demo/mnist/README                           3
-rwxr-xr-x  demo/movielens/README.md                    2
5 files changed, 109 insertions, 2 deletions
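The scripts added by this patch coordinate with the vw daemon purely by polling: they wait for the port to accept connections (`netcat -z`) before streaming data, and for a non-empty model file (`test ! -s`) before exiting. A minimal sketch of that readiness-polling pattern, with a background job standing in for vw writing its model (the path and `weights` payload are illustrative, not from the patch):

```shell
#!/bin/sh
# Poll until a background worker has produced non-empty output, the same
# busy-wait the training scripts use on dnahogwild.model.
model=$(mktemp -u)                       # path only; worker creates the file
( sleep 1; echo weights > "$model" ) &   # stand-in for vw writing the model
while test ! -s "$model"; do sleep 1; done
cat "$model"                             # prints: weights
rm -f "$model"
```

The `-s` test (file exists and has size greater than zero) matters: checking mere existence could race with the writer and observe an empty, partially written file.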
diff --git a/demo/dna/README b/demo/dna/README
index 8a10a775..b5767565 100644
--- a/demo/dna/README
+++ b/demo/dna/README
@@ -20,7 +20,7 @@ Scale Learning Challenge (http://largescale.ml.tu-berlin.de/summary/).
     results in APR of 0.512
 
   * make dnann.perf
-    as above but with additionally 1 neural network hidden node
+    same as dna.perf, but with additionally 1 neural network hidden node
     slower (by circa 60 seconds) but better
     results in APR of 0.532
@@ -34,3 +34,13 @@ Scale Learning Challenge (http://largescale.ml.tu-berlin.de/summary/).
     subsequently, 6 minute per pass if you have SSD or enough RAM cache
     10 passes = 60 minutes (x 6 cores)
     results in APR of 0.545
+
+  * make dnahogwild.perf
+    same as dna.perf, but trained via lock-free multicore sgd ("hogwild")
+    rather than parallel sgd + averaging
+    nondeterministic, but a typical result is APR of 0.516
+
+  * make dnahogwildnn.perf
+    same as dnann.perf, but trained via lock-free multicore sgd ("hogwild")
+    rather than parallel sgd + averaging
+    nondeterministic, but a typical result is APR of 0.536
diff --git a/demo/dna/do-dnahogwild-multicore-train b/demo/dna/do-dnahogwild-multicore-train
new file mode 100755
index 00000000..37270bf2
--- /dev/null
+++ b/demo/dna/do-dnahogwild-multicore-train
@@ -0,0 +1,47 @@
+#! /bin/zsh
+
+rm -f dnahogwild.model
+
+set -e
+
+nukeem() {
+  trap - INT QUIT TERM
+  pkill -9 -f 'vw.*--port 26543'
+}
+
+learner() {
+  ./quaddna2vw | \
+  netcat localhost 26543 > /dev/null
+}
+
+{
+  ../../vowpalwabbit/vw -f dnahogwild.model \
+    --loss_function logistic \
+    -b 18 -l 0.0625 --adaptive --invariant \
+    --daemon --num_children 4 --port 26543 2>&1 | \
+  perl -lane 'print $_ unless $c{$F[2]}++;'
+} &
+
+trap 'nukeem; exit 1' INT QUIT TERM
+
+while ! netcat -z localhost 26543
+  do
+    sleep 1
+  done
+
+paste -d' ' \
+  <(bzcat dna_train.lab.bz2) \
+  <(bzcat dna_train.dat.bz2) | \
+tail -n +1000000 | \
+./map \
+  >(learner) \
+  >(learner) \
+  >(learner) \
+  >(learner)
+
+pkill -f 'vw.*--port 26543'
+
+while test ! -s dnahogwild.model
+  do
+    sleep 1
+  done
diff --git a/demo/dna/do-dnahogwildnn-multicore-train b/demo/dna/do-dnahogwildnn-multicore-train
new file mode 100755
index 00000000..95ef5e93
--- /dev/null
+++ b/demo/dna/do-dnahogwildnn-multicore-train
@@ -0,0 +1,47 @@
+#! /bin/zsh
+
+rm -f dnahogwildnn.model
+
+set -e
+
+nukeem() {
+  trap - INT QUIT TERM
+  pkill -9 -f 'vw.*--port 26544'
+}
+
+learner() {
+  ./quaddna2vw | \
+  netcat localhost 26544 > /dev/null
+}
+
+{
+  ../../vowpalwabbit/vw -f dnahogwildnn.model \
+    --loss_function logistic --nn 1 --inpass \
+    -b 18 -l 0.015 --adaptive --invariant \
+    --daemon --num_children 4 --port 26544 2>&1 | \
+  perl -lane 'print $_ unless $c{$F[2]}++;'
+} &
+
+trap 'nukeem; exit 1' INT QUIT TERM
+
+while ! netcat -z localhost 26544
+  do
+    sleep 1
+  done
+
+paste -d' ' \
+  <(bzcat dna_train.lab.bz2) \
+  <(bzcat dna_train.dat.bz2) | \
+tail -n +1000000 | \
+./map \
+  >(learner) \
+  >(learner) \
+  >(learner) \
+  >(learner)
+
+pkill -f 'vw.*--port 26544'
+
+while test ! -s dnahogwildnn.model
+  do
+    sleep 1
+  done
diff --git a/demo/mnist/README b/demo/mnist/README
index 738dbd33..0dfe714d 100644
--- a/demo/mnist/README
+++ b/demo/mnist/README
@@ -3,6 +3,9 @@
 set for testing neural network implementations.
 
 mnist8m (http://leon.bottou.org/papers/loosli-canu-bottou-2006) is a
 variant of the original mnist training set augmented with deformations.
 
+see the dna demo directory for an example of distributed neural network
+training.
+
 === INSTRUCTIONS ===
 
 --- starting from raw pixels ---
diff --git a/demo/movielens/README.md b/demo/movielens/README.md
index 46d7e575..2bf1c544 100755
--- a/demo/movielens/README.md
+++ b/demo/movielens/README.md
@@ -36,7 +36,7 @@ You might find a bit of `--l2` regularization improves generalization.
 
  - linear: a model without any interactions. basically this creates a user bias and item bias fit. this is a surprisingly strong baseline in terms of MAE, but is useless for recommendation as it induces the same item ranking for all users. It achieves test MAE of 0.731.
  - lrq: the linear model augmented with rank-7 interactions between users and movies, aka, "seven latent factors". It achieves test MAE of 0.709. I determined that 7 was the best number to use through experimentation. The key additional `vw` command-line flags vs. the linear model are `--l2 1.25e-7 --lrq um7`. Performance is sensitive to the choice of `--l2` regularization strength.
  - lrqdropout: the linear model augmented with rank-14 interactions between users and movies, and trained with dropout. It achieves test MAE of 0.689. The key additional `vw` command-line flags vs. the linear model are `--lrq um14 --lrqdropout`.
- - lrqdropouthogwild: same as lrqdropout, but trained in parallel on multiple cores without locking, a la [Niu et. al.](http://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf). Test MAE is nondeterministic but generally equivalent to lrqdropout. The main purpose of this demo is to instruct on how to achieve lock-free parallel learning. (Note using the cache and a single training core can be faster than using multiple cores and parsing continuously. However in some cases data is generated dynamically in such volume that the cache is not practical, thus this technique is helpful.)
+ - lrqdropouthogwild: same as lrqdropout, but trained in parallel on multiple cores without locking, a la [Niu et. al.](http://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf). Test MAE is nondeterministic but typically equivalent to lrqdropout. The main purpose of this demo is to instruct on how to achieve lock-free parallel learning. (Note using the cache and a single training core can be faster than using multiple cores and parsing continuously. However in some cases data is generated dynamically in such volume that the cache is not practical, thus this technique is helpful.)
  - the first time you invoke `make shootout` there is a lot of other output. invoking it a second time will allow you to just see the cached results.
  - `make movie_dendrogram.pdf` will produce a couple of PDFs with hierarchical clustering of the movies based on the latent factors found by `--lrq`. It serves as an example on how to extract the latent factors from an `--invert_hash` file. You will need to zoom in in the large dendrogram to find the movie names.
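The heart of the hogwild feeding in these scripts is fanning one training stream out across several learner processes (the `./map` helper feeding four `>(learner)` process substitutions). A minimal stand-in for that round-robin fan-out, using plain files as the sinks (the `worker.N` names are illustrative; a real learner would be `./quaddna2vw | netcat ...`):

```shell
#!/bin/sh
# Round-robin 100 input lines across four sinks, the role the demo's
# ./map helper plays for its >(learner) process substitutions.
d=$(mktemp -d)
seq 1 100 | awk -v d="$d" '{ print > (d "/worker." NR % 4) }'
for w in 0 1 2 3; do
  echo "worker.$w received $(wc -l < "$d/worker.$w") line(s)"
done
rm -rf "$d"
```

Each of the four sinks receives 25 lines. Round-robin splitting keeps all learners busy on a single ordered stream, which is what lets lock-free sgd use every core without coordinating the workers.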