author     Paul Mineiro <paul-github@mineiro.com>    2014-08-20 08:52:53 +0400
committer  Paul Mineiro <paul-github@mineiro.com>    2014-08-20 08:52:53 +0400
commit     ff045938d57c435c366613f012c01ccf93162fa1 (patch)
tree       d1b27dac9ba36c2f4671da8bb9f18197b0389972 /demo
parent     b19ec85b7c95758ef7174717f212c38b9bb0a263 (diff)
nn hogwild training fix + demo
Diffstat (limited to 'demo')
-rw-r--r--  demo/dna/README                            12
-rwxr-xr-x  demo/dna/do-dnahogwild-multicore-train     47
-rwxr-xr-x  demo/dna/do-dnahogwildnn-multicore-train   47
-rw-r--r--  demo/mnist/README                           3
-rwxr-xr-x  demo/movielens/README.md                    2
5 files changed, 109 insertions, 2 deletions
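The scripts added by this patch coordinate with the vw daemon purely by polling: they wait for the port to accept connections (`netcat -z`) before streaming data, and for a non-empty model file (`test ! -s`) before exiting. A minimal sketch of that readiness-polling pattern, with a background job standing in for vw writing its model (the path and `weights` payload are illustrative, not from the patch):

```shell
#!/bin/sh
# Poll until a background worker has produced non-empty output, the same
# busy-wait the training scripts use on dnahogwild.model.
model=$(mktemp -u)                       # path only; worker creates the file
( sleep 1; echo weights > "$model" ) &   # stand-in for vw writing the model
while test ! -s "$model"; do sleep 1; done
cat "$model"                             # prints: weights
rm -f "$model"
```

The `-s` test (file exists and has size greater than zero) matters: checking mere existence could race with the writer and observe an empty, partially written file.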
diff --git a/demo/dna/README b/demo/dna/README
index 8a10a775..b5767565 100644
--- a/demo/dna/README
+++ b/demo/dna/README
@@ -20,7 +20,7 @@ Scale Learning Challenge (http://largescale.ml.tu-berlin.de/summary/).
     results in APR of 0.512
 
   * make dnann.perf
-    as above but with additionally 1 neural network hidden node
+    same as dna.perf, but with additionally 1 neural network hidden node
     slower (by circa 60 seconds) but better
     results in APR of 0.532
@@ -34,3 +34,13 @@ Scale Learning Challenge (http://largescale.ml.tu-berlin.de/summary/).
     subsequently, 6 minute per pass if you have SSD or enough RAM cache
     10 passes = 60 minutes (x 6 cores)
     results in APR of 0.545
+
+  * make dnahogwild.perf
+    same as dna.perf, but trained via lock-free multicore sgd ("hogwild")
+    rather than parallel sgd + averaging
+    nondeterministic, but a typical result is APR of 0.516
+
+  * make dnahogwildnn.perf
+    same as dnann.perf, but trained via lock-free multicore sgd ("hogwild")
+    rather than parallel sgd + averaging
+    nondeterministic, but a typical result is APR of 0.536
diff --git a/demo/dna/do-dnahogwild-multicore-train b/demo/dna/do-dnahogwild-multicore-train
new file mode 100755
index 00000000..37270bf2
--- /dev/null
+++ b/demo/dna/do-dnahogwild-multicore-train
@@ -0,0 +1,47 @@
+#! /bin/zsh
+
+rm -f dnahogwild.model
+
+set -e
+
+nukeem() {
+  trap - INT QUIT TERM
+  pkill -9 -f 'vw.*--port 26543'
+}
+
+learner() {
+  ./quaddna2vw | \
+  netcat localhost 26543 > /dev/null
+}
+
+{
+  ../../vowpalwabbit/vw -f dnahogwild.model \
+    --loss_function logistic \
+    -b 18 -l 0.0625 --adaptive --invariant \
+    --daemon --num_children 4 --port 26543 2>&1 | \
+  perl -lane 'print $_ unless $c{$F[2]}++;'
+} &
+
+trap 'nukeem; exit 1' INT QUIT TERM
+
+while ! netcat -z localhost 26543
+  do
+    sleep 1
+  done
+
+paste -d' ' \
+  <(bzcat dna_train.lab.bz2) \
+  <(bzcat dna_train.dat.bz2) | \
+tail -n +1000000 | \
+./map \
+  >(learner) \
+  >(learner) \
+  >(learner) \
+  >(learner)
+
+pkill -f 'vw.*--port 26543'
+
+while test ! -s dnahogwild.model
+  do
+    sleep 1
+  done
diff --git a/demo/dna/do-dnahogwildnn-multicore-train b/demo/dna/do-dnahogwildnn-multicore-train
new file mode 100755
index 00000000..95ef5e93
--- /dev/null
+++ b/demo/dna/do-dnahogwildnn-multicore-train
@@ -0,0 +1,47 @@
+#! /bin/zsh
+
+rm -f dnahogwildnn.model
+
+set -e
+
+nukeem() {
+  trap - INT QUIT TERM
+  pkill -9 -f 'vw.*--port 26544'
+}
+
+learner() {
+  ./quaddna2vw | \
+  netcat localhost 26544 > /dev/null
+}
+
+{
+  ../../vowpalwabbit/vw -f dnahogwildnn.model \
+    --loss_function logistic --nn 1 --inpass \
+    -b 18 -l 0.015 --adaptive --invariant \
+    --daemon --num_children 4 --port 26544 2>&1 | \
+  perl -lane 'print $_ unless $c{$F[2]}++;'
+} &
+
+trap 'nukeem; exit 1' INT QUIT TERM
+
+while ! netcat -z localhost 26544
+  do
+    sleep 1
+  done
+
+paste -d' ' \
+  <(bzcat dna_train.lab.bz2) \
+  <(bzcat dna_train.dat.bz2) | \
+tail -n +1000000 | \
+./map \
+  >(learner) \
+  >(learner) \
+  >(learner) \
+  >(learner)
+
+pkill -f 'vw.*--port 26544'
+
+while test ! -s dnahogwildnn.model
+  do
+    sleep 1
+  done
diff --git a/demo/mnist/README b/demo/mnist/README
index 738dbd33..0dfe714d 100644
--- a/demo/mnist/README
+++ b/demo/mnist/README
@@ -3,6 +3,9 @@
 set for testing neural network implementations.
 
 mnist8m (http://leon.bottou.org/papers/loosli-canu-bottou-2006) is a
 variant of the original mnist training set augmented with deformations.
 
+see the dna demo directory for an example of distributed neural network
+training.
+
 === INSTRUCTIONS ===
 
 --- starting from raw pixels ---
diff --git a/demo/movielens/README.md b/demo/movielens/README.md
index 46d7e575..2bf1c544 100755
--- a/demo/movielens/README.md
+++ b/demo/movielens/README.md
@@ -36,7 +36,7 @@ You might find a bit of `--l2` regularization improves generalization.
 
  - linear: a model without any interactions. basically this creates a user bias and item bias fit. this is a surprisingly strong baseline in terms of MAE, but is useless for recommendation as it induces the same item ranking for all users. It achieves test MAE of 0.731.
  - lrq: the linear model augmented with rank-7 interactions between users and movies, aka, "seven latent factors". It achieves test MAE of 0.709. I determined that 7 was the best number to use through experimentation. The key additional `vw` command-line flags vs. the linear model are `--l2 1.25e-7 --lrq um7`. Performance is sensitive to the choice of `--l2` regularization strength.
  - lrqdropout: the linear model augmented with rank-14 interactions between users and movies, and trained with dropout. It achieves test MAE of 0.689. The key additional `vw` command-line flags vs. the linear model are `--lrq um14 --lrqdropout`.
- - lrqdropouthogwild: same as lrqdropout, but trained in parallel on multiple cores without locking, a la [Niu et. al.](http://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf). Test MAE is nondeterministic but generally equivalent to lrqdropout. The main purpose of this demo is to instruct on how to achieve lock-free parallel learning. (Note using the cache and a single training core can be faster than using multiple cores and parsing continuously. However in some cases data is generated dynamically in such volume that the cache is not practical, thus this technique is helpful.)
+ - lrqdropouthogwild: same as lrqdropout, but trained in parallel on multiple cores without locking, a la [Niu et. al.](http://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf). Test MAE is nondeterministic but typically equivalent to lrqdropout. The main purpose of this demo is to instruct on how to achieve lock-free parallel learning. (Note using the cache and a single training core can be faster than using multiple cores and parsing continuously. However in some cases data is generated dynamically in such volume that the cache is not practical, thus this technique is helpful.)
  - the first time you invoke `make shootout` there is a lot of other output. invoking it a second time will allow you to just see the cached results.
  - `make movie_dendrogram.pdf` will produce a couple of PDFs with hierarchical clustering of the movies based on the latent factors found by `--lrq`. It serves as an example on how to extract the latent factors from an `--invert_hash` file. You will need to zoom in in the large dendrogram to find the movie names.
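The heart of the hogwild feeding in these scripts is fanning one training stream out across several learner processes (the `./map` helper feeding four `>(learner)` process substitutions). A minimal stand-in for that round-robin fan-out, using plain files as the sinks (the `worker.N` names are illustrative; a real learner would be `./quaddna2vw | netcat ...`):

```shell
#!/bin/sh
# Round-robin 100 input lines across four sinks, the role the demo's
# ./map helper plays for its >(learner) process substitutions.
d=$(mktemp -d)
seq 1 100 | awk -v d="$d" '{ print > (d "/worker." NR % 4) }'
for w in 0 1 2 3; do
  echo "worker.$w received $(wc -l < "$d/worker.$w") line(s)"
done
rm -rf "$d"
```

Each of the four sinks receives 25 lines. Round-robin splitting keeps all learners busy on a single ordered stream, which is what lets lock-free sgd use every core without coordinating the workers.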