The implementation of Non-linear (Preconditioned) Conjugate Gradient, LBFGS, online learning, and adaptive online learning works on clusters (both Hadoop and otherwise) now.

To build the code, run make.

At a high level, the code operates by repeatedly executing something equivalent to the MPI AllReduce function---adding up floats from all nodes and then broadcasting the sums back to each individual node. In order to do this, a spanning tree over the nodes must be created. This is done using the helper daemon 'allreduce_master'.

***********************************************************************

To run the code on Hadoop clusters:

Connect to the master node for the Hadoop cluster and start the daemon:

    ./allreduce_master <#mappers> <#trees>

where <#mappers> is the number of mappers you are going to launch and <#trees> is the number of spanning trees it should set up.

Start the map-reduce job using Hadoop streaming:

    hadoop jar $HADOOP_HOME/hadoop-streaming.jar -Dmapred.job.map.memory.mb=2500 -input <input> -output <output> -file vw -file runvw.sh -mapper 'runvw.sh <output> <gateway>' -reducer NONE

where <output> is the directory on HDFS where you want the trained model to be saved and <gateway> is the hostname of the gateway where allreduce_master runs. The trained model is saved to the file <output>/model on HDFS and can be retrieved with hadoop fs -get.

To modify the arguments to VW, edit the script runvw.sh. Arguments to hadoop can be added directly in the hadoop streaming command. See 'mapscript.sh', which uses 'runvw.sh', for an advanced example of running VW in a Hadoop environment.

***********************************************************************

To run the code on non-Hadoop clusters:

Start the master job on one of the cluster nodes:

    ./allreduce_master <#workers>

where <#workers> is the number of workers you are going to launch.

Launch vw on each of the worker nodes:

    ./vw --cache_file temp.cache -b 24 --conjugate_gradient --passes 4 -q up -q ua -q us -q pa -q ps -q as --loss_function=logistic --regularization=1 --master_location <master> -d <data> -f model

where <master> is the hostname of the master node and <data> is a different input file on each of the workers. The trained model is saved as the file model on the worker nodes.

************************************************************************

The files you need to know about:

runvw.sh: This is the mapper code. It takes as arguments:
    1. The output directory. The trained model from the first mapper is stored as the file "model" in the output directory.
    2. The hostname of the cluster gateway, so that the mappers can connect to the gateway.
All the other standard VW options are currently hardcoded in the script; feel free to mess around with them.

#########################################################################

allreduce_master.cc: This is the master code which runs on the gateway. You start it before the call to hadoop. The master takes as input the number of mappers it should expect to hear from, so you need to know this ahead of time based on the number of input files you have. The master backgrounds itself after starting and listens for incoming connections. The role of the master is to set up the topology on the mappers and then let them communicate amongst themselves.

#########################################################################

allreduce.h: This is the header file for the workers. It provides a struct socks that contains a socket to the parent and sockets for up to two children. It also provides the following function:

void all_reduce(char* buffer, int n, string master_location): Performs AllReduce on the entries of buffer, which is n bytes long, with the parent and children specified in the socks object. It is implicitly assumed that the buffer contains floats, so that addition is defined on chunks of 4 bytes. The header also defines a constant const int buf_size, which is the buffer size (in bytes) used for communication.
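For a concrete picture of how a worker might call this, here is a minimal sketch based only on the signature above; the function name sum_floats, the variable names, and the master hostname are illustrative placeholders, not code from this repository:

    #include <string>
    #include <vector>
    #include "allreduce.h"

    // Sum a local float vector across all nodes. After the call every node
    // holds the same element-wise sums: values are added up the spanning
    // tree and then broadcast back down. 'values' must be non-empty.
    void sum_floats(std::vector<float>& values, const std::string& master_location)
    {
      all_reduce((char*)&values[0], (int)(values.size() * sizeof(float)), master_location);
    }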
#########################################################################

allreduce.cc: This is the code for the workers. It implements the routines described above. all_reduce is implemented as a combination of reduce and broadcast routines. reduce reads data from the children, adds it to the local data, and passes it up to the parent with a call to pass_up. broadcast receives data from the parent and passes it down to the children with a call to pass_down.

#########################################################################

cg.cc, gd.cc, lbfgs.cc: The learning algorithms, which use all_reduce whenever communication is needed. They use the routines accumulate and accumulate_scalar to reduce vectors and scalars respectively.
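As an illustration of that pattern (not the actual implementation in those files), vector and scalar reductions built on all_reduce could look roughly like the following; the wrapper names and parameters are hypothetical:

    #include <string>
    #include "allreduce.h"

    // Hypothetical sketch: sum a weight vector across all nodes, in place.
    void accumulate_sketch(float* weights, size_t length, const std::string& master_location)
    {
      all_reduce((char*)weights, (int)(length * sizeof(float)), master_location);
    }

    // Hypothetical sketch: sum a single scalar (e.g. a local loss) across all nodes.
    float accumulate_scalar_sketch(float local_value, const std::string& master_location)
    {
      all_reduce((char*)&local_value, (int)sizeof(float), master_location);
      return local_value;
    }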