author | John Langford <jl@hunch.net> | 2014-05-17 18:01:49 +0400 |
---|---|---|
committer | John Langford <jl@hunch.net> | 2014-05-17 18:01:49 +0400 |
commit | 4c59f811ef18e0bd6e5f8fc68f06868dd700e9ad (patch) | |
tree | fca3d0d91c75e986abc32e0699f456502c618f15 /cluster | |
parent | ca6c8006f41cb92001d1c1a1102927f68ab30f34 (diff) |
add some direction concerning hadoop use
Diffstat (limited to 'cluster')
-rw-r--r-- | cluster/README_cluster | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/cluster/README_cluster b/cluster/README_cluster
index 43a4a918..71914f06 100644
--- a/cluster/README_cluster
+++ b/cluster/README_cluster
@@ -33,6 +33,11 @@ where:

 To run the code on Hadoop clusters:

+Decide if you are going to control the number of tasks by:
+(a) using gzip compressed files which cannot be broken up by Hadoop
+(b) controlling the number of reducers.
+We'll assume (a) below.
+
 Connect to the span server node for the Hadoop cluster:
 ./spanning_tree
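Option (a) in the added README text works because Hadoop does not split a gzip file across map tasks, so the number of input files sets the number of tasks. The following is a minimal sketch of one way to prepare such input, not part of this commit: the function name `split_into_gzip_parts`, the `part-NNNNN.gz` naming, and the round-robin assignment of lines are all assumptions made for illustration.

```python
import gzip
from pathlib import Path


def split_into_gzip_parts(lines, num_parts, out_dir, prefix="part"):
    """Write `lines` round-robin into `num_parts` gzip files and return their paths.

    Hypothetical helper: each output file is meant to become one Hadoop map
    task, since a gzip file is not split across tasks.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"{prefix}-{i:05d}.gz" for i in range(num_parts)]
    # Open every part in text mode so plain strings can be written directly.
    handles = [gzip.open(p, "wt", encoding="utf-8") for p in paths]
    try:
        for i, line in enumerate(lines):
            # Make sure each example ends with exactly one newline.
            text = line if line.endswith("\n") else line + "\n"
            handles[i % num_parts].write(text)
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Calling `split_into_gzip_parts(open("train.txt"), 8, "parts")` on a hypothetical `train.txt` would produce eight `.gz` files to upload as the job input. Round-robin assignment keeps the parts close to equal in size, so no single task holds the others back.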