github.com/moses-smt/mosesdecoder.git
Diffstat (limited to 'sigtest-filter/README.txt')
-rw-r--r-- sigtest-filter/README.txt | 40
1 files changed, 40 insertions, 0 deletions
diff --git a/sigtest-filter/README.txt b/sigtest-filter/README.txt
new file mode 100644
index 000000000..6ea1f002e
--- /dev/null
+++ b/sigtest-filter/README.txt
@@ -0,0 +1,40 @@
+Re-implementation of Johnson et al. (2007)'s phrasetable filtering strategy.
+
+This implementation relies on Joy Zhang's SALM Suffix Array toolkit. It is
+available here:
+
+ http://projectile.is.cs.cmu.edu/research/public/tools/salm/salm.htm
+
+--Chris Dyer <redpony@umd.edu>
+
+BUILD INSTRUCTIONS
+---------------------------------
+
+1. Download and build SALM.
+
+2. make SALMDIR=/path/to/SALM
+
+
+USAGE INSTRUCTIONS
+---------------------------------
+
+1. Using SALM/Bin/Linux/Index/IndexSA.O32, create a suffix array index
+ of the source and target sides of your training bitext.
+
+2. cat phrase-table.txt | ./filter-pt -e TARG.suffix -f SOURCE.suffix \
+ -l <FILTER-VALUE>
+
+ FILTER-VALUE is the negative-log p-value threshold described in
+ Johnson et al. (2007). It may be 'a+e', 'a-e', or a positive real
+ value.
+
+3. Run with no options to see more use-cases.
+
+
+REFERENCES
+---------------------------------
+
+H. Johnson, J. Martin, G. Foster and R. Kuhn. (2007) Improving Translation
+ Quality by Discarding Most of the Phrasetable. In Proceedings of the 2007
+ Joint Conference on Empirical Methods in Natural Language Processing and
+ Computational Natural Language Learning (EMNLP-CoNLL), pp. 967-975.
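
Note on FILTER-VALUE: the sketch below illustrates how the 'a+e' and 'a-e' settings relate to the significance score, under assumptions. In Johnson et al. (2007), each phrase pair is scored by the negative log p-value of Fisher's exact test on its co-occurrence counts, and alpha = log(N) is the score of a pair whose source phrase, target phrase, and co-occurrence each appear exactly once in an N-sentence bitext. The function names, the epsilon value, and the "keep if score >= threshold" rule are hypothetical choices for illustration; this is not the filter-pt source.

```python
import math


def neg_log_pvalue(c_st, c_s, c_t, n):
    """Negative log p-value of a one-sided Fisher's exact test.

    c_st: sentence pairs containing both source and target phrase
    c_s:  sentence pairs containing the source phrase
    c_t:  sentence pairs containing the target phrase
    n:    total number of sentence pairs in the bitext
    Assumes the counts are consistent (c_st <= c_s, c_t <= n).
    """
    total = math.comb(n, c_t)
    upper = min(c_s, c_t)
    # Hypergeometric tail: probability of at least c_st co-occurrences.
    tail = sum(
        math.comb(c_s, k) * math.comb(n - c_s, c_t - k)
        for k in range(c_st, upper + 1)
    )
    # math.log accepts arbitrarily large ints, so no float overflow here.
    return math.log(total) - math.log(tail)


def resolve_threshold(value, n, eps=1e-4):
    """Map 'a+e', 'a-e', or a positive number to a numeric threshold.

    eps is an assumed small constant, not taken from filter-pt.
    """
    alpha = math.log(n)
    if value == "a+e":
        return alpha + eps
    if value == "a-e":
        return alpha - eps
    return float(value)


def keep(score, threshold):
    """Assumed rule: keep a phrase pair whose score reaches the threshold."""
    return score >= threshold


if __name__ == "__main__":
    n = 1000
    singleton = neg_log_pvalue(1, 1, 1, n)
    print(keep(singleton, resolve_threshold("a+e", n)))
    print(keep(singleton, resolve_threshold("a-e", n)))
```

Under these assumptions, 'a+e' discards the 1-1-1 phrase pairs while 'a-e' retains them, which is the distinction the paper draws between the two settings.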