github.com/moses-smt/mosesdecoder.git
author alvations <alvations@gmail.com> 2015-03-20 21:00:36 +0300
committer alvations <alvations@gmail.com> 2015-03-20 21:00:36 +0300
commit 8f2d687d27f560b8e09ecd5a19542dde0507d84e
tree 80de673aa02d851524a93e27f01b9a8318e55108
parent 44cd32d058e0e9086561942e86e16d43e70c100f
added more description of usage in docstring
-rw-r--r-- scripts/other/gacha_filter.py 17
1 files changed, 17 insertions, 0 deletions
diff --git a/scripts/other/gacha_filter.py b/scripts/other/gacha_filter.py
index 6fabedade..4ebc501ac 100644
--- a/scripts/other/gacha_filter.py
+++ b/scripts/other/gacha_filter.py
@@ -22,6 +22,23 @@ where:
- s2 = global variance, i.e. d ((l1 - l2)^2) / d (l1)
(For details on Gale-Church, see http://www.aclweb.org/anthology/J93-1004.pdf)
+
+USAGE:
+
+ $ python gacha_filter.py train.en train.de
+
+Outputs tab-separated source and target sentence pairs to STDOUT, one pair per line.
+You can then split the output into separate files with cut.
+
+ $ python gacha_filter.py train.en train.de > train.en-de
+ $ cut -f1 train.en-de > train.clean.en
+ $ cut -f2 train.en-de > train.clean.de
+
+You can also set a lower threshold to yield more lines:
+
+ $ python gacha_filter.py train.en train.de 0.05
+
+Default threshold is set to 0.2.
"""
import io, subprocess
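
For readers who prefer to stay in Python, the cut step described in the USAGE text could be sketched roughly as below. This is a minimal sketch, not part of the commit: it assumes the script's output is tab-separated (which the `cut -f1` / `cut -f2` calls imply, since tab is cut's default delimiter), and the names `split_pairs` and `split_file` are hypothetical helpers introduced only for illustration.

```python
import io


def split_pairs(lines):
    """Split 'source<TAB>target' lines into two aligned lists.

    Assumes one sentence pair per line with a single tab between the
    source and target sides (an assumption based on the cut usage).
    """
    sources, targets = [], []
    for line in lines:
        # Drop only the trailing newline so sentence text is preserved.
        line = line.rstrip("\n")
        # partition splits on the first tab; target is "" if no tab exists.
        src, _, trg = line.partition("\t")
        sources.append(src)
        targets.append(trg)
    return sources, targets


def split_file(pairs_path, src_path, trg_path):
    """Write each side of a filtered pairs file to its own file."""
    with io.open(pairs_path, encoding="utf8") as fin:
        sources, targets = split_pairs(fin)
    with io.open(src_path, "w", encoding="utf8") as fout:
        fout.write(u"\n".join(sources) + u"\n")
    with io.open(trg_path, "w", encoding="utf8") as fout:
        fout.write(u"\n".join(targets) + u"\n")
```

With the file names from the docstring, a call might look like `split_file("train.en-de", "train.clean.en", "train.clean.de")`. Both output files keep the same line order, so the pairs stay aligned.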