author | alvations <alvations@gmail.com> | 2015-03-20 21:00:36 +0300 |
---|---|---|
committer | alvations <alvations@gmail.com> | 2015-03-20 21:00:36 +0300 |
commit | 8f2d687d27f560b8e09ecd5a19542dde0507d84e | |
tree | 80de673aa02d851524a93e27f01b9a8318e55108 | |
parent | 44cd32d058e0e9086561942e86e16d43e70c100f |
added more description of usage in docstring
-rw-r--r-- | scripts/other/gacha_filter.py | 17 |
1 files changed, 17 insertions, 0 deletions
diff --git a/scripts/other/gacha_filter.py b/scripts/other/gacha_filter.py
index 6fabedade..4ebc501ac 100644
--- a/scripts/other/gacha_filter.py
+++ b/scripts/other/gacha_filter.py
@@ -22,6 +22,23 @@ where:
  - s2 = global variance, i.e. d ((l1 - l2)^2) / d (l1)
 
 (For details on Gale-Church, see http://www.aclweb.org/anthology/J93-1004.pdf)
+
+USAGE:
+
+    $ python gacha_filter.py train.en train.de
+
+Outputs to STDOUT a separated lines of the source and target sentence pairs.
+You can simply cut the file after that.
+
+    $ python gacha_filter.py train.en train.de > train.en-de
+    $ cut -f1 train.en-de > train.clean.en
+    $ cut -f2 train.en-de > train.clean.de
+
+You can also allow lower threshold to yield more lines:
+
+    $ python gacha_filter.py train.en train.de 0.05
+
+Default threshold is set to 0.2.
 """
 
 import io, subprocess
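
The `cut -f1` / `cut -f2` step in the added docstring implies that each output line holds a source sentence and a target sentence separated by a tab, since tab is the default delimiter of `cut`. Under that assumption, the same split could be sketched in Python as below. The helper names `split_pairs` and `write_split` are hypothetical and not part of `gacha_filter.py`. Unlike `cut`, this sketch skips lines without a tab so that the two output files stay aligned.

```python
import io


def split_pairs(lines):
    """Split 'source<TAB>target' lines into two parallel lists.

    Assumption: the filter output is tab-separated, as the `cut -f1` /
    `cut -f2` usage suggests. Empty lines and lines without a tab are skipped.
    """
    sources, targets = [], []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # empty or malformed line: no tab found
        sources.append(fields[0])
        targets.append(fields[1])
    return sources, targets


def write_split(pairs_path, src_path, trg_path):
    """Hypothetical helper: write each side of the pairs file to its own file."""
    with io.open(pairs_path, encoding="utf8") as fin:
        sources, targets = split_pairs(fin)
    with io.open(src_path, "w", encoding="utf8") as fout:
        for sentence in sources:
            fout.write(sentence + u"\n")
    with io.open(trg_path, "w", encoding="utf8") as fout:
        for sentence in targets:
            fout.write(sentence + u"\n")
```

With the file names from the docstring, this would be called as `write_split("train.en-de", "train.clean.en", "train.clean.de")`.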