diff options
author | Jeroen Vermeulen <jtv@precisiontranslationtools.com> | 2015-05-29 14:30:26 +0300 |
---|---|---|
committer | Jeroen Vermeulen <jtv@precisiontranslationtools.com> | 2015-05-29 14:30:26 +0300 |
commit | ef028446f3640e007215b4576a4dc52a9c9de6db (patch) | |
tree | 2ff9de308868631414617bf09b641b60be8bb151 /scripts/training/filter-rule-table.py | |
parent | 26170a41790bc1dfbc01c90dbcbf2699a0fe3cd0 (diff) |
Add license notices to scripts.
This is not pleasant to read (and much, much less pleasant to write!) but
sort of necessary in an open project. Right now it's quite hard to figure
out what is licensed how, which doesn't matter much to most people but can
suddenly become very important when people want to know what they're being
allowed to do.
I kept the notices as short as I could. As far as I could see, everything
without a clear license notice is LGPL v2.1 or later.
Diffstat (limited to 'scripts/training/filter-rule-table.py')
-rwxr-xr-x | scripts/training/filter-rule-table.py | 40 |
1 files changed, 22 insertions, 18 deletions
diff --git a/scripts/training/filter-rule-table.py b/scripts/training/filter-rule-table.py index 14736fe1f..d28fa0c89 100755 --- a/scripts/training/filter-rule-table.py +++ b/scripts/training/filter-rule-table.py @@ -1,25 +1,29 @@ #!/usr/bin/env python # Author: Phil Williams - -# Usage: filter-rule-table.py [--min-non-initial-rule-count=N] INPUT -# -# Given a rule table (on stdin) and an input text, filter out rules that -# couldn't be used in parsing the input and write the resulting rule table -# to stdout. The input text is assumed to contain the same factors as -# the rule table and is assumed to be small (not more than a few thousand -# sentences): the current algorithm won't scale well to large input sets. # -# The filtering algorithm considers a source RHS to be a sequence of -# words and gaps, which must match a sequence of words in one of the -# input sentences, with at least one input word per gap. The NT labels -# are ignored, so for example a rule with the source RHS "the JJ dog" -# would be allowed if the sequence "the slobbering dog" occurs in one of -# the input sentences, even if there's no rule to derive a JJ from -# "slobbering." (If "slobbering" were an unknown word, the 'unknown-lhs' -# decoder option would allow it to take a number of NT labels, likely -# including JJ, with varying probabilities, so removing the rule would -# be a bad idea.) +# This file is part of moses. Its use is licensed under the GNU Lesser General +# Public License version 2.1 or, at your option, any later version. + +"""Usage: filter-rule-table.py [--min-non-initial-rule-count=N] INPUT + +Given a rule table (on stdin) and an input text, filter out rules that +couldn't be used in parsing the input and write the resulting rule table +to stdout. The input text is assumed to contain the same factors as +the rule table and is assumed to be small (not more than a few thousand +sentences): the current algorithm won't scale well to large input sets. + +The filtering algorithm considers a source RHS to be a sequence of +words and gaps, which must match a sequence of words in one of the +input sentences, with at least one input word per gap. The NT labels +are ignored, so for example a rule with the source RHS "the JJ dog" +would be allowed if the sequence "the slobbering dog" occurs in one of +the input sentences, even if there's no rule to derive a JJ from +"slobbering." (If "slobbering" were an unknown word, the 'unknown-lhs' +decoder option would allow it to take a number of NT labels, likely +including JJ, with varying probabilities, so removing the rule would +be a bad idea.) +""" import optparse import sys |