Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/moses-smt/mosesdecoder.git
-rwxr-xr-x  cruise-control/test_all_new_commits.sh | 6
-rw-r--r--  cruise-control/web/html_templates.php | 2
-rw-r--r--  cruise-control/web/log_wrapper.php | 12
-rw-r--r--  mingw/MosesGUI/Ui_addMTModel.py | 25
-rw-r--r--  mingw/MosesGUI/Ui_chooseMTModel.py | 26
-rw-r--r--  mingw/MosesGUI/Ui_credits.py | 25
-rw-r--r--  mingw/MosesGUI/Ui_mainWindow.py | 33
-rw-r--r--  mingw/MosesGUI/addMTModel.py | 21
-rw-r--r--  mingw/MosesGUI/chooseMTModel.py | 14
-rw-r--r--  mingw/MosesGUI/credits.py | 4
-rw-r--r--  mingw/MosesGUI/datamodel.py | 6
-rw-r--r--  mingw/MosesGUI/main.py | 3
-rw-r--r--  phrase-extract/consolidate-main.cpp | 13
-rw-r--r--  phrase-extract/extract-ghkm/ExtractGHKM.cpp | 15
-rw-r--r--  phrase-extract/extract-main.cpp | 9
-rwxr-xr-x  regression-testing/run-test-detokenizer.perl | 4
-rwxr-xr-x  regression-testing/run-test-extract.perl | 12
-rwxr-xr-x  regression-testing/run-test-misc.perl | 12
-rwxr-xr-x  regression-testing/run-test-scorer.perl | 10
-rwxr-xr-x  scripts/OSM/OSM-Train.perl | 30
-rwxr-xr-x  scripts/OSM/extract-singletons.perl | 4
-rwxr-xr-x  scripts/OSM/flipAlignment.perl | 14
-rwxr-xr-x  scripts/Transliteration/clean.pl | 10
-rwxr-xr-x  scripts/Transliteration/corpusCreator.pl | 14
-rwxr-xr-x  scripts/Transliteration/in-decoding-transliteration.pl | 40
-rwxr-xr-x  scripts/Transliteration/post-decoding-transliteration.pl | 64
-rwxr-xr-x  scripts/Transliteration/prepare-transliteration-phrase-table.pl | 26
-rwxr-xr-x  scripts/Transliteration/threshold.pl | 2
-rwxr-xr-x  scripts/Transliteration/train-transliteration-module.pl | 52
-rwxr-xr-x  scripts/analysis/bootstrap-hypothesis-difference-significance.pl | 178
-rwxr-xr-x  scripts/analysis/nontranslated_words.pl | 2
-rwxr-xr-x  scripts/analysis/oov.pl | 2
-rw-r--r--  scripts/analysis/perllib/Error.pm | 8
-rwxr-xr-x  scripts/analysis/sentence-by-sentence.pl | 24
-rwxr-xr-x  scripts/analysis/sg2dot.perl | 20
-rwxr-xr-x  scripts/analysis/show-phrases-used.pl | 6
-rw-r--r--  scripts/analysis/smtgui/Corpus.pm | 74
-rwxr-xr-x  scripts/analysis/smtgui/filter-phrase-table.pl | 2
-rwxr-xr-x  scripts/analysis/smtgui/newsmtgui.cgi | 40
-rwxr-xr-x  scripts/analysis/suspicious_tokenization.pl | 2
-rwxr-xr-x  scripts/analysis/weight-scan.pl | 10
-rwxr-xr-x  scripts/ems/experiment.perl | 310
-rwxr-xr-x  scripts/ems/fix-info.perl | 6
-rwxr-xr-x  scripts/ems/support/analysis.perl | 108
-rwxr-xr-x  scripts/ems/support/berkeley-process.sh | 8
-rwxr-xr-x  scripts/ems/support/berkeley-train.sh | 4
-rwxr-xr-x  scripts/ems/support/build-domain-file-from-subcorpora.perl | 4
-rwxr-xr-x  scripts/ems/support/build-sparse-features.perl | 6
-rwxr-xr-x  scripts/ems/support/consolidate-training-data.perl | 2
-rwxr-xr-x  scripts/ems/support/fast-align-in-parts.perl | 2
-rwxr-xr-x  scripts/ems/support/generic-multicore-parallelizer.perl | 6
-rwxr-xr-x  scripts/ems/support/generic-parallelizer.perl | 6
-rwxr-xr-x  scripts/ems/support/input-from-sgm.perl | 6
-rwxr-xr-x  scripts/ems/support/interpolate-lm.perl | 10
-rwxr-xr-x  scripts/ems/support/lmplz-wrapper.perl | 2
-rwxr-xr-x  scripts/ems/support/mml-filter.perl | 2
-rwxr-xr-x  scripts/ems/support/mml-score.perl | 2
-rwxr-xr-x  scripts/ems/support/mml-train.perl | 2
-rwxr-xr-x  scripts/ems/support/prepare-fast-align.perl | 2
-rwxr-xr-x  scripts/ems/support/reference-from-sgm.perl | 4
-rwxr-xr-x  scripts/ems/support/remove-segmentation-markup.perl | 6
-rwxr-xr-x  scripts/ems/support/report-experiment-scores.perl | 2
-rwxr-xr-x  scripts/ems/support/run-command-on-multiple-refsets.perl | 4
-rwxr-xr-x  scripts/ems/support/run-wade.perl | 2
-rwxr-xr-x  scripts/ems/support/split-sentences.perl | 28
-rwxr-xr-x  scripts/ems/support/submit-grid.perl | 2
-rwxr-xr-x  scripts/ems/support/substitute-filtered-tables-and-weights.perl | 2
-rwxr-xr-x  scripts/ems/support/substitute-filtered-tables.perl | 4
-rwxr-xr-x  scripts/ems/support/substitute-weights.perl | 6
-rwxr-xr-x  scripts/ems/support/symmetrize-fast-align.perl | 2
-rwxr-xr-x  scripts/ems/support/thot-lm-wrapper.perl | 2
-rwxr-xr-x  scripts/ems/support/tree-converter-wrapper.perl | 2
-rwxr-xr-x  scripts/ems/support/wrap-xml.perl | 2
-rw-r--r--  scripts/ems/web/analysis.php | 80
-rw-r--r--  scripts/ems/web/analysis_diff.php | 62
-rw-r--r--  scripts/ems/web/bilingual-concordance.css | 56
-rw-r--r--  scripts/ems/web/comment.php | 2
-rw-r--r--  scripts/ems/web/diff.php | 2
-rw-r--r--  scripts/ems/web/hierarchical-segmentation.css | 8
-rw-r--r--  scripts/ems/web/hierarchical-segmentation.js | 4
-rw-r--r--  scripts/ems/web/index.php | 4
-rw-r--r--  scripts/ems/web/javascripts/scriptaculous-js-1.8.3/src/unittest.js | 132
-rw-r--r--  scripts/ems/web/javascripts/sound.js | 2
-rw-r--r--  scripts/ems/web/javascripts/unittest.js | 132
-rw-r--r--  scripts/ems/web/lib.php | 20
-rw-r--r--  scripts/ems/web/overview.php | 24
-rwxr-xr-x  scripts/ems/web/progress.perl | 2
-rw-r--r--  scripts/ems/web/sgviz.js | 178
-rw-r--r--  scripts/ems/web/sgviz.php | 4
-rwxr-xr-x  scripts/fuzzy-match/create_xml.perl | 2
-rwxr-xr-x  scripts/generic/compound-splitter.perl | 28
-rwxr-xr-x  scripts/generic/extract-factors.pl | 2
-rwxr-xr-x  scripts/generic/extract-parallel.perl | 14
-rwxr-xr-x  scripts/generic/fsa2fsal.pl | 2
-rwxr-xr-x  scripts/generic/fsa2plf.pl | 8
-rwxr-xr-x  scripts/generic/fsal2fsa.pl | 2
-rwxr-xr-x  scripts/generic/generic-parallel.perl | 10
-rwxr-xr-x  scripts/generic/giza-parallel.perl | 6
-rwxr-xr-x  scripts/generic/lopar2pos.pl | 2
-rwxr-xr-x  scripts/generic/moses-parallel.pl | 54
-rwxr-xr-x  scripts/generic/mteval-v12.pl | 246
-rwxr-xr-x  scripts/generic/mteval-v13a.pl | 16
-rwxr-xr-x  scripts/generic/multi-bleu.perl | 4
-rwxr-xr-x  scripts/generic/ph_numbers.perl | 12
-rwxr-xr-x  scripts/generic/qsub-wrapper.pl | 14
-rwxr-xr-x  scripts/generic/reverse-alignment.perl | 4
-rwxr-xr-x  scripts/generic/score-parallel.perl | 32
-rwxr-xr-x  scripts/generic/strip-xml.perl | 2
-rwxr-xr-x  scripts/generic/trainlm-irst2.perl | 8
-rwxr-xr-x  scripts/other/blame-stat.sh | 2
-rwxr-xr-x  scripts/other/convert-pt.perl | 2
-rwxr-xr-x  scripts/other/delete-scores.perl | 10
-rwxr-xr-x  scripts/other/get_many_translations_from_google.perl | 10
-rwxr-xr-x  scripts/other/retain-lines.perl | 2
-rwxr-xr-x  scripts/other/translate_by_microsoft_bing.perl | 2
-rwxr-xr-x  scripts/recaser/detruecase.perl | 2
-rwxr-xr-x  scripts/recaser/recase.perl | 2
-rwxr-xr-x  scripts/recaser/train-recaser.perl | 8
-rwxr-xr-x  scripts/recaser/train-truecaser.perl | 2
-rwxr-xr-x  scripts/recaser/truecase.perl | 2
-rw-r--r--  scripts/regression-testing/MosesScriptsRegressionTesting.pm | 4
-rwxr-xr-x  scripts/regression-testing/compare-results.pl | 2
-rwxr-xr-x  scripts/regression-testing/create_localized_moses_ini.pl | 2
-rwxr-xr-x  scripts/regression-testing/modify-pars.pl | 2
-rwxr-xr-x  scripts/regression-testing/moses-virtual.pl | 8
-rwxr-xr-x  scripts/regression-testing/run-single-test.pl | 2
-rwxr-xr-x  scripts/regression-testing/run-test-suite.pl | 2
-rwxr-xr-x  scripts/tokenizer/deescape-special-chars-PTB.perl | 2
-rwxr-xr-x  scripts/tokenizer/deescape-special-chars.perl | 2
-rwxr-xr-x  scripts/tokenizer/detokenizer.perl | 4
-rwxr-xr-x  scripts/tokenizer/escape-special-chars.perl | 4
-rwxr-xr-x  scripts/tokenizer/lowercase.perl | 2
-rwxr-xr-x  scripts/tokenizer/normalize-punctuation.perl | 2
-rwxr-xr-x  scripts/tokenizer/pre-tokenizer.perl | 2
-rwxr-xr-x  scripts/tokenizer/remove-non-printing-char.perl | 8
-rwxr-xr-x  scripts/tokenizer/replace-unicode-punctuation.perl | 2
-rwxr-xr-x  scripts/tokenizer/tokenizer.perl | 85
-rwxr-xr-x  scripts/tokenizer/tokenizer_PTB.perl | 92
-rw-r--r--  scripts/training/LexicalTranslationModel.pm | 8
-rwxr-xr-x  scripts/training/absolutize_moses_model.pl | 6
-rwxr-xr-x  scripts/training/analyse_moses_model.pl | 2
-rwxr-xr-x  scripts/training/binarize-model.perl | 4
-rwxr-xr-x  scripts/training/build-generation-table.perl | 10
-rwxr-xr-x  scripts/training/build-mmsapt.perl | 2
-rwxr-xr-x  scripts/training/clean-corpus-n.perl | 12
-rwxr-xr-x  scripts/training/clone_moses_model.pl | 4
-rwxr-xr-x  scripts/training/combine_factors.pl | 4
-rwxr-xr-x  scripts/training/convert-moses-ini-to-v2.perl | 10
-rwxr-xr-x  scripts/training/convert-moses-ini-v2-to-v1.perl | 6
-rwxr-xr-x  scripts/training/corpus-sizes.perl | 4
-rwxr-xr-x  scripts/training/exodus.perl | 12
-rwxr-xr-x  scripts/training/filter-model-given-input.pl | 32
-rwxr-xr-x  scripts/training/get-lexical.perl | 4
-rwxr-xr-x  scripts/training/giza2bal.pl | 10
-rwxr-xr-x  scripts/training/mert-moses.pl | 40
-rwxr-xr-x  scripts/training/postprocess-lopar.perl | 6
-rwxr-xr-x  scripts/training/reduce-factors.perl | 4
-rwxr-xr-x  scripts/training/reduce-topt-count.pl | 4
-rwxr-xr-x  scripts/training/reduce_combine.pl | 4
-rwxr-xr-x  scripts/training/remove-orphan-phrase-pairs-from-reordering-table.perl | 2
-rwxr-xr-x  scripts/training/threshold-filter.perl | 2
-rwxr-xr-x  scripts/training/train-global-lexicon-model.perl | 2
-rwxr-xr-x  scripts/training/train-model.perl | 350
-rwxr-xr-x  scripts/training/wrappers/berkeleyparsed2mosesxml.perl | 8
-rwxr-xr-x  scripts/training/wrappers/berkeleyparsed2mosesxml_PTB.perl | 6
-rwxr-xr-x  scripts/training/wrappers/filter-excluded-lines.perl | 8
-rwxr-xr-x  scripts/training/wrappers/find-unparseable.perl | 2
-rwxr-xr-x  scripts/training/wrappers/mada-wrapper.perl | 4
-rwxr-xr-x  scripts/training/wrappers/madamira-tok.perl | 6
-rwxr-xr-x  scripts/training/wrappers/madamira-wrapper.perl | 12
-rwxr-xr-x  scripts/training/wrappers/make-factor-brown-cluster-mkcls.perl | 2
-rwxr-xr-x  scripts/training/wrappers/make-factor-de-pos.perl | 2
-rwxr-xr-x  scripts/training/wrappers/make-factor-en-pos.mxpost.perl | 4
-rwxr-xr-x  scripts/training/wrappers/make-factor-pos.tree-tagger.perl | 10
-rwxr-xr-x  scripts/training/wrappers/make-factor-stem.perl | 2
-rwxr-xr-x  scripts/training/wrappers/make-factor-suffix.perl | 4
-rwxr-xr-x  scripts/training/wrappers/morfessor-wrapper.perl | 2
-rwxr-xr-x  scripts/training/wrappers/mosesxml2berkeleyparsed.perl | 2
-rwxr-xr-x  scripts/training/wrappers/parse-de-berkeley.perl | 6
-rwxr-xr-x  scripts/training/wrappers/parse-de-bitpar.perl | 10
-rwxr-xr-x  scripts/training/wrappers/parse-en-collins.perl | 2
-rwxr-xr-x  scripts/training/wrappers/parse-en-egret.perl | 2
-rwxr-xr-x  scripts/training/wrappers/syntax-hyphen-splitting.perl | 2
-rwxr-xr-x  scripts/training/wrappers/tagger-german-chunk.perl | 12
-rw-r--r--  vw/README.md | 20
185 files changed, 1789 insertions(+), 1766 deletions(-)
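Note on the MosesGUI hunks below: each Ui_*.py file replaces a try/except AttributeError block with a `getattr` lookup that supplies a fallback default. The following standalone sketch illustrates that pattern outside of PyQt4; the `Legacy` and `Modern` classes here are stand-ins for illustration only, not part of the Moses code.

```python
# The getattr(obj, name, default) pattern used in the Ui_*.py changes:
# one expression replaces a four-line try/except AttributeError block.

class Legacy:
    # Stand-in for an API that still provides the attribute
    # (like QtCore.QString.fromUtf8 in older PyQt4 builds).
    @staticmethod
    def fromUtf8(s):
        return "decoded:" + s

class Modern:
    # Stand-in for an API where the attribute was removed.
    pass

# If the attribute exists, use it; otherwise fall back to identity.
_fromUtf8_old = getattr(Legacy, 'fromUtf8', lambda s: s)
_fromUtf8_new = getattr(Modern, 'fromUtf8', lambda s: s)

print(_fromUtf8_old("x"))  # attribute present: the real function runs
print(_fromUtf8_new("x"))  # attribute absent: the identity fallback runs
```

The refactor keeps behavior identical while shrinking module-level boilerplate; the `_translate` helper in the same hunks applies the same `getattr`-with-default idea to `QtGui.QApplication.UnicodeUTF8`.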
diff --git a/cruise-control/test_all_new_commits.sh b/cruise-control/test_all_new_commits.sh
index 93ef30cf1..1e0a9c47f 100755
--- a/cruise-control/test_all_new_commits.sh
+++ b/cruise-control/test_all_new_commits.sh
@@ -115,7 +115,7 @@ function run_single_test () {
if [ -z "$err" ]; then
./bjam $MCC_CONFIGURE_ARGS >> $longlog 2>&1 || err="bjam"
fi
-
+
echo "## regression tests" >> $longlog
if [ -z "$err" ]; then
./bjam $MCC_CONFIGURE_ARGS --with-regtest=$regtest_dir >> $longlog 2>&1 || err="regression tests"
@@ -158,7 +158,7 @@ function run_single_test () {
status="FAIL:$err"
fi
echo "## Status: $status" >> $longlog
-
+
nicedate=$(date +"%Y%m%d-%H%M%S")
echo "$commit $status $configname $ccversion $nicedate" \
>> "$LOGDIR/brief.log"
@@ -180,7 +180,7 @@ done
# create info files for new commits
for i in $(git rev-list $MCC_SCAN_BRANCHES); do
first_char=$(echo $i | grep -o '^.')
- mkdir -p "$LOGDIR/logs/$configname/$first_char"
+ mkdir -p "$LOGDIR/logs/$configname/$first_char"
[ -f "$LOGDIR/logs/$configname/$first_char/$i.info" ] && break;
git show $i | $MYDIR/shorten_info.pl > "$LOGDIR/logs/$configname/$first_char/$i.info"
done
diff --git a/cruise-control/web/html_templates.php b/cruise-control/web/html_templates.php
index 83f6cc879..b7346914c 100644
--- a/cruise-control/web/html_templates.php
+++ b/cruise-control/web/html_templates.php
@@ -8,7 +8,7 @@ function show_header($title)
<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html;charset=utf-8\">
<title>$title</title>
</head><body>";
-}
+}
function show_heading($text, $size = 1)
{
diff --git a/cruise-control/web/log_wrapper.php b/cruise-control/web/log_wrapper.php
index af03db016..9fcf4db00 100644
--- a/cruise-control/web/log_wrapper.php
+++ b/cruise-control/web/log_wrapper.php
@@ -18,7 +18,7 @@ function get_all_branch_names()
}
class Branch
-{
+{
public function __construct($name)
{
$this->name = $name;
@@ -72,7 +72,7 @@ class Commit
while (($line = fgets($log_hdl)) !== false) {
if (preg_match('/tests passed/', $line)) {
$this->passed_percent = substr($line, 0, strpos('%', $line));
- }
+ }
else if (preg_match('/INVESTIGATE THESE FAILED TESTS/', $line)) {
$this->failed_tests = substr($line, 39);
}
@@ -113,7 +113,7 @@ class Commit
return file_exists(StaticData::logs_path . "/" . substr($this->name, 0, 1) . "/" . $this->name . ".OK");
}
}
-
+
public function get_status()
{
return $this->was_tested()
@@ -157,12 +157,12 @@ class Commit
public function get_log_file()
{
- return "show_commit.php?commit_id=$this->name&type=log";
+ return "show_commit.php?commit_id=$this->name&type=log";
}
public function get_info_file()
{
- return "show_commit.php?commit_id=$this->name&type=info";
+ return "show_commit.php?commit_id=$this->name&type=info";
}
private function open_log()
@@ -182,7 +182,7 @@ class Commit
private $message;
private $author;
private $timestamp;
-
+
}
?>
diff --git a/mingw/MosesGUI/Ui_addMTModel.py b/mingw/MosesGUI/Ui_addMTModel.py
index a312c6f23..a911be6d5 100644
--- a/mingw/MosesGUI/Ui_addMTModel.py
+++ b/mingw/MosesGUI/Ui_addMTModel.py
@@ -9,19 +9,17 @@
from PyQt4 import QtCore, QtGui
-try:
- _fromUtf8 = QtCore.QString.fromUtf8
-except AttributeError:
- def _fromUtf8(s):
- return s
-try:
- _encoding = QtGui.QApplication.UnicodeUTF8
- def _translate(context, text, disambig):
- return QtGui.QApplication.translate(context, text, disambig, _encoding)
-except AttributeError:
- def _translate(context, text, disambig):
- return QtGui.QApplication.translate(context, text, disambig)
+_fromUtf8 = getattr(QtCore.QString, 'fromUtf8', lambda s: s)
+
+
+def _translate(context, text, disambig):
+ return QtGui.QApplication.translate(
+ context, text, disambig,
+ getattr(
+ QtGui.QApplication, 'UnicodeUTF8',
+ QtCore.QCoreApplication.Encoding))
+
class Ui_Dialog(object):
def setupUi(self, Dialog):
@@ -89,7 +87,7 @@ class Ui_Dialog(object):
self.horizontalLayout_2.setStretch(1, 1)
self.verticalLayout.addWidget(self.groupBox_2)
self.buttonBox = QtGui.QDialogButtonBox(Dialog)
- self.buttonBox.setStandardButtons(QtGui.QDialogButtonBox.Cancel|QtGui.QDialogButtonBox.Ok)
+ self.buttonBox.setStandardButtons(QtGui.QDialogButtonBox.Cancel | QtGui.QDialogButtonBox.Ok)
self.buttonBox.setObjectName(_fromUtf8("buttonBox"))
self.verticalLayout.addWidget(self.buttonBox)
self.verticalLayout.setStretch(1, 2)
@@ -130,4 +128,3 @@ if __name__ == "__main__":
ui.setupUi(Dialog)
Dialog.show()
sys.exit(app.exec_())
-
diff --git a/mingw/MosesGUI/Ui_chooseMTModel.py b/mingw/MosesGUI/Ui_chooseMTModel.py
index 993cd2598..246e88308 100644
--- a/mingw/MosesGUI/Ui_chooseMTModel.py
+++ b/mingw/MosesGUI/Ui_chooseMTModel.py
@@ -9,19 +9,17 @@
from PyQt4 import QtCore, QtGui
-try:
- _fromUtf8 = QtCore.QString.fromUtf8
-except AttributeError:
- def _fromUtf8(s):
- return s
-try:
- _encoding = QtGui.QApplication.UnicodeUTF8
- def _translate(context, text, disambig):
- return QtGui.QApplication.translate(context, text, disambig, _encoding)
-except AttributeError:
- def _translate(context, text, disambig):
- return QtGui.QApplication.translate(context, text, disambig)
+_fromUtf8 = getattr(QtCore.QString, 'fromUtf8', lambda s: s)
+
+
+def _translate(context, text, disambig):
+ return QtGui.QApplication.translate(
+ context, text, disambig,
+ getattr(
+ QtGui.QApplication, 'UnicodeUTF8',
+ QtCore.QCoreApplication.Encoding))
+
class Ui_Dialog(object):
def setupUi(self, Dialog):
@@ -48,7 +46,8 @@ class Ui_Dialog(object):
self.verticalLayout.addWidget(self.groupBox)
self.buttonBox = QtGui.QDialogButtonBox(Dialog)
self.buttonBox.setOrientation(QtCore.Qt.Horizontal)
- self.buttonBox.setStandardButtons(QtGui.QDialogButtonBox.Cancel|QtGui.QDialogButtonBox.Ok)
+ self.buttonBox.setStandardButtons(
+ QtGui.QDialogButtonBox.Cancel | QtGui.QDialogButtonBox.Ok)
self.buttonBox.setObjectName(_fromUtf8("buttonBox"))
self.verticalLayout.addWidget(self.buttonBox)
@@ -70,4 +69,3 @@ if __name__ == "__main__":
ui.setupUi(Dialog)
Dialog.show()
sys.exit(app.exec_())
-
diff --git a/mingw/MosesGUI/Ui_credits.py b/mingw/MosesGUI/Ui_credits.py
index c2e9c5d81..37759221c 100644
--- a/mingw/MosesGUI/Ui_credits.py
+++ b/mingw/MosesGUI/Ui_credits.py
@@ -9,19 +9,17 @@
from PyQt4 import QtCore, QtGui
-try:
- _fromUtf8 = QtCore.QString.fromUtf8
-except AttributeError:
- def _fromUtf8(s):
- return s
-try:
- _encoding = QtGui.QApplication.UnicodeUTF8
- def _translate(context, text, disambig):
- return QtGui.QApplication.translate(context, text, disambig, _encoding)
-except AttributeError:
- def _translate(context, text, disambig):
- return QtGui.QApplication.translate(context, text, disambig)
+_fromUtf8 = getattr(QtCore.QString, 'fromUtf8', lambda s: s)
+
+
+def _translate(context, text, disambig):
+ return QtGui.QApplication.translate(
+ context, text, disambig,
+ getattr(
+ QtGui.QApplication, 'UnicodeUTF8',
+ QtCore.QCoreApplication.Encoding))
+
class Ui_Dialog(object):
def setupUi(self, Dialog):
@@ -29,7 +27,7 @@ class Ui_Dialog(object):
Dialog.resize(359, 271)
self.label = QtGui.QLabel(Dialog)
self.label.setGeometry(QtCore.QRect(10, 10, 341, 211))
- self.label.setAlignment(QtCore.Qt.AlignJustify|QtCore.Qt.AlignVCenter)
+ self.label.setAlignment(QtCore.Qt.AlignJustify | QtCore.Qt.AlignVCenter)
self.label.setWordWrap(True)
self.label.setObjectName(_fromUtf8("label"))
self.pushButton = QtGui.QPushButton(Dialog)
@@ -62,4 +60,3 @@ if __name__ == "__main__":
ui.setupUi(Dialog)
Dialog.show()
sys.exit(app.exec_())
-
diff --git a/mingw/MosesGUI/Ui_mainWindow.py b/mingw/MosesGUI/Ui_mainWindow.py
index b5d3fe006..3a7687fde 100644
--- a/mingw/MosesGUI/Ui_mainWindow.py
+++ b/mingw/MosesGUI/Ui_mainWindow.py
@@ -9,19 +9,17 @@
from PyQt4 import QtCore, QtGui
-try:
- _fromUtf8 = QtCore.QString.fromUtf8
-except AttributeError:
- def _fromUtf8(s):
- return s
-try:
- _encoding = QtGui.QApplication.UnicodeUTF8
- def _translate(context, text, disambig):
- return QtGui.QApplication.translate(context, text, disambig, _encoding)
-except AttributeError:
- def _translate(context, text, disambig):
- return QtGui.QApplication.translate(context, text, disambig)
+_fromUtf8 = getattr(QtCore.QString, 'fromUtf8', lambda s: s)
+
+
+def _translate(context, text, disambig):
+ return QtGui.QApplication.translate(
+ context, text, disambig,
+ getattr(
+ QtGui.QApplication, 'UnicodeUTF8',
+ QtCore.QCoreApplication.Encoding))
+
class Ui_MainWindow(object):
def setupUi(self, MainWindow):
@@ -83,7 +81,10 @@ class Ui_MainWindow(object):
self.editModelName.setObjectName(_fromUtf8("editModelName"))
self.gridLayout.addWidget(self.editModelName, 1, 2, 1, 1)
self.label_2 = QtGui.QLabel(self.groupBox)
- self.label_2.setAlignment(QtCore.Qt.AlignRight|QtCore.Qt.AlignTrailing|QtCore.Qt.AlignVCenter)
+ self.label_2.setAlignment(
+ QtCore.Qt.AlignRight |
+ QtCore.Qt.AlignTrailing |
+ QtCore.Qt.AlignVCenter)
self.label_2.setObjectName(_fromUtf8("label_2"))
self.gridLayout.addWidget(self.label_2, 1, 1, 1, 1)
self.verticalLayout_2.addWidget(self.groupBox)
@@ -151,7 +152,10 @@ class Ui_MainWindow(object):
self.verticalLayout_3.addWidget(self.tabWidget)
self.labelInfo = QtGui.QLabel(self.centralWidget)
self.labelInfo.setTextFormat(QtCore.Qt.AutoText)
- self.labelInfo.setAlignment(QtCore.Qt.AlignRight|QtCore.Qt.AlignTrailing|QtCore.Qt.AlignVCenter)
+ self.labelInfo.setAlignment(
+ QtCore.Qt.AlignRight |
+ QtCore.Qt.AlignTrailing |
+ QtCore.Qt.AlignVCenter)
self.labelInfo.setObjectName(_fromUtf8("labelInfo"))
self.verticalLayout_3.addWidget(self.labelInfo)
MainWindow.setCentralWidget(self.centralWidget)
@@ -185,4 +189,3 @@ if __name__ == "__main__":
ui.setupUi(MainWindow)
MainWindow.show()
sys.exit(app.exec_())
-
diff --git a/mingw/MosesGUI/addMTModel.py b/mingw/MosesGUI/addMTModel.py
index 3fd4b14de..8d55400d5 100644
--- a/mingw/MosesGUI/addMTModel.py
+++ b/mingw/MosesGUI/addMTModel.py
@@ -7,16 +7,18 @@ Module implementing Dialog.
from PyQt4.QtGui import *
from PyQt4.QtCore import *
-import os, datetime
+import datetime
+import os
from Ui_addMTModel import Ui_Dialog
from util import *
+
class AddMTModelDialog(QDialog, Ui_Dialog):
"""
Class documentation goes here.
"""
- def __init__(self, parent = None, workdir=None):
+ def __init__(self, parent=None, workdir=None):
"""
Constructor
"""
@@ -27,7 +29,7 @@ class AddMTModelDialog(QDialog, Ui_Dialog):
todir, timestr = self.findEmptyDirWithTime(self.workdir)
self.editPath.setText(todir)
self.editName.setText("SampleModel_" + timestr)
-
+
def findEmptyDirWithTime(self, workdir):
if not self.timestr:
self.timestr = datetime.datetime.now().strftime('%Y-%m-%d_%H%M%S')
@@ -37,7 +39,7 @@ class AddMTModelDialog(QDialog, Ui_Dialog):
break
self.timestr = datetime.datetime.now().strftime('%Y-%m-%d_%H%M%S')
return todir, self.timestr
-
+
@pyqtSignature("")
def on_btnLocal_clicked(self):
"""
@@ -49,7 +51,7 @@ class AddMTModelDialog(QDialog, Ui_Dialog):
dialog.setViewMode(QFileDialog.Detail)
if dialog.exec_():
self.editLocal.setText(dialog.selectedFiles()[0])
-
+
@pyqtSignature("")
def on_btnPath_clicked(self):
"""
@@ -63,21 +65,21 @@ class AddMTModelDialog(QDialog, Ui_Dialog):
root = str(dialog.selectedFiles()[0])
todir, _ = self.findEmptyDirWithTime(root)
self.editPath.setText(todir)
-
+
@pyqtSignature("bool")
def on_grpBoxInternet_toggled(self, p0):
"""
Slot documentation goes here.
"""
self.grpBoxLocal.setChecked(not p0)
-
+
@pyqtSignature("bool")
def on_grpBoxLocal_toggled(self, p0):
"""
Slot documentation goes here.
"""
self.grpBoxInternet.setChecked(not p0)
-
+
@pyqtSignature("")
def on_buttonBox_accepted(self):
"""
@@ -85,7 +87,7 @@ class AddMTModelDialog(QDialog, Ui_Dialog):
"""
def checkEmpty(mystr):
return len(str(mystr).strip()) <= 0
-
+
#check everything
self.modelName = self.editName.text()
if checkEmpty(self.modelName):
@@ -111,4 +113,3 @@ class AddMTModelDialog(QDialog, Ui_Dialog):
doAlert("Please provide non-empty Install Destination Folder")
return
self.accept()
-
diff --git a/mingw/MosesGUI/chooseMTModel.py b/mingw/MosesGUI/chooseMTModel.py
index 52a6c8eed..95c566f1e 100644
--- a/mingw/MosesGUI/chooseMTModel.py
+++ b/mingw/MosesGUI/chooseMTModel.py
@@ -10,11 +10,12 @@ from PyQt4.QtSql import *
from Ui_chooseMTModel import Ui_Dialog
+
class ChooseMTModelDialog(QDialog, Ui_Dialog):
"""
Class documentation goes here.
"""
- def __init__(self, parent = None, datamodel = None):
+ def __init__(self, parent=None, datamodel=None):
"""
Constructor
"""
@@ -29,16 +30,16 @@ class ChooseMTModelDialog(QDialog, Ui_Dialog):
self.selTableView.hideColumn(6)
#change status and keep the column
QObject.connect(datamodel, SIGNAL("modelInstalled()"), self.on_datamodel_modelInstalled)
-
+
def updateModel(self):
self.model.setQuery('SELECT ID, name, srclang, trglang, status, path, mosesini FROM models WHERE status = "READY" AND deleted != "True"', self.database)
-
+
def on_datamodel_recordUpdated(self, bRecord):
#deal with the selection changed problem
try:
if bRecord:
current = self.selTableView.currentIndex()
- if current and current.row() <> -1:
+ if current and current.row() != -1:
self.curSelection = current.row()
else:
self.curSelection = None
@@ -47,10 +48,10 @@ class ChooseMTModelDialog(QDialog, Ui_Dialog):
self.selTableView.selectRow(self.curSelection)
except Exception, e:
print >> sys.stderr, str(e)
-
+
def on_datamodel_modelInstalled(self):
self.updateModel()
-
+
@pyqtSignature("")
def on_buttonBox_accepted(self):
"""
@@ -68,4 +69,3 @@ class ChooseMTModelDialog(QDialog, Ui_Dialog):
self.path = record.value("path").toString()
self.mosesini = record.value("mosesini").toString()
self.accept()
-
diff --git a/mingw/MosesGUI/credits.py b/mingw/MosesGUI/credits.py
index 6b1b97a05..43aa4f381 100644
--- a/mingw/MosesGUI/credits.py
+++ b/mingw/MosesGUI/credits.py
@@ -9,18 +9,18 @@ from PyQt4.QtCore import pyqtSignature, QUrl
from Ui_credits import Ui_Dialog
+
class DlgCredits(QDialog, Ui_Dialog):
"""
Class documentation goes here.
"""
- def __init__(self, parent = None):
+ def __init__(self, parent=None):
"""
Constructor
"""
QDialog.__init__(self, parent)
self.setupUi(self)
-
@pyqtSignature("QString")
def on_label_linkActivated(self, link):
"""
diff --git a/mingw/MosesGUI/datamodel.py b/mingw/MosesGUI/datamodel.py
index ad802e439..16076043f 100644
--- a/mingw/MosesGUI/datamodel.py
+++ b/mingw/MosesGUI/datamodel.py
@@ -248,7 +248,7 @@ Click "Cancel" to do nothing.'''
msg = 'COPY %.0f%%' % (
download_size * 100.0 / total_size)
else:
- msg = 'COPY %d MB' % (download_size/1048576)
+ msg = 'COPY %d MB' % (download_size / 1048576)
if msg != lastMsg:
updateRecord({'status': msg})
lastMsg = msg
@@ -289,7 +289,7 @@ Click "Cancel" to do nothing.'''
msg = 'DOWNLOAD %.0f%%' % (
download_size * 100.0 / total_size)
else:
- msg = 'DOWNLOAD %d MB' % (download_size/1048576)
+ msg = 'DOWNLOAD %d MB' % (download_size / 1048576)
if msg != lastMsg:
updateRecord({'status': msg})
lastMsg = msg
@@ -359,7 +359,7 @@ Click "Cancel" to do nothing.'''
download_size * 100.0 / total_size)
else:
msg = 'UNZIP %d MB' % (
- download_size/1048576)
+ download_size / 1048576)
if msg != lastMsg:
updateRecord({'status': msg})
lastMsg = msg
diff --git a/mingw/MosesGUI/main.py b/mingw/MosesGUI/main.py
index e2a08d03a..805a7bc0c 100644
--- a/mingw/MosesGUI/main.py
+++ b/mingw/MosesGUI/main.py
@@ -3,7 +3,8 @@
from PyQt4.QtCore import *
from PyQt4.QtGui import *
-import sys, os
+import os
+import sys
from mainWindow import MainWindow
from datamodel import DataModel
diff --git a/phrase-extract/consolidate-main.cpp b/phrase-extract/consolidate-main.cpp
index 4ff0b5373..d52e8797b 100644
--- a/phrase-extract/consolidate-main.cpp
+++ b/phrase-extract/consolidate-main.cpp
@@ -72,7 +72,18 @@ int main(int argc, char* argv[])
<< "consolidating direct and indirect rule tables" << std::endl;
if (argc < 4) {
- std::cerr << "syntax: consolidate phrase-table.direct phrase-table.indirect phrase-table.consolidated [--Hierarchical] [--OnlyDirect] [--PhraseCount] [--GoodTuring counts-of-counts-file] [--KneserNey counts-of-counts-file] [--LowCountFeature] [--SourceLabels source-labels-file] [--PartsOfSpeech parts-of-speech-file] [--MinScore id:threshold[,id:threshold]*]" << std::endl;
+ std::cerr <<
+ "syntax: "
+ "consolidate phrase-table.direct "
+ "phrase-table.indirect "
+ "phrase-table.consolidated "
+ "[--Hierarchical] [--OnlyDirect] [--PhraseCount] "
+ "[--GoodTuring counts-of-counts-file] "
+ "[--KneserNey counts-of-counts-file] [--LowCountFeature] "
+ "[--SourceLabels source-labels-file] "
+ "[--PartsOfSpeech parts-of-speech-file] "
+ "[--MinScore id:threshold[,id:threshold]*]"
+ << std::endl;
exit(1);
}
const std::string fileNameDirect = argv[1];
diff --git a/phrase-extract/extract-ghkm/ExtractGHKM.cpp b/phrase-extract/extract-ghkm/ExtractGHKM.cpp
index 6468b7473..bc687ec6b 100644
--- a/phrase-extract/extract-ghkm/ExtractGHKM.cpp
+++ b/phrase-extract/extract-ghkm/ExtractGHKM.cpp
@@ -429,11 +429,22 @@ void ExtractGHKM::ProcessOptions(int argc, char *argv[],
usageBottom << "\nImplementation Notes:\n"
<< "\nThe parse tree is assumed to contain part-of-speech preterminal nodes.\n"
<< "\n"
- << "For the composed rule constraints: rule depth is the maximum distance from the\nrule's root node to a sink node, not counting preterminal expansions or word\nalignments. Rule size is the measure defined in DeNeefe et al (2007): the\nnumber of non-part-of-speech, non-leaf constituent labels in the target tree.\nNode count is the number of target tree nodes (excluding target words).\n"
+ << "For the composed rule constraints: rule depth is the "
+ "maximum distance from the\nrule's root node to a sink "
+ "node, not counting preterminal expansions or word\n"
+ "alignments. Rule size is the measure defined in DeNeefe "
+ "et al (2007): the\nnumber of non-part-of-speech, non-leaf "
+ "constituent labels in the target tree.\nNode count is the "
+ "number of target tree nodes (excluding target words).\n"
<< "\n"
<< "Scope pruning (Hopkins and Langmead, 2010) is applied to both minimal and\ncomposed rules.\n"
<< "\n"
- << "Unaligned source words are attached to the tree using the following heuristic:\nif there are aligned source words to both the left and the right of an unaligned\nsource word then it is attached to the lowest common ancestor of its nearest\nsuch left and right neighbours. Otherwise, it is attached to the root of the\nparse tree.\n"
+ << "Unaligned source words are attached to the tree using the "
+ "following heuristic:\nif there are aligned source words to "
+ "both the left and the right of an unaligned\nsource word "
+ "then it is attached to the lowest common ancestor of its "
+ "nearest\nsuch left and right neighbours. Otherwise, it is "
+ "attached to the root of the\nparse tree.\n"
<< "\n"
<< "Unless the --AllowUnary option is given, unary rules containing no lexical\nsource items are eliminated using the method described in Chung et al. (2011).\nThe parsing algorithm used in Moses is unable to handle such rules.\n"
<< "\n"
diff --git a/phrase-extract/extract-main.cpp b/phrase-extract/extract-main.cpp
index e386d1721..eb44b83d1 100644
--- a/phrase-extract/extract-main.cpp
+++ b/phrase-extract/extract-main.cpp
@@ -86,7 +86,14 @@ namespace MosesTraining
class ExtractTask
{
public:
- ExtractTask(size_t id, SentenceAlignment &sentence,PhraseExtractionOptions &initoptions, Moses::OutputFileStream &extractFile, Moses::OutputFileStream &extractFileInv,Moses::OutputFileStream &extractFileOrientation, Moses::OutputFileStream &extractFileContext, Moses::OutputFileStream &extractFileContextInv):
+ ExtractTask(
+ size_t id, SentenceAlignment &sentence,
+ PhraseExtractionOptions &initoptions,
+ Moses::OutputFileStream &extractFile,
+ Moses::OutputFileStream &extractFileInv,
+ Moses::OutputFileStream &extractFileOrientation,
+ Moses::OutputFileStream &extractFileContext,
+ Moses::OutputFileStream &extractFileContextInv):
m_sentence(sentence),
m_options(initoptions),
m_extractFile(extractFile),
diff --git a/regression-testing/run-test-detokenizer.perl b/regression-testing/run-test-detokenizer.perl
index e297b90be..9e5888ed0 100755
--- a/regression-testing/run-test-detokenizer.perl
+++ b/regression-testing/run-test-detokenizer.perl
@@ -166,12 +166,12 @@ sub runDetokenizerTest {
unless (mkdir($testOutputDir)) {
return fail($testCase->getName().": Failed to create output directory ".$testOutputDir." [".$!."]");
}
-
+
open TOK, ">".$tokenizedFile;
binmode TOK, ":utf8";
print TOK $testCase->getTokenizedText();
close TOK;
-
+
open TRUTH, ">".$expectedFile;
binmode TRUTH, ":utf8";
print TRUTH $testCase->getRightAnswer();
diff --git a/regression-testing/run-test-extract.perl b/regression-testing/run-test-extract.perl
index bc0dc0cf9..e03502434 100755
--- a/regression-testing/run-test-extract.perl
+++ b/regression-testing/run-test-extract.perl
@@ -1,10 +1,10 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl -w
use strict;
BEGIN {
-use Cwd qw/ abs_path /;
-use File::Basename;
+use Cwd qw/ abs_path /;
+use File::Basename;
my $script_dir = dirname(abs_path($0));
print STDERR "script_dir=$script_dir\n";
push @INC, $script_dir;
@@ -30,10 +30,10 @@ GetOptions("extractor=s" => \$extractorExe,
) or exit 1;
# output dir
-unless (defined $results_dir)
-{
+unless (defined $results_dir)
+{
my $ts = get_timestamp($extractorExe);
- $results_dir = "$data_dir/results/$test_name/$ts";
+ $results_dir = "$data_dir/results/$test_name/$ts";
}
`mkdir -p $results_dir`;
diff --git a/regression-testing/run-test-misc.perl b/regression-testing/run-test-misc.perl
index da79c94e8..9b65115c1 100755
--- a/regression-testing/run-test-misc.perl
+++ b/regression-testing/run-test-misc.perl
@@ -1,10 +1,10 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl -w
use strict;
BEGIN {
-use Cwd qw/ abs_path cwd /;
-use File::Basename;
+use Cwd qw/ abs_path cwd /;
+use File::Basename;
my $script_dir = dirname(abs_path($0));
print STDERR "script_dir=$script_dir\n";
push @INC, $script_dir;
@@ -27,10 +27,10 @@ GetOptions("moses-root=s" => \$mosesRoot,
) or exit 1;
# output dir
-unless (defined $results_dir)
-{
+unless (defined $results_dir)
+{
my $ts = get_timestamp($mosesRoot);
- $results_dir = "$data_dir/results/$test_name/$ts";
+ $results_dir = "$data_dir/results/$test_name/$ts";
}
`mkdir -p $results_dir`;
diff --git a/regression-testing/run-test-scorer.perl b/regression-testing/run-test-scorer.perl
index 6bd95ad55..9f4f15d9e 100755
--- a/regression-testing/run-test-scorer.perl
+++ b/regression-testing/run-test-scorer.perl
@@ -3,8 +3,8 @@
use strict;
BEGIN {
-use Cwd qw/ abs_path /;
-use File::Basename;
+use Cwd qw/ abs_path /;
+use File::Basename;
my $script_dir = dirname(abs_path($0));
print STDERR "script_dir=$script_dir\n";
push @INC, $script_dir;
@@ -30,10 +30,10 @@ GetOptions("scorer=s" => \$scoreExe,
) or exit 1;
# output dir
-unless (defined $results_dir)
-{
+unless (defined $results_dir)
+{
my $ts = get_timestamp($scoreExe);
- $results_dir = "$data_dir/results/$test_name/$ts";
+ $results_dir = "$data_dir/results/$test_name/$ts";
}
`mkdir -p $results_dir`;
diff --git a/scripts/OSM/OSM-Train.perl b/scripts/OSM/OSM-Train.perl
index e7d9b9057..895a821db 100755
--- a/scripts/OSM/OSM-Train.perl
+++ b/scripts/OSM/OSM-Train.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -35,19 +35,19 @@ if (!defined($LMPLZ)) {
}
# check if the files are in place
-die("ERROR: you need to define --corpus-e, --corpus-f, --alignment, and --moses-src-dir")
- unless (defined($MOSES_SRC_DIR) &&
- defined($CORPUS_F) &&
- defined($CORPUS_E) &&
- defined($ALIGNMENT)&&
+die("ERROR: you need to define --corpus-e, --corpus-f, --alignment, and --moses-src-dir")
+ unless (defined($MOSES_SRC_DIR) &&
+ defined($CORPUS_F) &&
+ defined($CORPUS_E) &&
+ defined($ALIGNMENT)&&
(defined($SRILM_DIR) || defined($LMPLZ)));
-die("ERROR: could not find input corpus file '$CORPUS_F'")
+die("ERROR: could not find input corpus file '$CORPUS_F'")
unless -e $CORPUS_F;
-die("ERROR: could not find output corpus file '$CORPUS_E'")
+die("ERROR: could not find output corpus file '$CORPUS_E'")
unless -e $CORPUS_E;
-die("ERROR: could not find algnment file '$ALIGNMENT'")
+die("ERROR: could not find alignment file '$ALIGNMENT'")
unless -e $ALIGNMENT;
-die("ERROR: could not find OSM scripts in '$MOSES_SRC_DIR/scripts/OSM")
+die("ERROR: could not find OSM scripts in '$MOSES_SRC_DIR/scripts/OSM'")
unless -e "$MOSES_SRC_DIR/scripts/OSM/flipAlignment.perl";
# create factors
@@ -55,13 +55,13 @@ die("ERROR: could not find OSM scripts in '$MOSES_SRC_DIR/scripts/OSM")
`$MOSES_SRC_DIR/scripts/OSM/flipAlignment.perl $ALIGNMENT > $OUT_DIR/align`;
if (defined($FACTOR)) {
-
+
my @factor_values = split(/\+/, $FACTOR);
-
+
foreach my $factor_val (@factor_values) {
`mkdir $OUT_DIR/$factor_val`;
my ($factor_f,$factor_e) = split(/\-/,$factor_val);
-
+
$CORPUS_F =~ /^(.+)\.([^\.]+)/;
my ($corpus_stem_f,$ext_f) = ($1,$2);
$CORPUS_E =~ /^(.+)\.([^\.]+)/;
@@ -77,7 +77,7 @@ if (defined($FACTOR)) {
else {
`ln -s $CORPUS_F $OUT_DIR/f`;
`ln -s $CORPUS_E $OUT_DIR/e`;
- create_model("");
+ create_model("");
}
# create model
@@ -184,7 +184,7 @@ sub reduce_factors {
die "ERROR: Couldn't find factor $outfactor in token \"$_\" in $full LINE $nr" if !defined $out;
print OUT $out;
}
- }
+ }
print OUT "\n";
}
print STDERR "\n";
diff --git a/scripts/OSM/extract-singletons.perl b/scripts/OSM/extract-singletons.perl
index e5e127d1a..5a1665a8c 100755
--- a/scripts/OSM/extract-singletons.perl
+++ b/scripts/OSM/extract-singletons.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
#use strict;
use warnings;
@@ -32,7 +32,7 @@ while (<TARGET>) {
}
for( $i=0; $i<=$#A; $i+=2 ) {
- if ($target_links[$A[$i]] == 1 && $source_links[$A[$i+1]] == 1 &&
+ if ($target_links[$A[$i]] == 1 && $source_links[$A[$i+1]] == 1 &&
$T[$A[$i]] eq $S[$A[$i+1]])
{
$count{$S[$A[$i+1]]}++; # Print this if it only occurs here
diff --git a/scripts/OSM/flipAlignment.perl b/scripts/OSM/flipAlignment.perl
index 3559bf79b..b896c0a23 100755
--- a/scripts/OSM/flipAlignment.perl
+++ b/scripts/OSM/flipAlignment.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -13,16 +13,16 @@ use strict;
while (<MYFILE>) {
chomp;
#print "$_\n";
-
+
$sentence = "$_";
@words = split(/ /, $sentence);
-
- foreach (@words)
+
+ foreach (@words)
{
my ($factor_f,$factor_e) = split(/\-/,"$_");
print $factor_e . " " . $factor_f . " ";
- }
-
+ }
+
print "\n";
}
- close (MYFILE);
+ close (MYFILE);
diff --git a/scripts/Transliteration/clean.pl b/scripts/Transliteration/clean.pl
index c59bf0798..ccc364fc9 100755
--- a/scripts/Transliteration/clean.pl
+++ b/scripts/Transliteration/clean.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
#input hindi word urdu word, delete all those entries that have number on any side
use warnings;
@@ -57,7 +57,7 @@ else
$retur = deleteSymbol($_);
if($retur == 1)
{
- #print "$_\n";
+ #print "$_\n";
$retur = deleteEnglish($lang1, $lang2, $_);
if ($retur == 1)
{
@@ -92,7 +92,7 @@ sub deleteEnglish{
else {$backEng = 1; return $backEng;}
}
elsif($list[0] == 0 && $list[1] == 1)
- {
+ {
# print "Target is Non-Latin\n";
@F=split("\t");
if ($F[1] =~ m/[A-Za-z]/) {}
@@ -130,7 +130,7 @@ sub deleteSymbol{
elsif(/\,/) {}
elsif(/\</){}
elsif(/\>/){}
- else
+ else
{
@wrds = split(/\t/);
if($wrds[0] eq $wrds[1])
@@ -260,7 +260,7 @@ sub charFreqFilter{
$remove = 0;
########################## search if word contain any of the bad characters ####################################
-
+
foreach (@srcWrdArr)
{
# print "$srcWrd\n";
diff --git a/scripts/Transliteration/corpusCreator.pl b/scripts/Transliteration/corpusCreator.pl
index d2df8323c..4c62449df 100755
--- a/scripts/Transliteration/corpusCreator.pl
+++ b/scripts/Transliteration/corpusCreator.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -29,15 +29,15 @@ open FH, "<:encoding(UTF-8)", "$tPath/$tFile" or die "Can't open $tPath/$tFile:
open MYSFILE, ">:encoding(UTF-8)", "$tPath/training/corpus.$inp_ext" or die "Can't open $tPath/training/corpus.$inp_ext: $!\n";
open MYTFILE, ">:encoding(UTF-8)", "$tPath/training/corpus.$op_ext" or die "Can't open $tPath/training/corpus.$op_ext: $!\n";
-while (<FH>)
+while (<FH>)
{
- chomp;
+ chomp;
my ($src,$tgt) = split(/\t/);
-
- $s = join(' ', split('',$src));
- $t = join(' ', split('',$tgt));
+
+ $s = join(' ', split('',$src));
+ $t = join(' ', split('',$tgt));
print MYSFILE "$s\n";
- print MYTFILE "$t\n";
+ print MYTFILE "$t\n";
push(@source, $s);
push(@target, $t);
}
diff --git a/scripts/Transliteration/in-decoding-transliteration.pl b/scripts/Transliteration/in-decoding-transliteration.pl
index 216d99a3e..c3cc31f26 100755
--- a/scripts/Transliteration/in-decoding-transliteration.pl
+++ b/scripts/Transliteration/in-decoding-transliteration.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -31,8 +31,8 @@ die("ERROR: you need to define --moses-src-dir --external-bin-dir, --translitera
unless (defined($MOSES_SRC_DIR) &&
defined($TRANSLIT_MODEL) &&
defined($OOV_FILE) &&
- defined($INPUT_EXTENSION)&&
- defined($OUTPUT_EXTENSION)&&
+ defined($INPUT_EXTENSION)&&
+ defined($OUTPUT_EXTENSION)&&
defined($EXTERNAL_BIN_DIR));
die("ERROR: could not find Transliteration Model '$TRANSLIT_MODEL'")
@@ -68,7 +68,7 @@ sub prepare_for_transliteration
open MYFILE, "<:encoding(UTF-8)", $testFile or die "Can't open $testFile: $!\n";
- while (<MYFILE>)
+ while (<MYFILE>)
{
chomp;
#print "$_\n";
@@ -76,12 +76,12 @@ sub prepare_for_transliteration
foreach (@words)
{
-
+
@tW = split /\Q$___FACTOR_DELIMITER/;
if (defined $tW[0])
{
-
+
if (! ($tW[0] =~ /[0-9.,]/))
{
$UNK{$tW[0]} = 1;
@@ -90,7 +90,7 @@ sub prepare_for_transliteration
{
print "Not transliterating $tW[0] \n";
}
- }
+ }
}
}
close (MYFILE);
@@ -100,7 +100,7 @@ sub prepare_for_transliteration
foreach my $key ( keys %UNK )
{
$src=join(' ', split('',$key));
- print MYFILE "$src\n";
+ print MYFILE "$src\n";
}
close (MYFILE);
}
@@ -116,7 +116,7 @@ sub run_transliteration
my $eval_file = $list[3];
`touch $TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini`;
-
+
print "Filter Table\n";
`$MOSES_SRC/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 9 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -score-options '--KneserNey' -phrase-translation-table $TRANSLIT_MODEL/model/phrase-table -config $TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini -lm 0:3:$TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini:8`;
@@ -152,16 +152,16 @@ sub form_corpus
my $antLog = exp(0.2);
my $phraseTable = $list[2];
-
+
open MYFILE, "<:encoding(UTF-8)", $inp_file or die "Can't open $inp_file: $!\n";
open PT, ">:encoding(UTF-8)", $phraseTable or die "Can't open $phraseTable: $!\n";
- while (<MYFILE>)
+ while (<MYFILE>)
{
chomp;
#print "$_\n";
@words = split(/ /, "$_");
-
+
$thisStr = "";
foreach (@words)
{
@@ -169,14 +169,14 @@ sub form_corpus
}
push(@UNK, $thisStr);
- $vocab{$thisStr} = 1;
+ $vocab{$thisStr} = 1;
}
close (MYFILE);
open MYFILE, "<:encoding(UTF-8)", $testFile or die "Can't open $testFile: $!\n";
my $inpCount = 0;
- while (<MYFILE>)
+ while (<MYFILE>)
{
chomp;
#print "$_\n";
@@ -186,8 +186,8 @@ sub form_corpus
if ($prev != $sNum){
$inpCount++;
- }
-
+ }
+
my $i = 2;
$thisStr = "";
$features = "";
@@ -199,7 +199,7 @@ sub form_corpus
}
$i++;
-
+
while ($words[$i] ne "|||")
{
if ($words[$i] =~ /Penalty0/ || $words[$i] eq "Distortion0=" || $words[$i] eq "LM0=" ){
@@ -214,7 +214,7 @@ sub form_corpus
$i++;
#$features = $features . " " . $words[$i];
-
+
if ($thisStr ne ""){
print PT "$UNK[$inpCount] ||| $thisStr ||| $features ||| 0-0 ||| 0 0 0\n";
}
@@ -223,9 +223,9 @@ sub form_corpus
close (MYFILE);
close (PT);
-
+
`gzip $phraseTable`;
-
+
}
diff --git a/scripts/Transliteration/post-decoding-transliteration.pl b/scripts/Transliteration/post-decoding-transliteration.pl
index 201f40d97..60c3200f6 100755
--- a/scripts/Transliteration/post-decoding-transliteration.pl
+++ b/scripts/Transliteration/post-decoding-transliteration.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -34,10 +34,10 @@ die("ERROR: you need to define --moses-src-dir --external-bin-dir, --translitera
unless (defined($MOSES_SRC_DIR) &&
defined($TRANSLIT_MODEL) &&
defined($OOV_FILE) &&
- defined($INPUT_EXTENSION)&&
- defined($OUTPUT_EXTENSION)&&
+ defined($INPUT_EXTENSION)&&
+ defined($OUTPUT_EXTENSION)&&
defined($INPUT_FILE)&&
- defined($EXTERNAL_BIN_DIR)&&
+ defined($EXTERNAL_BIN_DIR)&&
defined($LM_FILE));
if (! -e $LM_FILE) {
my $LM_FILE_WORD = `ls $LM_FILE*word*`;
@@ -83,7 +83,7 @@ sub prepare_for_transliteration
open MYFILE, "<:encoding(UTF-8)", $testFile or die "Can't open $testFile: $!\n";
- while (<MYFILE>)
+ while (<MYFILE>)
{
chomp;
#print "$_\n";
@@ -91,12 +91,12 @@ sub prepare_for_transliteration
foreach (@words)
{
-
+
@tW = split /\Q$___FACTOR_DELIMITER/;
if (defined $tW[0])
{
-
+
if (! ($tW[0] =~ /[0-9.,]/))
{
$UNK{$tW[0]} = 1;
@@ -105,7 +105,7 @@ sub prepare_for_transliteration
{
print "Not transliterating $tW[0] \n";
}
- }
+ }
}
}
close (MYFILE);
@@ -115,7 +115,7 @@ sub prepare_for_transliteration
foreach my $key ( keys %UNK )
{
$src=join(' ', split('',$key));
- print MYFILE "$src\n";
+ print MYFILE "$src\n";
}
close (MYFILE);
}
@@ -131,7 +131,7 @@ sub run_transliteration
my $eval_file = $list[3];
`touch $TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini`;
-
+
print "Filter Table\n";
`$MOSES_SRC/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 9 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -score-options '--KneserNey' -phrase-translation-table $TRANSLIT_MODEL/model/phrase-table -config $TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini -lm 0:3:$TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini:8`;
@@ -171,16 +171,16 @@ sub form_corpus
my $antLog = exp(0.2);
my $phraseTable = $EVAL_DIR . "/Transliteration-Module/$OUTPUT_FILE_NAME/model/phrase-table";
-
+
open MYFILE, "<:encoding(UTF-8)", $inp_file or die "Can't open $inp_file: $!\n";
open PT, ">:encoding(UTF-8)", $phraseTable or die "Can't open $phraseTable: $!\n";
- while (<MYFILE>)
+ while (<MYFILE>)
{
chomp;
#print "$_\n";
@words = split(/ /, "$_");
-
+
$thisStr = "";
foreach (@words)
{
@@ -188,14 +188,14 @@ sub form_corpus
}
push(@UNK, $thisStr);
- $vocab{$thisStr} = 1;
+ $vocab{$thisStr} = 1;
}
close (MYFILE);
open MYFILE, "<:encoding(UTF-8)", $testFile or die "Can't open $testFile: $!\n";
my $inpCount = 0;
- while (<MYFILE>)
+ while (<MYFILE>)
{
chomp;
#print "$_\n";
@@ -205,8 +205,8 @@ sub form_corpus
if ($prev != $sNum){
$inpCount++;
- }
-
+ }
+
my $i = 2;
$thisStr = "";
$features = "";
@@ -218,7 +218,7 @@ sub form_corpus
}
$i++;
-
+
while ($words[$i] ne "|||")
{
if ($words[$i] =~ /Penalty0/ || $words[$i] eq "Distortion0=" || $words[$i] eq "LM0=" ){
@@ -233,39 +233,39 @@ sub form_corpus
$i++;
#$features = $features . " " . $words[$i];
-
+
if ($thisStr ne ""){
print PT "$UNK[$inpCount] ||| $thisStr ||| $features ||| 0-0 ||| 0 0 0\n";
}
$prev = $sNum;
}
close (MYFILE);
-
+
open MYFILE, "<:encoding(UTF-8)", $INPUT_FILE or die "Can't open $INPUT_FILE: $!\n";
-
+
my %dd;
- while (<MYFILE>)
+ while (<MYFILE>)
{
chomp;
@words = split(/ /, "$_");
-
+
foreach (@words)
{
if (! exists $vocab{$_} && ! exists $dd{$_}){
-
+
print PT "$_ ||| $_ ||| 1.0 1.0 1.0 1.0 ||| 0-0 ||| 0 0 0\n";
- $dd{$_} = 1;
+ $dd{$_} = 1;
}
}
- }
-
+ }
+
close (PT);
close (MYFILE);
-
+
`gzip $phraseTable`;
-
+
}
@@ -288,7 +288,7 @@ sub run_decoder
$find = ".output.";
}
$final_file =~ s/$find/$replace/g;
-
+
`mkdir $corpus_dir/evaluation`;
`$MOSES_SRC/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 9 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -lmodel-oov-feature "yes" -post-decoding-translit "yes" -phrase-translation-table $corpus_dir/model/phrase-table -config $corpus_dir/model/moses.ini -lm 0:5:$LM_FILE:8`;
@@ -302,10 +302,10 @@ sub run_decoder
`rm $corpus_dir/evaluation/$OUTPUT_FILE_NAME.moses.table.ini`;
`$MOSES_SRC/scripts/ems/support/substitute-filtered-tables.perl $corpus_dir/evaluation/filtered/moses.ini < $corpus_dir/model/moses.ini > $corpus_dir/evaluation/moses.filtered.ini`;
-
+
my $drop_stderr = $VERBOSE ? "" : " 2>/dev/null";
`$DECODER -search-algorithm 1 -cube-pruning-pop-limit 5000 -s 5000 -threads 16 -feature-overwrite 'TranslationModel0 table-limit=100' -max-trans-opt-per-coverage 100 -f $corpus_dir/evaluation/moses.filtered.ini -distortion-limit 0 < $INPUT_FILE > $OUTPUT_FILE $drop_stderr`;
-
+
print "$DECODER -search-algorithm 1 -cube-pruning-pop-limit 5000 -s 5000 -threads 16 -feature-overwrite 'TranslationModel0 table-limit=100' -max-trans-opt-per-coverage 100 -f $corpus_dir/evaluation/moses.filtered.ini -distortion-limit 0 < $INPUT_FILE > $OUTPUT_FILE $drop_stderr\n";
}
diff --git a/scripts/Transliteration/prepare-transliteration-phrase-table.pl b/scripts/Transliteration/prepare-transliteration-phrase-table.pl
index 4fc03b526..df3b1ceca 100755
--- a/scripts/Transliteration/prepare-transliteration-phrase-table.pl
+++ b/scripts/Transliteration/prepare-transliteration-phrase-table.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -29,7 +29,7 @@ die("ERROR: you need to define --moses-src-dir --external-bin-dir, --translitera
unless (defined($MOSES_SRC_DIR) &&
defined($TRANSLIT_MODEL) &&
defined($OOV_FILE) &&
- defined($INPUT_EXTENSION)&&
+ defined($INPUT_EXTENSION)&&
defined($OUTPUT_EXTENSION));
die("ERROR: could not find Transliteration Model '$TRANSLIT_MODEL'")
@@ -63,7 +63,7 @@ sub prepare_for_transliteration
my $src;
open MYFILE, "<:encoding(UTF-8)", $testFile or die "Can't open $testFile: $!\n";
- while (<MYFILE>)
+ while (<MYFILE>)
{
chomp;
#print "$_\n";
@@ -81,7 +81,7 @@ sub prepare_for_transliteration
foreach my $key ( keys %UNK )
{
$src=join(' ', split('',$key));
- print MYFILE "$src\n";
+ print MYFILE "$src\n";
}
close (MYFILE);
}
@@ -97,11 +97,11 @@ sub run_transliteration
my $eval_file = $list[3];
`touch $eval_file.moses.table.ini`;
-
+
print STDERR "Filter Table\n";
`$MOSES_SRC/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 9 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -reordering msd-bidirectional-fe -score-options '--KneserNey' -phrase-translation-table $TRANSLIT_MODEL/model/phrase-table -reordering-table $TRANSLIT_MODEL/model/reordering-table -config $eval_file.moses.table.ini -lm 0:3:$eval_file.moses.table.ini:8`;
-
+
`$MOSES_SRC/scripts/training/filter-model-given-input.pl $eval_file.filtered $eval_file.moses.table.ini $eval_file -Binarizer "$MOSES_SRC/bin/CreateOnDiskPt 1 1 4 100 2"`;
`rm $eval_file.moses.table.ini`;
@@ -131,18 +131,18 @@ sub form_corpus
my $UNK_FILE_NAME = basename($OOV_FILE);
my $target = $EVAL_DIR . "/$UNK_FILE_NAME/training/corpus.$OUTPUT_EXTENSION";
my $outFile = "$EVAL_DIR/out.txt";
-
+
open MYFILE, "<:encoding(UTF-8)", $testFile or die "Can't open $testFile: $!\n";
open OUTFILE, ">:encoding(UTF-8)", $outFile or die "Can't open $outFile: $!\n";
- while (<MYFILE>)
+ while (<MYFILE>)
{
chomp;
#print "$_\n";
@words = split(/ /, "$_");
-
-
+
+
my $i = 2;
my $prob;
@@ -158,12 +158,12 @@ sub form_corpus
while ($words[$i] ne "|||")
{
- $i++;
+ $i++;
}
-
+
$i++;
$prob = $words[$i];
-
+
print OUTFILE "$thisStr\t$prob\n";
}
close (MYFILE);
diff --git a/scripts/Transliteration/threshold.pl b/scripts/Transliteration/threshold.pl
index 8e3704fd6..bf6657742 100755
--- a/scripts/Transliteration/threshold.pl
+++ b/scripts/Transliteration/threshold.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use utf8;
diff --git a/scripts/Transliteration/train-transliteration-module.pl b/scripts/Transliteration/train-transliteration-module.pl
index 05804afb6..35e4ee396 100755
--- a/scripts/Transliteration/train-transliteration-module.pl
+++ b/scripts/Transliteration/train-transliteration-module.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use utf8;
@@ -42,9 +42,9 @@ die("ERROR: you need to define --corpus-e, --corpus-f, --alignment, --srilm-dir,
defined($CORPUS_F) &&
defined($CORPUS_E) &&
defined($ALIGNMENT)&&
- defined($INPUT_EXTENSION)&&
- defined($OUTPUT_EXTENSION)&&
- defined($EXTERNAL_BIN_DIR)&&
+ defined($INPUT_EXTENSION)&&
+ defined($OUTPUT_EXTENSION)&&
+ defined($EXTERNAL_BIN_DIR)&&
defined($SRILM_DIR));
die("ERROR: could not find input corpus file '$CORPUS_F'")
unless -e $CORPUS_F;
@@ -70,13 +70,13 @@ if (defined($TARGET_SYNTAX)) {
# create factors
if (defined($FACTOR)) {
-
+
my @factor_values = split(',', $FACTOR);
-
+
foreach my $factor_val (@factor_values) {
my ($factor_f,$factor_e) = split(/\-/,$factor_val);
-
+
$stripped_corpus_f =~ /^(.+)\.([^\.]+)/;
my ($corpus_stem_f,$ext_f) = ($1,$2);
$stripped_corpus_e =~ /^(.+)\.([^\.]+)/;
@@ -86,19 +86,19 @@ if (defined($FACTOR)) {
`ln -s $corpus_stem_f.$factor_val.$ext_f $OUT_DIR/f`;
`ln -s $corpus_stem_e.$factor_val.$ext_e $OUT_DIR/e`;
- `ln -s $ALIGNMENT $OUT_DIR/a`;
+ `ln -s $ALIGNMENT $OUT_DIR/a`;
+
-
}
}
else {
`ln -s $stripped_corpus_f $OUT_DIR/f`;
`ln -s $stripped_corpus_e $OUT_DIR/e`;
- `ln -s $ALIGNMENT $OUT_DIR/a`;
+ `ln -s $ALIGNMENT $OUT_DIR/a`;
}
- mine_transliterations($INPUT_EXTENSION, $OUTPUT_EXTENSION);
+ mine_transliterations($INPUT_EXTENSION, $OUTPUT_EXTENSION);
train_transliteration_module();
retrain_transliteration_module();
@@ -114,7 +114,7 @@ sub learn_transliteration_model{
`cp $OUT_DIR/training/corpus$t.$OUTPUT_EXTENSION $OUT_DIR/lm/target`;
print "Align Corpus\n";
-
+
`$MOSES_SRC_DIR/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -last-step 1 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -score-options '--KneserNey' -corpus $OUT_DIR/training/corpus$t -corpus-dir $OUT_DIR/training/prepared`;
`$MOSES_SRC_DIR/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 2 -last-step 2 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -score-options '--KneserNey' -corpus-dir $OUT_DIR/training/prepared -giza-e2f $OUT_DIR/training/giza -direction 2`;
@@ -124,23 +124,23 @@ sub learn_transliteration_model{
`$MOSES_SRC_DIR/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 3 -last-step 3 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -score-options '--KneserNey' -giza-e2f $OUT_DIR/training/giza -giza-f2e $OUT_DIR/training/giza-inverse -alignment-file $OUT_DIR/model/aligned -alignment-stem $OUT_DIR/model/aligned -alignment grow-diag-final-and`;
print "Train Translation Models\n";
-
+
`$MOSES_SRC_DIR/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 4 -last-step 4 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -score-options '--KneserNey' -lexical-file $OUT_DIR/model/lex -alignment-file $OUT_DIR/model/aligned -alignment-stem $OUT_DIR/model/aligned -corpus $OUT_DIR/training/corpus$t`;
`$MOSES_SRC_DIR/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 5 -last-step 5 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -score-options '--KneserNey' -alignment-file $OUT_DIR/model/aligned -alignment-stem $OUT_DIR/model/aligned -extract-file $OUT_DIR/model/extract -corpus $OUT_DIR/training/corpus$t`;
`$MOSES_SRC_DIR/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 6 -last-step 6 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -score-options '--KneserNey' -extract-file $OUT_DIR/model/extract -lexical-file $OUT_DIR/model/lex -phrase-translation-table $OUT_DIR/model/phrase-table`;
-
+
print "Train Language Models\n";
`$SRILM_DIR/ngram-count -order 5 -interpolate -kndiscount -addsmooth1 0.0 -unk -text $OUT_DIR/lm/target -lm $OUT_DIR/lm/targetLM`;
`$MOSES_SRC_DIR/bin/build_binary $OUT_DIR/lm/targetLM $OUT_DIR/lm/targetLM.bin`;
- print "Create Config File\n";
+ print "Create Config File\n";
`$MOSES_SRC_DIR/scripts/training/train-model.perl -mgiza -mgiza-cpus 10 -dont-zip -first-step 9 -external-bin-dir $EXTERNAL_BIN_DIR -f $INPUT_EXTENSION -e $OUTPUT_EXTENSION -alignment grow-diag-final-and -parts 5 -score-options '--KneserNey' -phrase-translation-table $OUT_DIR/model/phrase-table -config $OUT_DIR/model/moses.ini -lm 0:5:$OUT_DIR/lm/targetLM.bin:8`;
-
+
}
sub retrain_transliteration_module{
@@ -149,12 +149,12 @@ sub retrain_transliteration_module{
{
`rm -r $OUT_DIR/model`;
`rm -r $OUT_DIR/lm`;
- `rm -r $OUT_DIR/training/giza`;
- `rm -r $OUT_DIR/training/giza-inverse`;
- `rm -r $OUT_DIR/training/prepared`;
+ `rm -r $OUT_DIR/training/giza`;
+ `rm -r $OUT_DIR/training/giza-inverse`;
+ `rm -r $OUT_DIR/training/prepared`;
`mkdir $OUT_DIR/model`;
`mkdir $OUT_DIR/lm`;
-
+
learn_transliteration_model("");
}
}
@@ -163,19 +163,19 @@ sub train_transliteration_module{
`mkdir $OUT_DIR/model`;
`mkdir $OUT_DIR/lm`;
- print "Preparing Corpus\n";
+ print "Preparing Corpus\n";
`$MOSES_SRC_DIR/scripts/Transliteration/corpusCreator.pl $OUT_DIR 1-1.$INPUT_EXTENSION-$OUTPUT_EXTENSION.mined-pairs $INPUT_EXTENSION $OUTPUT_EXTENSION`;
if (-e "$OUT_DIR/training/corpusA.$OUTPUT_EXTENSION")
- {
+ {
learn_transliteration_model("A");
}
else
{
learn_transliteration_model("");
}
-
- print "Running Tuning for Transliteration Module\n";
+
+ print "Running Tuning for Transliteration Module\n";
`touch $OUT_DIR/tuning/moses.table.ini`;
@@ -215,7 +215,7 @@ print "Cleaning the list for Miner\n";
`$MOSES_SRC_DIR/scripts/Transliteration/clean.pl $OUT_DIR/1-1.$inp_ext-$op_ext > $OUT_DIR/1-1.$inp_ext-$op_ext.cleaned`;
- if (-e "$OUT_DIR/1-1.$inp_ext-$op_ext.pair-probs")
+ if (-e "$OUT_DIR/1-1.$inp_ext-$op_ext.pair-probs")
{
print STDERR "1-1.$inp_ext-$op_ext.pair-probs in place, reusing\n";
}
@@ -296,7 +296,7 @@ sub reduce_factors {
die "ERROR: Couldn't find factor $outfactor in token \"$_\" in $full LINE $nr" if !defined $out;
print OUT $out;
}
- }
+ }
print OUT "\n";
}
print STDERR "\n";
diff --git a/scripts/analysis/bootstrap-hypothesis-difference-significance.pl b/scripts/analysis/bootstrap-hypothesis-difference-significance.pl
index 149676b6f..8e6a6255a 100755
--- a/scripts/analysis/bootstrap-hypothesis-difference-significance.pl
+++ b/scripts/analysis/bootstrap-hypothesis-difference-significance.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use utf8;
###############################################
@@ -8,7 +8,7 @@ use utf8;
# Usage: ./compare-hypotheses-with-significance.pl hypothesis_1 hypothesis_2 reference_1 [ reference_2 ... ]
#
# Author: Mark Fishel, fishel@ut.ee
-#
+#
# 22.10.2008: altered algorithm according to (Riezler and Maxwell 2005 @ MTSE'05), now computes p-value
#
# 23.01.2010: added NIST p-value and interval computation
@@ -30,7 +30,7 @@ if (@ARGV < 3) {
unless ($ARGV[0] =~ /^(--help|-help|-h|-\?|\/\?|--usage|-usage)$/) {
die("\nERROR: not enough arguments");
}
-
+
exit 1;
}
@@ -60,7 +60,7 @@ bootstrap_report("NIST", \&getNist);
sub bootstrap_report {
my $title = shift;
my $proc = shift;
-
+
my ($subSampleScoreDiffArr, $subSampleScore1Arr, $subSampleScore2Arr) = bootstrap_pass($proc);
my $realScore1 = &$proc($data->{refs}, $data->{hyp1});
@@ -86,7 +86,7 @@ sub bootstrap_report {
#####
sub bootstrap_pass {
my $scoreFunc = shift;
-
+
my @subSampleDiffArr;
my @subSample1Arr;
my @subSample2Arr;
@@ -94,14 +94,14 @@ sub bootstrap_pass {
#applying sampling
for my $idx (1..$TIMES_TO_REPEAT_SUBSAMPLING) {
my $subSampleIndices = drawWithReplacement($data->{size}, ($SUBSAMPLE_SIZE? $SUBSAMPLE_SIZE: $data->{size}));
-
+
my $score1 = &$scoreFunc($data->{refs}, $data->{hyp1}, $subSampleIndices);
my $score2 = &$scoreFunc($data->{refs}, $data->{hyp2}, $subSampleIndices);
-
+
push @subSampleDiffArr, abs($score2 - $score1);
push @subSample1Arr, $score1;
push @subSample2Arr, $score2;
-
+
if ($idx % 10 == 0) {
print STDERR ".";
}
@@ -109,11 +109,11 @@ sub bootstrap_pass {
print STDERR "$idx\n";
}
}
-
+
if ($TIMES_TO_REPEAT_SUBSAMPLING % 100 != 0) {
print STDERR ".$TIMES_TO_REPEAT_SUBSAMPLING\n";
}
-
+
return (\@subSampleDiffArr, \@subSample1Arr, \@subSample2Arr);
}
@@ -124,9 +124,9 @@ sub bootstrap_pvalue {
my $subSampleDiffArr = shift;
my $realScore1 = shift;
my $realScore2 = shift;
-
+
my $realDiff = abs($realScore2 - $realScore1);
-
+
#get subsample difference mean
my $averageSubSampleDiff = 0;
@@ -155,16 +155,16 @@ sub bootstrap_pvalue {
#####
sub bootstrap_interval {
my $subSampleArr = shift;
-
+
my @sorted = sort @$subSampleArr;
-
+
my $lowerIdx = int($TIMES_TO_REPEAT_SUBSAMPLING / 40);
my $higherIdx = $TIMES_TO_REPEAT_SUBSAMPLING - $lowerIdx - 1;
-
+
my $lower = $sorted[$lowerIdx];
my $higher = $sorted[$higherIdx];
my $diff = $higher - $lower;
-
+
return ($lower + 0.5 * $diff, 0.5 * $diff);
}
@@ -173,7 +173,7 @@ sub bootstrap_interval {
#####
sub readAllData {
my ($hypFile1, $hypFile2, @refFiles) = @_;
-
+
my %result;
#reading hypotheses and checking for matching sizes
@@ -193,16 +193,16 @@ sub readAllData {
for my $refFile (@refFiles) {
$i++;
my $refDataX = readData($refFile);
-
+
unless (scalar @$refDataX == $result{size}) {
die ("ERROR: ref set $i size doesn't match the size of hyp sets");
}
-
+
updateCounts($result{ngramCounts}, $refDataX);
-
+
push @{$result{refs}}, $refDataX;
}
-
+
return \%result;
}
@@ -211,17 +211,17 @@ sub readAllData {
#####
sub updateCounts {
my ($countHash, $refData) = @_;
-
+
for my $snt(@$refData) {
my $size = scalar @{$snt->{words}};
$countHash->{""} += $size;
-
+
for my $order(1..$MAX_NGRAMS) {
my $ngram;
-
+
for my $i (0..($size-$order)) {
$ngram = join(" ", @{$snt->{words}}[$i..($i + $order - 1)]);
-
+
$countHash->{$ngram}++;
}
}
@@ -233,11 +233,11 @@ sub updateCounts {
#####
sub ngramInfo {
my ($data, $ngram) = @_;
-
+
my @nwords = split(/ /, $ngram);
pop @nwords;
my $smallGram = join(" ", @nwords);
-
+
return log($data->{ngramCounts}->{$smallGram} / $data->{ngramCounts}->{$ngram}) / log(2.0);
}
@@ -247,16 +247,16 @@ sub ngramInfo {
sub readData {
my $file = shift;
my @result;
-
+
open (FILE, $file) or die ("Failed to open `$file' for reading");
binmode (FILE, ":$IO_ENCODING");
-
+
while (<FILE>) {
push @result, { words => [split(/\s+/, $_)] };
}
-
+
close (FILE);
-
+
return \@result;
}
@@ -266,7 +266,7 @@ sub readData {
sub preEvalHypo {
my $data = shift;
my $hypId = shift;
-
+
for my $lineIdx (0..($data->{size} - 1)) {
preEvalHypoSnt($data, $hypId, $lineIdx);
}
@@ -277,50 +277,50 @@ sub preEvalHypo {
#####
sub preEvalHypoSnt {
my ($data, $hypId, $lineIdx) = @_;
-
+
my ($correctNgramCounts, $totalNgramCounts);
my ($refNgramCounts, $hypNgramCounts);
my ($coocNgramInfoSum, $totalNgramAmt);
-
+
my $hypSnt = $data->{$hypId}->[$lineIdx];
-
+
#update total hyp len
$hypSnt->{hyplen} = scalar @{$hypSnt->{words}};
-
+
#update total ref len with closest current ref len
$hypSnt->{reflen} = getClosestLength($data->{refs}, $lineIdx, $hypSnt->{hyplen});
$hypSnt->{avgreflen} = getAvgLength($data->{refs}, $lineIdx);
-
+
$hypSnt->{correctNgrams} = [];
$hypSnt->{totalNgrams} = [];
-
+
#update ngram precision for each n-gram order
for my $order (1..$MAX_NGRAMS) {
#hyp ngrams
$hypNgramCounts = groupNgrams($hypSnt, $order);
-
+
#ref ngrams
$refNgramCounts = groupNgramsMultiSrc($data->{refs}, $lineIdx, $order);
-
+
$correctNgramCounts = 0;
$totalNgramCounts = 0;
$coocNgramInfoSum = 0;
$totalNgramAmt = 0;
my $coocUpd;
-
+
#correct, total
for my $ngram (keys %$hypNgramCounts) {
$coocUpd = min($hypNgramCounts->{$ngram}, $refNgramCounts->{$ngram});
$correctNgramCounts += $coocUpd;
$totalNgramCounts += $hypNgramCounts->{$ngram};
-
+
if ($coocUpd > 0) {
$coocNgramInfoSum += ngramInfo($data, $ngram);
}
-
+
$totalNgramAmt++;
}
-
+
$hypSnt->{correctNgrams}->[$order] = $correctNgramCounts;
$hypSnt->{totalNgrams}->[$order] = $totalNgramCounts;
$hypSnt->{ngramNistInfoSum}->[$order] = $coocNgramInfoSum;
@@ -333,13 +333,13 @@ sub preEvalHypoSnt {
#####
sub drawWithReplacement {
my ($setSize, $subSize) = @_;
-
+
my @result;
-
+
for (1..$subSize) {
push @result, int(rand($setSize));
}
-
+
return \@result;
}
@@ -348,48 +348,48 @@ sub drawWithReplacement {
#####
sub getNist {
my ($refs, $hyp, $idxs) = @_;
-
+
#default value for $idxs
unless (defined($idxs)) {
$idxs = [0..((scalar @$hyp) - 1)];
}
-
+
#vars
my ($hypothesisLength, $referenceLength) = (0, 0);
my (@infosum, @totalamt);
-
+
#gather info from each line
for my $lineIdx (@$idxs) {
-
+
my $hypSnt = $hyp->[$lineIdx];
-
+
#update total hyp len
$hypothesisLength += $hypSnt->{hyplen};
-
+
#update total ref len with closest current ref len
$referenceLength += $hypSnt->{avgreflen};
-
+
#update ngram precision for each n-gram order
for my $order (1..$MAX_NGRAMS) {
$infosum[$order] += $hypSnt->{ngramNistInfoSum}->[$order];
$totalamt[$order] += $hypSnt->{ngramNistCount}->[$order];
}
}
-
+
my $toplog = log($hypothesisLength / $referenceLength);
my $btmlog = log(2.0/3.0);
-
+
#compose nist score
my $brevityPenalty = ($hypothesisLength > $referenceLength)? 1.0: exp(log(0.5) * $toplog * $toplog / ($btmlog * $btmlog));
-
+
my $sum = 0;
-
+
for my $order (1..$MAX_NGRAMS) {
$sum += $infosum[$order]/$totalamt[$order];
}
-
+
my $result = $sum * $brevityPenalty;
-
+
return $result;
}
@@ -400,43 +400,43 @@ sub getNist {
#####
sub getBleu {
my ($refs, $hyp, $idxs) = @_;
-
+
#default value for $idxs
unless (defined($idxs)) {
$idxs = [0..((scalar @$hyp) - 1)];
}
-
+
#vars
my ($hypothesisLength, $referenceLength) = (0, 0);
my (@correctNgramCounts, @totalNgramCounts);
my ($refNgramCounts, $hypNgramCounts);
-
+
#gather info from each line
for my $lineIdx (@$idxs) {
my $hypSnt = $hyp->[$lineIdx];
-
+
#update total hyp len
$hypothesisLength += $hypSnt->{hyplen};
-
+
#update total ref len with closest current ref len
$referenceLength += $hypSnt->{reflen};
-
+
#update ngram precision for each n-gram order
for my $order (1..$MAX_NGRAMS) {
$correctNgramCounts[$order] += $hypSnt->{correctNgrams}->[$order];
$totalNgramCounts[$order] += $hypSnt->{totalNgrams}->[$order];
}
}
-
+
#compose bleu score
my $brevityPenalty = ($hypothesisLength < $referenceLength)? exp(1 - $referenceLength/$hypothesisLength): 1;
-
+
my $logsum = 0;
-
+
for my $order (1..$MAX_NGRAMS) {
$logsum += safeLog($correctNgramCounts[$order] / $totalNgramCounts[$order]);
}
-
+
return $brevityPenalty * exp($logsum / $MAX_NGRAMS);
}
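`getBleu` composes the score the standard way: geometric mean of per-order clipped n-gram precisions, times a brevity penalty that only bites when the hypothesis is shorter than the reference. A Python sketch of that composition, assuming all counts are positive (the Perl code routes zero precisions through `safeLog` instead; the function name here is illustrative):

```python
import math

def bleu_from_counts(correct, total, hyp_len, ref_len):
    """Compose BLEU from per-order clipped counts, as getBleu does:
    brevity penalty times the geometric mean of n-gram precisions."""
    bp = math.exp(1 - ref_len / hyp_len) if hyp_len < ref_len else 1.0
    log_sum = sum(math.log(c / t) for c, t in zip(correct, total))
    return bp * math.exp(log_sum / len(correct))
```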
@@ -445,15 +445,15 @@ sub getBleu {
#####
sub getAvgLength {
my ($refs, $lineIdx) = @_;
-
+
my $result = 0;
my $count = 0;
-
+
for my $ref (@$refs) {
$result += scalar @{$ref->[$lineIdx]->{words}};
$count++;
}
-
+
return $result / $count;
}
@@ -462,22 +462,22 @@ sub getAvgLength {
#####
sub getClosestLength {
my ($refs, $lineIdx, $hypothesisLength) = @_;
-
+
my $bestDiff = infty();
my $bestLen = infty();
-
+
my ($currLen, $currDiff);
-
+
for my $ref (@$refs) {
$currLen = scalar @{$ref->[$lineIdx]->{words}};
$currDiff = abs($currLen - $hypothesisLength);
-
+
if ($currDiff < $bestDiff or ($currDiff == $bestDiff and $currLen < $bestLen)) {
$bestDiff = $currDiff;
$bestLen = $currLen;
}
}
-
+
return $bestLen;
}
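`getClosestLength` implements the multi-reference length rule: pick the reference length closest to the hypothesis length, and on a tie prefer the shorter one. A compact Python equivalent (illustrative name, operating on a plain list of lengths rather than the script's reference structures):

```python
def closest_length(ref_lengths, hyp_len):
    """Reference length closest to hyp_len; ties go to the shorter
    reference, mirroring getClosestLength's comparison."""
    best_diff = best_len = float("inf")
    for cur in ref_lengths:
        diff = abs(cur - hyp_len)
        if diff < best_diff or (diff == best_diff and cur < best_len):
            best_diff, best_len = diff, cur
    return best_len
```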
@@ -487,16 +487,16 @@ sub getClosestLength {
sub groupNgrams {
my ($snt, $order) = @_;
my %result;
-
+
my $size = scalar @{$snt->{words}};
my $ngram;
-
+
for my $i (0..($size-$order)) {
$ngram = join(" ", @{$snt->{words}}[$i..($i + $order - 1)]);
-
+
$result{$ngram}++;
}
-
+
return \%result;
}
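`groupNgrams` slides a window of the given order over the sentence and tallies each n-gram, keyed by its space-joined words. The same in Python (illustrative name):

```python
from collections import Counter

def group_ngrams(words, order):
    """Count the n-grams of a given order in a token list, keyed by the
    space-joined n-gram, as groupNgrams does."""
    return Counter(
        " ".join(words[i:i + order]) for i in range(len(words) - order + 1)
    )
```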
@@ -506,15 +506,15 @@ sub groupNgrams {
sub groupNgramsMultiSrc {
my ($refs, $lineIdx, $order) = @_;
my %result;
-
+
for my $ref (@$refs) {
my $currNgramCounts = groupNgrams($ref->[$lineIdx], $order);
-
+
for my $currNgram (keys %$currNgramCounts) {
$result{$currNgram} = max($result{$currNgram}, $currNgramCounts->{$currNgram});
}
}
-
+
return \%result;
}
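`groupNgramsMultiSrc` merges the per-reference counts by taking, for every n-gram, the maximum count over all references; these are the clipped counts that cap how much credit a hypothesis n-gram can earn. A self-contained Python sketch (illustrative name, references as plain token lists):

```python
from collections import Counter

def group_ngrams_multi_ref(refs, order):
    """Per-n-gram maximum count across references, as in
    groupNgramsMultiSrc; used as the clipping bound for BLEU credit."""
    result = Counter()
    for ref in refs:
        counts = Counter(
            " ".join(ref[i:i + order]) for i in range(len(ref) - order + 1)
        )
        for ngram, count in counts.items():
            result[ngram] = max(result[ngram], count)
    return result
```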
@@ -523,7 +523,7 @@ sub groupNgramsMultiSrc {
#####
sub safeLog {
my $x = shift;
-
+
return ($x > 0)? log($x): -infty();
}
@@ -539,7 +539,7 @@ sub infty {
#####
sub min {
my ($a, $b) = @_;
-
+
return ($a < $b)? $a: $b;
}
@@ -548,12 +548,12 @@ sub min {
#####
sub max {
my ($a, $b) = @_;
-
+
return ($a > $b)? $a: $b;
}
sub poww {
my ($a, $b) = @_;
-
+
return exp($b * log($a));
}
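Two of the small helpers above are worth spelling out: `safeLog` maps non-positive input to negative infinity so a zero n-gram precision cannot crash the score, and `poww` computes `a ** b` as `exp(b * log a)` (valid only for `a > 0`). Python equivalents:

```python
import math

def safe_log(x):
    """log(x) for x > 0, -inf otherwise -- mirrors safeLog, so a zero
    precision contributes -inf instead of raising an error."""
    return math.log(x) if x > 0 else float("-inf")

def poww(a, b):
    """a ** b via exp(b * log a), as the Perl poww helper does;
    only defined for a > 0."""
    return math.exp(b * math.log(a))
```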
diff --git a/scripts/analysis/nontranslated_words.pl b/scripts/analysis/nontranslated_words.pl
index b5639429b..51a4f9d20 100755
--- a/scripts/analysis/nontranslated_words.pl
+++ b/scripts/analysis/nontranslated_words.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
# Reads a source and hypothesis file and counts equal tokens. Some of these
diff --git a/scripts/analysis/oov.pl b/scripts/analysis/oov.pl
index c5d6f92e3..052c9994d 100755
--- a/scripts/analysis/oov.pl
+++ b/scripts/analysis/oov.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# Display OOV rate of a test set against a training corpus or a phrase table.
# Ondrej Bojar
diff --git a/scripts/analysis/perllib/Error.pm b/scripts/analysis/perllib/Error.pm
index cc9edbb69..e62044eff 100644
--- a/scripts/analysis/perllib/Error.pm
+++ b/scripts/analysis/perllib/Error.pm
@@ -15,7 +15,7 @@ use strict;
use vars qw($VERSION);
use 5.004;
-$VERSION = "0.15";
+$VERSION = "0.15";
use overload (
'""' => 'stringify',
@@ -146,7 +146,7 @@ sub throw {
# if we are not rethrow-ing then create the object to throw
$self = $self->new(@_) unless ref($self);
-
+
die $Error::THROWN = $self;
}
@@ -429,7 +429,7 @@ sub except (&;$) {
my $code = shift;
my $clauses = shift || {};
my $catch = $clauses->{'catch'} ||= [];
-
+
my $sub = sub {
my $ref;
my(@array) = $code->($_[0]);
@@ -481,7 +481,7 @@ Error - Error/exception handling in an OO-ish way
record Error::Simple("A simple error")
and return;
}
-
+
unlink($file) or throw Error::Simple("$file: $!",$!);
try {
diff --git a/scripts/analysis/sentence-by-sentence.pl b/scripts/analysis/sentence-by-sentence.pl
index 4f6560a56..72b70dc72 100755
--- a/scripts/analysis/sentence-by-sentence.pl
+++ b/scripts/analysis/sentence-by-sentence.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
#sentence-by-sentence: take in a system output, with any number of factors, and a reference translation, also maybe with factors, and show each sentence and its errors
@@ -69,7 +69,7 @@ while(readLines(\@SYSOUTS, \@sLines) && readLines(\@TRUTHS, \@eLines))
my $sourceLine = <SOURCE>;
escapeMetachars($sourceLine); #remove inconsistencies in encoding
$sourceFactors = extractFactorArrays($sourceLine);
- push @html, "<tr><td class=\"sent_title\">Source</td><td class=\"source_sentence\" id=\"source$numSentences\">"
+ push @html, "<tr><td class=\"sent_title\">Source</td><td class=\"source_sentence\" id=\"source$numSentences\">"
. getFactoredSentenceHTML($sourceFactors) . "</td></tr>\n";
}
#process truth
@@ -77,7 +77,7 @@ while(readLines(\@SYSOUTS, \@sLines) && readLines(\@TRUTHS, \@eLines))
{
escapeMetachars($eLines[$j]); #remove inconsistencies in encoding
push @eFactors, extractFactorArrays($eLines[$j]);
- push @html, "<tr><td class=\"sent_title\">Ref $j</td><td class=\"truth_sentence\" id=\"truth${numSentences}_$j\">"
+ push @html, "<tr><td class=\"sent_title\">Ref $j</td><td class=\"truth_sentence\" id=\"truth${numSentences}_$j\">"
. getFactoredSentenceHTML($eFactors[$j]) . "</td></tr>\n";
}
#process sysouts
@@ -89,12 +89,12 @@ while(readLines(\@SYSOUTS, \@sLines) && readLines(\@TRUTHS, \@eLines))
push @bleuData, getBLEUSentenceDetails($sFactors[$j], \@eFactors, 0);
push @{$bleuScores[$j]}, [$numSentences, $bleuData[$j]->[0], 0]; #the last number will be the rank
my $pwerData = getPWERSentenceDetails($sFactors[$j], \@eFactors, 0);
- push @html, "<tr><td class=\"sent_title\">Output $j</td><td class=\"sysout_sentence\" id=\"sysout$numSentences\">"
+ push @html, "<tr><td class=\"sent_title\">Output $j</td><td class=\"sysout_sentence\" id=\"sysout$numSentences\">"
. getFactoredSentenceHTML($sFactors[$j], $pwerData) . "</td></tr>\n";
- push @html, "<tr><td class=\"sent_title\">N-grams</td><td class=\"sysout_ngrams\" id=\"ngrams$numSentences\">"
+ push @html, "<tr><td class=\"sent_title\">N-grams</td><td class=\"sysout_ngrams\" id=\"ngrams$numSentences\">"
. getAllNgramsHTML($sFactors[$j], $bleuData[$j]->[1], scalar(@truthfiles)) . "</td></tr>\n";
}
- splice(@html, 1, 0, "<div class=\"bleu_report\"><b>Sentence $numSentences)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;BLEU:</b> "
+ splice(@html, 1, 0, "<div class=\"bleu_report\"><b>Sentence $numSentences)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;BLEU:</b> "
. join("; ", map {sprintf("%.4lg", $_->[0]->[0]) . " (" . join('/', map {sprintf("%.4lg", $_)} @{$_->[0]}[1 .. 4]) . ") "} @bleuData) . "</div><table>\n");
push @html, "</table></div>\n";
push @htmlSentences, join('', @html);
@@ -142,7 +142,7 @@ function selectSysout(index)
var spans = cell.getElementsByTagName('span');
cell.childNodes[0].nodeValue = spans[index].firstChild.nodeValue; //something like '0.1 - 0.3'
}
-
+
//update the background colors of the sentence divs
var allSentences = document.getElementById('all_sentences');
var sentences = allSentences.childNodes;
@@ -207,7 +207,7 @@ for(my $i = 0; $i < scalar(@sysoutfiles); $i++)
print "<table border=0><tr><td><div id=\"legendBLEU\" class=\"legend\"><span class=\"legend_title\">Sentence Background Colors => BLEU Ranges</span><table border=0>";
for(my $k = 0; $k < scalar(@htmlColors); $k++)
{
- print "<tr><td style=\"width: 15px; height: 15px; background: " . $htmlColors[$k] . "\"></td><td align=left style=\"padding-left: 12px\">"
+ print "<tr><td style=\"width: 15px; height: 15px; background: " . $htmlColors[$k] . "\"></td><td align=left style=\"padding-left: 12px\">"
. sprintf("%.4lg", $minBLEU[0]->[$k]) . " - " . sprintf("%.4lg", $maxBLEU[0]->[$k]);
for(my $j = 0; $j < scalar(@sysoutfiles); $j++)
{
@@ -223,7 +223,7 @@ for(my $k = 1; $k <= scalar(@truthfiles); $k++)
}
print "</table></div></td></tr></table><div style=\"font-weight: bold; margin-bottom: 15px\">
PWER errors are marked in red on output sentence displays.</div>
-<div style=\"margin-bottom: 8px\">Color by system # "
+<div style=\"margin-bottom: 8px\">Color by system # "
. join(' | ', map {"<a href=\"javascript:selectSysout($_);\">$_</a>" . (($_ == '0') ? " (default)" : "")} (0 .. scalar(@sysoutfiles) - 1)) . "</div>
<div style=\"margin-bottom: 8px\">Sort by <a href=\"javascript:sortByBLEU();\">BLEU score</a> | <a href=\"javascript:sortByCorpusOrder();\">corpus order</a> (default)</div>\n";
@@ -472,7 +472,7 @@ sub getSentenceBGColorHTML
#display all matching n-grams in the given sentence, with all 1-grams on one line, then arranged by picking, for each, the first line on which it fits,
# where a given word position can only be filled by one n-gram per line, so that all n-grams can be shown
-#arguments: sentence (arrayref of arrayrefs of factor strings), arrayref of arrayrefs of matching n-gram [start, length, arrayref of matching reference indices],
+#arguments: sentence (arrayref of arrayrefs of factor strings), arrayref of arrayrefs of matching n-gram [start, length, arrayref of matching reference indices],
# number of reference translations
#return: HTML string
sub getAllNgramsHTML
@@ -507,9 +507,9 @@ sub getAllNgramsHTML
}
$n++;
}
-
+
my $html = "<table class=\"ngram_table\"><tr><td align=center>" . join("</td><td align=center>", map {$_->[$factorIndex]} @$sentence) . "</td></tr>";
-
+
my $numWords = scalar(@$sentence);
my ($curRow, $curCol) = (0, 0); #address in table
$html .= "<tr>";
diff --git a/scripts/analysis/sg2dot.perl b/scripts/analysis/sg2dot.perl
index b17dfd9fb..e9c1639ed 100755
--- a/scripts/analysis/sg2dot.perl
+++ b/scripts/analysis/sg2dot.perl
@@ -1,5 +1,5 @@
-#!/usr/bin/env perl
-#
+#!/usr/bin/env perl
+#
# Author : Loic BARRAULT
# Script to convert MOSES searchgraph to DOT format
#
@@ -24,11 +24,11 @@ print STDOUT "digraph searchgraph\n{\nrankdir=LR\n";
my($line, $cpt, $from, $to, $label, $recombined, $transition, $o, $stack, $state);
$cpt = 0;
-
+
$line=<>; #skip first line (the empty hypothesis, no arc in fact)
my $nr = 0;
-while(($line=<>) )
+while(($line=<>) )
{
$nr++;
$from = "";
@@ -37,23 +37,23 @@ while(($line=<>) )
$recombined = "";
chomp($line);
#print STDERR "$line\n";
-
+
#Three kinds of lines in searchgraph
#0 hyp=0 stack=0 forward=1 fscore=-205.192
#0 hyp=5 stack=1 back=0 score=-0.53862 transition=-0.53862 forward=181 fscore=-205.36 covered=0-0 out=I am , pC=-0.401291, c=-0.98555
#256 hyp=6566 stack=2 back=23 score=-2.15644 transition=-0.921959 recombined=6302 forward=15519 fscore=-112.807 covered=2-2 out=countries , , pC=-0.640574, c=-1.07215
if($line =~ /hyp=(\d+).+stack=(\d+).+back=(\d+).+transition=([^ ]*).+recombined=(\d+).+out=(.*)(, pC|$)/)
- {
+ {
#print STDERR "hyp=$1, stack=$2, from=$3, transition=$4, recombined=$5, out=$6\n";
$to = $1;
$stack = $2;
$from = $3;
$transition=$4;
$recombined = $5;
- $o = $6;
+ $o = $6;
$label = "[color=blue label=";
-
+
$to = $recombined;
$stacks{$stack}{$recombined} = $recombined if $organize_to_stacks;
#$stack++;
@@ -80,9 +80,9 @@ while(($line=<>) )
#print STDERR "out = $o after regexp\n";
$label .= "\"$o p=$transition\"]\n";
#$label .= " p=$transition\"]\n";
-
+
print STDOUT "$from -> $to $label";
-
+
$cpt++;
}
diff --git a/scripts/analysis/show-phrases-used.pl b/scripts/analysis/show-phrases-used.pl
index 0a719d207..522e6d3ff 100755
--- a/scripts/analysis/show-phrases-used.pl
+++ b/scripts/analysis/show-phrases-used.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
#show-phrases-used: display all source and target phrases for each sentence in a corpus, and give average phrase length used
@@ -192,13 +192,13 @@ for(my $i = 0; $i < scalar(@{$sentenceData[0]}); $i++)
{
my $k = $srcPhraseIndices[$j];
my $srcBottomY = $srcY + ($font->height + $phraseEdgeVSpace) * $srcNumFactors; #bottom of color
- $img->filledRectangle($srcStartX[$k], $srcY - $phraseEdgeVSpace, $srcStartX[$k] + $font->width * $sentenceData[0]->[$i]->[$k]->{'srcNumChars'} + 2 * $phraseEdgeHSpace,
+ $img->filledRectangle($srcStartX[$k], $srcY - $phraseEdgeVSpace, $srcStartX[$k] + $font->width * $sentenceData[0]->[$i]->[$k]->{'srcNumChars'} + 2 * $phraseEdgeHSpace,
$srcBottomY, $bgCols[$srcBGCols[$k]]);
if(length $sentenceData[0]->[$i]->[$k]->{'tgtText'} > 0) #non-empty target phrase
{
$img->filledRectangle($tgtStartX[$j], $tgtY - $phraseEdgeVSpace, $tgtStartX[$j] + $font->width * $sentenceData[0]->[$i]->[$k]->{'tgtNumChars'} + 2 * $phraseEdgeHSpace,
$tgtY + ($font->height + $phraseEdgeVSpace) * $tgtNumFactors, $bgCols[$srcBGCols[$k]]);
- my ($srcMidX, $tgtMidX) = ($srcStartX[$k] + $font->width * $sentenceData[0]->[$i]->[$k]->{'srcNumChars'} / 2 + $phraseEdgeHSpace,
+ my ($srcMidX, $tgtMidX) = ($srcStartX[$k] + $font->width * $sentenceData[0]->[$i]->[$k]->{'srcNumChars'} / 2 + $phraseEdgeHSpace,
$tgtStartX[$j] + $font->width * $sentenceData[0]->[$i]->[$k]->{'tgtNumChars'} / 2 + $phraseEdgeHSpace);
$img->line($srcMidX, $srcBottomY, $tgtMidX, $tgtY, $bgCols[$srcBGCols[$k]]);
writeFactoredStringGD($img, $srcStartX[$k] + $phraseEdgeHSpace, \@srcFactorYs, $sentenceData[0]->[$i]->[$k]->{'srcText'}, $font, $black);
diff --git a/scripts/analysis/smtgui/Corpus.pm b/scripts/analysis/smtgui/Corpus.pm
index 2a7493b39..f050a9f6d 100644
--- a/scripts/analysis/smtgui/Corpus.pm
+++ b/scripts/analysis/smtgui/Corpus.pm
@@ -106,7 +106,7 @@ sub calcUnknownTokens
return ($self->{'unknownCount'}->{$factorName}, $self->{'tokenCount'}->{'input'});
}
warn "calcing unknown tokens\n";
-
+
$self->ensureFilenameDefined('input');
$self->ensurePhraseTableDefined($factorName);
$self->ensureFactorPosDefined($factorName);
@@ -129,7 +129,7 @@ sub calcUnknownTokens
}
$self->{'unknownCount'}->{$factorName} = $unknownTokens;
$self->{'tokenCount'}->{'input'} = $totalTokens;
-
+
return ($unknownTokens, $totalTokens);
}
@@ -145,7 +145,7 @@ sub calcNounAdjWER_PWERDiff
return @{$self->{'nnAdjWERPWER'}->{$sysname}};
}
warn "calcing NN/JJ PWER/WER\n";
-
+
$self->ensureFilenameDefined('truth');
$self->ensureFilenameDefined($sysname);
$self->ensureFactorPosDefined('surf');
@@ -164,7 +164,7 @@ sub calcNounAdjWER_PWERDiff
($sentWer, $tmp) = $self->sentencePWER(\@nnAdjSWords, \@nnAdjEWords, $self->{'factorIndices'}->{'surf'});
$pwerScore += $sentWer;
}
-
+
#unhog memory
$self->releaseSentences('truth');
$self->releaseSentences($sysname);
@@ -186,16 +186,16 @@ sub calcOverallWER
return $self->{'sysoutWER'}->{$sysname}->{$factorName}->[0];
}
warn "calcing WER\n";
-
+
$self->ensureFilenameDefined('truth');
$self->ensureFilenameDefined($sysname);
$self->ensureFactorPosDefined($factorName);
$self->loadSentences('truth', $self->{'truthFilename'});
$self->loadSentences($sysname, $self->{'sysoutFilenames'}->{$sysname});
-
+
my ($wer, $swers, $indices) = $self->corpusWER($self->{$sysname}, $self->{'truth'}, $self->{'factorIndices'}->{$factorName});
$self->{'sysoutWER'}->{$sysname}->{$factorName} = [$wer, $swers, $indices]; #total; arrayref of scores for individual sentences; arrayref of arrayrefs of offending words in each sentence
-
+
#unhog memory
$self->releaseSentences('truth');
$self->releaseSentences($sysname);
@@ -216,16 +216,16 @@ sub calcOverallPWER
return $self->{'sysoutPWER'}->{$sysname}->{$factorName}->[0];
}
warn "calcing PWER\n";
-
+
$self->ensureFilenameDefined('truth');
$self->ensureFilenameDefined($sysname);
$self->ensureFactorPosDefined($factorName);
$self->loadSentences('truth', $self->{'truthFilename'});
$self->loadSentences($sysname, $self->{'sysoutFilenames'}->{$sysname});
-
+
my ($pwer, $spwers, $indices) = $self->corpusPWER($self->{$sysname}, $self->{'truth'}, $self->{'factorIndices'}->{$factorName});
$self->{'sysoutPWER'}->{$sysname}->{$factorName} = [$pwer, $spwers, $indices]; #total; arrayref of scores for individual sentences; arrayref of arrayrefs of offending words in each sentence
-
+
#unhog memory
$self->releaseSentences('truth');
$self->releaseSentences($sysname);
@@ -244,23 +244,23 @@ sub calcBLEU
return $self->{'bleuScores'}->{$sysname}->{$factorName};
}
warn "calcing BLEU\n";
-
+
$self->ensureFilenameDefined('truth');
$self->ensureFilenameDefined($sysname);
$self->ensureFactorPosDefined($factorName);
$self->loadSentences('truth', $self->{'truthFilename'});
$self->loadSentences($sysname, $self->{'sysoutFilenames'}->{$sysname});
-
+
#score structure: various total scores, arrayref of by-sentence score arrays
if(!exists $self->{'bleuScores'}->{$sysname}) {$self->{'bleuScores'}->{$sysname} = {};}
if(!exists $self->{'bleuScores'}->{$sysname}->{$factorName}) {$self->{'bleuScores'}->{$sysname}->{$factorName} = [[], []];}
-
+
my ($good1, $tot1, $good2, $tot2, $good3, $tot3, $good4, $tot4, $totCLength, $totRLength) = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
my $factorIndex = $self->{'factorIndices'}->{$factorName};
for(my $i = 0; $i < scalar(@{$self->{'truth'}}); $i++)
{
my ($truthSentence, $sysoutSentence) = ($self->{'truth'}->[$i], $self->{$sysname}->[$i]);
- my ($unigood, $unicount, $bigood, $bicount, $trigood, $tricount, $quadrugood, $quadrucount, $cLength, $rLength) =
+ my ($unigood, $unicount, $bigood, $bicount, $trigood, $tricount, $quadrugood, $quadrucount, $cLength, $rLength) =
$self->sentenceBLEU($truthSentence, $sysoutSentence, $factorIndex, 0); #last argument is whether to debug-print
push @{$self->{'bleuScores'}->{$sysname}->{$factorName}->[1]}, [$unigood, $unicount, $bigood, $bicount, $trigood, $tricount, $quadrugood, $quadrucount, $cLength, $rLength];
$good1 += $unigood; $tot1 += $unicount;
@@ -271,7 +271,7 @@ sub calcBLEU
$totRLength += $rLength;
}
my $brevity = ($totCLength > $totRLength || $totCLength == 0) ? 1 : exp(1 - $totRLength / $totCLength);
- my ($pct1, $pct2, $pct3, $pct4) = ($tot1 == 0 ? -1 : $good1 / $tot1, $tot2 == 0 ? -1 : $good2 / $tot2,
+ my ($pct1, $pct2, $pct3, $pct4) = ($tot1 == 0 ? -1 : $good1 / $tot1, $tot2 == 0 ? -1 : $good2 / $tot2,
$tot3 == 0 ? -1 : $good3 / $tot3, $tot4 == 0 ? -1 : $good4 / $tot4);
my ($logsum, $logcount) = (0, 0);
if($tot1 > 0) {$logsum += my_log($pct1); $logcount++;}
@@ -280,7 +280,7 @@ sub calcBLEU
if($tot4 > 0) {$logsum += my_log($pct4); $logcount++;}
my $bleu = $brevity * exp($logsum / $logcount);
$self->{'bleuScores'}->{$sysname}->{$factorName}->[0] = [$bleu, 100 * $pct1, 100 * $pct2, 100 * $pct3, 100 * $pct4, $brevity];
-
+
#unhog memory
$self->releaseSentences('truth');
$self->releaseSentences($sysname);
@@ -302,13 +302,13 @@ sub statisticallyTestBLEUResults
return $self->{'bleuConfidence'}->{$sysname}->{$factorName};
}
warn "performing consistency tests\n";
-
+
my $k = 30; #HARDCODED NUMBER OF SUBSETS (WE DO k-FOLD CROSS-VALIDATION); IF YOU CHANGE THIS YOU MUST ALSO CHANGE getApproxPValue() and $criticalTStat
my $criticalTStat = 2.045; #hardcoded value given alpha (.025 here) and degrees of freedom (= $k - 1) ########################################
$self->ensureFilenameDefined('truth');
$self->ensureFilenameDefined($sysname);
$self->ensureFactorPosDefined($factorName);
-
+
#ensure we have full-corpus BLEU results
if(!exists $self->{'bleuScores'}->{$sysname}->{$factorName})
{
@@ -316,7 +316,7 @@ sub statisticallyTestBLEUResults
}
if(!exists $self->{'subsetBLEUstats'}->{$sysname}) {$self->{'subsetBLEUstats'}->{$sysname} = {};}
if(!exists $self->{'subsetBLEUstats'}->{$sysname}->{$factorName}) {$self->{'subsetBLEUstats'}->{$sysname}->{$factorName} = [];}
-
+
#calculate n-gram precisions for each small subset
my @sentenceStats = @{$self->{'bleuScores'}->{$sysname}->{$factorName}->[1]};
for(my $i = 0; $i < $k; $i++)
@@ -355,10 +355,10 @@ sub statisticallyTestBLEUResults
$devs[$i] = sqrt($devs[$i] / ($k - 1));
$t->[$i] = ($fullCorpusBLEU->[$i + 1] / 100 - $means[$i]) / $devs[$i];
push @{$self->{'bleuConfidence'}->{$sysname}->{$factorName}->[0]}, getLowerBoundPValue($t->[$i]); #p-value for overall score vs. subset average
- push @{$self->{'bleuConfidence'}->{$sysname}->{$factorName}->[1]},
+ push @{$self->{'bleuConfidence'}->{$sysname}->{$factorName}->[1]},
[$means[$i] - $criticalTStat * $devs[$i] / sqrt($k), $means[$i] + $criticalTStat * $devs[$i] / sqrt($k)]; #the confidence interval
}
-
+
return $self->{'bleuConfidence'}->{$sysname}->{$factorName};
}
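The interval pushed at the end of `statisticallyTestBLEUResults` is a standard t-based confidence interval around the mean of the k subset scores: mean ± t * s / sqrt(k), with the critical t hard-coded for k = 30 as the surrounding comments note. A Python sketch of that calculation (illustrative name):

```python
import math

def t_confidence_interval(scores, critical_t=2.045):
    """Two-sided confidence interval for the mean of k subset scores,
    as built in statisticallyTestBLEUResults; critical_t is the
    hard-coded value for k = 30 (29 degrees of freedom)."""
    k = len(scores)
    mean = sum(scores) / k
    dev = math.sqrt(sum((s - mean) ** 2 for s in scores) / (k - 1))
    half = critical_t * dev / math.sqrt(k)
    return mean - half, mean + half
```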
@@ -374,7 +374,7 @@ sub calcPerplexity
return $self->{'perplexity'}->{$sysname}->{$factorName};
}
warn "calcing perplexity\n";
-
+
$self->ensureFilenameDefined($sysname);
my $sysoutFilename;
if($sysname eq 'truth' || $sysname eq 'input') {$sysoutFilename = $self->{"${sysname}Filename"};}
@@ -395,26 +395,26 @@ sub calcPerplexity
#run a paired t test and a sign test on BLEU statistics for subsets of both systems' outputs
#arguments: system name 1, system name 2, factor name
#return: arrayref of [arrayref of confidence levels for t test at which results differ, arrayref of index (0/1) of better system by t test,
-# arrayref of confidence levels for sign test at which results differ, arrayref of index (0/1) of better system by sign test],
+# arrayref of confidence levels for sign test at which results differ, arrayref of index (0/1) of better system by sign test],
# where each inner arrayref has one element per n-gram order considered
sub statisticallyCompareSystemResults
{
my ($self, $sysname1, $sysname2, $factorName) = @_;
#check in-memory cache first
- if(exists $self->{'comparisonStats'}->{$sysname1} && exists $self->{'comparisonStats'}->{$sysname1}->{$sysname2}
+ if(exists $self->{'comparisonStats'}->{$sysname1} && exists $self->{'comparisonStats'}->{$sysname1}->{$sysname2}
&& exists $self->{'comparisonStats'}->{$sysname1}->{$sysname2}->{$factorName})
{
return $self->{'comparisonStats'}->{$sysname1}->{$sysname2}->{$factorName};
}
warn "comparing sysoutputs\n";
-
+
$self->ensureFilenameDefined($sysname1);
$self->ensureFilenameDefined($sysname2);
$self->ensureFactorPosDefined($factorName);
#make sure we have tallied results for both systems
if(!exists $self->{'subsetBLEUstats'}->{$sysname1}->{$factorName}) {$self->statisticallyTestBLEUResults($sysname1, $factorName);}
if(!exists $self->{'subsetBLEUstats'}->{$sysname2}->{$factorName}) {$self->statisticallyTestBLEUResults($sysname2, $factorName);}
-
+
if(!exists $self->{'comparisonStats'}->{$sysname1}) {$self->{'comparisonStats'}->{$sysname1} = {};}
if(!exists $self->{'comparisonStats'}->{$sysname1}->{$sysname2}) {$self->{'comparisonStats'}->{$sysname1}->{$sysname2} = {};}
if(!exists $self->{'comparisonStats'}->{$sysname1}->{$sysname2}->{$factorName}) {$self->{'comparisonStats'}->{$sysname1}->{$sysname2}->{$factorName} = [];}
@@ -570,7 +570,7 @@ sub writeCacheFile
}
#store WER, PWER to disk
print CACHEFILE "\nWER scores\n";
- my $printWERFunc =
+ my $printWERFunc =
sub
{
my $werType = shift;
@@ -791,7 +791,7 @@ sub getLowerBoundPValue
0.683 => .5,
0.854 => .4,
1.055 => .3,
- 1.311 => .2,
+ 1.311 => .2,
1.699 => .1
);
foreach my $tCmp (sort keys %t2p) {return $t2p{$tCmp} if $t <= $tCmp;}
@@ -803,7 +803,7 @@ sub getUpperBoundPValue
{
my $t = abs(shift);
#encode various known p-values for ###### DOF = 29 ######
- my %t2p =
+ my %t2p =
(
4.506 => .0001,
4.254 => .0002,
@@ -913,7 +913,7 @@ sub loadSentences
my ($self, $sysname, $filename) = @_;
#if the sentences are already loaded, leave them be
if(exists $self->{$sysname} && scalar(@{$self->{$sysname}}) > 0) {return;}
-
+
$self->{$sysname} = [];
$self->{'tokenCount'}->{$sysname} = 0;
open(INFILE, "<$filename") or die "Corpus::load(): couldn't open '$filename' for read\n";
@@ -929,7 +929,7 @@ sub loadSentences
}
push @{$self->{$sysname}}, $refFactors;
}
- close(INFILE);
+ close(INFILE);
}
#free the memory used for the given corpus (but NOT any associated calculations, eg WER)
@@ -948,7 +948,7 @@ sub loadPhraseTable
{
my ($self, $factorName) = @_;
$self->ensurePhraseTableDefined($factorName);
-
+
my $filename = $self->{'phraseTableFilenames'}->{$factorName};
open(PTABLE, "<$filename") or die "couldn't open '$filename' for read\n";
$self->{'phraseTables'}->{$factorName} = {}; #create ref to phrase table (hash of strings, for source phrases, to anything whatsoever)
@@ -1022,7 +1022,7 @@ sub sentenceWER
my ($totWER, $indices) = (0, []);
my ($sLength, $eLength) = (scalar(@$refSysOutput), scalar(@$refTruth));
if($sLength == 0 || $eLength == 0) {return ($totWER, $indices);} #special case
-
+
my @refWordsMatchIndices = (-1) x $eLength; #at what sysout-word index this truth word is first matched
my @sysoutWordsMatchIndices = (-1) x $sLength; #at what truth-word index this sysout word is first matched
my $table = []; #index by sysout word index, then truth word index; a cell holds max count of matching words and direction we came to get it
@@ -1041,7 +1041,7 @@ sub sentenceWER
push @{$table->[$i]}, [($match ? $maxPrev + 1 : $maxPrev), $prevDir];
}
}
-
+
#look back along the path and get indices of non-matching words
my @unusedSysout = (0) x $sLength; #whether each sysout word was matched--used for outputting html table
my ($i, $j) = ($sLength - 1, $eLength - 1);
@@ -1066,7 +1066,7 @@ sub sentenceWER
#we're at the first sysout word; finish up checking for matches
while($j > 0 && $refWordsMatchIndices[$j] != 0) {push @{$table->[0]->[$j]}, 0; $j--;}
if($j == 0 && $refWordsMatchIndices[0] != 0) {unshift @$indices, 0; $unusedSysout[0] = 1;} #no truth word was matched to the first sysout word
-
+
#print some HTML to debug the WER algorithm
# print "<table border=1><tr><td></td><td>" . join("</td><td>", map {() . $_->[$index]} @$refTruth) . "</td></tr>";
# for(my $i = 0; $i < $sLength; $i++)
@@ -1086,7 +1086,7 @@ sub sentenceWER
# print "</tr>";
# }
# print "</table>";
-
+
my $matchCount = 0;
if($sLength > 0) {$matchCount = $table->[$sLength - 1]->[$eLength - 1]->[0];}
return ($sLength - $matchCount, $indices);
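The dynamic-programming table in `sentenceWER` tracks, for each (output word, reference word) prefix pair, the maximum number of matched words, which is the length of the longest common subsequence; the WER it returns is the output length minus that match count. A minimal Python sketch of the match count alone, leaving out the Perl code's backtracking over unmatched-word indices (illustrative name):

```python
def wer_matches(sysout, truth):
    """Longest-common-subsequence match count between system output and
    reference tokens; sentenceWER reports len(sysout) - matches."""
    m, n = len(sysout), len(truth)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if sysout[i] == truth[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table[m][n]
```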
@@ -1192,7 +1192,7 @@ sub sentenceBLEU
$total2 = max(1, $total - 1);
$total3 = max(1, $total - 2);
$total4 = max(1, $total - 3);
-
+
return ($correct1, $total1, $correct2, $total2, $correct3, $total3, $correct4, $total4, $length_translation, $length_reference);
}
diff --git a/scripts/analysis/smtgui/filter-phrase-table.pl b/scripts/analysis/smtgui/filter-phrase-table.pl
index 9f411f3fa..55f2619c0 100755
--- a/scripts/analysis/smtgui/filter-phrase-table.pl
+++ b/scripts/analysis/smtgui/filter-phrase-table.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
#by Philipp Koehn, de-augmented by Evan Herbst
diff --git a/scripts/analysis/smtgui/newsmtgui.cgi b/scripts/analysis/smtgui/newsmtgui.cgi
index 552f89924..32ad3a948 100755
--- a/scripts/analysis/smtgui/newsmtgui.cgi
+++ b/scripts/analysis/smtgui/newsmtgui.cgi
@@ -23,7 +23,7 @@ my $FOREIGN = 'f';
#FILEDESC: textual descriptions associated with specific filenames; to be displayed on the single-corpus view
my %FILEDESC = (); &load_descriptions();
-my %factorData = loadFactorData('file-factors');
+my %factorData = loadFactorData('file-factors');
my %MEMORY; &load_memory();
my (@mBLEU,@NIST);
@mBLEU=`cat mbleu-memory.dat` if -e "mbleu-memory.dat"; chop(@mBLEU);
@@ -60,13 +60,13 @@ print "</BODY></HTML>\n";
sub show_corpora {
my %CORPUS = ();
-
+
# find corpora in evaluation directory: see the factor-index file, which was already read in
foreach my $corpusName (keys %factorData)
{
$CORPUS{$corpusName} = 1;
}
-
+
# list corpora
&htmlhead("All Corpora");
print "<UL>\n";
@@ -81,11 +81,11 @@ sub show_corpora {
sub view_corpus {
my @TABLE;
&htmlhead("View Corpus $in{CORPUS}");
-
+
# find corpora in evaluation directory
my $corpus = new Corpus('-name' => "$in{CORPUS}", '-descriptions' => \%FILEDESC, '-info_line' => $factorData{$in{CORPUS}});
# $corpus->printDetails(); #debugging info
-
+
my ($sentence_count, $lineInfo);
if(-e "$in{CORPUS}.f")
{
@@ -99,7 +99,7 @@ sub view_corpus {
$lineInfo =~ /^\s*(\d+)\s+/;
$sentence_count = 0 + $1;
}
-
+
print "Corpus '$in{CORPUS}' consists of $sentence_count sentences\n";
print "(<A HREF=?ACTION=VIEW_CORPUS&CORPUS=" . CGI::escape($in{CORPUS})."&mBLEU=1>with mBLEU</A>)" if ((!defined($in{mBLEU})) && (scalar keys %MEMORY) && -e "$in{CORPUS}.e" && -e "$in{CORPUS}.f");
print "<P>\n";
@@ -162,16 +162,16 @@ sub view_corpus {
# filename
$row .= "$file</A>";
# description (hard-coded)
- my @TRANSLATION_SENTENCE = `cat $in{CORPUS}.$file`;
+ my @TRANSLATION_SENTENCE = `cat $in{CORPUS}.$file`;
chop(@TRANSLATION_SENTENCE);
-
+
#count sentences that contain null words
my $null_count = 0;
foreach (@TRANSLATION_SENTENCE)
{
$null_count++ if /^NULL$/ || /^NONE$/;
}
- if ($null_count > 0) {
+ if ($null_count > 0) {
$row .= "$null_count NULL ";
}
@@ -200,8 +200,8 @@ sub view_corpus {
$no_bleu=1;
}
# NIST score
- if (-e "$in{CORPUS}.ref.sgm" && -e "$in{CORPUS}.src.sgm"
- && !$DONTSCORE{$file}) {
+ if (-e "$in{CORPUS}.ref.sgm" && -e "$in{CORPUS}.src.sgm"
+ && !$DONTSCORE{$file}) {
$row .= "<TD>";
print "$DONTSCORE{$file}+";
my ($nist,$nist_bleu);
@@ -230,7 +230,7 @@ sub view_corpus {
}
$row .= "</TD>\n";
}
-
+
my $isSystemOutput = ($file ne 'e' && $file ne 'f' && $file !~ /^pt/);
# misc stats (note the unknown words should come first so the total word count is available for WER)
$row .= "<TD align=\"center\">";
@@ -284,7 +284,7 @@ sub view_corpus {
else
{
my ($lemmaBLEU, $p1, $p2, $p3, $p4, $brevity) = $corpus->calcBLEU($file, 'lemma');
- $row .= sprintf("surface = %.3lf<br>lemma = %.3lf<br><b>lemma BLEU = %.04f</b> %.01f/%.01f/%.01f/%.01f *%.03f",
+ $row .= sprintf("surface = %.3lf<br>lemma = %.3lf<br><b>lemma BLEU = %.04f</b> %.01f/%.01f/%.01f/%.01f *%.03f",
$surfPWER, $lemmaPWER, $lemmaBLEU, $p1, $p2, $p3, $p4, $brevity);
}
}
@@ -315,7 +315,7 @@ sub view_corpus {
$row .= "/<FONT COLOR=ORANGE>$just_syn</FONT>";
$row .= "/<FONT COLOR=ORANGE>$just_sem</FONT>";
$row .= "/<FONT COLOR=RED>$wrong</FONT> ($unknown)</TD>\n";
- if ($in{SORT} eq 'SCORE') {
+ if ($in{SORT} eq 'SCORE') {
$sort = sprintf("%03d %04d",$correct,$just_syn+$just_sem);
}
}
@@ -324,7 +324,7 @@ sub view_corpus {
$row .= "</TD>\n";
}
- $row .= "</TR>\n";
+ $row .= "</TR>\n";
push @TABLE, "<!-- $sort -->\n$row";
}
close(DIR);
@@ -408,7 +408,7 @@ sub score_file {
for(my $i=0;$i<=$#SENTENCES;$i++) {
my $evaluation = &get_from_memory($REFERENCE{$FOREIGN}[$i],$SENTENCES[$i]);
next if ($in{ACTION} eq 'SCORE_FILE' &&
- ! $in{VIEW} &&
+ ! $in{VIEW} &&
$evaluation ne '' && $evaluation ne 'wrong');
print "<P>Sentence ".($i+1).":<BR>\n";
# color coding
@@ -419,7 +419,7 @@ sub score_file {
}
}
- # all sentences
+ # all sentences
print "$SENTENCES[$i] (System output)<BR>\n";
foreach my $ref (@SHOW) {
if (-e "$in{CORPUS}.$ref") {
@@ -576,7 +576,7 @@ sub get_nist_score {
my $current_timestamp = $STAT[9];
foreach (@NIST) {
my ($file,$time,$nist,$bleu) = split;
- return ($nist,$bleu)
+ return ($nist,$bleu)
if ($file eq $translation_file && $current_timestamp == $time);
}
@@ -668,7 +668,7 @@ sub get_multi_bleu_score {
$REF_NGRAM_N{$ngram}++;
}
foreach my $ngram (keys %REF_NGRAM_N) {
- if (!defined($REF_NGRAM{$ngram}) ||
+ if (!defined($REF_NGRAM{$ngram}) ||
$REF_NGRAM{$ngram} < $REF_NGRAM_N{$ngram}) {
$REF_NGRAM{$ngram} = $REF_NGRAM_N{$ngram};
}
@@ -716,7 +716,7 @@ sub get_multi_bleu_score {
@STAT = stat($translation_file);
printf BLEU "$translation_file $STAT[9] %f %f %f %f %f %f\n",$bleu,$CORRECT[1]/$TOTAL[1],$CORRECT[2]/$TOTAL[2],$CORRECT[3]/$TOTAL[3],$CORRECT[4]/$TOTAL[4],$brevity_penalty;
close(BLEU);
-
+
return ($bleu,
100*$CORRECT[1]/$TOTAL[1],
100*$CORRECT[2]/$TOTAL[2],
diff --git a/scripts/analysis/suspicious_tokenization.pl b/scripts/analysis/suspicious_tokenization.pl
index d1e5c1f67..3ea15154e 100755
--- a/scripts/analysis/suspicious_tokenization.pl
+++ b/scripts/analysis/suspicious_tokenization.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# Collects and prints all n-grams that appear in the given corpus both
# tokenized as well as untokenized.
# Ondrej Bojar
diff --git a/scripts/analysis/weight-scan.pl b/scripts/analysis/weight-scan.pl
index 7283483e9..b33360694 100755
--- a/scripts/analysis/weight-scan.pl
+++ b/scripts/analysis/weight-scan.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# runs Moses many times changing the values of one weight, all others fixed
# nbest lists are always produced to allow for comparison of real and
# 'projected' BLEU (BLEU estimated from n-best lists collected at a neighouring
@@ -20,7 +20,7 @@ my $jobs = 0;
my $workdir = "weight-scan";
my $range = "0.0,0.1,1.0";
my $input_type = 0;
-my $normalize = 0; # normalize
+my $normalize = 0; # normalize
my $nbestsize = 100;
my $decoderflags = "";
my $moses_parallel_cmd = "$SCRIPTS_ROOTDIR/generic/moses-parallel.pl";
@@ -110,7 +110,7 @@ die "Failed to find weights of the name '$weightname' in moses config."
#store current directory and create the working directory (if needed)
-my $cwd = `pawd 2>/dev/null`;
+my $cwd = `pawd 2>/dev/null`;
if(!$cwd){$cwd = `pwd`;}
chomp($cwd);
@@ -136,7 +136,7 @@ sub run_decoder {
my $filebase = sprintf("%${prec}f", $weightvalue);
my $nbestfilename = "best$nbestsize.$filebase";
my $filename = "out.$filebase";
-
+
# user-supplied parameters
print STDERR "params = $decoderflags\n";
@@ -240,7 +240,7 @@ sub ensure_full_path {
my $PATH = shift;
$PATH =~ s/\/nfsmnt//;
return $PATH if $PATH =~ /^\//;
- my $dir = `pawd 2>/dev/null`;
+ my $dir = `pawd 2>/dev/null`;
if(!$dir){$dir = `pwd`;}
chomp($dir);
$PATH = $dir."/".$PATH;
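`weight-scan.pl` runs the decoder once per value of the scanned weight; its `-range` default is `0.0,0.1,1.0`. Assuming that triple means start, step, end (an inference from the default, not documented in this hunk), expanding it could look like:

```python
def scan_values(range_spec):
    """Expand a 'start,step,end' range specification into the list of
    weight values to decode with (sketch; semantics assumed)."""
    start, step, end = (float(x) for x in range_spec.split(","))
    values, v = [], start
    # small epsilon so float drift does not drop the upper bound
    while v <= end + 1e-9:
        values.append(round(v, 10))
        v += step
    return values
```

Each value then yields one `out.<value>` and one `best100.<value>` n-best file, as in the `run_decoder` hunk above.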
diff --git a/scripts/ems/experiment.perl b/scripts/ems/experiment.perl
index 5d68e409c..62b039124 100755
--- a/scripts/ems/experiment.perl
+++ b/scripts/ems/experiment.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# Experiment Management System
# Documentation at http://www.statmt.org/moses/?n=FactoredTraining.EMS
@@ -30,7 +30,7 @@ my ($CONFIG_FILE,
$DELETE_CRASHED,
$DELETE_VERSION
);
-
+
my $SLEEP = 2;
my $META = "$RealBin/experiment.meta";
@@ -63,7 +63,7 @@ die("experiment.perl -config config-file [-exec] [-no-graph]")
'no-graph' => \$NO_GRAPH);
if (! -e "steps") { `mkdir -p steps`; }
-die("error: could not find config file")
+die("error: could not find config file")
unless ($CONFIG_FILE && -e $CONFIG_FILE) ||
($CONTINUE && -e &steps_file("config.$CONTINUE",$CONTINUE)) ||
($DELETE_CRASHED && -e &steps_file("config.$DELETE_CRASHED",$DELETE_CRASHED)) ||
@@ -159,7 +159,7 @@ exit();
# graph that depicts steps of the experiment, with dependencies
sub init_agenda_graph() {
- my $dir = &check_and_get("GENERAL:working-dir");
+ my $dir = &check_and_get("GENERAL:working-dir");
my $graph_file = &steps_file("graph.$VERSION",$VERSION);
open(PS,">".$graph_file.".ps") or die "Cannot open: $!";
@@ -208,7 +208,7 @@ sub detect_if_cluster {
$CLUSTER = 1;
print "running on a cluster\n" if $CLUSTER;
}
- }
+ }
}
sub detect_if_multicore {
@@ -230,7 +230,7 @@ sub read_meta {
my ($module,$step);
while(<META>) {
s/\#.*$//; # strip comments
- next if /^\s*$/;
+ next if /^\s*$/;
while (/\\\s*$/) {
$_ .= <META>;
s/\s*\\\s*[\n\r]*\s+/ /;
@@ -369,7 +369,7 @@ sub read_config {
else {
print STDERR "BUGGY CONFIG LINE ($line_count): $_";
$error++;
- }
+ }
}
}
die("$error ERROR".(($error>1)?"s":"")." IN CONFIG FILE") if $error;
@@ -406,9 +406,9 @@ sub read_config {
s/\$\{$pattern\}/$o/ if $escaped;
s/\$$pattern/$o/ unless $escaped;
print "$_\n" if $VERBOSE;
- if (/\$/) {
+ if (/\$/) {
print "more resolving needed\n" if $VERBOSE;
- $resolve = 1;
+ $resolve = 1;
}
}
}
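The `read_config` hunk above substitutes `${section:param}` references into other config values and sets `$resolve = 1` whenever a `$` survives, so the whole pass repeats until everything is resolved. A sketch of that fixpoint loop in Python (names are illustrative; unlike the Perl, this version raises instead of looping forever on circular references):

```python
import re

def resolve_config(config, max_passes=10):
    """Repeatedly substitute ${key} references between config values
    until no reference remains ('more resolving needed' loop, sketch)."""
    pattern = re.compile(r"\$\{([^}]+)\}")
    for _ in range(max_passes):
        changed = False
        for key, value in config.items():
            def sub(m):
                nonlocal changed
                changed = True
                return config[m.group(1)]
            config[key] = pattern.sub(sub, value)
        if not changed:
            return config
    raise ValueError("unresolved (possibly circular) references")
```

Multiple passes are needed because a substituted value may itself contain further `${...}` references.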
@@ -419,12 +419,12 @@ sub read_config {
# check if specified files exist
$error = 0;
foreach my $parameter (keys %CONFIG) {
- foreach (@{$CONFIG{$parameter}}) {
+ foreach (@{$CONFIG{$parameter}}) {
next if $parameter =~ /temp-dir/;
next if (!/^\// || -e); # ok if not file, or exists
- my $file = $_;
+ my $file = $_;
$file =~ s/ .+$//; # remove switches
- my $gz = $file; $gz =~ s/\.gz$//;
+ my $gz = $file; $gz =~ s/\.gz$//;
next if -e $gz; # ok if non gzipped exists
next if `find $file* -maxdepth 0 -follow`; # ok if stem
print STDERR "$parameter: file $_ does not exist!\n";
@@ -515,15 +515,15 @@ sub find_steps_for_module {
print "\t\tproduces $out\n" if $VERBOSE;
next unless defined($NEEDED{$out}) || (defined($FINAL_STEP) && $FINAL_STEP eq $step);
print "\t\tneeded\n" if $VERBOSE;
-
- # if output of a step is specified, you do not have
+
+ # if output of a step is specified, you do not have
# to execute that step
if(defined($CONFIG{$out})) {
$GIVEN{$out} = $step;
next;
}
print "\t\toutput not specified in config\n" if $VERBOSE;
-
+
# not needed, if optional and not specified
if (defined($STEP_IGNORE{$defined_step})) {
my $next = 0;
@@ -560,16 +560,16 @@ sub find_steps_for_module {
# OK, add step to the list
- push @DO_STEP,$step;
+ push @DO_STEP,$step;
$STEP_LOOKUP{$step} = $#DO_STEP;
print "\tdo-step: $step\n" if $VERBOSE;
-
- # mark as pass step (where no action is taken), if step is
+
+ # mark as pass step (where no action is taken), if step is
# optional and nothing needs to be do done
if (defined($STEP_PASS{$defined_step})) {
my $flag = 1;
foreach my $pass (@{$STEP_PASS{$defined_step}}) {
- $flag = 0
+ $flag = 0
if &backoff_and_get(&extend_local_name($module,$set,$pass));
}
$PASS{$#DO_STEP}++ if $flag;
@@ -578,12 +578,12 @@ sub find_steps_for_module {
if (defined($STEP_PASS_IF{$defined_step})) {
my $flag = 0;
foreach my $pass (@{$STEP_PASS_IF{$defined_step}}) {
- $flag = 1
+ $flag = 1
if &backoff_and_get(&extend_local_name($module,$set,$pass));
}
$PASS{$#DO_STEP}++ if $flag;
}
-
+
# special case for passing: steps that only affect factor 0
if (defined($ONLY_FACTOR_0{$defined_step})) {
my $FACTOR = &backoff_and_get_array("LM:$set:factors");
@@ -625,7 +625,7 @@ sub find_steps_for_module {
print "\n\t\tcross-directed to $in\n" if $VERBOSE;
}
elsif(defined($CONFIG{$in})) {
- print "\n\t\t... but that is specified\n" if $VERBOSE;
+ print "\n\t\t... but that is specified\n" if $VERBOSE;
}
else {
push @{$NEEDED{$in}}, $#DO_STEP;
@@ -639,11 +639,11 @@ sub find_steps_for_module {
sub check_producability {
my ($module,$set,$output) = @_;
-
+
# find $output requested as input by step in $module/$set
my @OUT = &construct_input($module,$set,$output);
-
- # if multiple outputs (due to multiple sets merged into one),
+
+ # if multiple outputs (due to multiple sets merged into one),
# only one needs to exist
foreach my $out (@OUT) {
print "producable? $out\n" if $VERBOSE;
@@ -687,12 +687,12 @@ sub construct_input {
# potentially multiple input files
my @IN;
-
+
# input from same module
if ($in !~ /([^:]+):(\S+)/) {
push @IN, &construct_name($module,$set,$in);
}
-
+
# input from previous model, multiple
elsif ($MODULE_TYPE{$1} eq "multiple") {
my @SETS = &get_sets($1);
@@ -711,7 +711,7 @@ sub construct_input {
else {
push @IN,$in;
}
-
+
return @IN;
}
@@ -750,7 +750,7 @@ sub delete_version {
# check which versions are already deleted
my %ALREADY_DELETED;
- my $dir = &check_and_get("GENERAL:working-dir");
+ my $dir = &check_and_get("GENERAL:working-dir");
open(VERSION,"ls $dir/steps/*/deleted.* 2>/dev/null|");
while(<VERSION>) {
/deleted\.(\d+)/;
@@ -787,7 +787,7 @@ sub delete_version {
chomp($step_file);
my $step = &get_step_from_step_file($step_file);
next if $USED_BY_OTHERS{$step};
- &delete_step($step,$DELETE_VERSION);
+ &delete_step($step,$DELETE_VERSION);
}
# orphan killing: delete steps in deleted versions, if they were only preserved because this version needed them
@@ -808,12 +808,12 @@ sub get_step_from_step_file {
$step =~ s/_/:/g;
return $step;
}
-
+
sub delete_step {
my ($step_name,$version) = @_;
my ($module,$set,$step) = &deconstruct_name($step_name);
- my $step_file = &versionize(&step_file2($module,$set,$step),$version);
+ my $step_file = &versionize(&step_file2($module,$set,$step),$version);
print "delete step $step_file\n";
`rm $step_file $step_file.*` if $EXECUTE;
@@ -839,7 +839,7 @@ sub delete_output {
if (-e $file) {
print "\tdelete file $file\n";
`rm $file` if $EXECUTE;
- }
+ }
# delete files that have additional extension
$file =~ /^(.+)\/([^\/]+)$/;
my ($dir,$f) = ($1,$2);
@@ -861,7 +861,7 @@ sub delete_output {
# RE-USE
# look for completed step jobs from previous experiments
sub find_re_use {
- my $dir = &check_and_get("GENERAL:working-dir");
+ my $dir = &check_and_get("GENERAL:working-dir");
return unless -e "$dir/steps";
for(my $i=0;$i<=$#DO_STEP;$i++) {
@@ -953,13 +953,13 @@ sub find_re_use {
$change = 1;
print "\tpassed step $DO_STEP[$_] used in re-use run $run -> fail\n" if $VERBOSE;
}
- }
+ }
}
# re-use step has to exist for this run
if (! defined($RE_USE[$parent]{$reuse_run})) {
print "\tno previous step -> fail\n" if $VERBOSE;
delete($RE_USE[$i]{$run});
- $change = 1;
+ $change = 1;
}
}
}
@@ -967,7 +967,7 @@ sub find_re_use {
}
}
- # summarize and convert hashes into integers for to be re-used
+ # summarize and convert hashes into integers for to be re-used
print "\nSTEP SUMMARY:\n";
open(RE_USE,">".&steps_file("re-use.$VERSION",$VERSION)) or die "Cannot open: $!";
for(my $i=$#DO_STEP;$i>=0;$i--) {
@@ -998,7 +998,7 @@ sub find_dependencies {
}
for(my $i=0;$i<=$#DO_STEP;$i++) {
my $step = $DO_STEP[$i];
- $step =~ /^(.+:)[^:]+$/;
+ $step =~ /^(.+:)[^:]+$/;
my $module_set = $1;
foreach my $needed_by (@{$NEEDED{$module_set.$STEP_OUT{&defined_step($step)}}}) {
print "$needed_by needed by $i\n" if $VERBOSE;
@@ -1020,9 +1020,9 @@ sub draw_agenda_graph {
print DOT " ranksep=0;\n";
for(my $i=0;$i<=$#DO_STEP;$i++) {
my $step = $DO_STEP[$i];
- $step =~ /^(.+):[^:]+$/;
+ $step =~ /^(.+):[^:]+$/;
my $module_set = $1;
- push @{$M{$module_set}},$i;
+ push @{$M{$module_set}},$i;
}
my $i = 0;
my (@G,%GIVEN_NUMBER);
@@ -1049,7 +1049,7 @@ sub draw_agenda_graph {
}
else {
my $step = $DO_STEP[$i];
- $step =~ s/^.+:([^:]+)$/$1/;
+ $step =~ s/^.+:([^:]+)$/$1/;
$step .= " (".$RE_USE[$i].")" if $RE_USE[$i];
my $color = "green";
@@ -1058,7 +1058,7 @@ sub draw_agenda_graph {
$color = "lightblue" if $RE_USE[$i] && $RE_USE[$i] != $VERSION;
$color = "red" if defined($CRASHED{$i});
$color = "lightyellow" if defined($PASS{$i});
-
+
print DOT " $i [label=\"$step\",shape=box,fontsize=10,height=0,style=filled,fillcolor=\"$color\"];\n";
}
}
@@ -1087,7 +1087,7 @@ sub draw_agenda_graph {
sub define_step {
my ($step) = @_;
- my $dir = &check_and_get("GENERAL:working-dir");
+ my $dir = &check_and_get("GENERAL:working-dir");
`mkdir -p $dir` if ! -e $dir;
my @STEP;
if ($step eq "all") {
@@ -1104,14 +1104,14 @@ sub define_step {
next if &define_template($i);
if ($DO_STEP[$i] =~ /^CORPUS:(.+):factorize$/) {
&define_corpus_factorize($i);
- }
+ }
elsif ($DO_STEP[$i] eq 'SPLITTER:train') {
&define_splitter_train($i);
- }
+ }
elsif ($DO_STEP[$i] =~ /^LM:(.+):factorize$/) {
&define_lm_factorize($i,$1);
}
- elsif ($DO_STEP[$i] =~ /^LM:(.+):randomize$/ ||
+ elsif ($DO_STEP[$i] =~ /^LM:(.+):randomize$/ ||
$DO_STEP[$i] eq 'INTERPOLATED-LM:randomize') {
&define_lm_randomize($i,$1);
}
@@ -1178,7 +1178,7 @@ sub define_step {
}
elsif ($DO_STEP[$i] eq 'TUNING:factorize-input') {
&define_tuningevaluation_factorize($i);
- }
+ }
elsif ($DO_STEP[$i] eq 'TUNING:factorize-input-devtest') {
&define_tuningevaluation_factorize($i);
}
@@ -1193,7 +1193,7 @@ sub define_step {
}
elsif ($DO_STEP[$i] =~ /^EVALUATION:(.+):factorize-input$/) {
&define_tuningevaluation_factorize($i);
- }
+ }
elsif ($DO_STEP[$i] =~ /^EVALUATION:(.+):filter$/) {
&define_tuningevaluation_filter($1,$i);
}
@@ -1225,7 +1225,7 @@ sub define_step {
}
}
-# LOOP that executes the steps
+# LOOP that executes the steps
# including checks, if needed to be executed, waiting for completion, and error detection
sub execute_steps {
@@ -1266,7 +1266,7 @@ sub execute_steps {
print "number of steps doable or running: ".(scalar keys %DO)." at ".`date`;
foreach my $step (keys %DO) { print "\t".($DO{$step}==2?"running: ":"doable: ").$DO_STEP[$step]."\n"; }
return unless scalar keys %DO;
-
+
# execute new step
my $done = 0;
foreach my $i (keys %DO) {
@@ -1284,7 +1284,7 @@ sub execute_steps {
# cluster job submission
if ($CLUSTER && (!&is_qsub_script($i) || (&backoff_and_get($DO_STEP[$i].":jobs") && (&backoff_and_get($DO_STEP[$i].":jobs")==1)))) {
$DO{$i}++;
- my $qsub_args = &get_qsub_args($DO_STEP[$i]);
+ my $qsub_args = &get_qsub_args($DO_STEP[$i]);
print "\texecuting $step via qsub $qsub_args ($active active)\n";
my $qsub_command="qsub $qsub_args -S /bin/bash -e $step.STDERR -o $step.STDOUT $step";
print "\t$qsub_command\n" if $VERBOSE;
@@ -1306,8 +1306,8 @@ sub execute_steps {
}
# update state
- &draw_agenda_graph() unless $done;
-
+ &draw_agenda_graph() unless $done;
+
# sleep until one more step is done
while(! $done) {
sleep($SLEEP);
@@ -1328,7 +1328,7 @@ sub execute_steps {
}
}
`touch $running_file`;
- }
+ }
}
}
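The `execute_steps` hunks above implement EMS's main loop: collect the steps whose prerequisites are finished, launch them (locally or via `qsub`), then sleep and re-check until everything is done. A minimal synchronous sketch of that scheduler (the real loop runs steps concurrently and re-draws the agenda graph; `run_step` here is an assumed blocking callback returning success):

```python
import time

def execute_steps(deps, run_step, poll=lambda: time.sleep(2)):
    """deps: {step: [prerequisite steps]}. Launch every 'doable' step,
    i.e. one whose prerequisites are all done, until none remain."""
    done, failed = set(), set()
    while len(done) + len(failed) < len(deps):
        doable = [s for s, pre in deps.items()
                  if s not in done and s not in failed
                  and all(p in done for p in pre)]
        if not doable:
            # remaining steps depend on failed ones (or on each other)
            break
        for step in doable:
            (done if run_step(step) else failed).add(step)
        poll()
    return done, failed
```

A crashed step therefore blocks exactly its downstream steps, which is what the red/green agenda-graph coloring above visualizes.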
@@ -1356,17 +1356,17 @@ sub get_qsub_args {
# instead of submitted as jobs. here we check for that.
sub is_qsub_script {
my ($i) = @_;
- return (defined($QSUB_STEP{$i}) ||
+ return (defined($QSUB_STEP{$i}) ||
defined($QSUB_SCRIPT{&defined_step($DO_STEP[$i])}));
}
-# write the info file that is consulted to check if
+# write the info file that is consulted to check if
# a step has to be redone, even if it was run before
sub write_info {
my ($i) = @_;
my $step = $DO_STEP[$i];
my $module_set = $step; $module_set =~ s/:[^:]+$//;
-
+
open(INFO,">".&versionize(&step_file($i)).".INFO") or die "Cannot open: $!";
my %VALUE = &get_parameters_relevant_for_re_use($i);
foreach my $parameter (keys %VALUE) {
@@ -1426,19 +1426,19 @@ sub check_info {
if (defined($ONLY_EXISTENCE_MATTERS{"$module:$step"}{$parameter})) {
print "existence ok\n" if $VERBOSE;
}
- elsif (&match_info_strings($VALUE{$parameter},$INFO{$parameter})) {
- print "ok\n" if $VERBOSE;
+ elsif (&match_info_strings($VALUE{$parameter},$INFO{$parameter})) {
+ print "ok\n" if $VERBOSE;
}
- else {
+ else {
print "mismatch\n" if $VERBOSE;
- return 0;
+ return 0;
}
}
print "\tall parameters match\n" if $VERBOSE;
return 1;
}
-sub match_info_strings {
+sub match_info_strings {
my ($current,$old) = @_;
$current =~ s/ $//;
$old =~ s/ $//;
@@ -1479,7 +1479,7 @@ sub get_parameters_relevant_for_re_use {
my ($out,@INPUT) = &get_output_and_input($i);
my $actually_used = "USED";
foreach my $in_file (@INPUT) {
- $actually_used .= " ".$in_file;
+ $actually_used .= " ".$in_file;
}
$VALUE{"INPUT"} = $actually_used;
@@ -1591,7 +1591,7 @@ sub step_file2 {
my ($module,$set,$step) = @_;
my $dir = &check_and_get("GENERAL:working-dir");
`mkdir -p $dir/steps` if ! -e "$dir/steps";
- my $file = "$dir/steps/$module" . ($set ? ("_".$set) : "") . "_$step";
+ my $file = "$dir/steps/$module" . ($set ? ("_".$set) : "") . "_$step";
return $file;
}
@@ -1609,7 +1609,7 @@ sub defined_step_id {
sub defined_step {
my ($step) = @_;
- my $defined_step = $step;
+ my $defined_step = $step;
$defined_step =~ s/:.+:/:/;
return $defined_step;
}
@@ -1659,7 +1659,7 @@ sub define_corpus_factorize {
my ($output,$input) = &get_output_and_input($step_id);
my $input_extension = &check_backoff_and_get("TRAINING:input-extension");
my $output_extension = &check_backoff_and_get("TRAINING:output-extension");
-
+
my $dir = &check_and_get("GENERAL:working-dir");
my $temp_dir = &check_and_get("INPUT-FACTOR:temp-dir") . ".$VERSION";
my $cmd = "mkdir -p $temp_dir\n"
@@ -1673,7 +1673,7 @@ sub define_corpus_factorize {
"$output.$output_extension",
&check_backoff_and_get_array("TRAINING:output-factors"),
$step_id);
-
+
&create_step($step_id,$cmd);
}
@@ -1689,7 +1689,7 @@ sub define_tuningevaluation_factorize {
. &factorize_one_language("INPUT-FACTOR",$input,$output,
&check_backoff_and_get_array("TRAINING:input-factors"),
$step_id);
-
+
&create_step($step_id,$cmd);
}
@@ -1700,12 +1700,12 @@ sub define_lm_factorize {
my ($output,$input) = &get_output_and_input($step_id);
print "LM:$set:factors\n" if $VERBOSE;
my $factor = &check_backoff_and_get_array("LM:$set:factors");
-
+
my $dir = &check_and_get("GENERAL:working-dir");
my $temp_dir = &check_and_get("INPUT-FACTOR:temp-dir") . ".$VERSION";
my $cmd = "mkdir -p $temp_dir\n"
. &factorize_one_language("OUTPUT-FACTOR",$input,$output,$factor,$step_id);
-
+
&create_step($step_id,$cmd);
}
@@ -1715,12 +1715,12 @@ sub define_interpolated_lm_factorize_tuning {
my ($output,$input) = &get_output_and_input($step_id);
my $factor = &check_backoff_and_get_array("TRAINING:output-factors");
-
+
my $dir = &check_and_get("GENERAL:working-dir");
my $temp_dir = &check_and_get("INPUT-FACTOR:temp-dir") . ".$VERSION";
my $cmd = "mkdir -p $temp_dir\n"
. &factorize_one_language("OUTPUT-FACTOR",$input,$output,$factor,$step_id);
-
+
&create_step($step_id,$cmd);
}
@@ -1732,7 +1732,7 @@ sub define_splitter_train {
my $output_splitter = &get("GENERAL:output-splitter");
my $input_extension = &check_backoff_and_get("SPLITTER:input-extension");
my $output_extension = &check_backoff_and_get("SPLITTER:output-extension");
-
+
my $cmd = "";
if ($input_splitter) {
$cmd .= "$input_splitter -train -model $output.$input_extension -corpus $input.$input_extension\n";
@@ -1747,7 +1747,7 @@ sub define_splitter_train {
sub define_lm_train_randomized {
my ($step_id,$set) = @_;
my $training = &check_backoff_and_get("LM:$set:rlm-training");
- my $order = &check_backoff_and_get("LM:$set:order");
+ my $order = &check_backoff_and_get("LM:$set:order");
my ($output,$input) = &get_output_and_input($step_id);
$output =~ /^(.+)\/([^\/]+)$/;
@@ -1765,7 +1765,7 @@ sub define_lm_randomize {
my ($module,$set,$stepname) = &deconstruct_name($DO_STEP[$step_id]);
my $randomizer = &check_backoff_and_get("$module:$set:lm-randomizer");
- my $order = &check_backoff_and_get("$module:$set:order");
+ my $order = &check_backoff_and_get("$module:$set:order");
my ($output,$input) = &get_output_and_input($step_id);
$output =~ /^(.+)\/([^\/]+)$/;
@@ -1782,7 +1782,7 @@ sub factorize_one_language {
my $temp_dir = &check_and_get("INPUT-FACTOR:temp-dir") . ".$VERSION";
my $parallelizer = &get("GENERAL:generic-parallelizer");
my ($module,$set,$stepname) = &deconstruct_name($DO_STEP[$step_id]);
-
+
my ($cmd,$list) = ("");
foreach my $factor (@{$FACTOR}) {
if ($factor eq "word") {
@@ -1791,7 +1791,7 @@ sub factorize_one_language {
else {
my $script = &check_and_get("$type:$factor:factor-script");
my $out = "$outfile.$factor";
- if ($parallelizer && defined($PARALLELIZE{&defined_step($DO_STEP[$step_id])})
+ if ($parallelizer && defined($PARALLELIZE{&defined_step($DO_STEP[$step_id])})
&& ( (&get("$module:jobs") && $CLUSTER)
|| (&get("$module:cores") && $MULTICORE))) {
my $subdir = $module;
@@ -1803,7 +1803,7 @@ sub factorize_one_language {
$qflags="--queue-flags \"$qsub_args\"" if ($CLUSTER && $qsub_args);
$cmd .= "$parallelizer $qflags -in $infile -out $out -cmd '$script %s %s $temp_dir/$subdir' -jobs ".&get("$module:jobs")." -tmpdir $temp_dir/$subdir\n";
$QSUB_STEP{$step_id}++;
- }
+ }
elsif ($MULTICORE) {
$cmd .= "$parallelizer -in $infile -out $out -cmd '$script %s %s $temp_dir/$subdir' -cores ".&get("$module:cores")." -tmpdir $temp_dir/$subdir\n";
}
@@ -1825,9 +1825,9 @@ sub define_tuning_tune {
my $use_mira = &backoff_and_get("TUNING:use-mira", 0);
my $word_alignment = &backoff_and_get("TRAINING:include-word-alignment-in-rules");
my $tmp_dir = &get_tmp_file("TUNING","","tune");
-
- # the last 3 variables are only used for mira tuning
- my ($tuned_config,$config,$input,$reference,$config_devtest,$input_devtest,$reference_devtest, $filtered_config) = &get_output_and_input($step_id);
+
+ # the last 3 variables are only used for mira tuning
+ my ($tuned_config,$config,$input,$reference,$config_devtest,$input_devtest,$reference_devtest, $filtered_config) = &get_output_and_input($step_id);
$config = $filtered_config if $filtered_config;
@@ -1853,13 +1853,13 @@ sub define_tuning_tune {
my $mira_config = "$tmp_dir/mira-config.$VERSION.";
my $mira_config_log = $mira_config."log";
$mira_config .= "cfg";
-
+
write_mira_config($mira_config,$tmp_dir,$config,$input,$reference,$config_devtest,$input_devtest,$reference_devtest);
#$cmd = "$tuning_script -config $mira_config -exec >& $mira_config_log";
# we want error messages in top-level log file
$cmd = "$tuning_script -config $mira_config -exec ";
- # write script to select the best set of weights after training for the specified number of epochs -->
+ # write script to select the best set of weights after training for the specified number of epochs -->
# cp to tuning/tmp.?/moses.ini
my $script_filename = "$tmp_dir/selectBestWeights.";
my $script_filename_log = $script_filename."log";
@@ -1881,7 +1881,7 @@ sub define_tuning_tune {
my $decoder_settings = &backoff_and_get("TUNING:decoder-settings");
$decoder_settings = "" unless $decoder_settings;
$decoder_settings .= " -v 0 " unless $CLUSTER && $jobs && $jobs>1;
-
+
my $tuning_settings = &backoff_and_get("TUNING:tuning-settings");
$tuning_settings = "" unless $tuning_settings;
@@ -1890,7 +1890,7 @@ sub define_tuning_tune {
$cmd .= " --continue" if $tune_continue;
$cmd .= " --skip-decoder" if $skip_decoder;
$cmd .= " --inputtype $tune_inputtype" if defined($tune_inputtype);
-
+
my $qsub_args = &get_qsub_args($DO_STEP[$step_id]);
$cmd .= " --queue-flags=\"$qsub_args\"" if ($CLUSTER && $qsub_args);
$cmd .= " --jobs $jobs" if $CLUSTER && $jobs && $jobs>1;
@@ -1898,14 +1898,14 @@ sub define_tuning_tune {
$tuning_dir =~ s/\/[^\/]+$//;
$cmd .= "\nmkdir -p $tuning_dir";
}
-
+
$cmd .= "\ncp $tmp_dir/moses.ini $tuned_config";
&create_step($step_id,$cmd);
}
sub write_mira_config {
- my ($config_filename,$expt_dir,$tune_filtered_ini,$input,$reference,$devtest_filtered_ini,$input_devtest,$reference_devtest) = @_;
+ my ($config_filename,$expt_dir,$tune_filtered_ini,$input,$reference,$devtest_filtered_ini,$input_devtest,$reference_devtest) = @_;
my $moses_src_dir = &check_and_get("GENERAL:moses-src-dir");
my $mira_src_dir = &backoff_and_get("GENERAL:mira-src-dir");
my $tuning_decoder_settings = &check_and_get("TUNING:decoder-settings");
@@ -1916,7 +1916,7 @@ sub write_mira_config {
my $use_jackknife = &backoff_and_get("TUNING:use-jackknife");
# are we tuning a meta feature?
- my $tune_meta_feature = &backoff_and_get("TUNING:tune-meta-feature");
+ my $tune_meta_feature = &backoff_and_get("TUNING:tune-meta-feature");
my $tune_filtered_ini_start;
if (!$use_jackknife) {
@@ -1927,13 +1927,13 @@ sub write_mira_config {
# apply start weights to filtered ini file, and pass the new ini to mira
print "DEBUG: $RealBin/support/substitute-weights.perl $start_weights $tune_filtered_ini $tune_filtered_ini_start \n";
system("$RealBin/support/substitute-weights.perl $start_weights $tune_filtered_ini $tune_filtered_ini_start");
- }
+ }
}
# do we want to continue an interrupted experiment?
my $continue_expt = &backoff_and_get("TUNING:continue-expt");
my $continue_epoch = &backoff_and_get("TUNING:continue-epoch");
- my $continue_weights = &backoff_and_get("TUNING:continue-weights");
+ my $continue_weights = &backoff_and_get("TUNING:continue-weights");
# mira config file
open(CFG, ">$config_filename");
@@ -1956,7 +1956,7 @@ sub write_mira_config {
print CFG "tune-meta-feature=1 \n" if ($tune_meta_feature);
print CFG "jackknife=1 \n" if ($use_jackknife);
print CFG "wait-for-bleu=1 \n\n";
- #print CFG "decoder-settings=".$tuning_decoder_settings."\n\n";
+ #print CFG "decoder-settings=".$tuning_decoder_settings."\n\n";
print CFG "[train] \n";
print CFG "trainer=\${moses-home}/bin/mira \n";
if ($use_jackknife) {
@@ -1972,7 +1972,7 @@ sub write_mira_config {
}
else {
print CFG $input.".only$i, " if $i<9;
- print CFG $input.".only$i" if $i==9;
+ print CFG $input.".only$i" if $i==9;
}
}
print CFG "\n";
@@ -1999,14 +1999,14 @@ sub write_mira_config {
print CFG "moses-ini-file=".$tune_filtered_ini."\n";
}
}
- print CFG "decoder-settings=".$tuning_decoder_settings." -text-type \"dev\"\n";
- print CFG "hours=48 \n";
+ print CFG "decoder-settings=".$tuning_decoder_settings." -text-type \"dev\"\n";
+ print CFG "hours=48 \n";
if ($parallel_settings) {
foreach my $setting (split(" ", $parallel_settings)) {
print CFG $setting."\n";
}
}
- print CFG "extra-args=".$tuning_settings."\n\n";
+ print CFG "extra-args=".$tuning_settings."\n\n";
print CFG "[devtest] \n";
if (&get("TRAINING:hierarchical-rule-set")) {
print CFG "moses=\${moses-home}/bin/moses_chart \n";
@@ -2019,7 +2019,7 @@ sub write_mira_config {
print CFG "input-file=".$input_devtest."\n";
print CFG "reference-file=".$reference_devtest."\n";
print CFG "moses-ini-file=".$devtest_filtered_ini."\n";
- print CFG "decoder-settings=".$tuning_decoder_settings." -text-type \"devtest\"\n";
+ print CFG "decoder-settings=".$tuning_decoder_settings." -text-type \"devtest\"\n";
print CFG "hours=12 \nextra-args= \nskip-dev=1 \nskip-devtest=0 \nskip-submit=0 \n";
close(CFG);
}
@@ -2052,7 +2052,7 @@ sub write_selectBestMiraWeights {
print SCR "} \n\n";
print SCR "print STDERR \"Best weights according to BLEU on devtest set: \$best_weights \\n\"; \n";
print SCR "system(\"cp \$best_weights $weight_out_file\"); \n\n";
-
+
close(SCR);
system("chmod u+x $script_filename");
}
@@ -2118,7 +2118,7 @@ sub define_training_symmetrize_giza {
my $method = &check_and_get("TRAINING:alignment-symmetrization-method");
my $cmd = &get_training_setting(3);
my $alignment_stem = &versionize(&long_file_name("aligned","model",""));
-
+
$cmd .= "-giza-e2f $giza -giza-f2e $giza_inv ";
$cmd .= "-alignment-file $aligned ";
$cmd .= "-alignment-stem $alignment_stem ";
@@ -2129,17 +2129,17 @@ sub define_training_symmetrize_giza {
sub define_training_build_suffix_array {
my ($step_id) = @_;
-
+
my $scripts = &check_and_get("GENERAL:moses-script-dir");
-
+
my ($model, $aligned,$corpus) = &get_output_and_input($step_id);
my $sa_exec_dir = &check_and_get("TRAINING:suffix-array");
my $input_extension = &check_backoff_and_get("TRAINING:input-extension");
my $output_extension = &check_backoff_and_get("TRAINING:output-extension");
my $method = &check_and_get("TRAINING:alignment-symmetrization-method");
-
+
my $glue_grammar_file = &versionize(&long_file_name("glue-grammar","model",""));
-
+
my $cmd = "$scripts/training/wrappers/adam-suffix-array/suffix-array-create.sh $sa_exec_dir $corpus.$input_extension $corpus.$output_extension $aligned.$method $model $glue_grammar_file";
&create_step($step_id,$cmd);
@@ -2222,7 +2222,7 @@ sub define_training_extract_phrases {
$cmd .= "-alignment-stem $alignment_stem ";
$cmd .= "-extract-file $extract ";
$cmd .= "-corpus $corpus ";
-
+
if (&get("TRAINING:hierarchical-rule-set")) {
my $no_glue_grammar = &get("TRAINING:no-glue-grammar");
if (!defined($no_glue_grammar) || $no_glue_grammar eq "false") {
@@ -2327,7 +2327,7 @@ sub define_training_build_ttable {
$cmd .= "-ghkm-parts-of-speech-file $parts_of_speech_labels_file ";
}
}
-
+
&create_step($step_id,$cmd);
}
@@ -2346,7 +2346,7 @@ sub define_domain_feature_score_option {
else {
return "-score-options '--Domain$method $domains' ";
}
-}
+}
sub define_training_build_reordering {
my ($step_id) = @_;
@@ -2406,8 +2406,8 @@ sub define_training_sigtest_filter {
chop($filtered_table);
}
$raw_table =~ s/\s*\-\S+\s*//; # remove switch
- $filtered_table =~ s/\s*\-\S+\s*//;
-
+ $filtered_table =~ s/\s*\-\S+\s*//;
+
my $cmd = "zcat $raw_table.gz | $moses_src_dir/contrib/sigtest-filter/filter-pt -e $suffix_array.$output_extension -f $suffix_array.$input_extension $sigtest_filter $hierarchical_flag | gzip - > $filtered_table.gz\n";
&create_step($step_id,$cmd);
}
@@ -2461,7 +2461,7 @@ sub get_config_tables {
# additional settings for hierarchical models
my $extract_version = $VERSION;
if (&get("TRAINING:hierarchical-rule-set")) {
- $extract_version = $RE_USE[$STEP_LOOKUP{"TRAINING:extract-phrases"}]
+ $extract_version = $RE_USE[$STEP_LOOKUP{"TRAINING:extract-phrases"}]
if defined($STEP_LOOKUP{"TRAINING:extract-phrases"});
my $no_glue_grammar = &get("TRAINING:no-glue-grammar");
if (!defined($no_glue_grammar) || $no_glue_grammar eq "false") {
@@ -2506,10 +2506,10 @@ sub define_training_create_config {
if($transliteration_pt){
$cmd .= "-transliteration-phrase-table $transliteration_pt ";
- }
+ }
if ($osm) {
- my $osm_settings = &get("TRAINING:operation-sequence-model-settings");
+ my $osm_settings = &get("TRAINING:operation-sequence-model-settings");
if ($osm_settings =~ /-factor *(\S+)/){
$cmd .= "-osm-model $osm/ -osm-setting $1 ";
}
@@ -2547,7 +2547,7 @@ sub define_training_create_config {
$type = 5 if (&get("INTERPOLATED-LM:rlm") ||
&backoff_and_get("INTERPOLATED-LM:lm-randomizer"));
- # manually set type
+ # manually set type
$type = &get("INTERPOLATED-LM:type") if &get("INTERPOLATED-LM:type");
# go through each interpolated language model
@@ -2588,7 +2588,7 @@ sub define_training_create_config {
&backoff_and_get("LM:$set:rlm-training") ||
&backoff_and_get("LM:$set:lm-randomizer"));
- # manually set type
+ # manually set type
$type = &backoff_and_get("LM:$set:type") if (&backoff_and_get("LM:$set:type"));
# binarized by INTERPOLATED-LM
@@ -2596,7 +2596,7 @@ sub define_training_create_config {
$lm_file =~ s/\.lm/\.binlm/;
$type = 1;
$type = &get("INTERPOLATED-LM:type") if &get("INTERPOLATED-LM:type");
- }
+ }
# which factor is the model trained on?
my $factor = 0;
@@ -2631,7 +2631,7 @@ sub define_interpolated_lm_interpolate {
}
}
- # go through language models by factor and order
+ # go through language models by factor and order
my ($icount,$ILM_SETS) = &get_interpolated_lm_sets();
foreach my $factor (keys %{$ILM_SETS}) {
foreach my $order (keys %{$$ILM_SETS{$factor}}) {
@@ -2643,8 +2643,8 @@ sub define_interpolated_lm_interpolate {
foreach my $id_set (@{$$ILM_SETS{$factor}{$order}}) {
my ($id,$set) = split(/ /,$id_set,2);
$lm_list .= $LM[$id].",";
- if (defined($weights)) {
- die("ERROR: no interpolation weight set for $factor:$order:$set (factor:order:set)")
+ if (defined($weights)) {
+ die("ERROR: no interpolation weight set for $factor:$order:$set (factor:order:set)")
unless defined($WEIGHT{"$factor:$order:$set"});
$weight_list .= $WEIGHT{"$factor:$order:$set"}.",";
}
@@ -2708,7 +2708,7 @@ sub define_interpolated_lm_process {
my $tool = &check_backoff_and_get("INTERPOLATED-LM:lm-${stepname}r");
my $FACTOR = &backoff_and_get_array("TRAINING:output-factors");
- # go through language models by factor and order
+ # go through language models by factor and order
my ($icount,$ILM_SETS) = &get_interpolated_lm_sets();
my $cmd = "";
foreach my $factor (keys %{$ILM_SETS}) {
@@ -2729,7 +2729,7 @@ sub define_interpolated_lm_process {
$name = "$interpolated_lm$suffix";
$name_processed = "$processed_lm$suffix";
}
- $cmd .= "$tool $name $name_processed\n";
+ $cmd .= "$tool $name $name_processed\n";
}
}
@@ -2768,7 +2768,7 @@ sub get_interpolated_lm_sets {
my $icount=0;
foreach my $set (@LM_SETS) {
my $order = &check_backoff_and_get("LM:$set:order");
-
+
my $factor = 0;
if (&backoff_and_get("TRAINING:output-factors") &&
&backoff_and_get("LM:$set:factors")) {
@@ -2873,7 +2873,7 @@ sub get_table_name_settings {
push @NAME,"$default.$f";
# push @NAME,"$dir/model/$table.$VERSION.$f";
}
-
+
# get specified names, if any
if (&get("TRAINING:$table")) {
my @SPECIFIED_NAME = @{$CONFIG{"TRAINING:$table"}};
@@ -2890,7 +2890,7 @@ sub get_table_name_settings {
$cmd .= "-$table $name ";
}
return $cmd;
-}
+}
sub get_factor_id {
my ($type) = @_;
@@ -2908,7 +2908,7 @@ sub encode_factor_definition {
my $encoded;
foreach my $mapping (split(/,\s*/,$definition)) {
my ($in,$out) = split(/\s*->\s*/,$mapping);
- $encoded .=
+ $encoded .=
&encode_factor_list($IN,$in)."-".
&encode_factor_list($OUT,$out)."+";
}
@@ -2941,22 +2941,22 @@ sub define_tuningevaluation_filter {
$binarizer = &backoff_and_get("EVALUATION:$set:ttable-binarizer") unless $tuning_flag;
$binarizer = &backoff_and_get("TUNING:ttable-binarizer") if $tuning_flag;
my $report_precision_by_coverage = !$tuning_flag && &backoff_and_get("EVALUATION:$set:report-precision-by-coverage");
-
- # occasionally, lattices and conf nets need to be able
- # to filter phrase tables, we can provide sentences/ngrams
+
+ # occasionally, lattices and conf nets need to be able
+ # to filter phrase tables, we can provide sentences/ngrams
# in a separate file
my $input_filter;
$input_filter = &get("EVALUATION:$set:input-filter") unless $tuning_flag;
$input_filter = &get("TUNING:input-filter") if $tuning_flag;
#print "filter: $input_filter \n";
$input_filter = $input unless $input_filter;
-
+
my $settings = &backoff_and_get("EVALUATION:$set:filter-settings") unless $tuning_flag;
$settings = &backoff_and_get("TUNING:filter-settings") if $tuning_flag;
$settings = "" unless $settings;
$binarizer .= " -no-alignment-info" if defined ($binarizer) && !$hierarchical && defined $word_alignment && $word_alignment eq "no";
-
+
$settings .= " -Binarizer \"$binarizer\"" if $binarizer;
$settings .= " --Hierarchical" if $hierarchical;
@@ -2992,13 +2992,13 @@ sub define_tuningevaluation_filter {
$config = $tuning_flag ? "$dir/tuning/moses.table.ini.$VERSION" : "$dir/evaluation/$set.moses.table.ini.$VERSION";
$cmd = "touch $config\n";
$delete_config = 1;
-
+
$cmd .= &get_config_tables($config,$reordering_table,$phrase_translation_table,undef,$domains);
if (&get("TRAINING:in-decoding-transliteration")) {
$cmd .= "-transliteration-phrase-table $dir/model/transliteration-phrase-table.$VERSION ";
- }
+ }
$cmd .= "-lm 0:3:$config:8\n"; # dummy kenlm 3-gram model on factor 0
@@ -3009,12 +3009,12 @@ sub define_tuningevaluation_filter {
if ($sa_exec_dir) {
# suffix array
$cmd .= "$scripts/training/wrappers/adam-suffix-array/suffix-array-extract.sh $sa_exec_dir $phrase_translation_table $input_filter $filter_dir $sa_extractors \n";
-
+
my $escaped_filter_dir = $filter_dir;
$escaped_filter_dir =~ s/\//\\\\\//g;
$cmd .= "cat $config | sed s/10\\ 0\\ 0\\ 7.*/10\\ 0\\ 0\\ 7\\ $escaped_filter_dir/g > $filter_dir/moses.ini \n";
- # kind of a hack -- the correct thing would be to make the generation of the config file ($filter_dir/moses.ini)
- # set the PhraseDictionaryALSuffixArray's path to the filtered directory rather than to the suffix array itself
+ # kind of a hack -- the correct thing would be to make the generation of the config file ($filter_dir/moses.ini)
+ # set the PhraseDictionaryALSuffixArray's path to the filtered directory rather than to the suffix array itself
$cmd .= "sed -i 's%path=$phrase_translation_table%path=$filter_dir%' $filter_dir/moses.ini\n";
}
else {
@@ -3022,7 +3022,7 @@ sub define_tuningevaluation_filter {
$cmd .= "$scripts/training/filter-model-given-input.pl";
$cmd .= " $filter_dir $config $input_filter $settings\n";
}
-
+
# clean-up
$cmd .= "rm $config" if $delete_config;
@@ -3033,7 +3033,7 @@ sub define_evaluation_decode {
my ($set,$step_id) = @_;
my $scripts = &check_and_get("GENERAL:moses-script-dir");
my $dir = &check_and_get("GENERAL:working-dir");
-
+
my ($system_output,
$config,$input,$filtered_config) = &get_output_and_input($step_id);
$config = $filtered_config if $filtered_config;
@@ -3051,12 +3051,12 @@ sub define_evaluation_decode {
my $hierarchical = &get("TRAINING:hierarchical-rule-set");
my $word_alignment = &backoff_and_get("TRAINING:include-word-alignment-in-rules");
my $post_decoding_transliteration = &get("TRAINING:post-decoding-transliteration");
-
- # If Transliteration Module is to be used as post-decoding step ...
+
+ # If Transliteration Module is to be used as post-decoding step ...
if (defined($post_decoding_transliteration) && $post_decoding_transliteration eq "yes"){
$settings .= " -output-unknowns $system_output.oov";
}
-
+
# specify additional output for analysis
if (defined($report_precision_by_coverage) && $report_precision_by_coverage eq "yes") {
@@ -3095,7 +3095,7 @@ sub define_evaluation_decode {
$cmd .= "mkdir -p $dir/evaluation/tmp.$set.$VERSION\n";
$cmd .= "cd $dir/evaluation/tmp.$set.$VERSION\n";
if (defined $moses_parallel) {
- $cmd .= $moses_parallel;
+ $cmd .= $moses_parallel;
} else {
$cmd .= "$scripts/generic/moses-parallel.pl";
}
@@ -3105,7 +3105,7 @@ sub define_evaluation_decode {
$cmd .= " -config $config";
$cmd .= " -input-file $input";
$cmd .= " --jobs $jobs";
- $cmd .= " -decoder-parameters \"$settings\" > $system_output";
+ $cmd .= " -decoder-parameters \"$settings\" > $system_output";
$cmd .= " -n-best-file $system_output.best$nbest_size -n-best-size $nbest" if $nbest;
}
else {
@@ -3252,12 +3252,12 @@ sub define_reporting_report {
my $scripts = &check_and_get("GENERAL:moses-script-dir");
my $cmd = "$scripts/ems/support/report-experiment-scores.perl";
-
+
# get scores that were produced
foreach my $parent (@{$DEPENDENCY[$step_id]}) {
- my ($parent_module,$parent_set,$parent_step)
+ my ($parent_module,$parent_set,$parent_step)
= &deconstruct_name($DO_STEP[$parent]);
-
+
my $file = &get_default_file($parent_module,$parent_set,$parent_step);
$cmd .= " set=$parent_set,type=$parent_step,file=$file";
}
@@ -3282,7 +3282,7 @@ sub get_output_and_input {
my $output = &get_default_file(&deconstruct_name($step));
my @INPUT;
- if (defined($USES_INPUT{$step_id})) {
+ if (defined($USES_INPUT{$step_id})) {
for(my $i=0; $i<scalar @{$USES_INPUT{$step_id}}; $i++) {
# get name of input file needed
my $in_file = $USES_INPUT{$step_id}[$i];
@@ -3293,9 +3293,9 @@ sub get_output_and_input {
my $prev_step = "";
# print "\tlooking up in_file $in_file\n";
foreach my $parent (@{$DEPENDENCY[$step_id]}) {
- my ($parent_module,$parent_set,$parent_step)
+ my ($parent_module,$parent_set,$parent_step)
= &deconstruct_name($DO_STEP[$parent]);
- my $parent_file
+ my $parent_file
= &construct_name($parent_module,$parent_set,
$STEP_OUT{&defined_step($DO_STEP[$parent])});
if ($in_file eq $parent_file) {
@@ -3367,12 +3367,12 @@ sub define_template {
if ($single_cmd =~ /^ln /) {
$new_cmd .= $single_cmd."\n";
}
- elsif ($single_cmd =~ /^.+$/) {
+ elsif ($single_cmd =~ /^.+$/) {
# find IN and OUT files
$single_cmd =~ /(EMS_IN_EMS\S*)/
|| die("ERROR: could not find EMS_IN_EMS in $single_cmd");
my $in = $1;
- $single_cmd =~ /(EMS_OUT_EMS\S*)/
+ $single_cmd =~ /(EMS_OUT_EMS\S*)/
|| die("ERROR: could not find OUT in $single_cmd");
my $out = $1;
# replace IN and OUT with %s
@@ -3388,13 +3388,13 @@ sub define_template {
my $qsub_args = &get_qsub_args($DO_STEP[$step_id]);
$qflags="--queue-flags \"$qsub_args\"" if ($CLUSTER && $qsub_args);
$new_cmd .= "$parallelizer $qflags -in $in -out $out -cmd '$single_cmd' -jobs ".&get("$module:jobs")." -tmpdir $dir/$tmp_dir\n";
- }
+ }
if ($MULTICORE) {
$new_cmd .= "$parallelizer -in $in -out $out -cmd '$single_cmd' -cores ".&get("$module:cores")." -tmpdir $dir/$tmp_dir\n";
}
}
}
-
+
$cmd = $new_cmd;
$QSUB_STEP{$step_id}++;
}
@@ -3468,12 +3468,12 @@ sub create_step {
my $file = &versionize(&step_file2($module,$set,$step));
my $dir = &check_and_get("GENERAL:working-dir");
my $subdir = $module;
- $subdir =~ tr/A-Z/a-z/;
+ $subdir =~ tr/A-Z/a-z/;
$subdir = "evaluation" if $subdir eq "reporting";
$subdir = "lm" if $subdir eq "interpolated-lm";
open(STEP,">$file") or die "Cannot open: $!";
print STEP "#!/bin/bash\n\n";
- print STEP "PATH=\"".$ENV{"PATH"}."\"\n";
+ print STEP "PATH=\"".$ENV{"PATH"}."\"\n";
print STEP "cd $dir\n";
print STEP "echo 'starting at '`date`' on '`hostname`\n";
print STEP "mkdir -p $dir/$subdir\n\n";
@@ -3481,7 +3481,7 @@ sub create_step {
print STEP "echo 'finished at '`date`\n";
print STEP "touch $file.DONE\n";
close(STEP);
-}
+}
sub get {
return &check_and_get($_[0],"allow_undef");
@@ -3541,7 +3541,7 @@ sub check_backoff_and_get_array {
sub get_specified_or_default_file {
my ($specified_module,$specified_set,$specified_parameter,
$default_module, $default_set, $default_step) = @_;
- my $specified =
+ my $specified =
&construct_name($specified_module,$specified_set,$specified_parameter);
if (defined($CONFIG{$specified})) {
print "\t\texpanding $CONFIG{$specified}[0]\n" if $VERBOSE;
@@ -3624,7 +3624,7 @@ sub long_file_name {
$file = "$dir/$file";
}
- my $module_working_dir_parameter =
+ my $module_working_dir_parameter =
$module . ($set ne "" ? ":$set" : "") . ":working-dir";
if (defined($CONFIG{$module_working_dir_parameter})) {
@@ -3634,7 +3634,7 @@ sub long_file_name {
}
sub compute_version_number {
- my $dir = &check_and_get("GENERAL:working-dir");
+ my $dir = &check_and_get("GENERAL:working-dir");
$VERSION = 1;
return unless -e $dir;
open(LS,"find $dir/steps -maxdepth 1 -follow |");
diff --git a/scripts/ems/fix-info.perl b/scripts/ems/fix-info.perl
index 8f83d4ccf..abe58fe83 100755
--- a/scripts/ems/fix-info.perl
+++ b/scripts/ems/fix-info.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -8,7 +8,7 @@ $step = "*" unless defined($step);
die("fix-info.perl file [step]") unless defined($file);
die("file not found") unless -e $file;
-die("full path!") unless $file =~ /^\//;
+die("full path!") unless $file =~ /^\//;
my @filestat = stat($file);
my $newtime = $filestat[9];
@@ -21,7 +21,7 @@ while(my $info = <LS>) {
if (/$file .*\[/) {
$changed++;
s/($file) (.*\[)\d+/$1 $2$newtime/g;
- }
+ }
}
if ($changed) {
print "updating $info\n";
diff --git a/scripts/ems/support/analysis.perl b/scripts/ems/support/analysis.perl
index cea2657c9..f4d5a55b4 100755
--- a/scripts/ems/support/analysis.perl
+++ b/scripts/ems/support/analysis.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -29,7 +29,7 @@ if (!&GetOptions('system=s' => \$system, # raw output from decoder
'search-graph=s' => \$search_graph, # visualization of search graph
'hierarchical' => \$hierarchical) || # hierarchical model?
!defined($dir)) {
- die("ERROR: syntax: analysis.perl -system FILE -reference FILE -dir DIR [-input FILE] [-input-corpus FILE] [-ttable FILE] [-score-options SETTINGS] [-segmentation FILE] [-output-corpus FILE] [-alignment-file FILE] [-biconcor BIN]");
+ die("ERROR: syntax: analysis.perl -system FILE -reference FILE -dir DIR [-input FILE] [-input-corpus FILE] [-ttable FILE] [-score-options SETTINGS] [-segmentation FILE] [-output-corpus FILE] [-alignment-file FILE] [-biconcor BIN]");
}
`mkdir -p $dir`;
@@ -95,7 +95,7 @@ if (defined($segmentation)) {
# coverage analysis
my (%INPUT_PHRASE,%CORPUS_COVERED,%TTABLE_COVERED,%TTABLE_ENTROPY);
-if (!defined($coverage_dir) && (defined($ttable) || defined($corpus))) {
+if (!defined($coverage_dir) && (defined($ttable) || defined($corpus))) {
if (!defined($input)) {
die("ERROR: when specifying either ttable or input-corpus, please also specify input\n");
}
@@ -170,7 +170,7 @@ sub input_phrases {
$line = &get_factor_phrase($factor,$line);
&extract_n_grams($line,\%INPUT_PHRASE);
}
- close(INPUT);
+ close(INPUT);
}
# reduce a factorized phrase into the factors of interest
@@ -279,11 +279,11 @@ sub bleu_annotation {
}
if (ref($REFERENCE[$i]) eq 'ARRAY') {
foreach my $ref (@{$REFERENCE[$i]}) {
- print OUT "\t".$ref;
+ print OUT "\t".$ref;
}
}
else {
- print OUT "\t".$REFERENCE[$i]
+ print OUT "\t".$REFERENCE[$i]
}
print OUT "\n";
}
@@ -301,7 +301,7 @@ sub add_match {
my $ref_count = 0;
$ref_count = $REF_NGRAM{$length}{$ngram} if defined($REF_NGRAM{$length}{$ngram});
my $match_count = ($sys_count > $ref_count) ? $ref_count : $sys_count;
-
+
$$CORRECT{$length}{$ngram} += $match_count;
$$TOTAL{$length}{$ngram} += $sys_count;
#print "$length:$ngram $sys_count $ref_count\n";
@@ -345,7 +345,7 @@ sub ttable_coverage {
# handling hierarchical
$in =~ s/ \[[^ \]]+\]$//; # remove lhs nt
next if $in =~ /\[[^ \]]+\]\[[^ \]]+\]/; # only consider flat rules
- $in = &get_factor_phrase($factor,$in) if defined($factor) && $factor eq "0";
+ $in = &get_factor_phrase($factor,$in) if defined($factor) && $factor eq "0";
$scores = $COLUMN[4] if defined($hierarchical); #scalar @COLUMN == 5;
my @IN = split(/ /,$in);
$size = scalar @IN;
@@ -473,7 +473,7 @@ sub input_annotation {
#$ttable_entropy = 0 unless defined($ttable_entropy);
$ttable_covered = 0 unless defined($ttable_covered);
$corpus_covered = 0 unless defined($corpus_covered);
-
+
if (defined($TTABLE_COVERED{$length}{$phrase})) {
printf OUT "%d-%d:%d:%d:%.5f ",$start,$start+$length-1,$corpus_covered,$ttable_covered,$ttable_entropy;
}
@@ -481,7 +481,7 @@ sub input_annotation {
}
print OUT "\n";
}
- close(INPUT);
+ close(INPUT);
close(OUT);
}
@@ -532,7 +532,7 @@ sub extract_n_grams {
$sentence =~ s/\s+/ /g;
$sentence =~ s/^ //;
$sentence =~ s/ $//;
-
+
my @WORD = split(/ /,$sentence);
for(my $length=1;$length<=$MAX_LENGTH;$length++) {
for(my $i=0;$i<=scalar(@WORD)-$length;$i++) {
@@ -604,8 +604,8 @@ sub precision_by_coverage {
defined($REF_NGRAM{1}{$ngram})) {
my $ref_count = $REF_NGRAM{1}{$ngram};
my $sys_count = $SYS_NGRAM{1}{$ngram};
- $PREC_NGRAM{1}{$ngram} =
- ($ref_count >= $sys_count) ? 1 : $ref_count/$sys_count;
+ $PREC_NGRAM{1}{$ngram} =
+ ($ref_count >= $sys_count) ? 1 : $ref_count/$sys_count;
}
}
close(REPORT);
@@ -615,10 +615,10 @@ sub precision_by_coverage {
while($line =~ /([^|]+) \|(\d+)\-(\d+)\|\s*(.*)$/) {
my ($output,$from,$to) = ($1,$2,$3);
$line = $4;
-
+
# bug fix: 1-1 unknown word mappings get alignment point
if ($from == $to && # one
- scalar(split(/ /,$output)) == 1 && # to one
+ scalar(split(/ /,$output)) == 1 && # to one
!defined($ALIGNED{$from})) { # but not aligned
push @{$ALIGNED{$from}},$output_pos;
}
@@ -631,11 +631,11 @@ sub precision_by_coverage {
my ($precision,$deleted,$length) = (0,0,0);
- # unaligned? note as deleted
+ # unaligned? note as deleted
if (!defined($ALIGNED{$i})) {
$deleted = 1;
}
- # aligned
+ # aligned
else {
foreach my $o (@{$ALIGNED{$i}}) {
$precision += $PREC_NGRAM{1}{$OUTPUT[$o]};
@@ -649,12 +649,12 @@ sub precision_by_coverage {
$DELETED_BY_WORD{$word} += $deleted;
$PREC_BY_WORD{$word} += $precision;
$LENGTH_BY_WORD{$word} += $length;
- $TOTAL_BY_WORD{$word}++;
+ $TOTAL_BY_WORD{$word}++;
$DELETED_BY_COVERAGE{$coverage} += $deleted;
$PREC_BY_COVERAGE{$coverage} += $precision;
$LENGTH_BY_COVERAGE{$coverage} += $length;
- $TOTAL_BY_COVERAGE{$coverage}++;
+ $TOTAL_BY_COVERAGE{$coverage}++;
if ($precision_by_coverage_factor) {
$DELETED_BY_FACTOR{$FACTOR[$i]} += $deleted;
@@ -662,9 +662,9 @@ sub precision_by_coverage {
$PREC_BY_FACTOR{$FACTOR[$i]} += $precision;
$PREC_BY_FACTOR_COVERAGE{$FACTOR[$i]}{$coverage} += $precision;
$LENGTH_BY_FACTOR{$FACTOR[$i]} += $length;
- $LENGTH_BY_FACTOR_COVERAGE{$FACTOR[$i]}{$coverage} += $length;
- $TOTAL_BY_FACTOR{$FACTOR[$i]}++;
- $TOTAL_BY_FACTOR_COVERAGE{$FACTOR[$i]}{$coverage}++;
+ $LENGTH_BY_FACTOR_COVERAGE{$FACTOR[$i]}{$coverage} += $length;
+ $TOTAL_BY_FACTOR{$FACTOR[$i]}++;
+ $TOTAL_BY_FACTOR_COVERAGE{$FACTOR[$i]}{$coverage}++;
}
}
}
@@ -853,10 +853,10 @@ sub hs_scan_line {
# process a single sentence for hierarchical segmentation
sub hs_process {
my ($sentence,$DERIVATION,$STATS) = @_;
-
+
my $DROP_RULE = shift @{$DERIVATION}; # get rid of S -> S </s>
my $max = $$DERIVATION[0]{'end'};
-
+
# consolidate glue rules into one rule
my %GLUE_RULE;
$GLUE_RULE{'start'} = 1;
@@ -867,10 +867,10 @@ sub hs_process {
while(1) {
my $RULE = shift @{$DERIVATION};
if (scalar(@{$$RULE{'rule_rhs'}}) == 2 &&
- ($$RULE{'rule_lhs'} eq "S" &&
+ ($$RULE{'rule_lhs'} eq "S" &&
$$RULE{'rule_rhs'}[0] eq "S" &&
$$RULE{'rule_rhs'}[1] eq "X") ||
- ($$RULE{'rule_lhs'} eq "Q" &&
+ ($$RULE{'rule_lhs'} eq "Q" &&
$$RULE{'rule_rhs'}[0] eq "Q")) {
unshift @{$GLUE_RULE{'spans'}},$$RULE{'spans'}[1];
push @{$GLUE_RULE{'rule_rhs'}}, $$RULE{'rule_rhs'}[1];
@@ -883,17 +883,17 @@ sub hs_process {
last;
}
}
- unshift @{$DERIVATION}, \%GLUE_RULE;
+ unshift @{$DERIVATION}, \%GLUE_RULE;
$$STATS{'glue-rule'} += $x;
-
+
# create chart
my %CHART;
foreach my $RULE (@{$DERIVATION}) {
$CHART{$$RULE{'start'}}{$$RULE{'end'}} = $RULE;
}
-
+
# compute depth
- &hs_compute_depth(1,$max,0,\%CHART);
+ &hs_compute_depth(1,$max,0,\%CHART);
my $max_depth = 0;
foreach my $RULE (@{$DERIVATION}) {
next unless defined($$RULE{'depth'}); # better: delete offending rule S -> S <s>
@@ -901,17 +901,17 @@ sub hs_process {
}
&hs_recompute_depth(1,$max,\%CHART,$max_depth);
$$STATS{'depth'} += $max_depth;
-
+
# build matrix of divs
-
+
my @MATRIX;
&hs_create_out_span(1,$max,\%CHART,\@MATRIX);
print OUTPUT_TREE &hs_output_matrix($sentence,\@MATRIX,$max_depth);
-
+
my @MATRIX_IN;
&hs_create_in_span(1,$max,\%CHART,\@MATRIX_IN);
print INPUT_TREE &hs_output_matrix($sentence,\@MATRIX_IN,$max_depth);
-
+
# number rules and get their children
my $id = 0;
foreach my $RULE (@{$DERIVATION}) {
@@ -920,10 +920,10 @@ sub hs_process {
$$RULE{'id'} = $id++;
}
&hs_get_children(1,$max,\%CHART);
-
+
foreach my $RULE (@{$DERIVATION}) {
next unless defined($$RULE{'start_div'}); # better: delete offending rule S -> S <s>
-
+
print NODE $sentence." ";
print NODE $$RULE{'depth'}." ";
print NODE $$RULE{'start_div'}." ".$$RULE{'end_div'}." ";
@@ -963,11 +963,11 @@ sub hs_output_matrix {
$class = "]";
}
elsif ($OPEN[$d]) {
- $class = "-";
+ $class = "-";
}
$out .= $class;
}
- $out .= "\t";
+ $out .= "\t";
$out .= $$SPAN{'lhs'} if defined($$SPAN{'lhs'});
$out .= "\t";
$out .= $$SPAN{'rhs'} if defined($$SPAN{'rhs'});
@@ -984,9 +984,9 @@ sub hs_output_matrix {
sub hs_rule_type {
my ($RULE) = @_;
-
+
my $type = "";
-
+
# output side
my %NT;
my $total_word_count = 0;
@@ -998,7 +998,7 @@ sub hs_rule_type {
$word_count = 0;
my $nt = chr(97+$nt_count++);
$NT{$$RULE{'alignment'}{$i}} = $nt;
- $type .= $nt;
+ $type .= $nt;
}
else {
$word_count++;
@@ -1006,9 +1006,9 @@ sub hs_rule_type {
}
}
$type .= $word_count if $word_count > 0;
-
+
$type .= ":".$total_word_count.":".$nt_count.":";
-
+
# input side
$word_count = 0;
$total_word_count = 0;
@@ -1039,7 +1039,7 @@ sub hs_compute_depth {
my $RULE = $$CHART{$start}{$end};
$$RULE{'depth'} = $depth;
-
+
for(my $i=0;$i<scalar @{$$RULE{'rule_rhs'}};$i++) {
# non-terminals
if (defined($$RULE{'alignment'}{$i})) {
@@ -1057,7 +1057,7 @@ sub hs_recompute_depth {
return 0;
}
my $RULE = $$CHART{$start}{$end};
-
+
my $min_sub_depth = $max_depth+1;
for(my $i=0;$i<scalar @{$$RULE{'rule_rhs'}};$i++) {
# non-terminals
@@ -1079,10 +1079,10 @@ sub hs_get_children {
return -1;
}
my $RULE = $$CHART{$start}{$end};
-
+
my @CHILDREN = ();
$$RULE{'children'} = \@CHILDREN;
-
+
for(my $i=0;$i<scalar @{$$RULE{'rule_rhs'}};$i++) {
# non-terminals
if (defined($$RULE{'alignment'}{$i})) {
@@ -1091,7 +1091,7 @@ sub hs_get_children {
push @CHILDREN, $child unless $child == -1;
}
}
- return $$RULE{'id'};
+ return $$RULE{'id'};
}
# create the span annotation for an output sentence
@@ -1102,7 +1102,7 @@ sub hs_create_out_span {
return;
}
my $RULE = $$CHART{$start}{$end};
-
+
my %SPAN;
$SPAN{'start'} = $start;
$SPAN{'end'} = $end;
@@ -1130,7 +1130,7 @@ sub hs_create_out_span {
$SPAN{'end'} = $end;
$SPAN{'depth'} = $$RULE{'depth'};
push @{$MATRIX},\%SPAN;
- $THIS_SPAN = \%SPAN;
+ $THIS_SPAN = \%SPAN;
}
$$THIS_SPAN{'rhs'} .= " " if defined($$THIS_SPAN{'rhs'});
$$THIS_SPAN{'rhs'} .= $$RULE{"rule_rhs"}[$i];
@@ -1150,7 +1150,7 @@ sub hs_create_in_span {
return;
}
my $RULE = $$CHART{$start}{$end};
-
+
my %SPAN;
$SPAN{'start'} = $start;
$SPAN{'end'} = $end;
@@ -1160,7 +1160,7 @@ sub hs_create_in_span {
push @{$MATRIX},\%SPAN;
$$RULE{'start_div_in'} = $#{$MATRIX};
my $THIS_SPAN = \%SPAN;
-
+
my $terminal = 1;
# in input order ...
for(my $i=0;$i<scalar(@{$$RULE{'spans'}});$i++) {
@@ -1177,7 +1177,7 @@ sub hs_create_in_span {
$SPAN{'end'} = $end;
$SPAN{'depth'} = $$RULE{'depth'};
push @{$MATRIX},\%SPAN;
- $THIS_SPAN = \%SPAN;
+ $THIS_SPAN = \%SPAN;
}
$$THIS_SPAN{'rhs'} .= " " if defined($$THIS_SPAN{'rhs'});
$$THIS_SPAN{'rhs'} .= $$SUBSPAN{'word'};
@@ -1204,7 +1204,7 @@ sub process_search_graph {
$heuristic_rule_score = $rule_score; # hmmmm....
}
else {
- die("ERROR: buggy search graph line: $_");
+ die("ERROR: buggy search graph line: $_");
}
chop($alignment) if $alignment;
chop($children) if $children;
diff --git a/scripts/ems/support/berkeley-process.sh b/scripts/ems/support/berkeley-process.sh
index 42b8ba9c3..e68056c96 100755
--- a/scripts/ems/support/berkeley-process.sh
+++ b/scripts/ems/support/berkeley-process.sh
@@ -1,6 +1,6 @@
#!/bin/sh
-if [ $# -lt 8 ]
+if [ $# -lt 8 ]
then
echo "Usage: $0 <\"java options\"> <berkeleyaligner jar file> <input file stem> <previous berkeley param dir> <output directory> <source lang> <target lang> <alignment name (i.e. 'berk' or 'low-posterior')> <posterior threshold> [aligner options...]"
exit 1
@@ -23,7 +23,7 @@ shift
shift
shift
shift
-shift
+shift
JAVA_CMD="/usr/local/share/java/bin/java $JAVA_OPTS -jar $JAR -Data.trainSources $INFILE.list -Main.loadParamsDir $PARAMDIR -exec.execDir $OUTNAME -Main.loadLexicalModelOnly false -Data.englishSuffix $SLANG -Data.foreignSuffix $TLANG -exec.create true -Main.saveParams false -Main.alignTraining true -Main.forwardModels HMM -Main.reverseModels HMM -Main.mode JOINT -Main.iters 0 -Data.testSources -EMWordAligner.posteriorDecodingThreshold $POSTERIOR $@"
echo "Running $JAVA_CMD"
@@ -37,8 +37,8 @@ gzip $OUTNAME/training.$TLANG-$SLANG.A3
#now shift the output
perl -e "
-use strict;
-while (<STDIN>) {
+use strict;
+while (<STDIN>) {
chomp();
my @pairs = split(\" \");
for (my \$i=0;\$i<scalar(@pairs);\$i++) {
diff --git a/scripts/ems/support/berkeley-train.sh b/scripts/ems/support/berkeley-train.sh
index 57d2963fc..96f6b648c 100755
--- a/scripts/ems/support/berkeley-train.sh
+++ b/scripts/ems/support/berkeley-train.sh
@@ -1,6 +1,6 @@
#!/bin/sh
-if [ $# -lt 6 ]
+if [ $# -lt 6 ]
then
echo "Usage: $0 <\"java options\"> <berkeleyaligner jar file> <input file stem> <output directory> <source lang> <target lang> [aligner options...]"
exit 1
@@ -20,4 +20,4 @@ shift
shift
echo $INFILE > $INFILE.list
-/usr/local/share/java/bin/java $JAVA_OPTS -jar $JAR -Data.trainSources $INFILE.list -exec.execDir $OUTDIR -Data.englishSuffix $SLANG -Data.foreignSuffix $TLANG -exec.create true -Main.SaveParams true -Main.alignTraining false -Data.testSources $@
+/usr/local/share/java/bin/java $JAVA_OPTS -jar $JAR -Data.trainSources $INFILE.list -exec.execDir $OUTDIR -Data.englishSuffix $SLANG -Data.foreignSuffix $TLANG -exec.create true -Main.SaveParams true -Main.alignTraining false -Data.testSources $@
diff --git a/scripts/ems/support/build-domain-file-from-subcorpora.perl b/scripts/ems/support/build-domain-file-from-subcorpora.perl
index f166c8927..085fd2629 100755
--- a/scripts/ems/support/build-domain-file-from-subcorpora.perl
+++ b/scripts/ems/support/build-domain-file-from-subcorpora.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -7,7 +7,7 @@ use strict;
# (helper for domain adatpation)
# Creates a file with domain names and end line numbers for different domains
-# within the cleaned training corpus. This file is used by various domain
+# within the cleaned training corpus. This file is used by various domain
# adaptation methods.
my ($extension,@SUBCORPORA) = @ARGV;
diff --git a/scripts/ems/support/build-sparse-features.perl b/scripts/ems/support/build-sparse-features.perl
index 3f4b505d5..79fc1e394 100755
--- a/scripts/ems/support/build-sparse-features.perl
+++ b/scripts/ems/support/build-sparse-features.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -18,7 +18,7 @@ foreach my $feature_spec (split(/,\s*/,$specification)) {
my @SPEC = split(/\s+/,$feature_spec);
my $factor = ($SPEC[0] eq 'word-translation') ? "0-0" : "0";
- $factor = $1 if $feature_spec =~ / factor ([\d\-]+)/;
+ $factor = $1 if $feature_spec =~ / factor ([\d\-]+)/;
$feature_spec =~ s/ factor ([\d\-]+)//;
if ($SPEC[0] eq 'target-word-insertion') {
@@ -107,7 +107,7 @@ sub create_top_words {
open(TOP,">$file");
for(my $i=0; $i<$count && $i<scalar(@SORTED); $i++) {
$SORTED[$i] =~ /^\d+ (.+)$/;
- print TOP "$1\n";
+ print TOP "$1\n";
}
close(TOP);
diff --git a/scripts/ems/support/consolidate-training-data.perl b/scripts/ems/support/consolidate-training-data.perl
index 170ba999c..4ab7f82cf 100755
--- a/scripts/ems/support/consolidate-training-data.perl
+++ b/scripts/ems/support/consolidate-training-data.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id: consolidate-training-data.perl 928 2009-09-02 02:58:01Z philipp $
diff --git a/scripts/ems/support/fast-align-in-parts.perl b/scripts/ems/support/fast-align-in-parts.perl
index fa501b454..f777d7e52 100755
--- a/scripts/ems/support/fast-align-in-parts.perl
+++ b/scripts/ems/support/fast-align-in-parts.perl
@@ -24,7 +24,7 @@ die("ERROR - usage: fast-align-in-parts.perl -bin FAST_ALIGN_BIN -i PARALLEL_COR
&& $MAX_LINES > 0;
die("ERROR - input file does not exist: $IN") unless -e $IN;
die("ERROR - fast_align binary does not exist: $BIN") unless -e $BIN;
-
+
chomp(my $line_count = `cat $IN | wc -l`);
# not more than maximal number of lines -> just run it regulary
diff --git a/scripts/ems/support/generic-multicore-parallelizer.perl b/scripts/ems/support/generic-multicore-parallelizer.perl
index e5a12adce..0f7910603 100755
--- a/scripts/ems/support/generic-multicore-parallelizer.perl
+++ b/scripts/ems/support/generic-multicore-parallelizer.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -6,7 +6,7 @@ use strict;
my $cores = 8;
my $serial = 1;
my ($infile,$outfile,$cmd,$tmpdir);
-my $parent = $$;
+my $parent = $$;
use Getopt::Long qw(:config pass_through no_ignore_case);
GetOptions('cores=i' => \$cores,
@@ -27,7 +27,7 @@ die("ERROR: you need to specify a tempdir with -tmpdir") unless $tmpdir;
# create split input files
my $sentenceN = `cat $infile | wc -l`;
-my $splitN = int(($sentenceN+($cores*$serial)-0.5) / ($cores*$serial));
+my $splitN = int(($sentenceN+($cores*$serial)-0.5) / ($cores*$serial));
print STDERR "split -a 3 -l $splitN $infile $tmpdir/in-$parent-\n";
`split -a 4 -l $splitN $infile $tmpdir/in-$parent-`;
diff --git a/scripts/ems/support/generic-parallelizer.perl b/scripts/ems/support/generic-parallelizer.perl
index fd7fb2552..811a99bde 100755
--- a/scripts/ems/support/generic-parallelizer.perl
+++ b/scripts/ems/support/generic-parallelizer.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -25,7 +25,7 @@ $qflags = "" unless $qflags;
# create split input files
my $sentenceN = `cat $infile | wc -l`;
-my $splitN = int(($sentenceN+$jobs-0.5) / $jobs);
+my $splitN = int(($sentenceN+$jobs-0.5) / $jobs);
`split -a 2 -l $splitN $infile $tmpdir/in-$$-`;
# find out the names of the jobs
@@ -56,7 +56,7 @@ foreach my $job (@JOB){
# get qsub ID
my @QSUB_ID;
-foreach my $job (@JOB){
+foreach my $job (@JOB){
`cat $tmpdir/job-$$-$job.log` =~ /Your job (\d+) /
or die "ERROR: Can't read log of job $tmpdir/job-$$-$job.log";
push @QSUB_ID,$1;
diff --git a/scripts/ems/support/input-from-sgm.perl b/scripts/ems/support/input-from-sgm.perl
index 223996676..18000581a 100755
--- a/scripts/ems/support/input-from-sgm.perl
+++ b/scripts/ems/support/input-from-sgm.perl
@@ -1,9 +1,9 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
-die("ERROR syntax: input-from-sgm.perl < in.sgm > in.txt")
+die("ERROR syntax: input-from-sgm.perl < in.sgm > in.txt")
unless scalar @ARGV == 0;
while(my $line = <STDIN>) {
@@ -17,7 +17,7 @@ while(my $line = <STDIN>) {
$line !~ /<seg[^>]+>\s*(.*)\s*<\/seg>/i) {
my $next_line = <STDIN>;
$line .= $next_line;
- chop($line);
+ chop($line);
}
if ($line =~ /<seg[^>]+>\s*(.*)\s*<\/seg>/i) {
my $input = $1;
diff --git a/scripts/ems/support/interpolate-lm.perl b/scripts/ems/support/interpolate-lm.perl
index a2fe62b22..7d52fd877 100755
--- a/scripts/ems/support/interpolate-lm.perl
+++ b/scripts/ems/support/interpolate-lm.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -28,8 +28,8 @@ die("interpolate-lm.perl --tuning set --name out-lm --lm lm0,lm1,lm2,lm3 [--sril
# check and set default to unset parameters
die("ERROR: please specify output language model name --name") unless defined($NAME);
-die("ERROR: please specify tuning set with --tuning") unless defined($TUNING);
-die("ERROR: please specify language models with --lm") unless defined($LM);
+die("ERROR: please specify tuning set with --tuning") unless defined($TUNING);
+die("ERROR: please specify language models with --lm") unless defined($LM);
die("ERROR: can't read $TUNING") unless -e $TUNING;
die("ERROR: did not find srilm dir") unless -e $SRILM;
die("ERROR: cannot run ngram") unless -x $SRILM."/ngram";
@@ -152,7 +152,7 @@ sub interpolate {
$mix =~ /best lambda \(([\d\. e-]+)\)/ || die("ERROR: computing lambdas failed: $mix");
@LAMBDA = split(/ /,$1);
}
-
+
# create new language model
print STDERR "creating new language model...\n";
my $i = 0;
@@ -196,7 +196,7 @@ sub saferun3 {
print STDERR "Executing: @_\n";
my $wtr = gensym();
my $rdr = gensym();
- my $err = gensym();
+ my $err = gensym();
my $pid = open3($wtr, $rdr, $err, @_);
close($wtr);
my $gotout = "";
diff --git a/scripts/ems/support/lmplz-wrapper.perl b/scripts/ems/support/lmplz-wrapper.perl
index f36d2d9e0..df503754f 100755
--- a/scripts/ems/support/lmplz-wrapper.perl
+++ b/scripts/ems/support/lmplz-wrapper.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/mml-filter.perl b/scripts/ems/support/mml-filter.perl
index c50725aae..51bc4cda5 100755
--- a/scripts/ems/support/mml-filter.perl
+++ b/scripts/ems/support/mml-filter.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/mml-score.perl b/scripts/ems/support/mml-score.perl
index 449d6a05c..6f7b724ea 100755
--- a/scripts/ems/support/mml-score.perl
+++ b/scripts/ems/support/mml-score.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/mml-train.perl b/scripts/ems/support/mml-train.perl
index 1f0548082..dcc998711 100755
--- a/scripts/ems/support/mml-train.perl
+++ b/scripts/ems/support/mml-train.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/prepare-fast-align.perl b/scripts/ems/support/prepare-fast-align.perl
index 54c124af0..80fec36b2 100755
--- a/scripts/ems/support/prepare-fast-align.perl
+++ b/scripts/ems/support/prepare-fast-align.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/reference-from-sgm.perl b/scripts/ems/support/reference-from-sgm.perl
index 595226bf1..ebb9ae4ae 100755
--- a/scripts/ems/support/reference-from-sgm.perl
+++ b/scripts/ems/support/reference-from-sgm.perl
@@ -1,9 +1,9 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
-die("ERROR syntax: reference-from-sgm.perl ref src out")
+die("ERROR syntax: reference-from-sgm.perl ref src out")
unless scalar @ARGV == 3;
my ($ref,$src,$txt) = @ARGV;
diff --git a/scripts/ems/support/remove-segmentation-markup.perl b/scripts/ems/support/remove-segmentation-markup.perl
index d6333f813..a0bd61fff 100755
--- a/scripts/ems/support/remove-segmentation-markup.perl
+++ b/scripts/ems/support/remove-segmentation-markup.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -6,7 +6,7 @@ use strict;
$|++;
while(<STDIN>) {
- s/ \|\d+\-\d+\| / /g;
- s/ \|\d+\-\d+\|$//;
+ s/ \|\d+\-\d+\| / /g;
+ s/ \|\d+\-\d+\|$//;
print $_;
}
diff --git a/scripts/ems/support/report-experiment-scores.perl b/scripts/ems/support/report-experiment-scores.perl
index ef64d4c2d..b649951ce 100755
--- a/scripts/ems/support/report-experiment-scores.perl
+++ b/scripts/ems/support/report-experiment-scores.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id: report-experiment-scores.perl 407 2008-11-10 14:43:31Z philipp $
diff --git a/scripts/ems/support/run-command-on-multiple-refsets.perl b/scripts/ems/support/run-command-on-multiple-refsets.perl
index c3db3c4dc..1e914b44b 100755
--- a/scripts/ems/support/run-command-on-multiple-refsets.perl
+++ b/scripts/ems/support/run-command-on-multiple-refsets.perl
@@ -1,9 +1,9 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
-die("ERROR: syntax: run-command-on-multiple-refsets.perl cmd in out")
+die("ERROR: syntax: run-command-on-multiple-refsets.perl cmd in out")
unless scalar @ARGV == 3;
my ($cmd,$in,$out) = @ARGV;
diff --git a/scripts/ems/support/run-wade.perl b/scripts/ems/support/run-wade.perl
index 25cda3bb3..175948b98 100755
--- a/scripts/ems/support/run-wade.perl
+++ b/scripts/ems/support/run-wade.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/split-sentences.perl b/scripts/ems/support/split-sentences.perl
index f1af451b3..02a1e2315 100755
--- a/scripts/ems/support/split-sentences.perl
+++ b/scripts/ems/support/split-sentences.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# Based on Preprocessor written by Philipp Koehn
@@ -97,21 +97,21 @@ sub preprocess {
$text =~ s/ \n/\n/g;
$text =~ s/^ //g;
$text =~ s/ $//g;
-
+
#####add sentence breaks as needed#####
-
+
#non-period end of sentence markers (?!) followed by sentence starters.
$text =~ s/([?!]) +([\'\"\(\[\¿\¡\p{IsPi}]*[\p{IsUpper}])/$1\n$2/g;
-
+
#multi-dots followed by sentence starters
$text =~ s/(\.[\.]+) +([\'\"\(\[\¿\¡\p{IsPi}]*[\p{IsUpper}])/$1\n$2/g;
-
+
# add breaks for sentences that end with some sort of punctuation inside a quote or parenthetical and are followed by a possible sentence starter punctuation and upper case
$text =~ s/([?!\.][\ ]*[\'\"\)\]\p{IsPf}]+) +([\'\"\(\[\¿\¡\p{IsPi}]*[\ ]*[\p{IsUpper}])/$1\n$2/g;
-
+
# add breaks for sentences that end with some sort of punctuation are followed by a sentence starter punctuation and upper case
$text =~ s/([?!\.]) +([\'\"\(\[\¿\¡\p{IsPi}]+[\ ]*[\p{IsUpper}])/$1\n$2/g;
-
+
# special punctuation cases are covered. Check all remaining periods.
my $word;
my $i;
@@ -125,32 +125,32 @@ sub preprocess {
if($prefix && $NONBREAKING_PREFIX{$prefix} && $NONBREAKING_PREFIX{$prefix} == 1 && !$starting_punct) {
#not breaking;
} elsif ($words[$i] =~ /(\.)[\p{IsUpper}\-]+(\.+)$/) {
- #not breaking - upper case acronym
+ #not breaking - upper case acronym
} elsif($words[$i+1] =~ /^([ ]*[\'\"\(\[\¿\¡\p{IsPi}]*[ ]*[\p{IsUpper}0-9])/) {
#the next word has a bunch of initial quotes, maybe a space, then either upper case or a number
$words[$i] = $words[$i]."\n" unless ($prefix && $NONBREAKING_PREFIX{$prefix} && $NONBREAKING_PREFIX{$prefix} == 2 && !$starting_punct && ($words[$i+1] =~ /^[0-9]+/));
#we always add a return for these unless we have a numeric non-breaker and a number start
}
-
+
}
$text = $text.$words[$i]." ";
}
-
+
#we stopped one token from the end to allow for easy look-ahead. Append it now.
$text = $text.$words[$i];
-
+
# clean up spaces at head and tail of each line as well as any double-spacing
$text =~ s/ +/ /g;
$text =~ s/\n /\n/g;
$text =~ s/ \n/\n/g;
$text =~ s/^ //g;
$text =~ s/ $//g;
-
+
#add trailing break
$text .= "\n" unless $text =~ /\n$/;
-
+
return $text;
-
+
}
diff --git a/scripts/ems/support/submit-grid.perl b/scripts/ems/support/submit-grid.perl
index 9997241e7..a0967f9a5 100755
--- a/scripts/ems/support/submit-grid.perl
+++ b/scripts/ems/support/submit-grid.perl
@@ -9,7 +9,7 @@ use File::Basename;
my $continue = 0;
-my $args = "";
+my $args = "";
my $config;
GetOptions("continue=i" => \$continue,
diff --git a/scripts/ems/support/substitute-filtered-tables-and-weights.perl b/scripts/ems/support/substitute-filtered-tables-and-weights.perl
index 681d251c7..13be52c6b 100755
--- a/scripts/ems/support/substitute-filtered-tables-and-weights.perl
+++ b/scripts/ems/support/substitute-filtered-tables-and-weights.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/substitute-filtered-tables.perl b/scripts/ems/support/substitute-filtered-tables.perl
index e7d9f55f8..c5ebabded 100755
--- a/scripts/ems/support/substitute-filtered-tables.perl
+++ b/scripts/ems/support/substitute-filtered-tables.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
@@ -45,7 +45,7 @@ while(my $line = <STDIN>) {
elsif ($feature_section && $line =~ /LexicalReordering/) {
print $arr[$ind]."\n";
++$ind;
- }
+ }
else {
print "$line\n";
}
diff --git a/scripts/ems/support/substitute-weights.perl b/scripts/ems/support/substitute-weights.perl
index 42357ed1e..b692f3f85 100755
--- a/scripts/ems/support/substitute-weights.perl
+++ b/scripts/ems/support/substitute-weights.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
@@ -18,7 +18,7 @@ while(my $line = <BASEINI>) {
}
elsif ($line =~ /\[[a-zA-Z0-9\-]*\]/) {
$inWeightSection = 0;
- }
+ }
if (!$inWeightSection) {
print OUT "$line\n" unless $line =~ /dense weights for feature functions/;
@@ -48,7 +48,7 @@ while(my $line = <WEIGHTINI>) {
elsif ($line =~ /\[[a-zA-Z0-9\-]*\]/) {
print OUT "\n" if $inWeightSection;
$inWeightSection = 0;
- }
+ }
if ($inWeightSection && $line !~ /^\s*$/) {
print OUT "$line\n";
diff --git a/scripts/ems/support/symmetrize-fast-align.perl b/scripts/ems/support/symmetrize-fast-align.perl
index 90621dea9..9f7fec248 100755
--- a/scripts/ems/support/symmetrize-fast-align.perl
+++ b/scripts/ems/support/symmetrize-fast-align.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/thot-lm-wrapper.perl b/scripts/ems/support/thot-lm-wrapper.perl
index 222623c5b..59d483e65 100755
--- a/scripts/ems/support/thot-lm-wrapper.perl
+++ b/scripts/ems/support/thot-lm-wrapper.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/tree-converter-wrapper.perl b/scripts/ems/support/tree-converter-wrapper.perl
index a37654cf1..aae55991a 100755
--- a/scripts/ems/support/tree-converter-wrapper.perl
+++ b/scripts/ems/support/tree-converter-wrapper.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/support/wrap-xml.perl b/scripts/ems/support/wrap-xml.perl
index 28708a62a..52190309a 100755
--- a/scripts/ems/support/wrap-xml.perl
+++ b/scripts/ems/support/wrap-xml.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/web/analysis.php b/scripts/ems/web/analysis.php
index 00bb9e15f..57776dd22 100644
--- a/scripts/ems/web/analysis.php
+++ b/scripts/ems/web/analysis.php
@@ -1,4 +1,4 @@
-<?php
+<?php
# main page frame, triggers the loading of parts
function show_analysis() {
@@ -122,7 +122,7 @@ function precision_by_coverage() {
$log_info[$log_count]["precision"] += $item[1];
$log_info[$log_count]["delete"] += $item[2];
$log_info[$log_count]["length"] += $item[3];
- $log_info[$log_count]["total"] += $item[4];
+ $log_info[$log_count]["total"] += $item[4];
}
print "<h4>By log<sub>2</sub>-count in the training corpus</h4>";
precision_by_coverage_graph("byCoverage",$log_info,$total,$img_width,SORT_NUMERIC);
@@ -159,7 +159,7 @@ function precision_by_coverage_factored($img_width,$total,$file,$factor_id) {
$log_info_factored[$factor][$log_count]["precision"] += $item[2];
$log_info_factored[$factor][$log_count]["delete"] += $item[3];
$log_info_factored[$factor][$log_count]["length"] += $item[4];
- $log_info_factored[$factor][$log_count]["total"] += $item[5];
+ $log_info_factored[$factor][$log_count]["total"] += $item[5];
}
print "<h4>By factor ".factor_name("input",$factor_id)."</h4>";
precision_by_coverage_graph("byFactor",$info_factored_sum,$total,$img_width,SORT_STRING);
@@ -203,7 +203,7 @@ function precision_by_word($type) {
if ($byCoverage != -2 && $byCoverage != $log_count) {
continue;
}
-
+
//# filter for factor
$word = $item[5];
if ($byFactor != "false" && $byFactor != $item[6]) {
@@ -231,7 +231,7 @@ function precision_by_word($type) {
function precision_by_coverage_latex($name,$log_info,$total,$img_width,$sort_type) {
$keys = array_keys($log_info);
sort($keys,$sort_type);
-
+
$img_width /= 100;
print "<div id=\"LatexToggle$name\" onClick=\"document.getElementById('Latex$name').style.display = 'block'; this.style.display = 'none';\" style=\"display:none;\"><font size=-2>(show LaTeX)</font></div>\n";
print "<div id=\"Latex$name\" style=\"display:none;\">\n";
@@ -242,7 +242,7 @@ function precision_by_coverage_latex($name,$log_info,$total,$img_width,$sort_typ
$height = 1.8-$line/10*1.8;
print "\\draw[thin,lightgray] (0.2,-$height) ";
print "node[anchor=east,black] {".$line."0\\%} -- ";
- print "($img_width,-$height) ;<br>\n";
+ print "($img_width,-$height) ;<br>\n";
}
print "% co-ordinates for deletion<br>\n";
for($line=0;$line<=3;$line++) {
@@ -251,7 +251,7 @@ function precision_by_coverage_latex($name,$log_info,$total,$img_width,$sort_typ
if ($line != 0) {
print "node[anchor=east,black] {".$line."0\\%} ";
}
- print "-- ($img_width,-$height) ;<br>\n";
+ print "-- ($img_width,-$height) ;<br>\n";
}
print "% boxes<br>\n";
@@ -265,13 +265,13 @@ function precision_by_coverage_latex($name,$log_info,$total,$img_width,$sort_typ
$width += $x;
$height += $y;
-
+
print "\\filldraw[very thin,gray] ($x,-$y) rectangle($width,-$height) ;<br>";
print "\\draw[very thin,black] ($x,-$y) rectangle($width,-$height);<br>";
if ($width-$x>.1) {
print "\\draw (".(($x+$width)/2).",-1.8) node[anchor=north,black] {".$i."};<br>";
}
-
+
$del_ratio = $log_info[$i]["delete"]/$log_info[$i]["total"];
$height = $del_ratio*1.80;
@@ -281,10 +281,10 @@ function precision_by_coverage_latex($name,$log_info,$total,$img_width,$sort_typ
print "\\filldraw[very thin,lightgray] ($x,-2) rectangle($width,-$height);<br>\n";
print "\\draw[very thin,black] ($x,-2) rectangle($width,-$height);<br>\n";
- $total_so_far += $log_info[$i]["total"];
+ $total_so_far += $log_info[$i]["total"];
}
print "\\end{tikzpicture}</code>";
- print "</div>";
+ print "</div>";
}
function precision_by_coverage_graph($name,$log_info,$total,$img_width,$sort_type) {
@@ -351,7 +351,7 @@ ctx.font = '9px serif';
print "ctx.fillRect ($x, 200, $width, $height);";
$total_so_far += $log_info[$i]["total"];
-
+
if ($width>3) {
print "ctx.fillStyle = \"rgb(0,0,0)\";";
// print "ctx.rotate(-1.5707);";
@@ -410,7 +410,7 @@ function ngram_summary() {
$info["$type-1-correct"],
$info["$type-2-correct"],
$info["$type-3-correct"],
- $info["$type-4-correct"]);
+ $info["$type-4-correct"]);
printf("<tr><td>&nbsp;</td><td>%.1f%s</td><td>%.1f%s</td><td>%.1f%s</td><td>%.1f%s</td></tr>\n",
$info["$type-1-correct"]/$info["$type-1-total"]*100,'%',
$info["$type-2-correct"]/$info["$type-2-total"]*100,'%',
@@ -483,7 +483,7 @@ function ngram_show($type) {
$data = file(get_current_analysis_filename("basic","n-gram-$type.$order"));
for($i=0;$i<count($data);$i++) {
$item = split("\t",$data[$i]);
- $line["total"] = $item[0];
+ $line["total"] = $item[0];
$line["correct"] = $item[1];
$line["ngram"] = $item[2];
$ngram[] = $line;
@@ -496,7 +496,7 @@ function ngram_show($type) {
$sort = 'ratio_worst';
$smooth = 1;
}
-
+
// sort index
for($i=0;$i<count($ngram);$i++) {
if ($sort == "abs_worst") {
@@ -806,7 +806,7 @@ function segmentation_summary() {
if (array_key_exists($in,$count) &&
array_key_exists($out,$count[$in])) {
$c = $count[$in][$out];
- }
+ }
else { $c = 0; }
printf("<td align=right nowrap>%d (%.1f%s)</td>",$c,100*$c/$total,"%");
}
@@ -885,8 +885,8 @@ function bleu_show() {
$count = $_GET['count'];
if ($count == 0) { $count = 5; }
- $filter = "";
- if (array_key_exists("filter",$_GET)) {
+ $filter = "";
+ if (array_key_exists("filter",$_GET)) {
$filter = base64_decode($_GET['filter']);
}
@@ -924,7 +924,7 @@ function bleu_show() {
if ($filter != "") {
print "; filter: '$filter'";
}
-
+
sentence_annotation($count,$filter);
print "<p align=center><A HREF=\"javascript:show('bleu','" . $_GET['sort'] . "',5+$count,'".base64_encode($filter)."')\">5 more</A> | ";
print "<A HREF=\"javascript:show('bleu','" . $_GET['sort'] . "',10+$count,'".base64_encode($filter)."')\">10 more</A> | ";
@@ -950,28 +950,28 @@ function sentence_annotation($count,$filter) {
$word = explode(" ",$item[0]);
$keep = 0;
for($j=0;$j<count($word);$j++) {
- if ($word[$j] == $filter) {
- $keep = 1;
+ if ($word[$j] == $filter) {
+ $keep = 1;
}
}
- if (!$keep) { $filtered[$i] = 1; }
+ if (!$keep) { $filtered[$i] = 1; }
}
- }
+ }
}
-
- # load bleu scores
+
+ # load bleu scores
$data = file(get_current_analysis_filename("basic","bleu-annotation"));
for($i=0;$i<count($data);$i++) {
$item = split("\t",$data[$i]);
if (! array_key_exists($item[1],$filtered)) {
- $line["bleu"] = $item[0];
- $line["id"] = $item[1];
+ $line["bleu"] = $item[0];
+ $line["id"] = $item[1];
$line["system"] = $item[2];
- $line["reference"] = "";
+ $line["reference"] = "";
for($j=3;$j<count($item);$j++) {
if ($j>3) { $line["reference"] .= "<br>"; };
$line["reference"] .= $item[$j];
- }
+ }
$bleu[] = $line;
}
}
@@ -987,7 +987,7 @@ function sentence_annotation($count,$filter) {
else if ($sort == "worst" || $sort == "75") {
$a_idx = $a["bleu"];
$b_idx = $b["bleu"];
- if ($a_idx == $b_idx) {
+ if ($a_idx == $b_idx) {
$a_idx = $b["id"];
$b_idx = $a["id"];
}
@@ -995,7 +995,7 @@ function sentence_annotation($count,$filter) {
else if ($sort == "best" || $sort == "avg" || $sort == "25") {
$a_idx = -$a["bleu"];
$b_idx = -$b["bleu"];
- if ($a_idx == $b_idx) {
+ if ($a_idx == $b_idx) {
$a_idx = $a["id"];
$b_idx = $b["id"];
}
@@ -1021,7 +1021,7 @@ function sentence_annotation($count,$filter) {
$retained = array();
for($i=$offset;$i<$count+$offset && $i<count($bleu);$i++) {
- $line = $bleu[$i];
+ $line = $bleu[$i];
$retained[$line["id"]] = 1;
}
@@ -1056,7 +1056,7 @@ function sentence_annotation($count,$filter) {
list($sentence,$brackets,$nt,$words) = split("\t",$data[$i]);
if ($sentence != $last_sentence) { $span = 0; }
$last_sentence = $sentence;
- if (array_key_exists($sentence,$retained)) {
+ if (array_key_exists($sentence,$retained)) {
$segmentation[$sentence][$span]["brackets"] = $brackets;
# $segmentation[$sentence][$span]["nt"] = $nt;
$segmentation[$sentence][$span]["words"] = rtrim($words);
@@ -1083,7 +1083,7 @@ function sentence_annotation($count,$filter) {
list($sentence,$brackets,$nt,$words) = split("\t",$data[$i]);
if ($sentence != $last_sentence) { $span = 0; }
$last_sentence = $sentence;
- if (array_key_exists($sentence,$retained)) {
+ if (array_key_exists($sentence,$retained)) {
$segmentation_out[$sentence][$span]["brackets"] = $brackets;
$segmentation_out[$sentence][$span]["nt"] = $nt;
$segmentation_out[$sentence][$span]["words"] = rtrim($words);
@@ -1109,7 +1109,7 @@ function sentence_annotation($count,$filter) {
list($sentence,$depth,$start_div,$end_div,$start_div_in,$end_div_in,$children) = split(" ",$data[$i]);
if ($sentence != $last_sentence) { $n = 0; }
$last_sentence = $sentence;
- if (array_key_exists($sentence,$retained)) {
+ if (array_key_exists($sentence,$retained)) {
$node[$sentence][$n]['depth'] = $depth;
$node[$sentence][$n]['start_div'] = $start_div;
$node[$sentence][$n]['end_div'] = $end_div;
@@ -1119,10 +1119,10 @@ function sentence_annotation($count,$filter) {
$n++;
}
}
- }
+ }
# display
- if ($filter != "") {
+ if ($filter != "") {
print " (".(count($input)-count($filtered))." retaining)";
}
print "</font><BR>\n";
@@ -1130,7 +1130,7 @@ function sentence_annotation($count,$filter) {
$biconcor = get_biconcor_version($dir,$set,$id);
//print "<div id=\"debug\">$sort / $offset</div>";
for($i=$offset;$i<$count+$offset && $i<count($bleu);$i++) {
- $line = $bleu[$i];
+ $line = $bleu[$i];
$search_graph_dir = get_current_analysis_filename("basic","search-graph");
if (file_exists($search_graph_dir) && file_exists($search_graph_dir."/graph.".$line["id"])) {
$state = return_state_for_link();
@@ -1279,7 +1279,7 @@ function input_annotation($sentence,$input,$segmentation,$filter) {
print "<tr><td colspan=".($sep_end-$sep_start)."><div style=\"position:relative; z-index:1;\">";
for($j=$sep_start;$j<$sep_end;$j++) {
if ($segmentation && array_key_exists($j,$segmentation["input_start"])) {
- $id = $segmentation["input_start"][$j];
+ $id = $segmentation["input_start"][$j];
print "<span id=\"input-$sentence-$id\" style=\"border-color:#000000; border-style:solid; border-width:1px;\" onmouseover=\"highlight_phrase($sentence,$id);\" onmouseout=\"lowlight_phrase($sentence,$id);\">";
}
if (array_key_exists($j,$coverage)) {
@@ -1413,7 +1413,7 @@ function annotation_hierarchical($sentence,$segmentation,$segmentation_out,$node
print "<span style=\"opacity:0\">|</span>";
}
- $span_word = array();
+ $span_word = array();
if ($words != "") { $span_word = split(" ",$words); }
for($w=0;$w<count($span_word);$w++) {
if ($w > 0) { print " "; }
diff --git a/scripts/ems/web/analysis_diff.php b/scripts/ems/web/analysis_diff.php
index 2f0947e13..214ae1592 100644
--- a/scripts/ems/web/analysis_diff.php
+++ b/scripts/ems/web/analysis_diff.php
@@ -1,4 +1,4 @@
-<?php
+<?php
function diff_analysis() {
global $task,$user,$setup,$id,$id2,$set;
@@ -15,7 +15,7 @@ function diff_analysis() {
print "Run $id2 ($c2) vs $id ($c)";
}
print "</h4>";
-
+
?><script language="javascript" src="javascripts/prototype.js"></script>
<script language="javascript" src="javascripts/scriptaculous.js"></script>
<script>
@@ -96,9 +96,9 @@ function precision_by_coverage_diff() {
$log_info[$log_count]["precision"] += $item[1];
$log_info[$log_count]["delete"] += $item[2];
$log_info[$log_count]["length"] += $item[3];
- $log_info[$log_count]["total"] += $item[4];
+ $log_info[$log_count]["total"] += $item[4];
}
- $log_info_new = $log_info;
+ $log_info_new = $log_info;
// load base data
$data = file(get_current_analysis_filename("precision","precision-by-corpus-coverage"));
@@ -154,7 +154,7 @@ function precision_by_coverage_diff_factored($img_width,$total,$file,$factor_id)
$log_info_factored[$factor][$log_count]["precision"] += $item[2];
$log_info_factored[$factor][$log_count]["delete"] += $item[3];
$log_info_factored[$factor][$log_count]["length"] += $item[4];
- $log_info_factored[$factor][$log_count]["total"] += $item[5];
+ $log_info_factored[$factor][$log_count]["total"] += $item[5];
}
$info_factored_new = $info_factored;
$info_factored_sum_new = $info_factored_sum;
@@ -225,7 +225,7 @@ function precision_by_word_diff($type) {
if ($byCoverage != -2 && $byCoverage != $log_count) {
continue;
}
-
+
//# filter for factor
$word = $item[5];
if ($byFactor != "false" && $byFactor != $item[6]) {
@@ -258,7 +258,7 @@ function precision_by_word_diff($type) {
if ($byCoverage != -2 && $byCoverage != $log_count) {
continue;
}
-
+
//# filter for factor
$word = $item[5];
if ($byFactor != "false" && $byFactor != $item[6]) {
@@ -319,7 +319,7 @@ ctx.fillRect (0, 0, $size, $size);
$surface = $item[5];
$word[$surface] = array();
$word[$surface]["precision"] = $item[0]; # number of precise translations
- $word[$surface]["delete"] = $item[1]; # number of deleted
+ $word[$surface]["delete"] = $item[1]; # number of deleted
$word[$surface]["total"] = $item[2]; # number of all translations
$word[$surface]["coverage"] = $item[4]; # count in training corpus
if ($item[4] == 0) { $log_count = -1; }
@@ -369,7 +369,7 @@ ctx.fillRect (0, 0, $size, $size);
$matrix[$base][$alt]["coverage2"] = 0;
}
# ignore mismatches in source words due to tokenization / casing
- if (array_key_exists($surface,$word)) {
+ if (array_key_exists($surface,$word)) {
$matrix[$base][$alt]["precision1"] += $word[$surface]["precision"];
$matrix[$base][$alt]["delete1"] += $word[$surface]["delete"];
$matrix[$base][$alt]["total1"] += $word[$surface]["total"];
@@ -413,14 +413,14 @@ ctx.fillRect (0, 0, $size, $size);
$prec_imp = (int)(sqrt($prec1-$prec2));
$prec_color = "255,100,100";
}
- else {
+ else {
$prec_base = (int)(sqrt($prec2));
$prec_imp = (int)(sqrt($prec2-$prec1));
$prec_color = "100,255,100";
}
$prec_base_top = (int)(($total-$prec_base)/2);
$prec_imp_top = (int)(($total-$prec_imp)/2);
-
+
$del1 = $matrix[$base][$alt]["delete1"]*$scale;
$del2 = $matrix[$base][$alt]["delete2"]*$scale;
if ($del1 > $del2) {
@@ -428,7 +428,7 @@ ctx.fillRect (0, 0, $size, $size);
$del_imp = $del1-$del2;
$del_color = "150,100,255";
}
- else {
+ else {
$del_base = $del2;
$del_imp = $del2-$del1;
$del_color = "100,200,200";
@@ -470,7 +470,7 @@ ctx.fillRect (0, ".($total+$del_base_height).", $total, $del_imp_height);
function precision_by_coverage_diff_matrix_details() {
$alt = $_GET["alt"];
$base = $_GET["base"];
-
+
$impact_total = 0;
$data = file(get_current_analysis_filename("precision","precision-by-input-word"));
$word = array(); $class = array();
@@ -483,7 +483,7 @@ function precision_by_coverage_diff_matrix_details() {
$surface = $item[5];
$word[$surface] = array();
$word[$surface]["precision"] = $item[0]; # number of precise translations
- $word[$surface]["delete"] = $item[1]; # number of deleted
+ $word[$surface]["delete"] = $item[1]; # number of deleted
$word[$surface]["total"] = $item[2]; # number of all translations
$word[$surface]["coverage"] = $item[4]; # count in training corpus
}
@@ -502,7 +502,7 @@ function precision_by_coverage_diff_matrix_details() {
$surface = $item[5];
if ($log_count-$base == $alt && array_key_exists($surface,$word)) {
$precision = $item[0]; # number of precise translations
- $delete = $item[1]; # number of deleted
+ $delete = $item[1]; # number of deleted
$total = $item[3]; # number of all translations + deletions
$coverage = $item[4]; # count in training corpus
$surface = $item[5];
@@ -527,17 +527,17 @@ function precision_by_coverage_diff_matrix_details() {
}
}
sort($all_out);
- foreach($all_out as $out) { $o = explode("\t",$out); print $o[1]; }
+ foreach($all_out as $out) { $o = explode("\t",$out); print $o[1]; }
print "</table>";
}
function precision_by_coverage_diff_graph($name,$log_info,$log_info_new,$total,$img_width,$sort_type) {
$keys = array_keys($log_info);
sort($keys,$sort_type);
-
+
print "<div id=\"Toggle$name\" onClick=\"document.getElementById('Table$name').style.display = 'none'; this.style.display = 'none';\" style=\"display:none;\"><font size=-2>(hide table)</font></div>\n";
print "<div id=\"Table$name\" style=\"display:none;\">\n";
- print "<table border=1><tr><td align=center>&nbsp;</td><td align=center colspan=3>Precision</td><td align=center colspan=2>Precision Impact</td><td align=center colspan=3>Delete</td><td align=center colspan=2>Delete Impact</td><td align=center>Length</td></tr>\n";
+ print "<table border=1><tr><td align=center>&nbsp;</td><td align=center colspan=3>Precision</td><td align=center colspan=2>Precision Impact</td><td align=center colspan=3>Delete</td><td align=center colspan=2>Delete Impact</td><td align=center>Length</td></tr>\n";
foreach ($keys as $i) {
if (array_key_exists($i,$log_info)) {
print "<tr><td align=center>$i</td>";
@@ -595,7 +595,7 @@ ctx.font = '9px serif';
print "ctx.fillRect ($x, 250, $width, $height);";
$total_so_far += $log_info[$i]["total"];
-
+
if ($width>3) {
print "ctx.fillStyle = \"rgb(0,0,0)\";";
// print "ctx.rotate(-1.5707);";
@@ -763,9 +763,9 @@ function bleu_diff_annotation() {
$data = file(get_analysis_filename($dir,$set,$idx?$id2:$id,"basic","bleu-annotation"));
for($i=0;$i<count($data);$i++) {
$item = split("\t",$data[$i]);
- $annotation[$item[1]]["bleu$idx"] = $item[0];
- $annotation[$item[1]]["system$idx"] = $item[2];
- $annotation[$item[1]]["reference"] = $item[3];
+ $annotation[$item[1]]["bleu$idx"] = $item[0];
+ $annotation[$item[1]]["system$idx"] = $item[2];
+ $annotation[$item[1]]["reference"] = $item[3];
$annotation[$item[1]]["id"] = $item[1];
}
}
@@ -825,7 +825,7 @@ function bleu_diff_annotation() {
// display
for($i=0;$i<$count && $i<count($annotation);$i++) {
- $line = $annotation[$i];
+ $line = $annotation[$i];
print "<font size=-2>[src]</font> ".$input[$line["id"]]."<br>";
$word_with_score1 = split(" ",$line["system1"]);
@@ -840,7 +840,7 @@ function bleu_diff_annotation() {
$matched_with_score1 = preg_replace('/D/',"",$matched_with_score);
bleu_line_diff( $word_with_score1, $matched1, $matched_with_score1 );
- print "<font size=-2>[".$id."-".$line["id"].":".$line["bleu0"]."]</font> ";
+ print "<font size=-2>[".$id."-".$line["id"].":".$line["bleu0"]."]</font> ";
$matched0 = preg_replace('/I/',"",$matched);
$matched_with_score0 = preg_replace('/I/',"",$matched_with_score);
bleu_line_diff( $word_with_score0, $matched0, $matched_with_score0 );
@@ -875,14 +875,14 @@ function ngram_diff($type) {
// load data
$order = $_GET['order'];
-
+
for($idx=0;$idx<2;$idx++) {
$data = file(get_analysis_filename($dir,$set,$idx?$id2:$id,"basic","n-gram-$type.$order"));
for($i=0;$i<count($data);$i++) {
$item = split("\t",$data[$i]);
- $ngram_hash[$item[2]]["total$idx"] = $item[0];
+ $ngram_hash[$item[2]]["total$idx"] = $item[0];
$ngram_hash[$item[2]]["correct$idx"] = $item[1];
- }
+ }
unset($data);
}
@@ -893,7 +893,7 @@ function ngram_diff($type) {
$sort = 'ratio_worse';
$smooth = 1;
}
-
+
error_reporting(E_ERROR); // otherwise undefined counts trigger notices
// sort index
@@ -914,12 +914,12 @@ function ngram_diff($type) {
+ (2*$item["correct0"] - $item["total0"]);
}
else if ($sort == "ratio_worse") {
- $item["index"] =
+ $item["index"] =
($item["correct1"] + $smooth) / ($item["total1"] + $smooth)
- ($item["correct0"] + $smooth) / ($item["total0"] + $smooth);
}
else if ($sort == "ratio_better") {
- $item["index"] =
+ $item["index"] =
- ($item["correct1"] + $smooth) / ($item["total1"] + $smooth)
+ ($item["correct0"] + $smooth) / ($item["total0"] + $smooth);
}
@@ -1010,7 +1010,7 @@ function ngram_diff($type) {
}
else {
printf("<td align=right>%+d</td><td>(%d)</td><td align=right>%+d</td><td>(%d)</td></tr>", $ok_diff,$ok,$wrong_diff,$wrong);
- }
+ }
}
print "</table>\n";
}
diff --git a/scripts/ems/web/bilingual-concordance.css b/scripts/ems/web/bilingual-concordance.css
index 4648a21dd..f9941175e 100644
--- a/scripts/ems/web/bilingual-concordance.css
+++ b/scripts/ems/web/bilingual-concordance.css
@@ -17,57 +17,57 @@
text-align: center;
}
-table.biconcor {
- table-layout: fixed;
- padding: 0px;
- margin: 0px;
+table.biconcor {
+ table-layout: fixed;
+ padding: 0px;
+ margin: 0px;
}
-tr.biconcor {
- padding: 0px;
- margin: 0px;
+tr.biconcor {
+ padding: 0px;
+ margin: 0px;
}
-td.biconcor {
- white-space: nowrap;
- overflow: hidden;
- padding: 0px;
- margin: 0px;
+td.biconcor {
+ white-space: nowrap;
+ overflow: hidden;
+ padding: 0px;
+ margin: 0px;
}
-td.pp_source_left {
+td.pp_source_left {
font-size: 70%;
- text-align: right;
+ text-align: right;
}
-td.pp_target_left {
+td.pp_target_left {
font-size: 70%;
- text-align: right;
+ text-align: right;
}
-td.pp_source {
+td.pp_source {
font-size: 70%;
- font-weight: bold;
+ font-weight: bold;
}
-td.pp_target {
+td.pp_target {
font-size: 70%;
- font-weight: bold;
+ font-weight: bold;
}
-td.mismatch_target {
+td.mismatch_target {
font-size: 70%;
text-align: center;
}
-td.pp_source_right {
+td.pp_source_right {
font-size: 70%;
- border-style:solid;
- border-width:0px 2px 0px 0px ;
- border-color: black;
+ border-style:solid;
+ border-width:0px 2px 0px 0px ;
+ border-color: black;
}
-td.pp_target_right {
+td.pp_target_right {
font-size: 70%;
}
@@ -88,11 +88,11 @@ span.mismatch_misaligned {
}
span.mismatch_aligned {
- font-weight: bold;
+ font-weight: bold;
}
td.pp_more {
font-size: 70%;
color: navy;
- text-align: center;
+ text-align: center;
}
diff --git a/scripts/ems/web/comment.php b/scripts/ems/web/comment.php
index 04628ea4f..dcd51ab00 100644
--- a/scripts/ems/web/comment.php
+++ b/scripts/ems/web/comment.php
@@ -1,4 +1,4 @@
-<?php
+<?php
$fp = fopen("comment","a");
fwrite($fp,$_GET{'run'} . ";" . $_GET{'text'} . "\n");
fclose($fp);
diff --git a/scripts/ems/web/diff.php b/scripts/ems/web/diff.php
index 71e732af1..f440d3240 100644
--- a/scripts/ems/web/diff.php
+++ b/scripts/ems/web/diff.php
@@ -37,7 +37,7 @@ function compute_diff($base,$change) {
foreach ($all_parameters as $parameter) {
if (!array_key_exists($parameter,$parameter_base)) {
$parameter_base[$parameter] = "";
- }
+ }
if (!array_key_exists($parameter,$parameter_change)) {
$parameter_change[$parameter] = "";
}
diff --git a/scripts/ems/web/hierarchical-segmentation.css b/scripts/ems/web/hierarchical-segmentation.css
index 47f2c2693..e66555070 100644
--- a/scripts/ems/web/hierarchical-segmentation.css
+++ b/scripts/ems/web/hierarchical-segmentation.css
@@ -10,7 +10,7 @@ div.leaf {
border:1px solid black;
margin:2px;
padding:0px;
- font-weight: normal;
+ font-weight: normal;
background-color: white;
}
div.empty {
@@ -34,7 +34,7 @@ div.continued {
margin-right: 0px;
padding: 0px;
padding-top: 0px;
- font-weight: normal;
+ font-weight: normal;
background-color: white;
}
div.opening {
@@ -44,7 +44,7 @@ div.opening {
margin:2px;
margin-right: 0px;
padding: 0px;
- font-weight: normal;
+ font-weight: normal;
background-color: white;
}
div.closing {
@@ -54,6 +54,6 @@ div.closing {
margin:2px;
margin-left: 0px;
padding: 0px;
- font-weight: normal;
+ font-weight: normal;
background-color: white;
}
diff --git a/scripts/ems/web/hierarchical-segmentation.js b/scripts/ems/web/hierarchical-segmentation.js
index b4eb206ce..819ed121e 100644
--- a/scripts/ems/web/hierarchical-segmentation.js
+++ b/scripts/ems/web/hierarchical-segmentation.js
@@ -1,6 +1,6 @@
-var nodeIn = [];
+var nodeIn = [];
var nodeOut = [];
-var nodeChildren = [];
+var nodeChildren = [];
var max_depth = [];
var span_count_in = [];
var span_count_out = [];
diff --git a/scripts/ems/web/index.php b/scripts/ems/web/index.php
index d216b114a..9c918a96a 100644
--- a/scripts/ems/web/index.php
+++ b/scripts/ems/web/index.php
@@ -29,7 +29,7 @@ else if (array_key_exists("setup",$_POST) || array_key_exists("setup",$_GET)) {
if (array_key_exists("show",$_GET)) { show(); }
else if (array_key_exists("diff",$_GET)) { diff(); }
- else if (array_key_exists("analysis",$_GET)) {
+ else if (array_key_exists("analysis",$_GET)) {
$action = $_GET["analysis"];
$set = $_GET["set"];
$id = $_GET["id"];
@@ -60,7 +60,7 @@ else if (array_key_exists("setup",$_POST) || array_key_exists("setup",$_GET)) {
if ($match[2] == $set) {
$id_array[] = $match[1];
}
- }
+ }
}
if (count($id_array) != 2) {
print "ERROR: comp 2!";
diff --git a/scripts/ems/web/javascripts/scriptaculous-js-1.8.3/src/unittest.js b/scripts/ems/web/javascripts/scriptaculous-js-1.8.3/src/unittest.js
index 33a0c7157..cd3d19b39 100644
--- a/scripts/ems/web/javascripts/scriptaculous-js-1.8.3/src/unittest.js
+++ b/scripts/ems/web/javascripts/scriptaculous-js-1.8.3/src/unittest.js
@@ -19,10 +19,10 @@ Event.simulateMouse = function(element, eventName) {
metaKey: false
}, arguments[2] || {});
var oEvent = document.createEvent("MouseEvents");
- oEvent.initMouseEvent(eventName, true, true, document.defaultView,
- options.buttons, options.pointerX, options.pointerY, options.pointerX, options.pointerY,
+ oEvent.initMouseEvent(eventName, true, true, document.defaultView,
+ options.buttons, options.pointerX, options.pointerY, options.pointerX, options.pointerY,
diff --git a/scripts/ems/web/javascripts/sound.js b/scripts/ems/web/javascripts/sound.js
index a286eb98e..0a63379fb 100644
--- a/scripts/ems/web/javascripts/sound.js
+++ b/scripts/ems/web/javascripts/sound.js
@@ -56,4 +56,4 @@ if(Prototype.Browser.Gecko && navigator.userAgent.indexOf("Win") > 0){
Sound.template = new Template('<embed type="audio/x-pn-realaudio-plugin" style="height:0" id="sound_#{track}_#{id}" src="#{url}" loop="false" autostart="true" hidden="true"/>');
else
Sound.play = function(){};
-} \ No newline at end of file
+} \ No newline at end of file
diff --git a/scripts/ems/web/javascripts/unittest.js b/scripts/ems/web/javascripts/unittest.js
index 33a0c7157..cd3d19b39 100644
--- a/scripts/ems/web/javascripts/unittest.js
+++ b/scripts/ems/web/javascripts/unittest.js
@@ -19,10 +19,10 @@ Event.simulateMouse = function(element, eventName) {
metaKey: false
}, arguments[2] || {});
var oEvent = document.createEvent("MouseEvents");
- oEvent.initMouseEvent(eventName, true, true, document.defaultView,
- options.buttons, options.pointerX, options.pointerY, options.pointerX, options.pointerY,
+ oEvent.initMouseEvent(eventName, true, true, document.defaultView,
+ options.buttons, options.pointerX, options.pointerY, options.pointerX, options.pointerY,
options.ctrlKey, options.altKey, options.shiftKey, options.metaKey, 0, $(element));
-
+
if(this.mark) Element.remove(this.mark);
this.mark = document.createElement('div');
this.mark.appendChild(document.createTextNode(" "));
@@ -34,10 +34,10 @@ Event.simulateMouse = function(element, eventName) {
this.mark.style.height = "5px;";
this.mark.style.borderTop = "1px solid red;";
this.mark.style.borderLeft = "1px solid red;";
-
+
if(this.step)
alert('['+new Date().getTime().toString()+'] '+eventName+'/'+Test.Unit.inspect(options));
-
+
$(element).dispatchEvent(oEvent);
};
@@ -55,7 +55,7 @@ Event.simulateKey = function(element, eventName) {
}, arguments[2] || {});
var oEvent = document.createEvent("KeyEvents");
- oEvent.initKeyEvent(eventName, true, true, window,
+ oEvent.initKeyEvent(eventName, true, true, window,
options.ctrlKey, options.altKey, options.shiftKey, options.metaKey,
options.keyCode, options.charCode );
$(element).dispatchEvent(oEvent);
@@ -123,7 +123,7 @@ Test.Unit.Logger.prototype = {
_toHTML: function(txt) {
return txt.escapeHTML().replace(/\n/g,"<br/>");
},
- addLinksToResults: function(){
+ addLinksToResults: function(){
$$("tr.failed .nameCell").each( function(td){ // todo: limit to children of this.log
td.title = "Run only this test";
Event.observe(td, 'click', function(){ window.location.search = "?tests=" + td.innerHTML;});
@@ -162,7 +162,7 @@ Test.Unit.Runner.prototype = {
if(/^test/.test(testcase)) {
this.tests.push(
new Test.Unit.Testcase(
- this.options.context ? ' -> ' + this.options.titles[testcase] : testcase,
+ this.options.context ? ' -> ' + this.options.titles[testcase] : testcase,
testcases[testcase], testcases["setup"], testcases["teardown"]
));
}
@@ -203,7 +203,7 @@ Test.Unit.Runner.prototype = {
},
postResults: function() {
if (this.options.resultsURL) {
- new Ajax.Request(this.options.resultsURL,
+ new Ajax.Request(this.options.resultsURL,
{ method: 'get', parameters: 'result=' + this.getResult(), asynchronous: false });
}
},
@@ -240,9 +240,9 @@ Test.Unit.Runner.prototype = {
errors += this.tests[i].errors;
}
return (
- (this.options.context ? this.options.context + ': ': '') +
- this.tests.length + " tests, " +
- assertions + " assertions, " +
+ (this.options.context ? this.options.context + ': ': '') +
+ this.tests.length + " tests, " +
+ assertions + " assertions, " +
failures + " failures, " +
errors + " errors");
}
@@ -258,7 +258,7 @@ Test.Unit.Assertions.prototype = {
},
summary: function() {
return (
- this.assertions + " assertions, " +
+ this.assertions + " assertions, " +
this.failures + " failures, " +
this.errors + " errors" + "\n" +
this.messages.join("\n"));
@@ -284,55 +284,55 @@ Test.Unit.Assertions.prototype = {
},
assert: function(expression) {
var message = arguments[1] || 'assert: got "' + Test.Unit.inspect(expression) + '"';
- try { expression ? this.pass() :
+ try { expression ? this.pass() :
this.fail(message); }
catch(e) { this.error(e); }
},
assertEqual: function(expected, actual) {
var message = arguments[2] || "assertEqual";
try { (expected == actual) ? this.pass() :
- this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
+ this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
'", actual "' + Test.Unit.inspect(actual) + '"'); }
catch(e) { this.error(e); }
},
assertInspect: function(expected, actual) {
var message = arguments[2] || "assertInspect";
try { (expected == actual.inspect()) ? this.pass() :
- this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
+ this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
'", actual "' + Test.Unit.inspect(actual) + '"'); }
catch(e) { this.error(e); }
},
assertEnumEqual: function(expected, actual) {
var message = arguments[2] || "assertEnumEqual";
- try { $A(expected).length == $A(actual).length &&
+ try { $A(expected).length == $A(actual).length &&
expected.zip(actual).all(function(pair) { return pair[0] == pair[1] }) ?
- this.pass() : this.fail(message + ': expected ' + Test.Unit.inspect(expected) +
+ this.pass() : this.fail(message + ': expected ' + Test.Unit.inspect(expected) +
', actual ' + Test.Unit.inspect(actual)); }
catch(e) { this.error(e); }
},
assertNotEqual: function(expected, actual) {
var message = arguments[2] || "assertNotEqual";
- try { (expected != actual) ? this.pass() :
+ try { (expected != actual) ? this.pass() :
this.fail(message + ': got "' + Test.Unit.inspect(actual) + '"'); }
catch(e) { this.error(e); }
},
- assertIdentical: function(expected, actual) {
- var message = arguments[2] || "assertIdentical";
- try { (expected === actual) ? this.pass() :
- this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
- '", actual "' + Test.Unit.inspect(actual) + '"'); }
- catch(e) { this.error(e); }
+ assertIdentical: function(expected, actual) {
+ var message = arguments[2] || "assertIdentical";
+ try { (expected === actual) ? this.pass() :
+ this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
+ '", actual "' + Test.Unit.inspect(actual) + '"'); }
+ catch(e) { this.error(e); }
},
- assertNotIdentical: function(expected, actual) {
- var message = arguments[2] || "assertNotIdentical";
- try { !(expected === actual) ? this.pass() :
- this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
- '", actual "' + Test.Unit.inspect(actual) + '"'); }
- catch(e) { this.error(e); }
+ assertNotIdentical: function(expected, actual) {
+ var message = arguments[2] || "assertNotIdentical";
+ try { !(expected === actual) ? this.pass() :
+ this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
+ '", actual "' + Test.Unit.inspect(actual) + '"'); }
+ catch(e) { this.error(e); }
},
assertNull: function(obj) {
var message = arguments[1] || 'assertNull';
- try { (obj==null) ? this.pass() :
+ try { (obj==null) ? this.pass() :
this.fail(message + ': got "' + Test.Unit.inspect(obj) + '"'); }
catch(e) { this.error(e); }
},
@@ -353,38 +353,38 @@ Test.Unit.Assertions.prototype = {
},
assertType: function(expected, actual) {
var message = arguments[2] || 'assertType';
- try {
- (actual.constructor == expected) ? this.pass() :
- this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
+ try {
+ (actual.constructor == expected) ? this.pass() :
+ this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
'", actual "' + (actual.constructor) + '"'); }
catch(e) { this.error(e); }
},
assertNotOfType: function(expected, actual) {
var message = arguments[2] || 'assertNotOfType';
- try {
- (actual.constructor != expected) ? this.pass() :
- this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
+ try {
+ (actual.constructor != expected) ? this.pass() :
+ this.fail(message + ': expected "' + Test.Unit.inspect(expected) +
'", actual "' + (actual.constructor) + '"'); }
catch(e) { this.error(e); }
},
assertInstanceOf: function(expected, actual) {
var message = arguments[2] || 'assertInstanceOf';
- try {
- (actual instanceof expected) ? this.pass() :
+ try {
+ (actual instanceof expected) ? this.pass() :
this.fail(message + ": object was not an instance of the expected type"); }
- catch(e) { this.error(e); }
+ catch(e) { this.error(e); }
},
assertNotInstanceOf: function(expected, actual) {
var message = arguments[2] || 'assertNotInstanceOf';
- try {
- !(actual instanceof expected) ? this.pass() :
+ try {
+ !(actual instanceof expected) ? this.pass() :
this.fail(message + ": object was an instance of the not expected type"); }
- catch(e) { this.error(e); }
+ catch(e) { this.error(e); }
},
assertRespondsTo: function(method, obj) {
var message = arguments[2] || 'assertRespondsTo';
try {
- (obj[method] && typeof obj[method] == 'function') ? this.pass() :
+ (obj[method] && typeof obj[method] == 'function') ? this.pass() :
this.fail(message + ": object doesn't respond to [" + method + "]"); }
catch(e) { this.error(e); }
},
@@ -393,7 +393,7 @@ Test.Unit.Assertions.prototype = {
try {
var m = obj[method];
if(!m) m = obj['is'+method.charAt(0).toUpperCase()+method.slice(1)];
- m() ? this.pass() :
+ m() ? this.pass() :
this.fail(message + ": method returned false"); }
catch(e) { this.error(e); }
},
@@ -402,17 +402,17 @@ Test.Unit.Assertions.prototype = {
try {
var m = obj[method];
if(!m) m = obj['is'+method.charAt(0).toUpperCase()+method.slice(1)];
- !m() ? this.pass() :
+ !m() ? this.pass() :
this.fail(message + ": method returned true"); }
catch(e) { this.error(e); }
},
assertRaise: function(exceptionName, method) {
var message = arguments[2] || 'assertRaise';
- try {
+ try {
method();
this.fail(message + ": exception expected but none was raised"); }
catch(e) {
- ((exceptionName == null) || (e.name==exceptionName)) ? this.pass() : this.error(e);
+ ((exceptionName == null) || (e.name==exceptionName)) ? this.pass() : this.error(e);
}
},
assertElementsMatch: function() {
@@ -434,7 +434,7 @@ Test.Unit.Assertions.prototype = {
var startAt = new Date();
(iterations || 1).times(operation);
var timeTaken = ((new Date())-startAt);
- this.info((arguments[2] || 'Operation') + ' finished ' +
+ this.info((arguments[2] || 'Operation') + ' finished ' +
iterations + ' iterations in ' + (timeTaken/1000)+'s' );
return timeTaken;
},
@@ -444,7 +444,7 @@ Test.Unit.Assertions.prototype = {
this.assertNotNull(element);
if(element.style && Element.getStyle(element, 'display') == 'none')
return false;
-
+
return this._isVisible(element.parentNode);
},
assertNotVisible: function(element) {
@@ -457,7 +457,7 @@ Test.Unit.Assertions.prototype = {
var startAt = new Date();
(iterations || 1).times(operation);
var timeTaken = ((new Date())-startAt);
- this.info((arguments[2] || 'Operation') + ' finished ' +
+ this.info((arguments[2] || 'Operation') + ' finished ' +
iterations + ' iterations in ' + (timeTaken/1000)+'s' );
return timeTaken;
}
@@ -468,7 +468,7 @@ Object.extend(Object.extend(Test.Unit.Testcase.prototype, Test.Unit.Assertions.p
initialize: function(name, test, setup, teardown) {
Test.Unit.Assertions.prototype.initialize.bind(this)();
this.name = name;
-
+
if(typeof test == 'string') {
test = test.gsub(/(\.should[^\(]+\()/,'#{0}this,');
test = test.gsub(/(\.should[^\(]+)\(this,\)/,'#{1}(this)');
@@ -478,7 +478,7 @@ Object.extend(Object.extend(Test.Unit.Testcase.prototype, Test.Unit.Assertions.p
} else {
this.test = test || function() {};
}
-
+
this.setup = setup || function() {};
this.teardown = teardown || function() {};
this.isWaiting = false;
@@ -519,23 +519,23 @@ Test.setupBDDExtensionMethods = function(){
shouldNotBeAn: 'assertNotOfType',
shouldBeNull: 'assertNull',
shouldNotBeNull: 'assertNotNull',
-
+
shouldBe: 'assertReturnsTrue',
shouldNotBe: 'assertReturnsFalse',
shouldRespondTo: 'assertRespondsTo'
};
- var makeAssertion = function(assertion, args, object) {
+ var makeAssertion = function(assertion, args, object) {
this[assertion].apply(this,(args || []).concat([object]));
};
-
- Test.BDDMethods = {};
- $H(METHODMAP).each(function(pair) {
- Test.BDDMethods[pair.key] = function() {
- var args = $A(arguments);
- var scope = args.shift();
- makeAssertion.apply(scope, [pair.value, args, this]); };
+
+ Test.BDDMethods = {};
+ $H(METHODMAP).each(function(pair) {
+ Test.BDDMethods[pair.key] = function() {
+ var args = $A(arguments);
+ var scope = args.shift();
+ makeAssertion.apply(scope, [pair.value, args, this]); };
});
-
+
[Array.prototype, String.prototype, Number.prototype, Boolean.prototype].each(
function(p){ Object.extend(p, Test.BDDMethods) }
);
@@ -543,7 +543,7 @@ Test.setupBDDExtensionMethods = function(){
Test.context = function(name, spec, log){
Test.setupBDDExtensionMethods();
-
+
var compiledSpec = {};
var titles = {};
for(specName in spec) {
@@ -557,7 +557,7 @@ Test.context = function(name, spec, log){
var body = spec[specName].toString().split('\n').slice(1);
if(/^\{/.test(body[0])) body = body.slice(1);
body.pop();
- body = body.map(function(statement){
+ body = body.map(function(statement){
return statement.strip()
});
compiledSpec[testName] = body.join('\n');
diff --git a/scripts/ems/web/lib.php b/scripts/ems/web/lib.php
index 6d936ddf4..68c58860b 100644
--- a/scripts/ems/web/lib.php
+++ b/scripts/ems/web/lib.php
@@ -21,7 +21,7 @@ function load_experiment_info() {
file_exists($dir."/steps/1")) {
$topd = dir($dir."/steps");
while (false !== ($run = $topd->read())) {
- if (preg_match('/^([0-9]+)$/',$run,$match)
+ if (preg_match('/^([0-9]+)$/',$run,$match)
&& $run>0
&& !file_exists("$dir/steps/$run/deleted.$run")) {
$d = dir($dir."/steps/$run");
@@ -49,7 +49,7 @@ function load_experiment_info() {
}
$experiment[$id]->start = $stat[9];
}
-
+
reset($experiment);
while (list($id,$info) = each($experiment)) {
if (file_exists("$dir/evaluation/report.$id")) {
@@ -57,15 +57,15 @@ function load_experiment_info() {
foreach ($f as $line_num => $line) {
if (preg_match('/^(.+): (.+)/',$line,$match)) {
$experiment[$id]->result[$match[1]] = $match[2];
- if (!$evalset || !array_key_exists($match[1],$evalset)) {
- $evalset[$match[1]] = 0;
+ if (!$evalset || !array_key_exists($match[1],$evalset)) {
+ $evalset[$match[1]] = 0;
}
$evalset[$match[1]]++;
}
}
}
}
-
+
krsort($experiment);
uksort($evalset,"evalsetsort");
}
@@ -81,7 +81,7 @@ function load_parameter($run) {
if (file_exists($dir."/steps/new") ||
file_exists($dir."/steps/$run")) {
$file = file("$dir/steps/$run/parameter.$run");
- }
+ }
else {
$file = file("$dir/steps/parameter.$run");
}
@@ -121,7 +121,7 @@ function process_file_entry($dir,$entry) {
if (file_exists($file.".STDOUT")) { $stat2 = stat($file.".STDOUT"); }
if ($stat2[9] > $stat[9]) { $stat = $stat2; }
$time = $stat[9];
-
+
if (!$experiment || !array_key_exists($run,$experiment) ||
!property_exists($experiment[$run],"last_step_time") ||
$time > $experiment[$run]->last_step_time) {
@@ -142,7 +142,7 @@ function get_analysis_version($dir,$set,$id) {
#while(list($type,$i) = each($analysis_version[$id][$set])) {
# print "$type=$i ";
#}
- #print ") FROM CACHE<br>";
+ #print ") FROM CACHE<br>";
return $analysis_version[$id][$set];
}
$analysis_version[$id][$set]["basic"] = 0;
@@ -188,7 +188,7 @@ function get_analysis_version($dir,$set,$id) {
file_exists("$dir/model/biconcor.$match[1]")) {
$analysis_version[$id][$set]["biconcor"] = $match[1];
}
- }
+ }
}
# legacy stuff below...
@@ -225,7 +225,7 @@ function get_analysis_version($dir,$set,$id) {
#while(list($type,$i) = each($analysis_version[$id][$set])) {
# print "$type=$i ";
#}
- #print ") ZZ<br>";
+ #print ") ZZ<br>";
return $analysis_version[$id][$set];
}
diff --git a/scripts/ems/web/overview.php b/scripts/ems/web/overview.php
index ce0434bb8..534c7d8c0 100644
--- a/scripts/ems/web/overview.php
+++ b/scripts/ems/web/overview.php
@@ -39,7 +39,7 @@ function overview() {
$report_info = "$dir/steps/$id/REPORTING_report.$id.INFO";
// does the analysis file exist?
if (file_exists($analysis)) {
- if (!array_key_exists($set,$has_analysis)) {
+ if (!array_key_exists($set,$has_analysis)) {
$has_analysis[$set] = 0;
}
$has_analysis[$set]++;
@@ -138,7 +138,7 @@ new Ajax.Updater("<?php print "$module_step[0]-$module_step[1]-$id"; ?>", '?setS
}
}
}
- else { $score = ""; }
+ else { $score = ""; }
}
}
print "var best_score = [];\n";
@@ -158,9 +158,9 @@ function getHTTPObject(){
alert("Your browser does not support AJAX.");
return null;
}
-}
+}
function createCommentBox( runID ) {
- document.getElementById("run-" + runID).innerHTML = "<form onsubmit=\"return false;\"><input id=\"comment-" + runID + "\" name=\"comment-" + runID + "\" size=30><br><input type=submit onClick=\"addComment('" + runID + "');\" value=\"Add Comment\"></form>";
+ document.getElementById("run-" + runID).innerHTML = "<form onsubmit=\"return false;\"><input id=\"comment-" + runID + "\" name=\"comment-" + runID + "\" size=30><br><input type=submit onClick=\"addComment('" + runID + "');\" value=\"Add Comment\"></form>";
if (currentComment[runID]) {
document.getElementById("comment-" + runID).value = currentComment[runID];
}
@@ -196,7 +196,7 @@ function highlightBest() {
for (run in score) {
var column = "score-"+run+"-"+set;
if ($(column)) {
- if (score[run][set] == best_score[set]) {
+ if (score[run][set] == best_score[set]) {
$(column).setStyle({ backgroundColor: '#a0ffa0'});
}
else if (score[run][set]+1 >= best_score[set]) {
@@ -219,7 +219,7 @@ function highlightLine( id ) {
$(column).setStyle({ backgroundColor: '#ffffff'});
}
else {
- if (score[run][set] < score[id][set]-1) {
+ if (score[run][set] < score[id][set]-1) {
$(column).setStyle({ backgroundColor: '#ffa0a0'});
}
else if (score[run][set] < score[id][set]) {
@@ -234,7 +234,7 @@ function highlightLine( id ) {
}
}
}
- }
+ }
}
function lowlightAll() {
for (run in score) {
@@ -298,13 +298,13 @@ function output_score($id,$info) {
preg_match('/([\d\(\)\.\s]+) (METEOR[\-c]*)/',$each_score[$i],$match)) {
if ($i>0) { print "<BR>"; }
$opened_a_tag = 0;
- if ($set != "avg") {
+ if ($set != "avg") {
if (file_exists("$dir/evaluation/$set.cleaned.$id")) {
- print "<a href=\"?$state&show=evaluation/$set.cleaned.$id\">";
+ print "<a href=\"?$state&show=evaluation/$set.cleaned.$id\">";
$opened_a_tag = 1;
}
else if (file_exists("$dir/evaluation/$set.output.$id")) {
- print "<a href=\"?$state&show=evaluation/$set.output.$id\">";
+ print "<a href=\"?$state&show=evaluation/$set.output.$id\">";
$opened_a_tag = 1;
}
}
@@ -336,7 +336,7 @@ function tune_status($id) {
if (! file_exists($dir."/tuning/tmp.".$id)) { return ""; }
$d = dir($dir."/tuning/tmp.".$id);
while (false !== ($entry = $d->read())) {
- if (preg_match('/run(\d+).moses.ini/',$entry,$match)
+ if (preg_match('/run(\d+).moses.ini/',$entry,$match)
&& $match[1] > $max_iteration) {
$max_iteration = $match[1];
}
@@ -383,7 +383,7 @@ function show() {
}
$fullname = $dir."/steps/".$extra.$_GET["show"];
- if (preg_match("/\//",$_GET["show"])) {
+ if (preg_match("/\//",$_GET["show"])) {
$fullname = $dir."/".$_GET["show"];
}
if (preg_match("/graph/",$fullname)) {
diff --git a/scripts/ems/web/progress.perl b/scripts/ems/web/progress.perl
index fd742e410..fa2ce9e8f 100755
--- a/scripts/ems/web/progress.perl
+++ b/scripts/ems/web/progress.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/ems/web/sgviz.js b/scripts/ems/web/sgviz.js
index 03ad4741a..3e63260ee 100644
--- a/scripts/ems/web/sgviz.js
+++ b/scripts/ems/web/sgviz.js
@@ -54,7 +54,7 @@ function process_hypotheses() {
//
// INITIALIZATION
-//
+//
function index_hypotheses_by_cell() {
// init edge_lists
@@ -93,7 +93,7 @@ function find_reachable_hypotheses_recursive( id ) {
find_reachable_hypotheses_recursive( children[c] );
}
}
-
+
function compute_best_derivation_scores() {
for(var from=0; from<length; from++ ) {
cell_derivation_score[from] = Array();
@@ -147,7 +147,7 @@ function click_menu( id, force_flag ) {
return;
}
menu_processing = 1;
-
+
if (current_menu_selection == 1) { best_derivation(0); }
if (current_menu_selection == 2) { unannotate_cells(); }
if (current_menu_selection == 3) { unannotate_cells(); }
@@ -155,7 +155,7 @@ function click_menu( id, force_flag ) {
if (current_menu_selection == 5) { remove_non_terminal_treemap(0); }
if (current_menu_selection == 6 && SORT_OPTION != 3) { remove_hypothesis_overview(); }
if (current_menu_selection == 6 && SORT_OPTION == 3) { remove_hypothesis_overview(); remove_non_terminal_treemap(); }
- if (current_menu_selection > 0) {
+ if (current_menu_selection > 0) {
highlight_menu_button( current_menu_selection, 0 );
}
@@ -196,8 +196,8 @@ function draw_menu_button( id, label ) {
var content = document.createTextNode( label );
button_label.appendChild( content );
button_label.setAttribute("onclick","click_menu(" + id + ",0);")
-
- chart.appendChild( button_label );
+
+ chart.appendChild( button_label );
}
function highlight_menu_button( id, on_off ) {
@@ -251,8 +251,8 @@ function draw_option_button( rule_option, id, label ) {
button_label.setAttribute("pointer-events", "none");
var content = document.createTextNode( label );
button_label.appendChild( content );
-
- chart.appendChild( button_label );
+
+ chart.appendChild( button_label );
}
function draw_sort_button( id, label ) {
@@ -294,8 +294,8 @@ function draw_sort_button( id, label ) {
button_label.setAttribute("pointer-events", "none");
var content = document.createTextNode( label );
button_label.appendChild( content );
-
- chart.appendChild( button_label );
+
+ chart.appendChild( button_label );
}
function click_sort( id ) {
@@ -319,17 +319,17 @@ var show_scores = 0;
var show_id = 0;
var show_derivation = 0;
function click_option( id ) {
- if (id == 1) {
- show_scores = !show_scores;
+ if (id == 1) {
+ show_scores = !show_scores;
highlight_option_button( 0, 1, show_scores );
}
- if (id == 2) {
- show_derivation = !show_derivation;
+ if (id == 2) {
+ show_derivation = !show_derivation;
color_cells();
highlight_option_button( 0, 2, show_derivation );
}
- if (id == 3) {
- show_id = !show_id;
+ if (id == 3) {
+ show_id = !show_id;
highlight_option_button( 0, 3, show_id );
}
if (current_menu_selection > 0) {
@@ -340,15 +340,15 @@ function click_option( id ) {
var show_hyp_score = 0;
var show_derivation_score = 0;
function click_rule_option( id ) {
- if (id == 1) {
- show_hyp_score = !show_hyp_score;
+ if (id == 1) {
+ show_hyp_score = !show_hyp_score;
highlight_option_button( 1, 1, show_hyp_score );
}
- if (id == 2) {
- show_derivation_score = !show_derivation_score;
+ if (id == 2) {
+ show_derivation_score = !show_derivation_score;
highlight_option_button( 1, 2, show_derivation_score );
}
- if (id == 3) {
+ if (id == 3) {
if (ZOOM > 0) {
ZOOM = 0;
}
@@ -377,7 +377,7 @@ function draw_chart() {
for (var from=0;from<length;from++) {
for(var width=1; width<=length-from; width++) {
var to = from + width - 1;
-
+
// logical container
var container = document.createElementNS(xmlns,"svg");
container.setAttribute("id", "cell-container-" + from + "-" + to);
@@ -385,7 +385,7 @@ function draw_chart() {
var transform = document.createElementNS(xmlns,"g");
transform.setAttribute("id", "cell-" + from + "-" + to);
container.appendChild( transform );
-
+
// yellow box for the cell
var cell = document.createElementNS(xmlns,"rect");
cell.setAttribute("id", "cellbox-" + from + "-" + to);
@@ -403,7 +403,7 @@ function draw_chart() {
cell.setAttribute("onclick","click_cell(" + from + "," + to + ");")
transform.appendChild( cell );
}
-
+
// box for the input word
var input_box = document.createElementNS(xmlns,"rect");
input_box.setAttribute("id", "inputbox-" + from);
@@ -415,8 +415,8 @@ function draw_chart() {
input_box.setAttribute("height", CELL_HEIGHT/2);
//cell.setAttribute("opacity", .75);
input_box.setAttribute("fill", INPUT_REGULAR_COLOR);
- chart.appendChild( input_box );
-
+ chart.appendChild( input_box );
+
// input word
input_word = document.createElementNS(xmlns,"text");
input_word.setAttribute("id", "input-" + from);
@@ -426,7 +426,7 @@ function draw_chart() {
input_word.setAttribute("text-anchor", "middle");
var content = document.createTextNode( input[from] );
input_word.appendChild( content );
- chart.appendChild( input_word );
+ chart.appendChild( input_word );
}
assign_chart_coordinates();
}
@@ -435,7 +435,7 @@ function assign_chart_coordinates() {
for (var from=0;from<length;from++) {
for(var width=1; width<=length-from; width++) {
var to = from + width - 1;
-
+
var x = from*CELL_WIDTH + (width-1)*CELL_WIDTH/2;
var y = (length-width)*CELL_HEIGHT*(1-ZOOM);
//alert("(x,y) = (" + length + "," + width + "), width = " + ZOOM + ", height = " + (1-ZOOM));
@@ -505,7 +505,7 @@ var current_from = -1;
var current_to;
function hover_cell( from, to ) {
if (current_from >= 0) {
- highlight_input( current_from, current_to, 0)
+ highlight_input( current_from, current_to, 0)
}
highlight_input( from, to, 1)
current_from = from;
@@ -534,7 +534,7 @@ function highlight_input( from, to, on_off ) {
}
}
-//
+//
// VISUALIZATION OF CHART CELLS
//
@@ -570,7 +570,7 @@ function annotate_cells_with_rulecount() {
function annotate_cells_with_derivation_score() {
for (var from=0;from<length;from++) {
for(var width=1; width<=length-from; width++) {
- var to = from + width - 1;
+ var to = from + width - 1;
var score = cell_derivation_score[from][to];
if (score < -9e9) { score = "dead end"; }
annotate_cell( from, to, score, 15 )
@@ -596,12 +596,12 @@ function annotate_cell( from, to, label, font_size ) {
cell_label.setAttribute("pointer-events", "none");
cell_label.setAttribute("text-anchor", "middle");
var content = document.createTextNode(line[i]);
- cell_label.appendChild( content );
+ cell_label.appendChild( content );
cell_label_group.appendChild( cell_label );
}
var cell = document.getElementById("cell-" + from + "-" + to);
- cell.appendChild( cell_label_group );
+ cell.appendChild( cell_label_group );
}
function unannotate_cells() {
@@ -624,7 +624,7 @@ function unannotate_cell( from, to ) {
function non_terminal_treemap( with_hyps ) {
for (var from=0;from<length;from++) {
for(var width=1; width<=length-from; width++) {
- var to = from + width - 1;
+ var to = from + width - 1;
// get nt counts
var lhs = new Array();
var lhs_list = new Array();
@@ -639,7 +639,7 @@ function non_terminal_treemap( with_hyps ) {
lhs[nt]++;
}
}
- // sort
+ // sort
function sortByCount(a,b) {
return lhs[b] - lhs[a];
}
@@ -652,7 +652,7 @@ function non_terminal_treemap( with_hyps ) {
function remove_non_terminal_treemap() {
for (var from=0;from<length;from++) {
for(var width=1; width<=length-from; width++) {
- var to = from + width - 1;
+ var to = from + width - 1;
var cell = document.getElementById("cell-" + from + "-" + to);
var done = false;
var j=0;
@@ -693,7 +693,7 @@ function treemap_cell( from, to, label, count, total, with_hyps ) {
rect.setAttribute("stroke", "black");
rect.setAttribute("stroke-width", "0.5");
rect.setAttribute("onclick","click_menu(" + id + ",0);")
- cell.appendChild( rect );
+ cell.appendChild( rect );
x += width * count[label[i]] / total;
}
}
@@ -749,7 +749,7 @@ function treemap_squarify( from, to, label, count, total, with_hyps ) {
rect.setAttribute("pointer-events", "none");
rect.setAttribute("stroke", "black");
rect.setAttribute("stroke-width", "0.5");
- cell.appendChild( rect );
+ cell.appendChild( rect );
// hypotheses
if (with_hyps) {
var hyp_list = Array();
@@ -760,15 +760,15 @@ function treemap_squarify( from, to, label, count, total, with_hyps ) {
hyp_list.push( id );
}
}
- hypothesis_in_rect( this_width * scale_factor - 2,
- this_height * scale_factor - 2,
- CELL_MARGIN + (offset_x + cum_x) * scale_factor + 1,
- CELL_MARGIN + (offset_y + cum_y) * scale_factor + 1,
+ hypothesis_in_rect( this_width * scale_factor - 2,
+ this_height * scale_factor - 2,
+ CELL_MARGIN + (offset_x + cum_x) * scale_factor + 1,
+ CELL_MARGIN + (offset_y + cum_y) * scale_factor + 1,
cell, hyp_list );
}
// label
var font_size = Math.min( Math.round(this_width * scale_factor / label[j].length * 1.3),
- Math.round(this_height * scale_factor ));
+ Math.round(this_height * scale_factor ));
if (font_size > 20) { font_size = 20; }
if (font_size >= 3) {
var rect_label = document.createElementNS(xmlns,"text");
@@ -782,7 +782,7 @@ function treemap_squarify( from, to, label, count, total, with_hyps ) {
rect_label.setAttribute("pointer-events", "none");
var content = document.createTextNode( label[j] );
rect_label.appendChild( content );
- cell.appendChild( rect_label );
+ cell.appendChild( rect_label );
}
if (adding_on_left) { cum_y += this_height; }
else { cum_x += this_width; }
@@ -829,7 +829,7 @@ function best_derivation( on_off ) {
best_derivation_recurse( best_id, on_off, -1, -1, 0 );
}
-function best_derivation_recurse( id, on_off, parent_from, parent_to, child_pos ) {
+function best_derivation_recurse( id, on_off, parent_from, parent_to, child_pos ) {
var from = edge[id][FROM];
var to = edge[id][TO];
@@ -846,10 +846,10 @@ function best_derivation_recurse( id, on_off, parent_from, parent_to, child_pos
else {
unannotate_cell( from, to );
}
-
+
// highlight hyp
highlight_hyp( id, on_off );
-
+
// arrow to parent
if (parent_from >= 0) {
if (on_off) {
@@ -860,7 +860,7 @@ function best_derivation_recurse( id, on_off, parent_from, parent_to, child_pos
chart.removeChild(arrow);
}
}
-
+
var child_order = Array();
if (edge[id][ALIGNMENT] != "") {
var alignment = edge[id][ALIGNMENT].split(" ");
@@ -880,7 +880,7 @@ function best_derivation_recurse( id, on_off, parent_from, parent_to, child_pos
child_order[target_source[1]] = i;
}
}
-
+
// recurse
var covered = new Array;
var children = get_children( id );
@@ -888,7 +888,7 @@ function best_derivation_recurse( id, on_off, parent_from, parent_to, child_pos
var child = children[c];
for( var i=edge[child][FROM]; i<=edge[child][TO]; i++ ) {
covered[i] = 1;
- }
+ }
best_derivation_recurse( child, on_off, from, to, children.length == 1 ? 0.5 : child_order[c]/(children.length-1.0) );
}
@@ -900,10 +900,10 @@ function best_derivation_recurse( id, on_off, parent_from, parent_to, child_pos
}
else {
var arrow = document.getElementById("arrow-word-" + i);
- chart.removeChild(arrow);
+ chart.removeChild(arrow);
}
}
- }
+ }
}
function make_arrow( id, parent_from, parent_to, from, to, word_flag, position ) {
@@ -1034,7 +1034,7 @@ function hypothesis_in_rect( width, height, offset_x, offset_y, parent_element,
for (var i=0; i<hyp_list.length;i++) {
id = hyp_list[i];
-
+
//alert("adding circle (" + (x + diameter/2) + "," + (y + diameter/2) + ") - " + (diameter/2) );
var hyp = document.createElementNS(xmlns,"circle");
hyp.setAttribute("id", "hyp-" + id);
@@ -1045,7 +1045,7 @@ function hypothesis_in_rect( width, height, offset_x, offset_y, parent_element,
hyp.setAttribute("onmouseover","hover_hyp(" + id + ");")
hyp.setAttribute("onmouseout","unhover_hyp(" + id + ");")
parent_element.appendChild( hyp );
-
+
x += diameter;
if (++column >= row_size) {
column = 0;
@@ -1114,7 +1114,7 @@ function hyp_color( id, on_off ) {
if (on_off) {
var color = "#ff0000";
if (edge[id][RECOMBINED]>0) { color = "#808080"; }
- else if (id in reachable) { color = "#00c000"; }
+ else if (id in reachable) { color = "#00c000"; }
return color;
}
var color = "#ffc0c0";
@@ -1136,26 +1136,26 @@ function get_rule( id ) {
var source_target = alignment[i].split("-");
nt_label.push(output[source_target[1]]);
}
-
+
var rule = edge[id][LHS]+"\u2192";
var children = get_children(id);
var pos = edge[id][FROM];
for (var i=0; i<children.length; i++) {
if (pos != edge[id][FROM]) { rule += " "; }
var child = children[i];
- for(;pos<edge[child][FROM];pos++) {
- rule += (input[pos].length <= 10) ? input[pos] : input[pos].substr(0,8) + ".";
+ for(;pos<edge[child][FROM];pos++) {
+ rule += (input[pos].length <= 10) ? input[pos] : input[pos].substr(0,8) + ".";
rule += " ";
}
rule += nt_label[i];
- rule += (edge[child][FROM] == edge[child][TO]) ?
+ rule += (edge[child][FROM] == edge[child][TO]) ?
"[" + edge[child][FROM] + "]" :
"[" + edge[child][FROM] + "-" + edge[child][TO] + "]";
pos = edge[child][TO]+1;
}
- for(;pos<=edge[id][TO];pos++) {
+ for(;pos<=edge[id][TO];pos++) {
if (pos != edge[id][FROM]) { rule += " "; }
- rule += (input[pos].length <= 10) ? input[pos] : input[pos].substr(0,8) + ".";
+ rule += (input[pos].length <= 10) ? input[pos] : input[pos].substr(0,8) + ".";
}
return rule;
@@ -1176,10 +1176,10 @@ function show_rules( from, to ) {
cell.setAttribute("stroke-width", "3");
current_rule_from = from;
current_rule_to = to;
-
+
best_hyp_score = -9e9;
best_derivation_score = cell_derivation_score[from][to];
-
+
var rule_hash = Array();
var rule_count = Array();
rule_list = Array();
@@ -1196,16 +1196,16 @@ function show_rules( from, to ) {
rule_count[rule_hash[rule]]++;
}
edge2rule[id] = rule_hash[rule];
-
- if (edge[id][HYP_SCORE] > best_hyp_score) {
- best_hyp_score = edge[id][HYP_SCORE];
+
+ if (edge[id][HYP_SCORE] > best_hyp_score) {
+ best_hyp_score = edge[id][HYP_SCORE];
}
}
function sortByRuleCount( a, b ) {
return rule_count[rule_hash[b]] - rule_count[rule_hash[a]];
}
rule_list = rule_list.sort(sortByRuleCount);
-
+
RULE_HEIGHT = 15;
RULE_FONT_SIZE = 11;
// squeeze if too many rules
@@ -1214,7 +1214,7 @@ function show_rules( from, to ) {
RULE_HEIGHT = Math.floor( RULE_HEIGHT * factor );
RULE_FONT_SIZE = Math.ceil( RULE_FONT_SIZE * factor );
}
-
+
draw_rule_options();
for(var i=-1; i<rule_list.length; i++) {
draw_rule(from, to, i);
@@ -1243,7 +1243,7 @@ function unshow_rules() {
finished = 0;
for(var i=1; !finished; i++) {
var old = document.getElementById("rule-option-" + i);
- if (old != null) {
+ if (old != null) {
chart.removeChild( old );
var old = document.getElementById("rule-option-label-" + i);
chart.removeChild( old );
@@ -1306,12 +1306,12 @@ function click_rule( from, to, rule_id ) {
// highlight current rule
if (current_rule_id>=0) {
var rule_label = document.getElementById("rule-"+current_rule_id);
- rule_label.setAttribute("style", "font-size: "+RULE_FONT_SIZE+"; font-family: Verdana, Arial;");
+ rule_label.setAttribute("style", "font-size: "+RULE_FONT_SIZE+"; font-family: Verdana, Arial;");
}
var rule_label = document.getElementById("rule-"+rule_id);
rule_label.setAttribute("style", "font-size: "+RULE_FONT_SIZE+"; font-family: Verdana, Arial; font-weight: bold;");
current_rule_id = rule_id;
-
+
// first get all the data
output_list = Array();
var output_hash = Array();
@@ -1332,7 +1332,7 @@ function click_rule( from, to, rule_id ) {
// create index for children
var children = get_children( id );
for(var j=0;j<children.length;j++) {
- // init children indices if needed
+ // init children indices if needed
if (j > children_list.length-1) {
children_hash.push([]);
children_list.push([]);
@@ -1346,7 +1346,7 @@ function click_rule( from, to, rule_id ) {
}
}
}
-
+
// sort
function sortBySecond(a,b) {
asplit = a.split("|");
@@ -1361,7 +1361,7 @@ function click_rule( from, to, rule_id ) {
for(var i=0;i<children.length;i++) {
children_list[i].sort(sortHypByScore);
}
-
+
// select dimensions of rule cube
axis = Array();
axis.push(output_list);
@@ -1394,7 +1394,7 @@ function draw_rule_cube(z_pos_string) {
if (z_pos_string != "") {
z_pos = z_pos_string.split(",");
}
-
+
// draw rube cube
var old = document.getElementById("rule-cube");
if (old != null) { chart.removeChild( old ); }
@@ -1439,7 +1439,7 @@ function draw_rule_cube(z_pos_string) {
}
else if (max_length+8 > CHART_HEIGHT/9) {
RULE_CUBE_HYP_SIZE = 9;
- RULE_CUBE_FONT_SIZE = 7;
+ RULE_CUBE_FONT_SIZE = 7;
}
else {
RULE_CUBE_HYP_SIZE = CHART_HEIGHT/(max_length+8);
@@ -1455,7 +1455,7 @@ function draw_rule_cube(z_pos_string) {
rule_cube.setAttribute("x", CHART_WIDTH - 30);
rule_cube.setAttribute("y", 0);
chart.appendChild( rule_cube );
-
+
// draw y axis
var label = get_rule_axis_name(dimension_order[0]);
draw_rule_row(-1,label);
@@ -1470,7 +1470,7 @@ function draw_rule_cube(z_pos_string) {
// draw x axis
if (axis.length > 1) {
var label = get_rule_axis_name(dimension_order[1]);
- draw_rule_column(-1,label);
+ draw_rule_column(-1,label);
for(var x=0; x<axis[dimension_order[1]].length && x<CHART_HEIGHT/9-10; x++) {
var label = get_rule_axis_label(dimension_order[1], x);
draw_rule_column(x,label);
@@ -1478,7 +1478,7 @@ function draw_rule_cube(z_pos_string) {
if (axis[dimension_order[1]].length > CHART_HEIGHT/9-10) {
draw_rule_column(Math.ceil(CHART_HEIGHT/9-10),"(more, "+axis[dimension_order[1]].length+" total)");
}
- }
+ }
// draw hyps
for(var y=0; y<axis[dimension_order[0]].length && y<(CHART_HEIGHT-Z_HEIGHT)/9-10; y++) {
@@ -1493,8 +1493,8 @@ function draw_rule_cube(z_pos_string) {
}
}
}
-
-
+
+
// draw z-axes
var pos_offset = axis[dimension_order[0]].length+2;
for(var z=2;z<dimension_order.length;z++) {
@@ -1506,7 +1506,7 @@ function draw_rule_cube(z_pos_string) {
}
pos_offset += axis[dimension_order[z]].length+2;
}
-
+
// report summary statistics
var message = output_list.length + " output phrases";
message += "<br>DEBUG: " + axis.length;
@@ -1514,7 +1514,7 @@ function draw_rule_cube(z_pos_string) {
for(var i=0;i<children_list.length;i++) {
message += "<br>" + children_list[i].length + " hyps for NT" + (i+1);
}
- //draw_rule_message(message);
+ //draw_rule_message(message);
}
function find_hyp_by_rule(position, dimension_order) {
@@ -1524,14 +1524,14 @@ function find_hyp_by_rule(position, dimension_order) {
var match = 1;
for(var p=0; p<position.length; p++) {
if (dimension_order[p] == 0) {
- if (output_list[position[p]] != edge[id][OUTPUT]+"|"+edge[id][HEURISTIC_RULE_SCORE]) {
+ if (output_list[position[p]] != edge[id][OUTPUT]+"|"+edge[id][HEURISTIC_RULE_SCORE]) {
match = 0;
}
}
else {
var nt_number = dimension_order[p]-1;
- if (children_list[nt_number][position[p]] != children[nt_number]) {
- match = 0;
+ if (children_list[nt_number][position[p]] != children[nt_number]) {
+ match = 0;
}
}
}
@@ -1580,7 +1580,7 @@ function get_full_output( id ) {
var source_target = alignment[i].split("-");
nonterminal[source_target[1]] = children[i];
}
-
+
var full_output = "";
for(var i=0;i<output.length;i++) {
if (nonterminal[i] === undefined) {
@@ -1647,7 +1647,7 @@ function draw_rule_z( z,total_z, z_pos, pos,pos_offset, label ) {
else {
rule_label.setAttribute("style", "font-size: "+(RULE_CUBE_FONT_SIZE-2)+"; font-family: Verdana, Arial; font-weight: bold;");
}
-
+
var content = document.createTextNode( label );
rule_label.appendChild( content );
var rule_cube = document.getElementById("rule-cube");
@@ -1686,7 +1686,7 @@ function rule_hyp_color( id, on_off ) {
}
else {
derivation_score_color = get_score_from_color(best_derivation_score-edge[id][DERIVATION_SCORE]);
- }
+ }
}
return "#" + inactive_color + derivation_score_color + hyp_score_color;
}
diff --git a/scripts/ems/web/sgviz.php b/scripts/ems/web/sgviz.php
index bac9289ba..9fccadf60 100644
--- a/scripts/ems/web/sgviz.php
+++ b/scripts/ems/web/sgviz.php
@@ -39,7 +39,7 @@ new Ajax.Request('?analysis=sgviz_data'
method: "post"
});
</script></body></html>
-<?php
+<?php
// read graph
//$file = get_current_analysis_filename("basic","search-graph")."/graph.$sentence";
//$handle = fopen($file,"r");
@@ -55,7 +55,7 @@ function sgviz_data($sentence) {
$file = get_current_analysis_filename("basic","search-graph")."/graph.$sentence";
$handle = fopen($file,"r");
- while (($line = fgets($handle)) !== false) {
+ while (($line = fgets($handle)) !== false) {
$e = explode("\t",addslashes(chop($line)));
$edge[$e[0]] = array($e[1],$e[2],$e[3],$e[4],$e[5],$e[6],$e[7],$e[8],$e[9],$e[10]);
}
diff --git a/scripts/fuzzy-match/create_xml.perl b/scripts/fuzzy-match/create_xml.perl
index 80a1b3120..4ab281eae 100755
--- a/scripts/fuzzy-match/create_xml.perl
+++ b/scripts/fuzzy-match/create_xml.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
binmode( STDIN, ":utf8" );
binmode( STDOUT, ":utf8" );
diff --git a/scripts/generic/compound-splitter.perl b/scripts/generic/compound-splitter.perl
index c0b25f519..b39d4d660 100755
--- a/scripts/generic/compound-splitter.perl
+++ b/scripts/generic/compound-splitter.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -98,8 +98,8 @@ sub train_factored {
my $count = $FACTORED_COUNT{$word}{$factored_word};
$total += $count;
if ($count > $max) {
- $max = $count;
- $best = $factored_word;
+ $max = $count;
+ $best = $factored_word;
}
}
$COUNT{$best} = $total;
@@ -132,8 +132,8 @@ sub train_syntax {
my $count = $LABELED_COUNT{$word}{$label};
$total += $count;
if ($count > $max) {
- $max = $count;
- $best = "$word $label";
+ $max = $count;
+ $best = "$word $label";
}
}
$COUNT{$best} = $total;
@@ -165,7 +165,7 @@ sub apply {
chop; s/\s+/ /g; s/^ //; s/ $//;
my @BUFFER; # for xml tags
foreach my $factored_word (split) {
- print " " unless $first;
+ print " " unless $first;
$first = 0;
# syntax: don't split xml
@@ -174,12 +174,12 @@ sub apply {
$first = 1;
next;
}
-
+
# get case class
my $word = $factored_word;
$word =~ s/\|.+//g; # just first factor
my $lc = lc($word);
-
+
print STDERR "considering $word ($lc)...\n" if $VERBOSE;
# don't split frequent words
if ((defined($COUNT{$lc}) && $COUNT{$lc}>=$MAX_COUNT) ||
@@ -194,7 +194,7 @@ sub apply {
my $final = length($word)-1;
my %REACHABLE;
for(my $i=0;$i<=$final;$i++) { $REACHABLE{$i} = (); }
-
+
print STDERR "splitting $word:\n" if $VERBOSE;
for(my $end=$MIN_SIZE;$end<length($word);$end++) {
for(my $start=0;$start<=$end-$MIN_SIZE;$start++) {
@@ -205,10 +205,10 @@ sub apply {
my $subword = lc(substr($word,
$start+length($filler),
$end-$start+1-length($filler)));
- next unless defined($COUNT{$subword});
+ next unless defined($COUNT{$subword});
next unless $COUNT{$subword} >= $MIN_COUNT;
print STDERR "\tmatching word $start .. $end ($filler)$subword $COUNT{$subword}\n" if $VERBOSE;
- push @{$REACHABLE{$end}},"$start $TRUECASE{$subword} $COUNT{$subword}";
+ push @{$REACHABLE{$end}},"$start $TRUECASE{$subword} $COUNT{$subword}";
}
}
}
@@ -230,7 +230,7 @@ sub apply {
my ($pos,$decomp,$score,$num,@INDEX) = ($final,"",1,0);
while($pos>0) {
last unless scalar @{$REACHABLE{$pos}} > $ITERATOR{$pos}; # dead end?
- my ($nextpos,$subword,$count)
+ my ($nextpos,$subword,$count)
= split(/ /,$REACHABLE{$pos}[ $ITERATOR{$pos} ]);
$decomp = $subword." ".$decomp;
$score *= $count;
@@ -243,7 +243,7 @@ sub apply {
chop($decomp);
print STDERR "\tsplit: $decomp ($score ** 1/$num) = ".($score ** (1/$num))."\n" if $VERBOSE;
$score **= 1/$num;
- if ($score>$best_score) {
+ if ($score>$best_score) {
$best_score = $score;
$best_split = $decomp;
}
@@ -256,7 +256,7 @@ sub apply {
last if scalar @{$REACHABLE{$increase}} > $ITERATOR{$increase};
}
last unless scalar @{$REACHABLE{$final}} > $ITERATOR{$final};
- for(my $i=0;$i<$increase;$i++) { $ITERATOR{$i}=0; }
+ for(my $i=0;$i<$increase;$i++) { $ITERATOR{$i}=0; }
}
if ($best_split !~ / /) {
print join(" ",@BUFFER)." " if scalar(@BUFFER); @BUFFER = (); # clear buffer
diff --git a/scripts/generic/extract-factors.pl b/scripts/generic/extract-factors.pl
index 56c719051..38cf97bd4 100755
--- a/scripts/generic/extract-factors.pl
+++ b/scripts/generic/extract-factors.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
#extract-factors.pl: extract only the desired factors from a factored corpus
diff --git a/scripts/generic/extract-parallel.perl b/scripts/generic/extract-parallel.perl
index fe5666a8b..87f36d680 100755
--- a/scripts/generic/extract-parallel.perl
+++ b/scripts/generic/extract-parallel.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# example
# ./extract-parallel.perl 8 ./coreutils-8.9/src/split "./coreutils-8.9/src/sort --batch-size=253" ./extract ./corpus.5.en ./corpus.5.ar ./align.ar-en.grow-diag-final-and ./extracted 7 --NoFileLimit orientation --GZOutput
@@ -94,7 +94,7 @@ if ($numParallel > 1)
$cmd = "$splitCmd $splitCmdOption -l $linesPerSplit -a 7 $target $TMPDIR/target.";
$pid = RunFork($cmd);
push(@children, $pid);
-
+
$cmd = "$splitCmd $splitCmdOption -l $linesPerSplit -a 7 $source $TMPDIR/source.";
$pid = RunFork($cmd);
push(@children, $pid);
@@ -108,7 +108,7 @@ if ($numParallel > 1)
$pid = RunFork($cmd);
push(@children, $pid);
}
-
+
# wait for everything is finished
foreach (@children) {
waitpid($_, 0);
@@ -139,7 +139,7 @@ else
for (my $i = 0; $i < $numParallel; ++$i)
{
my $pid = fork();
-
+
if ($pid == 0)
{ # child
my $numStr = NumStr($i);
@@ -251,8 +251,8 @@ if ($phraseOrientation && defined($phraseOrientationPriorsFile)) {
foreach my $filenamePhraseOrientationPriors (@orientationPriorsCountFiles) {
if (-f $filenamePhraseOrientationPriors) {
open my $infilePhraseOrientationPriors, '<', $filenamePhraseOrientationPriors or die "cannot open $filenamePhraseOrientationPriors: $!";
- while (my $line = <$infilePhraseOrientationPriors>) {
- print $line;
+ while (my $line = <$infilePhraseOrientationPriors>) {
+ print $line;
my ($key, $value) = split / /, $line;
$priorCounts{$key} += $value;
}
@@ -281,7 +281,7 @@ sub RunFork($)
my $cmd = shift;
my $pid = fork();
-
+
if ($pid == 0)
{ # child
print STDERR $cmd;
diff --git a/scripts/generic/fsa2fsal.pl b/scripts/generic/fsa2fsal.pl
index 50bff1404..7dc7751ee 100755
--- a/scripts/generic/fsa2fsal.pl
+++ b/scripts/generic/fsa2fsal.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# A very simple script that converts fsa format (openfst lattices) to the same
# thing represented one sentence per line. It uses '|||' to delimit columns and
# ' ' to delimit nodes (i.e. original lines).
diff --git a/scripts/generic/fsa2plf.pl b/scripts/generic/fsa2plf.pl
index 4e7454a9f..07c8a4cc1 100755
--- a/scripts/generic/fsa2plf.pl
+++ b/scripts/generic/fsa2plf.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# Converts AT&T FSA format to 'python lattice format'.
# Note that the input FSA needs to be epsilon-free and topologically sorted.
# This script checks for topological sortedness.
@@ -66,7 +66,7 @@ foreach my $inf (@infiles) {
# final nodes can have a cost
die "$inf:$nr:Final state $src has cost $tgt. Unsupported, use --ignore-final-state-cost"
if defined $tgt && !$ignore_final_state_cost;
-
+
next;
}
$weight = 0 if !defined $weight;
@@ -107,7 +107,7 @@ foreach my $inf (@infiles) {
next if defined $denseids{$id};
$denseids{$id} = $nextid;
}
-
+
foreach my $f (keys %is_final) {
if (defined $outnodes[$f]) {
print STDERR "$inf:Node $f is final but it has outgoing edges!\n";
@@ -118,7 +118,7 @@ foreach my $inf (@infiles) {
# foreach my $src (sort {$a<=>$b} keys %denseids) {
# print STDERR "$src ...> $denseids{$src}\n";
# }
-
+
print "(";
for(my $origsrc = 0; $origsrc < @outnodes; $origsrc++) {
my $src = $denseids{$origsrc};
diff --git a/scripts/generic/fsal2fsa.pl b/scripts/generic/fsal2fsa.pl
index d1aa461ac..a21305dad 100755
--- a/scripts/generic/fsal2fsa.pl
+++ b/scripts/generic/fsal2fsa.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# A very simple script that converts fsal back to fsa format (openfst lattices)
# Ondrej Bojar, bojar@ufal.mff.cuni.cz
diff --git a/scripts/generic/generic-parallel.perl b/scripts/generic/generic-parallel.perl
index 653912c5c..a9bc73d85 100755
--- a/scripts/generic/generic-parallel.perl
+++ b/scripts/generic/generic-parallel.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -27,7 +27,7 @@ for (my $i = 2; $i < scalar(@ARGV); ++$i)
open (INPUT_ALL, "> $TMPDIR/input.all");
binmode INPUT_ALL, ":utf8";
while (my $line = <STDIN>)
-{
+{
chomp($line);
print INPUT_ALL $line."\n";
}
@@ -45,7 +45,7 @@ print STDERR "executing\n";
my $i = 0;
my $filePath = "$TMPDIR/x" .NumStr($i);
-while (-f $filePath)
+while (-f $filePath)
{
print EXEC "$cmd < $filePath > $filePath.out\n";
@@ -63,7 +63,7 @@ print STDERR "concatenating\n";
$i = 1;
my $firstPath = "$TMPDIR/x" .NumStr(0) .".out";
$filePath = "$TMPDIR/x" .NumStr($i) .".out";
-while (-f $filePath)
+while (-f $filePath)
{
$cmd = "cat $filePath >> $firstPath";
`$cmd`;
@@ -76,7 +76,7 @@ while (-f $filePath)
open (OUTPUT_ALL, "$firstPath");
binmode OUTPUT_ALL, ":utf8";
while (my $line = <OUTPUT_ALL>)
-{
+{
chomp($line);
print "$line\n";
}
diff --git a/scripts/generic/giza-parallel.perl b/scripts/generic/giza-parallel.perl
index 8793d3d8e..9a6516a8f 100755
--- a/scripts/generic/giza-parallel.perl
+++ b/scripts/generic/giza-parallel.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# example
# ~/giza-parallel.perl 10 split ~/workspace/sourceforge/trunk/scripts/training/train-model.perl ar en train align
@@ -47,7 +47,7 @@ my @childs;
for (my $i = 0; $i < $numParallel; ++$i)
{
my $pid = fork();
-
+
if ($pid == 0)
{ # child
$isParent = 0;
@@ -73,7 +73,7 @@ for (my $i = 0; $i < $numParallel; ++$i)
}
else
{ # parent
- push(@childs, $pid);
+ push(@childs, $pid);
}
}
diff --git a/scripts/generic/lopar2pos.pl b/scripts/generic/lopar2pos.pl
index c75069135..2b9245e0f 100755
--- a/scripts/generic/lopar2pos.pl
+++ b/scripts/generic/lopar2pos.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
#lopar2pos: extract POSs from LOPAR output
diff --git a/scripts/generic/moses-parallel.pl b/scripts/generic/moses-parallel.pl
index 7c0f56c70..eb51daa98 100755
--- a/scripts/generic/moses-parallel.pl
+++ b/scripts/generic/moses-parallel.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
#######################
@@ -19,7 +19,7 @@ use warnings;
use strict;
#######################
-#Customizable parameters
+#Customizable parameters
#parameters for submiiting processes through Sun GridEngine
my $queueparameters="";
@@ -29,7 +29,7 @@ my $queueparameters="";
# my $queueparameters="-l q1dm -pe pe_mth 2 -hard";
# etc.
-# look for the correct pwdcmd
+# look for the correct pwdcmd
my $pwdcmd = getPwdCmd();
my $workingdir = `$pwdcmd`; chomp $workingdir;
@@ -39,7 +39,7 @@ my $splitpfx="split$$";
$SIG{'INT'} = \&kill_all_and_quit; # catch exception for CTRL-C
#######################
-#Default parameters
+#Default parameters
my $jobscript="$workingdir/job$$";
my $qsubout="$workingdir/out.job$$";
my $qsuberr="$workingdir/err.job$$";
@@ -116,7 +116,7 @@ sub init(){
getSearchGraphParameters();
getWordGraphParameters();
-
+
getLogParameters();
#print_parameters();
@@ -205,7 +205,7 @@ sub print_parameters(){
print STDERR "Inputtype: text\n" if $inputtype == 0;
print STDERR "Inputtype: confusion network\n" if $inputtype == 1;
print STDERR "Inputtype: lattices\n" if $inputtype == 2;
-
+
print STDERR "parameters directly passed to Moses: $mosesparameters\n";
}
@@ -395,7 +395,7 @@ elsif ($inputtype==1){ #confusion network input
$cmd="split $decimal -a 2 -l $splitN $tmpfile $tmpfile-";
safesystem("$cmd") or die;
-
+
my @idxlist=();
chomp(@idxlist=`ls $tmpfile-*`);
grep(s/.+(\-\S+)$/$1/e,@idxlist);
@@ -456,7 +456,7 @@ while ($robust && scalar @idx_todo) {
$batch_and_join = "-b no -j yes";
}
$cmd="qsub $queueparameters $batch_and_join -o $qsubout$idx -e $qsuberr$idx -N $qsubname$idx ${jobscript}${idx}.bash > ${jobscript}${idx}.log 2>&1";
- print STDERR "$cmd\n" if $dbg;
+ print STDERR "$cmd\n" if $dbg;
safesystem($cmd) or die;
@@ -492,7 +492,7 @@ while ($robust && scalar @idx_todo) {
# start the 'hold' job, i.e. the job that will wait
$cmd="qsub -cwd $queueparameters $hj -o $checkpointfile -e /dev/null -N $qsubname.W $syncscript 2> $qsubname.W.log";
safesystem($cmd) or kill_all_and_quit();
-
+
# and wait for checkpoint file to appear
my $nr=0;
while (!-e $checkpointfile) {
@@ -502,7 +502,7 @@ while ($robust && scalar @idx_todo) {
}
print STDERR "End of waiting.\n";
safesystem("\\rm -f $checkpointfile $syncscript") or kill_all_and_quit();
-
+
my $failure = 1;
my $nr = 0;
while ($nr < 60 && $failure) {
@@ -542,22 +542,22 @@ while ($robust && scalar @idx_todo) {
print STDERR "some jobs crashed: ".join(" ",@idx_still_todo)."\n";
kill_all_and_quit();
}
-
+
}
}
#concatenating translations and removing temporary files
concatenate_1best();
concatenate_logs() if $logflag;
-concatenate_ali() if defined $alifile;
-concatenate_details() if defined $detailsfile;
-concatenate_nbest() if $nbestflag;
+concatenate_ali() if defined $alifile;
+concatenate_details() if defined $detailsfile;
+concatenate_nbest() if $nbestflag;
safesystem("cat nbest$$ >> /dev/stdout") if $nbestlist[0] eq '-';
-concatenate_searchgraph() if $searchgraphflag;
+concatenate_searchgraph() if $searchgraphflag;
safesystem("cat searchgraph$$ >> /dev/stdout") if $searchgraphlist eq '-';
-concatenate_wordgraph() if $wordgraphflag;
+concatenate_wordgraph() if $wordgraphflag;
safesystem("cat wordgraph$$ >> /dev/stdout") if $wordgraphlist[0] eq '-';
remove_temporary_files();
@@ -566,7 +566,7 @@ remove_temporary_files();
#script creation
sub preparing_script(){
my $currStartTranslationId = 0;
-
+
foreach my $idx (@idxlist){
my $scriptheader="";
$scriptheader.="\#\! /bin/bash\n\n";
@@ -653,7 +653,7 @@ sub preparing_script(){
#setting permissions of each script
chmod(oct(755),"${jobscript}${idx}.bash");
-
+
$currStartTranslationId += $splitN;
}
}
@@ -683,8 +683,8 @@ sub concatenate_wordgraph(){
my $code="";
if (/^UTTERANCE=/){
($code)=($_=~/^UTTERANCE=(\d+)/);
-
- print STDERR "code:$code offset:$offset\n";
+
+ print STDERR "code:$code offset:$offset\n";
$code += $offset;
if ($code ne $oldcode){
@@ -695,11 +695,11 @@ sub concatenate_wordgraph(){
while ($code - $oldcode > 1){
$oldcode++;
print OUT "UTTERANCE=$oldcode\n";
- print STDERR " to OUT -> code:$oldcode\n";
+ print STDERR " to OUT -> code:$oldcode\n";
print OUT "_EMPTYWORDGRAPH_\n";
}
}
-
+
$oldcode=$code;
print OUT "UTTERANCE=$oldcode\n";
next;
@@ -772,14 +772,14 @@ sub concatenate_nbest(){
my $newcode=-1;
my %inplength = ();
my $offset = 0;
-
+
# get the list of feature and set a fictitious string with zero scores
open (IN, "${nbestfile}.${splitpfx}$idxlist[0]");
my $str = <IN>;
chomp($str);
close(IN);
my ($code,$trans,$featurescores,$globalscore)=split(/\|\|\|/,$str);
-
+
my $emptytrans = " ";
my $emptyglobalscore = " 0.0";
my $emptyfeaturescores = $featurescores;
@@ -923,7 +923,7 @@ sub check_translation(){
die "INPUTTYPE:$inputtype is unknown!\n";
}
chomp($outputN=`wc -l ${inputfile}.$splitpfx$idx.trans | cut -d' ' -f1`);
-
+
if ($inputN != $outputN){
print STDERR "Split ($idx) were not entirely translated\n";
print STDERR "outputN=$outputN inputN=$inputN\n";
@@ -960,9 +960,9 @@ sub check_translation_old_sge(){
print STDERR "outputfile=${inputfile}.$splitpfx$idx.trans inputfile=${inputfile}.$splitpfx$idx\n";
return 1;
}
-
+
}
- return 0;
+ return 0;
}
sub remove_temporary_files(){
diff --git a/scripts/generic/mteval-v12.pl b/scripts/generic/mteval-v12.pl
index 360376242..2666c8012 100755
--- a/scripts/generic/mteval-v12.pl
+++ b/scripts/generic/mteval-v12.pl
@@ -1,5 +1,5 @@
-#!/usr/bin/env perl
-
+#!/usr/bin/env perl
+
use warnings;
use strict;
use utf8;
@@ -7,7 +7,7 @@ use Encode;
binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";
-
+
#################################
# History:
#
@@ -116,7 +116,7 @@ my $usage = "\n\nUsage: $0 [-h] -r <ref_file> -s <src_file> -t <tst_file>\n\n".
" -e enclose non-ASCII characters between spaces\n".
" -h prints this help message to STDOUT\n".
"\n";
-
+
use vars qw ($opt_r $opt_s $opt_t $opt_d $opt_h $opt_b $opt_n $opt_c $opt_x $opt_e);
use Getopt::Std;
getopts ('r:s:t:d:hbncx:e');
@@ -133,11 +133,11 @@ my $METHOD = "BOTH";
if (defined $opt_b) { $METHOD = "BLEU"; }
if (defined $opt_n) { $METHOD = "NIST"; }
my $method;
-
+
my ($ref_file) = $opt_r;
my ($src_file) = $opt_s;
my ($tst_file) = $opt_t;
-
+
######
# Global variables
my ($src_lang, $tgt_lang, @tst_sys, @ref_sys); # evaluation parameters
@@ -145,30 +145,30 @@ my (%tst_data, %ref_data); # the data -- with structure: {system}{document}[seg
my ($src_id, $ref_id, $tst_id); # unique identifiers for ref and tst translation sets
my %eval_docs; # document information for the evaluation data set
my %ngram_info; # the information obtained from (the last word in) the ngram
-
+
######
# Get source document ID's
($src_id) = get_source_info ($src_file);
-
+
######
# Get reference translations
($ref_id) = get_MT_data (\%ref_data, "RefSet", $ref_file);
-
+
compute_ngram_info ();
-
+
######
# Get translations to evaluate
($tst_id) = get_MT_data (\%tst_data, "TstSet", $tst_file);
-
+
######
# Check data for completeness and correctness
check_MT_data ();
-
+
######
#
my %NISTmt = ();
my %BLEUmt = ();
-
+
######
# Evaluate
print " Evaluation of $src_lang-to-$tgt_lang translation using:\n";
@@ -179,7 +179,7 @@ foreach my $doc (sort keys %eval_docs) {
print " src set \"$src_id\" (", scalar keys %eval_docs, " docs, $cum_seg segs)\n";
print " ref set \"$ref_id\" (", scalar keys %ref_data, " refs)\n";
print " tst set \"$tst_id\" (", scalar keys %tst_data, " systems)\n\n";
-
+
foreach my $sys (sort @tst_sys) {
for (my $n=1; $n<=$max_Ngram; $n++) {
$NISTmt{$n}{$sys}{cum} = 0;
@@ -187,7 +187,7 @@ foreach my $sys (sort @tst_sys) {
$BLEUmt{$n}{$sys}{cum} = 0;
$BLEUmt{$n}{$sys}{ind} = 0;
}
-
+
if (($METHOD eq "BOTH") || ($METHOD eq "NIST")) {
$method="NIST";
score_system ($sys, %NISTmt);
@@ -197,44 +197,44 @@ foreach my $sys (sort @tst_sys) {
score_system ($sys, %BLEUmt);
}
}
-
+
######
printout_report ();
-
+
($date, $time) = date_time_stamp();
print "MT evaluation scorer ended on $date at $time\n";
-
+
exit 0;
-
+
#################################
-
+
sub get_source_info {
-
+
my ($file) = @_;
my ($name, $id, $src, $doc);
my ($data, $tag, $span);
-
-
+
+
#read data from file
open (FILE, $file) or die "\nUnable to open translation data file '$file'", $usage;
binmode FILE, ":utf8";
$data .= $_ while <FILE>;
close (FILE);
-
+
#get source set info
die "\n\nFATAL INPUT ERROR: no 'src_set' tag in src_file '$file'\n\n"
unless ($tag, $span, $data) = extract_sgml_tag_and_span ("SrcSet", $data);
-
+
die "\n\nFATAL INPUT ERROR: no tag attribute '$name' in file '$file'\n\n"
unless ($id) = extract_sgml_tag_attribute ($name="SetID", $tag);
-
+
die "\n\nFATAL INPUT ERROR: no tag attribute '$name' in file '$file'\n\n"
unless ($src) = extract_sgml_tag_attribute ($name="SrcLang", $tag);
die "\n\nFATAL INPUT ERROR: $name ('$src') in file '$file' inconsistent\n"
." with $name in previous input data ('$src_lang')\n\n"
unless (not defined $src_lang or $src eq $src_lang);
$src_lang = $src;
-
+
#get doc info -- ID and # of segs
$data = $span;
while (($tag, $span, $data) = extract_sgml_tag_and_span ("Doc", $data)) {
@@ -254,51 +254,51 @@ sub get_source_info {
unless keys %eval_docs > 0;
return $id;
}
-
+
#################################
-
+
sub get_MT_data {
-
+
my ($docs, $set_tag, $file) = @_;
my ($name, $id, $src, $tgt, $sys, $doc);
my ($tag, $span, $data);
-
+
#read data from file
open (FILE, $file) or die "\nUnable to open translation data file '$file'", $usage;
binmode FILE, ":utf8";
$data .= $_ while <FILE>;
close (FILE);
-
+
#get tag info
while (($tag, $span, $data) = extract_sgml_tag_and_span ($set_tag, $data)) {
die "\n\nFATAL INPUT ERROR: no tag attribute '$name' in file '$file'\n\n" unless
($id) = extract_sgml_tag_attribute ($name="SetID", $tag);
-
+
die "\n\nFATAL INPUT ERROR: no tag attribute '$name' in file '$file'\n\n" unless
($src) = extract_sgml_tag_attribute ($name="SrcLang", $tag);
die "\n\nFATAL INPUT ERROR: $name ('$src') in file '$file' inconsistent\n"
." with $name of source ('$src_lang')\n\n"
unless $src eq $src_lang;
-
+
die "\n\nFATAL INPUT ERROR: no tag attribute '$name' in file '$file'\n\n" unless
($tgt) = extract_sgml_tag_attribute ($name="TrgLang", $tag);
die "\n\nFATAL INPUT ERROR: $name ('$tgt') in file '$file' inconsistent\n"
." with $name of the evaluation ('$tgt_lang')\n\n"
unless (not defined $tgt_lang or $tgt eq $tgt_lang);
$tgt_lang = $tgt;
-
+
my $mtdata = $span;
while (($tag, $span, $mtdata) = extract_sgml_tag_and_span ("Doc", $mtdata)) {
die "\n\nFATAL INPUT ERROR: no tag attribute '$name' in file '$file'\n\n" unless
(my $sys) = extract_sgml_tag_attribute ($name="SysID", $tag);
-
+
die "\n\nFATAL INPUT ERROR: no tag attribute '$name' in file '$file'\n\n" unless
$doc = extract_sgml_tag_attribute ($name="DocID", $tag);
-
+
die "\n\nFATAL INPUT ERROR: document '$doc' for system '$sys' in file '$file'\n"
." previously loaded from file '$docs->{$sys}{$doc}{FILE}'\n\n"
unless (not defined $docs->{$sys}{$doc});
-
+
$span =~ s/[\s\n\r]+/ /g; # concatenate records
my $jseg=0, my $seg_data = $span;
while (($tag, $span, $seg_data) = extract_sgml_tag_and_span ("Seg", $seg_data)) {
@@ -311,14 +311,14 @@ sub get_MT_data {
}
return $id;
}
-
+
#################################
-
+
sub check_MT_data {
-
+
@tst_sys = sort keys %tst_data;
@ref_sys = sort keys %ref_data;
-
+
#every evaluation document must be represented for every system and every reference
foreach my $doc (sort keys %eval_docs) {
my $nseg_source = @{$eval_docs{$doc}{SEGS}};
@@ -331,7 +331,7 @@ sub check_MT_data {
." the source document contains $nseg_source segments.\n\n"
unless $nseg == $nseg_source;
}
-
+
foreach my $sys (@ref_sys) {
die "\n\nFATAL ERROR: no document '$doc' for reference '$sys'\n\n"
unless defined $ref_data{$sys}{$doc};
@@ -343,15 +343,15 @@ sub check_MT_data {
}
}
}
-
+
#################################
-
+
sub compute_ngram_info {
-
+
my ($ref, $doc, $seg);
my (@wrds, $tot_wrds, %ngrams, $ngram, $mgram);
my (%ngram_count, @tot_ngrams);
-
+
foreach $ref (keys %ref_data) {
foreach $doc (keys %{$ref_data{$ref}}) {
foreach $seg (@{$ref_data{$ref}{$doc}{SEGS}}) {
@@ -364,7 +364,7 @@ sub compute_ngram_info {
}
}
}
-
+
foreach $ngram (keys %ngram_count) {
@wrds = split / /, $ngram;
pop @wrds, $mgram = join " ", @wrds;
@@ -378,24 +378,24 @@ sub compute_ngram_info {
}
}
}
-
+
#################################
-
+
sub score_system {
-
+
my ($sys, $ref, $doc, %SCOREmt);
($sys, %SCOREmt) = @_;
my ($shortest_ref_length, $match_cnt, $tst_cnt, $ref_cnt, $tst_info, $ref_info);
my ($cum_ref_length, @cum_match, @cum_tst_cnt, @cum_ref_cnt, @cum_tst_info, @cum_ref_info);
-
+
$cum_ref_length = 0;
for (my $j=1; $j<=$max_Ngram; $j++) {
$cum_match[$j] = $cum_tst_cnt[$j] = $cum_ref_cnt[$j] = $cum_tst_info[$j] = $cum_ref_info[$j] = 0;
}
-
+
foreach $doc (sort keys %eval_docs) {
($shortest_ref_length, $match_cnt, $tst_cnt, $ref_cnt, $tst_info, $ref_info) = score_document ($sys, $doc);
-
+
#output document summary score
if (($detail >= 1 ) && ($METHOD eq "NIST")) {
my %DOCmt = ();
@@ -409,7 +409,7 @@ sub score_system {
bleu_score($shortest_ref_length, $match_cnt, $tst_cnt, $sys, %DOCmt),
scalar @{$tst_data{$sys}{$doc}{SEGS}}, $tst_cnt->[1];
}
-
+
$cum_ref_length += $shortest_ref_length;
for (my $j=1; $j<=$max_Ngram; $j++) {
$cum_match[$j] += $match_cnt->[$j];
@@ -422,7 +422,7 @@ sub score_system {
if (defined $opt_x and $opt_x eq "document info");
}
}
-
+
#x #output system summary score
#x printf "$method score = %.4f for system \"$sys\"\n",
#x $method eq "BLEU" ? bleu_score($cum_ref_length, \@cum_match, \@cum_tst_cnt) :
@@ -434,21 +434,21 @@ sub score_system {
nist_score (scalar @ref_sys, \@cum_match, \@cum_tst_cnt, \@cum_ref_cnt, \@cum_tst_info, \@cum_ref_info, $sys, %SCOREmt);
}
}
-
+
#################################
-
+
sub score_document {
-
+
my ($sys, $ref, $doc);
($sys, $doc) = @_;
my ($shortest_ref_length, $match_cnt, $tst_cnt, $ref_cnt, $tst_info, $ref_info);
my ($cum_ref_length, @cum_match, @cum_tst_cnt, @cum_ref_cnt, @cum_tst_info, @cum_ref_info);
-
+
$cum_ref_length = 0;
for (my $j=1; $j<=$max_Ngram; $j++) {
$cum_match[$j] = $cum_tst_cnt[$j] = $cum_ref_cnt[$j] = $cum_tst_info[$j] = $cum_ref_info[$j] = 0;
}
-
+
#score each segment
for (my $jseg=0; $jseg<@{$tst_data{$sys}{$doc}{SEGS}}; $jseg++) {
my @ref_segments = ();
@@ -461,7 +461,7 @@ sub score_document {
if $detail >= 3;
($shortest_ref_length, $match_cnt, $tst_cnt, $ref_cnt, $tst_info, $ref_info) =
score_segment ($tst_data{$sys}{$doc}{SEGS}[$jseg], @ref_segments);
-
+
#output segment summary score
#x printf "$method score = %.4f for system \"$sys\" on segment %d of document \"$doc\" (%d words)\n",
#x $method eq "BLEU" ? bleu_score($shortest_ref_length, $match_cnt, $tst_cnt) :
@@ -478,8 +478,8 @@ sub score_document {
printf " $method score using 5-grams = %.4f for system \"$sys\" on segment %d of document \"$doc\" (%d words)\n",
nist_score (scalar @ref_sys, $match_cnt, $tst_cnt, $ref_cnt, $tst_info, $ref_info, $sys, %DOCmt), $jseg+1, $tst_cnt->[1];
}
-
-
+
+
$cum_ref_length += $shortest_ref_length;
for (my $j=1; $j<=$max_Ngram; $j++) {
$cum_match[$j] += $match_cnt->[$j];
@@ -491,29 +491,29 @@ sub score_document {
}
return ($cum_ref_length, [@cum_match], [@cum_tst_cnt], [@cum_ref_cnt], [@cum_tst_info], [@cum_ref_info]);
}
-
+
#################################
-
+
sub score_segment {
-
+
my ($tst_seg, @ref_segs) = @_;
my (@tst_wrds, %tst_ngrams, @match_count, @tst_count, @tst_info);
my (@ref_wrds, $ref_seg, %ref_ngrams, %ref_ngrams_max, @ref_count, @ref_info);
my ($ngram);
my (@nwrds_ref);
my $shortest_ref_length;
-
+
for (my $j=1; $j<= $max_Ngram; $j++) {
$match_count[$j] = $tst_count[$j] = $ref_count[$j] = $tst_info[$j] = $ref_info[$j] = 0;
}
-
+
# get the ngram counts for the test segment
@tst_wrds = split /\s+/, $tst_seg;
%tst_ngrams = %{Words2Ngrams (@tst_wrds)};
for (my $j=1; $j<=$max_Ngram; $j++) { # compute ngram counts
$tst_count[$j] = $j<=@tst_wrds ? (@tst_wrds - $j + 1) : 0;
}
-
+
# get the ngram counts for the reference segments
foreach $ref_seg (@ref_segs) {
@ref_wrds = split /\s+/, $ref_seg;
@@ -531,7 +531,7 @@ sub score_segment {
$shortest_ref_length = scalar @ref_wrds # find the shortest reference segment
if (not defined $shortest_ref_length) or @ref_wrds < $shortest_ref_length;
}
-
+
# accumulate scoring stats for tst_seg ngrams that match ref_seg ngrams
foreach $ngram (keys %tst_ngrams) {
next unless defined $ref_ngrams_max{$ngram};
@@ -541,16 +541,16 @@ sub score_segment {
printf "%.2f info for each of $count %d-grams = '%s'\n", $ngram_info{$ngram}, scalar @wrds, $ngram
if $detail >= 3;
}
-
+
return ($shortest_ref_length, [@match_count], [@tst_count], [@ref_count], [@tst_info], [@ref_info]);
}
-
+
#################################
-
+
sub bleu_score {
-
+
my ($shortest_ref_length, $matching_ngrams, $tst_ngrams, $sys, %SCOREmt) = @_;
-
+
my $score = 0;
my $iscore = 0;
my $len_score = min (0, 1-$shortest_ref_length/$tst_ngrams->[1]);
@@ -570,33 +570,33 @@ sub bleu_score {
}
return $SCOREmt{4}{$sys}{cum};
}
-
+
#################################
-
+
sub nist_score {
-
+
my ($nsys, $matching_ngrams, $tst_ngrams, $ref_ngrams, $tst_info, $ref_info, $sys, %SCOREmt) = @_;
-
+
my $score = 0;
my $iscore = 0;
-
-
+
+
for (my $n=1; $n<=$max_Ngram; $n++) {
$score += $tst_info->[$n]/max($tst_ngrams->[$n],1);
$SCOREmt{$n}{$sys}{cum} = $score * nist_length_penalty($tst_ngrams->[1]/($ref_ngrams->[1]/$nsys));
-
+
$iscore = $tst_info->[$n]/max($tst_ngrams->[$n],1);
$SCOREmt{$n}{$sys}{ind} = $iscore * nist_length_penalty($tst_ngrams->[1]/($ref_ngrams->[1]/$nsys));
}
return $SCOREmt{5}{$sys}{cum};
}
-
+
#################################
-
+
sub Words2Ngrams { #convert a string of words to an Ngram count hash
-
+
my %count = ();
-
+
for (; @_; shift) {
my ($j, $ngram, $word);
for ($j=0; $j<$max_Ngram and defined($word=$_[$j]); $j++) {
@@ -608,7 +608,7 @@ sub Words2Ngrams { #convert a string of words to an Ngram count hash
}
#################################
-
+
sub NormalizeText {
my ($norm_text) = @_;
@@ -631,18 +631,18 @@ sub NormalizeText {
$norm_text =~ s/(\p{P})(\P{N})/ $1 $2/g;
$norm_text =~ s/(\p{S})/ $1 /g; # tokenize symbols
-
+
$norm_text =~ s/\p{Z}+/ /g; # one space only between words
$norm_text =~ s/^\p{Z}+//; # no leading space
$norm_text =~ s/\p{Z}+$//; # no trailing space
return $norm_text;
}
-
+
#################################
-
+
sub nist_length_penalty {
-
+
my ($ratio) = @_;
return 1 if $ratio >= 1;
return 0 if $ratio <= 0;
@@ -651,69 +651,69 @@ sub nist_length_penalty {
my $beta = -log($score_x)/log($ratio_x)/log($ratio_x);
return exp (-$beta*log($ratio)*log($ratio));
}
-
+
#################################
-
+
sub date_time_stamp {
-
+
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime();
my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my ($date, $time);
-
+
$time = sprintf "%2.2d:%2.2d:%2.2d", $hour, $min, $sec;
$date = sprintf "%4.4s %3.3s %s", 1900+$year, $months[$mon], $mday;
return ($date, $time);
}
-
+
#################################
-
+
sub extract_sgml_tag_and_span {
-
+
my ($name, $data) = @_;
-
+
($data =~ m|<$name\s*([^>]*)>(.*?)</$name\s*>(.*)|si) ? ($1, $2, $3) : ();
}
-
+
#################################
-
+
sub extract_sgml_tag_attribute {
-
+
my ($name, $data) = @_;
-
+
($data =~ m|$name\s*=\s*\"([^\"]*)\"|si) ? ($1) : ();
}
-
+
#################################
-
+
sub max {
-
+
my ($max, $next);
-
+
return unless defined ($max=pop);
while (defined ($next=pop)) {
$max = $next if $next > $max;
}
return $max;
}
-
+
#################################
-
+
sub min {
-
+
my ($min, $next);
-
+
return unless defined ($min=pop);
while (defined ($next=pop)) {
$min = $next if $next < $min;
}
return $min;
}
-
+
#################################
-
+
sub printout_report
{
-
+
if ( $METHOD eq "BOTH" ) {
foreach my $sys (sort @tst_sys) {
printf "NIST score = %2.4f BLEU score = %.4f for system \"$sys\"\n",$NISTmt{5}{$sys}{cum},$BLEUmt{4}{$sys}{cum};
@@ -727,13 +727,13 @@ sub printout_report
printf "\nBLEU score = %.4f for system \"$sys\"\n",$BLEUmt{4}{$sys}{cum};
}
}
-
-
+
+
printf "\n# ------------------------------------------------------------------------\n\n";
printf "Individual N-gram scoring\n";
printf " 1-gram 2-gram 3-gram 4-gram 5-gram 6-gram 7-gram 8-gram 9-gram\n";
printf " ------ ------ ------ ------ ------ ------ ------ ------ ------\n";
-
+
if (( $METHOD eq "BOTH" ) || ($METHOD eq "NIST")) {
foreach my $sys (sort @tst_sys) {
printf " NIST:";
@@ -744,7 +744,7 @@ sub printout_report
}
printf "\n";
}
-
+
if (( $METHOD eq "BOTH" ) || ($METHOD eq "BLEU")) {
foreach my $sys (sort @tst_sys) {
printf " BLEU:";
@@ -754,12 +754,12 @@ sub printout_report
printf " \"$sys\"\n";
}
}
-
+
printf "\n# ------------------------------------------------------------------------\n";
printf "Cumulative N-gram scoring\n";
printf " 1-gram 2-gram 3-gram 4-gram 5-gram 6-gram 7-gram 8-gram 9-gram\n";
printf " ------ ------ ------ ------ ------ ------ ------ ------ ------\n";
-
+
if (( $METHOD eq "BOTH" ) || ($METHOD eq "NIST")) {
foreach my $sys (sort @tst_sys) {
printf " NIST:";
@@ -770,8 +770,8 @@ sub printout_report
}
}
printf "\n";
-
-
+
+
if (( $METHOD eq "BOTH" ) || ($METHOD eq "BLEU")) {
foreach my $sys (sort @tst_sys) {
printf " BLEU:";
diff --git a/scripts/generic/mteval-v13a.pl b/scripts/generic/mteval-v13a.pl
index 453c03e19..41a88800a 100755
--- a/scripts/generic/mteval-v13a.pl
+++ b/scripts/generic/mteval-v13a.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -157,7 +157,7 @@ my $usage = "\n\nUsage: $0 -r <ref_file> -s <src_file> -t <tst_file>\n\n".
" BLEU-sys.scr and NIST-sys.scr : system-level scores\n" .
" --no-smoothing : disable smoothing on BLEU scores\n" .
"\n";
-
+
use vars qw ($opt_r $opt_s $opt_t $opt_d $opt_h $opt_b $opt_n $opt_c $opt_x $opt_e);
use Getopt::Long;
my $ref_file = '';
@@ -220,7 +220,7 @@ my $METHOD = "BOTH";
if ( $opt_b ) { $METHOD = "BLEU"; }
if ( $opt_n ) { $METHOD = "NIST"; }
my $method;
-
+
######
# Global variables
my ($src_lang, $tgt_lang, @tst_sys, @ref_sys); # evaluation parameters
@@ -265,7 +265,7 @@ foreach my $doc (sort keys %eval_docs)
print " src set \"$src_id\" (", scalar keys %eval_docs, " docs, $cum_seg segs)\n";
print " ref set \"$ref_id\" (", scalar keys %ref_data, " refs)\n";
print " tst set \"$tst_id\" (", scalar keys %tst_data, " systems)\n\n";
-
+
foreach my $sys (sort @tst_sys)
{
for (my $n=1; $n<=$max_Ngram; $n++)
@@ -642,9 +642,9 @@ sub score_document
{
printf "ref '$ref', seg $seg: %s\n", $ref_data{$ref}{$doc}{SEGS}{$seg}
}
-
+
}
-
+
printf "sys '$sys', seg $seg: %s\n", $tst_data{$sys}{$doc}{SEGS}{$seg} if ( $detail >= 3 );
($ref_length, $match_cnt, $tst_cnt, $ref_cnt, $tst_info, $ref_info) = score_segment ($tst_data{$sys}{$doc}{SEGS}{$seg}, @ref_segments);
@@ -654,7 +654,7 @@ sub score_document
my $segScore = &{$BLEU_SCORE}($ref_length, $match_cnt, $tst_cnt, $sys, %DOCmt);
$overallScore->{ $sys }{ 'documents' }{ $doc }{ 'segments' }{ $seg }{ 'score' } = $segScore;
if ( $detail >= 2 )
- {
+ {
printf " $method score using 4-grams = %.4f for system \"$sys\" on segment $seg of document \"$doc\" (%d words)\n", $segScore, $tst_cnt->[1]
}
}
@@ -664,7 +664,7 @@ sub score_document
my $segScore = nist_score (scalar @ref_sys, $match_cnt, $tst_cnt, $ref_cnt, $tst_info, $ref_info, $sys, %DOCmt);
$overallScore->{ $sys }{ 'documents' }{ $doc }{ 'segments' }{ $seg }{ 'score' } = $segScore;
if ( $detail >= 2 )
- {
+ {
printf " $method score using 5-grams = %.4f for system \"$sys\" on segment $seg of document \"$doc\" (%d words)\n", $segScore, $tst_cnt->[1];
}
}
diff --git a/scripts/generic/multi-bleu.perl b/scripts/generic/multi-bleu.perl
index 2f44d419f..344f58c6f 100755
--- a/scripts/generic/multi-bleu.perl
+++ b/scripts/generic/multi-bleu.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
use warnings;
@@ -73,7 +73,7 @@ while(<STDIN>) {
$REF_NGRAM_N{$ngram}++;
}
foreach my $ngram (keys %REF_NGRAM_N) {
- if (!defined($REF_NGRAM{$ngram}) ||
+ if (!defined($REF_NGRAM{$ngram}) ||
$REF_NGRAM{$ngram} < $REF_NGRAM_N{$ngram}) {
$REF_NGRAM{$ngram} = $REF_NGRAM_N{$ngram};
# print "$i: REF_NGRAM{$ngram} = $REF_NGRAM{$ngram}<BR>\n";
diff --git a/scripts/generic/ph_numbers.perl b/scripts/generic/ph_numbers.perl
index ea56927ac..612263249 100755
--- a/scripts/generic/ph_numbers.perl
+++ b/scripts/generic/ph_numbers.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
package ph_numbers;
@@ -60,8 +60,8 @@ sub mark_numbers {
}
$position = $numend;
}
- $output .= substr($input,$position);
- return $output;
+ $output .= substr($input,$position);
+ return $output;
}
sub recognize {
@@ -76,17 +76,17 @@ sub recognize {
$end = $+[2];
}
- # ALL characters in the word must be
+ # ALL characters in the word must be
my $isRecognized = 1;
if ($start == 0 || substr($input, $start - 1, 1) eq " ") {
- # 1st word, or previous char is a space
+ # 1st word, or previous char is a space
}
else {
$isRecognized = 0;
}
if ($end == length($input) -1 || substr($input, $end, 1) eq " ") {
- # last word, or next char is a space
+ # last word, or next char is a space
}
else {
$isRecognized = 0;
diff --git a/scripts/generic/qsub-wrapper.pl b/scripts/generic/qsub-wrapper.pl
index 622323bdb..ac3d0900a 100755
--- a/scripts/generic/qsub-wrapper.pl
+++ b/scripts/generic/qsub-wrapper.pl
@@ -1,16 +1,16 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
use warnings;
use strict;
#######################
-#Default parameters
+#Default parameters
#parameters for submiiting processes through SGE
#NOTE: group name is ws06ossmt (with 2 's') and not ws06osmt (with 1 's')
my $queueparameters="";
-# look for the correct pwdcmd
+# look for the correct pwdcmd
my $pwdcmd = getPwdCmd();
my $workingdir = `$pwdcmd`; chomp $workingdir;
@@ -45,7 +45,7 @@ sub init(){
'old-sge' => \$old_sge,
) or exit(1);
$parameters="@ARGV";
-
+
version() if $version;
usage() if $help;
print_parameters() if $dbg;
@@ -94,7 +94,7 @@ sub preparing_script(){
$scriptheader.="uname -a\n\n";
$scriptheader.="cd $workingdir\n\n";
-
+
open (OUT, "> $jobscript");
print OUT $scriptheader;
@@ -142,7 +142,7 @@ my $maysync = $old_sge ? "" : "-sync y";
# create the qsubcmd to submit to the queue with the parameter "-b yes"
my $qsubcmd="qsub $queueparameters $maysync -V -o $qsubout -e $qsuberr -N $qsubname -b yes $jobscript > $jobscript.log 2>&1";
-#run the qsubcmd
+#run the qsubcmd
safesystem($qsubcmd) or die;
#getting id of submitted job
@@ -172,7 +172,7 @@ if ($old_sge) {
# start the 'hold' job, i.e. the job that will wait
$cmd="qsub -cwd $queueparameters -hold_jid $id -o $checkpointfile -e /dev/null -N $qsubname.W $syncscript >& $qsubname.W.log";
safesystem($cmd) or die;
-
+
# and wait for checkpoint file to appear
my $nr=0;
while (!-e $checkpointfile) {
diff --git a/scripts/generic/reverse-alignment.perl b/scripts/generic/reverse-alignment.perl
index d00140c74..681b3221e 100755
--- a/scripts/generic/reverse-alignment.perl
+++ b/scripts/generic/reverse-alignment.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -8,7 +8,7 @@ while ($line = <STDIN>)
{
chomp($line);
my @toks = split(/ /, $line);
-
+
foreach (my $i = 0; $i < @toks; ++$i)
{
my $tok = $toks[$i];
diff --git a/scripts/generic/score-parallel.perl b/scripts/generic/score-parallel.perl
index 9e5ee0025..e911cd4a3 100755
--- a/scripts/generic/score-parallel.perl
+++ b/scripts/generic/score-parallel.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# example
# ./score-parallel.perl 8 "gsort --batch-size=253" ./score ./extract.2.sorted.gz ./lex.2.f2e ./phrase-table.2.half.f2e --GoodTuring ./phrase-table.2.coc 0
@@ -35,7 +35,7 @@ my $sortCmd = $ARGV[1];
my $scoreCmd = $ARGV[2];
my $extractFile = $ARGV[3]; # 1st arg of extract argument
-my $lexFile = $ARGV[4];
+my $lexFile = $ARGV[4];
my $ptHalf = $ARGV[5]; # output
my $inverse = 0;
my $sourceLabelsFile;
@@ -92,7 +92,7 @@ if ($numParallel <= 1)
}
print STDERR "$cmd \n";
systemCheck($cmd);
-
+
$fileCount = 1;
}
else
@@ -103,7 +103,7 @@ else
else {
open(IN, $extractFile) || die "can't open $extractFile";
}
-
+
my $lastlineContext;
if ($FlexibilityScore) {
$lastlineContext = "";
@@ -117,25 +117,25 @@ else
my $filePath = "$TMPDIR/extract.$fileCount.gz";
open (OUT, "| $GZIP_EXEC -c > $filePath") or die "error starting $GZIP_EXEC $!";
-
+
my $lineCount = 0;
my $line;
my $prevSourcePhrase = "";
- while ($line=<IN>)
+ while ($line=<IN>)
{
chomp($line);
++$lineCount;
-
+
if ($lineCount > $EXTRACT_SPLIT_LINES)
{ # over line limit. Cut off at next source phrase change
my $sourcePhrase = GetSourcePhrase($line);
-
+
if ($prevSourcePhrase eq "")
{ # start comparing
$prevSourcePhrase = $sourcePhrase;
}
elsif ($sourcePhrase eq $prevSourcePhrase)
- { # can't cut off yet. Do nothing
+ { # can't cut off yet. Do nothing
}
else
{ # cut off, open next min-extract file & write to that instead
@@ -155,9 +155,9 @@ else
else
{ # keep on writing to current mini-extract file
}
-
+
print OUT "$line\n";
-
+
}
close OUT;
if ($FlexibilityScore) {
@@ -287,7 +287,7 @@ if (-e $cocPath)
}
# merge source label files
-if (!$inverse && defined($sourceLabelsFile))
+if (!$inverse && defined($sourceLabelsFile))
{
my $cmd = "(echo \"GlueTop 0\"; echo \"GlueX 1\"; echo \"SSTART 2\"; echo \"SEND 3\"; cat $TMPDIR/phrase-table.half.*.gz.syntaxLabels.src | LC_ALL=C sort | uniq | perl -pe \"s/\$/ \@{[\$.+3]}/\") > $sourceLabelsFile";
print STDERR "Merging source label files: $cmd \n";
@@ -295,7 +295,7 @@ if (!$inverse && defined($sourceLabelsFile))
}
# merge parts-of-speech files
-if (!$inverse && defined($partsOfSpeechFile))
+if (!$inverse && defined($partsOfSpeechFile))
{
my $cmd = "(echo \"SSTART 0\"; echo \"SEND 1\"; cat $TMPDIR/phrase-table.half.*.gz.partsOfSpeech | LC_ALL=C sort | uniq | perl -pe \"s/\$/ \@{[\$.+1]}/\") > $partsOfSpeechFile";
print STDERR "Merging parts-of-speech files: $cmd \n";
@@ -317,7 +317,7 @@ sub RunFork($)
my $cmd = shift;
my $pid = fork();
-
+
if ($pid == 0)
{ # child
print STDERR $cmd;
@@ -388,7 +388,7 @@ sub CutContextFile($$$)
}
#write all lines in context file until we meet last source phrase in extract file
- while ($line=<IN_CONTEXT>)
+ while ($line=<IN_CONTEXT>)
{
chomp($line);
$sourcePhrase = GetSourcePhrase($line);
@@ -397,7 +397,7 @@ sub CutContextFile($$$)
}
#write all lines in context file that correspond to last source phrase in extract file
- while ($line=<IN_CONTEXT>)
+ while ($line=<IN_CONTEXT>)
{
chomp($line);
$sourcePhrase = GetSourcePhrase($line);
diff --git a/scripts/generic/strip-xml.perl b/scripts/generic/strip-xml.perl
index 95513b608..c993421f0 100755
--- a/scripts/generic/strip-xml.perl
+++ b/scripts/generic/strip-xml.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/generic/trainlm-irst2.perl b/scripts/generic/trainlm-irst2.perl
index 596143386..f664e96ee 100755
--- a/scripts/generic/trainlm-irst2.perl
+++ b/scripts/generic/trainlm-irst2.perl
@@ -1,14 +1,14 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# Compatible with sri LM-creating script, eg.
# ngram-count -order 5 -interpolate -wbdiscount -unk -text corpus.txt -lm lm.txt
# To use it in the EMS, add this to the [LM] section
# lm-training = "$moses-script-dir/generic/trainlm-irst2.perl -cores $cores -irst-dir $irst-dir"
# settings = ""
-# Also, make sure that $irst-dir is defined (in the [LM] or [GENERAL] section.
+# Also, make sure that $irst-dir is defined (in the [LM] or [GENERAL] section.
# It should point to the root of the LM toolkit, eg
# irst-dir = /Users/hieu/workspace/irstlm/trunk/bin
-# Set smoothing method in settings, if different from modified Kneser-Ney
+# Set smoothing method in settings, if different from modified Kneser-Ney
use warnings;
use strict;
@@ -19,7 +19,7 @@ my $order = 3; # order of language model (default trigram)
my $corpusPath; # input text data
my $lmPath; # generated language model
my $cores = 2; # number of CPUs used
-my $irstPath; # bin directory of IRSTLM
+my $irstPath; # bin directory of IRSTLM
my $tempPath = "tmp"; # temp dir
my $pruneSingletons = 1; # 1 = prune singletons, 0 = keep singletons
my $smoothing = "msb"; # smoothing method: wb = witten-bell, sb = kneser-ney, msb = modified-kneser-ney
diff --git a/scripts/other/blame-stat.sh b/scripts/other/blame-stat.sh
index 7ceddfc5d..1d5c0186c 100755
--- a/scripts/other/blame-stat.sh
+++ b/scripts/other/blame-stat.sh
@@ -1,4 +1,4 @@
git ls-files | xargs -n1 git blame --line-porcelain | sed -n 's/^author //p' | sort -f | uniq -ic | sort -nr
-
+
#git ls-files | grep -Ei "\.h$|\.cpp$|\.hh$|\.cc$" | xargs -n1 git blame --line-porcelain | sed -n 's/^author //p' | sort -f | uniq -ic | sort -nr
diff --git a/scripts/other/convert-pt.perl b/scripts/other/convert-pt.perl
index f530a447a..e087126f1 100755
--- a/scripts/other/convert-pt.perl
+++ b/scripts/other/convert-pt.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
# convert a phrase-table with alignment in Moses' dead-end format
diff --git a/scripts/other/delete-scores.perl b/scripts/other/delete-scores.perl
index 08316c95b..ffb788867 100755
--- a/scripts/other/delete-scores.perl
+++ b/scripts/other/delete-scores.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -21,10 +21,10 @@ my @keepScores = split(/,/, $keepScoresStr);
while (my $line = <STDIN>) {
chomp($line);
#print STDERR "line=$line\n";
-
+
my @toks = split(/\|/, $line);
my @scores = split(/ /, $toks[6]);
-
+
$toks[6] = DeleteScore($toks[6], \@keepScores);
# output
@@ -48,7 +48,7 @@ sub DeleteScore
{
my $string = $_[0];
my @keepScores = @{$_[1]};
-
+
$string = trim($string);
my @toks = split(/ /, $string);
@@ -57,7 +57,7 @@ sub DeleteScore
$string .= $toks[ $keepScores[$i] ] ." ";
}
$string = " " .$string;
-
+
return $string;
}
diff --git a/scripts/other/get_many_translations_from_google.perl b/scripts/other/get_many_translations_from_google.perl
index 512b84e36..0b1436c20 100755
--- a/scripts/other/get_many_translations_from_google.perl
+++ b/scripts/other/get_many_translations_from_google.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# Uses Google AJAX API to collect many translations, i.e. create a parallel
# corpus of Google translations.
@@ -102,7 +102,7 @@ sub collect_translations {
# infinite loop, until everything translated
my $gotlines = wcl($outfile);
print STDERR "$outfile contains $gotlines lines already, extending.\n";
-
+
my $nr = 0;
my @inlines = ();
my $droplast = 0;
@@ -146,9 +146,9 @@ sub collect_translations {
}
}
}
-
+
my $outlines;
-
+
if (0 == scalar @inlines) {
# special case: skipping too long sentences
$outlines = [""];
@@ -156,7 +156,7 @@ sub collect_translations {
$outlines = translate_batch(\@inlines);
last if !defined $outlines;
}
-
+
*OUTF = my_append($outfile);
foreach my $outline (@$outlines) {
print OUTF $outline."\n";
diff --git a/scripts/other/retain-lines.perl b/scripts/other/retain-lines.perl
index b865e1af7..f04a8ebad 100755
--- a/scripts/other/retain-lines.perl
+++ b/scripts/other/retain-lines.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
#retain lines in clean.lines-retained.1
use strict;
diff --git a/scripts/other/translate_by_microsoft_bing.perl b/scripts/other/translate_by_microsoft_bing.perl
index ad7a9c3b7..c9b1b31de 100755
--- a/scripts/other/translate_by_microsoft_bing.perl
+++ b/scripts/other/translate_by_microsoft_bing.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# Script implemented by Pranava Swaroop Madhyastha (a student at Charles
# University, UFAL)
diff --git a/scripts/recaser/detruecase.perl b/scripts/recaser/detruecase.perl
index 549cd8abe..b882852a0 100755
--- a/scripts/recaser/detruecase.perl
+++ b/scripts/recaser/detruecase.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/recaser/recase.perl b/scripts/recaser/recase.perl
index 3ba83712a..52cec36ea 100755
--- a/scripts/recaser/recase.perl
+++ b/scripts/recaser/recase.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
use warnings;
diff --git a/scripts/recaser/train-recaser.perl b/scripts/recaser/train-recaser.perl
index 87a720f6e..dce388bca 100755
--- a/scripts/recaser/train-recaser.perl
+++ b/scripts/recaser/train-recaser.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
use warnings;
@@ -40,7 +40,7 @@ $ERROR = "training Aborted."
# check and set default to unset parameters
$ERROR = "please specify working dir --dir" unless defined($DIR) || defined($HELP);
-$ERROR = "please specify --corpus" if !defined($CORPUS) && !defined($HELP)
+$ERROR = "please specify --corpus" if !defined($CORPUS) && !defined($HELP)
&& $FIRST_STEP <= 2 && $LAST_STEP >= 1;
if ($HELP || $ERROR) {
@@ -70,7 +70,7 @@ if ($HELP || $ERROR) {
(1) Truecasing;
(2) Language Model Training;
(3) Data Preparation
- (4-10) Recaser Model Training;
+ (4-10) Recaser Model Training;
(11) Cleanup.
--first-step=[1-11] ... step where script starts (default: 1).
--last-step=[1-11] ... step where script ends (default: 11).
@@ -189,7 +189,7 @@ sub train_recase_model {
}
else {
$cmd .= " --score-options='--OnlyDirect'";
- }
+ }
if (uc $LM eq "IRSTLM") {
$cmd .= " --lm 0:3:$DIR/cased.irstlm.gz:1";
}
diff --git a/scripts/recaser/train-truecaser.perl b/scripts/recaser/train-truecaser.perl
index b653a8ca5..753183324 100755
--- a/scripts/recaser/train-truecaser.perl
+++ b/scripts/recaser/train-truecaser.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id: train-recaser.perl 1326 2007-03-26 05:44:27Z bojar $
diff --git a/scripts/recaser/truecase.perl b/scripts/recaser/truecase.perl
index 373aa509f..544b79c47 100755
--- a/scripts/recaser/truecase.perl
+++ b/scripts/recaser/truecase.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id: train-recaser.perl 1326 2007-03-26 05:44:27Z bojar $
diff --git a/scripts/regression-testing/MosesScriptsRegressionTesting.pm b/scripts/regression-testing/MosesScriptsRegressionTesting.pm
index 6f78961e5..d8b0590c8 100644
--- a/scripts/regression-testing/MosesScriptsRegressionTesting.pm
+++ b/scripts/regression-testing/MosesScriptsRegressionTesting.pm
@@ -26,8 +26,8 @@ sub find_data_directory
print STDERR<<EOT;
You do not appear to have the regression testing data installed.
-You may either specify a non-standard location (absolute path)
-when running the test suite with the --data-dir option,
+You may either specify a non-standard location (absolute path)
+when running the test suite with the --data-dir option,
or, you may install it in any one of the following
standard locations: $test_script_root, /tmp, or /var/tmp with these
commands:
diff --git a/scripts/regression-testing/compare-results.pl b/scripts/regression-testing/compare-results.pl
index df14d444f..572431951 100755
--- a/scripts/regression-testing/compare-results.pl
+++ b/scripts/regression-testing/compare-results.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/regression-testing/create_localized_moses_ini.pl b/scripts/regression-testing/create_localized_moses_ini.pl
index 612a39e82..1d03e5ab8 100755
--- a/scripts/regression-testing/create_localized_moses_ini.pl
+++ b/scripts/regression-testing/create_localized_moses_ini.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/regression-testing/modify-pars.pl b/scripts/regression-testing/modify-pars.pl
index 5ad2514a4..de2df2919 100755
--- a/scripts/regression-testing/modify-pars.pl
+++ b/scripts/regression-testing/modify-pars.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/regression-testing/moses-virtual.pl b/scripts/regression-testing/moses-virtual.pl
index 41ddd6b13..3af3c79e4 100755
--- a/scripts/regression-testing/moses-virtual.pl
+++ b/scripts/regression-testing/moses-virtual.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -52,13 +52,13 @@ sub init(){
}
sub VersionMessage(){
- print STDERR "moses-virtual version 1.0\n";
+ print STDERR "moses-virtual version 1.0\n";
exit;
}
sub HelpMessage(){
- print STDERR "moses-virtual simulates the standard behavior of Moses\n";
- print STDERR "USAGE: moses-virtual\n";
+ print STDERR "moses-virtual simulates the standard behavior of Moses\n";
+ print STDERR "USAGE: moses-virtual\n";
print_parameters(1);
exit;
}
diff --git a/scripts/regression-testing/run-single-test.pl b/scripts/regression-testing/run-single-test.pl
index bb66e96f6..e8307da36 100755
--- a/scripts/regression-testing/run-single-test.pl
+++ b/scripts/regression-testing/run-single-test.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/regression-testing/run-test-suite.pl b/scripts/regression-testing/run-test-suite.pl
index 8ae9ec60f..b384f8b98 100755
--- a/scripts/regression-testing/run-test-suite.pl
+++ b/scripts/regression-testing/run-test-suite.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/tokenizer/deescape-special-chars-PTB.perl b/scripts/tokenizer/deescape-special-chars-PTB.perl
index 0e73a7718..f9601924f 100755
--- a/scripts/tokenizer/deescape-special-chars-PTB.perl
+++ b/scripts/tokenizer/deescape-special-chars-PTB.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/tokenizer/deescape-special-chars.perl b/scripts/tokenizer/deescape-special-chars.perl
index 076d1e62f..002955e62 100755
--- a/scripts/tokenizer/deescape-special-chars.perl
+++ b/scripts/tokenizer/deescape-special-chars.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/tokenizer/detokenizer.perl b/scripts/tokenizer/detokenizer.perl
index 7874d5d04..3a92bd024 100755
--- a/scripts/tokenizer/detokenizer.perl
+++ b/scripts/tokenizer/detokenizer.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id: detokenizer.perl 4134 2011-08-08 15:30:54Z bgottesman $
# Sample De-Tokenizer
@@ -309,7 +309,7 @@ sub charIsCJK {
my ($char) = @_;
# $char should be a string of length 1
my $codepoint = &codepoint_dec($char);
-
+
# The following is based on http://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane
# Hangul Jamo (1100–11FF)
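For context on the `charIsCJK` hunk: the helper classifies a character by its decimal codepoint against Basic Multilingual Plane block ranges. A minimal sketch covering only the Hangul Jamo range named in the hunk (the real script checks many more blocks):

```python
# Sketch of detokenizer.perl's charIsCJK test, restricted to the
# Hangul Jamo block (U+1100..U+11FF) cited in the hunk above.
def char_is_hangul_jamo(char):
    # char should be a string of length 1, as in the Perl original
    return 0x1100 <= ord(char) <= 0x11FF

print(char_is_hangul_jamo("\u1100"))
```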
diff --git a/scripts/tokenizer/escape-special-chars.perl b/scripts/tokenizer/escape-special-chars.perl
index e94b91744..fbbbae292 100755
--- a/scripts/tokenizer/escape-special-chars.perl
+++ b/scripts/tokenizer/escape-special-chars.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -21,7 +21,7 @@ while(<STDIN>) {
s/\"/\&quot;/g; # xml
s/\[/\&#91;/g; # syntax non-terminal
s/\]/\&#93;/g; # syntax non-terminal
-
+
# restore xml instructions
s/\&lt;(\S+) translation=&quot;(.+?)&quot;&gt; (.+?) &lt;\/(\S+)&gt;/\<$1 translation=\"$2\"> $3 <\/$4>/g;
print $_."\n";
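The hunk above shows the quote and square-bracket escaping rules plus the restore step for inline XML instructions. A hedged sketch of the escaping pass: the `&`, `|`, and angle-bracket entries below are assumptions based on the script's usual behavior, not shown in this hunk.

```python
# Escape characters Moses treats specially. "&" must be handled first so
# entities introduced by later replacements are not double-escaped.
REPLACEMENTS = [("&", "&amp;"), ("|", "&#124;"), ("<", "&lt;"), (">", "&gt;"),
                ('"', "&quot;"), ("[", "&#91;"), ("]", "&#93;")]

def escape_special_chars(s):
    for raw, escaped in REPLACEMENTS:
        s = s.replace(raw, escaped)
    return s

print(escape_special_chars('a "b" [NP]'))
```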
diff --git a/scripts/tokenizer/lowercase.perl b/scripts/tokenizer/lowercase.perl
index 9ee307bc2..e5c41bbed 100755
--- a/scripts/tokenizer/lowercase.perl
+++ b/scripts/tokenizer/lowercase.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/tokenizer/normalize-punctuation.perl b/scripts/tokenizer/normalize-punctuation.perl
index db8f9c60e..13e9fd3fc 100755
--- a/scripts/tokenizer/normalize-punctuation.perl
+++ b/scripts/tokenizer/normalize-punctuation.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/tokenizer/pre-tokenizer.perl b/scripts/tokenizer/pre-tokenizer.perl
index 499671b44..514d8da8d 100755
--- a/scripts/tokenizer/pre-tokenizer.perl
+++ b/scripts/tokenizer/pre-tokenizer.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# script for preprocessing language data prior to tokenization
# Start by Ulrich Germann, after noticing systematic preprocessing errors
diff --git a/scripts/tokenizer/remove-non-printing-char.perl b/scripts/tokenizer/remove-non-printing-char.perl
index 2b90dfd3b..9125b7691 100755
--- a/scripts/tokenizer/remove-non-printing-char.perl
+++ b/scripts/tokenizer/remove-non-printing-char.perl
@@ -1,7 +1,7 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
-use utf8;
+use utf8;
binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
@@ -11,8 +11,8 @@ while (my $line = <STDIN>) {
chomp($line);
#$line =~ tr/\040-\176/ /c;
#$line =~ s/[^[:print:]]/ /g;
- #$line =~ s/\s+/ /g;
- $line =~ s/\p{C}/ /g;
+ #$line =~ s/\s+/ /g;
+ $line =~ s/\p{C}/ /g;
print "$line\n";
}
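The substantive line this hunk reindents is `$line =~ s/\p{C}/ /g;`, which replaces every Unicode category-C character with a space. An equivalent sketch:

```python
import unicodedata

# Replace any Unicode "Other" character (category C*: control, format,
# surrogate, private use, unassigned) with a space, like Perl's s/\p{C}/ /g.
def remove_non_printing(line):
    return "".join(" " if unicodedata.category(ch).startswith("C") else ch
                   for ch in line)

print(remove_non_printing("a\u0007b"))
```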
diff --git a/scripts/tokenizer/replace-unicode-punctuation.perl b/scripts/tokenizer/replace-unicode-punctuation.perl
index 08eb766bf..cda69ddf7 100755
--- a/scripts/tokenizer/replace-unicode-punctuation.perl
+++ b/scripts/tokenizer/replace-unicode-punctuation.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/tokenizer/tokenizer.perl b/scripts/tokenizer/tokenizer.perl
index 8abffbea4..a5d4fadd3 100755
--- a/scripts/tokenizer/tokenizer.perl
+++ b/scripts/tokenizer/tokenizer.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
@@ -41,7 +41,7 @@ my $NUM_THREADS = 1;
my $NUM_SENTENCES_PER_THREAD = 2000;
my $PENN = 0;
my $NO_ESCAPING = 0;
-while (@ARGV)
+while (@ARGV)
{
$_ = shift;
/^-b$/ && ($| = 1, next);
@@ -67,7 +67,7 @@ if ($TIMING)
}
# print help message
-if ($HELP)
+if ($HELP)
{
print "Usage ./tokenizer.perl (-l [en|de|...]) (-threads 4) < textfile > tokenizedfile\n";
print "Options:\n";
@@ -81,7 +81,7 @@ if ($HELP)
exit;
}
-if (!$QUIET)
+if (!$QUIET)
{
print STDERR "Tokenizer Version 1.1\n";
print STDERR "Language: $language\n";
@@ -112,7 +112,7 @@ my $count_sentences = 0;
if ($NUM_THREADS > 1)
{# multi-threading tokenization
- while(<STDIN>)
+ while(<STDIN>)
{
$count_sentences = $count_sentences + 1;
push(@batch_sentences, $_);
@@ -172,14 +172,14 @@ if ($NUM_THREADS > 1)
}
else
{# single thread only
- while(<STDIN>)
+ while(<STDIN>)
{
- if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
+ if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
{
#don't try to tokenize XML/HTML tag lines
print $_;
}
- else
+ else
{
print &tokenize($_);
}
@@ -205,7 +205,7 @@ sub tokenize_batch
my(@tokenized_list) = ();
foreach (@text_list)
{
- if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
+ if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
{
#don't try to tokenize XML/HTML tag lines
push(@tokenized_list, $_);
@@ -221,7 +221,7 @@ sub tokenize_batch
# the actual tokenize function which tokenizes one input string
# input: one string
# return: the tokenized string for the input string
-sub tokenize
+sub tokenize
{
my($text) = @_;
@@ -231,7 +231,7 @@ sub tokenize
chomp($text);
$text = " $text ";
-
+
# remove ASCII junk
$text =~ s/\s+/ /g;
$text =~ s/[\000-\037]//g;
@@ -258,14 +258,14 @@ sub tokenize
$text =~ s/([^\p{IsAlnum}\s\.\'\`\,\-])/ $1 /g;
# aggressive hyphen splitting
- if ($AGGRESSIVE)
+ if ($AGGRESSIVE)
{
$text =~ s/([\p{IsAlnum}])\-(?=[\p{IsAlnum}])/$1 \@-\@ /g;
}
#multi-dots stay together
$text =~ s/\.([\.]+)/ DOTMULTI$1/g;
- while($text =~ /DOTMULTI\./)
+ while($text =~ /DOTMULTI\./)
{
$text =~ s/DOTMULTI\.([^\.])/DOTDOTMULTI $1/g;
$text =~ s/DOTMULTI\./DOTDOTMULTI/g;
@@ -285,14 +285,14 @@ sub tokenize
# separate , pre and post number
#$text =~ s/([\p{IsN}])[,]([^\p{IsN}])/$1 , $2/g;
#$text =~ s/([^\p{IsN}])[,]([\p{IsN}])/$1 , $2/g;
-
+
# turn `into '
#$text =~ s/\`/\'/g;
-
+
#turn '' into "
#$text =~ s/\'\'/ \" /g;
- if ($language eq "en")
+ if ($language eq "en")
{
#split contractions right
$text =~ s/([^\p{IsAlpha}])[']([^\p{IsAlpha}])/$1 ' $2/g;
@@ -301,44 +301,44 @@ sub tokenize
$text =~ s/([\p{IsAlpha}])[']([\p{IsAlpha}])/$1 '$2/g;
#special case for "1990's"
$text =~ s/([\p{IsN}])[']([s])/$1 '$2/g;
- }
- elsif (($language eq "fr") or ($language eq "it"))
+ }
+ elsif (($language eq "fr") or ($language eq "it"))
{
- #split contractions left
+ #split contractions left
$text =~ s/([^\p{IsAlpha}])[']([^\p{IsAlpha}])/$1 ' $2/g;
$text =~ s/([^\p{IsAlpha}])[']([\p{IsAlpha}])/$1 ' $2/g;
$text =~ s/([\p{IsAlpha}])[']([^\p{IsAlpha}])/$1 ' $2/g;
$text =~ s/([\p{IsAlpha}])[']([\p{IsAlpha}])/$1' $2/g;
- }
- else
+ }
+ else
{
$text =~ s/\'/ \' /g;
}
-
+
#word token method
my @words = split(/\s/,$text);
$text = "";
- for (my $i=0;$i<(scalar(@words));$i++)
+ for (my $i=0;$i<(scalar(@words));$i++)
{
my $word = $words[$i];
- if ( $word =~ /^(\S+)\.$/)
+ if ( $word =~ /^(\S+)\.$/)
{
my $pre = $1;
- if (($pre =~ /\./ && $pre =~ /\p{IsAlpha}/) || ($NONBREAKING_PREFIX{$pre} && $NONBREAKING_PREFIX{$pre}==1) || ($i<scalar(@words)-1 && ($words[$i+1] =~ /^[\p{IsLower}]/)))
+ if (($pre =~ /\./ && $pre =~ /\p{IsAlpha}/) || ($NONBREAKING_PREFIX{$pre} && $NONBREAKING_PREFIX{$pre}==1) || ($i<scalar(@words)-1 && ($words[$i+1] =~ /^[\p{IsLower}]/)))
{
#no change
- }
- elsif (($NONBREAKING_PREFIX{$pre} && $NONBREAKING_PREFIX{$pre}==2) && ($i<scalar(@words)-1 && ($words[$i+1] =~ /^[0-9]+/)))
+ }
+ elsif (($NONBREAKING_PREFIX{$pre} && $NONBREAKING_PREFIX{$pre}==2) && ($i<scalar(@words)-1 && ($words[$i+1] =~ /^[0-9]+/)))
{
#no change
- }
- else
+ }
+ else
{
$word = $pre." .";
}
}
$text .= $word." ";
- }
+ }
# clean up extraneous spaces
$text =~ s/ +/ /g;
@@ -352,7 +352,7 @@ sub tokenize
}
#restore multi-dots
- while($text =~ /DOTDOTMULTI/)
+ while($text =~ /DOTDOTMULTI/)
{
$text =~ s/DOTDOTMULTI/DOTMULTI./g;
}
@@ -516,34 +516,34 @@ $text =~ s=([;:@#\$%&\p{IsSc}\p{IsSo}])= $1 =g;
return $text;
}
-sub load_prefixes
+sub load_prefixes
{
my ($language, $PREFIX_REF) = @_;
-
+
my $prefixfile = "$mydir/nonbreaking_prefix.$language";
-
+
#default back to English if we don't have a language-specific prefix file
- if (!(-e $prefixfile))
+ if (!(-e $prefixfile))
{
$prefixfile = "$mydir/nonbreaking_prefix.en";
print STDERR "WARNING: No known abbreviations for language '$language', attempting fall-back to English version...\n";
die ("ERROR: No abbreviations files found in $mydir\n") unless (-e $prefixfile);
}
-
- if (-e "$prefixfile")
+
+ if (-e "$prefixfile")
{
open(PREFIX, "<:utf8", "$prefixfile");
- while (<PREFIX>)
+ while (<PREFIX>)
{
my $item = $_;
chomp($item);
- if (($item) && (substr($item,0,1) ne "#"))
+ if (($item) && (substr($item,0,1) ne "#"))
{
- if ($item =~ /(.*)[\s]+(\#NUMERIC_ONLY\#)/)
+ if ($item =~ /(.*)[\s]+(\#NUMERIC_ONLY\#)/)
{
$PREFIX_REF->{$1} = 2;
- }
- else
+ }
+ else
{
$PREFIX_REF->{$item} = 1;
}
@@ -552,4 +552,3 @@ sub load_prefixes
close(PREFIX);
}
}
-
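The largest hunks above reindent `tokenize` and `load_prefixes` without changing behavior. For reference, a minimal Python sketch of the period-splitting decision those hunks reformat: a word-final period stays attached when the prefix contains an internal dot plus a letter, is a known nonbreaking prefix (table value 1), or is followed by a lowercase word; table value 2 is the `#NUMERIC_ONLY#` case, nonbreaking only before a number. The sample table entries are illustrative, not the real prefix file.

```python
import re

# Illustrative prefix table: 1 = always nonbreaking, 2 = #NUMERIC_ONLY#.
NONBREAKING_PREFIX = {"Mr": 1, "No": 2}

def split_final_period(words):
    out = []
    for i, word in enumerate(words):
        m = re.match(r"^(\S+)\.$", word)
        if m:
            pre = m.group(1)
            nxt = words[i + 1] if i + 1 < len(words) else ""
            if ("." in pre and re.search(r"[A-Za-z]", pre)) \
               or NONBREAKING_PREFIX.get(pre) == 1 \
               or nxt[:1].islower():
                pass  # keep the period attached (abbreviation-like)
            elif NONBREAKING_PREFIX.get(pre) == 2 and re.match(r"[0-9]", nxt):
                pass  # numeric-only prefix followed by a number
            else:
                word = pre + " ."  # split sentence-final period off
        out.append(word)
    return " ".join(out)

print(split_final_period("Mr. Smith stayed home.".split()))
```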
diff --git a/scripts/tokenizer/tokenizer_PTB.perl b/scripts/tokenizer/tokenizer_PTB.perl
index bce7a38a0..6fff8d7f7 100755
--- a/scripts/tokenizer/tokenizer_PTB.perl
+++ b/scripts/tokenizer/tokenizer_PTB.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# Sample Tokenizer
### Version 1.1
@@ -32,7 +32,7 @@ my $TIMING = 0;
my $NUM_THREADS = 1;
my $NUM_SENTENCES_PER_THREAD = 2000;
-while (@ARGV)
+while (@ARGV)
{
$_ = shift;
/^-b$/ && ($| = 1, next);
@@ -54,7 +54,7 @@ if ($TIMING)
}
# print help message
-if ($HELP)
+if ($HELP)
{
print "Usage ./tokenizer.perl (-l [en|de|...]) (-threads 4) < textfile > tokenizedfile\n";
print "Options:\n";
@@ -65,7 +65,7 @@ if ($HELP)
exit;
}
-if (!$QUIET)
+if (!$QUIET)
{
print STDERR "Tokenizer Version 1.1\n";
print STDERR "Language: $language\n";
@@ -86,7 +86,7 @@ my $count_sentences = 0;
if ($NUM_THREADS > 1)
{# multi-threading tokenization
- while(<STDIN>)
+ while(<STDIN>)
{
$count_sentences = $count_sentences + 1;
push(@batch_sentences, $_);
@@ -146,14 +146,14 @@ if ($NUM_THREADS > 1)
}
else
{# single thread only
- while(<STDIN>)
+ while(<STDIN>)
{
- if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
+ if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
{
#don't try to tokenize XML/HTML tag lines
print $_;
}
- else
+ else
{
print &tokenize($_);
}
@@ -179,7 +179,7 @@ sub tokenize_batch
my(@tokenized_list) = ();
foreach (@text_list)
{
- if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
+ if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
{
#don't try to tokenize XML/HTML tag lines
push(@tokenized_list, $_);
@@ -195,13 +195,13 @@ sub tokenize_batch
# the actual tokenize function which tokenizes one input string
# input: one string
# return: the tokenized string for the input string
-sub tokenize
+sub tokenize
{
my($text) = @_;
#clean some stuff so you don't get &amp; -> &amp;amp;
#news-commentary stuff
-
+
$text =~ s/\&#45;/ /g;
$text =~ s/\&45;/ /g;
$text =~ s/\&#160;/ /g;
@@ -221,7 +221,7 @@ sub tokenize
chomp($text);
$text = " $text ";
-
+
# remove ASCII junk
$text =~ s/\s+/ /g;
$text =~ s/[\000-\037]//g;
@@ -230,14 +230,14 @@ sub tokenize
$text =~ s/([^\p{IsAlnum}\s\.\'\`\,\-])/ $1 /g;
# aggressive hyphen splitting
- if ($AGGRESSIVE)
+ if ($AGGRESSIVE)
{
$text =~ s/([\p{IsAlnum}])\-([\p{IsAlnum}])/$1 \@-\@ $2/g;
}
#multi-dots stay together
$text =~ s/\.([\.]+)/ DOTMULTI$1/g;
- while($text =~ /DOTMULTI\./)
+ while($text =~ /DOTMULTI\./)
{
$text =~ s/DOTMULTI\.([^\.])/DOTDOTMULTI $1/g;
$text =~ s/DOTMULTI\./DOTDOTMULTI/g;
@@ -248,14 +248,14 @@ sub tokenize
# separate , pre and post number
$text =~ s/([\p{IsN}])[,]([^\p{IsN}])/$1 , $2/g;
$text =~ s/([^\p{IsN}])[,]([\p{IsN}])/$1 , $2/g;
-
+
# turn `into '
#$text =~ s/\`/\'/g;
-
+
#turn '' into "
#$text =~ s/\'\'/ \" /g;
- if ($language eq "en")
+ if ($language eq "en")
{
#split contractions right
# $text =~ s/ [']([\p{IsAlpha}])/ '$1/g; #MARIA: is pretokenized for parsing vb'll -> vb 'll
@@ -282,8 +282,8 @@ sub tokenize
$text =~ s/([\p{IsN}])[']s/$1 's/g;
$text =~ s/([\p{IsN}]) [']s/$1 's/g;
$text =~ s/([\p{IsN}]) ['] s/$1 's/g;
-
-
+
+
#other english contractions -> from PTB tokenizer.sed
$text =~ s/([Cc])annot/$1an not/g;
@@ -293,45 +293,45 @@ sub tokenize
$text =~ s/([Ll])emme/$1em me/g;
$text =~ s/([Ww])anna/$1an na/g;
$text =~ s/([Dd]) 'ye/$1' ye/g;
-
- }
- elsif (($language eq "fr") or ($language eq "it"))
+
+ }
+ elsif (($language eq "fr") or ($language eq "it"))
{
- #split contractions left
+ #split contractions left
$text =~ s/([^\p{IsAlpha}])[']([^\p{IsAlpha}])/$1 ' $2/g;
$text =~ s/([^\p{IsAlpha}])[']([\p{IsAlpha}])/$1 ' $2/g;
$text =~ s/([\p{IsAlpha}])[']([^\p{IsAlpha}])/$1 ' $2/g;
$text =~ s/([\p{IsAlpha}])[']([\p{IsAlpha}])/$1' $2/g;
- }
- else
+ }
+ else
{
$text =~ s/\'/ \' /g;
}
-
+
#word token method
my @words = split(/\s/,$text);
$text = "";
- for (my $i=0;$i<(scalar(@words));$i++)
+ for (my $i=0;$i<(scalar(@words));$i++)
{
my $word = $words[$i];
- if ( $word =~ /^(\S+)\.$/)
+ if ( $word =~ /^(\S+)\.$/)
{
my $pre = $1;
- if (($pre =~ /\./ && $pre =~ /\p{IsAlpha}/) || ($NONBREAKING_PREFIX{$pre} && $NONBREAKING_PREFIX{$pre}==1) || ($i<scalar(@words)-1 && ($words[$i+1] =~ /^[\p{IsLower}]/)))
+ if (($pre =~ /\./ && $pre =~ /\p{IsAlpha}/) || ($NONBREAKING_PREFIX{$pre} && $NONBREAKING_PREFIX{$pre}==1) || ($i<scalar(@words)-1 && ($words[$i+1] =~ /^[\p{IsLower}]/)))
{
#no change
- }
- elsif (($NONBREAKING_PREFIX{$pre} && $NONBREAKING_PREFIX{$pre}==2) && ($i<scalar(@words)-1 && ($words[$i+1] =~ /^[0-9]+/)))
+ }
+ elsif (($NONBREAKING_PREFIX{$pre} && $NONBREAKING_PREFIX{$pre}==2) && ($i<scalar(@words)-1 && ($words[$i+1] =~ /^[0-9]+/)))
{
#no change
- }
- else
+ }
+ else
{
$word = $pre." .";
}
}
$text .= $word." ";
- }
+ }
# clean up extraneous spaces
$text =~ s/ +/ /g;
@@ -339,7 +339,7 @@ sub tokenize
$text =~ s/ $//g;
#restore multi-dots
- while($text =~ /DOTDOTMULTI/)
+ while($text =~ /DOTDOTMULTI/)
{
$text =~ s/DOTDOTMULTI/DOTMULTI./g;
}
@@ -361,34 +361,34 @@ sub tokenize
return $text;
}
-sub load_prefixes
+sub load_prefixes
{
my ($language, $PREFIX_REF) = @_;
-
+
my $prefixfile = "$mydir/nonbreaking_prefix.$language";
-
+
#default back to English if we don't have a language-specific prefix file
- if (!(-e $prefixfile))
+ if (!(-e $prefixfile))
{
$prefixfile = "$mydir/nonbreaking_prefix.en";
print STDERR "WARNING: No known abbreviations for language '$language', attempting fall-back to English version...\n";
die ("ERROR: No abbreviations files found in $mydir\n") unless (-e $prefixfile);
}
-
- if (-e "$prefixfile")
+
+ if (-e "$prefixfile")
{
open(PREFIX, "<:utf8", "$prefixfile");
- while (<PREFIX>)
+ while (<PREFIX>)
{
my $item = $_;
chomp($item);
- if (($item) && (substr($item,0,1) ne "#"))
+ if (($item) && (substr($item,0,1) ne "#"))
{
- if ($item =~ /(.*)[\s]+(\#NUMERIC_ONLY\#)/)
+ if ($item =~ /(.*)[\s]+(\#NUMERIC_ONLY\#)/)
{
$PREFIX_REF->{$1} = 2;
- }
- else
+ }
+ else
{
$PREFIX_REF->{$item} = 1;
}
diff --git a/scripts/training/LexicalTranslationModel.pm b/scripts/training/LexicalTranslationModel.pm
index 08d161cc1..c5dad60fb 100644
--- a/scripts/training/LexicalTranslationModel.pm
+++ b/scripts/training/LexicalTranslationModel.pm
@@ -25,16 +25,16 @@ sub open_compressed {
# add extensions, if necessary
$file = $file.".bz2" if ! -e $file && -e $file.".bz2";
$file = $file.".gz" if ! -e $file && -e $file.".gz";
-
+
# pipe zipped, if necessary
return "$BZCAT $file|" if $file =~ /\.bz2$/;
- return "$ZCAT $file|" if $file =~ /\.gz$/;
+ return "$ZCAT $file|" if $file =~ /\.gz$/;
return $file;
}
sub fix_spaces {
my ($in) = @_;
- $$in =~ s/[ \t]+/ /g; $$in =~ s/[ \t]$//; $$in =~ s/^[ \t]//;
+ $$in =~ s/[ \t]+/ /g; $$in =~ s/[ \t]$//; $$in =~ s/^[ \t]//;
}
sub get_lexical {
@@ -112,7 +112,7 @@ sub get_lexical_counts {
# local counts
$FOREIGN_ALIGNED{$fi}+=$iw;
$ENGLISH_ALIGNED{$ei}+=$iw;
-
+
# global counts
$$WORD_TRANSLATION{$FOREIGN[$fi]}{$ENGLISH[$ei]}+=$iw;
$$TOTAL_FOREIGN{$FOREIGN[$fi]}+=$iw;
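The `open_compressed` helper touched at the top of this file resolves a corpus filename to either a plain path or a shell pipe that decompresses it. A sketch of the same logic; the `BZCAT`/`ZCAT` command strings stand in for the Perl globals and are assumptions here:

```python
import os

# Sketch of LexicalTranslationModel.pm's open_compressed: prefer the file
# as given, fall back to .bz2/.gz variants, and return a pipe spec
# (trailing "|", Perl open() convention) for compressed input.
BZCAT, ZCAT = "bzcat", "gzip -cd"

def open_compressed(path):
    # add extensions, if necessary
    if not os.path.exists(path) and os.path.exists(path + ".bz2"):
        path += ".bz2"
    if not os.path.exists(path) and os.path.exists(path + ".gz"):
        path += ".gz"
    # pipe zipped, if necessary
    if path.endswith(".bz2"):
        return f"{BZCAT} {path}|"
    if path.endswith(".gz"):
        return f"{ZCAT} {path}|"
    return path

print(open_compressed("/tmp/does-not-exist.txt"))
```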
diff --git a/scripts/training/absolutize_moses_model.pl b/scripts/training/absolutize_moses_model.pl
index 5c9c0970a..bb7085895 100755
--- a/scripts/training/absolutize_moses_model.pl
+++ b/scripts/training/absolutize_moses_model.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
# given a moses.ini file, prints a copy to stdout but replaces all relative
@@ -48,14 +48,14 @@ while (<$inih>) {
$abs = ensure_absolute($fn, $ini);
die "File not found or empty: $fn (searched for $abs.minphr)"
if ! -s $abs.".minphr"; # accept compact binarized ttables
- $_ = "$type $b $c $d $abs\n";
+ $_ = "$type $b $c $d $abs\n";
}
else {
$abs = ensure_absolute($fn, $ini);
die "File not found or empty: $fn (searched for $abs or $abs.binphr.idx)"
if ! -s $abs && ! -s $abs.".binphr.idx"; # accept binarized ttables
$_ = "$type $b $c $d $abs\n";
- }
+ }
}
if ($section eq "generation-file" || $section eq "lmodel-file") {
chomp;
diff --git a/scripts/training/analyse_moses_model.pl b/scripts/training/analyse_moses_model.pl
index 7a3b27e65..656f4a59b 100755
--- a/scripts/training/analyse_moses_model.pl
+++ b/scripts/training/analyse_moses_model.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
# given a moses.ini file, checks the translation and generation tables and reports
diff --git a/scripts/training/binarize-model.perl b/scripts/training/binarize-model.perl
index 3d4798ffd..0239f5fc8 100755
--- a/scripts/training/binarize-model.perl
+++ b/scripts/training/binarize-model.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
#
# Binarize a Moses model
@@ -37,7 +37,7 @@ my $hierarchical = "";
$hierarchical = "-Hierarchical" if $opt_hierarchical;
my $targetdir = "$output_config.tables";
-safesystem("$RealBin/filter-model-given-input.pl $targetdir $input_config /dev/null $hierarchical -nofilter -Binarizer $binarizer") || die "binarising failed";
+safesystem("$RealBin/filter-model-given-input.pl $targetdir $input_config /dev/null $hierarchical -nofilter -Binarizer $binarizer") || die "binarising failed";
safesystem("rm -f $output_config; ln -s $targetdir/moses.ini $output_config") || die "failed to link new ini file";
#FIXME: Why isn't this in a module?
diff --git a/scripts/training/build-generation-table.perl b/scripts/training/build-generation-table.perl
index fb59f4acc..435f7f58e 100755
--- a/scripts/training/build-generation-table.perl
+++ b/scripts/training/build-generation-table.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
use warnings;
@@ -48,7 +48,7 @@ sub get_generation {
my %INCLUDE_SOURCE;
foreach my $factor (split(/,/,$factor_e_source)) {
-
+
$INCLUDE_SOURCE{$factor} = 1;
}
my %INCLUDE;
@@ -76,14 +76,14 @@ sub get_generation {
$target .= "|" unless $first_factor;
$first_factor = 0;
$target .= $FACTOR[$factor];
- }
+ }
$GENERATION{$source}{$target}++;
$GENERATION_TOTAL_SOURCE{$source}++;
$GENERATION_TOTAL_TARGET{$target}++;
}
- }
+ }
close(E);
-
+
open(GEN,">$_OUTPUT.$factor") or die "Can't write $_OUTPUT.$factor";
foreach my $source (keys %GENERATION) {
foreach my $target (keys %{$GENERATION{$source}}) {
diff --git a/scripts/training/build-mmsapt.perl b/scripts/training/build-mmsapt.perl
index a7ddaff70..00cbd09d6 100755
--- a/scripts/training/build-mmsapt.perl
+++ b/scripts/training/build-mmsapt.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/clean-corpus-n.perl b/scripts/training/clean-corpus-n.perl
index e1e96528c..cee4c76a2 100755
--- a/scripts/training/clean-corpus-n.perl
+++ b/scripts/training/clean-corpus-n.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id: clean-corpus-n.perl 3633 2010-10-21 09:49:27Z phkoehn $
use warnings;
@@ -64,7 +64,7 @@ if (-e $l2input) {
} else {
die "Error: $l2input does not exist";
}
-
+
open(E,$opn) or die "Can't open '$opn'";
open(FO,">$out.$l1") or die "Can't write $out.$l1";
@@ -102,7 +102,7 @@ while(my $f = <F>) {
$e = lc($e);
$f = lc($f);
}
-
+
$e =~ s/\|//g unless $factored_flag;
$e =~ s/\s+/ /g;
$e =~ s/^ //;
@@ -126,13 +126,13 @@ while(my $f = <F>) {
my $max_word_length_plus_one = $max_word_length + 1;
next if $e =~ /[^\s\|]{$max_word_length_plus_one}/;
next if $f =~ /[^\s\|]{$max_word_length_plus_one}/;
-
+
# An extra check: none of the factors can be blank!
die "There is a blank factor in $corpus.$l1 on line $innr: $f"
if $f =~ /[ \|]\|/;
die "There is a blank factor in $corpus.$l2 on line $innr: $e"
if $e =~ /[ \|]\|/;
-
+
$outnr++;
print FO $f."\n";
print EO $e."\n";
@@ -158,7 +158,7 @@ sub word_count {
$line =~ s/<\S[^>]*\S>/ /g;
$line =~ s/\s+/ /g;
$line =~ s/^ //g;
- $line =~ s/ $//g;
+ $line =~ s/ $//g;
}
my @w = split(/ /,$line);
return scalar @w;
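The `word_count` helper touched at the end of this file drops tag-like tokens before counting words (in the full script this tag-stripping path is conditional; only the stripping branch is sketched here):

```python
import re

# Sketch of clean-corpus-n.perl's word_count: drop XML-ish tags,
# squeeze whitespace, then count the remaining space-separated words.
def word_count(line):
    line = re.sub(r"<\S[^>]*\S>", " ", line)
    line = re.sub(r"\s+", " ", line).strip()
    return len(line.split(" ")) if line else 0

print(word_count("hello <seg id=1> world"))
```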
diff --git a/scripts/training/clone_moses_model.pl b/scripts/training/clone_moses_model.pl
index 5e9dff72a..bf6708fca 100755
--- a/scripts/training/clone_moses_model.pl
+++ b/scripts/training/clone_moses_model.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
# given a moses.ini file, creates a fresh version of it
@@ -136,7 +136,7 @@ sub clone_file_or_die {
or die "Failed to clone $src into $tgt";
}
}
-
+
safesystem("echo $src > $tgt.info"); # dump a short information
}
diff --git a/scripts/training/combine_factors.pl b/scripts/training/combine_factors.pl
index dfdf020a0..fa6f15db2 100755
--- a/scripts/training/combine_factors.pl
+++ b/scripts/training/combine_factors.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
# given a list of files, combines them to a single corpus (sent to stdout)
@@ -50,7 +50,7 @@ while (defined $_) {
if $#toks != $#intokens;
$lines_of_extratoks[$factor] = \@toks;
}
-
+
# for every token, print the factors in the order as user wished
for(my $i=0; $i<=$#intokens; $i++) {
my $token = $intokens[$i];
diff --git a/scripts/training/convert-moses-ini-to-v2.perl b/scripts/training/convert-moses-ini-to-v2.perl
index 25c562ef4..e091a710d 100755
--- a/scripts/training/convert-moses-ini-to-v2.perl
+++ b/scripts/training/convert-moses-ini-to-v2.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -24,7 +24,7 @@ for(; $i<scalar(@INI); $i++) {
if ($section eq "ttable-file" ||
$section eq "distortion-file" ||
$section eq "generation-file" ||
- $section eq "lmodel-file" ||
+ $section eq "lmodel-file" ||
$section eq "ttable-limit" ||
$section eq "target-word-insertion-feature" ||
$section eq "source-word-deletion-feature" ||
@@ -50,7 +50,7 @@ for(; $i<scalar(@INI); $i++) {
}
elsif ($section eq "report-sparse-features") {
&get_data(); # ignore
- }
+ }
else {
print STDERR "include section [$section] verbatim.\n";
print $header.$line;
@@ -153,12 +153,12 @@ foreach my $section (keys %FEATURE) {
foreach my $line (@{$FEATURE{$section}}) {
my ($factors,$type,$weight_count,$file) = split(/ /,$line);
my ($input_factor,$output_factor) = split(/\-/, $factors);
- $feature .= "LexicalReordering name=LexicalReordering$i num-features=$weight_count type=$type input-factor=$input_factor output-factor=$output_factor path=$file\n";
+ $feature .= "LexicalReordering name=LexicalReordering$i num-features=$weight_count type=$type input-factor=$input_factor output-factor=$output_factor path=$file\n";
$weight .= "LexicalReordering$i=".&get_weights(\@W,$weight_count)."\n";
$i++;
}
}
-
+
elsif ($section eq "lmodel-file") {
my $i = 0;
my @W = @{$WEIGHT{"l"}};
diff --git a/scripts/training/convert-moses-ini-v2-to-v1.perl b/scripts/training/convert-moses-ini-v2-to-v1.perl
index aad3ba15e..44f192efe 100755
--- a/scripts/training/convert-moses-ini-v2-to-v1.perl
+++ b/scripts/training/convert-moses-ini-v2-to-v1.perl
@@ -94,7 +94,7 @@ class moses2_to_ini(object):
# second, match feature/functions attributes to [weight] section values
for i, line in [(i, line.strip()) for i, line in enumerate(lines)
- if line.strip() and not line.strip().startswith('#')]:
+ if line.strip() and not line.strip().startswith('#')]:
# add "feature" to assist creating tmpdict for feature/functions
line = 'feature=%s' % line
@@ -104,7 +104,7 @@ class moses2_to_ini(object):
if tmpdict.get('name') not in self._config:
raise RuntimeError('malformed moses.ini v2 file')
- for key, value in [(key.strip(), value.strip()) for key, value
+ for key, value in [(key.strip(), value.strip()) for key, value
in tmpdict.items() if key.strip() != 'name']:
self._config[tmpdict['name']][key] = value
@@ -195,7 +195,7 @@ def makedir(path, mode=0o777):
def get_args():
'''Parse command-line arguments
- Uses the API compatibility between the legacy
+ Uses the API compatibility between the legacy
argparse.OptionParser and its replacement argparse.ArgumentParser
for functional equivelancy and nearly identical help prompt.
'''
diff --git a/scripts/training/corpus-sizes.perl b/scripts/training/corpus-sizes.perl
index 02dd4ae9b..30ae67ebb 100755
--- a/scripts/training/corpus-sizes.perl
+++ b/scripts/training/corpus-sizes.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id: consolidate-training-data.perl 928 2009-09-02 02:58:01Z philipp $
@@ -11,7 +11,7 @@ foreach my $part (@PART) {
die("ERROR: no part $part.$in or $part.$out") if (! -e "$part.$in" || ! -e "$part.$out");
my $in_size = `cat $part.$in | wc -l`;
my $out_size = `cat $part.$out | wc -l`;
- die("number of lines don't match: '$part.$in' ($in_size) != '$part.$out' ($out_size)")
+ die("number of lines don't match: '$part.$in' ($in_size) != '$part.$out' ($out_size)")
if $in_size != $out_size;
print "$in_size";
}
diff --git a/scripts/training/exodus.perl b/scripts/training/exodus.perl
index d3466f5dd..bb8616007 100755
--- a/scripts/training/exodus.perl
+++ b/scripts/training/exodus.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
@@ -49,7 +49,7 @@ for(my $i=$header;$i<=$#LINE;$i++) {
my @DUMMY;
$i = &read(\@DUMMY,$i);
}
- #
+ #
elsif ($LINE[$i] =~ /^\[distortion-type\]/) {
my @DISTORTION_TYPE;
$i = &read(\@DISTORTION_TYPE,$i);
@@ -58,11 +58,11 @@ for(my $i=$header;$i<=$#LINE;$i++) {
s/orientation/msd/;
s/monotonicity/monotone/;
s/unidirectional/backward/;
- }
+ }
}
# parameters to be changed
elsif ($LINE[$i] =~ /^\[lmodel-file\]/) {
- print $LINE[$i];
+ print $LINE[$i];
# add language model type, factors
my @LMODEL_FILE;
$i = &read(\@LMODEL_FILE,$i);
@@ -85,7 +85,7 @@ for(my $i=$header;$i<=$#LINE;$i++) {
$i = &read(\@TTABLE_FILE,$i);
my $first_line;
if (-e $TTABLE_FILE[0]) {
- if ($TTABLE_FILE[0] =~ /\.gz$/) {
+ if ($TTABLE_FILE[0] =~ /\.gz$/) {
$first_line = `zcat $TTABLE_FILE[0] | head -1`;
}
else {
@@ -139,7 +139,7 @@ sub read {
$i++;
while($i<=$#LINE && $LINE[$i] !~ /^\[/) {
if ($LINE[$i] !~ /^\s*$/ && # ignore comments and empty lines
- $LINE[$i] !~ /^\#/) {
+ $LINE[$i] !~ /^\#/) {
# store value
my $line = $LINE[$i];
chop($line);
diff --git a/scripts/training/filter-model-given-input.pl b/scripts/training/filter-model-given-input.pl
index 1464fdb73..e3a34c40b 100755
--- a/scripts/training/filter-model-given-input.pl
+++ b/scripts/training/filter-model-given-input.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
# Given a moses.ini file and an input text prepare minimized translation
@@ -96,8 +96,8 @@ if (-d $dir && ! -e "$dir/info") {
if (-d $dir) {
my @INFO = `cat $dir/info`;
chop(@INFO);
- if($INFO[0] ne $config
- || ($INFO[1] ne $input &&
+ if($INFO[0] ne $config
+ || ($INFO[1] ne $input &&
$INFO[1].".tagged" ne $input)) {
print STDERR "WARNING: directory exists but does not match parameters:\n";
print STDERR " ($INFO[0] ne $config || $INFO[1] ne $input)\n";
@@ -140,7 +140,7 @@ while(my $line = <INI>) {
$table_flag = "";
$phrase_table_impl = $toks[0];
$skip = 0;
-
+
for (my $i = 1; $i < scalar(@toks); ++$i) {
my @args = split(/=/, $toks[$i]);
chomp($args[0]);
@@ -162,7 +162,7 @@ while(my $line = <INI>) {
$skip = 1;
}
} #for (my $i = 1; $i < scalar(@toks); ++$i) {
-
+
if (($phrase_table_impl ne "PhraseDictionaryMemory" && $phrase_table_impl ne "PhraseDictionarySCFG" && $phrase_table_impl ne "RuleTable") || $file =~ /glue-grammar/ || $skip) {
# Only Memory ("0") and NewFormat ("6") can be filtered.
print INI_OUT "$line\n";
@@ -210,7 +210,7 @@ while(my $line = <INI>) {
$CONSIDER_FACTORS{$source_factor} = 1;
print STDERR "Considering factor $source_factor\n";
push @TABLE_FACTORS, $source_factor;
-
+
} #if (/PhraseModel /) {
elsif ($line =~ /LexicalReordering /) {
print STDERR "ro:$line\n";
@@ -220,7 +220,7 @@ while(my $line = <INI>) {
my @args = split(/=/, $toks[$i]);
chomp($args[0]);
chomp($args[1]);
-
+
if ($args[0] eq "num-features") {
$w = $args[1];
}
@@ -238,14 +238,14 @@ while(my $line = <INI>) {
}
} # for (my $i = 1; $i < scalar(@toks); ++$i) {
-
+
push @TABLE, $file;
push @TABLE_WEIGHTS,$w;
-
+
$file =~ s/^.*\/+([^\/]+)/$1/g;
my $new_name = "$dir/$file";
$new_name =~ s/\.gz//;
-
+
#print INI_OUT "$source_factor $t $w $new_name\n";
@toks = set_value(\@toks, "path", "$new_name");
print INI_OUT join_array(\@toks)."\n";
@@ -256,10 +256,10 @@ while(my $line = <INI>) {
print STDERR "Considering factor $source_factor\n";
push @TABLE_FACTORS,$source_factor;
-
+
} #elsif (/LexicalReordering /) {
else {
- print INI_OUT "$line\n";
+ print INI_OUT "$line\n";
}
} # while(<INI>) {
close(INI);
@@ -412,7 +412,7 @@ for(my $i=0;$i<=$#TABLE;$i++) {
} elsif ($binarizer =~ /CreateOnDiskPt/) {
my $cmd = "$binarizer $mid_file $new_file.bin";
safesystem($cmd) or die "Can't binarize";
- } else {
+ } else {
my $cmd = "$catcmd $mid_file | LC_ALL=C sort -T $tempdir | $binarizer -ttable 0 0 - -nscores $TABLE_WEIGHTS[$i] -out $new_file";
safesystem($cmd) or die "Can't binarize";
}
@@ -507,13 +507,13 @@ sub ensure_full_path {
sub join_array {
my @outside = @{$_[0]};
-
+
my $ret = "";
for (my $i = 0; $i < scalar(@outside); ++$i) {
- my $tok = $outside[$i];
+ my $tok = $outside[$i];
$ret .= "$tok ";
}
-
+
return $ret;
}
diff --git a/scripts/training/get-lexical.perl b/scripts/training/get-lexical.perl
index 45fe6d54c..2ce151481 100755
--- a/scripts/training/get-lexical.perl
+++ b/scripts/training/get-lexical.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -12,7 +12,7 @@ if (scalar(@ARGV) < 4) {
} else {
my ($SOURCE,$TARGET,$ALIGNMENT,$OUT) = @ARGV;
-
+
&get_lexical($SOURCE,$TARGET,$ALIGNMENT,$OUT,0);
}
diff --git a/scripts/training/giza2bal.pl b/scripts/training/giza2bal.pl
index 56fc9a466..27ba9d659 100755
--- a/scripts/training/giza2bal.pl
+++ b/scripts/training/giza2bal.pl
@@ -1,8 +1,8 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
-#Converts direct and inverted alignments into a more compact
-#bi-alignment format. It optionally reads the counting file
+#Converts direct and inverted alignments into a more compact
+#bi-alignment format. It optionally reads the counting file
#produced by giza containing the frequency of each traning sentence.
#Copyright Marcello Federico, November 2004
@@ -15,12 +15,12 @@ while ($w=shift @ARGV){
$dir=shift(@ARGV),next if $w eq "-d";
$inv=shift(@ARGV),next if $w eq "-i";
$cnt=shift(@ARGV),next if $w eq "-c";
-}
+}
my $lc = 0;
if (!$dir || !$inv){
- print "usage: giza2bal.pl [-c <count-file>] -d <dir-align-file> -i <inv-align-file>\n";
+ print "usage: giza2bal.pl [-c <count-file>] -d <dir-align-file> -i <inv-align-file>\n";
print "input files can be also commands, e.g. -d \"gunzip -c file.gz\"\n";
exit(0);
}
diff --git a/scripts/training/mert-moses.pl b/scripts/training/mert-moses.pl
index a7263d4bd..92e1a79ff 100755
--- a/scripts/training/mert-moses.pl
+++ b/scripts/training/mert-moses.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
# Usage:
# mert-moses.pl <foreign> <english> <decoder-executable> <decoder-config>
@@ -99,7 +99,7 @@ my $megam_default_options = "-fvals -maxi 30 -nobias binary";
# Flags related to Batch MIRA (Cherry & Foster, 2012)
my $___BATCH_MIRA = 0; # flg to enable batch MIRA
-# Hypergraph mira
+# Hypergraph mira
my $___HG_MIRA = 0;
# Train phrase model mixture weights with PRO (Haddow, NAACL 2012)
@@ -380,7 +380,7 @@ if ($__PROMIX_TRAINING) {
die "Not executable $__PROMIX_TRAINING" unless -x $__PROMIX_TRAINING;
die "For promix training, specify the tables using --promix-table arguments" unless @__PROMIX_TABLES;
die "For mixture model, need at least 2 tables" unless scalar(@__PROMIX_TABLES) > 1;
-
+
for my $TABLE (@__PROMIX_TABLES) {
die "Phrase table $TABLE not found" unless -r $TABLE;
}
@@ -537,7 +537,7 @@ if ($__PROMIX_TRAINING) {
for (my $i = 0; $i < scalar(@__PROMIX_TABLES); ++$i) {
# Create filtered, binarised tables
my $filtered_config = "moses_$i.ini";
- substitute_ttable($___CONFIG, $filtered_config, $__PROMIX_TABLES[$i]);
+ substitute_ttable($___CONFIG, $filtered_config, $__PROMIX_TABLES[$i]);
#TODO: Remove reordering table from config, as we don't need to filter
# and binarise it.
my $filtered_path = "filtered_$i";
@@ -548,7 +548,7 @@ if ($__PROMIX_TRAINING) {
push (@_PROMIX_TABLES_BIN,"$filtered_path/phrase-table.0-0.1.1");
}
}
-
+
if ($___FILTER_PHRASE_TABLE) {
my $outdir = "filtered";
if (-e "$outdir/moses.ini") {
@@ -577,7 +577,7 @@ if ($___FILTER_PHRASE_TABLE) {
my $featlist = get_featlist_from_moses($___CONFIG);
$featlist = insert_ranges_to_featlist($featlist, $___RANGES);
-# Mark which features are disabled
+# Mark which features are disabled
if (defined $___ACTIVATE_FEATURES) {
$featlist->{"enabled"} = undef;
my %enabled = map { ($_, 1) } split /[, ]+/, $___ACTIVATE_FEATURES;
@@ -730,22 +730,22 @@ while (1) {
# total number of weights is 1 less than number of phrase features, multiplied
# by the number of tables
$num_mixed_phrase_features = (grep { $_ eq 'tm' } @{$featlist->{"names"}}) - 1;
-
- @promix_weights = (1.0/scalar(@__PROMIX_TABLES)) x
+
+ @promix_weights = (1.0/scalar(@__PROMIX_TABLES)) x
($num_mixed_phrase_features * scalar(@__PROMIX_TABLES));
}
-
+
# backup orig config, so we always add the table into it
- $uninterpolated_config= $___CONFIG unless $uninterpolated_config;
+ $uninterpolated_config= $___CONFIG unless $uninterpolated_config;
# Interpolation
my $interpolated_phrase_table = "interpolate";
for my $itable (@_PROMIX_TABLES_BIN) {
$interpolated_phrase_table .= " 1:$itable";
}
-
+
# Create an ini file for the interpolated phrase table
- $interpolated_config ="moses.interpolated.ini";
+ $interpolated_config ="moses.interpolated.ini";
substitute_ttable($uninterpolated_config, $interpolated_config, $interpolated_phrase_table, "99");
# Append the multimodel weights
@@ -903,7 +903,7 @@ while (1) {
" --scfile " . join(" --scfile ", split(/,/, $scfiles));
push @allnbests, $nbest_file;
- my $promix_file_settings =
+ my $promix_file_settings =
"--scfile " . join(" --scfile ", split(/,/, $scfiles)) .
" --nbest " . join(" --nbest ", @allnbests);
@@ -961,7 +961,7 @@ while (1) {
$mira_settings .= " --type hypergraph ";
$mira_settings .= join(" ", map {"--reference $_"} @references);
$mira_settings .= " --hgdir $hypergraph_dir ";
- #$mira_settings .= "--verbose ";
+ #$mira_settings .= "--verbose ";
$cmd = "$mert_mira_cmd $mira_settings $seed_settings -o $mert_outfile";
&submit_or_exec($cmd, "run$run.mira.out", $mert_logfile, 1);
} elsif ($__PROMIX_TRAINING) {
@@ -976,7 +976,7 @@ while (1) {
print "Finished promix optimisation at " . `date`;
} else { # just mert
&submit_or_exec($cmd . $mert_settings, $mert_outfile, $mert_logfile, ($__THREADS ? $__THREADS : 1) );
- }
+ }
die "Optimization failed, file $weights_out_file does not exist or is empty"
if ! -s $weights_out_file;
@@ -1195,7 +1195,7 @@ sub get_weights_from_mert {
}
close $fh;
die "It seems feature values are invalid or unable to read $outfile." if $sum < 1e-09;
-
+
$devbleu = "unknown";
foreach (@WEIGHT) { $_ /= $sum; }
foreach (keys %{$sparse_weights}) { $$sparse_weights{$_} /= $sum; }
@@ -1286,7 +1286,7 @@ sub run_decoder {
if (defined $___JOBS && $___JOBS > 1) {
die "Hypergraph mira not supported by moses-parallel" if $___HG_MIRA;
$decoder_cmd = "$moses_parallel_cmd $pass_old_sge -config $___CONFIG";
- $decoder_cmd .= " -inputtype $___INPUTTYPE" if defined($___INPUTTYPE);
+ $decoder_cmd .= " -inputtype $___INPUTTYPE" if defined($___INPUTTYPE);
$decoder_cmd .= " -qsub-prefix mert$run -queue-parameters \"$queue_flags\" -decoder-parameters \"$___DECODER_FLAGS $decoder_config\" $lsamp_cmd -n-best-list \"$filename $___N_BEST_LIST_SIZE distinct\" -input-file $___DEV_F -jobs $___JOBS -decoder $___DECODER > run$run.out";
} else {
my $nbest_list_cmd = "-n-best-list $filename $___N_BEST_LIST_SIZE distinct";
@@ -1403,7 +1403,7 @@ sub get_featlist_from_file {
if (/^(\S+)= (.+)$/) { # only for feature functions with dense features
my ($longname, $valuesStr) = ($1, $2);
next if (!defined($valuesStr));
-
+
my @values = split(/ /, $valuesStr);
my $valcnt = 0;
my $hastuneablecomponent = 0;
@@ -1605,7 +1605,7 @@ sub create_config {
# write all weights
print $out "[weight]\n";
-
+
my $prevName = "";
my $outStr = "";
my $valcnt = 0;
@@ -1613,7 +1613,7 @@ sub create_config {
for (my $i = 0; $i < scalar(@{$featlist->{"names"}}); $i++) {
my $name = $featlist->{"names"}->[$i];
my $val = $featlist->{"values"}->[$i];
-
+
if ($prevName ne $name) {
print $out "$outStr\n";
$valcnt = 0;
diff --git a/scripts/training/postprocess-lopar.perl b/scripts/training/postprocess-lopar.perl
index 5171e02fb..44be9c26c 100755
--- a/scripts/training/postprocess-lopar.perl
+++ b/scripts/training/postprocess-lopar.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
@@ -26,7 +26,7 @@ while (my $l =<STDIN>) {
foreach my $w (@ws) {
$wc++;
my ($surface, $morph, $lemma);
-
+
if ($w =~ /^(.+)_([^_]+)_(.+)$/o) {
($surface, $morph, $lemma) = ($1, $2, $3);
} else {
@@ -74,7 +74,7 @@ while (my $l =<STDIN>) {
$morph = join '.', @xs;
if (!defined $morph || $morph eq '') {
$morph = '-';
- }
+ }
# if (defined($lemma) && defined($morph) && defined($surface)) {
push @js, "$surface|$morph|$lemma";
push @ls, $lemma;
diff --git a/scripts/training/reduce-factors.perl b/scripts/training/reduce-factors.perl
index c265652f6..09f9c7f2b 100755
--- a/scripts/training/reduce-factors.perl
+++ b/scripts/training/reduce-factors.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -97,7 +97,7 @@ sub reduce_factors {
# $first_factor = 0;
# print OUT $FACTOR[$factor];
# }
- }
+ }
print OUT "\n";
}
print STDERR "\n";
diff --git a/scripts/training/reduce-topt-count.pl b/scripts/training/reduce-topt-count.pl
index 769f44a7e..f760051c4 100755
--- a/scripts/training/reduce-topt-count.pl
+++ b/scripts/training/reduce-topt-count.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# given a moses.ini, filter the phrase tables to contain
# only ttable-limit options per source phrase
@@ -155,7 +155,7 @@ sub filter_table
push @tgt_phrases, {
str => $line,
score => sum(map { $weights[$_] * log $scores[$_] } (0 .. $#weights))
- };
+ };
}
printf STDERR "Finished, kept %d%% of phrases\n", $kept / $total * 100;
close $in;
diff --git a/scripts/training/reduce_combine.pl b/scripts/training/reduce_combine.pl
index 3d0abf29a..a7614f73e 100755
--- a/scripts/training/reduce_combine.pl
+++ b/scripts/training/reduce_combine.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
# $Id$
# given a pathname to a factored corpus, a list of (numeric) factors to keep
@@ -72,7 +72,7 @@ while (<$corp_stream>) {
if $#toks != $#intokens;
$lines_of_extratoks{$factor} = \@toks;
}
-
+
# for every token, print the factors in the order as user wished
for(my $i=0; $i<=$#intokens; $i++) {
my $token = $intokens[$i];
diff --git a/scripts/training/remove-orphan-phrase-pairs-from-reordering-table.perl b/scripts/training/remove-orphan-phrase-pairs-from-reordering-table.perl
index bd5d7f1d2..eda529393 100755
--- a/scripts/training/remove-orphan-phrase-pairs-from-reordering-table.perl
+++ b/scripts/training/remove-orphan-phrase-pairs-from-reordering-table.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/threshold-filter.perl b/scripts/training/threshold-filter.perl
index a23fb8b5c..3e42ca795 100755
--- a/scripts/training/threshold-filter.perl
+++ b/scripts/training/threshold-filter.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/train-global-lexicon-model.perl b/scripts/training/train-global-lexicon-model.perl
index 0e7d3077d..d3c55789d 100755
--- a/scripts/training/train-global-lexicon-model.perl
+++ b/scripts/training/train-global-lexicon-model.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/train-model.perl b/scripts/training/train-model.perl
index e6eacfd98..6fb784e1e 100755
--- a/scripts/training/train-model.perl
+++ b/scripts/training/train-model.perl
@@ -22,65 +22,65 @@ if ($SCRIPTS_ROOTDIR eq '') {
$SCRIPTS_ROOTDIR =~ s/\/training$//;
#$SCRIPTS_ROOTDIR = $ENV{"SCRIPTS_ROOTDIR"} if defined($ENV{"SCRIPTS_ROOTDIR"});
-my($_EXTERNAL_BINDIR,
- $_ROOT_DIR,
- $_CORPUS_DIR,
- $_GIZA_E2F,
- $_GIZA_F2E,
- $_MODEL_DIR,
- $_TEMP_DIR,
- $_SORT_BUFFER_SIZE,
- $_SORT_BATCH_SIZE,
- $_SORT_COMPRESS,
- $_SORT_PARALLEL,
+my($_EXTERNAL_BINDIR,
+ $_ROOT_DIR,
+ $_CORPUS_DIR,
+ $_GIZA_E2F,
+ $_GIZA_F2E,
+ $_MODEL_DIR,
+ $_TEMP_DIR,
+ $_SORT_BUFFER_SIZE,
+ $_SORT_BATCH_SIZE,
+ $_SORT_COMPRESS,
+ $_SORT_PARALLEL,
$_CORPUS,
- $_CORPUS_COMPRESSION,
- $_FIRST_STEP,
- $_LAST_STEP,
- $_F,
- $_E,
- $_MAX_PHRASE_LENGTH,
+ $_CORPUS_COMPRESSION,
+ $_FIRST_STEP,
+ $_LAST_STEP,
+ $_F,
+ $_E,
+ $_MAX_PHRASE_LENGTH,
$_DISTORTION_LIMIT,
- $_LEXICAL_FILE,
- $_NO_LEXICAL_WEIGHTING,
- $_LEXICAL_COUNTS,
- $_VERBOSE,
+ $_LEXICAL_FILE,
+ $_NO_LEXICAL_WEIGHTING,
+ $_LEXICAL_COUNTS,
+ $_VERBOSE,
$_ALIGNMENT,
- $_ALIGNMENT_FILE,
- $_ALIGNMENT_STEM,
- @_LM,
- $_EXTRACT_FILE,
- $_GIZA_OPTION,
- $_HELP,
+ $_ALIGNMENT_FILE,
+ $_ALIGNMENT_STEM,
+ @_LM,
+ $_EXTRACT_FILE,
+ $_GIZA_OPTION,
+ $_HELP,
$_PARTS,
- $_DIRECTION,
- $_ONLY_PRINT_GIZA,
- $_GIZA_EXTENSION,
+ $_DIRECTION,
+ $_ONLY_PRINT_GIZA,
+ $_GIZA_EXTENSION,
$_REORDERING,
- $_REORDERING_SMOOTH,
- $_INPUT_FACTOR_MAX,
+ $_REORDERING_SMOOTH,
+ $_INPUT_FACTOR_MAX,
$_ALIGNMENT_FACTORS,
- $_TRANSLATION_FACTORS,
- $_REORDERING_FACTORS,
+ $_TRANSLATION_FACTORS,
+ $_REORDERING_FACTORS,
$_GENERATION_FACTORS,
$_DECODING_GRAPH_BACKOFF,
- $_DECODING_STEPS,
- $_PARALLEL,
- $_FACTOR_DELIMITER,
+ $_DECODING_STEPS,
+ $_PARALLEL,
+ $_FACTOR_DELIMITER,
@_PHRASE_TABLE,
- @_REORDERING_TABLE,
- @_GENERATION_TABLE,
- @_GENERATION_TYPE,
+ @_REORDERING_TABLE,
+ @_GENERATION_TABLE,
+ @_GENERATION_TYPE,
$_GENERATION_CORPUS,
- $_DONT_ZIP,
- $_MGIZA,
- $_MGIZA_CPUS,
- $_SNT2COOC,
- $_HMM_ALIGN,
- $_CONFIG,
- $_OSM,
- $_OSM_FACTORS,
- $_POST_DECODING_TRANSLIT,
+ $_DONT_ZIP,
+ $_MGIZA,
+ $_MGIZA_CPUS,
+ $_SNT2COOC,
+ $_HMM_ALIGN,
+ $_CONFIG,
+ $_OSM,
+ $_OSM_FACTORS,
+ $_POST_DECODING_TRANSLIT,
$_TRANSLITERATION_PHRASE_TABLE,
$_HIERARCHICAL,
$_XML,
@@ -104,13 +104,13 @@ my($_EXTERNAL_BINDIR,
@_EXTRACT_OPTIONS,
@_SCORE_OPTIONS,
$_S2T,
- $_ALT_DIRECT_RULE_SCORE_1,
- $_ALT_DIRECT_RULE_SCORE_2,
+ $_ALT_DIRECT_RULE_SCORE_1,
+ $_ALT_DIRECT_RULE_SCORE_2,
$_UNKNOWN_WORD_SOFT_MATCHES_FILE,
$_USE_SYNTAX_INPUT_WEIGHT_FEATURE,
$_OMIT_WORD_ALIGNMENT,
$_FORCE_FACTORED_FILENAMES,
- $_MEMSCORE,
+ $_MEMSCORE,
$_FINAL_ALIGNMENT_MODEL,
$_CONTINUE,
$_MAX_LEXICAL_REORDERING,
@@ -119,17 +119,17 @@ my($_EXTERNAL_BINDIR,
@_ADDITIONAL_INI,
$_ADDITIONAL_INI_FILE,
$_MMSAPT,
- @_BASELINE_ALIGNMENT_MODEL,
- $_BASELINE_EXTRACT,
+ @_BASELINE_ALIGNMENT_MODEL,
+ $_BASELINE_EXTRACT,
$_BASELINE_ALIGNMENT,
- $_DICTIONARY,
- $_SPARSE_PHRASE_FEATURES,
- $_EPPEX,
- $_INSTANCE_WEIGHTS_FILE,
- $_LMODEL_OOV_FEATURE,
- $_NUM_LATTICE_FEATURES,
- $IGNORE,
- $_FLEXIBILITY_SCORE,
+ $_DICTIONARY,
+ $_SPARSE_PHRASE_FEATURES,
+ $_EPPEX,
+ $_INSTANCE_WEIGHTS_FILE,
+ $_LMODEL_OOV_FEATURE,
+ $_NUM_LATTICE_FEATURES,
+ $IGNORE,
+ $_FLEXIBILITY_SCORE,
$_EXTRACT_COMMAND,
$_SCORE_COMMAND);
my $_BASELINE_CORPUS = "";
@@ -168,8 +168,8 @@ $_HELP = 1
'parallel' => \$_PARALLEL,
'lm=s' => \@_LM,
'help' => \$_HELP,
- 'mgiza' => \$_MGIZA, # multi-thread
- 'mgiza-cpus=i' => \$_MGIZA_CPUS, # multi-thread
+ 'mgiza' => \$_MGIZA, # multi-thread
+ 'mgiza-cpus=i' => \$_MGIZA_CPUS, # multi-thread
'snt2cooc=s' => \$_SNT2COOC, # override snt2cooc exe. For when you want to run reduced memory snt2cooc.perl from mgiza
'hmm-align' => \$_HMM_ALIGN,
'final-alignment-model=s' => \$_FINAL_ALIGNMENT_MODEL, # use word alignment model 1/2/hmm/3/4/5 as final (default is 4); value 'hmm' equivalent to the --hmm-align switch
@@ -227,7 +227,7 @@ $_HELP = 1
'osm-model=s' => \$_OSM,
'osm-setting=s' => \$_OSM_FACTORS,
'post-decoding-translit=s' => \$_POST_DECODING_TRANSLIT,
- 'transliteration-phrase-table=s' => \$_TRANSLITERATION_PHRASE_TABLE,
+ 'transliteration-phrase-table=s' => \$_TRANSLITERATION_PHRASE_TABLE,
'mmsapt' => \$_MMSAPT,
'max-lexical-reordering' => \$_MAX_LEXICAL_REORDERING,
'lexical-reordering-default-scores=s' => \$_LEXICAL_REORDERING_DEFAULT_SCORES,
@@ -237,8 +237,8 @@ $_HELP = 1
'dictionary=s' => \$_DICTIONARY,
'sparse-phrase-features' => \$_SPARSE_PHRASE_FEATURES,
'eppex:s' => \$_EPPEX,
- 'additional-ini=s' => \@_ADDITIONAL_INI,
- 'additional-ini-file=s' => \$_ADDITIONAL_INI_FILE,
+ 'additional-ini=s' => \@_ADDITIONAL_INI,
+ 'additional-ini-file=s' => \$_ADDITIONAL_INI_FILE,
'baseline-alignment-model=s{8}' => \@_BASELINE_ALIGNMENT_MODEL,
'baseline-extract=s' => \$_BASELINE_EXTRACT,
'baseline-corpus=s' => \$_BASELINE_CORPUS,
@@ -342,7 +342,7 @@ foreach my $step (@step_conf) {
}
die("Only steps between 1 and 9 can be used") if ($f < 1 || $l > 9);
die("The first step must be smaller than the last step") if ($f > $l);
-
+
for (my $i=$f; $i<=$l; $i++) {
$STEPS[$i] = 1;
}
@@ -356,12 +356,12 @@ my $GIZA;
my $SNT2COOC;
if ($STEPS[1] || $STEPS[2])
-{
+{
if(!defined $_MGIZA ){
$GIZA = "$_EXTERNAL_BINDIR/GIZA++";
if (-x "$_EXTERNAL_BINDIR/snt2cooc.out") {
$SNT2COOC = "$_EXTERNAL_BINDIR/snt2cooc.out";
- } elsif (-x "$_EXTERNAL_BINDIR/snt2cooc") { # Since "snt2cooc.out" and "snt2cooc" work the same
+ } elsif (-x "$_EXTERNAL_BINDIR/snt2cooc") { # Since "snt2cooc.out" and "snt2cooc" work the same
$SNT2COOC = "$_EXTERNAL_BINDIR/snt2cooc";
}
print STDERR "Using single-thread GIZA\n";
@@ -377,19 +377,19 @@ if ($STEPS[1] || $STEPS[2])
} elsif (-x "$_EXTERNAL_BINDIR/snt2cooc.out") { # Important for users that use MGIZA and copy only the "mgiza" file to $_EXTERNAL_BINDIR
$SNT2COOC = "$_EXTERNAL_BINDIR/snt2cooc.out";
}
- print STDERR "Using multi-thread GIZA\n";
+ print STDERR "Using multi-thread GIZA\n";
if (!defined($_MGIZA_CPUS)) {
$_MGIZA_CPUS=4;
}
die("ERROR: Cannot find $MGIZA_MERGE_ALIGN") unless (-x $MGIZA_MERGE_ALIGN);
}
-
+
# override
- $SNT2COOC = "$_EXTERNAL_BINDIR/$_SNT2COOC" if defined($_SNT2COOC);
+ $SNT2COOC = "$_EXTERNAL_BINDIR/$_SNT2COOC" if defined($_SNT2COOC);
}
# parallel extract
-my $SPLIT_EXEC = `gsplit --help 2>/dev/null`;
+my $SPLIT_EXEC = `gsplit --help 2>/dev/null`;
if($SPLIT_EXEC) {
$SPLIT_EXEC = 'gsplit';
}
@@ -397,7 +397,7 @@ else {
$SPLIT_EXEC = 'split';
}
-my $SORT_EXEC = `gsort --help 2>/dev/null`;
+my $SORT_EXEC = `gsort --help 2>/dev/null`;
if($SORT_EXEC) {
$SORT_EXEC = 'gsort';
}
@@ -549,13 +549,13 @@ $___GLUE_GRAMMAR_FILE = $_GLUE_GRAMMAR_FILE if $_GLUE_GRAMMAR_FILE;
my $___CONFIG = $___MODEL_DIR."/moses.ini";
$___CONFIG = $_CONFIG if $_CONFIG;
-my $___DONT_ZIP = 0;
+my $___DONT_ZIP = 0;
$_DONT_ZIP = $___DONT_ZIP unless $___DONT_ZIP;
my $___TEMP_DIR = $___MODEL_DIR;
$___TEMP_DIR = $_TEMP_DIR if $_TEMP_DIR;
-my $___CONTINUE = 0;
+my $___CONTINUE = 0;
$___CONTINUE = $_CONTINUE if $_CONTINUE;
my $___MAX_PHRASE_LENGTH = "7";
@@ -632,13 +632,13 @@ foreach my $r (split(/\,/,$___REORDERING)) {
$r =~ s/unidirectional/backward/;
#set default values
push @REORDERING_MODELS, {};
- $REORDERING_MODELS[$model_num]{"dir"} = "backward";
+ $REORDERING_MODELS[$model_num]{"dir"} = "backward";
$REORDERING_MODELS[$model_num]{"type"} = "wbe";
$REORDERING_MODELS[$model_num]{"collapse"} = "allff";
#handle the options set in the config string
foreach my $reoconf (split(/\-/,$r)) {
- if ($reoconf =~ /^((msd)|(mslr)|(monotonicity)|(leftright))/) {
+ if ($reoconf =~ /^((msd)|(mslr)|(monotonicity)|(leftright))/) {
$REORDERING_MODELS[$model_num]{"orient"} = $reoconf;
$REORDERING_LEXICAL = 1;
}
@@ -695,7 +695,7 @@ foreach my $r (split(/\,/,$___REORDERING)) {
# fix the overall model selection
if (defined $REORDERING_MODEL_TYPES{$REORDERING_MODELS[$model_num]{"type"}}) {
$REORDERING_MODEL_TYPES{$REORDERING_MODELS[$model_num]{"type"}} .=
- $REORDERING_MODELS[$model_num]{"orient"}."-";
+ $REORDERING_MODELS[$model_num]{"orient"}."-";
}
else {
$REORDERING_MODEL_TYPES{$REORDERING_MODELS[$model_num]{"type"}} =
@@ -728,26 +728,26 @@ $___NOT_FACTORED = 0 unless $___ALIGNMENT_FACTORS eq "0-0";
my $___TRANSLATION_FACTORS = undef;
$___TRANSLATION_FACTORS = "0-0" unless defined($_DECODING_STEPS); # single factor default
$___TRANSLATION_FACTORS = $_TRANSLATION_FACTORS if defined($_TRANSLATION_FACTORS);
-die("ERROR: format for translation factors is \"0-0\" or \"0-0+1-1\" or \"0-0+0,1-0,1\", you provided $___TRANSLATION_FACTORS\n")
+die("ERROR: format for translation factors is \"0-0\" or \"0-0+1-1\" or \"0-0+0,1-0,1\", you provided $___TRANSLATION_FACTORS\n")
if defined $___TRANSLATION_FACTORS && $___TRANSLATION_FACTORS !~ /^\d+(\,\d+)*\-\d+(\,\d+)*(\+\d+(\,\d+)*\-\d+(\,\d+)*)*$/;
$___NOT_FACTORED = 0 unless $___TRANSLATION_FACTORS eq "0-0";
my $___REORDERING_FACTORS = undef;
$___REORDERING_FACTORS = "0-0" if defined($_REORDERING) && ! defined($_DECODING_STEPS); # single factor default
$___REORDERING_FACTORS = $_REORDERING_FACTORS if defined($_REORDERING_FACTORS);
-die("ERROR: format for reordering factors is \"0-0\" or \"0-0+1-1\" or \"0-0+0,1-0,1\", you provided $___REORDERING_FACTORS\n")
+die("ERROR: format for reordering factors is \"0-0\" or \"0-0+1-1\" or \"0-0+0,1-0,1\", you provided $___REORDERING_FACTORS\n")
if defined $___REORDERING_FACTORS && $___REORDERING_FACTORS !~ /^\d+(\,\d+)*\-\d+(\,\d+)*(\+\d+(\,\d+)*\-\d+(\,\d+)*)*$/;
$___NOT_FACTORED = 0 if defined($_REORDERING) && $___REORDERING_FACTORS ne "0-0";
my $___GENERATION_FACTORS = undef;
$___GENERATION_FACTORS = $_GENERATION_FACTORS if defined($_GENERATION_FACTORS);
-die("ERROR: format for generation factors is \"0-1\" or \"0-1+0-2\" or \"0-1+0,1-1,2\", you provided $___GENERATION_FACTORS\n")
+die("ERROR: format for generation factors is \"0-1\" or \"0-1+0-2\" or \"0-1+0,1-1,2\", you provided $___GENERATION_FACTORS\n")
if defined $___GENERATION_FACTORS && $___GENERATION_FACTORS !~ /^\d+(\,\d+)*\-\d+(\,\d+)*(\+\d+(\,\d+)*\-\d+(\,\d+)*)*$/;
$___NOT_FACTORED = 0 if defined($___GENERATION_FACTORS);
my $___DECODING_STEPS = "t0";
$___DECODING_STEPS = $_DECODING_STEPS if defined($_DECODING_STEPS);
-die("ERROR: format for decoding steps is \"t0,g0,t1,g1:t2\", you provided $___DECODING_STEPS\n")
+die("ERROR: format for decoding steps is \"t0,g0,t1,g1:t2\", you provided $___DECODING_STEPS\n")
if defined $_DECODING_STEPS && $_DECODING_STEPS !~ /^[tg]\d+([,:][tg]\d+)*$/;
### MAIN
@@ -767,7 +767,7 @@ die("ERROR: format for decoding steps is \"t0,g0,t1,g1:t2\", you provided $___DE
sub prepare {
print STDERR "(1) preparing corpus @ ".`date`;
safesystem("mkdir -p $___CORPUS_DIR") or die("ERROR: could not create corpus dir $___CORPUS_DIR");
-
+
print STDERR "(1.0) selecting factors @ ".`date`;
my ($factor_f,$factor_e) = split(/\-/,$___ALIGNMENT_FACTORS);
my $corpus = ($___NOT_FACTORED && !$_XML) ? $___CORPUS : $___CORPUS.".".$___ALIGNMENT_FACTORS;
@@ -779,21 +779,21 @@ sub prepare {
&reduce_factors($___CORPUS.".".$___F,$corpus.".".$___F,$factor_f);
&reduce_factors($___CORPUS.".".$___E,$corpus.".".$___E,$factor_e);
}
-
+
&make_classes($corpus.".".$___F,$___VCB_F.".classes");
&make_classes($corpus.".".$___E,$___VCB_E.".classes");
-
+
$VCB_F = &get_vocabulary($corpus.".".$___F,$___VCB_F,0);
$VCB_E = &get_vocabulary($corpus.".".$___E,$___VCB_E,1);
-
+
&numberize_txt_file($VCB_F,$corpus.".".$___F,
$VCB_E,$corpus.".".$___E,
$___CORPUS_DIR."/$___F-$___E-int-train.snt");
-
+
&numberize_txt_file($VCB_E,$corpus.".".$___E,
$VCB_F,$corpus.".".$___F,
$___CORPUS_DIR."/$___E-$___F-int-train.snt");
- }
+ }
else {
print "Forking...\n";
if (! $___NOT_FACTORED || $_XML) {
@@ -802,7 +802,7 @@ sub prepare {
if (!$pid) {
&reduce_factors($___CORPUS.".".$___F,$corpus.".".$___F,$factor_f);
exit 0;
- }
+ }
else {
&reduce_factors($___CORPUS.".".$___E,$corpus.".".$___E,$factor_e);
}
@@ -821,14 +821,14 @@ sub prepare {
&make_classes($corpus.".".$___E,$___VCB_E.".classes");
exit 0;
}
-
+
$VCB_F = &get_vocabulary($corpus.".".$___F,$___VCB_F,0);
$VCB_E = &get_vocabulary($corpus.".".$___E,$___VCB_E,1);
-
+
&numberize_txt_file($VCB_F,$corpus.".".$___F,
$VCB_E,$corpus.".".$___E,
$___CORPUS_DIR."/$___F-$___E-int-train.snt");
-
+
&numberize_txt_file($VCB_E,$corpus.".".$___E,
$VCB_F,$corpus.".".$___F,
$___CORPUS_DIR."/$___E-$___F-int-train.snt");
@@ -930,7 +930,7 @@ sub reduce_factors {
# $first_factor = 0;
# print OUT $FACTOR[$factor];
# }
- }
+ }
print OUT "\n";
}
print STDERR "\n";
@@ -954,7 +954,7 @@ sub get_vocabulary {
# return unless $___LEXICAL_WEIGHTING;
my($corpus,$vcb,$is_target) = @_;
print STDERR "(1.2) creating vcb file $vcb @ ".`date`;
-
+
my %WORD;
open(TXT,$corpus) or die "ERROR: Can't read $corpus";
while(<TXT>) {
@@ -1000,7 +1000,7 @@ sub get_vocabulary {
$id++;
}
close(VCB);
-
+
return \%VCB;
}
@@ -1028,7 +1028,7 @@ sub make_dicts_files {
}
close(DICT);
my @items = sort {$a <=> $b} keys %numberized_dict;
- if (scalar(@items) == 0) { return 0; }
+ if (scalar(@items) == 0) { return 0; }
foreach my $key (@items)
{
print OUT1 "$key $numberized_dict{$key}\n";
@@ -1066,7 +1066,7 @@ sub numberize_line {
chomp($txt);
my $out = "";
my $not_first = 0;
- foreach (split(/ /,$txt)) {
+ foreach (split(/ /,$txt)) {
next if $_ eq '';
$out .= " " if $not_first++;
print STDERR "Unknown word '$_'\n" unless defined($$VCB{$_});
@@ -1115,13 +1115,13 @@ sub run_giza_on_parts {
my $size = `cat $___CORPUS_DIR/$___F-$___E-int-train.snt | wc -l`;
die "ERROR: Failed to get number of lines in $___CORPUS_DIR/$___F-$___E-int-train.snt"
if $size == 0;
-
+
if ($___DIRECTION == 1 || $___DIRECTION == 2 || $___NOFORK) {
&run_single_giza_on_parts($___GIZA_F2E,$___E,$___F,
$___VCB_E,$___VCB_F,
$___CORPUS_DIR."/$___F-$___E-int-train.snt",$size)
unless $___DIRECTION == 2;
-
+
&run_single_giza_on_parts($___GIZA_E2F,$___F,$___E,
$___VCB_F,$___VCB_E,
$___CORPUS_DIR."/$___E-$___F-int-train.snt",$size)
@@ -1148,12 +1148,12 @@ sub run_giza_on_parts {
sub run_single_giza_on_parts {
my($dir,$e,$f,$vcb_e,$vcb_f,$train,$size) = @_;
-
+
my $part = 0;
# break up training data into parts
open(SNT,$train) or die "ERROR: Can't read $train";
- {
+ {
my $i=0;
while(<SNT>) {
$i++;
@@ -1220,7 +1220,7 @@ sub merge_cooc_files {
$CURRENT[$i] = <$pf>;
chop($CURRENT[$i]) if $CURRENT[$i];
}
- }
+ }
}
for(my $i=0;$i<scalar(@COOC_PART_FILE_NAME);$i++) {
close($PF[$i]);
@@ -1231,16 +1231,16 @@ sub merge_cooc_files {
sub run_single_giza {
my($dir,$e,$f,$vcb_e,$vcb_f,$train) = @_;
- my %GizaDefaultOptions =
+ my %GizaDefaultOptions =
(p0 => .999 ,
- m1 => 5 ,
- m2 => 0 ,
- m3 => 3 ,
- m4 => 3 ,
+ m1 => 5 ,
+ m2 => 0 ,
+ m3 => 3 ,
+ m4 => 3 ,
o => "giza" ,
nodumps => 1 ,
onlyaldumps => 1 ,
- nsmooth => 4 ,
+ nsmooth => 4 ,
model1dumpfrequency => 1,
model4smoothfactor => 0.4 ,
t => $vcb_f,
@@ -1248,10 +1248,10 @@ sub run_single_giza {
c => $train,
CoocurrenceFile => "$dir/$f-$e.cooc",
o => "$dir/$f-$e");
-
+
if (defined $_DICTIONARY)
{ $GizaDefaultOptions{d} = $___CORPUS_DIR."/gizadict.$f-$e"; }
-
+
# 5 Giza threads
if (defined $_MGIZA){ $GizaDefaultOptions{"ncpus"} = $_MGIZA_CPUS; }
@@ -1266,15 +1266,15 @@ sub run_single_giza {
if ($___FINAL_ALIGNMENT_MODEL) {
$GizaDefaultOptions{nodumps} = ($___FINAL_ALIGNMENT_MODEL =~ /^[345]$/)? 1: 0;
$GizaDefaultOptions{model345dumpfrequency} = 0;
-
+
$GizaDefaultOptions{model1dumpfrequency} = ($___FINAL_ALIGNMENT_MODEL eq '1')? 5: 0;
-
+
$GizaDefaultOptions{m2} = ($___FINAL_ALIGNMENT_MODEL eq '2')? 5: 0;
$GizaDefaultOptions{model2dumpfrequency} = ($___FINAL_ALIGNMENT_MODEL eq '2')? 5: 0;
-
+
$GizaDefaultOptions{hmmiterations} = ($___FINAL_ALIGNMENT_MODEL =~ /^(hmm|[345])$/)? 5: 0;
$GizaDefaultOptions{hmmdumpfrequency} = ($___FINAL_ALIGNMENT_MODEL eq 'hmm')? 5: 0;
-
+
$GizaDefaultOptions{m3} = ($___FINAL_ALIGNMENT_MODEL =~ /^[345]$/)? 3: 0;
$GizaDefaultOptions{m4} = ($___FINAL_ALIGNMENT_MODEL =~ /^[45]$/)? 3: 0;
$GizaDefaultOptions{m5} = ($___FINAL_ALIGNMENT_MODEL eq '5')? 3: 0;
@@ -1298,7 +1298,7 @@ sub run_single_giza {
my $value = $GizaDefaultOptions{$option} ;
$GizaOptions .= " -$option $value" ;
}
-
+
&run_single_snt2cooc($dir,$e,$f,$vcb_e,$vcb_f,$train) if $___PARTS == 1;
print STDERR "(2.1b) running giza $f-$e @ ".`date`."$GIZA $GizaOptions\n";
@@ -1311,7 +1311,7 @@ sub run_single_giza {
print "$GIZA $GizaOptions\n";
return if $___ONLY_PRINT_GIZA;
safesystem("$GIZA $GizaOptions");
-
+
if (defined $_MGIZA and (!defined $___FINAL_ALIGNMENT_MODEL or $___FINAL_ALIGNMENT_MODEL ne '2')){
print STDERR "Merging $___GIZA_EXTENSION.part\* tables\n";
safesystem("$MGIZA_MERGE_ALIGN $dir/$f-$e.$___GIZA_EXTENSION.part*>$dir/$f-$e.$___GIZA_EXTENSION");
@@ -1353,7 +1353,7 @@ sub word_align {
### build arguments for giza2bal.pl
my($__ALIGNMENT_CMD,$__ALIGNMENT_INV_CMD);
-
+
if (-e "$___GIZA_F2E/$___F-$___E.$___GIZA_EXTENSION.bz2"){
$__ALIGNMENT_CMD="\"$BZCAT $___GIZA_F2E/$___F-$___E.$___GIZA_EXTENSION.bz2\"";
} elsif (-e "$___GIZA_F2E/$___F-$___E.$___GIZA_EXTENSION.gz") {
@@ -1361,7 +1361,7 @@ sub word_align {
} else {
die "ERROR: Can't read $___GIZA_F2E/$___F-$___E.$___GIZA_EXTENSION.{bz2,gz}\n";
}
-
+
if ( -e "$___GIZA_E2F/$___E-$___F.$___GIZA_EXTENSION.bz2"){
$__ALIGNMENT_INV_CMD="\"$BZCAT $___GIZA_E2F/$___E-$___F.$___GIZA_EXTENSION.bz2\"";
}elsif (-e "$___GIZA_E2F/$___E-$___F.$___GIZA_EXTENSION.gz"){
@@ -1369,9 +1369,9 @@ sub word_align {
}else{
die "ERROR: Can't read $___GIZA_E2F/$___E-$___F.$___GIZA_EXTENSION.{bz2,gz}\n\n";
}
-
+
safesystem("mkdir -p $___MODEL_DIR") or die("ERROR: could not create dir $___MODEL_DIR");
-
+
#build arguments for symal
my($__symal_a)="";
$__symal_a="union" if $___ALIGNMENT eq 'union';
@@ -1379,22 +1379,22 @@ sub word_align {
$__symal_a="grow" if $___ALIGNMENT=~ /grow/;
$__symal_a="srctotgt" if $___ALIGNMENT=~ /srctotgt/;
$__symal_a="tgttosrc" if $___ALIGNMENT=~ /tgttosrc/;
-
-
+
+
my($__symal_d,$__symal_f,$__symal_b);
($__symal_d,$__symal_f,$__symal_b)=("no","no","no");
$__symal_d="yes" if $___ALIGNMENT=~ /diag/;
$__symal_f="yes" if $___ALIGNMENT=~ /final/;
$__symal_b="yes" if $___ALIGNMENT=~ /final-and/;
-
+
safesystem("$GIZA2BAL -d $__ALIGNMENT_INV_CMD -i $__ALIGNMENT_CMD |".
"$SYMAL -alignment=\"$__symal_a\" -diagonal=\"$__symal_d\" ".
"-final=\"$__symal_f\" -both=\"$__symal_b\" > ".
- "$___ALIGNMENT_FILE.$___ALIGNMENT")
+ "$___ALIGNMENT_FILE.$___ALIGNMENT")
||
die "ERROR: Can't generate symmetrized alignment file\n"
-
+
}
### (4) BUILDING LEXICAL TRANSLATION TABLE
@@ -1405,7 +1405,7 @@ sub get_lexical_factored {
&get_lexical($___CORPUS.".".$___F,
$___CORPUS.".".$___E,
$___ALIGNMENT_FILE.".".$___ALIGNMENT,
- $___LEXICAL_FILE,
+ $___LEXICAL_FILE,
$___LEXICAL_COUNTS,
$_BASELINE_CORPUS.".".$___F,
$_BASELINE_CORPUS.".".$___E,
@@ -1427,7 +1427,7 @@ sub get_lexical_factored {
&get_lexical($___ALIGNMENT_STEM.".".$factor_f.".".$___F,
$___ALIGNMENT_STEM.".".$factor_e.".".$___E,
$___ALIGNMENT_FILE.".".$___ALIGNMENT,
- $lexical_file,
+ $lexical_file,
$___LEXICAL_COUNTS,
$_BASELINE_CORPUS.".".$factor_f.".".$___F,
$_BASELINE_CORPUS.".".$factor_e.".".$___E,
@@ -1461,7 +1461,7 @@ sub extract_phrase_factored {
foreach my $factor (split(/\+/,"$___REORDERING_FACTORS")) {
my $factor_key = $factor.":".&get_max_phrase_length(-1); # max
if (!defined($EXTRACT_FOR_FACTOR{$factor_key}{"translation"})) {
- push @FACTOR_LIST, $factor_key;
+ push @FACTOR_LIST, $factor_key;
}
$EXTRACT_FOR_FACTOR{$factor_key}{"reordering"}++;
}
@@ -1471,14 +1471,14 @@ sub extract_phrase_factored {
my ($factor,$max_length) = split(/:/,$factor_key);
print STDERR "(5) [$factor] extract phrases (max length $max_length)@ ".`date`;
my ($factor_f,$factor_e) = split(/\-/,$factor);
-
+
&reduce_factors($___CORPUS.".".$___F,
$___ALIGNMENT_STEM.".".$factor_f.".".$___F,
$factor_f);
&reduce_factors($___CORPUS.".".$___E,
$___ALIGNMENT_STEM.".".$factor_e.".".$___E,
$factor_e);
-
+
&extract_phrase($___ALIGNMENT_STEM.".".$factor_f.".".$___F,
$___ALIGNMENT_STEM.".".$factor_e.".".$___E,
$___EXTRACT_FILE.".".$factor,
@@ -1491,7 +1491,7 @@ sub extract_phrase_factored {
sub get_max_phrase_length {
my ($table_number) = @_;
-
+
# single length? that's it then
if ($___MAX_PHRASE_LENGTH =~ /^\d+$/) {
return $___MAX_PHRASE_LENGTH;
@@ -1508,7 +1508,7 @@ sub get_max_phrase_length {
return $max_length;
}
- # look up length for table
+ # look up length for table
$max_length = $max[0]; # fallback: first specified length
if ($#max >= $table_number) {
$max_length = $max[$table_number];
@@ -1520,7 +1520,7 @@ sub get_extract_reordering_flags {
if ($___MAX_LEXICAL_REORDERING) {
return " --model wbe-mslr --model phrase-mslr --model hier-mslr";
}
- return "" unless @REORDERING_MODELS;
+ return "" unless @REORDERING_MODELS;
my $config_string = "";
for my $type ( keys %REORDERING_MODEL_TYPES) {
$config_string .= " --model $type-".$REORDERING_MODEL_TYPES{$type};
@@ -1552,7 +1552,7 @@ sub extract_phrase {
$cmd .= " --PCFG" if $_PCFG;
$cmd .= " --UnpairedExtractFormat" if $_ALT_DIRECT_RULE_SCORE_1 || $_ALT_DIRECT_RULE_SCORE_2;
$cmd .= " --ConditionOnTargetLHS" if $_ALT_DIRECT_RULE_SCORE_1;
- if (defined($_GHKM))
+ if (defined($_GHKM))
{
$cmd .= " --TreeFragments" if $_GHKM_TREE_FRAGMENTS;
$cmd .= " --PhraseOrientation" if $_GHKM_PHRASE_ORIENTATION;
@@ -1590,12 +1590,12 @@ sub extract_phrase {
}
$cmd .= " ".$_EXTRACT_OPTIONS if defined($_EXTRACT_OPTIONS);
}
-
+
$cmd .= " --GZOutput ";
$cmd .= " --InstanceWeights $_INSTANCE_WEIGHTS_FILE " if defined $_INSTANCE_WEIGHTS_FILE;
$cmd .= " --BaselineExtract $_BASELINE_EXTRACT" if defined($_BASELINE_EXTRACT) && $PHRASE_EXTRACT =~ /extract-parallel.perl/;
$cmd .= " --FlexibilityScore" if $_FLEXIBILITY_SCORE;
-
+
map { die "File not found: $_" if ! -e $_ } ($alignment_file_e, $alignment_file_f, $alignment_file_a);
print STDERR "$cmd\n";
safesystem("$cmd") or die "ERROR: Phrase extraction failed (missing input files?)";
@@ -1608,7 +1608,7 @@ sub extract_phrase {
if -e "$extract_file$suffix.o.gz";
safesystem("rm $extract_file$suffix.gz");
safesystem("rm $extract_file$suffix.inv.gz");
- safesystem("rm $extract_file$suffix.o.gz")
+ safesystem("rm $extract_file$suffix.o.gz")
if -e "$extract_file$suffix.o.gz";
}
@@ -1710,7 +1710,7 @@ sub score_phrase_phrase_extract {
$substep+=2;
}
my $pid = fork();
-
+
if ($pid == 0)
{
next if $___CONTINUE && -e "$ttable_file.half.$direction";
@@ -1721,7 +1721,7 @@ sub score_phrase_phrase_extract {
$inverse = "--Inverse";
$extract_filename = $extract_file.".inv";
}
-
+
my $extract = "$extract_filename.sorted.gz";
print STDERR "(6.".($substep++).") creating table half $ttable_file.half.$direction @ ".`date`;
@@ -1756,8 +1756,8 @@ sub score_phrase_phrase_extract {
}
print STDERR $cmd."\n";
- safesystem($cmd) or die "ERROR: Scoring of phrases failed";
-
+ safesystem($cmd) or die "ERROR: Scoring of phrases failed";
+
exit();
}
else
@@ -1796,9 +1796,9 @@ sub score_phrase_phrase_extract {
$cmd .= " --KneserNey $ttable_file.half.f2e.gz.coc" if $KNESER_NEY;
$cmd .= " --SourceLabels $_GHKM_SOURCE_LABELS_FILE" if $_GHKM_SOURCE_LABELS && defined($_GHKM_SOURCE_LABELS_FILE);
$cmd .= " --PartsOfSpeech $_GHKM_PARTS_OF_SPEECH_FILE" if $_GHKM_PARTS_OF_SPEECH && defined($_GHKM_PARTS_OF_SPEECH_FILE);
-
+
$cmd .= " | $GZIP_EXEC -c > $ttable_file.gz";
-
+
safesystem($cmd) or die "ERROR: Consolidating the two phrase table halves failed";
if (! $debug) { safesystem("rm -f $ttable_file.half.*") or die("ERROR"); }
}
@@ -1849,7 +1849,7 @@ sub get_reordering_factored {
foreach my $factor (split(/\+/,$___REORDERING_FACTORS)) {
print STDERR "(7.1) [$factor] learn reordering model @ ".`date`;
my ($factor_f,$factor_e) = split(/\-/,$factor);
- # foreach my $model (@REORDERING_MODELS) {
+ # foreach my $model (@REORDERING_MODELS) {
# my $file = "$___MODEL_DIR/reordering-table.$factor";
# $file .= $model->{"all"};
# $file = shift @SPECIFIED_TABLE if scalar(@SPECIFIED_TABLE);
@@ -1860,7 +1860,7 @@ sub get_reordering_factored {
$file .= ".";
&get_reordering("$___EXTRACT_FILE.$factor",$file);
}
- }
+ }
}
else {
print STDERR " ... skipping this step, reordering is not lexicalized ...\n";
@@ -1870,9 +1870,9 @@ sub get_reordering_factored {
sub get_reordering {
my ($extract_file,$reo_model_path) = @_;
my $smooth = $___REORDERING_SMOOTH;
-
+
print STDERR "(7.2) building tables @ ".`date`;
-
+
#create cmd string for lexical reordering scoring
my $cmd = "$LEXICAL_REO_SCORER $extract_file.o.sorted.gz $smooth $reo_model_path";
$cmd .= " --SmoothWithCounts" if ($smooth =~ /(.+)u$/);
@@ -1891,10 +1891,10 @@ sub get_reordering {
}
$cmd .= "\"";
}
-
+
#Call the lexical reordering scorer
safesystem("$cmd") or die "ERROR: Lexical reordering scoring failed";
-
+
}
@@ -1917,7 +1917,7 @@ sub get_generation_factored {
$type = shift @TYPE if scalar @TYPE;
&get_generation($file,$type,$factor,$factor_e_source,$factor_e,$corpus);
}
- }
+ }
else {
print STDERR " no generation model requested, skipping step\n";
}
@@ -1930,7 +1930,7 @@ sub get_generation {
my (%WORD_TRANSLATION,%TOTAL_FOREIGN,%TOTAL_ENGLISH);
my %INCLUDE_SOURCE;
- foreach my $factor (split(/,/,$factor_e_source)) {
+ foreach my $factor (split(/,/,$factor_e_source)) {
$INCLUDE_SOURCE{$factor} = 1;
}
my %INCLUDE;
@@ -1958,14 +1958,14 @@ sub get_generation {
$target .= $___FACTOR_DELIMITER unless $first_factor;
$first_factor = 0;
$target .= $FACTOR[$factor];
- }
+ }
$GENERATION{$source}{$target}++;
$GENERATION_TOTAL_SOURCE{$source}++;
$GENERATION_TOTAL_TARGET{$target}++;
}
- }
+ }
close(E);
-
+
open(GEN,">$file") or die "ERROR: Can't write $file";
foreach my $source (keys %GENERATION) {
foreach my $target (keys %{$GENERATION{$source}}) {
@@ -1986,7 +1986,7 @@ sub get_generation {
sub create_ini {
print STDERR "(9) create moses.ini @ ".`date`;
-
+
&full_path(\$___MODEL_DIR);
&full_path(\$___VCB_E);
&full_path(\$___VCB_F);
@@ -1996,7 +1996,7 @@ sub create_ini {
### MOSES CONFIG FILE ###
#########################
\n";
-
+
if (defined $___TRANSLATION_FACTORS) {
print INI "# input factors\n";
print INI "[input-factors]\n";
@@ -2005,7 +2005,7 @@ sub create_ini {
my ($factor_list, $output) = split /-+/, $table;
foreach (split(/,/,$factor_list)) {
$INPUT_FACTOR_MAX = $_ if $_>$INPUT_FACTOR_MAX;
- }
+ }
}
$INPUT_FACTOR_MAX = $_INPUT_FACTOR_MAX if $_INPUT_FACTOR_MAX; # use specified, if exists
for (my $c = 0; $c <= $INPUT_FACTOR_MAX; $c++) { print INI "$c\n"; }
@@ -2022,7 +2022,7 @@ sub create_ini {
foreach (split(/:/,$___DECODING_STEPS)) {
my $first_ttable_flag = 1;
foreach (split(/,/,$_)) {
- s/t/T /g;
+ s/t/T /g;
s/g/G /g;
my ($type, $num) = split /\s+/;
if ($first_ttable_flag && $type eq "T") {
@@ -2035,8 +2035,8 @@ sub create_ini {
$path++;
}
print INI "1 T 1\n" if $_GLUE_GRAMMAR;
-
- print INI "1 T 1\n" if $_TRANSLITERATION_PHRASE_TABLE;
+
+ print INI "1 T 1\n" if $_TRANSLITERATION_PHRASE_TABLE;
if (defined($_DECODING_GRAPH_BACKOFF)) {
$_DECODING_GRAPH_BACKOFF =~ s/\s+/ /g;
@@ -2069,7 +2069,7 @@ sub create_ini {
my $count = `cut -d\\ -f 2 $file | sort | uniq | wc -l`;
$basic_weight_count += $count if $method eq "Indicator" || $method eq "Ratio";
$basic_weight_count += 2**$count-1 if $method eq "Subset";
- }
+ }
$basic_weight_count++ if $_PCFG;
$basic_weight_count+=4 if $_FLEXIBILITY_SCORE;
$basic_weight_count+=2 if $_FLEXIBILITY_SCORE && $_HIERARCHICAL;
@@ -2134,8 +2134,8 @@ sub create_ini {
if ($_TRANSLITERATION_PHRASE_TABLE) {
$feature_spec .= "PhraseDictionaryMemory name=TranslationModel$i table-limit=100 num-features=4 path=$_TRANSLITERATION_PHRASE_TABLE input-factor=0 output-factor=0\n";
$weight_spec .= "TranslationModel$i= 0.2 0.2 0.2 0.2\n";
- $i++;
- }
+ $i++;
+ }
# glue grammar
if ($_GLUE_GRAMMAR) {
@@ -2185,7 +2185,7 @@ sub create_ini {
# lexicalized reordering model
if ($___REORDERING ne "distance") {
my $i = 0;
-
+
my @SPECIFIED_TABLE = @_REORDERING_TABLE;
foreach my $factor (split(/\+/,$___REORDERING_FACTORS)) {
my ($input_factor,$output_factor) = split(/\-/,$factor);
@@ -2196,7 +2196,7 @@ sub create_ini {
$table_file .= ".";
$table_file .= $model->{"filename"};
$table_file .= ".gz";
- $feature_spec .= "LexicalReordering name=LexicalReordering$i num-features=".$model->{"numfeatures"}." type=".$model->{"config"}." input-factor=$input_factor output-factor=$output_factor path=$table_file".(defined($_LEXICAL_REORDERING_DEFAULT_SCORES)?" default-scores=$_LEXICAL_REORDERING_DEFAULT_SCORES":"")."\n";
+ $feature_spec .= "LexicalReordering name=LexicalReordering$i num-features=".$model->{"numfeatures"}." type=".$model->{"config"}." input-factor=$input_factor output-factor=$output_factor path=$table_file".(defined($_LEXICAL_REORDERING_DEFAULT_SCORES)?" default-scores=$_LEXICAL_REORDERING_DEFAULT_SCORES":"")."\n";
$weight_spec .= "LexicalReordering$i=";
for(my $j=0;$j<$model->{"numfeatures"};$j++) { $weight_spec .= " 0.3"; }
$weight_spec .= "\n";
@@ -2220,11 +2220,11 @@ sub create_ini {
if($count == 0){
$feature_spec .= "OpSequenceModel name=OpSequenceModel$count num-features=5 path=". $_OSM . $factor_val . "/operationLM.bin" . " input-factor=". $factor_f . " output-factor=". $factor_e . " support-features=yes \n";
- $weight_spec .= "OpSequenceModel$count= 0.08 -0.02 0.02 -0.001 0.03\n";
+ $weight_spec .= "OpSequenceModel$count= 0.08 -0.02 0.02 -0.001 0.03\n";
}
else{
$feature_spec .= "OpSequenceModel name=OpSequenceModel$count num-features=1 path=". $_OSM . $factor_val . "/operationLM.bin" . " input-factor=". $factor_f . " output-factor=". $factor_e . " support-features=no \n";
- $weight_spec .= "OpSequenceModel$count= 0.08 \n";
+ $weight_spec .= "OpSequenceModel$count= 0.08 \n";
}
$count++;
@@ -2235,7 +2235,7 @@ sub create_ini {
$feature_spec .= "OpSequenceModel name=OpSequenceModel0 num-features=5 path=". $_OSM . " \n";
$weight_spec .= "OpSequenceModel0= 0.08 -0.02 0.02 -0.001 0.03\n";
}
- }
+ }
# distance-based reordering
if (!$_HIERARCHICAL) {
@@ -2267,14 +2267,14 @@ sub create_ini {
die "Unknown numeric LM type given: $type";
}
}
-
+
my $lm_oov_prob = 0.1;
-
+
if ($_POST_DECODING_TRANSLIT || $_TRANSLITERATION_PHRASE_TABLE){
$lm_oov_prob = -100.0;
$_LMODEL_OOV_FEATURE = "yes";
- }
-
+ }
+
$feature_spec .= "$type name=LM$i factor=$f path=$fn order=$o\n";
$weight_spec .= "LM$i= 0.5".($_LMODEL_OOV_FEATURE?" $lm_oov_prob":"")."\n";
$i++;
@@ -2394,5 +2394,3 @@ sub open_or_zcat {
open($hdl,$read) or die "Can't read $fn ($read)";
return $hdl;
}
-
-
diff --git a/scripts/training/wrappers/berkeleyparsed2mosesxml.perl b/scripts/training/wrappers/berkeleyparsed2mosesxml.perl
index 3dd8fc4ac..232cfefab 100755
--- a/scripts/training/wrappers/berkeleyparsed2mosesxml.perl
+++ b/scripts/training/wrappers/berkeleyparsed2mosesxml.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -22,13 +22,13 @@ while(<STDIN>) {
s/\"/\&quot;/g; # xml
s/\[/\&#91;/g; # syntax non-terminal
s/\]/\&#93;/g; # syntax non-terminal
-
+
# escape parentheses that were part of the input text
s/(\(\S+ )\(\)/$1\&openingparenthesis;\)/g;
s/(\(\S+ )\)\)/$1\&closingparenthesis;\)/g;
-
+
# convert into tree
s/\((\S+) /<tree label=\"$1\"> /g;
s/\)/ <\/tree> /g;
@@ -38,7 +38,7 @@ while(<STDIN>) {
s/\-RRB\-/\)/g;
s/ +/ /g;
s/ $//g;
-
+
# de-escape parentheses that were part of the input text
s/\&openingparenthesis;/\(/g;
s/\&closingparenthesis;/\)/g;
diff --git a/scripts/training/wrappers/berkeleyparsed2mosesxml_PTB.perl b/scripts/training/wrappers/berkeleyparsed2mosesxml_PTB.perl
index e61a53652..9e8c30d42 100755
--- a/scripts/training/wrappers/berkeleyparsed2mosesxml_PTB.perl
+++ b/scripts/training/wrappers/berkeleyparsed2mosesxml_PTB.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -31,7 +31,7 @@ while(<STDIN>) {
s/(\(\S+ )\)\)/$1\&closingparenthesis;\)/g;
-
+
# convert into tree
s/\((\S+) /<tree label=\"$1\"> /g;
s/\)/ <\/tree> /g;
@@ -41,7 +41,7 @@ while(<STDIN>) {
s/\-RRB\-/\)/g;
s/ +/ /g;
s/ $//g;
-
+
# de-escape parentheses that were part of the input text
s/\&openingparenthesis;/\(/g;
s/\&closingparenthesis;/\)/g;
diff --git a/scripts/training/wrappers/filter-excluded-lines.perl b/scripts/training/wrappers/filter-excluded-lines.perl
index 7f9da3efa..dff104dba 100755
--- a/scripts/training/wrappers/filter-excluded-lines.perl
+++ b/scripts/training/wrappers/filter-excluded-lines.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -29,11 +29,11 @@ if (@excludedLines > 0)
while(<STDIN>)
{
my $line = $_;
-
+
if ($nextLineExcl == $lineNum)
{
$exclInd++;
- if ($exclInd < @excludedLines)
+ if ($exclInd < @excludedLines)
{
$nextLineExcl = $excludedLines[$exclInd];
}
@@ -43,7 +43,7 @@ while(<STDIN>)
print $line;
$linesOut++;
}
-
+
$lineNum++;
}
#close(STDIN);
diff --git a/scripts/training/wrappers/find-unparseable.perl b/scripts/training/wrappers/find-unparseable.perl
index b0d38027b..00009e2e9 100755
--- a/scripts/training/wrappers/find-unparseable.perl
+++ b/scripts/training/wrappers/find-unparseable.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/wrappers/mada-wrapper.perl b/scripts/training/wrappers/mada-wrapper.perl
index 20f76f821..f2cf14f40 100755
--- a/scripts/training/wrappers/mada-wrapper.perl
+++ b/scripts/training/wrappers/mada-wrapper.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -20,7 +20,7 @@ my ($dummy, $tmpfile) = tempfile("mada-in-XXXX", DIR=>$TMPDIR, UNLINK=>!$KEEP_TM
print STDERR $tmpfile."\n";
open(TMP,">$tmpfile");
-while(<STDIN>) {
+while(<STDIN>) {
print TMP $_;
}
close(TMP);
diff --git a/scripts/training/wrappers/madamira-tok.perl b/scripts/training/wrappers/madamira-tok.perl
index 00639b7a7..37e70079e 100755
--- a/scripts/training/wrappers/madamira-tok.perl
+++ b/scripts/training/wrappers/madamira-tok.perl
@@ -56,7 +56,7 @@ my $infile = "$TMPDIR/input";
print STDERR $infile."\n";
open(TMP,">$infile");
-while(<STDIN>) {
+while(<STDIN>) {
print TMP $_;
}
close(TMP);
@@ -65,7 +65,7 @@ my $cmd;
if ($USE_PARALLEL) {
# split input file
- my $SPLIT_EXEC = `gsplit --help 2>/dev/null`;
+ my $SPLIT_EXEC = `gsplit --help 2>/dev/null`;
if($SPLIT_EXEC) {
$SPLIT_EXEC = 'gsplit';
}
@@ -97,7 +97,7 @@ else {
# get stuff out of mada output
open(MADA_OUT,"<$infile.mada");
#binmode(MADA_OUT, ":utf8");
-while(my $line = <MADA_OUT>) {
+while(my $line = <MADA_OUT>) {
chomp($line);
print "$line\n";
}
diff --git a/scripts/training/wrappers/madamira-wrapper.perl b/scripts/training/wrappers/madamira-wrapper.perl
index 1e6b63225..6535b6187 100755
--- a/scripts/training/wrappers/madamira-wrapper.perl
+++ b/scripts/training/wrappers/madamira-wrapper.perl
@@ -50,7 +50,7 @@ my $infile = "$TMPDIR/input";
print STDERR $infile."\n";
open(TMP,">$infile");
-while(<STDIN>) {
+while(<STDIN>) {
print TMP $_;
}
close(TMP);
@@ -58,7 +58,7 @@ close(TMP);
my $cmd;
# split input file
-my $SPLIT_EXEC = `gsplit --help 2>/dev/null`;
+my $SPLIT_EXEC = `gsplit --help 2>/dev/null`;
if($SPLIT_EXEC) {
$SPLIT_EXEC = 'gsplit';
}
@@ -80,7 +80,7 @@ print STDERR "Executing: $cmd\n";
# get stuff out of mada output
open(MADA_OUT,"<$infile.mada");
#binmode(MADA_OUT, ":utf8");
-while(my $line = <MADA_OUT>) {
+while(my $line = <MADA_OUT>) {
chomp($line);
#print STDERR "line=$line \n";
@@ -93,11 +93,11 @@ while(my $line = <MADA_OUT>) {
# word
my $word = substr($line, 7, length($line) - 8);
#print STDERR "FOund $word\n";
-
+
for (my $i = 0; $i < 4; ++$i) {
$line = <MADA_OUT>;
}
-
+
my $factors = GetFactors($line, \@FACTORS);
$word .= $factors;
@@ -140,7 +140,7 @@ sub GetFactors
$ret .= "|$value";
}
-
+
return $ret;
}
diff --git a/scripts/training/wrappers/make-factor-brown-cluster-mkcls.perl b/scripts/training/wrappers/make-factor-brown-cluster-mkcls.perl
index 35714271c..1e3a1ce3f 100755
--- a/scripts/training/wrappers/make-factor-brown-cluster-mkcls.perl
+++ b/scripts/training/wrappers/make-factor-brown-cluster-mkcls.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/wrappers/make-factor-de-pos.perl b/scripts/training/wrappers/make-factor-de-pos.perl
index 2eadd4123..495517352 100755
--- a/scripts/training/wrappers/make-factor-de-pos.perl
+++ b/scripts/training/wrappers/make-factor-de-pos.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/wrappers/make-factor-en-pos.mxpost.perl b/scripts/training/wrappers/make-factor-en-pos.mxpost.perl
index 0d27aa12f..4aa66bac6 100755
--- a/scripts/training/wrappers/make-factor-en-pos.mxpost.perl
+++ b/scripts/training/wrappers/make-factor-en-pos.mxpost.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -22,7 +22,7 @@ while(<TAGGER>) {
foreach my $word_pos (split) {
$word_pos =~ s/\/([^\/]+)$/_$1/;
$word_pos = "//_:" if $word_pos eq "//";
- print STDERR "faulty POS tag: $word_pos\n"
+ print STDERR "faulty POS tag: $word_pos\n"
unless $word_pos =~ /^.+_([^_]+)$/;
print OUT "$1 ";
}
diff --git a/scripts/training/wrappers/make-factor-pos.tree-tagger.perl b/scripts/training/wrappers/make-factor-pos.tree-tagger.perl
index 2af6eb75c..0ad04d4de 100755
--- a/scripts/training/wrappers/make-factor-pos.tree-tagger.perl
+++ b/scripts/training/wrappers/make-factor-pos.tree-tagger.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -31,12 +31,12 @@ $MODEL = "greek" if $LANGUAGE eq "el";
die("Unknown language '$LANGUAGE'") unless defined($MODEL);
$MODEL = $TREE_TAGGER."/lib/".$MODEL.".par";
-# define encoding conversion into Latin1 or Greek if required
+# define encoding conversion into Latin1 or Greek if required
my $CONV = "";
-#$CONV = "iconv --unicode-subst=X -f utf8 -t iso-8859-1|"
-$CONV = "perl -ne 'use Encode; print encode(\"iso-8859-1\", decode(\"utf8\", \$_));' |"
+#$CONV = "iconv --unicode-subst=X -f utf8 -t iso-8859-1|"
+$CONV = "perl -ne 'use Encode; print encode(\"iso-8859-1\", decode(\"utf8\", \$_));' |"
unless $MODEL =~ /utf8/ || $LANGUAGE eq "bg";
-$CONV = "perl -ne 'use Encode; print encode(\"iso-8859-7\", decode(\"utf8\", \$_));' |"
+$CONV = "perl -ne 'use Encode; print encode(\"iso-8859-7\", decode(\"utf8\", \$_));' |"
if $LANGUAGE eq "el";
# pipe in data into tagger, process its output
diff --git a/scripts/training/wrappers/make-factor-stem.perl b/scripts/training/wrappers/make-factor-stem.perl
index 60aca0b34..662f1d882 100755
--- a/scripts/training/wrappers/make-factor-stem.perl
+++ b/scripts/training/wrappers/make-factor-stem.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/wrappers/make-factor-suffix.perl b/scripts/training/wrappers/make-factor-suffix.perl
index 7e864ea0c..6a59254e4 100755
--- a/scripts/training/wrappers/make-factor-suffix.perl
+++ b/scripts/training/wrappers/make-factor-suffix.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -17,7 +17,7 @@ while(<IN>) {
if (length($word) > $size) {
$word = substr($word,length($word)-$size);
}
- print OUT " " unless $first;
+ print OUT " " unless $first;
$first = 0;
print OUT lc($word);
}
diff --git a/scripts/training/wrappers/morfessor-wrapper.perl b/scripts/training/wrappers/morfessor-wrapper.perl
index b0debe38c..b20190cd4 100755
--- a/scripts/training/wrappers/morfessor-wrapper.perl
+++ b/scripts/training/wrappers/morfessor-wrapper.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/wrappers/mosesxml2berkeleyparsed.perl b/scripts/training/wrappers/mosesxml2berkeleyparsed.perl
index fc1f0c532..e929658ff 100755
--- a/scripts/training/wrappers/mosesxml2berkeleyparsed.perl
+++ b/scripts/training/wrappers/mosesxml2berkeleyparsed.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/wrappers/parse-de-berkeley.perl b/scripts/training/wrappers/parse-de-berkeley.perl
index 68df07c49..596fb3eff 100755
--- a/scripts/training/wrappers/parse-de-berkeley.perl
+++ b/scripts/training/wrappers/parse-de-berkeley.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -18,7 +18,7 @@ die("ERROR: syntax is: parse-de-berkeley.perl [-split-hyphen] [-split-slash] [-m
'mark-split' => \$MARK_SPLIT,
'binarize' => \$BINARIZE,
'unparseable' => \$UNPARSEABLE
-
+
)
&& defined($JAR) && defined($GRAMMAR);
@@ -76,7 +76,7 @@ while(<PARSE>) {
#print STDERR "outLine=$outLine" .length($outLine) ."\n";
if ($UNPARSEABLE == 1 && length($outLine) == 1) {
- print $unparsedLine;
+ print $unparsedLine;
}
else {
print $outLine;
diff --git a/scripts/training/wrappers/parse-de-bitpar.perl b/scripts/training/wrappers/parse-de-bitpar.perl
index 4723d6aa0..1bbcf5329 100755
--- a/scripts/training/wrappers/parse-de-bitpar.perl
+++ b/scripts/training/wrappers/parse-de-bitpar.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
@@ -51,13 +51,13 @@ while(<INPUT>)
}
print $TMP $_."\n";
- $hasWords = 1;
+ $hasWords = 1;
}
if ($hasWords == 0) {
print $TMP " \n";
}
-
+
print $TMP "\n";
}
close($TMP);
@@ -76,7 +76,7 @@ while(my $line = <PARSER>) {
if ($line =~ /^No parse for/) {
if ($UNPARSEABLE) {
my $len = length($line);
- $line = substr($line, 15, $len - 17);
+ $line = substr($line, 15, $len - 17);
$line = escape($line);
print $line;
}
@@ -90,7 +90,7 @@ while(my $line = <PARSER>) {
for(my $i=0;$i<length($line);$i++) {
# print STDERR substr($line,$i)."\n";
if (substr($line,$i,4) eq "(*T*") {
- my ($trace,$rest) = split(/\)/,substr($line,$i+1));
+ my ($trace,$rest) = split(/\)/,substr($line,$i+1));
$i+=length($trace)+2;
$i++ if substr($line,$i+1,1) eq " ";
die("ERROR: NO LABEL FOR TRACE") unless @LABEL;
diff --git a/scripts/training/wrappers/parse-en-collins.perl b/scripts/training/wrappers/parse-en-collins.perl
index 27b33a2dd..252d3d2b7 100755
--- a/scripts/training/wrappers/parse-en-collins.perl
+++ b/scripts/training/wrappers/parse-en-collins.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/wrappers/parse-en-egret.perl b/scripts/training/wrappers/parse-en-egret.perl
index c3d23a4ee..9f434063b 100755
--- a/scripts/training/wrappers/parse-en-egret.perl
+++ b/scripts/training/wrappers/parse-en-egret.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/wrappers/syntax-hyphen-splitting.perl b/scripts/training/wrappers/syntax-hyphen-splitting.perl
index 1bb616939..653b410d0 100755
--- a/scripts/training/wrappers/syntax-hyphen-splitting.perl
+++ b/scripts/training/wrappers/syntax-hyphen-splitting.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
diff --git a/scripts/training/wrappers/tagger-german-chunk.perl b/scripts/training/wrappers/tagger-german-chunk.perl
index 4f26efabe..c57031889 100755
--- a/scripts/training/wrappers/tagger-german-chunk.perl
+++ b/scripts/training/wrappers/tagger-german-chunk.perl
@@ -1,11 +1,11 @@
-#!/usr/bin/env perl
+#!/usr/bin/env perl
use warnings;
use strict;
use Getopt::Long "GetOptions";
-# split -a 5 -d ../europarl.clean.5.de
-# ls -1 x????? | ~/workspace/coreutils/parallel/src/parallel /home/s0565741/workspace/treetagger/cmd/run-tagger-chunker-german.sh
+# split -a 5 -d ../europarl.clean.5.de
+# ls -1 x????? | ~/workspace/coreutils/parallel/src/parallel /home/s0565741/workspace/treetagger/cmd/run-tagger-chunker-german.sh
# cat x?????.out > ../out
my $chunkedPath;
@@ -40,11 +40,11 @@ if (!defined($chunkedPath)) {
print STDERR "must defined -tree-tagger \n";
exit(1);
}
-
+
$chunkedPath = "$TMPDIR/chunked";
print STDERR "chunkedPath not defined. Now $chunkedPath \n";
my $cmd = "$treetaggerPath/cmd/tagger-chunker-german-utf8 < $inPath > $chunkedPath";
- `$cmd`;
+ `$cmd`;
}
# convert chunked file into Moses XML
@@ -83,7 +83,7 @@ while(my $chunkLine = <CHUNKED>) {
else {
# beginning of tag
if ($wordPos == ($numWords - 1)) {
- # closing bracket of last word in sentence
+ # closing bracket of last word in sentence
print "\n";
$sentence = <IN>;
chomp($sentence);
diff --git a/vw/README.md b/vw/README.md
index fcbe40bea..66a0c4a2c 100644
--- a/vw/README.md
+++ b/vw/README.md
@@ -7,7 +7,7 @@ function.
Compatible with this frozen version of VW:
https://github.com/moses-smt/vowpal_wabbit
-
+
To enable VW, you need to provide a path where VW was installed (using `make install`) to bjam:
./bjam --with-vw=<path/to/vw/installation>
@@ -17,7 +17,7 @@ Implemented classifier features
* `VWFeatureSourceBagOfWords`: This creates a feature of form bow^token for every
source sentence token.
-* `VWFeatureSourceExternalFeatures column=0`: when used with -inputtype 5 (`TabbedSentence`) this can be used to supply additional feature to VW. The input is a tab-separated file, the first column is the usual input sentence, all other columns can be used for meta-data. Parameter column=0 counts beginning with the first column that is not the input sentence.
+* `VWFeatureSourceExternalFeatures column=0`: when used with -inputtype 5 (`TabbedSentence`) this can be used to supply additional features to VW. The input is a tab-separated file; the first column is the usual input sentence, and all other columns can be used for meta-data. The parameter column=0 starts counting from the first column that is not the input sentence.
* `VWFeatureSourceIndicator`: Adds a feature for the whole source phrase.
* `VWFeatureSourcePhraseInternal`: Adds a separate feature for every word of the source phrase.
* `VWFeatureSourceWindow size=3`: Adds source words in a window of size 3 before and after the source phrase as features. These do not overlap with `VWFeatureSourcePhraseInternal`.
@@ -36,7 +36,7 @@ To use the classifier edit your moses.ini
VWFeatureTargetIndicator
VWFeatureSourceIndicator
...
-
+
[weights]
...
VW0= 0.2
@@ -47,12 +47,12 @@ features which classifier they belong to:
[features]
...
- VW name=bart path=/home/username/vw/classifier1.vw
+ VW name=bart path=/home/username/vw/classifier1.vw
VWFeatureSourceBagOfWords used-by=bart
VWFeatureTargetIndicator used-by=bart
VWFeatureSourceIndicator used-by=bart
...
-
+
[weights]
...
bart= 0.2
@@ -62,14 +62,14 @@ You can also use multiple classifiers:
[features]
...
- VW name=bart path=/home/username/vw/classifier1.vw
+ VW name=bart path=/home/username/vw/classifier1.vw
VW path=/home/username/vw/classifier2.vw
VW path=/home/username/vw/classifier3.vw
- VWFeatureSourceBagOfWords used-by=bart,VW0
+ VWFeatureSourceBagOfWords used-by=bart,VW0
VWFeatureTargetIndicator used-by=VW1,VW0,bart
VWFeatureSourceIndicator used-by=bart,VW1
...
-
+
[weights]
...
bart= 0.2
@@ -78,7 +78,7 @@ You can also use multiple classifiers:
...
Features can use any combination of factors. Provide a comma-delimited list of factors in the `source-factors` or `target-factors` variables to override the default setting (`0`, i.e. the first factor).
-
+
Training the classifier
-----------------------
@@ -94,7 +94,7 @@ Use Moses format for the word alignment (`0-0 1-0` etc.). Set the input type to
Configure your features in the `moses.ini` file (see above) and set the `train` flag:
[features]
- ...
+ ...
VW name=bart path=/home/username/vw/features.txt train=1
...