author | Kenneth Heafield <github@kheafield.com> | 2017-10-20 00:57:36 +0300 |
---|---|---|
committer | Kenneth Heafield <github@kheafield.com> | 2017-10-20 00:57:36 +0300 |
commit | 545eee7e75487aeaf45a8b077c57e189e50b2c2e (patch) | |
tree | 6c1436f6192bbf35ded19d9d3df1efe4e9653825 | |
parent | eced95d694cb0297ebaba3a66cd4ee3f4d97f3c6 (diff) |
Attempt to stop people from publishing non-comparable BLEU scores, as discussed in statmt meeting
-rwxr-xr-x | scripts/generic/multi-bleu.perl | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/scripts/generic/multi-bleu.perl b/scripts/generic/multi-bleu.perl
index a25e347bb..15e26ff4a 100755
--- a/scripts/generic/multi-bleu.perl
+++ b/scripts/generic/multi-bleu.perl
@@ -168,6 +168,9 @@ printf "BLEU = %.2f, %.1f/%.1f/%.1f/%.1f (BP=%.3f, ratio=%.3f, hyp_len=%d, ref_l
 $length_translation,
 $length_reference;
 
+
+print STDERR "Do not publish scores from multi-bleu.perl. The scores depend on your tokenizer, which is unlikely to be reproducible from your paper or consistent across research groups. Instead you should detokenize then use mteval-v14.pl, which has a standard tokenization. Scores from multi-bleu.perl can still be used for internal purposes when you have a consistent tokenizer.\n";
+
 sub my_log {
 return -9999999999 unless $_[0];
 return log($_[0]);