Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/moses-smt/mosesdecoder.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorbgottesman <bgottesman@1f5c12ca-751b-0410-a591-d2e778427230>2011-08-08 19:02:56 +0400
committerbgottesman <bgottesman@1f5c12ca-751b-0410-a591-d2e778427230>2011-08-08 19:02:56 +0400
commit14587cdafc42cdbff9221c07b5545551ecec475b (patch)
tree94072fc1768258fc972820f41c16a0551e29ac71 /scripts/tokenizer
parent79142d18e644325e0a870610d54652d9618b60de (diff)
fix a detokenization bug that was preventing the removal of the whitespace following a contracted French or Italian article/pronoun (e.g. "l' immigration") when the contraction was the second-last word in the segment
remove the expectation of failure on the corresponding unit test git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@4133 1f5c12ca-751b-0410-a591-d2e778427230
Diffstat (limited to 'scripts/tokenizer')
-rwxr-xr-xscripts/tokenizer/detokenizer.perl2
1 files changed, 1 insertions, 1 deletions
diff --git a/scripts/tokenizer/detokenizer.perl b/scripts/tokenizer/detokenizer.perl
index 0b8e5af73..f049b8080 100755
--- a/scripts/tokenizer/detokenizer.perl
+++ b/scripts/tokenizer/detokenizer.perl
@@ -92,7 +92,7 @@ sub detokenize {
#left-shift floats in Czech
$text=$text.$words[$i];
$prependSpace = " ";
- } elsif ((($language eq "fr") ||($language eq "it")) && ($i<(scalar(@words)-2)) && ($words[$i] =~ /[\p{IsAlpha}][\']$/) && ($words[$i+1] =~ /^[\p{IsAlpha}]/)) {
+ } elsif ((($language eq "fr") ||($language eq "it")) && ($i<=(scalar(@words)-2)) && ($words[$i] =~ /[\p{IsAlpha}][\']$/) && ($words[$i+1] =~ /^[\p{IsAlpha}]/)) {
#right-shift the contraction for French and Italian
$text = $text.$prependSpace.$words[$i];
$prependSpace = "";