github.com/moses-smt/mosesdecoder.git
author: rafpayen <rafpayen@1f5c12ca-751b-0410-a591-d2e778427230> 2011-06-16 21:24:25 +0400
committer: rafpayen <rafpayen@1f5c12ca-751b-0410-a591-d2e778427230> 2011-06-16 21:24:25 +0400
commit: cdc4179ce130fceba77422f774554cb1073c906b
tree: d8ea486f53f5905829b72cda752776c28caf6a33 /scripts/tokenizer
parent: 85283f5bee9edfd5ca117cf225ffe3e72480fcbd
Add a space before double punctuation signs in French
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@4016 1f5c12ca-751b-0410-a591-d2e778427230
Diffstat (limited to 'scripts/tokenizer')
-rwxr-xr-x scripts/tokenizer/detokenizer.perl | 3
1 files changed, 3 insertions, 0 deletions
diff --git a/scripts/tokenizer/detokenizer.perl b/scripts/tokenizer/detokenizer.perl
index 3538fd6db..0b8e5af73 100755
--- a/scripts/tokenizer/detokenizer.perl
+++ b/scripts/tokenizer/detokenizer.perl
@@ -78,6 +78,9 @@ sub detokenize {
         $text = $text.$prependSpace.$words[$i];
         $prependSpace = "";
     } elsif ($words[$i] =~ /^[\,\.\?\!\:\;\\\%\}\]\)]+$/){
+        if (($language eq "fr") && ($words[$i] =~ /^[\?\!\:\;\\\%]$/)) {
+            #these punctuations are prefixed with a non-breakable space in french
+            $text .= " "; }
         #perform left shift on punctuation items
         $text=$text.$words[$i];
         $prependSpace = " ";
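The effect of the hunk above can be tried in isolation with a minimal sketch. It assumes a simplified token loop: `join_tokens` is a hypothetical helper written for illustration, not a function in detokenizer.perl, and it covers only the closing-punctuation branch plus a default branch. As in the patch, the inserted character is an ordinary space, and the second pattern has no `+`, so only single-character tokens such as `?` get the extra space while a run such as `?!` does not.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; the real detokenize() sub
# handles many more cases (quotes, currency symbols, contractions, ...).
sub join_tokens {
    my ($language, @words) = @_;
    my $text          = "";
    my $prepend_space = "";
    for my $word (@words) {
        if ($word =~ /^[,.?!:;\\%}\])]+$/) {
            # French: keep a space before a single ? ! : ; \ or % token
            if ($language eq "fr" && $word =~ /^[?!:;\\%]$/) {
                $text .= " ";
            }
            # Left shift: attach the punctuation without $prepend_space
            $text .= $word;
            $prepend_space = " ";
        } else {
            # Assumed default branch: ordinary words are space-separated
            $text .= $prepend_space . $word;
            $prepend_space = " ";
        }
    }
    return $text;
}

print join_tokens("fr", qw(Comment allez-vous ?)), "\n";
print join_tokens("en", qw(How are you ?)), "\n";
```

With `"fr"` the question mark stays separated from `allez-vous` by one space; with `"en"` it is attached directly to `you`.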