github.com/moses-smt/mosesdecoder.git
author     Kenneth Heafield <github@kheafield.com>  2020-08-03 23:51:09 +0300
committer  Kenneth Heafield <github@kheafield.com>  2020-08-03 23:51:09 +0300
commit     78ca5f3cc5aa671a8a5d36c56452e217e6f00828
tree       d5c5ec74eb777bfcadcd850f516f619ffaa2c0f0 /scripts
parent     d65d392d468bd1006b1b820aeedf84e9ade65ea3
Allow Arabic letters to begin a fa sentence
Diffstat (limited to 'scripts')
-rwxr-xr-x  scripts/ems/support/split-sentences.perl  1
1 file changed, 1 insertion, 0 deletions
diff --git a/scripts/ems/support/split-sentences.perl b/scripts/ems/support/split-sentences.perl
index 206b7ebe9..5df22cdc9 100755
--- a/scripts/ems/support/split-sentences.perl
+++ b/scripts/ems/support/split-sentences.perl
@@ -141,6 +141,7 @@ sub preprocess {
$sentence_start .= "\\p{Block: Tamil}" if $language eq "ta";
$sentence_start .= "\\p{Block: Telugu}" if $language eq "te";
$sentence_start .= "\\p{Block: Hangul}\\p{Block: Hangul_Compatibility_Jamo}\\p{Block: Hangul_Jamo}\\p{Block: Hangul_Jamo_Extended_A}\\p{Block: Hangul_Jamo_Extended_B}" if $language eq "ko";
+ $sentence_start .= "\\p{Arabic}" if $language eq "fa";
# we include danda and double danda (U+0964 and U+0965) as sentence split characters
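Why the added line matters: Arabic script has no letter case, so a sentence-start class built from uppercase letters never matches the first letter of a Persian sentence, and no split happens after a full stop. The sketch below is a minimal, hypothetical illustration of that effect, not the script's actual splitting rules. It assumes a base `$sentence_start` of `\p{Upper}0-9` and a single simplified split regex; the function name `split_simple` and the sample sentences are also illustrative assumptions. Only the `$sentence_start .= "\\p{Arabic}" if $language eq "fa";` line is taken from the diff.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                                  # source literals below are UTF-8
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical, simplified splitter (not the script's real rules): break after
# ".", "?" or "!" plus spaces when the next character is in the sentence-start class.
sub split_simple {
    my ($text, $language) = @_;
    # Assumed base class: uppercase letters and digits.
    my $sentence_start = "\\p{Upper}0-9";
    # Same line as the commit adds: Arabic-script letters may start a fa sentence.
    $sentence_start .= "\\p{Arabic}" if $language eq "fa";
    # The lookahead keeps the first character of the next sentence in place;
    # only the spaces after the punctuation are replaced by a newline.
    $text =~ s/([.?!]) +(?=[$sentence_start])/$1\n/g;
    return split /\n/, $text;
}

my $persian    = "سلام دنیا. این یک جمله است.";
my @with_fa    = split_simple($persian, "fa");
my @without_fa = split_simple($persian, "en");
print scalar(@with_fa), "\n";      # prints 2
print scalar(@without_fa), "\n";   # prints 1
```

With the language set to `fa`, the alef that opens the second sentence matches `\p{Arabic}` and the text splits in two; under any other language code the same text stays on one line, which is the behaviour the commit changes.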