author | Kenneth Heafield <github@kheafield.com> | 2020-08-03 23:51:09 +0300 |
---|---|---|
committer | Kenneth Heafield <github@kheafield.com> | 2020-08-03 23:51:09 +0300 |
commit | 78ca5f3cc5aa671a8a5d36c56452e217e6f00828 (patch) | |
tree | d5c5ec74eb777bfcadcd850f516f619ffaa2c0f0 /scripts | |
parent | d65d392d468bd1006b1b820aeedf84e9ade65ea3 (diff) |
Allow Arabic letters to begin a fa sentence
Diffstat (limited to 'scripts')
-rwxr-xr-x | scripts/ems/support/split-sentences.perl | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/scripts/ems/support/split-sentences.perl b/scripts/ems/support/split-sentences.perl
index 206b7ebe9..5df22cdc9 100755
--- a/scripts/ems/support/split-sentences.perl
+++ b/scripts/ems/support/split-sentences.perl
@@ -141,6 +141,7 @@ sub preprocess {
  $sentence_start .= "\\p{Block: Tamil}" if $language eq "ta";
  $sentence_start .= "\\p{Block: Telugu}" if $language eq "te";
  $sentence_start .= "\\p{Block: Hangul}\\p{Block: Hangul_Compatibility_Jamo}\\p{Block: Hangul_Jamo}\\p{Block: Hangul_Jamo_Extended_A}\\p{Block: Hangul_Jamo_Extended_B}" if $language eq "ko";
+ $sentence_start .= "\\p{Arabic}" if $language eq "fa";
  # we include danda and double danda (U+0964 and U+0965) as sentence split characters
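The effect of the added line can be illustrated with a small standalone sketch. It is not the code from split-sentences.perl: the helper names `sentence_start_class` and `split_sentences`, the `\p{IsUpper}` default, and the simplified split rule are all assumptions made for illustration. Only the `\p{Arabic}` line for `fa` is taken from the patch.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (not in split-sentences.perl): builds the body of a
# character class listing characters that may begin a sentence.
sub sentence_start_class {
    my ($language) = @_;
    my $sentence_start = "\\p{IsUpper}";                    # assumed default
    $sentence_start .= "\\p{Arabic}" if $language eq "fa";  # line added by this commit
    return $sentence_start;
}

# Simplified, assumed split rule: sentence-final punctuation, then spaces,
# then a character from the sentence-start class.
sub split_sentences {
    my ($text, $language) = @_;
    my $start = sentence_start_class($language);
    $text =~ s/([.?!]) +(?=[${start}])/$1\n/g;
    return split /\n/, $text;
}

# A Persian example: the second sentence begins with an Arabic-script letter.
my $text = "من کتاب خواندم. او به مدرسه رفت.";
binmode(STDOUT, ":encoding(UTF-8)");
print scalar(my @with_fa = split_sentences($text, "fa")), " sentence(s) with the fa rule\n";
print scalar(my @without = split_sentences($text, "en")), " sentence(s) without it\n";
```

In this sketch the text is split only when the language is `fa`, because Arabic-script letters have no uppercase form and so never match the assumed default class.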