github.com/moses-smt/mosesdecoder.git
author: Linas Vepstas <linasvepstas@gmail.com> 2017-01-05 21:53:21 +0300
committer: Linas Vepstas <linasvepstas@gmail.com> 2017-01-05 21:53:21 +0300
commit: bd9d12351b6a5648d45b54c9293e2c1d5fc20722
tree: e2e64764395c9930380ecdd0ecf7b6c9c943e609 /scripts/share/nonbreaking_prefixes
parent: 1933bcbf33bf1726c66f4013def40ad4bba13c7d

Create a Cantonese version, distinct from Mandarin.
The content is identical, at this moment, but having distinct language suffixes solves processing-pipeline problems later on.

Diffstat (limited to 'scripts/share/nonbreaking_prefixes')
-rw-r--r-- scripts/share/nonbreaking_prefixes/nonbreaking_prefix.yue 53
-rw-r--r-- scripts/share/nonbreaking_prefixes/nonbreaking_prefix.zh 2
2 files changed, 54 insertions, 1 deletion
diff --git a/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.yue b/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.yue
new file mode 100644
index 000000000..37942ade9
--- /dev/null
+++ b/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.yue
@@ -0,0 +1,53 @@
+#
+# Cantonese (Chinese)
+#
+# Anything in this file, followed by a period,
+# does NOT indicate an end-of-sentence marker.
+#
+# English/Euro-language given-name initials (appearing in
+# news, periodicals, etc.)
+A
+B
+C
+D
+E
+F
+G
+H
+I
+J
+K
+L
+M
+N
+O
+P
+Q
+R
+S
+T
+U
+V
+W
+X
+Y
+Z
+
+# Numbers only. These should only induce breaks when followed by
+# a numeric sequence.
+# Add NUMERIC_ONLY after the word for this function. This case is
+# mostly for the english "No." which can either be a sentence of its
+# own, or if followed by a number, a non-breaking prefix.
+No #NUMERIC_ONLY#
+Nr #NUMERIC_ONLY#
diff --git a/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.zh b/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.zh
index 077710c87..df4c2ff88 100644
--- a/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.zh
+++ b/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.zh
@@ -1,5 +1,5 @@
#
-# Chinese (Mandarin, Cantonese)
+# Mandarin (Chinese)
#
# Anything in this file, followed by a period,
# does NOT indicate an end-of-sentence marker.
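
The comments in the new .yue file describe two kinds of entries: plain prefixes, which never end a sentence when followed by a period, and entries tagged #NUMERIC_ONLY#, which are non-breaking only when a number follows. A minimal Python sketch of that reading is given below. It is not the Moses sentence splitter; the names load_prefixes, is_sentence_break, and SAMPLE are hypothetical and exist only for illustration.

```python
NUMERIC_ONLY = "#NUMERIC_ONLY#"

# A tiny stand-in for a nonbreaking_prefix.* file (hypothetical sample data).
SAMPLE = """\
# Comment lines start with a hash and are skipped.
A
No #NUMERIC_ONLY#
"""


def load_prefixes(text):
    """Parse prefix-file text into {prefix: is_numeric_only}."""
    prefixes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.endswith(NUMERIC_ONLY):
            # Strip the tag and remember that a number must follow.
            prefixes[line[: -len(NUMERIC_ONLY)].strip()] = True
        else:
            prefixes[line] = False
    return prefixes


def is_sentence_break(word, next_word, prefixes):
    """Decide whether `word`, which ends in a period, ends a sentence.

    Assumption: a token that is not a listed prefix always breaks.
    """
    stem = word[:-1]  # drop the trailing period
    if stem not in prefixes:
        return True
    if prefixes[stem]:
        # NUMERIC_ONLY entry: non-breaking only before a digit.
        return not next_word[:1].isdigit()
    return False  # plain prefix: never a break
```

Under this sketch, "No." followed by "5" is treated as a non-breaking prefix, while "No." followed by an ordinary word is treated as the end of a sentence. Keeping the Cantonese and Mandarin lists in separate files lets a pipeline select one by language suffix even while their contents are identical.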