github.com/moses-smt/mosesdecoder.git
author    Achim <achim@achim01.(none)> 2013-03-19 01:17:35 +0400
committer Achim <achim@achim01.(none)> 2013-03-19 01:17:35 +0400
commit    038871fdb36359e1ec2027de1ef0227162f5473a
tree      23b681f76040f1b7a949d047a9a4ec7e78b772fa
parent    dbfd4847ceb5be12d0cfe303b0750194ffeff389

Hungarian and Latvian non-breaking prefix files
Diffstat (limited to 'scripts/share/nonbreaking_prefixes/nonbreaking_prefix.lv')
-rw-r--r-- scripts/share/nonbreaking_prefixes/nonbreaking_prefix.lv | 100
1 file changed, 100 insertions, 0 deletions
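
The comments in the file added by this commit describe a simple line format: `#` starts a comment, every other non-empty line is a prefix, and a trailing `#NUMERIC_ONLY#` marks a prefix that is non-breaking only when a number follows it. The sketch below shows one way such a file could be read and applied. It is a minimal illustration and not the Moses sentence splitter: the function names `load_prefixes` and `is_sentence_break` and the `"always"` / `"numeric_only"` labels are hypothetical, and treating every unprotected period-final token as a sentence break is a simplifying assumption.

```python
NUMERIC_ONLY = "#NUMERIC_ONLY#"


def load_prefixes(lines):
    """Parse prefix-file lines into a dict {prefix: kind}.

    kind is "always" for ordinary entries and "numeric_only" for
    entries tagged with #NUMERIC_ONLY# (labels are hypothetical).
    """
    prefixes = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        if line.endswith(NUMERIC_ONLY):
            prefix = line[: -len(NUMERIC_ONLY)].strip()
            prefixes[prefix] = "numeric_only"
        else:
            prefixes[line] = "always"
    return prefixes


def is_sentence_break(token, next_token, prefixes):
    """Decide whether a token ending in '.' closes a sentence.

    Simplifying assumption: any period-final token that is not
    protected by the prefix list counts as a break.
    """
    if not token.endswith("."):
        return False
    kind = prefixes.get(token[:-1])
    if kind == "always":
        return False
    if kind == "numeric_only" and next_token[:1].isdigit():
        return False
    return True


# A few entries in the same format as the Latvian file.
SAMPLE = [
    "#List of titles",
    "",
    "Dr",
    "a.l",
    "Nr #NUMERIC_ONLY#",
]
PREFIXES = load_prefixes(SAMPLE)
```

With these sample entries, `is_sentence_break("Nr.", "5", PREFIXES)` is false because a digit follows, while `is_sentence_break("Nr.", "Tas", PREFIXES)` is true. To read the real file, pass an open file handle (opened with `encoding="utf-8"`) to `load_prefixes` instead of `SAMPLE`.
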
diff --git a/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.lv b/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.lv
new file mode 100644
index 000000000..81754a17a
--- /dev/null
+++ b/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.lv
@@ -0,0 +1,100 @@
+#Anything in this file, followed by a period (and an upper-case word), does NOT indicate an end-of-sentence marker.
+#Special cases are included for prefixes that ONLY appear before 0-9 numbers.
+
+#any single upper case letter followed by a period is not a sentence ender (excluding I occasionally, but we leave it in)
+#usually upper case letters are initials in a name
+A
+B
+C
+D
+E
+F
+G
+H
+I
+J
+K
+L
+M
+N
+O
+P
+Q
+R
+S
+T
+U
+V
+W
+X
+Y
+Z
+
+#List of titles. These are often followed by upper-case names, but do not indicate sentence breaks
+dr
+Dr
+med
+prof
+Prof
+inž
+Inž
+ist.loc
+Ist.loc
+kor.loc
+Kor.loc
+v.i
+vietn
+Vietn
+
+#misc - odd period-ending items that NEVER indicate breaks (p.m. does NOT fall into this category - it sometimes ends a sentence)
+a.l
+t.p
+pārb
+Pārb
+vec
+Vec
+inv
+Inv
+sk
+Sk
+spec
+Spec
+vienk
+Vienk
+virz
+Virz
+māksl
+Māksl
+mūz
+Mūz
+akad
+Akad
+soc
+Soc
+galv
+Galv
+vad
+Vad
+sertif
+Sertif
+folkl
+Folkl
+hum
+Hum
+
+#Numbers only. These should only induce breaks when followed by a numeric sequence
+# add NUMERIC_ONLY after the word for this function
+#This case is mostly for the english "No." which can either be a sentence of its own, or
+#if followed by a number, a non-breaking prefix
+Nr #NUMERIC_ONLY#