
github.com/marian-nmt/marian-examples.git
author Marcin Junczys-Dowmunt <marcinjd@microsoft.com> 2018-11-26 21:53:26 +0300
committer Marcin Junczys-Dowmunt <marcinjd@microsoft.com> 2018-11-26 21:53:26 +0300
commit 0d2b8fff001cb1da42659679d356d42bdc994ef9
tree 209e6b96a6d71616860c753fcb6d6b370d744717
parent 39d1fa288891c17f0b5f4b02cf2a84b6ea942ef9
change description of bleu-detok
-rw-r--r-- training-basics-sentencepiece/README.md | 3
1 files changed, 2 insertions, 1 deletions
diff --git a/training-basics-sentencepiece/README.md b/training-basics-sentencepiece/README.md
index 8e97851..176ef1b 100644
--- a/training-basics-sentencepiece/README.md
+++ b/training-basics-sentencepiece/README.md
@@ -198,7 +198,8 @@ We can pass the Romanian-specific normalizaton rules via the `--sentencepiece-op
argument. The values of this option are passed on to the SentencePiece trainer, note the required single
quotes around the SentencePiece options: `--sentencepiece-options '--normalization_rule_tsv=data/norm_romanian.tsv'`.
-Another new feature is the `bleu-detok` validation metric. When used with SentencePiece this should
+Another new feature is the `bleu-detok` validation metric which can be used when SentencePiece support
+is compiled into Marian. Since SentencePiece is a reversible tokenizer, this should
give you in-training BLEU scores that are very close to sacreBLEU's scores. Differences may appear
if unexpected SentencePiece normalization rules are used. You should still report only official
sacreBLEU scores for publications.
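The reasoning in this change rests on SentencePiece being reversible: its pieces carry an explicit word-boundary marker (U+2581, "▁"), so detokenizing is a concatenation plus one character replacement, and BLEU can then be computed on raw text the way sacreBLEU does. The Python sketch below illustrates only that round trip under stated assumptions. It is not Marian's implementation: `toy_encode` and `decode` are hypothetical helpers, `toy_encode` stands in for a trained SentencePiece model by cutting each marked word into chunks of at most 3 characters, and no normalization is applied.

```python
from typing import List

# U+2581 ("▁") is the word-boundary marker SentencePiece uses in its pieces.
SPACE_MARKER = "\u2581"


def toy_encode(text: str) -> List[str]:
    """Hypothetical stand-in for a trained SentencePiece model.

    Prefixes every space-separated word with the marker and cuts the result
    into chunks of at most 3 characters. No normalization is applied.
    """
    pieces: List[str] = []
    for word in text.split(" "):
        marked = SPACE_MARKER + word
        pieces.extend(marked[i:i + 3] for i in range(0, len(marked), 3))
    return pieces


def decode(pieces: List[str]) -> str:
    """Detokenize: concatenate pieces, turn markers back into spaces, and
    drop the single leading space introduced by the first marker."""
    text = "".join(pieces).replace(SPACE_MARKER, " ")
    return text[1:] if text.startswith(" ") else text


if __name__ == "__main__":
    sentence = "Acesta este un test."
    pieces = toy_encode(sentence)
    print(pieces)
    # The round trip restores the raw string, so a metric can score the
    # detokenized hypothesis against an untokenized reference.
    assert decode(pieces) == sentence
```

A real model would apply its normalization (by default NFKC-based, plus any rules passed via `--normalization_rule_tsv`) before segmenting, and decoding returns the normalized text. When those rules rewrite characters, the decoded hypothesis no longer matches the raw reference character for character, which is where differences from sacreBLEU can come from.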