author | Marcin Junczys-Dowmunt <marcinjd@microsoft.com> | 2018-11-26 21:53:26 +0300 |
---|---|---|
committer | Marcin Junczys-Dowmunt <marcinjd@microsoft.com> | 2018-11-26 21:53:26 +0300 |
commit | 0d2b8fff001cb1da42659679d356d42bdc994ef9 (patch) | |
tree | 209e6b96a6d71616860c753fcb6d6b370d744717 | |
parent | 39d1fa288891c17f0b5f4b02cf2a84b6ea942ef9 (diff) |
change description of bleu-detok
-rw-r--r-- | training-basics-sentencepiece/README.md | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/training-basics-sentencepiece/README.md b/training-basics-sentencepiece/README.md
index 8e97851..176ef1b 100644
--- a/training-basics-sentencepiece/README.md
+++ b/training-basics-sentencepiece/README.md
@@ -198,7 +198,8 @@ We can pass the Romanian-specific normalizaton rules via the `--sentencepiece-op
 argument. The values of this option are passed on to the SentencePiece trainer, note the required single quotes
 around the SentencePiece options: `--sentencepiece-options '--normalization_rule_tsv=data/norm_romanian.tsv'`.
 
-Another new feature is the `bleu-detok` validation metric. When used with SentencePiece this should
+Another new feature is the `bleu-detok` validation metric which can be used when SentencePiece support
+is compiled into Marian. Since SentencePiece is a reversible tokenizer, this should
 give you in-training BLEU scores that are very close to sacreBLEU's scores. Differences may appear
 if unexpected SentencePiece normalization rules are used. You should still report only official
 sacreBLEU scores for publications.