From f3d292740fc7ea19e0a4bc970d66e49ef982a698 Mon Sep 17 00:00:00 2001
From: Marcin Junczys-Dowmunt
Date: Sun, 25 Nov 2018 23:38:51 -0800
Subject: Update README.md

---
 training-basics-sentencepiece/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/training-basics-sentencepiece/README.md b/training-basics-sentencepiece/README.md
index 0fb6a85..76c547e 100644
--- a/training-basics-sentencepiece/README.md
+++ b/training-basics-sentencepiece/README.md
@@ -267,7 +267,7 @@ BLEU+case.mixed+lang.ro-en+numrefs.1+smooth.exp+test.wmt16+tok.13a+version.1.2.1
 ## Is Normalization Actually Required?
 
 We also quickly tested if the normalization of Romanian characters is actually neccessary and if there are other methods
-of dealing with the noise. SentencePiece supports a method called subword-regularization ((Kudo 2018)[]) that samples different
+of dealing with the noise. SentencePiece supports a method called subword-regularization ([Kudo 2018](https://arxiv.org/abs/1804.10959)) that samples different
 subword splits at training time; ideally resulting in a more robust translation at inference time.
 
 Here's the table:
-- 
cgit v1.2.3