add comment on training time for sentencepiece models

author: Marcin Junczys-Dowmunt <marcinjd@microsoft.com> 2018-11-26 22:09:02 +0300
committer: Marcin Junczys-Dowmunt <marcinjd@microsoft.com> 2018-11-26 22:09:02 +0300
commit: 6ab33f71542e48d0c47628281a3ff6776dacd1f0 (patch)
tree: 86499187d738041d0b0f58ef19cb5c4212d10de1
parent: 0d2b8fff001cb1da42659679d356d42bdc994ef9 (diff)
1 files changed, 3 insertions, 1 deletions
diff --git a/training-basics-sentencepiece/README.md b/training-basics-sentencepiece/README.md
index 176ef1b..5841db7 100644
--- a/training-basics-sentencepiece/README.md
+++ b/training-basics-sentencepiece/README.md
@@ -192,7 +192,9 @@ raw training and validation data into Marian. A single joint SentencePiece model
 `model/vocab.roen.spm`. The `*.spm` suffix is required and tells Marian to train a SentencePiece
 vocabulary. When the same vocabulary file is specified multiple times - like in this example - a single
 vocabulary is built for the union of the corresponding training files. This also enables us to use
-tied embeddings (`--tied-embeddings-all`).
+tied embeddings (`--tied-embeddings-all`). The SentencePiece training process takes a couple of 
+minutes depending on the input data size. The same `*.spm` can be later reused for other experiments
+with the same language pair and training is then of course omitted. 
 
 We can pass the Romanian-specific normalization rules via the `--sentencepiece-options` command line
 argument. The values of this option are passed on to the SentencePiece trainer, note the required single