
github.com/marian-nmt/sentencepiece.git
author     Taku Kudo <taku910@users.noreply.github.com>  2018-07-12 03:38:11 +0300
committer  GitHub <noreply@github.com>  2018-07-12 03:38:11 +0300
commit     983c0f5aeb26d6963c3adef94b12e2ea1595dac9 (patch)
tree       4fbfa2bc98d9d93a35449ae83aa40726cd781343 /README.md
parent     f949f9e571f824ea7ded63782eef0e3498b1e091 (diff)
Update README.md
Diffstat (limited to 'README.md')
-rw-r--r--  README.md  |  2
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index b8964eb..63e967e 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ with the extension of direct training from raw sentences. SentencePiece allows u
## Technical highlights
- **Purely data driven**: SentencePiece trains tokenization and detokenization
- models from only raw sentences. No pre-tokenization ([Moses tokenizer](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)/[MeCab](http://taku910.github.io/mecab/)/[KyTea](http://www.phontron.com/kytea/)) is required.
+ models from sentences. Pre-tokenization ([Moses tokenizer](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)/[MeCab](http://taku910.github.io/mecab/)/[KyTea](http://www.phontron.com/kytea/)) is not always required.
- **Language independent**: SentencePiece treats the sentences just as sequences of Unicode characters. There is no language-dependent logic.
- **Multiple subword algorithms**: **BPE** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and **unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)] are supported.
- **Subword regularization**: SentencePiece implements subword sampling for [subword regularization](https://arxiv.org/abs/1804.10959) which helps to improve the robustness and accuracy of NMT models.
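
The highlights touched by this diff (training directly from raw sentences and subword sampling for regularization) map onto the Python bindings roughly as follows. This is a minimal sketch assuming the `sentencepiece` pip package; `corpus.txt` and the `m` model prefix are hypothetical file names.

```python
import sentencepiece as spm

# Train a unigram LM model directly from raw (untokenized) sentences;
# no external pre-tokenizer such as Moses/MeCab/KyTea is run first.
spm.SentencePieceTrainer.Train(
    '--input=corpus.txt --model_prefix=m --vocab_size=8000 --model_type=unigram'
)

sp = spm.SentencePieceProcessor()
sp.Load('m.model')

# Deterministic (best) segmentation.
print(sp.EncodeAsPieces('Hello world.'))

# Subword regularization: sample a segmentation from the unigram model.
# Arguments are (text, nbest_size, alpha); nbest_size=-1 samples from all
# hypotheses, alpha controls the sampling smoothness.
for _ in range(3):
    print(sp.SampleEncodeAsPieces('Hello world.', -1, 0.1))
```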