author    Ryuichirou <gokenya@gmail.com>  2018-05-02 08:09:02 +0300
committer Ryuichirou <gokenya@gmail.com>  2018-05-02 08:09:02 +0300
commit    80411b9e3d49247c84ad3a5a991ad327461edd7e (patch)
tree      1e1160956a20a379cec3812fdd130d570098ec23 /README.md
parent    18fcf6be46cbe5b9d7940dd8324f3db25652a68b (diff)
Fix typo
Diffstat (limited to 'README.md')
-rw-r--r--  README.md  2
1 file changed, 1 insertion(+), 1 deletion(-)
@@ -14,7 +14,7 @@ Subword segmentation with unigram language model supports probabilistic subword
 ## Technical highlights
 - **Multiple subword algorithms**: **BPE** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and **unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)] are supported.
-- **Subword regularization**: SentencePiece implements subwrod sampling for [subword regularization](https://arxiv.org/abs/1804.10959) which helps to improve the robustness and accuracy of NMT models.
+- **Subword regularization**: SentencePiece implements subword sampling for [subword regularization](https://arxiv.org/abs/1804.10959) which helps to improve the robustness and accuracy of NMT models.
 - **Purely data driven**: SentencePiece trains tokenization and detokenization models from only raw sentences. No pre-tokenization ([Moses tokenizer](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)/[MeCab](http://taku910.github.io/mecab/)/[KyTea](http://www.phontron.com/kytea/)) is required.
 - **Language independent**: SentencePiece treats the sentences just as sequences of Unicode characters. There is no language-dependent logic.
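For context on the **BPE** algorithm named in the hunk above, here is a minimal pure-Python sketch of the merge-learning loop described by Sennrich et al. This is toy illustration code, not SentencePiece's actual implementation; the function names and the `{word: count}` corpus format are assumptions made for the example.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency,
    and return the most frequent one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Rewrite every word, replacing each occurrence of `pair` with its concatenation."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

def learn_bpe(corpus, num_merges):
    """Learn `num_merges` BPE merge operations from a {word: count} corpus."""
    words = {tuple(w): c for w, c in corpus.items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words

# On a classic toy corpus, the first learned merges join frequent suffixes:
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, words = learn_bpe(corpus, 2)
```

The unigram language model takes the opposite approach: rather than greedily growing subwords by merging, it starts from a large seed vocabulary and prunes it, which is what makes probabilistic subword sampling (the "subword regularization" bullet) natural in that model.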