github.com/marian-nmt/sentencepiece.git
author    Taku Kudo <taku910@users.noreply.github.com>  2018-06-28 20:45:55 +0300
committer GitHub <noreply@github.com>                   2018-06-28 20:45:55 +0300
commit    e605990832d872e6a0a24cc8bb96f00386ca6fd8
tree      b692c8d369fc5faabd2b08d82afb54a40a07bd72 /README.md
parent    6f84d6bfa5017d0744c807625a0f3d7459001b0e
Update README.md
Diffstat (limited to 'README.md')
-rw-r--r--  README.md  6
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 0f243af..f76ad05 100644
--- a/README.md
+++ b/README.md
@@ -17,11 +17,11 @@ with the extension of direct training from raw sentences. SentencePiece allows u
 **This is not an official Google product.**
 
 ## Technical highlights
-- **Multiple subword algorithms**: **BPE** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and **unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)] are supported.
-- **Subword regularization**: SentencePiece implements subword sampling for [subword regularization](https://arxiv.org/abs/1804.10959) which helps to improve the robustness and accuracy of NMT models.
 - **Purely data driven**: SentencePiece trains tokenization and detokenization
 models from only raw sentences. No pre-tokenization ([Moses tokenizer](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)/[MeCab](http://taku910.github.io/mecab/)/[KyTea](http://www.phontron.com/kytea/)) is required.
 - **Language independent**: SentencePiece treats the sentences just as sequences of Unicode characters. There is no language-dependent logic.
+- **Multiple subword algorithms**: **BPE** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and **unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)] are supported.
+- **Subword regularization**: SentencePiece implements subword sampling for [subword regularization](https://arxiv.org/abs/1804.10959) which helps to improve the robustness and accuracy of NMT models.
 - **Fast and lightweight**: Segmentation speed is around 50k sentences/sec, and memory footprint is around 6MB.
 - **Self-contained**: The same tokenization/detokenization is obtained as long as the same model file is used.
 - **Direct vocabulary id generation**: SentencePiece manages vocabulary to id mapping and can directly generate vocabulary id sequences from raw sentences.
@@ -286,5 +286,7 @@ Then segment train/test corpus with ```--vocabulary``` option
 * [SentencePiece Experiments](doc/experiments.md)
 * [SentencePieceProcessor C++ API](doc/api.md)
 * [Use custom text normalization rules](doc/normalization.md)
+- **Multiple subword algorithms**: **BPE** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and **unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)] are supported.
+- **Subword regularization**: SentencePiece implements subword sampling for [subword regularization](https://arxiv.org/abs/1804.10959) which helps to improve the robustness and accuracy of NMT models.
 * [Use custom symbols](doc/special_symbols.md)
 * [Segmentation and training algorithms in detail]
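For reference, the highlights this commit reorders can be exercised through the SentencePiece Python wrapper. The sketch below is illustrative only and not part of the patch; `corpus.txt`, the model prefix `m`, and the vocabulary size are placeholder assumptions.

```python
import sentencepiece as spm

# Train a unigram LM model directly from raw sentences; no pre-tokenization
# (Moses/MeCab/KyTea) is required. Input file and sizes are placeholders.
spm.SentencePieceTrainer.Train(
    '--input=corpus.txt --model_prefix=m --vocab_size=8000 --model_type=unigram')

sp = spm.SentencePieceProcessor()
sp.Load('m.model')

# Self-contained: the same model file always yields the same segmentation.
print(sp.EncodeAsPieces('Hello world.'))

# Direct vocabulary id generation from a raw sentence, and lossless detokenization.
ids = sp.EncodeAsIds('Hello world.')
print(ids)
print(sp.DecodeIds(ids))

# Subword regularization: sample a different segmentation on each call.
# nbest_size=-1 samples from the full lattice; alpha=0.1 smooths the distribution.
for _ in range(3):
    print(sp.SampleEncodeAsPieces('Hello world.', -1, 0.1))
```

Note that the sampling call is defined for the unigram language model, which is why the bullets moved here pair subword regularization with that model type.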