commit    dc285935e12a560a82033873d79445402e2e7ec0
tree      77529e6851ad77c239e2a71ed875c466a1cc35cf
parent    1d5df93e22a2dfb58330a98f182cfd4f32825aba
author    Marcin Junczys-Dowmunt <marcinjd@microsoft.com>  2018-11-26 10:36:58 +0300
committer GitHub <noreply@github.com>  2018-11-26 10:36:58 +0300
Update README.md
 training-basics-sentencepiece/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/training-basics-sentencepiece/README.md b/training-basics-sentencepiece/README.md
index 0941189..0fb6a85 100644
--- a/training-basics-sentencepiece/README.md
+++ b/training-basics-sentencepiece/README.md
@@ -279,8 +279,8 @@ Here's the table:
 | raw+sampling | | |
 
 We see that keeping the noise untouched (raw) results indeed in the worst of the three system, normalization (normalized) is best,
-closely followed by sampled subwords splits (raw+sampling). This is an interesting result: although normalization is generally better
-it is not trivial to discover the problem in the first place. Creating a normalization table is another added difficulty and on top of
+closely followed by sampled subwords splits (raw+sampling). This is an interesting result: although normalization is generally better,
+it is not trivial to discover the problem in the first place. Creating a normalization table is another added difficulty - and on top of
 that normalization breaks reversibility. Subword sampling seems to be a viable alternative when dealing with character-level noise
 with no added complexity compared to raw text. It does however take longer to converge, being a regularization method.
 