author     Taku Kudo <taku910@users.noreply.github.com>   2020-05-21 06:12:53 +0300
committer  GitHub <noreply@github.com>                    2020-05-21 06:12:53 +0300
commit     a32d7dc6ce6f383a65ad6e1cbe1983f94ab11932 (patch)
tree       37060c5b85180ac02142c871df35a1b60cac59d0
parent     c1fbda8995514172141c02356cdc9518a53aac94 (diff)
Update README.md
-rw-r--r--  python/README.md | 25 +
1 file changed, 25 insertions(+), 0 deletions(-)
diff --git a/python/README.md b/python/README.md
index 8ed7d9e..b683082 100644
--- a/python/README.md
+++ b/python/README.md
@@ -92,6 +92,31 @@ trainer_interface.cc(619) LOG(INFO) Saving vocabs: m.vocab
 >>>
 ```
+### Training without local filesystem
+The SentencePiece trainer accepts any iterable object as the source of training sentences. You can also pass a file-like object (any instance with a write() method) to emit the output model to any destination. These features are useful when running SentencePiece in environments with limited access to the local file system (e.g., Google Colab).
+
+```
+import urllib.request
+import io
+import sentencepiece as spm
+
+# Load the training corpus from a URL as an iterator and store the model in a BytesIO object.
+model = io.BytesIO()
+with urllib.request.urlopen(
+    'https://raw.githubusercontent.com/google/sentencepiece/master/data/botchan.txt'
+) as response:
+  spm.SentencePieceTrainer.train(
+      sentence_iterator=response, model_writer=model, vocab_size=1000)
+
+# Serialize the model to a file.
+# with open('out.model', 'wb') as f:
+#   f.write(model.getvalue())
+
+# Directly load the model from the serialized bytes.
+sp = spm.SentencePieceProcessor(model_proto=model.getvalue())
+print(sp.encode('this is test'))
+```
+
 ### Segmentation (old interface)
 ```