github.com/miloyip/rapidjson.git
author Milo Yip <miloyip@gmail.com> 2014-07-15 21:56:11 +0400
committer Milo Yip <miloyip@gmail.com> 2014-07-15 21:56:11 +0400
commit 7cfe718d3d1abbb15676b8f83e001b00eb2f1473
tree 62d0a84ba57f4894cd57575d2fa7b7bc432d2beb /doc
parent e590e0757ec8cb6f602e4668defdae2c2593b042

Minor update to encoding documentation
Diffstat (limited to 'doc')
-rw-r--r-- doc/encoding.md | 11
1 file changed, 5 insertions, 6 deletions
diff --git a/doc/encoding.md b/doc/encoding.md
index cc764c2e..bc5c178f 100644
--- a/doc/encoding.md
+++ b/doc/encoding.md
@@ -6,8 +6,7 @@ According to [ECMA-404](http://www.ecma-international.org/publications/files/ECM
The earlier [RFC4627](http://www.ietf.org/rfc/rfc4627.txt) stated that,
-> (in §3) JSON text SHALL be encoded in Unicode. The default encoding is
- UTF-8.
+> (in §3) JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
> (in §6) JSON may be represented using UTF-8, UTF-16, or UTF-32. When JSON is written in UTF-8, JSON is 8bit compatible. When JSON is written in UTF-16 or UTF-32, the binary content-transfer-encoding must be used.
@@ -28,9 +27,9 @@ Those unique numbers are called code points, which is in the range `0x0` to `0x1
There are various encodings for storing Unicode code points. These are called Unicode Transformation Format (UTF). RapidJSON supports the most commonly used UTFs, including
-* UTF-8: 8-bit variable-width encoding. It maps a code point to 1-4 bytes.
-* UTF-16: 16-bit variable-width encoding. It maps a code point to 1-2 16-bit code units (i.e., 2-4 bytes).
-* UTF-32: 32-bit fixed-width encoding. It directly maps a code point to 1 32-bit code unit (i.e. 4 bytes).
+* UTF-8: 8-bit variable-width encoding. It maps a code point to 1–4 bytes.
+* UTF-16: 16-bit variable-width encoding. It maps a code point to 1–2 16-bit code units (i.e., 2–4 bytes).
+* UTF-32: 32-bit fixed-width encoding. It directly maps a code point to a single 32-bit code unit (i.e. 4 bytes).
For UTF-16 and UTF-32, the byte order (endianness) does matter. Within computer memory, they are often stored in the computer's endianness. However, when it is stored in file or transferred over network, we need to state the byte order of the byte sequence, either little-endian (LE) or big-endian (BE).
@@ -78,7 +77,7 @@ For a detail example, please check the example in [DOM's Encoding](doc/stream.md
## Character Type {#CharacterType}
-As shown in the declaration, each encoding has a `CharType` template parameter. Actually, it may be a little bit confusing, but each `CharType` stores a code unit, not a character (code point). As mentioned in previous section, a code point may be encoded to 1-4 code units for UTF-8.
+As shown in the declaration, each encoding has a `CharType` template parameter. Actually, it may be a little bit confusing, but each `CharType` stores a code unit, not a character (code point). As mentioned in previous section, a code point may be encoded to 1–4 code units for UTF-8.
For `UTF16(LE|BE)`, `UTF32(LE|BE)`, the `CharType` must be integer type of at least 2 and 4 bytes respectively.
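
Reviewer note, not part of the commit: the code-unit ranges quoted in the hunks above (1–4 units for UTF-8, 1–2 for UTF-16, exactly 1 for UTF-32) can be checked with a small sketch. The function names `Utf8Units`, `Utf16Units` and `Utf32Units` are hypothetical helpers written for this note and are not RapidJSON API. The sketch assumes the input is a valid Unicode scalar value (at most `0x10FFFF`, not a surrogate).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (not RapidJSON API). Each returns how many code
// units one code point occupies in the given UTF.
// Assumption: cp is a valid scalar value (<= 0x10FFFF, not a surrogate).

// UTF-8: 8-bit code units, 1 to 4 per code point.
constexpr std::size_t Utf8Units(std::uint32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// UTF-16: 16-bit code units, 1 inside the BMP, otherwise a surrogate pair.
constexpr std::size_t Utf16Units(std::uint32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// UTF-32: always a single 32-bit code unit.
constexpr std::size_t Utf32Units(std::uint32_t) {
    return 1;
}

// Compile-time checks against a few well-known code points.
static_assert(Utf8Units(0x41) == 1 && Utf16Units(0x41) == 1, "U+0041 'A'");
static_assert(Utf8Units(0x20AC) == 3 && Utf16Units(0x20AC) == 1, "U+20AC euro sign");
static_assert(Utf8Units(0x1F600) == 4 && Utf16Units(0x1F600) == 2, "U+1F600 emoji");
```

Multiplying by the code-unit width gives the byte counts in the list: U+1F600 takes 4 bytes in each of the three encodings, while U+0041 takes 1, 2 and 4 bytes respectively. This is also why a `CharType` holds a code unit and not a whole character.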