diff options
Diffstat (limited to 'newlib/libc/iconv/README.ORIGINAL')
-rw-r--r-- | newlib/libc/iconv/README.ORIGINAL | 68 |
1 files changed, 68 insertions, 0 deletions
diff --git a/newlib/libc/iconv/README.ORIGINAL b/newlib/libc/iconv/README.ORIGINAL new file mode 100644 index 000000000..b11b8de69 --- /dev/null +++ b/newlib/libc/iconv/README.ORIGINAL @@ -0,0 +1,68 @@ + ICONV - Charset Conversion Library. Version 2.0 + ----------------------------------------------- + +This distribution provides: + * the library (libiconv.a and .so) for conversion between + various charsets (character encoding schemes); + * and the command line utility (iconv), providing + conversion of a file, standard input or its argument + line from one charset to another; + * a set of coded character set tables (binary files) and + character encoding schemes (dynamically loaded modules) + for use by the library; + * a utility for creating character set tables from Unicode + conversion tables and RFC1345-style charset descriptions. + +Syntax of the library functions (iconv_open, iconv, iconv_close) +and the utility is described in the man pages. + +Features of the library: +- Coded character set (CCS) tables are binary files containing + pairs of tables for converting characters from some charset to + Unicode (UCS-2 in host byte order) and vice versa. There are 4 + types of tables supported in iconv-2.0: for 7-bit, 8-bit, 14-bit + and 16-bit charsets. The library uses memory mapping (in + read-only mode) to access the table data. +- Character encoding schemes (CES) are small sets of C structures + and functions. The functions implement virtual methods for + converting a sequence of characters in some charset to a Unicode + character (UCS-4 in host byte order). Each encoding scheme is + located in a separate C file and can be compiled to a dynamically + loaded shared module. +- A universal CES for all table driven charsets is compiled into + the library and used for all CCS tables. +- Both CCS tables and CES C code can be built into the library by + specifying the corresponding charset name in the + ICONV_BUILTIN_CHARSETS make variable. By default us-ascii, utf-8 + and ucs-4-internal are built in (plus the CES for all CCS + tables). All the CES modules are included to a static version of + the library (libiconv.a). +- Multiple aliases for every charset are supported. All aliases are + listed in the charset.aliases file(s). The library uses memory + mapping to parse alias information and find a canonical name + of a charset before looking it up in the internal list or + external table or shared module. Alias information can also be + compiled into the library (which is useful for compiled-in + charsets ;-) +- ISO/IEC 10646 conformance of the internal representation of + characters; conversion is done in two steps: + (1) a sequence of zero or more bytes from input buffer coded in + the source charset is converted to exactly one valid UCS-4 + character and + (2) the UCS-4 character is converted to a sequence of zero or + more bytes in the target charset to the output buffer. + In the case when two charset names are found to be aliases + of the same charset, conversion is done via a simplified + converter by copying the data from the input buffer to the + output one. +- Open module API: adding new modules is easy. API has only been + documented via iconv.h file comments so far. A perl utility is + provided for conversion of Unicode charset tables + (http://www.unicode.org/Public/MAPPINGS/) and RFC1345-style + charset tables into the CCS format recognized by the library. +- API conformance to Unix98 specification. +- BSD-style copyright. + + Konstantin Chuguev + <Konstantin.Chuguev@dante.org.uk> + November 2000. |