cygwin.com/git/newlib-cygwin.git - Unnamed repository; edit this file 'description' to name the repository.
Diffstat (limited to 'newlib/libc/iconv/iconv.tex')
-rw-r--r-- newlib/libc/iconv/iconv.tex | 1710
1 files changed, 0 insertions, 1710 deletions
diff --git a/newlib/libc/iconv/iconv.tex b/newlib/libc/iconv/iconv.tex
deleted file mode 100644
index 305c3ce16..000000000
--- a/newlib/libc/iconv/iconv.tex
+++ /dev/null
@@ -1,1710 +0,0 @@
-@node Iconv
-@chapter Encoding conversions (@file{iconv.h})
-
-This chapter describes the Newlib iconv library.
-The iconv function declarations are in
-@file{iconv.h}.
-
-@menu
-* iconv:: Encoding conversion routines
-* Introduction:: Introduction to iconv and encodings
-* Supported encodings:: The list of currently supported encodings
-* iconv design decisions:: General iconv library design issues
-* iconv configuration:: iconv-related configure script options
-* Encoding names:: How encodings are named.
-* CCS tables:: CCS tables format and 'mktbl.pl' Perl script
-* CES converters:: CES converters description
-* The encodings description file:: The 'encoding.deps' file and 'mkdeps.pl'
-* How to add new encoding:: The steps to add new encoding support
-* The locale support interfaces:: Locale-related iconv interfaces
-* Contact:: The author contact
-@end menu
-
-@page
-@include iconv/iconv.def
-
-@page
-@node Introduction
-@section Introduction
-@findex encoding
-@findex character set
-@findex charset
-@findex CES
-@findex CCS
-@*
-The iconv library is intended to convert characters from one encoding to
-another. It implements iconv(), iconv_open() and iconv_close()
-calls, which are defined by the Single Unix Specification.
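
The three calls named above can be combined as in the following minimal sketch. The helper name `convert_string` and the encoding names in the usage note are illustrative assumptions, not part of Newlib; whether a given encoding pair is available depends on how the library was configured.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated string from one encoding
   to another in a single iconv() call.
   Returns the number of bytes written to out, or (size_t)-1 on error. */
static size_t convert_string(const char *to, const char *from,
                             const char *in, char *out, size_t outsize)
{
    iconv_t cd = iconv_open(to, from);   /* note: target encoding comes first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;               /* unsupported encoding pair */

    char *inp = (char *)in;              /* iconv() takes a non-const char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : outsize - outleft;
}
```

For instance, `convert_string("UTF-8", "ISO-8859-1", "\xE9", buf, sizeof buf)` would write the two-byte UTF-8 form of U+00E9 into `buf`, provided both encodings are enabled.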
-
-@*
-In addition to these user-level interfaces, the iconv library also has
-several useful interfaces which are needed to support coding
-capabilities of the Newlib Locale infrastructure. Since Locale
-support also needs to
-convert various character sets to and from the @emph{wide character
-set}, the iconv library shares its capabilities with the Newlib Locale
-subsystem. Moreover, the iconv library supports several features which are
-only needed for the Locale infrastructure (for example, the MB_CUR_MAX value).
-
-@*
-The Newlib iconv library was created using concepts from another iconv
-library implemented by Konstantin Chuguev (ver 2.0). The Newlib iconv library
-was rewritten from scratch and contains a lot of improvements with respect to
-the original iconv library.
-
-@*
-Terms like @dfn{encoding} or @dfn{character set} aren't well defined and
-are often used with various meanings. The following are the definitions of terms
-which are used in this documentation as well as in the iconv library
-implementation:
-
-@itemize @bullet
-@item
-@dfn{encoding} - a machine representation of characters by means of bits;
-
-@item
-@dfn{Character Set} or @dfn{Charset} - just a collection of
-characters, i.e. the encoding is the machine representation of the character set;
-
-@item
-@dfn{CCS} (@dfn{Coded Character Set}) - a mapping from a character set to a
-set of integers (@dfn{character codes});
-
-@item
-@dfn{CES} (@dfn{Character Encoding Scheme}) - a mapping from a set of character
-codes to a sequence of bytes;
-@end itemize
-
-@*
-Users usually deal with encodings, for example, KOI8-R, Unicode, UTF-8,
-ASCII, etc. Encodings are formed by the following chain of steps:
-
-@enumerate
-@item
-The user has a set of characters which are specific to his or her language (character set).
-
-@item
-Each character from this set is uniquely numbered, resulting in a CCS.
-
-@item
-Each number from the CCS is converted to a sequence of bits or bytes by means
-of a CES, which forms some encoding. Thus, a CES may be considered as a
-function of a CCS which produces some encoding. Note that a CES may be
-applied to more than one CCS.
-@end enumerate
-
-@*
-Thus, an encoding may be considered as one or more CCS + CES.
-
-@*
-Sometimes there is no CES, and in such cases the encoding is equivalent
-to the CCS, e.g. KOI8-R or ASCII.
-
-@*
-An example of a more complicated encoding is UTF-8 which is the UCS
-(or Unicode) CCS plus the UTF-8 CES.
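
To make the CCS/CES split concrete, here is a minimal sketch of the UTF-8 CES as a plain function from a UCS character code to a byte sequence, following RFC 3629. The function name `ucs_to_utf8` is an assumption made for illustration; it is not the Newlib converter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: encode one UCS code as UTF-8 (RFC 3629).
   Returns the number of bytes written (1..4), or 0 for an invalid code. */
static size_t ucs_to_utf8(uint32_t ucs, unsigned char out[4])
{
    if (ucs < 0x80) {                        /* 7 bits: one byte */
        out[0] = (unsigned char)ucs;
        return 1;
    }
    if (ucs < 0x800) {                       /* 11 bits: two bytes */
        out[0] = (unsigned char)(0xC0 | (ucs >> 6));
        out[1] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 2;
    }
    if (ucs < 0x10000) {                     /* 16 bits: three bytes */
        if (ucs >= 0xD800 && ucs <= 0xDFFF)
            return 0;                        /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (ucs >> 12));
        out[1] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 3;
    }
    if (ucs <= 0x10FFFF) {                   /* 21 bits: four bytes */
        out[0] = (unsigned char)(0xF0 | (ucs >> 18));
        out[1] = (unsigned char)(0x80 | ((ucs >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 4;
    }
    return 0;                                /* beyond the Unicode range */
}
```

The CCS (UCS) decides which number a character gets; this function only decides how that number is laid out in bytes.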
-
-@*
-The following is a brief list of iconv library features:
-@itemize
-@item
-Generic architecture;
-@item
-Locale infrastructure support;
-@item
-Automatic generation of the program code which handles
-CES/CCS/Encoding/Names/Aliases dependencies;
-@item
-The ability to choose size- or speed-optimized
-configuration;
-@item
-The ability to exclude a lot of unneeded code and data from the linking step.
-@end itemize
-
-
-
-
-@page
-@node Supported encodings
-@section Supported encodings
-@findex big5
-@findex cp775
-@findex cp850
-@findex cp852
-@findex cp855
-@findex cp866
-@findex euc_jp
-@findex euc_kr
-@findex euc_tw
-@findex iso_8859_1
-@findex iso_8859_10
-@findex iso_8859_11
-@findex iso_8859_13
-@findex iso_8859_14
-@findex iso_8859_15
-@findex iso_8859_2
-@findex iso_8859_3
-@findex iso_8859_4
-@findex iso_8859_5
-@findex iso_8859_6
-@findex iso_8859_7
-@findex iso_8859_8
-@findex iso_8859_9
-@findex iso_ir_111
-@findex koi8_r
-@findex koi8_ru
-@findex koi8_u
-@findex koi8_uni
-@findex ucs_2
-@findex ucs_2_internal
-@findex ucs_2be
-@findex ucs_2le
-@findex ucs_4
-@findex ucs_4_internal
-@findex ucs_4be
-@findex ucs_4le
-@findex us_ascii
-@findex utf_16
-@findex utf_16be
-@findex utf_16le
-@findex utf_8
-@findex win_1250
-@findex win_1251
-@findex win_1252
-@findex win_1253
-@findex win_1254
-@findex win_1255
-@findex win_1256
-@findex win_1257
-@findex win_1258
-@*
-The following is the list of currently supported encodings. The first column
-corresponds to the encoding name, the second column is the list of aliases,
-the third column is its CES and CCS components names, and the fourth column
-is a short description.
-
-@multitable @columnfractions .20 .26 .24 .30
-@item
-Name
-@tab
-Aliases
-@tab
-CES/CCS
-@tab
-Short description
-@item
-@tab
-@tab
-@tab
-
-
-@item
-big5
-@tab
-csbig5, big_five, bigfive, cn_big5, cp950
-@tab
-table_pcs / big5, us_ascii
-@tab
-The encoding for the Traditional Chinese.
-
-
-@item
-cp775
-@tab
-ibm775, cspc775baltic
-@tab
-table / cp775
-@tab
-The updated version of CP 437 that supports the Baltic languages.
-
-
-@item
-cp850
-@tab
-ibm850, 850, cspc850multilingual
-@tab
-table / cp850
-@tab
-IBM 850 - the updated version of CP 437 where several Latin 1 characters have been
-added instead of some less-often used characters like the line-drawing
-and the Greek ones.
-
-
-@item
-cp852
-@tab
-ibm852, 852, cspcp852
-@tab
-table / cp852
-@tab
-IBM 852 - the updated version of CP 437 where several Latin 2 characters have been added
-instead of some less-often used characters like the line-drawing and the Greek ones.
-
-
-@item
-cp855
-@tab
-ibm855, 855, csibm855
-@tab
-table / cp855
-@tab
-IBM 855 - the updated version of CP 437 that supports Cyrillic.
-
-
-@item
-cp866
-@tab
-866, IBM866, CSIBM866
-@tab
-table / cp866
-@tab
-IBM 866 - the updated version of CP 855 which more closely follows the logical Russian alphabet
-ordering of the alternative variant that is preferred by many Russian users.
-
-
-@item
-euc_jp
-@tab
-eucjp
-@tab
-euc / jis_x0208_1990, jis_x0201_1976, jis_x0212_1990
-@tab
-EUC-JP - The EUC for Japanese.
-
-
-@item
-euc_kr
-@tab
-euckr
-@tab
-euc / ksx1001
-@tab
-EUC-KR - The EUC for Korean.
-
-
-@item
-euc_tw
-@tab
-euctw
-@tab
-euc / cns11643_plane1, cns11643_plane2, cns11643_plane14
-@tab
-EUC-TW - The EUC for Traditional Chinese.
-
-
-@item
-iso_8859_1
-@tab
-iso8859_1, iso88591, iso_8859_1:1987, iso_ir_100, latin1, l1, ibm819, cp819, csisolatin1
-@tab
-table / iso_8859_1
-@tab
-ISO 8859-1:1987 - Latin 1, West European.
-
-
-@item
-iso_8859_10
-@tab
-iso_8859_10:1992, iso_ir_157, iso885910, latin6, l6, csisolatin6, iso8859_10
-@tab
-table / iso_8859_10
-@tab
-ISO 8859-10:1992 - Latin 6, Nordic.
-
-
-@item
-iso_8859_11
-@tab
-iso8859_11, iso885911
-@tab
-table / iso_8859_11
-@tab
-ISO 8859-11 - Thai.
-
-
-@item
-iso_8859_13
-@tab
-iso_8859_13:1998, iso8859_13, iso885913
-@tab
-table / iso_8859_13
-@tab
-ISO 8859-13:1998 - Latin 7, Baltic Rim.
-
-
-@item
-iso_8859_14
-@tab
-iso_8859_14:1998, iso885914, iso8859_14
-@tab
-table / iso_8859_14
-@tab
-ISO 8859-14:1998 - Latin 8, Celtic.
-
-
-@item
-iso_8859_15
-@tab
-iso885915, iso_8859_15:1998, iso8859_15
-@tab
-table / iso_8859_15
-@tab
-ISO 8859-15:1998 - Latin 9, West Europe, successor of Latin 1.
-
-
-@item
-iso_8859_2
-@tab
-iso8859_2, iso88592, iso_8859_2:1987, iso_ir_101, latin2, l2, csisolatin2
-@tab
-table / iso_8859_2
-@tab
-ISO 8859-2:1987 - Latin 2, East European.
-
-
-@item
-iso_8859_3
-@tab
-iso_8859_3:1988, iso_ir_109, iso8859_3, latin3, l3, csisolatin3, iso88593
-@tab
-table / iso_8859_3
-@tab
-ISO 8859-3:1988 - Latin 3, South European.
-
-
-@item
-iso_8859_4
-@tab
-iso8859_4, iso88594, iso_8859_4:1988, iso_ir_110, latin4, l4, csisolatin4
-@tab
-table / iso_8859_4
-@tab
-ISO 8859-4:1988 - Latin 4, North European.
-
-
-@item
-iso_8859_5
-@tab
-iso8859_5, iso88595, iso_8859_5:1988, iso_ir_144, cyrillic, csisolatincyrillic
-@tab
-table / iso_8859_5
-@tab
-ISO 8859-5:1988 - Cyrillic.
-
-
-@item
-iso_8859_6
-@tab
-iso_8859_6:1987, iso_ir_127, iso8859_6, ecma_114, asmo_708, arabic, csisolatinarabic, iso88596
-@tab
-table / iso_8859_6
-@tab
-ISO 8859-6:1987 - Arabic.
-
-
-@item
-iso_8859_7
-@tab
-iso_8859_7:1987, iso_ir_126, iso8859_7, elot_928, ecma_118, greek, greek8, csisolatingreek, iso88597
-@tab
-table / iso_8859_7
-@tab
-ISO 8859-7:1987 - Greek.
-
-
-@item
-iso_8859_8
-@tab
-iso_8859_8:1988, iso_ir_138, iso8859_8, hebrew, csisolatinhebrew, iso88598
-@tab
-table / iso_8859_8
-@tab
-ISO 8859-8:1988 - Hebrew.
-
-
-@item
-iso_8859_9
-@tab
-iso_8859_9:1989, iso_ir_148, iso8859_9, latin5, l5, csisolatin5, iso88599
-@tab
-table / iso_8859_9
-@tab
-ISO 8859-9:1989 - Latin 5, Turkish.
-
-
-@item
-iso_ir_111
-@tab
-ecma_cyrillic, koi8_e, koi8e, csiso111ecmacyrillic
-@tab
-table / iso_ir_111
-@tab
-ISO IR 111/ECMA Cyrillic.
-
-
-@item
-koi8_r
-@tab
-cskoi8r, koi8r, koi8
-@tab
-table / koi8_r
-@tab
-RFC 1489 Cyrillic.
-
-
-@item
-koi8_ru
-@tab
-koi8ru
-@tab
-table / koi8_ru
-@tab
-The obsolete Ukrainian.
-
-
-@item
-koi8_u
-@tab
-koi8u
-@tab
-table / koi8_u
-@tab
-RFC 2319 Ukrainian.
-
-
-@item
-koi8_uni
-@tab
-koi8uni
-@tab
-table / koi8_uni
-@tab
-KOI8 Unified.
-
-
-@item
-ucs_2
-@tab
-ucs2, iso_10646_ucs_2, iso10646_ucs_2, iso_10646_ucs2, iso10646_ucs2, iso10646ucs2, csUnicode
-@tab
-ucs_2 / (UCS)
-@tab
-ISO-10646-UCS-2. Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-ucs_2_internal
-@tab
-ucs2_internal, ucs_2internal, ucs2internal
-@tab
-ucs_2_internal / (UCS)
-@tab
-ISO-10646-UCS-2 in system byte order.
-NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-ucs_2be
-@tab
-ucs2be
-@tab
-ucs_2 / (UCS)
-@tab
-Big Endian version of ISO-10646-UCS-2 (in fact, equivalent to ucs_2).
-Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-ucs_2le
-@tab
-ucs2le
-@tab
-ucs_2 / (UCS)
-@tab
-Little Endian version of ISO-10646-UCS-2.
-Little Endian, NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-ucs_4
-@tab
-ucs4, iso_10646_ucs_4, iso10646_ucs_4, iso_10646_ucs4, iso10646_ucs4, iso10646ucs4
-@tab
-ucs_4 / (UCS)
-@tab
-ISO-10646-UCS-4. Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-ucs_4_internal
-@tab
-ucs4_internal, ucs_4internal, ucs4internal
-@tab
-ucs_4_internal / (UCS)
-@tab
-ISO-10646-UCS-4 in system byte order.
-NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-ucs_4be
-@tab
-ucs4be
-@tab
-ucs_4 / (UCS)
-@tab
-Big Endian version of ISO-10646-UCS-4 (in fact, equivalent to ucs_4).
-Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-ucs_4le
-@tab
-ucs4le
-@tab
-ucs_4 / (UCS)
-@tab
-Little Endian version of ISO-10646-UCS-4.
-Little Endian, NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-us_ascii
-@tab
-ansi_x3.4_1968, ansi_x3.4_1986, iso_646.irv:1991, ascii, iso646_us, us, ibm367, cp367, csascii
-@tab
-us_ascii / (ASCII)
-@tab
-7-bit ASCII.
-
-
-@item
-utf_16
-@tab
-utf16
-@tab
-utf_16 / (UCS)
-@tab
-RFC 2781 UTF-16. The very first NBSP code in stream is interpreted as BOM.
-
-
-@item
-utf_16be
-@tab
-utf16be
-@tab
-utf_16 / (UCS)
-@tab
-Big Endian version of RFC 2781 UTF-16.
-NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-utf_16le
-@tab
-utf16le
-@tab
-utf_16 / (UCS)
-@tab
-Little Endian version of RFC 2781 UTF-16.
-NBSP is always interpreted as NBSP (BOM isn't supported).
-
-
-@item
-utf_8
-@tab
-utf8
-@tab
-utf_8 / (UCS)
-@tab
-RFC 3629 UTF-8.
-
-
-@item
-win_1250
-@tab
-cp1250
-@tab
-table / win_1250
-@tab
-Win-1250 - Croatian.
-
-
-@item
-win_1251
-@tab
-cp1251
-@tab
-table / win_1251
-@tab
-Win-1251 - Cyrillic.
-
-
-@item
-win_1252
-@tab
-cp1252
-@tab
-table / win_1252
-@tab
-Win-1252 - Latin 1.
-
-
-@item
-win_1253
-@tab
-cp1253
-@tab
-table / win_1253
-@tab
-Win-1253 - Greek.
-
-
-@item
-win_1254
-@tab
-cp1254
-@tab
-table / win_1254
-@tab
-Win-1254 - Turkish.
-
-
-@item
-win_1255
-@tab
-cp1255
-@tab
-table / win_1255
-@tab
-Win-1255 - Hebrew.
-
-
-@item
-win_1256
-@tab
-cp1256
-@tab
-table / win_1256
-@tab
-Win-1256 - Arabic.
-
-
-@item
-win_1257
-@tab
-cp1257
-@tab
-table / win_1257
-@tab
-Win-1257 - Baltic.
-
-
-@item
-win_1258
-@tab
-cp1258
-@tab
-table / win_1258
-@tab
-Win-1258 - Vietnamese.
-@end multitable
-
-
-
-
-
-@page
-@node iconv design decisions
-@section iconv design decisions
-@findex CCS table
-@findex CES converter
-@findex Speed-optimized tables
-@findex Size-optimized tables
-@*
-The first iconv library design issue arises when considering the
-following two design approaches:
-
-@enumerate
-@item
-Have modules which implement conversion from the encoding A to the encoding B
-and vice versa, i.e., one conversion module relates to any two encodings.
-@item
-Have modules which implement conversion from the encoding A to the fixed
-encoding C and vice versa, i.e., one conversion module relates to any
-one encoding A and one fixed encoding C. In this case, to convert from
-the encoding A to the encoding B, two modules are needed (in order to convert
-from A to C and then from C to B).
-@end enumerate
-
-@*
-Clearly, there is a tradeoff between commonality/flexibility and
-efficiency: the first method is more efficient since it converts
-directly; however, it is less flexible since a distinct module is
-needed for each encoding pair.
-
-@*
-The Newlib iconv model uses the second method and always converts through the 32-bit
-UCS, but its design also allows one to write specialized conversion
-modules if the conversion speed is critical.
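
The second approach (converting through a fixed pivot encoding) can be sketched as follows. This is a toy illustration, not the Newlib implementation: the tables hold only two Cyrillic letters each, and all names (`map_entry`, `sample_to_ucs`, `sample_from_ucs`, `koi8_r_to_cp866`) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_UCS 0xFFFFu   /* marker used here for "no mapping" */

typedef struct { uint8_t code; uint16_t ucs; } map_entry;

/* Deliberately incomplete sample tables: U+0430 and U+0410 only. */
static const map_entry koi8_r_sample[] = { {0xC1, 0x0430}, {0xE1, 0x0410} };
static const map_entry cp866_sample[]  = { {0xA0, 0x0430}, {0x80, 0x0410} };

/* Step A -> C: map an 8-bit code to the UCS pivot. */
static uint16_t sample_to_ucs(const map_entry *t, size_t n, uint8_t c)
{
    if (c < 0x80)
        return c;                      /* both encodings are ASCII-compatible */
    for (size_t i = 0; i < n; i++)
        if (t[i].code == c)
            return t[i].ucs;
    return NO_UCS;
}

/* Step C -> B: map a UCS code back to an 8-bit code, or -1 if absent. */
static int sample_from_ucs(const map_entry *t, size_t n, uint16_t ucs)
{
    if (ucs < 0x80)
        return ucs;
    for (size_t i = 0; i < n; i++)
        if (t[i].ucs == ucs)
            return t[i].code;
    return -1;
}

/* A -> B is never implemented directly; it is always A -> UCS -> B. */
static int koi8_r_to_cp866(uint8_t c)
{
    uint16_t ucs = sample_to_ucs(koi8_r_sample, 2, c);
    if (ucs == NO_UCS)
        return -1;
    return sample_from_ucs(cp866_sample, 2, ucs);
}
```

With N encodings this needs N table pairs rather than one module per encoding pair, at the cost of two lookups per character.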
-
-@*
-The second design issue is how to break down (decompose) encodings.
-The Newlib iconv library uses the fact that any encoding may be
-considered as one or more CCS plus a CES. It also decomposes its
-conversion modules into a @dfn{CES converter} plus one or more @dfn{CCS
-tables}. CCS tables map CCS to UCS and vice versa; the CES converters
-map CCS to the encoding and vice versa.
-
-@*
-As an example, let's consider the conversion between the big5 encoding and
-the EUC-TW encoding. The big5 encoding may be decomposed into the ASCII and BIG5
-CCS-es plus the BIG5 CES. EUC-TW may be decomposed into the CNS11643_PLANE1, CNS11643_PLANE2,
-and CNS11643_PLANE14 CCS-es plus the EUC CES.
-
-@*
-The euc_tw -> big5 conversion is performed as follows:
-
-@enumerate
-@item
-The EUC converter performs the EUC-TW encoding to the corresponding CCS-es
-transformation (CNS11643_PLANE1, CNS11643_PLANE2 and CNS11643_PLANE14
-CCS-es);
-@item
-The obtained CCS codes are transformed to the UCS codes using the CNS11643_PLANE1,
-CNS11643_PLANE2 and CNS11643_PLANE14 CCS tables;
-@item
-The resulting UCS codes are transformed to the ASCII and BIG5 codes using
-the corresponding CCS tables;
-@item
-The obtained CCS codes are transformed to the big5 encoding using the corresponding
-CES converter.
-@end enumerate
-
-@*
-Analogously, the backward conversion is performed as follows:
-
-@enumerate
-@item
-The BIG5 converter performs the big5 encoding to the corresponding CCS-es transformation
-(the ASCII and BIG5 CCS-es);
-@item
-The obtained CCS codes are transformed to the UCS codes using the ASCII and BIG5 CCS tables;
-@item
-The resulting UCS codes are transformed to the CNS11643_PLANE1, CNS11643_PLANE2
-and CNS11643_PLANE14 codes using the corresponding CCS tables;
-@item
-The obtained CCS codes are transformed to the EUC-TW encoding using the corresponding
-CES converter.
-@end enumerate
-
-@*
-Note that the above is just an example; the real names of the CES converters and
-the CCS tables implemented in the Newlib iconv are slightly different.
-
-@*
-The third design issue also relates to flexibility. Obviously, it isn't
-desirable to always link all the CES converters and the CCS tables into the library;
-instead, we want to be able to load the needed converters and tables
-dynamically on demand. This isn't a problem on "big" machines such as
-a PC, but it may be very problematic on "small" embedded systems.
-
-@*
-Since the CCS tables are just data, it is possible to load them
-dynamically from external files. The CES converters, on the other hand,
-are algorithms with some code, so a dynamic library loading
-capability is required for them.
-
-@*
-Apart from possible restrictions applied by embedded systems (small
-RAM for example), Newlib itself has no dynamic library support and
-therefore, all the CES converters which will ever be used must be linked into
-the library. However, loading of the dynamic CCS tables is possible and is
-implemented in the Newlib iconv library. It may be enabled via the Newlib
-configure script options.
-
-@*
-The next design issue is fine-tuning the iconv library
-configuration. One important ability is for iconv to not link all its
-converters and tables (if dynamic loading is not enabled) but instead
-enable only those encodings which are specified at configuration
-time (see the section about the configure script options).
-
-@*
-In addition, the Newlib iconv library configure options distinguish between
-conversion directions. This means that not only the supported encodings
-are selectable, but the conversion direction is as well. For example, if a user wants
-a configuration which allows conversions from UTF-8 to UTF-16 and
-doesn't plan to use the "UTF-16 to UTF-8" conversions, he or she can
-enable only
-this conversion direction (i.e., no "UTF-16 -> UTF-8"-related code will
-be included), thus saving some memory (note that this technique allows
-excluding one half of a CCS table from linking, and that half may be quite large).
-
-@*
-One more design aspect is the speed- and size-optimized tables. Users can
-select between them using configure script options. The
-speed-optimized CCS tables are the same as the size-optimized ones in
-the case of an 8-bit CCS (e.g., KOI8-R), but for 16-bit CCS-es the size-optimized
-CCS tables may be 1.5 to 2 times smaller than the speed-optimized ones. On the
-other hand, conversion with the speed-optimized tables is several times faster.
-
-@*
-It is worth stressing that support for a new encoding can't be
-dynamically added to an already compiled Newlib library, even if it
-needs only an additional CCS table and iconv is configured to use
-external files with CCS tables (this isn't a fundamental restriction,
-and the possibility to add new table-based encoding support dynamically, by
-means of just adding a new .cct file, may be easily added).
-
-@*
-Theoretically, the compiled-in CCS tables should be more appropriate for
-embedded systems than dynamically loaded CCS tables. This is because the compiled-in tables are read-only and can be placed in ROM
-whereas dynamic loading requires RAM. Moreover, in the current iconv
-implementation, a distinct copy of the dynamic CCS file is loaded for each opened iconv descriptor even in case of the same encoding.
-This means, for example, that if two iconv descriptors for
-"KOI8-R -> UCS-4BE" and "KOI8-R -> UTF-16BE" are opened, two copies of
-koi8-r .cct file will be loaded (actually, iconv loads only the needed part
-of these files). On the other hand, in the case of compiled-in CCS tables, there will always be only one copy.
-
-@page
-@node iconv configuration
-@section iconv configuration
-@findex iconv configuration
-@findex --enable-newlib-iconv-encodings
-@findex --enable-newlib-iconv-from-encodings
-@findex --enable-newlib-iconv-to-encodings
-@findex --enable-newlib-iconv-external-ccs
-@findex NLSPATH
-@*
-To enable an encoding, the @option{--enable-newlib-iconv-encodings} configure
-script option should be used. This option accepts a comma-separated list
-of @emph{encodings} that should be enabled. The option enables each encoding in both
-("to" and "from") directions.
-
-@*
-The @option{--enable-newlib-iconv-from-encodings} configure script option enables
-"from" support for each encoding that was passed to it.
-
-@*
-The @option{--enable-newlib-iconv-to-encodings} configure script option enables
-"to" support for each encoding that was passed to it.
-
-@*
-Example: if a user plans to use only the "KOI8-R -> UTF-8", "UTF-8 -> ISO-8859-5" and
-"KOI8-R -> UCS-2" conversions, the optimal way (minimal iconv
-code and data will be linked) is to configure Newlib with the following
-options:
-@*
-@code{--enable-newlib-iconv-encodings=UTF-8
---enable-newlib-iconv-from-encodings=KOI8-R
---enable-newlib-iconv-to-encodings=UCS-2,ISO-8859-5}
-@*
-which is the same as
-@*
-@code{--enable-newlib-iconv-from-encodings=KOI8-R,UTF-8
---enable-newlib-iconv-to-encodings=UCS-2,ISO-8859-5,UTF-8}
-@*
-The user may also just use the
-@*
-@code{--enable-newlib-iconv-encodings=KOI8-R,ISO-8859-5,UTF-8,UCS-2}
-@*
-configure script option, but this is less optimal since some
-unneeded data and code will be linked.
-
-@*
-The @option{--enable-newlib-iconv-external-ccs} option enables iconv's
-capabilities to work with the external CCS files.
-
-@*
-The @option{--enable-target-optspace} Newlib configure script option also affects
-the iconv library. If this option is present, the library uses the size
-optimized CCS tables. This means that only the size-optimized CCS
-tables will be linked or, if the
-@option{--enable-newlib-iconv-external-ccs} configure script option was used,
-the iconv library will load the size-optimized tables. If the
-@option{--enable-target-optspace} configure script option is disabled,
-the speed-optimized CCS tables are used.
-
-@*
-Note: .cct files are searched by iconv_open in the $NLSPATH/iconv_data/ directory.
-Thus, the NLSPATH environment variable should be set.
-
-
-
-
-
-@page
-@node Encoding names
-@section Encoding names
-@findex encoding name
-@findex encoding alias
-@findex normalized name
-@*
-Each encoding has one @dfn{name} and a number of @dfn{aliases}. When
-a user works with the iconv library (i.e., when the @code{iconv_open} call
-is used), either the name or any of the aliases may be used. The same applies when encoding
-names are used in configure script options.
-
-@*
-Names and aliases may be specified in any case (small or capital
-letters) and the @kbd{-} symbol is equivalent to the @kbd{_} symbol.
-Also, when working with the iconv library,
-
-@*
-Internally, the Newlib iconv library always converts aliases to names. It
-also converts names and aliases to the @dfn{normalized} form, which means
-that all capital letters are converted to small letters and the @kbd{-}
-symbols are converted to @kbd{_} symbols.
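
The normalization rule described above can be sketched in a few lines of C. The function name `normalize_name` and its buffer-based interface are assumptions for illustration; the library's internal routine may differ.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch of name normalization: lower-case every letter
   and replace '-' with '_'.
   Returns 0 on success, -1 if the output buffer is too small. */
static int normalize_name(const char *name, char *out, size_t outsize)
{
    size_t i;

    for (i = 0; name[i] != '\0'; i++) {
        if (i + 1 >= outsize)
            return -1;                       /* no room for char + NUL */
        unsigned char c = (unsigned char)name[i];
        out[i] = (c == '-') ? '_' : (char)tolower(c);
    }
    if (outsize == 0)
        return -1;                           /* cannot even store the NUL */
    out[i] = '\0';
    return 0;
}
```

Under this rule "ISO-8859-1", "iso_8859-1" and "Iso_8859_1" all normalize to "iso_8859_1", so a single comparison against the normalized name or alias suffices.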
-
-
-
-
-@page
-@node CCS tables
-@section CCS tables
-@findex Size-optimized CCS table
-@findex Speed-optimized CCS table
-@findex mktbl.pl Perl script
-@findex .cct files
-@findex The CCT tables source files
-@findex CCS source files
-@*
-The iconv library stores files with CCS tables in the @emph{ccs/}
-subdirectory. The CCS tables for any CCS may be kept in two forms - in the binary form
-(@dfn{.cct files}, see the @emph{ccs/binary/} subdirectory) and in the form
-of compilable .c source files. The .cct files are only used when the
-@option{--enable-newlib-iconv-external-ccs} configure script option is enabled.
-The .c files are linked to the Newlib library if the corresponding
-encoding is enabled.
-
-@*
-As stated earlier, the Newlib iconv library performs all
-conversions through the 32-bit UCS, but the codes which are used
-in most CCS-es fit into the first 16-bit subset of the 32-bit UCS set.
-Thus, in order to make the CCS tables more compact, the 16-bit UCS-2 is
-used instead of the 32-bit UCS-4.
-
-@*
-CCS tables may be 8 or 16 bits wide. 8-bit CCS tables map an 8-bit CCS to
-16-bit UCS-2 and vice versa, while 16-bit CCS tables map a
-16-bit CCS to 16-bit UCS-2 and vice versa.
-8-bit tables are small, while 16-bit tables may be quite large.
-Because of this, 16-bit CCS tables may be
-either speed- or size-optimized. Size-optimized CCS tables are
-smaller than speed-optimized ones, but the conversion process is
-slower if the size-optimized CCS tables are used. 8-bit CCS tables have only
-the speed-optimized variant.
-
-Each CCS table (both speed- and size-optimized) consists of
-@dfn{from_ucs} and @dfn{to_ucs} subtables. "from_ucs" subtable maps
-UCS-2 codes to CCS codes, while "to_ucs" subtable maps CCS codes to
-UCS-2 codes.
-
-@*
-Almost all 16-bit CCS tables contain fewer than 0xFFFF codes, and
-a lot of gaps exist.
-
-@subsection Speed-optimized tables format
-@*
-In the case of 8-bit speed-optimized CCS tables the "to_ucs" subtable format is
-trivial - it is just an array of 256 16-bit UCS codes. Therefore, the
-UCS-2 code @emph{Y} corresponding to a CCS code @emph{X} is calculated
-as @emph{Y = to_ucs[X]}.
-
-@*
-Obviously, the simplest way to create the "from_ucs" table or the
-16-bit "to_ucs" table is to use a huge 16-bit array, as in the case
-of the 8-bit "to_ucs" table. But almost all the 16-bit CCS tables contain
-fewer than 0xFFFF code mappings, and this fact may be exploited to reduce
-the size of the CCS tables.
-
-@*
-This subsection describes the "UCS-2 -> CCS" 8-bit CCS table format. The
-16-bit "CCS -> UCS-2" CCS table format is the same, except for the mapping
-direction and the number of CCS bits.
-
-@*
-In the case of the 8-bit speed-optimized table the "from_ucs" subtable
-corresponds to the "from_ucs" array and has the following layout:
-
-@*
-from_ucs array:
-@*
--------------------------------------
-@*
-0xFF mapping (2 bytes) (only for
-8-bit table).
-@*
--------------------------------------
-@*
-Heading block
-@*
--------------------------------------
-@*
-Block 1
-@*
--------------------------------------
-@*
-Block 2
-@*
--------------------------------------
-@*
- ...
-@*
--------------------------------------
-@*
-Block N
-@*
--------------------------------------
-
-@*
-The 0x0000-0xFFFF 16-bit code range is divided into 256 code subranges. Each
-subrange is represented by a 256-element @dfn{block} (256 1-byte
-elements, or 256 2-byte elements in the case of a 16-bit CCS table) with
-elements which are equivalent to the CCS codes of this subrange.
-If the "UCS-2 -> CCS" mapping has big enough gaps, some blocks will be
-absent and there will be fewer than 256 blocks.
-
-@*
-Element number @emph{m} of @dfn{the heading block} (which contains
-256 2-byte elements) corresponds to the @emph{m}-th 256-element subrange.
-If the subrange contains some codes, the @emph{m}-th element of
-the heading block contains the offset of the corresponding block in the
-"from_ucs" array. If there are no codes in the subrange, the heading
-block element contains 0xFFFF.
-
-@*
-If there are some gaps in a block, the corresponding block elements have
-the 0xFF value. If there is a 0xFF code present in the CCS, its mapping
-is defined in the first 2-byte element of the "from_ucs" array.
-
-@*
-Having such a table format, the algorithm of searching the CCS code
-@emph{X} which corresponds to the UCS-2 code @emph{Y} is as follows.
-
-@*
-@enumerate
-@item If @emph{Y} is equal to the value of the first 2-byte element
-of the "from_ucs" array, @emph{X} is 0xFF. Else, continue to search.
-
-@item Calculate the block number: @emph{BlkN = (Y & 0xFF00) >> 8}.
-
-@item If the heading block element with number @emph{BlkN} is 0xFFFF, there
-is no corresponding CCS code (error, wrong input data). Else, fetch the
-"from_ucs" array index @emph{BlkIdx} of the @emph{BlkN}-th block from that element.
-
-@item Calculate the offset of the @emph{X} code in its block:
-@emph{Xindex = Y & 0xFF}
-
-@item If the value of the @emph{Xindex}-th element of the block (which is equivalent to
-@emph{from_ucs[BlkIdx+Xindex]}) is 0xFF, there is no corresponding
-CCS code (error, wrong input data). Else, @emph{X = from_ucs[BlkIdx+Xindex]}.
-@end enumerate
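
The lookup steps above can be sketched in C as follows. This is an illustration under stated assumptions, not the Newlib code: for simplicity every element of the "from_ucs" array (including block elements) is stored as a 16-bit value, block offsets are indexes into that same array, and the function name `from_ucs_lookup` is hypothetical.

```c
#include <stdint.h>

#define NO_BLOCK 0xFFFFu   /* heading element: subrange has no codes */
#define NO_CODE  0xFFu     /* block element: gap in the mapping */

/* Assumed layout of from_ucs (all elements uint16_t in this sketch):
     [0]        UCS-2 code that maps to CCS code 0xFF
     [1..256]   heading block: array index of each 256-code block
     [257..]    blocks of 256 elements each
   Returns the 8-bit CCS code X for UCS-2 code y, or -1 if none exists. */
static int from_ucs_lookup(const uint16_t *from_ucs, uint16_t y)
{
    if (y == from_ucs[0])
        return 0xFF;                       /* the special 0xFF mapping */

    unsigned blkn = (y & 0xFF00u) >> 8;    /* which 256-code subrange */
    uint16_t blk_idx = from_ucs[1 + blkn]; /* heading block entry */
    if (blk_idx == NO_BLOCK)
        return -1;                         /* whole subrange is absent */

    unsigned xindex = y & 0xFFu;           /* position inside the block */
    uint16_t x = from_ucs[blk_idx + xindex];
    if (x == NO_CODE)
        return -1;                         /* gap inside the block */
    return (int)x;
}
```

The cost is two array reads and no searching, which is what makes this layout the speed-optimized one; the price is a full 256-element block for every subrange that contains at least one code.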
-
-@subsection Size-optimized tables format
-@*
-Size-optimized tables exist only for 16-bit CCS-es.
-This is because the difference between the speed-optimized
-and the size-optimized table sizes is too small in the case of 8-bit CCS-es.
-
-@*
-Formats of the "to_ucs" and "from_ucs" subtables are equivalent in case of
-size-optimized tables.
-
-This section describes the format of the "UCS-2 -> CCS" size-optimized
-CCS table. The format of the "CCS -> UCS-2" table is the same.
-
-The idea of the size-optimized tables is to split the UCS-2 codes
-("from" codes) into @dfn{ranges} (a @dfn{range} is a run of consecutive UCS-2 codes).
-Then CCS codes ("to" codes) are stored only for the codes from these
-ranges. Distinct "from" codes which belong to no range (@dfn{unranged codes}) are stored
-together with the corresponding "to" codes.
-
-@*
-The following is the layout of the size-optimized table array:
-
-@*
-size_arr array:
-@*
--------------------------------------
-@*
-Ranges number (2 bytes)
-@*
--------------------------------------
-@*
-Unranged codes number (2 bytes)
-@*
--------------------------------------
-@*
-Unranged codes array index (2 bytes)
-@*
--------------------------------------
-@*
-Ranges indexes (triads)
-@*
--------------------------------------
-@*
-Ranges
-@*
--------------------------------------
-@*
-Unranged codes array
-@*
--------------------------------------
-
-@*
-The @dfn{Ranges indexes} section of @emph{size_arr} helps to find
-the offset of the needed range in @emph{size_arr} and consists of
-entries in the following format (triads):
-@*
-the first code in range, the last code in range, range offset.
-
-@*
-The array of these triads is sorted by the first element, therefore it is
-possible to quickly find the needed range index.
-
-@*
-Each range has the corresponding sub-array containing the "to" codes. These
-sub-arrays are stored in the place marked as "Ranges" in the layout
-diagram.
-
-@*
-The "Unranged codes array" contains pairs ("from" code, "to" code) for
-each unranged code. The array of these pairs is sorted by "from" code
-values, therefore it is possible to find the needed pair quickly.
-
-@*
-Note that each range requires 6 bytes to form its index. If, for
-example, there are two ranges (1 - 5 and 9 - 10), and one unranged code
-(7), 12 bytes are needed for two range indexes and 4 bytes for the unranged
-code (total 16). But it is better to join both ranges as 1 - 10 and
-mark codes 6 and 8 as absent. In this case, only 6 additional bytes for the
-range index and 4 bytes to mark codes 6 and 8 as absent are needed
-(total 10 bytes). This optimization is done in the size-optimized tables.
-Thus, ranges may contain small gaps. The absent codes in ranges are marked
-as 0xFFFF.
-
-@*
-Note that a pair of consecutive "from" codes is stored as two unranged codes, since
-forming a range for them takes more 2-byte elements (a triad plus two "to" codes)
-than storing two unranged pairs (5 against 4).
-
-@*
-The algorithm for finding the CCS code
-@emph{X} which corresponds to the UCS-2 code @emph{Y} (input) in the "UCS-2 ->
-CCS" size-optimized table is as follows.
-
-@*
-@enumerate
-@item Try to find the corresponding triad in the "Ranges indexes"
-section. Since the array is sorted, a binary search finds it quickly
-(divide by 2, compare, etc).
-
-@item If the triad is found, fetch the @emph{X} code from the corresponding
-range array. If it is 0xFFFF, return an error.
-
-@item If there is no corresponding triad, search the @emph{X} code among the
-sorted unranged codes. Return an error if nothing was found.
-@end enumerate
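-
-@*
-As a rough illustration, the lookup steps above might be sketched in C as
-follows. The sketch assumes an unpacked view of the table: the
-@code{size_table}, @code{range_triad} and @code{code_pair} structures, the
-@code{size_table_lookup} function and the @code{demo_*} data are
-hypothetical names introduced here and are not the actual Newlib layout or
-API. The value 0xFFFF is assumed to mean both "gap inside a range" and
-"not found".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CCS_INVALID 0xFFFFu /* gap inside a range, also used for "not found" */

/* Hypothetical unpacked view of a "UCS-2 -> CCS" size-optimized table. */
struct range_triad { uint16_t first, last, offset; }; /* offset into range_codes */
struct code_pair { uint16_t from, to; };

struct size_table {
  const struct range_triad *ranges; /* sorted by 'first', non-overlapping */
  size_t nranges;
  const uint16_t *range_codes;      /* the "Ranges" area: "to" codes */
  const struct code_pair *unranged; /* sorted by 'from' */
  size_t nunranged;
};

/* Returns the CCS code for 'ucs', or CCS_INVALID if there is no mapping. */
static uint16_t size_table_lookup(const struct size_table *t, uint16_t ucs)
{
  size_t lo, hi;

  assert(t != NULL);

  /* Step 1: binary search for a triad whose [first, last] contains 'ucs'. */
  for (lo = 0, hi = t->nranges; lo < hi; ) {
    size_t mid = lo + (hi - lo) / 2;
    const struct range_triad *r = &t->ranges[mid];
    if (ucs < r->first)
      hi = mid;
    else if (ucs > r->last)
      lo = mid + 1;
    else /* Step 2: fetch from the range sub-array; a gap yields CCS_INVALID. */
      return t->range_codes[r->offset + (ucs - r->first)];
  }

  /* Step 3: binary search among the sorted unranged pairs. */
  for (lo = 0, hi = t->nunranged; lo < hi; ) {
    size_t mid = lo + (hi - lo) / 2;
    if (ucs < t->unranged[mid].from)
      hi = mid;
    else if (ucs > t->unranged[mid].from)
      lo = mid + 1;
    else
      return t->unranged[mid].to;
  }
  return CCS_INVALID;
}

/* Made-up data: ranges 1-10 (codes 6 and 8 absent) and 30-31, unranged code 20. */
static const struct range_triad demo_ranges[] = { {1, 10, 0}, {30, 31, 10} };
static const uint16_t demo_range_codes[] = {
  0x41, 0x42, 0x43, 0x44, 0x45, 0xFFFF, 0x47, 0xFFFF, 0x49, 0x4A, /* 1-10 */
  0x5E, 0x5F                                                        /* 30-31 */
};
static const struct code_pair demo_unranged[] = { {20, 0x60} };
static const struct size_table demo_table = {
  demo_ranges, 2, demo_range_codes, demo_unranged, 1
};
```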
-
-@subsection .cct and .c CCS Table files
-@*
-The .c source files for 8-bit CCS tables have "to_ucs" and "from_ucs"
-speed-optimized tables. The .c source files for 16-bit CCS tables have
-"to_ucs_speed", "to_ucs_size", "from_ucs_speed" and "from_ucs_size"
-tables.
-
-@*
-When .c files are compiled and used, all the 16-bit and 32-bit values
-have the native endian format (Big Endian for the BE systems and Little
-Endian for the LE systems) since they are compiled for the system before
-they are used.
-
-@*
-In case of .cct files, which are intended for dynamic CCS tables
-loading, the CCS tables are stored either in LE or BE format. Since the
-.cct files are generated by the 'mktbl.pl' Perl script, it is possible
-to choose the endianness of the tables. It is also possible to store two
-copies (both LE and BE) of the CCS tables in one .cct file. The default
-.cct files (which come with the Newlib sources) have both LE and BE CCS
-tables. The Newlib iconv library automatically chooses the needed CCS tables
-(with appropriate endianness).
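-
-@*
-The following is only a sketch of how a loader could decide which copy of a
-table to use and how a 16-bit value stored in a known byte order can be
-read. The function names @code{host_is_little_endian} and @code{load_u16}
-are hypothetical and do not come from the Newlib sources; the real .cct
-file format is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns non-zero when the host stores the low byte of a 16-bit value first. */
static int host_is_little_endian(void)
{
  const uint16_t probe = 1;
  unsigned char first;

  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Reads one 16-bit value from two bytes stored in the given byte order. */
static uint16_t load_u16(const unsigned char *p, int little_endian)
{
  assert(p != NULL);
  return little_endian ? (uint16_t)(p[0] | (p[1] << 8))
                       : (uint16_t)((p[0] << 8) | p[1]);
}
```

-A table copy whose byte order matches @code{host_is_little_endian()} can be
-used directly, while the other copy would need each value converted with
-something like @code{load_u16}.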
-
-@*
-Note that the .cct files are only used when the
-@option{--enable-newlib-iconv-external-ccs} configure option is used.
-
-@subsection The 'mktbl.pl' Perl script
-@*
-The 'mktbl.pl' script is intended to generate .cct and .c CCS table
-files from the @dfn{CCS source files}.
-
-@*
-The CCS source files are just text files which have one or more columns
-with the CCS <-> UCS-2 code mapping. For an example of a CCS table
-source file, see one of the URLs given below.
-
-@*
-The following table describes where the source files for CCS table files
-provided by the Newlib distribution are located.
-
-@multitable @columnfractions .25 .75
-@item
-Name
-@tab
-URL
-
-@item
-@tab
-
-@item
-big5
-@tab
-http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
-
-@item
-cns11643_plane1
-cns11643_plane14
-cns11643_plane2
-@tab
-http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/CNS11643.TXT
-
-@item
-cp775
-cp850
-cp852
-cp855
-cp866
-@tab
-http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
-
-@item
-iso_8859_1
-iso_8859_2
-iso_8859_3
-iso_8859_4
-iso_8859_5
-iso_8859_6
-iso_8859_7
-iso_8859_8
-iso_8859_9
-iso_8859_10
-iso_8859_11
-iso_8859_13
-iso_8859_14
-iso_8859_15
-@tab
-http://www.unicode.org/Public/MAPPINGS/ISO8859/
-
-@item
-iso_ir_111
-@tab
-http://crl.nmsu.edu/~mleisher/csets/ISOIR111.TXT
-
-@item
-jis_x0201_1976
-jis_x0208_1990
-jis_x0212_1990
-@tab
-http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT
-
-@item
-koi8_r
-@tab
-http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
-
-@item
-koi8_ru
-@tab
-http://crl.nmsu.edu/~mleisher/csets/KOI8RU.TXT
-
-@item
-koi8_u
-@tab
-http://crl.nmsu.edu/~mleisher/csets/KOI8U.TXT
-
-@item
-koi8_uni
-@tab
-http://crl.nmsu.edu/~mleisher/csets/KOI8UNI.TXT
-
-@item
-ksx1001
-@tab
-http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSX1001.TXT
-
-@item
-win_1250
-win_1251
-win_1252
-win_1253
-win_1254
-win_1255
-win_1256
-win_1257
-win_1258
-@tab
-http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
-@end multitable
-
-The CCS source files aren't distributed with Newlib because of license
-restrictions on most of the Unicode.org files.
-
-The following are the 'mktbl.pl' options which were used to generate the .cct
-files. Note that to generate the .c CCS table source files, the @option{-s}
-option should be added.
-
-@enumerate
-@item For the iso_8859_10.cct, iso_8859_13.cct, iso_8859_14.cct, iso_8859_15.cct,
-iso_8859_1.cct, iso_8859_2.cct, iso_8859_3.cct, iso_8859_4.cct,
-iso_8859_5.cct, iso_8859_6.cct, iso_8859_7.cct, iso_8859_8.cct,
-iso_8859_9.cct, iso_8859_11.cct, win_1250.cct, win_1252.cct, win_1254.cct,
-win_1256.cct, win_1258.cct, win_1251.cct,
-win_1253.cct, win_1255.cct, win_1257.cct,
-koi8_r.cct, koi8_ru.cct, koi8_u.cct, koi8_uni.cct, iso_ir_111.cct,
-big5.cct, cp775.cct, cp850.cct, cp852.cct, cp855.cct, cp866.cct, cns11643.cct
-files, only the @option{-i <SRC_FILE_NAME>} option was used.
-
-@item To generate the jis_x0208_1990.cct file, the
-@option{-i jis_x0208_1990.txt -x 2 -y 3} options were used.
-
-@item To generate the cns11643_plane1.cct file, the
-@option{-i cns11643.txt -p1 -N cns11643_plane1 -o cns11643_plane1.cct}
-options were used.
-
-@item To generate the cns11643_plane2.cct file, the
-@option{-i cns11643.txt -p2 -N cns11643_plane2 -o cns11643_plane2.cct}
-options were used.
-
-@item To generate the cns11643_plane14.cct file, the
-@option{-i cns11643.txt -p0xE -N cns11643_plane14 -o cns11643_plane14.cct}
-options were used.
-@end enumerate
-
-@*
-For more info about the 'mktbl.pl' options, see the 'mktbl.pl -h' output.
-
-@*
-It is assumed that CCS codes are 16 bits wide or less. If there are wider CCS codes
-in the CCS source file, the bits above the low 16 bits define the plane (see the
-cns11643.txt CCS source file).
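-
-@*
-A minimal sketch of this split, assuming the plane is simply the value found
-above the low 16 bits (as the @option{-p1}, @option{-p2} and
-@option{-p0xE} options above suggest). The helper names @code{ccs_plane}
-and @code{ccs_code16} are hypothetical and are not part of 'mktbl.pl' or
-Newlib.

```c
#include <assert.h>
#include <stdint.h>

/* Plane number: everything above the low 16 bits of the source-file code. */
static unsigned ccs_plane(uint32_t raw)
{
  return (unsigned)(raw >> 16);
}

/* The 16-bit CCS code within its plane. */
static uint16_t ccs_code16(uint32_t raw)
{
  return (uint16_t)(raw & 0xFFFFu);
}
```

-For example, a source-file code of 0xE2121 would be split into plane 0xE
-and the 16-bit code 0x2121.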
-
-@*
-Sometimes it is impossible to map some CCS codes to the 16-bit UCS, for example
-when several different CCS codes are mapped to one UCS-2 code or one CCS code is
-mapped to a pair of UCS-2 codes. In these cases, such CCS codes (@dfn{lost
-codes}) aren't simply rejected; instead, they are mapped to the default
-UCS-2 code (which is currently the @kbd{?} character's code).
-
-
-
-
-
-@page
-@node CES converters
-@section CES converters
-@findex PCS
-@*
-Similar to the CCS tables, CES converters are also split into "from UCS"
-and "to UCS" parts. Depending on the iconv library configuration, these
-parts are enabled or disabled.
-
-@*
-The following is the list of CES converters which are currently present
-in the Newlib iconv library.
-
-@itemize @bullet
-@item
-@emph{euc} - supports the @emph{euc_jp}, @emph{euc_kr} and @emph{euc_tw}
-encodings. The @emph{euc} CES converter uses the @emph{table} and the
-@emph{us_ascii} CES converters.
-
-@item
-@emph{table} - this CES converter corresponds to "null" and just performs
-tables-based conversion using 8- and 16-bit CCS tables. This converter
-is also used by any other CES converter which needs the CCS table-based
-conversions. The @emph{table} converter is also responsible for .cct files
-loading.
-
-@item
-@emph{table_pcs} - this is the wrapper over the @emph{table} converter
-which is intended for 16-bit encodings which also use the @dfn{Portable
-Character Set} (@dfn{PCS}) which is the same as the @emph{US-ASCII}.
-This means that if the first byte of the CCS code is in the range [0x00-0x7f],
-it is a 7-bit PCS code. Otherwise, it is a 16-bit CCS code. Of course,
-the 16-bit codes must not contain bytes in the range of [0x00-0x7f].
-The @emph{big5} encoding uses the @emph{table_pcs} CES converter and the
-@emph{table_pcs} CES converter depends on the @emph{table} CES converter.
-
-@item
-@emph{ucs_2} - intended for the @emph{ucs_2}, @emph{ucs_2be} and
-@emph{ucs_2le} encodings support.
-
-@item
-@emph{ucs_4} - intended for the @emph{ucs_4}, @emph{ucs_4be} and
-@emph{ucs_4le} encodings support.
-
-@item
-@emph{ucs_2_internal} - intended for the @emph{ucs_2_internal} encoding support.
-
-@item
-@emph{ucs_4_internal} - intended for the @emph{ucs_4_internal} encoding support.
-
-@item
-@emph{us_ascii} - intended for the @emph{us_ascii} encoding support. In
-principle, the most natural way to support the @emph{us_ascii} encoding
-is to define the @emph{us_ascii} CCS and use the @emph{table} CES
-converter. But for the optimization purposes, the specialized
-@emph{us_ascii} CES converter was created.
-
-@item
-@emph{utf_16} - intended for the @emph{utf_16}, @emph{utf_16be} and
-@emph{utf_16le} encodings support.
-
-@item
-@emph{utf_8} - intended for the @emph{utf_8} encoding support.
-@end itemize
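-
-@*
-To illustrate the first-byte rule described for @emph{table_pcs}, the
-following sketch splits an input buffer into 7-bit PCS codes and 16-bit CCS
-codes. It is not the Newlib converter: the function name
-@code{pcs_next_code} is hypothetical, and the sketch assumes that the first
-byte of a 16-bit code is its high byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Extracts the next code from 'in'. Returns the number of bytes consumed
 * (1 for a PCS code, 2 for a 16-bit CCS code), or 0 if the input is empty
 * or ends in the middle of a 16-bit code.
 */
static size_t pcs_next_code(const unsigned char *in, size_t len, uint16_t *code)
{
  assert(in != NULL && code != NULL);

  if (len == 0)
    return 0;

  if (in[0] <= 0x7F) {          /* first byte in [0x00-0x7f]: 7-bit PCS code */
    *code = in[0];
    return 1;
  }

  if (len < 2)                  /* incomplete 16-bit code */
    return 0;

  *code = (uint16_t)((in[0] << 8) | in[1]);  /* assumed: high byte first */
  return 2;
}
```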
-
-
-
-
-
-@page
-@node The encodings description file
-@section The encodings description file
-@findex encoding.deps description file
-@findex mkdeps.pl Perl script
-@*
-To simplify the process of adding support for new encodings, a lot of
-"glue" files are generated automatically.
-
-@*
-There is the 'encoding.deps' file in the @emph{lib/} subdirectory which
-is used to describe encoding's properties. The 'mkdeps.pl' Perl script
-uses 'encoding.deps' to generate the "glue" files.
-
-@*
-The 'encoding.deps' file is composed of sections, each section consists
-of entries, each entry contains some encoding/CES/CCS description.
-
-@*
-The 'encoding.deps' file's syntax is very simple. Currently only two
-sections are defined: @emph{ENCODINGS} and @emph{CES_DEPENDENCIES}.
-
-@*
-Each @emph{ENCODINGS} section's entry describes one encoding and
-contains the following information.
-
-@itemize @bullet
-@item
-Encoding name (the @emph{ENCODING} field). The name should
-be unique and only one name is possible.
-
-@item
-The encoding's CES converter name (the @emph{CES} field). Only one CES
-converter is allowed.
-
-@item
-The whitespace-separated list of CCS table names which are used by the
-encoding (the @emph{CCS} field).
-
-@item
-The whitespace-separated list of alias names (the @emph{ALIASES}
-field).
-@end itemize
-
-@*
-Note that all names in the 'encoding.deps' file must be in the normalized
-form.
-
-@*
-Each @emph{CES_DEPENDENCIES} section's entry describes dependencies of
-one CES converter. For example, the @emph{euc} CES converter depends on
-the @emph{table} and the @emph{us_ascii} CES converters since the
-@emph{euc} CES converter uses them. This means, that both @emph{table}
-and @emph{us_ascii} CES converters should be linked if the @emph{euc}
-CES converter is enabled.
-
-@*
-The @emph{CES_DEPENDENCIES} section defines the following:
-
-@itemize @bullet
-@item
-the CES converter name for which the dependencies are defined in this
-entry (the @emph{CES} field);
-
-@item
-the whitespace-separated list of CES converters which are needed for
-this CES converter (the @emph{USED_CES} field).
-@end itemize
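-
-@*
-The effect of the @emph{USED_CES} dependencies can be pictured with the
-following sketch, which enables a converter together with everything it
-uses. This is an illustration only: the generated Newlib glue is described
-in this chapter as macro-based, whereas the sketch resolves the dependencies
-at run time. The @code{ces} enumeration, the @code{ces_deps} table and the
-@code{ces_enable} function are hypothetical names, and the table holds only
-the dependencies mentioned in this chapter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ces { CES_EUC, CES_TABLE, CES_TABLE_PCS, CES_US_ASCII, CES_COUNT };

/* ces_deps[c][d] is true when converter c uses converter d (USED_CES). */
static const bool ces_deps[CES_COUNT][CES_COUNT] = {
  [CES_EUC]       = { [CES_TABLE] = true, [CES_US_ASCII] = true },
  [CES_TABLE_PCS] = { [CES_TABLE] = true },
};

/* Marks converter c and, recursively, every converter it depends on. */
static void ces_enable(enum ces c, bool enabled[CES_COUNT])
{
  assert((int)c >= 0 && (int)c < CES_COUNT);

  if (enabled[c])               /* already handled; also stops on cycles */
    return;
  enabled[c] = true;

  for (int d = 0; d < CES_COUNT; d++)
    if (ces_deps[c][d])
      ces_enable((enum ces)d, enabled);
}
```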
-
-@*
-The 'mkdeps.pl' Perl script automatically solves the following tasks.
-
-@itemize @bullet
-@item
-The user works with the iconv library in terms of encodings and doesn't need
-to know anything about CES converters and CCS tables. The script automatically
-generates code which enables all needed CES converters and CCS tables
-for all encodings, which were enabled by the user.
-
-@item
-The CES converters may have dependencies and the script automatically
-generates the code which handles these dependencies.
-
-@item
-The list of encoding's aliases is also automatically generated.
-
-@item
-The script uses a lot of macros in order to enable only the minimum set
-of code/data which is needed to support the requested encodings in the
-requested directions.
-@end itemize
-
-@*
-The 'mkdeps.pl' Perl script interprets the 'encoding.deps'
-file and generates the following files.
-
-@itemize @bullet
-@item
-@emph{lib/encnames.h} - this header file contains macro definitions for all
-encoding names.
-
-@item
-@emph{lib/aliasesbi.c} - the array of encoding names and aliases. The array
-is used to find the name of the requested encoding by its alias.
-
-@item
-@emph{ces/cesbi.c} - this file defines two arrays
-(@code{_iconv_from_ucs_ces} and @code{_iconv_to_ucs_ces}) which contain
-descriptions of the enabled "from UCS" and "to UCS" CES converters and the
-names of encodings which are supported by these CES converters.
-
-@item
-@emph{ces/cesbi.h} - this file contains the set of macros which define
-the set of CES converters which should be enabled if only the set of
-enabled encodings is given (through macros defined in the
-@emph{newlib.h} file). Note, that one CES converter may handle several
-encodings.
-
-@item
-@emph{ces/cesdeps.h} - the CES converters dependencies are handled in
-this file.
-
-@item
-@emph{ccs/ccsdeps.h} - the array of linked-in CCS tables is defined
-here.
-
-@item
-@emph{ccs/ccsnames.h} - this header file contains macro definitions for all
-CCS names.
-
-@item
-@emph{encoding.aliases} - the list of supported encodings and their
-aliases which is intended for the Newlib configure scripts in order to
-handle the iconv-related configure script options.
-@end itemize
-
-
-
-
-
-@page
-@node How to add new encoding
-@section How to add new encoding
-@*
-First, the new encoding should be broken down into CCS and CES. Then,
-the process of adding the new encoding is split into the following activities.
-
-@enumerate
-@item Generate the .cct CCS file and the .c source file for the new
-encoding's CCS (if it isn't already present). To do this, the CCS source
-file is needed and the 'mktbl.pl' script should be used.
-
-@item Write the corresponding CES converter (if it isn't already
-present). Use the existing CES converters as an example.
-
-@item
-Add the corresponding entries to the 'encoding.deps' file and regenerate
-the autogenerated "glue" files using the 'mkdeps.pl' script.
-
-@item
-Don't forget to add entries to the newlib/newlib.hin file.
-
-@item
-Of course, the 'Makefile.am'-s should also be updated (if new files were
-added) and the 'Makefile.in'-s should be regenerated using the correct
-version of 'automake'.
-
-@item
-Don't forget to update the documentation (the list of
-supported encodings and CES converters).
-@end enumerate
-
-In case a new encoding doesn't fit the CES/CCS decomposition model or
-it is desired to add specialized (non UCS-based) conversion support,
-the Newlib iconv library code should be extended.
-
-
-
-
-
-@page
-@node The locale support interfaces
-@section The locale support interfaces
-@*
-The newlib iconv library also has some interface functions (besides the
-@code{iconv}, @code{iconv_open} and @code{iconv_close} interfaces) which
-are intended for the Locale subsystem. All the locale-related code is
-placed in the @emph{lib/iconvnls.c} file.
-
-@*
-The following is the description of the locale-related interfaces:
-
-@itemize @bullet
-@item
-@code{_iconv_nls_open} - opens two iconv descriptors for "CCS ->
-wchar_t" and "wchar_t -> CCS" conversions. The normalized CCS name is
-passed in the function parameters. The @emph{wchar_t} characters encoding is
-either ucs_2_internal or ucs_4_internal depending on size of
-@emph{wchar_t}.
-
-@item
-@code{_iconv_nls_conv} - the function is similar to the @code{iconv}
-function, but if there is no character in the output encoding which
-corresponds to the character in the input encoding, the default
-conversion isn't performed (the @code{iconv} function sets such output
-characters to the @kbd{?} symbol and this is the behavior, which is
-specified in SUSv3).
-
-@item
-@code{_iconv_nls_get_state} - returns the current encoding's shift state
-(the @code{mbstate_t} object).
-
-@item
-@code{_iconv_nls_set_state} - sets the current encoding's shift state (the
-@code{mbstate_t} object).
-
-@item
-@code{_iconv_nls_is_stateful} - checks whether the encoding is stateful
-or stateless.
-
-@item
-@code{_iconv_nls_get_mb_cur_max} - returns the maximum length (the
-maximum number of bytes) of the encoding's characters.
-@end itemize
-
-
-
-
-@page
-@node Contact
-@section Contact
-@*
-The author of the original BSD iconv library (Alexander Chuguev) no longer
-supports that code.
-
-@*
-Any questions regarding the iconv library may be forwarded to
-Artem B. Bityuckiy (dedekind@@oktetlabs.ru or dedekind@@mail.ru) as
-well as to the public Newlib mailing list.
-