@node Iconv
@chapter Character-set conversions (@file{iconv.h})

This chapter describes the Newlib iconv library.
The iconv function declarations are in
@file{iconv.h}.

@menu
* iconv::                 Character set conversion routines
* iconv architecture::    Architecture of Newlib iconv library
* iconv configuration::   Newlib iconv-specific configure options
* Generating CCS tables:: How to generate CCS tables
* Adding new converter::  Steps on adding a new converter
@end menu

@page
@include iconv/iconv.def

@page
@node iconv architecture
@section iconv architecture
@findex iconv architecture
@findex encoding
@findex CCS
@findex CES
@findex iconv converter
@*
@itemize @bullet
@item
Encoding - a rule to represent computer text by means of bits and bytes.
@item
CCS (Coded Character Set) - a mapping from an abstract character set
to a set of non-negative integers (character codes).
@item
CES (Character Encoding Scheme) - a mapping from a set of character codes
to a sequence of bytes.
@end itemize

@*
Examples of CCS: ASCII, ISO-8859-x, KOI8-R, KSX-1001, GB-2312.@*
Examples of CES: UTF-8, UTF-16, EUC-JP, ISO-2022-JP.

@*
The iconv library is used to convert an array of characters in one encoding
to an array in another encoding.
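
@*
A minimal sketch of such a conversion through the standard POSIX interface
(@code{iconv_open}, @code{iconv}, @code{iconv_close}) is given below.  The
helper name @code{convert_buffer} is hypothetical and is not part of Newlib,
and the set of encoding names accepted depends on which converters are
available in a given build.

@verbatim
```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes at `in` from `fromcode` to `tocode`, writing to `out`.
   Returns the number of bytes written, or (size_t)-1 on any failure. */
size_t convert_buffer(const char *tocode, const char *fromcode,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;              /* conversion pair not supported */

    /* POSIX declares the input as char **; some older headers use
       const char ** instead, in which case this cast must change. */
    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */

    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : outcap - outleft;
}
```
@end verbatim

A call such as @code{convert_buffer("UTF-8", "KOI8-R", in, inlen, out, sizeof out)}
would then convert a KOI8-R buffer, provided both converters are present.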

@*
From a user's point of view, the iconv library is a set of converters. Each converter
corresponds to one encoding (e.g., KOI8-R converter, UTF-8 converter).
Internally, a converter is organized differently, as described in the rest of this section.

@*
The iconv library always performs conversions through UCS-32: i.e., to convert
from A to B, the iconv library first converts A to UCS-32, and then UCS-32 to B.
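
@*
The pivot scheme can be sketched as follows.  This is an illustration under
stated assumptions, not Newlib's internal code: the @code{toy_encoding}
interface and all function names are hypothetical, and the UTF-8 half handles
only one- and two-byte sequences to keep the example short.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-encoding interface: one step into the pivot, one out. */
struct toy_encoding {
    /* Decode one character; return bytes consumed, or 0 on error. */
    size_t (*to_ucs)(const unsigned char *in, size_t inlen, uint32_t *ucs);
    /* Encode one character; return bytes produced, or 0 on error. */
    size_t (*from_ucs)(uint32_t ucs, unsigned char *out, size_t outcap);
};

static size_t latin1_to_ucs(const unsigned char *in, size_t inlen, uint32_t *ucs)
{
    (void)inlen;                /* caller guarantees inlen >= 1 */
    *ucs = in[0];               /* ISO-8859-1 bytes equal their codes */
    return 1;
}

static size_t latin1_from_ucs(uint32_t ucs, unsigned char *out, size_t outcap)
{
    if (ucs > 0xFF || outcap < 1)
        return 0;
    out[0] = (unsigned char)ucs;
    return 1;
}

static size_t utf8_to_ucs(const unsigned char *in, size_t inlen, uint32_t *ucs)
{
    if (in[0] < 0x80) {
        *ucs = in[0];
        return 1;
    }
    if (in[0] >= 0xC2 && in[0] <= 0xDF && inlen >= 2 && (in[1] & 0xC0) == 0x80) {
        *ucs = ((uint32_t)(in[0] & 0x1F) << 6) | (uint32_t)(in[1] & 0x3F);
        return 2;
    }
    return 0;                   /* longer sequences are out of scope here */
}

static size_t utf8_from_ucs(uint32_t ucs, unsigned char *out, size_t outcap)
{
    if (ucs < 0x80 && outcap >= 1) {
        out[0] = (unsigned char)ucs;
        return 1;
    }
    if (ucs >= 0x80 && ucs < 0x800 && outcap >= 2) {
        out[0] = (unsigned char)(0xC0 | (ucs >> 6));
        out[1] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 2;
    }
    return 0;
}

const struct toy_encoding toy_latin1 = { latin1_to_ucs, latin1_from_ucs };
const struct toy_encoding toy_utf8   = { utf8_to_ucs, utf8_from_ucs };

/* Convert A to B one character at a time through the UCS-32 pivot.
   Returns bytes written, or (size_t)-1 on the first failure. */
size_t toy_convert(const struct toy_encoding *from, const struct toy_encoding *to,
                   const unsigned char *in, size_t inlen,
                   unsigned char *out, size_t outcap)
{
    assert(from != NULL && to != NULL);
    size_t produced = 0;
    while (inlen > 0) {
        uint32_t ucs;
        size_t used = from->to_ucs(in, inlen, &ucs);            /* A -> UCS-32 */
        if (used == 0)
            return (size_t)-1;
        size_t made = to->from_ucs(ucs, out + produced, outcap - produced);
        if (made == 0)                                          /* UCS-32 -> B */
            return (size_t)-1;
        in += used;
        inlen -= used;
        produced += made;
    }
    return produced;
}
```
@end verbatim

With a pivot, each encoding needs only two conversion routines, rather than
one routine per pair of encodings.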

@*
Each encoding consists of CES and CCS. CCS may be represented as data tables
but CES always implies some code (algorithm). Iconv uses CCS tables 
to map from some encoding to UCS-32. CCS tables are placed into
the iconv/ccs subdirectory of newlib.  The iconv code also uses CES 
modules which can convert some CCS to and from UCS-32.  CES modules are placed 
in the iconv/ces subdirectory.

@*
Some encodings have CES = CCS (e.g., KOI8-R). For such encodings iconv uses
special subroutines which perform simple table conversions (ccs_table.c).
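
@*
A table conversion of this kind might look like the sketch below.  It is not
the code from ccs_table.c: the names are hypothetical, and the table is a
four-entry excerpt of KOI8-R chosen for illustration, so most byte values are
reported as unmapped.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID 0xFFFFFFFFu   /* marks "no mapping" */

struct toy_pair {
    unsigned char byte;   /* code in the single-byte encoding */
    uint32_t ucs;         /* corresponding UCS-32 code */
};

/* Four-entry excerpt of KOI8-R; a real table covers all byte values. */
static const struct toy_pair toy_koi8r_excerpt[] = {
    { 0xC1, 0x0430 },   /* CYRILLIC SMALL LETTER A */
    { 0xC2, 0x0431 },   /* CYRILLIC SMALL LETTER BE */
    { 0xE1, 0x0410 },   /* CYRILLIC CAPITAL LETTER A */
    { 0xE2, 0x0411 },   /* CYRILLIC CAPITAL LETTER BE */
};

#define TOY_NPAIRS (sizeof toy_koi8r_excerpt / sizeof toy_koi8r_excerpt[0])

/* Encoding -> UCS-32: a plain lookup, no algorithm involved. */
uint32_t toy_table_to_ucs(unsigned char byte)
{
    if (byte < 0x80)
        return byte;    /* the lower half of KOI8-R coincides with ASCII */
    for (size_t i = 0; i < TOY_NPAIRS; i++)
        if (toy_koi8r_excerpt[i].byte == byte)
            return toy_koi8r_excerpt[i].ucs;
    return TOY_INVALID;
}

/* UCS-32 -> encoding: the same table searched in the other direction.
   Returns 1 and stores the byte on success, 0 if there is no mapping. */
int toy_table_from_ucs(uint32_t ucs, unsigned char *byte)
{
    assert(byte != NULL);
    if (ucs < 0x80) {
        *byte = (unsigned char)ucs;
        return 1;
    }
    for (size_t i = 0; i < TOY_NPAIRS; i++)
        if (toy_koi8r_excerpt[i].ucs == ucs) {
            *byte = toy_koi8r_excerpt[i].byte;
            return 1;
        }
    return 0;
}
```
@end verbatim

A linear search is used only for brevity; a complete single-byte table would
normally be indexed directly by the byte value.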

@*
Among specialized CES modules, the iconv library has
generic support for EUC and ISO-2022-family encodings (ces_euc.c and
ces_iso2022.c).

@*
To enable iconv to work with CCS or CES-based encodings, the corresponding
CCS table or CES module should be linked with Newlib.  The iconv support
can also load CCS tables dynamically from external files (.cct files from
iconv/ccs/binary subdirectory).  CES modules, on the other hand, can't
be dynamically loaded.

@*
Each iconv converter has one name and a set of aliases. The list of
aliases for each converter's name is in the iconv/charset.aliases file.
Note: iconv always normalizes converter names and aliases before using them.
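
@*
Alias resolution with normalization could be sketched as below.  The exact
normalization rule is not stated in this chapter, so the sketch assumes one:
letters are folded to lower case and @code{-} is treated as @code{_}.  The
alias table and all names are hypothetical and do not reproduce the contents
of iconv/charset.aliases.

@verbatim
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed rule (not taken from Newlib): fold case, map '-' to '_'. */
void toy_normalize(const char *name, char *out, size_t outcap)
{
    assert(name != NULL && out != NULL && outcap > 0);
    size_t i = 0;
    for (; name[i] != '\0' && i + 1 < outcap; i++) {
        unsigned char c = (unsigned char)name[i];
        out[i] = (c == '-') ? '_' : (char)tolower(c);
    }
    out[i] = '\0';
}

struct toy_alias {
    const char *alias;       /* already in normalized form */
    const char *converter;   /* converter name the alias stands for */
};

/* Hypothetical alias list, stored normalized so lookups are exact. */
static const struct toy_alias toy_aliases[] = {
    { "koi8_r",  "koi8_r" },
    { "cskoi8r", "koi8_r" },
    { "utf_8",   "utf_8"  },
    { "utf8",    "utf_8"  },
};

/* Returns the converter name for `name`, or NULL if it is unknown. */
const char *toy_resolve(const char *name)
{
    char norm[32];
    toy_normalize(name, norm, sizeof norm);
    for (size_t i = 0; i < sizeof toy_aliases / sizeof toy_aliases[0]; i++)
        if (strcmp(norm, toy_aliases[i].alias) == 0)
            return toy_aliases[i].converter;
    return NULL;
}
```
@end verbatim

Normalizing both the stored aliases and the requested name means that
spellings such as KOI8-R and koi8_r select the same converter.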

@page
@node iconv configuration
@section iconv configuration
@findex iconv configuration
@findex iconv converter
@*
To enable iconv, the @option{--enable-newlib-iconv} configuration option should be
used when configuring Newlib.

@*
To link a specific converter (CCS table or CES module) into Newlib, the
@option{--enable-newlib-builtin-converters} option should be used.  A
comma-separated list of converters can be passed with this option
(e.g., @option{--enable-newlib-builtin-converters=koi8-r,euc-jp} to link KOI8-R
and EUC-JP converters).  Either converter names or aliases may be used.
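
@*
Whether a given converter is actually usable in a particular build can be
checked at run time.  The following sketch relies only on the documented
POSIX behaviour of @code{iconv_open}, which returns @code{(iconv_t)-1} when
the requested conversion is not supported; the helper name
@code{converter_available} is hypothetical.

@verbatim
```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Returns 1 if a conversion from `fromcode` to `tocode` can be opened,
   and 0 otherwise (for example, when a converter was not linked in). */
int converter_available(const char *tocode, const char *fromcode)
{
    assert(tocode != NULL && fromcode != NULL);

    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return 0;
    iconv_close(cd);    /* the descriptor was needed only for the probe */
    return 1;
}
```
@end verbatim

On a target without a file system, a result of 0 for a table-based converter
suggests that it was left out of the list of built-in converters.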

@*
If the target system has a file system accessible by Newlib, table-based
converters may be loaded dynamically from external files.  The iconv 
code tries to load files from the iconv_data subdirectory of the directory 
specified by the NLSPATH environment variable.
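
@*
The lookup location described above can be sketched as a path computation.
The sketch assumes, without confirmation from the Newlib sources, that the
file name is the converter name followed by .cct; the function names are
hypothetical.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds "<base>/iconv_data/<name>.cct" in buf.
   Returns 0 on success, -1 if the name is unsafe or buf is too small. */
int toy_cct_path_in(const char *base, const char *name,
                    char *buf, size_t bufsize)
{
    assert(base != NULL && name != NULL && buf != NULL);

    if (base[0] == '\0')
        return -1;                  /* nowhere to look */
    if (strchr(name, '/') != NULL)
        return -1;                  /* keep the lookup inside iconv_data */

    int n = snprintf(buf, bufsize, "%s/iconv_data/%s.cct", base, name);
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}

/* Same as above, taking the base directory from NLSPATH. */
int toy_cct_path(const char *name, char *buf, size_t bufsize)
{
    const char *base = getenv("NLSPATH");
    if (base == NULL)
        return -1;                  /* NLSPATH is not set */
    return toy_cct_path_in(base, name, buf, bufsize);
}
```
@end verbatim

With NLSPATH set to /usr/share/nls, for example, the sketch would look for
the koi8-r table in /usr/share/nls/iconv_data/koi8-r.cct.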

@*
Since Newlib has no generic dynamic module load support, CES-based converters
can't be dynamically loaded and must be linked in.

@page
@node Generating CCS tables
@section Generating CCS tables
@*
CCS tables are placed in the ccs subdirectory of the iconv directory.  
This subdirectory contains .cct and .c files.  The .cct files are for 
dynamic loading whereas the .c files are for static linking with Newlib. 
Both .c and .cct files are generated by the 'iconv_mktbl' Perl script
from encoding map source files (referred to here as .txt files).
The 'iconv_mktbl' script can be found in the iconv/ccs
subdirectory.  Input .txt files can be obtained from the Unicode.org site or
from other locations on the web.
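
@*
The mapping files published at Unicode.org are plain text: comment lines
start with @code{#}, and each data line holds the encoding's code and the
Unicode code as hexadecimal numbers separated by white space.  The sketch
below reads such lines into a 256-entry table.  It only illustrates the input
format and is not a description of how 'iconv_mktbl' works; all names in it
are hypothetical.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_UNMAPPED 0xFFFFFFFFu

/* Returns 1 and fills code/ucs if the line holds a mapping; returns 0
   for comments, blank lines and entries without a Unicode column. */
int toy_parse_mapping_line(const char *line, unsigned *code, unsigned *ucs)
{
    assert(line != NULL && code != NULL && ucs != NULL);
    /* %x accepts an optional 0x prefix and skips leading white space. */
    return sscanf(line, "%x %x", code, ucs) == 2;
}

/* Fills a single-byte table from mapping lines; returns entries stored. */
size_t toy_fill_table(const char *const *lines, size_t nlines,
                      uint32_t table[256])
{
    size_t filled = 0;
    for (size_t i = 0; i < 256; i++)
        table[i] = TOY_UNMAPPED;
    for (size_t i = 0; i < nlines; i++) {
        unsigned code, ucs;
        if (toy_parse_mapping_line(lines[i], &code, &ucs) && code < 256) {
            table[code] = ucs;
            filled++;
        }
    }
    return filled;
}
```
@end verbatim

Multi-byte tables and files with more than two columns need more than this
sketch provides.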

@*
The .c files are linked with Newlib if the corresponding 'configure' script
option was given.  This is needed to use iconv on targets without file system
support.  If a CCS table isn't configured to be linked, the iconv library
tries to load it dynamically from a corresponding .cct file.

@*
The following are commands to build .c and .cct CCS table files from .txt 
files for several supported encodings.

@*
@itemize
@item
cp775:@*
iconv_mktbl -Co cp775.c cp775.txt@*
iconv_mktbl -o cp775.cct cp775.txt
@end itemize

@itemize
@item
cp850:@*
iconv_mktbl -Co cp850.c cp850.txt@*
iconv_mktbl -o cp850.cct cp850.txt
@end itemize

@itemize
@item
cp852:@*
iconv_mktbl -Co cp852.c cp852.txt@*
iconv_mktbl -o cp852.cct cp852.txt
@end itemize

@itemize
@item
cp855:@*
iconv_mktbl -Co cp855.c cp855.txt@*
iconv_mktbl -o cp855.cct cp855.txt
@end itemize

@itemize
@item
cp866@*
iconv_mktbl -Co cp866.c cp866.txt@*
iconv_mktbl -o cp866.cct cp866.txt
@end itemize

@itemize
@item
iso-8859-1@*
iconv_mktbl -Co iso-8859-1.c iso-8859-1.txt@*
iconv_mktbl -o iso-8859-1.cct iso-8859-1.txt
@end itemize

@itemize
@item
iso-8859-4@*
iconv_mktbl -Co iso-8859-4.c iso-8859-4.txt@*
iconv_mktbl -o iso-8859-4.cct iso-8859-4.txt
@end itemize

@itemize
@item
iso-8859-5@*
iconv_mktbl -Co iso-8859-5.c iso-8859-5.txt@*
iconv_mktbl -o iso-8859-5.cct iso-8859-5.txt
@end itemize

@itemize
@item
iso-8859-2@*
iconv_mktbl -Co iso-8859-2.c iso-8859-2.txt@*
iconv_mktbl -o iso-8859-2.cct iso-8859-2.txt
@end itemize

@itemize
@item
iso-8859-15@*
iconv_mktbl -Co iso-8859-15.c iso-8859-15.txt@*
iconv_mktbl -o iso-8859-15.cct iso-8859-15.txt
@end itemize

@itemize
@item
big5@*
iconv_mktbl -Co big5.c big5.txt@*
iconv_mktbl -o big5.cct big5.txt
@end itemize

@itemize
@item
ksx1001@*
iconv_mktbl -Co ksx1001.c ksx1001.txt@*
iconv_mktbl -o ksx1001.cct ksx1001.txt
@end itemize

@itemize
@item
gb_2312@*
iconv_mktbl -Co gb_2312-80.c gb_2312-80.txt@*
iconv_mktbl -o gb_2312-80.cct gb_2312-80.txt
@end itemize

@itemize
@item
jis_x0201@*
iconv_mktbl -Co jis_x0201.c jis_x0201.txt@*
iconv_mktbl -o jis_x0201.cct jis_x0201.txt
@end itemize

@itemize
@item
shift_jis@*
iconv_mktbl -Co shift_jis.c shift_jis.txt@*
iconv_mktbl -o shift_jis.cct shift_jis.txt
@end itemize

@itemize
@item
jis_x0208@*
iconv_mktbl -C -c 1 -u 2 -o jis_x0208-1983.c jis_x0208-1983.txt@*
iconv_mktbl -c 1 -u 2 -o jis_x0208-1983.cct jis_x0208-1983.txt
@end itemize

@itemize
@item
jis_x0212@*
iconv_mktbl -Co jis_x0212-1990.c jis_x0212-1990.txt@*
iconv_mktbl -o jis_x0212-1990.cct jis_x0212-1990.txt
@end itemize

@itemize
@item
cns11643-plane1@*
iconv_mktbl -C -p 0x1 -o cns11643-plane1.c cns11643.txt@*
iconv_mktbl -p 0x1 -o cns11643-plane1.cct cns11643.txt
@end itemize

@itemize
@item
cns11643-plane2@*
iconv_mktbl -C -p 0x2 -o cns11643-plane2.c cns11643.txt@*
iconv_mktbl -p 0x2 -o cns11643-plane2.cct cns11643.txt
@end itemize

@itemize
@item
cns11643-plane14@*
iconv_mktbl -C -p 0xE -o cns11643-plane14.c cns11643.txt@*
iconv_mktbl -p 0xE -o cns11643-plane14.cct cns11643.txt
@end itemize

@itemize
@item
koi8-r@*
iconv_mktbl -Co koi8-r.c koi8-r.txt@*
iconv_mktbl -o koi8-r.cct koi8-r.txt
@end itemize

@itemize
@item
koi8-u@*
iconv_mktbl -Co koi8-u.c koi8-u.txt@*
iconv_mktbl -o koi8-u.cct koi8-u.txt
@end itemize

@itemize
@item
us-ascii@*
iconv_mktbl -Cao us-ascii.c iso-8859-1.txt@*
iconv_mktbl -ao us-ascii.cct iso-8859-1.txt
@end itemize

@*
Source files for CCS tables can be taken from at least two places:

@*
@enumerate
@item
http://www.unicode.org/Public/MAPPINGS/ contains a lot of encoding 
map files.
@item
http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains original 
iconv sources and encoding map files.
@end enumerate

@*
The following are URLs where source files for some of the CCS tables 
are found:

@itemize
@item
big5:@* 
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
@end itemize

@itemize
@item
cns11643_plane14, cns11643_plane1 and cns11643_plane2:@*
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/CNS11643.TXT
@end itemize

@itemize
@item
cp775, cp850, cp852, cp855, cp866:@*
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
@end itemize

@itemize
@item
gb_2312_80:@*
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT
@end itemize

@itemize
@item
iso_8859_15, iso_8859_1, iso_8859_2, iso_8859_4, iso_8859_5:@*
http://www.unicode.org/Public/MAPPINGS/ISO8859/
@end itemize

@itemize
@item
jis_x0201, jis_x0208_1983, jis_x0212_1990, shift_jis@*
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT
@end itemize

@itemize
@item
koi8_r@*
http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
@end itemize

@itemize
@item
ksx1001@*
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSX1001.TXT
@end itemize

@itemize
@item
The koi8-u source file can be obtained from the original FreeBSD iconv library distribution:@*
http://www.dante.net/staff/konstantin/FreeBSD/iconv/
@end itemize

@*
Moreover, http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains a 
lot of additional CCS tables that you can use with Newlib (iso-2022 and
RFC1345 encodings).

@page
@node Adding new converter
@section Adding a new iconv converter
@*
The following steps should be taken to add a new iconv converter:
 
@*
@enumerate
@item
The converter's name and its list of aliases should be added to
the iconv/charset.aliases file.
@item
All iconv converters are protected by a _ICONV_CONVERTER_XXX
macro, where XXX is the converter name.  This protection macro should be added to
the newlib/newlib.hin file.
@item
The converter's name and aliases should also be registered in the _iconv_builtin_aliases
table in iconv/lib/bialiasesi.c.  The list should be protected by
the corresponding macro mentioned above.
@item
If a new converter is just a CCS table, the corresponding .cct and .c files
should be added to the iconv/ccs/ subdirectory.  The file names
should match the normalized encoding name.  The 'iconv_mktbl'
Perl script (found in iconv/ccs) may
be used to generate such files.  The file names should be added to the
iconv/ccs/Makefile.am and iconv/ccs/binary/Makefile.am files and then
automake should be used to regenerate the Makefile.in files.
@item
If a new converter has a CES algorithm, the appropriate file should be
added to the iconv/ces/ subdirectory.  The name of the file should again
match the normalized encoding name.
@item
If a converter is an EUC or ISO-2022-family CES, then the converter
is just an array with a list of the CCS tables it uses (see ces/euc-jp.c for example).  This
is because iconv already has EUC and ISO-2022 support.  The CCS tables used should
be provided in iconv/ccs/.
@item
If a converter isn't an EUC or ISO-2022-based CES, the following functions
should be provided (see utf-8.c for example):
@enumerate -
@item A function to convert from new CES to UCS-32;
@item A function to convert from UCS-32 to new CES;
@item An 'init' function;
@item A 'close' function;
@item A 'reset' function to reset shift state for stateful CES.
@end enumerate

@*
All these functions are registered into a 'struct iconv_ces_desc' object.
The name of the object should be _iconv_ces_module_XXX, where XXX is the
name of the converter.
@item
For CES converters, the corresponding 'struct iconv_ces_desc' reference should
be added into the iconv/lib/bices.c file.

@*
For CCS converters, the corresponding table reference should be added into
the iconv/lib/biccs.c file.
@end enumerate
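
@*
The set of functions required for a new CES, and their registration in a
descriptor object, can be sketched as follows.  The sketch is illustrative
only: @code{struct toy_ces_desc} is a hypothetical stand-in whose fields and
signatures are assumptions, not the actual layout of 'struct iconv_ces_desc',
and the @code{TOY_ICONV_CONVERTER_UCS2BE} macro merely imitates the role of a
_ICONV_CONVERTER_XXX protection macro.  A big-endian two-byte CES is used
because it is short to write down.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor; Newlib's real structure may differ. */
struct toy_ces_desc {
    void  *(*init)(void);
    void   (*close)(void *state);
    void   (*reset)(void *state);
    size_t (*to_ucs)(void *state, const unsigned char *in, size_t inlen,
                     uint32_t *ucs);
    size_t (*from_ucs)(void *state, uint32_t ucs, unsigned char *out,
                       size_t outcap);
};

/* Stand-in for a protection macro normally set by configuration. */
#define TOY_ICONV_CONVERTER_UCS2BE 1

#if defined(TOY_ICONV_CONVERTER_UCS2BE)

struct ucs2be_state { unsigned long converted; };  /* characters handled */

static void *ucs2be_init(void)
{
    return calloc(1, sizeof(struct ucs2be_state));
}

static void ucs2be_close(void *state)
{
    free(state);
}

/* This CES has no shift state, so reset only clears the counter. */
static void ucs2be_reset(void *state)
{
    assert(state != NULL);
    ((struct ucs2be_state *)state)->converted = 0;
}

/* Returns bytes consumed, or 0 if fewer than two bytes are available. */
static size_t ucs2be_to_ucs(void *state, const unsigned char *in,
                            size_t inlen, uint32_t *ucs)
{
    if (inlen < 2)
        return 0;
    *ucs = ((uint32_t)in[0] << 8) | in[1];
    ((struct ucs2be_state *)state)->converted++;
    return 2;
}

/* Returns bytes produced, or 0 if the code does not fit in two bytes. */
static size_t ucs2be_from_ucs(void *state, uint32_t ucs,
                              unsigned char *out, size_t outcap)
{
    if (ucs > 0xFFFF || outcap < 2)
        return 0;
    out[0] = (unsigned char)(ucs >> 8);
    out[1] = (unsigned char)(ucs & 0xFF);
    ((struct ucs2be_state *)state)->converted++;
    return 2;
}

/* All five functions registered in one descriptor object. */
const struct toy_ces_desc toy_ces_module_ucs2be = {
    ucs2be_init, ucs2be_close, ucs2be_reset, ucs2be_to_ucs, ucs2be_from_ucs
};

#endif /* TOY_ICONV_CONVERTER_UCS2BE */
```
@end verbatim

Keeping the functions static and exporting only the descriptor leaves a
single symbol per converter to reference from a table of built-in modules.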