


@node Iconv
@chapter Encoding conversions (@file{iconv.h})

This chapter describes the Newlib iconv library.
The iconv function declarations are in
@file{iconv.h}.

@menu
* iconv::                  Encoding conversion routines
* Introduction::           Introduction to iconv and encodings
* Supported encodings::    The list of currently supported encodings
* iconv design decisions:: General iconv library design issues and decisions
* iconv configuration::    iconv-related configure script options
@end menu

@page
@include iconv/iconv.def

@page
@node Introduction
@section Introduction
@findex encoding
@findex character set
@findex charset
@findex CES
@findex CCS
@*
The iconv library is intended to convert characters from one encoding to
another. It implements the iconv(), iconv_open() and iconv_close() calls
defined by the Single Unix Specification.
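
@*
A minimal sketch of how these three calls are typically combined is shown
below. The helper name @code{convert_string} is hypothetical and is not part
of the library; which encoding names are accepted depends on how the library
was configured, so the names used by a caller are an assumption.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of iconv): convert the NUL-terminated
   string `in` from encoding `from` to encoding `to`, storing a
   NUL-terminated result in `out` (capacity `outsize` bytes).
   Returns the number of bytes written, excluding the NUL, or -1 on error. */
long convert_string(const char *from, const char *to,
                    const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;

    /* Note the argument order: target encoding first, then source. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv() only reads the input */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;    /* keep room for the terminator */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        /* Flush any pending shift state of a stateful encoding. */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);

    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outp = '\0';
    return (long)(outp - out);
}
```

The sketch treats every failure the same way; a real caller would usually
inspect @code{errno} to distinguish an invalid input sequence from an output
buffer that is too small.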

@*
In addition to these user-level interfaces, the iconv library also has
several internal interfaces which are needed to support the coding
capabilities of the Locale infrastructure. Since Locale also needs to
convert various character sets to and from the wide character set, the iconv
library shares its capabilities with the Locale subsystem. Moreover, iconv
supports several features which are only needed by the Locale infrastructure
(for example, the MB_CUR_MAX value).

@*
The Newlib iconv library was created using ideas from another iconv
library implemented by Konstantin Chuguev (ver 2.0). Thus, the Newlib iconv
library has a dual copyright. The Newlib iconv library was rewritten from
scratch by Artem B. Bityuckiy and contains many improvements over the
original iconv library.

@*
Terms like @dfn{encoding} or @dfn{character set} aren't well defined and
are used with various meanings. The following are the definitions of the terms
used in this documentation as well as in the iconv library implementation:

@itemize @bullet
@item
@dfn{encoding} - a machine representation of characters by means of bits;

@item
@dfn{Character Set} or @dfn{Charset} - just a collection of
characters, i.e., an encoding is a machine representation of a character set;

@item
@dfn{CCS} (@dfn{Coded Character Set}) - a mapping from a character set to a
set of integers (@dfn{character codes});

@item
@dfn{CES} (@dfn{Character Encoding Scheme}) - a mapping from a set of character
codes to a sequence of bytes;
@end itemize

@*
Users usually deal with encodings, for example, KOI8-R, Unicode, UTF-8,
ASCII, etc. Encodings are formed by the following chain:

@enumerate
@item
The user has a set of characters specific to their language (a character set).

@item
Each character from this set is uniquely numbered, resulting in a CCS.

@item
Each number from the CCS is converted to a sequence of bits or bytes by means
of a CES, resulting in some encoding. Thus, a CES may be considered as a
function of a CCS which produces some encoding. Note that a CES may be
applied to more than one CCS.
@end enumerate

@*
Thus, an encoding may be considered as one or more CCS + CES.

@*
Sometimes there is no CES, and in such cases the encoding is equivalent to the CCS,
e.g. KOI8-R or ASCII.

@*
An example of a more complicated encoding is UTF-8, which is the UCS
(or Unicode) CCS plus the UTF-8 CES.
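
@*
To make the CES notion concrete, the following sketch applies the UTF-8 CES
to a single UCS character code, following the byte layout of RFC 3629. The
function name @code{ucs_to_utf8} is hypothetical; this is an illustration,
not the converter used inside the Newlib iconv library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of a CES: map one UCS character code to its
   UTF-8 byte sequence. Returns the number of bytes stored in `out`
   (1 to 4), or 0 if the code cannot be represented in UTF-8. */
size_t ucs_to_utf8(uint32_t code, unsigned char out[4])
{
    if (code < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)code;
        return 1;
    }
    if (code < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (code >> 6));
        out[1] = (unsigned char)(0x80 | (code & 0x3F));
        return 2;
    }
    if (code < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (code >= 0xD800 && code <= 0xDFFF)
            return 0;                        /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (code >> 12));
        out[1] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (code & 0x3F));
        return 3;
    }
    if (code <= 0x10FFFF) {                  /* 11110xxx and three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (code >> 18));
        out[1] = (unsigned char)(0x80 | ((code >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (code & 0x3F));
        return 4;
    }
    return 0;                                /* beyond the Unicode range */
}
```

Here the CCS step (character to number) has already happened: the input is a
UCS code, and the function only performs the CES step (number to bytes).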

@*
The following is a brief list of iconv library features:
@itemize
@item
Generic architecture
@item
Locale infrastructure support
@item
Automatic generation of code which handles various CES/CCS/Encoding/Names/Aliases
dependencies.
@item
The possibility to choose a size- or speed-optimized configuration
@item
The possibility to exclude almost all unneeded code from linking.
@end itemize


@page
@node Supported encodings
@section Supported encodings
@findex big5
@findex cp775
@findex cp850
@findex cp852
@findex cp855
@findex cp866
@findex euc_jp
@findex euc_kr
@findex euc_tw
@findex iso_8859_1
@findex iso_8859_10
@findex iso_8859_11
@findex iso_8859_13
@findex iso_8859_14
@findex iso_8859_15
@findex iso_8859_2
@findex iso_8859_3
@findex iso_8859_4
@findex iso_8859_5
@findex iso_8859_6
@findex iso_8859_7
@findex iso_8859_8
@findex iso_8859_9
@findex iso_ir_111
@findex koi8_r
@findex koi8_ru
@findex koi8_u
@findex koi8_uni
@findex ucs_2
@findex ucs_2_internal
@findex ucs_2be
@findex ucs_2le
@findex ucs_4
@findex ucs_4_internal
@findex ucs_4be
@findex ucs_4le
@findex us_ascii
@findex utf_16
@findex utf_16be
@findex utf_16le
@findex utf_8
@findex win_1250
@findex win_1251
@findex win_1252
@findex win_1253
@findex win_1254
@findex win_1255
@findex win_1256
@findex win_1257
@findex win_1258
@*
The following is a list of currently supported encodings. The first column
contains the encoding name, the second the list of its aliases, the third
the names of its CES and CCS components, and the fourth a short description.

@multitable @columnfractions .20 .26 .24 .30
@item
Name
@tab
Aliases
@tab
CES/CCS
@tab
Short description
@item
@tab
@tab
@tab


@item
big5
@tab
csbig5, big_five, bigfive, cn_big5, cp950
@tab
table_pcs / big5, us_ascii 
@tab
An encoding for Traditional Chinese.


@item
cp775
@tab
ibm775, cspc775baltic
@tab
table / cp775
@tab
An updated version of CP 437 that supports Baltic languages.


@item
cp850
@tab
ibm850, 850, cspc850multilingual
@tab
table / cp850
@tab
IBM 850 - an updated version of CP 437 where several Latin 1 characters have been
added instead of some less-often used characters like line-drawing and Greek ones.


@item
cp852
@tab
ibm852, 852, cspcp852
@tab
table / cp852
@tab
IBM 852 - an updated version of CP 437 where several Latin 2 characters have been added
instead of some less-often used characters like line-drawing and Greek ones.


@item
cp855
@tab
ibm855, 855, csibm855
@tab
table / cp855
@tab
IBM 855 - an updated version of CP 437 that supports Cyrillic.


@item
cp866
@tab
866, IBM866, CSIBM866
@tab
table / cp866
@tab
IBM 866 - an updated version of CP 855 which follows the more logical Russian alphabet
ordering of the alternativny variant that is preferred by many Russian users.


@item
euc_jp
@tab
eucjp
@tab
euc / jis_x0208_1990, jis_x0201_1976, jis_x0212_1990
@tab
EUC-JP - The EUC for Japanese.


@item
euc_kr
@tab
euckr
@tab
euc / ksx1001
@tab
EUC-KR - The EUC for Korean.


@item
euc_tw
@tab
euctw
@tab
euc / cns11643_plane1, cns11643_plane2, cns11643_plane14
@tab
EUC-TW - The EUC for Traditional Chinese.


@item
iso_8859_1
@tab
iso8859_1, iso88591, iso_8859_1:1987, iso_ir_100, latin1, l1, ibm819, cp819, csisolatin1
@tab
table / iso_8859_1
@tab
ISO 8859-1:1987 - Latin 1, West European.


@item
iso_8859_10
@tab
iso_8859_10:1992, iso_ir_157, iso885910, latin6, l6, csisolatin6, iso8859_10
@tab
table / iso_8859_10
@tab
ISO 8859-10:1992 - Latin 6, Nordic.


@item
iso_8859_11
@tab
iso8859_11, iso885911
@tab
table / iso_8859_11
@tab
ISO 8859-11 - Thai.


@item
iso_8859_13
@tab
iso_8859_13:1998, iso8859_13, iso885913
@tab
table / iso_8859_13
@tab
ISO 8859-13:1998 - Latin 7, Baltic Rim.


@item
iso_8859_14
@tab
iso_8859_14:1998, iso885914, iso8859_14
@tab
table / iso_8859_14
@tab
ISO 8859-14:1998 - Latin 8, Celtic.


@item
iso_8859_15
@tab
iso885915, iso_8859_15:1998, iso8859_15
@tab
table / iso_8859_15
@tab
ISO 8859-15:1998 - Latin 9, West Europe, successor of Latin 1.


@item
iso_8859_2
@tab
iso8859_2, iso88592, iso_8859_2:1987, iso_ir_101, latin2, l2, csisolatin2
@tab
table / iso_8859_2
@tab
ISO 8859-2:1987 - Latin 2, East European.


@item
iso_8859_3
@tab
iso_8859_3:1988, iso_ir_109, iso8859_3, latin3, l3, csisolatin3, iso88593
@tab
table / iso_8859_3
@tab
ISO 8859-3:1988 - Latin 3, South European.


@item
iso_8859_4
@tab
iso8859_4, iso88594, iso_8859_4:1988, iso_ir_110, latin4, l4, csisolatin4
@tab
table / iso_8859_4
@tab
ISO 8859-4:1988 - Latin 4, North European.


@item
iso_8859_5
@tab
iso8859_5, iso88595, iso_8859_5:1988, iso_ir_144, cyrillic, csisolatincyrillic
@tab
table / iso_8859_5
@tab
ISO 8859-5:1988 - Cyrillic.


@item
iso_8859_6
@tab
iso_8859_6:1987, iso_ir_127, iso8859_6, ecma_114, asmo_708, arabic, csisolatinarabic, iso88596
@tab
table / iso_8859_6
@tab
ISO 8859-6:1987 - Arabic.


@item
iso_8859_7
@tab
iso_8859_7:1987, iso_ir_126, iso8859_7, elot_928, ecma_118, greek, greek8, csisolatingreek, iso88597
@tab
table / iso_8859_7
@tab
ISO 8859-7:1987 - Greek.


@item
iso_8859_8
@tab
iso_8859_8:1988, iso_ir_138, iso8859_8, hebrew, csisolatinhebrew, iso88598
@tab
table / iso_8859_8
@tab
ISO 8859-8:1988 - Hebrew.


@item
iso_8859_9
@tab
iso_8859_9:1989, iso_ir_148, iso8859_9, latin5, l5, csisolatin5, iso88599
@tab
table / iso_8859_9
@tab
ISO 8859-9:1989 - Latin 5, Turkish.


@item
iso_ir_111
@tab
ecma_cyrillic, koi8_e, koi8e, csiso111ecmacyrillic
@tab
table / iso_ir_111
@tab
ISO IR 111/ECMA Cyrillic.


@item
koi8_r
@tab
cskoi8r, koi8r, koi8
@tab
table / koi8_r
@tab
RFC 1489 Cyrillic.


@item
koi8_ru
@tab
koi8ru
@tab
table / koi8_ru
@tab
Obsolete Ukrainian.


@item
koi8_u
@tab
koi8u
@tab
table / koi8_u
@tab
RFC 2319 Ukrainian.


@item
koi8_uni
@tab
koi8uni
@tab
table / koi8_uni
@tab
KOI8 Unified.


@item
ucs_2
@tab
ucs2, iso_10646_ucs_2, iso10646_ucs_2, iso_10646_ucs2, iso10646_ucs2, iso10646ucs2, csUnicode
@tab
ucs_2 / (UCS)
@tab
ISO-10646-UCS-2. Big Endian, ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
ucs_2_internal
@tab
ucs2_internal, ucs_2internal, ucs2internal
@tab
ucs_2_internal / (UCS)
@tab
ISO-10646-UCS-2 in system byte order.
ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
ucs_2be
@tab
ucs2be
@tab
ucs_2 / (UCS)
@tab
Big Endian version of ISO-10646-UCS-2 (in fact, equivalent to ucs_2).
ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
ucs_2le
@tab
ucs2le
@tab
ucs_2 / (UCS)
@tab
Little Endian version of ISO-10646-UCS-2.
ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
ucs_4
@tab
ucs4, iso_10646_ucs_4, iso10646_ucs_4, iso_10646_ucs4, iso10646_ucs4, iso10646ucs4
@tab
ucs_4 / (UCS)
@tab
ISO-10646-UCS-4. Big Endian, ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
ucs_4_internal
@tab
ucs4_internal, ucs_4internal, ucs4internal
@tab
ucs_4_internal / (UCS)
@tab
ISO-10646-UCS-4 in system byte order.
ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
ucs_4be
@tab
ucs4be
@tab
ucs_4 / (UCS)
@tab
Big Endian version of ISO-10646-UCS-4 (in fact, equivalent to ucs_4).
ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
ucs_4le
@tab
ucs4le
@tab
ucs_4 / (UCS)
@tab
Little Endian version of ISO-10646-UCS-4.
ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
us_ascii
@tab
ansi_x3.4_1968, ansi_x3.4_1986, iso_646.irv:1991, ascii, iso646_us, us, ibm367, cp367, csascii
@tab
us_ascii / (ASCII)
@tab
7-bit ASCII.


@item
utf_16
@tab
utf16
@tab
utf_16 / (UCS)
@tab
RFC 2781 UTF-16. A ZWNBSP code (U+FEFF) at the very start of the stream is interpreted as a BOM.


@item
utf_16be
@tab
utf16be
@tab
utf_16 / (UCS)
@tab
Big Endian version of RFC 2781 UTF-16.
ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
utf_16le
@tab
utf16le
@tab
utf_16 / (UCS)
@tab
Little Endian version of RFC 2781 UTF-16.
ZWNBSP (U+FEFF) is always interpreted as ZWNBSP (BOM isn't supported).


@item
utf_8
@tab
utf8
@tab
utf_8 / (UCS)
@tab
RFC 3629 UTF-8.


@item
win_1250
@tab
cp1250
@tab
table / win_1250
@tab
Win-1250 - Croatian.


@item
win_1251
@tab
cp1251
@tab
table / win_1251
@tab
Win-1251 - Cyrillic.


@item
win_1252
@tab
cp1252
@tab
table / win_1252
@tab
Win-1252 - Latin 1.


@item
win_1253
@tab
cp1253
@tab
table / win_1253
@tab
Win-1253 - Greek.


@item
win_1254
@tab
cp1254
@tab
table / win_1254
@tab
Win-1254 - Turkish.


@item
win_1255
@tab
cp1255
@tab
table / win_1255
@tab
Win-1255 - Hebrew.


@item
win_1256
@tab
cp1256
@tab
table / win_1256
@tab
Win-1256 - Arabic.


@item
win_1257
@tab
cp1257
@tab
table / win_1257
@tab
Win-1257 - Baltic.


@item
win_1258
@tab
cp1258
@tab
table / win_1258
@tab
Win-1258 - Vietnamese.
@end multitable


@page
@node iconv design decisions
@section iconv design decisions
@findex CCS table
@findex CES converter
@*
The first iconv library design issue arises when considering the
following two design approaches:

@enumerate
@item
Have modules which implement conversion from encoding A to encoding B
and vice versa, i.e., one conversion module relates to any two
encodings.
@item
Have modules which implement conversion from encoding A to a fixed
encoding C and vice versa, i.e., one conversion module relates to
one encoding A and the fixed encoding C. In this case, to convert from
encoding A to encoding B, two modules are needed in order to convert
from A to C and then from C to B.
@end enumerate

@*
Clearly, there is a tradeoff between generality/flexibility and
efficiency: the first method is more efficient since it converts
directly. On the other hand, it is less flexible since a distinct
module is needed for each encoding pair.

@*
The Newlib iconv uses the second method and always converts through 32-bit
UCS. However, its design also allows specialized conversion
modules to be written if the conversion speed is critical.

@*
The second design issue is how to decompose encodings.
The Newlib iconv library uses the fact that any encoding may be
considered as one or more CCS plus a CES. It also decomposes its
conversion modules into a @dfn{CES converter} plus one or more @dfn{CCS
tables}. CCS tables map CCS to UCS and vice versa; CES converters
map CCS to encoding and vice versa.

@*
As an example, consider conversion between the big5 and EUC-TW
encodings. The big5 encoding may be decomposed into the ASCII and BIG5 CCSes plus
the BIG5 CES. EUC-TW may be decomposed into the CNS11643_PLANE1, CNS11643_PLANE2,
and CNS11643_PLANE14 CCSes plus the EUC CES.

@*
The euc_tw -> big5 conversion happens as follows:

@enumerate
@item
The EUC converter transforms the EUC-TW encoding into the corresponding CCS codes
(CNS11643_PLANE1, CNS11643_PLANE2 and CNS11643_PLANE14 CCSes);
@item
The obtained CCS codes are transformed to UCS codes using the CNS11643_PLANE1,
CNS11643_PLANE2 and CNS11643_PLANE14 CCS tables;
@item
The resulting UCS codes are transformed to ASCII and BIG5 codes using the
corresponding CCS tables;
@item
The obtained CCS codes are transformed to the big5 encoding using the corresponding
CES converter.
@end enumerate

@*
Analogously, the backward conversion is performed as follows:

@enumerate
@item
The BIG5 converter transforms the big5 encoding into the corresponding CCS codes
(ASCII and BIG5 CCSes);
@item
The obtained CCS codes are transformed to UCS codes using the ASCII and BIG5 CCS tables;
@item
The resulting UCS codes are transformed to CNS11643_PLANE1, CNS11643_PLANE2 and
CNS11643_PLANE14 codes using the corresponding CCS tables;
@item
The obtained CCS codes are transformed to the EUC-TW encoding using the corresponding
CES converter.
@end enumerate

@*
Note that the above is just an example; the real names (as implemented in Newlib
iconv) of CES converters and CCS tables are slightly different.
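
@*
The idea of converting through a fixed intermediate encoding can be mimicked
from user code with two conversion descriptors, as in the sketch below. This
is only an illustration built on the public interface: the helper names
@code{convert_step} and @code{convert_via_ucs4} are hypothetical, the pivot
buffer size is an arbitrary assumption, and the library performs its own
pivoting internally rather than through calls like these.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: run one conversion over `inlen` input bytes.
   Returns the number of output bytes, or -1 on error. Stateful
   encodings would additionally need a flushing iconv() call. */
static long convert_step(const char *from, const char *to,
                         const char *in, size_t inlen,
                         char *out, size_t outsize)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv() only reads the input */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    return rc == (size_t)-1 ? -1 : (long)(outp - out);
}

/* Hypothetical helper: convert A -> B explicitly as A -> UCS-4BE -> B,
   so that each step involves only one encoding and the fixed pivot. */
long convert_via_ucs4(const char *from, const char *to,
                      const char *in, size_t inlen,
                      char *out, size_t outsize)
{
    char pivot[256];                 /* 4 bytes per character: 64 characters */
    long n = convert_step(from, "UCS-4BE", in, inlen, pivot, sizeof pivot);
    if (n < 0)
        return -1;
    return convert_step("UCS-4BE", to, pivot, (size_t)n, out, outsize);
}
```

With N encodings, this scheme needs N modules that each convert to and from
the pivot, whereas direct conversion would need a module for every pair.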

@*
The third design issue also relates to flexibility. It is undesirable
to always link all CES converters and CCS tables into the library;
instead, it is desirable to be able to load the needed converters and tables
dynamically on demand. This isn't a problem on "big" machines like PCs
but may be very problematic on "small" embedded systems.

@*
Since the CCS tables are just data, it is possible to load them
dynamically from external files. CES converters, in contrast, are algorithms
and contain code, so loading them on demand requires dynamic library loading capability.

@*
Apart from possible restrictions imposed by embedded systems (too little
RAM, for example), Newlib itself has no dynamic library support and,
therefore, all CES converters which will ever be used must be linked into
the library. Dynamic loading of CCS tables, however, is possible; it is
implemented in the Newlib iconv library and may be enabled via Newlib
configure script options.

@*
The next design decision is the possibility of fine-grained iconv library
configuration. This means that iconv doesn't always link all its
converters and tables (if dynamic loading is not enabled) but instead
gives the possibility to enable only those encodings which are planned
to be used (see the section about configure script options).

@*
Moreover, the Newlib iconv library configure options distinguish between
coding directions. This means that not only the supported encodings are
selectable, but the coding direction too. For example, if a user wants a
configuration which allows conversions from UTF-8 to UTF-16 and
doesn't plan to use UTF-16 to UTF-8 conversions, they can enable exactly
that conversion direction (i.e., no UTF-16 -> UTF-8 related code will
be included), thus saving some memory (note that this technique allows
one half of a CCS table, which may be quite big, to be excluded from linking).

@*
One more design decision is speed- and size-optimized tables. Users can
select between them using a configure script option. Speed-optimized CCS tables
are the same as size-optimized ones in the case of an 8-bit CCS (e.g., KOI8-R), but for a 16-bit
CCS, size-optimized tables may be 1.5-2 times smaller than
speed-optimized ones. On the other hand, conversion with speed-optimized
tables is several times faster.
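
@*
The general size-versus-speed tradeoff for a 16-bit CCS can be sketched as
follows. This is an illustration under stated assumptions and does not
reproduce the actual table layout used by the Newlib iconv library: all
names here are hypothetical, and the only data used is the hiragana row of
JIS X 0208, whose codes 0x2421 to 0x2473 map to U+3041 to U+3093 in order.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_UCS 0xFFFFFFFFu           /* marker for "no mapping" */

/* Compact form: a sorted list of contiguous code ranges. */
struct ccs_range {
    uint16_t first;                  /* first CCS code of the range */
    uint16_t last;                   /* last CCS code of the range */
    uint32_t ucs_base;               /* UCS code that `first` maps to */
};

/* JIS X 0208 row 4 (hiragana): 0x2421..0x2473 -> U+3041..U+3093. */
static const struct ccs_range jis_ranges[] = {
    { 0x2421, 0x2473, 0x3041 },
};
#define JIS_NRANGES (sizeof jis_ranges / sizeof jis_ranges[0])

/* Compact lookup: binary search, little memory, several comparisons. */
uint32_t size_lookup(const struct ccs_range *r, size_t n, uint16_t code)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (code < r[mid].first)
            hi = mid;
        else if (code > r[mid].last)
            lo = mid + 1;
        else
            return r[mid].ucs_base + (uint32_t)(code - r[mid].first);
    }
    return NO_UCS;
}

/* Fast form: one slot per code in a fixed window, holes included. */
#define SPEED_FIRST 0x2400u
#define SPEED_SIZE  0x100u
static uint32_t speed_table[SPEED_SIZE];

/* Fill the flat table once from the compact description. */
void build_speed_table(void)
{
    for (uint32_t i = 0; i < SPEED_SIZE; i++)
        speed_table[i] = size_lookup(jis_ranges, JIS_NRANGES,
                                     (uint16_t)(SPEED_FIRST + i));
}

/* Fast lookup: a bounds check and a single array access. */
uint32_t speed_lookup(uint16_t code)
{
    if (code < SPEED_FIRST || code >= SPEED_FIRST + SPEED_SIZE)
        return NO_UCS;
    return speed_table[code - SPEED_FIRST];
}
```

The flat table spends memory on unmapped slots in exchange for constant-time
access, while the range list stores only what is mapped and pays for it with
a search on every lookup.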

@*
It is worth stressing that support for new encodings can't be
dynamically added to an already compiled Newlib library, even if this
needs only an additional CCS table and iconv is configured to use external
files with CCS tables. This isn't a fundamental restriction: the
possibility to add support for new table-based encodings dynamically, by
copying a new .cct file, may be easily added.

@*
Theoretically, the compiled-in CCS tables may be more appropriate for
embedded solutions since they are read-only and are placed in ROM,
whereas dynamic loading needs more RAM. Moreover, in the current
implementation, a distinct copy of the CCS file is loaded for each
opened iconv descriptor, even in the case of the same encoding.
This means, for example, that if two iconv descriptors for
KOI8-R -> UCS-4BE and KOI8-R -> UTF-16BE are opened, two copies of the
koi8-r .cct file will be loaded (actually, iconv loads only the needed part
of these files).


@page
@node iconv configuration
@section iconv configuration
@findex iconv configuration
@*
To enable encoding support, use the @option{--enable-newlib-iconv-encodings} configure
script option. This option accepts a comma-separated list
of encodings that should be enabled. The option enables each encoding in both
("to" and "from") directions.

@*
The @option{--enable-newlib-iconv-from-encodings} configure script option enables
"from" support for each encoding that is passed to it.

@*
The @option{--enable-newlib-iconv-to-encodings} configure script option enables
"to" support for each encoding that is passed to it.

@*
Example: if the user plans only KOI8-R -> UTF-8, UTF-8 -> ISO-8859-5 and
KOI8-R -> UCS-2 conversions, the optimal way (the minimum of iconv
code and data is linked) is to configure Newlib with the following options:
@*
@code{--enable-newlib-iconv-encodings=UTF-8
--enable-newlib-iconv-from-encodings=KOI8-R
--enable-newlib-iconv-to-encodings=UCS-2,ISO-8859-5}

@*
The @option{--enable-newlib-iconv-external-ccs} option enables iconv's
capability to work with external CCS files.

@*
Note: @code{iconv_open} searches for CCS files in the @file{$NLSPATH/iconv_data/} directory.