1 files changed, 857 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man3pm/Encode::Supported.3pm b/upstream/mageia-cauldron/man3pm/Encode::Supported.3pm
new file mode 100644
index 00000000..06011c19
--- /dev/null
+++ b/upstream/mageia-cauldron/man3pm/Encode::Supported.3pm
@@ -0,0 +1,857 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+.    ds C` ""
+.    ds C' ""
+'br\}
+.el\{\
+.    ds C`
+.    ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD.  Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+.    if \nF \{\
+.        de IX
+.        tm Index:\\$1\t\\n%\t"\\$2"
+..
+.        if !\nF==2 \{\
+.            nr % 0
+.            nr F 2
+.        \}
+.    \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "Encode::Supported 3pm"
+.TH Encode::Supported 3pm 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+Encode::Supported \-\- Encodings supported by Encode
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+.SS "Encoding Names"
+.IX Subsection "Encoding Names"
+Encoding names are case insensitive. White space in names
+is ignored.  In addition, an encoding may have aliases.
+Each encoding has one "canonical" name.  The "canonical"
+name is chosen from the names of the encoding by picking
+the first in the following sequence (with a few exceptions).
+.IP \(bu 2
+The name used by the Perl community.  That includes 'utf8' and 'ascii'.
+Unlike aliases, canonical names reach the method directly, so
+frequently used names such as 'utf8' need no alias lookup.
+.IP \(bu 2
+The MIME name as defined in IETF RFCs.  This includes all "iso\-"s.
+.IP \(bu 2
+The name in the IANA registry.
+.IP \(bu 2
+The name used by the organization that defined it.
+.PP
+Where a \fIde jure\fR canonical name differs from the name Encode uses,
+it is always aliased, provided the encoding is implemented at all.  So
+you can safely tell whether a given encoding is implemented just by
+passing its canonical name.
+.PP
+Because of all the alias issues, and because in the general case 
+encodings have state, "Encode" uses an encoding object internally 
+once an operation is in progress.
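+.PP
+As a minimal sketch of the naming rules above (the sample names are
+illustrative only), \f(CW\*(C`find_encoding\*(C'\fR returns the encoding object that
+"Encode" uses internally, and its \f(CW\*(C`name\*(C'\fR method gives the canonical
+name.  An unknown name yields \f(CW\*(C`undef\*(C'\fR.
+.PP
+.Vb 8
+\&  use Encode qw(find_encoding);
+\&
+\&  for my $name ("latin1", "US\-ascii", "no\-such\-encoding") {
+\&      # find_encoding() resolves aliases; it returns an object or undef.
+\&      my $enc = find_encoding($name);
+\&      printf "%\-18s => %s\en", $name,
+\&          defined $enc ? $enc\->name : "(not implemented)";
+\&  }
+.Ve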
+.SH "Supported Encodings"
+.IX Header "Supported Encodings"
+As of Perl 5.8.0, at least the following encodings are recognized.
+Note that unless otherwise specified, they are all case insensitive
+(via alias) and all occurrences of spaces are replaced with '\-'.
+In other words, "ISO 8859 1" and "iso\-8859\-1" are identical.
+.PP
+Encodings are categorized and implemented in several different modules,
+but in most cases you don't have to \f(CW\*(C`use Encode::XX\*(C'\fR to make them
+available.  Encode.pm will automatically load those modules on demand.
+.SS "Built-in Encodings"
+.IX Subsection "Built-in Encodings"
+The following encodings are always available.
+.PP
+.Vb 8
+\&  Canonical     Aliases                      Comments & References
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  ascii         US\-ascii ISO\-646\-US                         [ECMA]
+\&  ascii\-ctrl                                      Special Encoding
+\&  iso\-8859\-1    latin1                                       [ISO]
+\&  null                                            Special Encoding
+\&  utf8          UTF\-8                                    [RFC2279]
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
+.PP
+\&\fInull\fR and \fIascii-ctrl\fR are special.  "null" fails for every character,
+so when you set the fallback mode to PERLQQ, HTMLCREF or XMLCREF, ALL
+CHARACTERS will fall back to character references.  Ditto for
+"ascii-ctrl" except for control characters.  For fallback modes, see
+Encode.
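+.PP
+The fallback behaviour can be sketched as follows; the sample string is
+an assumption chosen for illustration.  \f(CW\*(C`LEAVE_SRC\*(C'\fR is added so that
+the source string is not modified.
+.PP
+.Vb 9
+\&  use Encode qw(encode);
+\&
+\&  my $str   = "caf\ex{e9}";    # U+00E9 has no mapping in ascii
+\&  my $check = Encode::FB_HTMLCREF | Encode::LEAVE_SRC;
+\&  # The unmappable character becomes a decimal character reference.
+\&  my $ascii = encode("ascii", $str, $check);
+\&  # Per the text above, "null" sends every character to the fallback.
+\&  my $null  = encode("null", $str, $check);
+\&  print "$ascii\en$null\en";
+.Ve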
+.SS "Encode::Unicode \-\- other Unicode encodings"
+.IX Subsection "Encode::Unicode -- other Unicode encodings"
+Unicode coding schemes other than native utf8 are supported by
+Encode::Unicode, which will be autoloaded on demand.
+.PP
+.Vb 11
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  UCS\-2BE       UCS\-2, iso\-10646\-1                      [IANA, UC]
+\&  UCS\-2LE                                                     [UC]
+\&  UTF\-16                                                      [UC]
+\&  UTF\-16BE                                                    [UC]
+\&  UTF\-16LE                                                    [UC]
+\&  UTF\-32                                                      [UC]
+\&  UTF\-32BE      UCS\-4                                         [UC]
+\&  UTF\-32LE                                                    [UC]
+\&  UTF\-7                                                  [RFC2152]
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
+.PP
+To find how (UCS\-2|UTF\-(16|32))(LE|BE)? differ from one another,
+see Encode::Unicode.
+.PP
+UTF\-7 is a special encoding which "re-encodes" UTF\-16BE into a 7\-bit
+encoding.  It is implemented separately by Encode::Unicode::UTF7.
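+.PP
+A small sketch of how the byte order variants differ for a single
+character.  The character chosen is arbitrary.
+.PP
+.Vb 9
+\&  use Encode qw(encode);
+\&
+\&  my $char = "A";                          # U+0041
+\&  my $be   = encode("UTF\-16BE", $char);    # big endian, no BOM
+\&  my $le   = encode("UTF\-16LE", $char);    # little endian, no BOM
+\&  my $bom  = encode("UTF\-16",   $char);    # BOM prepended on encode
+\&  # Show each result as space\-separated hex octets.
+\&  print join(" ", map { sprintf("%02X", ord($_)) } split //, $_), "\en"
+\&      for $be, $le, $bom;
+.Ve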
+.SS "Encode::Byte \-\- Extended ASCII"
+.IX Subsection "Encode::Byte -- Extended ASCII"
+Encode::Byte implements most single-byte encodings except for
+Symbols and EBCDIC. The following encodings are based on single-byte
+encodings implemented as extended ASCII.  Most of them map
+\&\ex80\-\exff (upper half) to non-ASCII characters.
+.IP "ISO\-8859 and corresponding vendor mappings" 2
+.IX Item "ISO-8859 and corresponding vendor mappings"
+Since there are so many, they are presented in table format with
+languages and the corresponding encoding names by vendor.  Note that
+the table is sorted in ISO\-8859 order and that the corresponding vendor
+mappings differ slightly from those of ISO.  See
+<http://czyborra.com/charsets/iso8859.html> for details.
+.Sp
+.Vb 10
+\&  Lang/Regions  ISO/Other Std.  DOS     Windows Macintosh  Others
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  N. America    (ASCII)         cp437        AdobeStandardEncoding
+\&                                cp863 (DOSCanadaF)
+\&  W. Europe     iso\-8859\-1      cp850   cp1252  MacRoman  nextstep
+\&                                                         hp\-roman8
+\&                                cp860 (DOSPortuguese)
+\&  Cntrl. Europe iso\-8859\-2      cp852   cp1250  MacCentralEurRoman
+\&                                                MacCroatian
+\&                                                MacRomanian
+\&                                                MacRumanian
+\&  Latin3[1]     iso\-8859\-3      
+\&  Latin4[2]     iso\-8859\-4              
+\&  Cyrillics     iso\-8859\-5      cp855   cp1251  MacCyrillic
+\&    (See also next section)     cp866           MacUkrainian
+\&  Arabic        iso\-8859\-6      cp864   cp1256  MacArabic
+\&                                cp1006          MacFarsi
+\&  Greek         iso\-8859\-7      cp737   cp1253  MacGreek
+\&                                cp869 (DOSGreek2)
+\&  Hebrew        iso\-8859\-8      cp862   cp1255  MacHebrew
+\&  Turkish       iso\-8859\-9      cp857   cp1254  MacTurkish
+\&  Nordics       iso\-8859\-10     cp865
+\&                                cp861           MacIcelandic
+\&                                                MacSami
+\&  Thai          iso\-8859\-11[3]  cp874           MacThai
+\&  (iso\-8859\-12 is nonexistent. Reserved for Indics?)
+\&  Baltics       iso\-8859\-13     cp775   cp1257
+\&  Celtics       iso\-8859\-14
+\&  Latin9 [4]    iso\-8859\-15
+\&  Latin10       iso\-8859\-16
+\&  Vietnamese    viscii                  cp1258  MacVietnamese
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&
+\&  [1] Esperanto, Maltese, and Turkish. Turkish is now on 8859\-9.
+\&  [2] Baltics.  Now on 8859\-10, except for Latvian.
+\&  [3] TIS 620 +  Non\-Breaking Space (0xA0 / U+00A0)
+\&  [4] Nicknamed Latin0; the Euro sign as well as French and Finnish
+\&      letters that are missing from 8859\-1 were added.
+.Ve
+.Sp
+All cp* are also available as ibm\-*, ms\-*, and windows\-*.  See also
+<http://czyborra.com/charsets/codepages.html>.
+.Sp
+Macintosh encodings don't seem to be registered with bodies such as
+IANA.  "Canonical" names in Encode are based upon Apple's Tech Note
+1150.  See <http://developer.apple.com/technotes/tn/tn1150.html>
+for details.
+.IP "KOI8 \- De Facto Standard for the Cyrillic world" 2
+.IX Item "KOI8 - De Facto Standard for the Cyrillic world"
+Though ISO\-8859 does have ISO\-8859\-5, the KOI8 series is far more
+popular on the Net.  Encode comes with the following KOI charsets.
+For gory details, see <http://czyborra.com/charsets/cyrillic.html>.
+.Sp
+.Vb 5
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  koi8\-f                                        
+\&  koi8\-r cp878                                           [RFC1489]
+\&  koi8\-u                                                 [RFC2319]
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
+.SS "gsm0338 \- Hentai Latin 1"
+.IX Subsection "gsm0338 - Hentai Latin 1"
+GSM0338 is for GSM handsets. Though it shares alphanumerals with
+ASCII, control character ranges and other parts are mapped very
+differently, mainly to store Greek characters.  There are also escape
+sequences (starting with 0x1B) to cover e.g. the Euro sign.
+.PP
+This was once handled by Encode::Byte but because of all those
+unusual specifications, Encode 2.20 has relocated the support to
+Encode::GSM0338. See Encode::GSM0338 for details.
+.IP "gsm0338 support before 2.19" 2
+.IX Item "gsm0338 support before 2.19"
+Some special cases like a trailing 0x00 byte or a lone 0x1B byte are not
+well-defined and \fBdecode()\fR will return an empty string for them.
+One possible workaround is
+.Sp
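+.Vb 6
+\&   use Encode qw(decode);
+\&   # Pad a trailing 0x00 byte with a second 0x00.
+\&   $gsm =~ s/\ex00\ez/\ex00\ex00/;
+\&   $uni = decode("gsm0338", $gsm);
+\&   # Append U+00A0 when the input ends in a lone 0x1B.
+\&   $uni .= "\exA0" if $gsm =~ /\ex1B\ez/;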
+.Ve
+.Sp
+Note that the Encode implementation of GSM0338 does not implement the
+reuse of Latin capital letters as Greek capital letters (for example,
+0x5A is U+005A (LATIN CAPITAL LETTER Z), not U+0396 (GREEK CAPITAL
+LETTER ZETA)).
+.Sp
+The GSM0338 is also covered in Encode::Byte even though it is not
+an "extended ASCII" encoding.
+.SS "CJK: Chinese, Japanese, Korean (Multibyte)"
+.IX Subsection "CJK: Chinese, Japanese, Korean (Multibyte)"
+Note that Vietnamese is listed above.  Also read "Encoding vs. Charset"
+below.  Also note that these are implemented in separate modules per
+country because of size concerns (simplified Chinese is mapped
+to 'CN', continental China, while traditional Chinese is mapped to
+\&'TW', Taiwan).  Please refer to their respective documentation pages.
+.IP "Encode::CN \-\- Continental China" 2
+.IX Item "Encode::CN -- Continental China"
+.Vb 9
+\&  Standard      DOS/Win Macintosh                Comment/Reference
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  euc\-cn [1]            MacChineseSimp
+\&  (gbk)         cp936 [2]
+\&  gb12345\-raw                      { GB12345 without CES }
+\&  gb2312\-raw                       { GB2312  without CES }
+\&  hz
+\&  iso\-ir\-165
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&
+\&  [1] GB2312 is aliased to this.  See "Microsoft\-related naming mess".
+\&  [2] gbk is aliased to this.  See "Microsoft\-related naming mess".
+.Ve
+.IP "Encode::JP \-\- Japan" 2
+.IX Item "Encode::JP -- Japan"
+.Vb 11
+\&  Standard      DOS/Win Macintosh                Comment/Reference
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  euc\-jp
+\&  shiftjis      cp932   macJapanese
+\&  7bit\-jis
+\&  iso\-2022\-jp                                            [RFC1468]
+\&  iso\-2022\-jp\-1                                          [RFC2237]
+\&  jis0201\-raw  { JIS X 0201 (roman + halfwidth kana) without CES }
+\&  jis0208\-raw  { JIS X 0208 (Kanji + fullwidth kana) without CES }
+\&  jis0212\-raw  { JIS X 0212 (Extended Kanji)         without CES }
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
+.IP "Encode::KR \-\- Korea" 2
+.IX Item "Encode::KR -- Korea"
+.Vb 8
+\&  Standard      DOS/Win Macintosh                Comment/Reference
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  euc\-kr                MacKorean                        [RFC1557]
+\&                cp949 [1]                    
+\&  iso\-2022\-kr                                            [RFC1557]
+\&  johab                                  [KS X 1001:1998, Annex 3]
+\&  ksc5601\-raw                              { KSC5601 without CES }
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&
+\&  [1] ks_c_5601\-1987, (x\-)?windows\-949, and uhc are aliased to this.
+\&  See below.
+.Ve
+.IP "Encode::TW \-\- Taiwan" 2
+.IX Item "Encode::TW -- Taiwan"
+.Vb 5
+\&  Standard      DOS/Win Macintosh                Comment/Reference
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  big5\-eten     cp950   MacChineseTrad {big5 aliased to big5\-eten}
+\&  big5\-hkscs                              
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
+.IP "Encode::HanExtra \-\- More Chinese via CPAN" 2
+.IX Item "Encode::HanExtra -- More Chinese via CPAN"
+Due to size concerns, the additional Chinese encodings below are
+distributed separately on CPAN, under the name Encode::HanExtra.
+.Sp
+.Vb 8
+\&  Standard      DOS/Win Macintosh                Comment/Reference
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  big5ext                                   CMEX\*(Aqs Big5e Extension
+\&  big5plus                                  CMEX\*(Aqs Big5+ Extension
+\&  cccii         Chinese Character Code for Information Interchange
+\&  euc\-tw                             EUC (Extended Unix Character)
+\&  gb18030                          GBK with Traditional Characters
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
+.IP "Encode::JIS2K \-\- JIS X 0213 encodings via CPAN" 2
+.IX Item "Encode::JIS2K -- JIS X 0213 encodings via CPAN"
+Due to size concerns, additional Japanese encodings below are
+distributed separately on CPAN, under the name Encode::JIS2K.
+.Sp
+.Vb 8
+\&  Standard      DOS/Win Macintosh                Comment/Reference
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  euc\-jisx0213
+\&  shiftjisx0213
+\&  iso\-2022\-jp\-3
+\&  jis0213\-1\-raw
+\&  jis0213\-2\-raw
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
+.SS "Miscellaneous encodings"
+.IX Subsection "Miscellaneous encodings"
+.IP Encode::EBCDIC 2
+.IX Item "Encode::EBCDIC"
+See perlebcdic for details.
+.Sp
+.Vb 8
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  cp37
+\&  cp500  
+\&  cp875  
+\&  cp1026  
+\&  cp1047  
+\&  posix\-bc
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
+.IP Encode::Symbols 2
+.IX Item "Encode::Symbols"
+For symbols and dingbats.
+.Sp
+.Vb 7
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  symbol
+\&  dingbats
+\&  MacDingbats
+\&  AdobeZdingbat
+\&  AdobeSymbol
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
+.IP Encode::MIME::Header 2
+.IX Item "Encode::MIME::Header"
+Strictly speaking, the MIME header encoding documented in RFC 2047 is
+more an encapsulation than an encoding.  However, support for it is
+imperative in the modern world, so it is included.
+.Sp
+.Vb 5
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  MIME\-Header                                            [RFC2047]
+\&  MIME\-B                                                 [RFC2047]
+\&  MIME\-Q                                                 [RFC2047]
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+.Ve
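+.Sp
+A minimal sketch of a round trip through this encoding; the sample
+header value is an assumption made for illustration.
+.Sp
+.Vb 7
+\&  use Encode qw(decode encode);
+\&
+\&  # Decode an RFC 2047 encoded word into a character string.
+\&  my $text = decode("MIME\-Header", "=?ISO\-8859\-1?Q?caf=E9?=");
+\&  # Encode it again; the result contains only ASCII octets.
+\&  my $header = encode("MIME\-Header", $text);
+\&  print "$header\en";
+.Ve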
+.IP Encode::Guess 2
+.IX Item "Encode::Guess"
+This is not the name of an encoding but a utility that lets you pick
+the most appropriate encoding for some data out of given \fIsuspects\fR.  See
+Encode::Guess for details.
+.SH "Unsupported encodings"
+.IX Header "Unsupported encodings"
+The following encodings are not supported as yet; some because they
+are rarely used, some because of technical difficulties.  They may
+be supported by external modules via CPAN in the future, however.
+.IP "ISO\-2022\-JP\-2 [RFC1554]" 2
+.IX Item "ISO-2022-JP-2 [RFC1554]"
+Not very popular yet.  Implementing \fBencode()\fR needs the Unicode
+Database or an equivalent, because this encoding includes JIS X
+0208/0212, KSC5601, and GB2312 simultaneously, and their code points
+in Unicode overlap.  You therefore need to look up the database to
+determine which character set a given Unicode character belongs to.
+.IP "ISO\-2022\-CN [RFC1922]" 2
+.IX Item "ISO-2022-CN [RFC1922]"
+Not very popular.  Needs CNS 11643\-1 and \-2 which are not available in
+this module.  CNS 11643 is supported (via euc-tw) in Encode::HanExtra.
+Audrey Tang may add support for this encoding in her module in the future.
+.IP "Various HP-UX encodings" 2
+.IX Item "Various HP-UX encodings"
+The following are unsupported due to the lack of mapping data.
+.Sp
+.Vb 2
+\&  \*(Aq8\*(Aq  \- arabic8, greek8, hebrew8, kana8, thai8, and turkish8
+\&  \*(Aq15\*(Aq \- japanese15, korean15, and roi15
+.Ve
+.IP "Cyrillic encoding ISO\-IR\-111" 2
+.IX Item "Cyrillic encoding ISO-IR-111"
+Anton Tagunov doubts its usefulness.
+.IP "ISO\-8859\-8\-1 [Hebrew]" 2
+.IX Item "ISO-8859-8-1 [Hebrew]"
+No one on the Encode team knows Hebrew well enough (ISO\-8859\-8, cp1255
+and MacHebrew are supported only because mappings were
+available at <http://www.unicode.org/>).  Contributions welcome.
+.IP "ISIRI 3342, Iran System, ISIRI 2900 [Farsi]" 2
+.IX Item "ISIRI 3342, Iran System, ISIRI 2900 [Farsi]"
+Ditto.
+.IP "Thai encoding TCVN" 2
+.IX Item "Thai encoding TCVN"
+Ditto.
+.IP "Vietnamese encodings VPS" 2
+.IX Item "Vietnamese encodings VPS"
+Though Jungshik Shin has reported that Mozilla supports this encoding,
+the report came too late for us to add it before 5.8.0.  In the future,
+it may be available via a separate module.  See
+<http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvlatin/vps.uf>
+and
+<http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvlatin/vps.ut>
+if you are interested in helping us.
+.IP "Various Mac encodings" 2
+.IX Item "Various Mac encodings"
+The following are unsupported due to the lack of mapping data.
+.Sp
+.Vb 5
+\&  MacArmenian,  MacBengali,   MacBurmese,   MacEthiopic
+\&  MacExtArabic, MacGeorgian,  MacKannada,   MacKhmer
+\&  MacLaotian,   MacMalayalam, MacMongolian, MacOriya
+\&  MacSinhalese, MacTamil,     MacTelugu,    MacTibetan
+\&  MacVietnamese
+.Ve
+.Sp
+The Mac encodings that are already available are based upon the vendor
+mappings at <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>.
+.IP "(Mac) Indic encodings" 2
+.IX Item "(Mac) Indic encodings"
+The maps for the following are available at <http://www.unicode.org/>
+but remain unsupported because those encodings need an algorithmic
+approach, which \fIenc2xs\fR does not currently support:
+.Sp
+.Vb 3
+\&  MacDevanagari
+\&  MacGurmukhi
+\&  MacGujarati
+.Ve
+.Sp
+For details, please see \f(CW\*(C`Unicode mapping issues and notes:\*(C'\fR at
+<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/DEVANAGA.TXT>.
+.Sp
+I believe this issue is prevalent not only for Mac Indics but also in
+other Indic encodings, but the above were the only Indic encoding
+maps that I could find at <http://www.unicode.org/>.
+.SH "Encoding vs. Charset \-\- terminology"
+.IX Header "Encoding vs. Charset -- terminology"
+We are used to using the terms (character) \fIencoding\fR and \fIcharacter
+set\fR interchangeably.  But just as confusing the terms byte and
+character is dangerous and the terms should be differentiated when
+needed, we need to differentiate \fIencoding\fR and \fIcharacter set\fR.
+.PP
+To understand that, here is a description of how we make computers
+grok our characters.
+.IP \(bu 2
+First we start with which characters to include.  We call this
+collection of characters \fIcharacter repertoire\fR.
+.IP \(bu 2
+Then we have to give each character a unique ID so your computer can
+tell the difference between 'a' and 'A'.  This itemized character
+repertoire is now a \fIcharacter set\fR.
+.IP \(bu 2
+If your computer can grok the character set without further
+processing, you can go ahead and use it.  This is called a \fIcoded
+character set\fR (CCS) or \fIraw character encoding\fR.  ASCII is used this
+way for most cases.
+.IP \(bu 2
+But in many cases, especially multi-byte CJK encodings, you have to
+tweak a little more.  Your network connection may not accept any data
+with the Most Significant Bit set, and your computer may not be able to
+tell if a given byte is a whole character or just half of it.  So you
+have to \fIencode\fR the character set to use it.
+.Sp
+A \fIcharacter encoding scheme\fR (CES) determines how to encode a given
+character set, or a set of multiple character sets.  7bit ISO\-2022 is
+an example of a CES.  You switch between character sets via \fIescape
+sequences\fR.
+.PP
+Technically, or mathematically, speaking, a character set encoded in
+such a CES that maps character by character may form a CCS.  EUC is such
+an example.  The CES of EUC is as follows:
+.IP \(bu 2
+Map ASCII unchanged.
+.IP \(bu 2
+Map a character set that consists of 94 or 96 to the power of N
+members by adding 0x80 to each byte.
+.IP \(bu 2
+You can also use 0x8e and 0x8f to indicate that the following sequence of
+characters belongs to yet another character set.  0x80 is added to each
+of the following bytes.
+.PP
+By carefully looking at the encoded byte sequence, you can find that the
+byte sequence corresponds to a unique number.  In that sense, EUC is a CCS
+generated by the CES above from up to four CCSs (complicated?).  UTF\-8
+falls into this category.  See "UTF\-8" in perlunicode to find out how
+UTF\-8 maps Unicode to a byte sequence.
+.PP
+You may also have found out by now why 7bit ISO\-2022 cannot form
+a CCS.  If you look at a byte sequence \ex21\ex21, you can't tell if
+it is two !'s or IDEOGRAPHIC SPACE.  EUC maps the latter to \exA1\exA1
+so you have no trouble differentiating between "!!" and "\ \ ".
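+.PP
+The difference can be sketched with a few lines of Perl.  The raw
+two\-byte value is taken from the text above; the variable names are
+illustrative only.
+.PP
+.Vb 8
+\&  use Encode qw(encode);
+\&
+\&  my $raw = "\ex21\ex21";    # IDEOGRAPHIC SPACE in the raw 94x94 set
+\&  # The EUC rule from the list above: add 0x80 to each byte.
+\&  (my $by_hand = $raw) =~ s/(.)/chr(ord($1) + 0x80)/ges;
+\&  # Encode applies the same CES when asked for euc\-jp.
+\&  my $euc = encode("euc\-jp", "\ex{3000}");
+\&  print $by_hand eq $euc ? "same bytes\en" : "different bytes\en";
+.Ve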
+.SH "Encoding Classification (by Anton Tagunov and Dan Kogai)"
+.IX Header "Encoding Classification (by Anton Tagunov and Dan Kogai)"
+This section tries to classify the supported encodings by their 
+applicability for information exchange over the Internet and to 
+choose the most suitable aliases to name them in the context of 
+such communication.
+.IP \(bu 2
+To (en|de)code encodings marked by \f(CW\*(C`(**)\*(C'\fR, you need 
+\&\f(CW\*(C`Encode::HanExtra\*(C'\fR, available from CPAN.
+.PP
+Encoding names
+.PP
+.Vb 3
+\&  US\-ASCII    UTF\-8    ISO\-8859\-*  KOI8\-R
+\&  Shift_JIS   EUC\-JP   ISO\-2022\-JP ISO\-2022\-JP\-1
+\&  EUC\-KR      Big5     GB2312
+.Ve
+.PP
+are registered with IANA as preferred MIME names and may
+be used over the Internet.
+.PP
+\&\f(CW\*(C`Shift_JIS\*(C'\fR has been made official by JIS X 0208:1997.
+"Microsoft-related naming mess" gives details.
+.PP
+\&\f(CW\*(C`GB2312\*(C'\fR is the IANA name for \f(CW\*(C`EUC\-CN\*(C'\fR.
+See "Microsoft-related naming mess" for details.
+.PP
+\&\f(CW\*(C`GB_2312\-80\*(C'\fR \fIraw\fR encoding is available as \f(CW\*(C`gb2312\-raw\*(C'\fR
+with Encode. See Encode::CN for details.
+.PP
+.Vb 2
+\&  EUC\-CN
+\&  KOI8\-U        [RFC2319]
+.Ve
+.PP
+have not been registered with IANA (as of March 2002) but
+seem to be supported by major web browsers. 
+The IANA name for \f(CW\*(C`EUC\-CN\*(C'\fR is \f(CW\*(C`GB2312\*(C'\fR.
+.PP
+.Vb 1
+\&  KS_C_5601\-1987
+.Ve
+.PP
+is heavily misused.
+See "Microsoft-related naming mess" for details.
+.PP
+\&\f(CW\*(C`KS_C_5601\-1987\*(C'\fR \fIraw\fR encoding is available as \f(CW\*(C`ksc5601\-raw\*(C'\fR
+with Encode. See Encode::KR for details.
+.PP
+.Vb 1
+\&  UTF\-16 UTF\-16BE UTF\-16LE
+.Ve
+.PP
+are IANA-registered \f(CW\*(C`charset\*(C'\fRs. See [RFC 2781] for details.
+Jungshik Shin reports that UTF\-16 with a BOM is well accepted
+by MS IE 5/6 and NS 4/6. Beware however that
+.IP \(bu 2
+\&\f(CW\*(C`UTF\-16\*(C'\fR support in any software you're going to be
+using/interoperating with has probably been less tested
+than \f(CW\*(C`UTF\-8\*(C'\fR support
+.IP \(bu 2
+\&\f(CW\*(C`UTF\-8\*(C'\fR coded data seamlessly passes traditional
+command piping (\f(CW\*(C`cat\*(C'\fR, \f(CW\*(C`more\*(C'\fR, etc.) while \f(CW\*(C`UTF\-16\*(C'\fR coded
+data is likely to cause confusion (with its zero bytes,
+for example)
+.IP \(bu 2
+it is beyond the power of words to describe the way HTML browsers
+encode non\-\f(CW\*(C`ASCII\*(C'\fR form data. To get a general impression, visit
+<http://www.alanflavell.org.uk/charset/form\-i18n.html>.
+While encoding of form data has stabilized for \f(CW\*(C`UTF\-8\*(C'\fR encoded pages
+(at least IE 5/6, NS 6, and Opera 6 behave consistently), be sure to
+expect fun (and cross-browser discrepancies) with \f(CW\*(C`UTF\-16\*(C'\fR encoded
+pages!
+.PP
+The rule of thumb is to use \f(CW\*(C`UTF\-8\*(C'\fR unless you know what
+you're doing and unless you really benefit from using \f(CW\*(C`UTF\-16\*(C'\fR.
+.PP
+.Vb 5
+\&  ISO\-IR\-165    [RFC1345]
+\&  VISCII
+\&  GB 12345
+\&  GB 18030 (**)  (see links below)
+\&  EUC\-TW   (**)
+.Ve
+.PP
+are totally valid encodings but not registered at IANA.
+The names under which they are listed here are probably the
+most widely-known names for these encodings and are recommended
+names.
+.PP
+.Vb 1
+\&  BIG5PLUS (**)
+.Ve
+.PP
+is a proprietary name.
+.SS "Microsoft-related naming mess"
+.IX Subsection "Microsoft-related naming mess"
+Microsoft products misuse the following names:
+.IP KS_C_5601\-1987 2
+.IX Item "KS_C_5601-1987"
+Microsoft extension to \f(CW\*(C`EUC\-KR\*(C'\fR.
+.Sp
+Proper names: \f(CW\*(C`CP949\*(C'\fR, \f(CW\*(C`UHC\*(C'\fR, \f(CW\*(C`x\-windows\-949\*(C'\fR (as used by Mozilla).
+.Sp
+See <http://lists.w3.org/Archives/Public/ietf\-charsets/2001AprJun/0033.html>
+for details.
+.Sp
+Encode aliases \f(CW\*(C`KS_C_5601\-1987\*(C'\fR to \f(CW\*(C`cp949\*(C'\fR to reflect this common
+misuse. \fIRaw\fR \f(CW\*(C`KS_C_5601\-1987\*(C'\fR encoding is available as
+\&\f(CW\*(C`ksc5601\-raw\*(C'\fR.
+.Sp
+See Encode::KR for details.
+.IP GB2312 2
+.IX Item "GB2312"
+Microsoft extension to \f(CW\*(C`EUC\-CN\*(C'\fR.
+.Sp
+Proper names: \f(CW\*(C`CP936\*(C'\fR, \f(CW\*(C`GBK\*(C'\fR.
+.Sp
+\&\f(CW\*(C`GB2312\*(C'\fR has been registered in the \f(CW\*(C`EUC\-CN\*(C'\fR meaning at
+IANA. This has partially repaired the situation: Microsoft's 
+\&\f(CW\*(C`GB2312\*(C'\fR has become a superset of the official \f(CW\*(C`GB2312\*(C'\fR.
+.Sp
+Encode aliases \f(CW\*(C`GB2312\*(C'\fR to \f(CW\*(C`euc\-cn\*(C'\fR in full agreement with
+IANA registration. \f(CW\*(C`cp936\*(C'\fR is supported separately.
+\&\fIRaw\fR \f(CW\*(C`GB_2312\-80\*(C'\fR encoding is available as \f(CW\*(C`gb2312\-raw\*(C'\fR.
+.Sp
+See Encode::CN for details.
+.IP Big5 2
+.IX Item "Big5"
+Microsoft extension to \f(CW\*(C`Big5\*(C'\fR.
+.Sp
+Proper name: \f(CW\*(C`CP950\*(C'\fR.
+.Sp
+Encode separately supports \f(CW\*(C`Big5\*(C'\fR and \f(CW\*(C`cp950\*(C'\fR.
+.IP Shift_JIS 2
+.IX Item "Shift_JIS"
+Microsoft's understanding of \f(CW\*(C`Shift_JIS\*(C'\fR.
+.Sp
+JIS has not endorsed the full Microsoft standard, however.
+The official \f(CW\*(C`Shift_JIS\*(C'\fR includes only JIS X 0201 and JIS X 0208
+character sets, while Microsoft has always used \f(CW\*(C`Shift_JIS\*(C'\fR
+to encode a wider character repertoire. See \f(CW\*(C`IANA\*(C'\fR registration for
+\&\f(CW\*(C`Windows\-31J\*(C'\fR.
+.Sp
+As a historical predecessor, Microsoft's variant
+probably has a stronger claim to the name, though it may be objected
+that Microsoft shouldn't have used JIS as part of the name
+in the first place.
+.Sp
+Unambiguous name: \f(CW\*(C`CP932\*(C'\fR. \f(CW\*(C`IANA\*(C'\fR name (also used by Mozilla, and
+provided as an alias by Encode): \f(CW\*(C`Windows\-31J\*(C'\fR.
+.Sp
+Encode separately supports \f(CW\*(C`Shift_JIS\*(C'\fR and \f(CW\*(C`cp932\*(C'\fR.
+.SH Glossary
+.IX Header "Glossary"
+.IP "character repertoire" 2
+.IX Item "character repertoire"
+A collection of unique characters.  A \fIcharacter\fR set in the strictest
+sense. At this stage, characters are not numbered.
+.IP "coded character set (CCS)" 2
+.IX Item "coded character set (CCS)"
+A character set that is mapped in a way computers can use directly.
+Many character encodings, including EUC, fall in this category.
+.IP "character encoding scheme (CES)" 2
+.IX Item "character encoding scheme (CES)"
+An algorithm to map a character set to a byte sequence.  You don't
+have to be able to tell which character set a given byte sequence
+belongs to.  7\-bit ISO\-2022 is a CES but it cannot be a CCS.  EUC is an
+example of something that is both a CCS and a CES.
+.IP "charset (in MIME context)" 2
+.IX Item "charset (in MIME context)"
+has long been used in the meaning of \f(CW\*(C`encoding\*(C'\fR, CES.
+.Sp
+While the word combination \f(CW\*(C`character set\*(C'\fR has lost this meaning
+in MIME context since [RFC 2130], the \f(CW\*(C`charset\*(C'\fR abbreviation has
+retained it. This is how [RFC 2277] and [RFC 2278] bless \f(CW\*(C`charset\*(C'\fR:
+.Sp
+.Vb 7
+\& This document uses the term "charset" to mean a set of rules for
+\& mapping from a sequence of octets to a sequence of characters, such
+\& as the combination of a coded character set and a character encoding
+\& scheme; this is also what is used as an identifier in MIME "charset="
+\& parameters, and registered in the IANA charset registry ...  (Note
+\& that this is NOT a term used by other standards bodies, such as ISO).
+\& [RFC 2277]
+.Ve
+.IP EUC 2
+.IX Item "EUC"
+Extended Unix Code.  See ISO\-2022.
+.IP ISO\-2022 2
+.IX Item "ISO-2022"
+A CES that was carefully designed to coexist with ASCII.  There is a
+7\-bit version and an 8\-bit version.
+.Sp
+The 7\-bit version switches character sets via escape sequences, so it
+cannot form a CCS.  Since this is more difficult to handle in programs
+than the 8\-bit version, the 7\-bit version is not very popular except for
+iso\-2022\-jp, the \fIde facto\fR standard CES for e\-mail.
+.Sp
+The 8\-bit version can form a CCS.  EUC and ISO\-8859 are two examples
+thereof.  Pre\-5.6 perl could use them as string literals.
+.IP UCS 2
+.IX Item "UCS"
+Short for \fIUniversal Character Set\fR.  When you say just UCS, it means
+\&\fIUnicode\fR.
+.IP UCS\-2 2
+.IX Item "UCS-2"
+ISO/IEC 10646 encoding form: Universal Character Set coded in two
+octets.
+.IP Unicode 2
+.IX Item "Unicode"
+A character set that aims to include all character repertoires of the
+world.  Many character sets in various national as well as industrial
+standards have become, in a way, just subsets of Unicode.
+.IP UTF 2
+.IX Item "UTF"
+Short for \fIUnicode Transformation Format\fR.  Determines how to map a
+Unicode character into a byte sequence.
+.IP UTF\-16 2
+.IX Item "UTF-16"
+A UTF in 16\-bit encoding.  Can be either big endian or little
+endian.  The big endian version is called UTF\-16BE (equal to UCS\-2 +
+surrogate support) and the little endian version is called UTF\-16LE.
+.SH "See Also"
+.IX Header "See Also"
+Encode,
+Encode::Byte,
+Encode::CN, Encode::JP, Encode::KR, Encode::TW,
+Encode::EBCDIC, Encode::Symbol,
+Encode::MIME::Header, Encode::Guess
+.SH References
+.IX Header "References"
+.IP ECMA 2
+.IX Item "ECMA"
+European Computer Manufacturers Association
+<http://www.ecma.ch>
+.RS 2
+.ie n .IP "ECMA\-035 (eq ""ISO\-2022"")" 2
+.el .IP "ECMA\-035 (eq \f(CWISO\-2022\fR)" 2
+.IX Item "ECMA-035 (eq ISO-2022)"
+<http://www.ecma.ch/ecma1/STAND/ECMA\-035.HTM>
+.Sp
+The specification of ISO\-2022 is available from the link above.
+.RE
+.RS 2
+.RE
+.IP IANA 2
+.IX Item "IANA"
+Internet Assigned Numbers Authority
+<http://www.iana.org/>
+.RS 2
+.IP "Assigned Charset Names by IANA" 2
+.IX Item "Assigned Charset Names by IANA"
+<http://www.iana.org/assignments/character\-sets>
+.Sp
+Most of the \f(CW\*(C`canonical names\*(C'\fR in Encode derive from this list,
+so you can directly apply the string you have extracted from the MIME
+headers of mail messages and web pages.
+.RE
+.RS 2
+.RE
+.IP ISO 2
+.IX Item "ISO"
+International Organization for Standardization
+<http://www.iso.ch/>
+.IP RFC 2
+.IX Item "RFC"
+Request For Comments \-\- need I say more?
+<http://www.rfc\-editor.org/>, <http://www.ietf.org/rfc.html>,
+<http://www.faqs.org/rfcs/>
+.IP UC 2
+.IX Item "UC"
+Unicode Consortium
+<http://www.unicode.org/>
+.RS 2
+.IP "Unicode Glossary" 2
+.IX Item "Unicode Glossary"
+<http://www.unicode.org/glossary/>
+.Sp
+The glossary of this document is based upon this site.
+.RE
+.RS 2
+.RE
+.SS "Other Notable Sites"
+.IX Subsection "Other Notable Sites"
+.IP czyborra.com 2
+.IX Item "czyborra.com"
+<http://czyborra.com/>
+.Sp
+Contains a lot of useful information, especially gory details of ISO
+vs. vendor mappings.
+.IP CJK.inf 2
+.IX Item "CJK.inf"
+<http://examples.oreilly.com/cjkvinfo/doc/cjk.inf>
+.Sp
+Somewhat obsolete (last update in 1996), but still useful.  Also try
+.Sp
+<ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf>
+.Sp
+You will find brief info on \f(CW\*(C`EUC\-CN\*(C'\fR and \f(CW\*(C`GBK\*(C'\fR, and mostly on \f(CW\*(C`GB 18030\*(C'\fR.
+.IP "Jungshik Shin's Hangul FAQ" 2
+.IX Item "Jungshik Shin's Hangul FAQ"
+<http://jshin.net/faq>
+.Sp
+And especially its subject 8.
+.Sp
+<http://jshin.net/faq/qa8.html>
+.Sp
+A comprehensive overview of the Korean (\f(CW\*(C`KS *\*(C'\fR) standards.
+.IP "debian.org: ""Introduction to i18n""" 2
+.IX Item "debian.org: ""Introduction to i18n"""
+A brief description of most of the CJK encodings mentioned here is
+contained in
+<http://www.debian.org/doc/manuals/intro\-i18n/ch\-codes.en.html>
+.SS "Offline sources"
+.IX Subsection "Offline sources"
+.ie n .IP """CJKV Information Processing"" by Ken Lunde" 2
+.el .IP "\f(CWCJKV Information Processing\fR by Ken Lunde" 2
+.IX Item "CJKV Information Processing by Ken Lunde"
+CJKV Information Processing
+1999 O'Reilly & Associates, ISBN: 1\-56592\-224\-7
+.Sp
+The modern successor of \f(CW\*(C`CJK.inf\*(C'\fR.
+.Sp
+Features comprehensive coverage of CJKV character sets and
+encodings along with many other issues faced by anyone trying
+to better support CJKV languages/scripts in all the areas of
+information processing.
+.Sp
+To purchase this book, visit
+<http://oreilly.com/catalog/9780596514471/>
+or your favourite bookstore.