path: root/upstream/mageia-cauldron/man3pm/encoding.3pm
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit: fc22b3d6507c6745911b9dfcc68f1e665ae13dbc
tree: ce1e3bce06471410239a6f41282e328770aa404a /upstream/mageia-cauldron/man3pm/encoding.3pm
parent: Initial commit.
Adding upstream version 4.22.0. (upstream/4.22.0)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/mageia-cauldron/man3pm/encoding.3pm')
-rw-r--r-- upstream/mageia-cauldron/man3pm/encoding.3pm 544
1 files changed, 544 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man3pm/encoding.3pm b/upstream/mageia-cauldron/man3pm/encoding.3pm
new file mode 100644
index 00000000..3f7216a9
--- /dev/null
+++ b/upstream/mageia-cauldron/man3pm/encoding.3pm
@@ -0,0 +1,544 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "encoding 3pm"
+.TH encoding 3pm 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+encoding \- allows you to write your script in non\-ASCII and non\-UTF\-8
+.SH WARNING
+.IX Header "WARNING"
+This module has been deprecated since perl v5.18. See "DESCRIPTION" and
+"BUGS".
+.SH SYNOPSIS
+.IX Header "SYNOPSIS"
+.Vb 2
+\& use encoding "greek"; # Perl like Greek to you?
+\& use encoding "euc\-jp"; # Jperl!
+\&
+\& # or you can even do this if your shell supports your native encoding
+\&
+\& perl \-Mencoding=latin2 \-e\*(Aq...\*(Aq # Feeling centrally European?
+\& perl \-Mencoding=euc\-kr \-e\*(Aq...\*(Aq # Or Korean?
+\&
+\& # more control
+\&
+\& # A simple euc\-cn => utf\-8 converter
+\& use encoding "euc\-cn", STDOUT => "utf8"; while(<>){print};
+\&
+\& # "no encoding;" supported
+\& no encoding;
+\&
+\& # an alternate way, Filter
+\& use encoding "euc\-jp", Filter=>1;
+\& # now you can use kanji identifiers \-\- in euc\-jp!
+\&
+\& # encode based on the current locale \- specialized purposes only;
+\& # fraught with danger!!
+\& use encoding \*(Aq:locale\*(Aq;
+.Ve
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+This pragma enables a Perl script to be written in an encoding that is
+neither strictly ASCII nor UTF\-8. It translates all or portions of the Perl
+program script from a given encoding into UTF\-8, and changes the PerlIO layers
+of \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR to the encoding specified.
+.PP
+This pragma dates from the days when UTF\-8\-enabled editors were uncommon. But
+that was long ago, and the need for it is greatly diminished. That, coupled
+with the fact that it doesn't work with threads, along with other problems
+(see "BUGS"), has led to its deprecation. It is planned to remove this
+pragma in a future Perl version. New code should be written in UTF\-8, and the
+\&\f(CW\*(C`use utf8\*(C'\fR pragma used instead (see perluniintro and utf8 for details).
+Old code should be converted to UTF\-8, via something like the recipe in the
+"SYNOPSIS" (though this simple approach may require manual adjustments
+afterwards).
+.PP
+If UTF\-8 is not an option, it is recommended that one use a simple source
+filter, such as that provided by Filter::Encoding on CPAN or this
+pragma's own \f(CW\*(C`Filter\*(C'\fR option (see below).
+.PP
+The only legitimate use of this pragma is almost certainly just one per file,
+near the top, with file scope, as the file is likely going to only be written
+in one encoding. Further restrictions apply in Perls before v5.22 (see
+"Prior to Perl v5.22").
+.PP
+There are two basic modes of operation (plus turning it off):
+.ie n .IP """use encoding [\*(Aq\fIENCNAME\fR\*(Aq] ;""" 4
+.el .IP "\f(CWuse encoding [\*(Aq\fR\f(CIENCNAME\fR\f(CW\*(Aq] ;\fR" 4
+.IX Item "use encoding [ENCNAME] ;"
+Please note: This mode of operation is no longer supported as of Perl
+v5.26.
+.Sp
+This is the normal operation. It translates various literals encountered in
+the Perl source file from the encoding \fIENCNAME\fR into UTF\-8, and similarly
+converts character code points. This is used when the script is a combination
+of ASCII (for the variable names and punctuation, \fIetc\fR), but the literal
+data is in the specified encoding.
+.Sp
+\&\fIENCNAME\fR is optional. If omitted, the encoding specified in the environment
+variable \f(CW\*(C`PERL_ENCODING\*(C'\fR is used. If this isn't
+set, or the resolved-to encoding is not known to \f(CW\*(C`Encode\*(C'\fR, the error
+\&\f(CW\*(C`Unknown encoding \*(Aq\fR\f(CIENCNAME\fR\f(CW\*(Aq\*(C'\fR will be thrown.
+.Sp
+Starting in Perl v5.8.6 (\f(CW\*(C`Encode\*(C'\fR version 2.0.1), \fIENCNAME\fR may be the
+name \f(CW\*(C`:locale\*(C'\fR. This is for very specialized applications, and is documented
+in "The \f(CW\*(C`:locale\*(C'\fR sub-pragma" below.
+.Sp
+The literals that are converted are \f(CW\*(C`q//, qq//, qr//, qw//, qx//\*(C'\fR, and
+starting in v5.8.1, \f(CW\*(C`tr///\*(C'\fR. Operations that do conversions include \f(CW\*(C`chr\*(C'\fR,
+\&\f(CW\*(C`ord\*(C'\fR, \f(CW\*(C`utf8::upgrade\*(C'\fR (but not \f(CW\*(C`utf8::downgrade\*(C'\fR), and \f(CW\*(C`chomp\*(C'\fR.
+.Sp
+Also starting in v5.8.1, the \f(CW\*(C`DATA\*(C'\fR pseudo-filehandle is translated from the
+encoding into UTF\-8.
+.Sp
+For example, you can write code in EUC-JP as follows:
+.Sp
+.Vb 3
+\& my $Rakuda = "\exF1\exD1\exF1\exCC"; # Camel in Kanji
+\& #<\-char\-><\-char\-> # 4 octets
+\& s/\ebCamel\eb/$Rakuda/;
+.Ve
+.Sp
+And with \f(CW\*(C`use encoding "euc\-jp"\*(C'\fR in effect, it is the same thing as
+that code in UTF\-8:
+.Sp
+.Vb 2
+\& my $Rakuda = "\ex{99F1}\ex{99DD}"; # two Unicode Characters
+\& s/\ebCamel\eb/$Rakuda/;
+.Ve
+.Sp
+See "EXAMPLE" below for a more complete example.
+.Sp
+Unless \f(CW\*(C`${^UNICODE}\*(C'\fR (available starting in v5.8.2) exists and is non-zero, the
+PerlIO layers of \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR are set to "\f(CW:encoding(\fR\f(CIENCNAME\fR\f(CW)\fR".
+Therefore,
+.Sp
+.Vb 5
+\& use encoding "euc\-jp";
+\& my $message = "Camel is the symbol of perl.\en";
+\& my $Rakuda = "\exF1\exD1\exF1\exCC"; # Camel in Kanji
+\& $message =~ s/\ebCamel\eb/$Rakuda/;
+\& print $message;
+.Ve
+.Sp
+will print
+.Sp
+.Vb 1
+\& "\exF1\exD1\exF1\exCC is the symbol of perl.\en"
+.Ve
+.Sp
+not
+.Sp
+.Vb 1
+\& "\ex{99F1}\ex{99DD} is the symbol of perl.\en"
+.Ve
+.Sp
+You can override this by giving extra arguments; see below.
+.Sp
+Note that \f(CW\*(C`STDERR\*(C'\fR WILL NOT be changed, regardless.
+.Sp
+Also note that non-STD file handles remain unaffected. Use \f(CW\*(C`use
+open\*(C'\fR or \f(CW\*(C`binmode\*(C'\fR to change the layers of those.
+.ie n .IP """use encoding \fIENCNAME\fR, Filter=>1;""" 4
+.el .IP "\f(CWuse encoding \fR\f(CIENCNAME\fR\f(CW, Filter=>1;\fR" 4
+.IX Item "use encoding ENCNAME, Filter=>1;"
+This operates as above, but the \f(CW\*(C`Filter\*(C'\fR argument with a non-zero
+value causes the entire script, and not just literals, to be translated from
+the encoding into UTF\-8. This allows identifiers in the source to be in that
+encoding as well. (Problems may occur if the encoding is not a superset of
+ASCII; imagine all your semi-colons being translated into something
+different.) One can use this form to make
+.Sp
+.Vb 1
+\& ${"\ex{4eba}"}++
+.Ve
+.Sp
+work. (This is equivalent to \f(CW\*(C`$\fR\f(CIhuman\fR\f(CW++\*(C'\fR, where \fIhuman\fR is a single Han
+ideograph).
+.Sp
+This effectively means that your source code behaves as if it were written in
+UTF\-8 with \f(CW\*(C`use utf8\*(C'\fR in effect. So even if your editor only supports
+Shift_JIS, for example, you can still try examples in Chapter 15 of
+\&\f(CW\*(C`Programming Perl, 3rd Ed.\*(C'\fR.
+.Sp
+This option is significantly slower than the other one.
+.ie n .IP """no encoding;""" 4
+.el .IP "\f(CWno encoding;\fR" 4
+.IX Item "no encoding;"
+Unsets the script encoding. The layers of \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR are
+reset to "\f(CW\*(C`:raw\*(C'\fR" (the default unprocessed raw stream of bytes).
+.SH OPTIONS
+.IX Header "OPTIONS"
+.ie n .SS "Setting ""STDIN"" and/or ""STDOUT"" individually"
+.el .SS "Setting \f(CWSTDIN\fP and/or \f(CWSTDOUT\fP individually"
+.IX Subsection "Setting STDIN and/or STDOUT individually"
+The encodings of \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR are individually settable by parameters to
+the pragma:
+.PP
+.Vb 1
+\& use encoding \*(Aqeuc\-tw\*(Aq, STDIN => \*(Aqgreek\*(Aq ...;
+.Ve
+.PP
+In this case, you cannot omit the first \fIENCNAME\fR. \f(CW\*(C`STDIN => undef\*(C'\fR
+turns the I/O transcoding completely off for that filehandle.
+.PP
+When \f(CW\*(C`${^UNICODE}\*(C'\fR (available starting in v5.8.2) exists and is non-zero,
+these options will be completely ignored. See "\f(CW\*(C`${^UNICODE}\*(C'\fR" in perlvar and
+"\f(CW\*(C`\-C\*(C'\fR" in perlrun for details.
+.ie n .SS "The "":locale"" sub-pragma"
+.el .SS "The \f(CW:locale\fP sub-pragma"
+.IX Subsection "The :locale sub-pragma"
+Starting in v5.8.6, the encoding name may be \f(CW\*(C`:locale\*(C'\fR. This means that the
+encoding is taken from the current locale, and not hard-coded by the pragma.
+Since a script really can only be encoded in exactly one encoding, this option
+is dangerous. It makes sense only if the script itself is written in ASCII,
+and all the possible locales that will be in use when the script is executed
+are supersets of ASCII. That means that the script itself doesn't get
+changed, but the I/O handles have the specified encoding added, and the
+operations like \f(CW\*(C`chr\*(C'\fR and \f(CW\*(C`ord\*(C'\fR use that encoding.
+.PP
+The logic of finding which locale \f(CW\*(C`:locale\*(C'\fR uses is as follows:
+.IP 1. 4
+If the platform supports the \f(CWlanginfo(CODESET)\fR interface, the codeset
+returned is used as the default encoding for the open pragma.
+.IP 2. 4
+If 1. didn't work but we are under the locale pragma, the environment
+variables \f(CW\*(C`LC_ALL\*(C'\fR and \f(CW\*(C`LANG\*(C'\fR (in that order) are matched for encodings
+(the part after "\f(CW\*(C`.\*(C'\fR", if any), and if any found, that is used
+as the default encoding for the open pragma.
+.IP 3. 4
+If 1. and 2. didn't work, the environment variables \f(CW\*(C`LC_ALL\*(C'\fR and \f(CW\*(C`LANG\*(C'\fR
+(in that order) are matched for anything looking like UTF\-8, and if
+any found, \f(CW\*(C`:utf8\*(C'\fR is used as the default encoding for the open
+pragma.
+.PP
+If your locale environment variables (\f(CW\*(C`LC_ALL\*(C'\fR, \f(CW\*(C`LC_CTYPE\*(C'\fR, \f(CW\*(C`LANG\*(C'\fR)
+contain the strings 'UTF\-8' or 'UTF8' (case-insensitive matching),
+the default encoding of your \f(CW\*(C`STDIN\*(C'\fR, \f(CW\*(C`STDOUT\*(C'\fR, and \f(CW\*(C`STDERR\*(C'\fR, and of
+\&\fBany subsequent file open\fR, is UTF\-8.
+.SH CAVEATS
+.IX Header "CAVEATS"
+.SS "SIDE EFFECTS"
+.IX Subsection "SIDE EFFECTS"
+.IP \(bu 4
+If the \f(CW\*(C`encoding\*(C'\fR pragma is in scope then the lengths returned by \f(CW\*(C`chomp\*(C'\fR are
+calculated from the length of \f(CW$/\fR in Unicode characters, which is not
+always the same as the length of \f(CW$/\fR in the native encoding.
+.IP \(bu 4
+Without this pragma, if strings operating under byte semantics and strings
+with Unicode character data are concatenated, the new string will
+be created by decoding the byte strings as \fIISO 8859\-1 (Latin\-1)\fR.
+.Sp
+The \fBencoding\fR pragma changes this to use the specified encoding
+instead. For example:
+.Sp
+.Vb 5
+\& use encoding \*(Aqutf8\*(Aq;
+\& my $string = chr(20000); # a Unicode string
+\& utf8::encode($string); # now it\*(Aqs a UTF\-8 encoded byte string
+\& # concatenate with another Unicode string
+\& print length($string . chr(20000));
+.Ve
+.Sp
+This prints \f(CW2\fR, because \f(CW$string\fR is upgraded as UTF\-8. Without
+\&\f(CW\*(C`use encoding \*(Aqutf8\*(Aq;\*(C'\fR, it will print \f(CW4\fR instead, since \f(CW$string\fR
+is three octets when interpreted as Latin\-1.
+.SS "DO NOT MIX MULTIPLE ENCODINGS"
+.IX Subsection "DO NOT MIX MULTIPLE ENCODINGS"
+Notice that only literals (string or regular expression) having only
+legacy code points are affected: if you mix data like this
+.PP
+.Vb 2
+\& \ex{100}\exDF
+\& \exDF\ex{100}
+.Ve
+.PP
+the data is assumed to be in (Latin 1 and) Unicode, not in your native
+encoding. In other words, this will match in "greek":
+.PP
+.Vb 1
+\& "\exDF" =~ /\ex{3af}/
+.Ve
+.PP
+but this will not
+.PP
+.Vb 1
+\& "\exDF\ex{100}" =~ /\ex{3af}\ex{100}/
+.Ve
+.PP
+since the \f(CW\*(C`\exDF\*(C'\fR (ISO 8859\-7 GREEK SMALL LETTER IOTA WITH TONOS) on
+the left will \fBnot\fR be upgraded to \f(CW\*(C`\ex{3af}\*(C'\fR (Unicode GREEK SMALL
+LETTER IOTA WITH TONOS) because of the \f(CW\*(C`\ex{100}\*(C'\fR that follows it. You
+should not be mixing your legacy data and Unicode in the same string.
+.PP
+This pragma also affects encoding of the 0x80..0xFF code point range:
+normally characters in that range are left as eight-bit bytes (unless
+they are combined with characters with code points 0x100 or larger,
+in which case all characters need to become UTF\-8 encoded), but if
+the \f(CW\*(C`encoding\*(C'\fR pragma is present, even the 0x80..0xFF range always
+gets UTF\-8 encoded.
+.PP
+After all, the best thing about this pragma is that you don't have to
+resort to \ex{....} just to spell your name in a native encoding.
+So feel free to put your strings in your encoding in quotes and
+regexes.
+.SS "Prior to Perl v5.22"
+.IX Subsection "Prior to Perl v5.22"
+The pragma was per-script, not per-block lexical. Only the last
+\&\f(CW\*(C`use encoding\*(C'\fR or \f(CW\*(C`no encoding\*(C'\fR mattered, and it affected
+\&\fBthe whole script\fR. However, the \f(CW\*(C`no encoding\*(C'\fR pragma was supported and
+\&\f(CW\*(C`use encoding\*(C'\fR could appear as many times as you wanted in a given script
+(though only the last was effective).
+.PP
+Since the scope wasn't lexical, other modules' use of \f(CW\*(C`chr\*(C'\fR, \f(CW\*(C`ord\*(C'\fR, \fIetc.\fR
+were affected. This led to spooky, incorrect action at a distance that was
+hard to debug.
+.PP
+This means you would have to be very careful of the load order:
+.PP
+.Vb 5
+\& # called module
+\& package Module_IN_BAR;
+\& use encoding "bar";
+\& # stuff in "bar" encoding here
+\& 1;
+\&
+\& # caller script
+\& use encoding "foo";
+\& use Module_IN_BAR;
+\& # surprise! use encoding "bar" is in effect.
+.Ve
+.PP
+The best way to avoid this oddity is to use this pragma RIGHT AFTER
+other modules are loaded, i.e.
+.PP
+.Vb 2
+\& use Module_IN_BAR;
+\& use encoding "foo";
+.Ve
+.SS "Prior to Encode version 1.87"
+.IX Subsection "Prior to Encode version 1.87"
+.IP \(bu 4
+\&\f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR were not set under the filter option.
+Also, \f(CW\*(C`STDIN=>\fR\f(CIENCODING\fR\f(CW\*(C'\fR and \f(CW\*(C`STDOUT=>\fR\f(CIENCODING\fR\f(CW\*(C'\fR did not work as they do in
+the non-filter version.
+.IP \(bu 4
+\&\f(CW\*(C`use utf8\*(C'\fR wasn't implicitly declared, so you had to \f(CW\*(C`use utf8\*(C'\fR to do
+.Sp
+.Vb 1
+\& ${"\ex{4eba}"}++
+.Ve
+.SS "Prior to Perl v5.8.1"
+.IX Subsection "Prior to Perl v5.8.1"
+.IP """NON-EUC"" doublebyte encodings" 4
+.IX Item """NON-EUC"" doublebyte encodings"
+Because perl needs to parse the script before applying this pragma, such
+encodings as Shift_JIS and Big\-5 that may contain \f(CW\*(Aq\e\*(Aq\fR (BACKSLASH;
+\&\f(CW\*(C`\ex5c\*(C'\fR) in the second byte fail because the second byte may
+accidentally escape the quoting character that follows.
+.ie n .IP """tr///""" 4
+.el .IP \f(CWtr///\fR 4
+.IX Item "tr///"
+The \fBencoding\fR pragma works by decoding string literals in
+\&\f(CW\*(C`q//, qq//, qr//, qw//, qx//\*(C'\fR and so forth. In perl v5.8.0, this
+does not apply to \f(CW\*(C`tr///\*(C'\fR. Therefore,
+.Sp
+.Vb 4
+\& use encoding \*(Aqeuc\-jp\*(Aq;
+\& #....
+\& $kana =~ tr/\exA4\exA1\-\exA4\exF3/\exA5\exA1\-\exA5\exF3/;
+\& # \-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-
+.Ve
+.Sp
+does not work as
+.Sp
+.Vb 1
+\& $kana =~ tr/\ex{3041}\-\ex{3093}/\ex{30a1}\-\ex{30f3}/;
+.Ve
+.RS 4
+.IP "Legend of characters above" 4
+.IX Item "Legend of characters above"
+.Vb 6
+\& utf8 euc\-jp charnames::viacode()
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& \ex{3041} \exA4\exA1 HIRAGANA LETTER SMALL A
+\& \ex{3093} \exA4\exF3 HIRAGANA LETTER N
+\& \ex{30a1} \exA5\exA1 KATAKANA LETTER SMALL A
+\& \ex{30f3} \exA5\exF3 KATAKANA LETTER N
+.Ve
+.RE
+.RS 4
+.Sp
+This counterintuitive behavior has been fixed in perl v5.8.1.
+.Sp
+In perl v5.8.0, you can work around this as follows:
+.Sp
+.Vb 3
+\& use encoding \*(Aqeuc\-jp\*(Aq;
+\& # ....
+\& eval qq{ \e$kana =~ tr/\exA4\exA1\-\exA4\exF3/\exA5\exA1\-\exA5\exF3/ };
+.Ve
+.Sp
+Note that the \f(CW\*(C`tr///\*(C'\fR expression is surrounded by \f(CW\*(C`qq{}\*(C'\fR. The idea behind
+this is the same as the classic idiom that makes \f(CW\*(C`tr///\*(C'\fR 'interpolate':
+.Sp
+.Vb 2
+\& tr/$from/$to/; # wrong!
+\& eval qq{ tr/$from/$to/ }; # workaround.
+.Ve
+.RE
+.SH "EXAMPLE \- Greekperl"
+.IX Header "EXAMPLE - Greekperl"
+.Vb 1
+\& use encoding "iso 8859\-7";
+\&
+\& # \exDF in ISO 8859\-7 (Greek) is \ex{3af} in Unicode.
+\&
+\& $a = "\exDF";
+\& $b = "\ex{100}";
+\&
+\& printf "%#x\en", ord($a); # will print 0x3af, not 0xdf
+\&
+\& $c = $a . $b;
+\&
+\& # $c will be "\ex{3af}\ex{100}", not "\ex{df}\ex{100}".
+\&
+\& # chr() is affected, and ...
+\&
+\& print "mega\en" if ord(chr(0xdf)) == 0x3af;
+\&
+\& # ... ord() is affected by the encoding pragma ...
+\&
+\& print "tera\en" if ord(pack("C", 0xdf)) == 0x3af;
+\&
+\& # ... as are eq and cmp ...
+\&
+\& print "peta\en" if "\ex{3af}" eq pack("C", 0xdf);
+\& print "exa\en" if ("\ex{3af}" cmp pack("C", 0xdf)) == 0;
+\&
+\& # ... but pack/unpack C are not affected, in case you still
+\& # want to go back to your native encoding
+\&
+\& print "zetta\en" if unpack("C", (pack("C", 0xdf))) == 0xdf;
+.Ve
+.SH BUGS
+.IX Header "BUGS"
+.IP "Thread safety" 4
+.IX Item "Thread safety"
+\&\f(CW\*(C`use encoding ...\*(C'\fR is not thread-safe (i.e., do not use in threaded
+applications).
+.IP "Can't be used by more than one module in a single program." 4
+.IX Item "Can't be used by more than one module in a single program."
+Only one encoding is allowed. If you combine modules in a program that have
+different encodings, only one will be actually used.
+.ie n .IP "Other modules using ""STDIN"" and ""STDOUT"" get the encoded stream" 4
+.el .IP "Other modules using \f(CWSTDIN\fR and \f(CWSTDOUT\fR get the encoded stream" 4
+.IX Item "Other modules using STDIN and STDOUT get the encoded stream"
+They may be expecting something completely different.
+.IP "literals in regex that are longer than 127 bytes" 4
+.IX Item "literals in regex that are longer than 127 bytes"
+For native multibyte encodings (either fixed or variable length),
+the current implementation of the regular expressions may introduce
+recoding errors for regular expression literals longer than 127 bytes.
+.IP EBCDIC 4
+.IX Item "EBCDIC"
+The encoding pragma is not supported on EBCDIC platforms.
+.ie n .IP """format""" 4
+.el .IP \f(CWformat\fR 4
+.IX Item "format"
+This pragma doesn't work well with \f(CW\*(C`format\*(C'\fR because PerlIO does not
+get along very well with it. When \f(CW\*(C`format\*(C'\fR contains non-ASCII
+characters, the output is garbled or triggers "Wide character" warnings.
+To see the problem, try the code below.
+.Sp
+.Vb 11
+\& # Save this one in utf8
+\& # replace *non\-ascii* with a non\-ascii string
+\& my $camel;
+\& format STDOUT =
+\& *non\-ascii*@>>>>>>>
+\& $camel
+\& .
+\& $camel = "*non\-ascii*";
+\& binmode(STDOUT=>\*(Aq:encoding(utf8)\*(Aq); # bang!
+\& write; # funny
+\& print $camel, "\en"; # fine
+.Ve
+.Sp
+Without the \f(CW\*(C`binmode\*(C'\fR call, \fBwrite()\fR happens to work, but then \fBprint()\fR
+fails instead.
+.Sp
+At any rate, the very use of \f(CW\*(C`format\*(C'\fR is questionable when it comes to
+Unicode characters since you have to consider such things as character
+width (e.g. double-width for ideographs) and direction (e.g. BIDI for
+Arabic and Hebrew).
+.IP "See also ""CAVEATS""" 4
+.IX Item "See also ""CAVEATS"""
+.SH HISTORY
+.IX Header "HISTORY"
+This pragma first appeared in Perl v5.8.0. It has been enhanced in later
+releases as specified above.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+perlunicode, Encode, open, Filter::Util::Call
+.PP
+Ch. 15 of \f(CW\*(C`Programming Perl (3rd Edition)\*(C'\fR
+by Larry Wall, Tom Christiansen, Jon Orwant;
+O'Reilly & Associates; ISBN 0\-596\-00027\-8