Adding upstream version 4.22.0. (upstream/4.22.0)

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit: fc22b3d6507c6745911b9dfcc68f1e665ae13dbc
tree: ce1e3bce06471410239a6f41282e328770aa404a /upstream/mageia-cauldron/man3pm/encoding.3pm
parent: Initial commit.
download: manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.tar.xz
manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.zip
1 file changed, 544 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man3pm/encoding.3pm b/upstream/mageia-cauldron/man3pm/encoding.3pm
new file mode 100644
index 00000000..3f7216a9
--- /dev/null
+++ b/upstream/mageia-cauldron/man3pm/encoding.3pm
@@ -0,0 +1,544 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+.    ds C` ""
+.    ds C' ""
+'br\}
+.el\{\
+.    ds C`
+.    ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD.  Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+.    if \nF \{\
+.        de IX
+.        tm Index:\\$1\t\\n%\t"\\$2"
+..
+.        if !\nF==2 \{\
+.            nr % 0
+.            nr F 2
+.        \}
+.    \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "encoding 3pm"
+.TH encoding 3pm 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+encoding \- allows you to write your script in non\-ASCII and non\-UTF\-8
+.SH WARNING
+.IX Header "WARNING"
+This module has been deprecated since perl v5.18.  See "DESCRIPTION" and
+"BUGS".
+.SH SYNOPSIS
+.IX Header "SYNOPSIS"
+.Vb 2
+\&  use encoding "greek";  # Perl like Greek to you?
+\&  use encoding "euc\-jp"; # Jperl!
+\&
+\&  # or you can even do this if your shell supports your native encoding
+\&
+\&  perl \-Mencoding=latin2 \-e\*(Aq...\*(Aq # Feeling centrally European?
+\&  perl \-Mencoding=euc\-kr \-e\*(Aq...\*(Aq # Or Korean?
+\&
+\&  # more control
+\&
+\&  # A simple euc\-cn => utf\-8 converter
+\&  use encoding "euc\-cn", STDOUT => "utf8";  while(<>){print};
+\&
+\&  # "no encoding;" supported
+\&  no encoding;
+\&
+\&  # an alternate way, Filter
+\&  use encoding "euc\-jp", Filter=>1;
+\&  # now you can use kanji identifiers \-\- in euc\-jp!
+\&
+\&  # encode based on the current locale \- specialized purposes only;
+\&  # fraught with danger!!
+\&  use encoding \*(Aq:locale\*(Aq;
+.Ve
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+This pragma is used to enable a Perl script to be written in encodings that
+are neither strictly ASCII nor UTF\-8.  It translates all or portions of the Perl
+program script from a given encoding into UTF\-8, and changes the PerlIO layers
+of \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR to the encoding specified.
+.PP
+This pragma dates from the days when UTF\-8\-enabled editors were uncommon.  But
+that was long ago, and the need for it is greatly diminished.  That, together
+with its lack of thread safety and other problems (see "BUGS"), has led to
+its deprecation.  It is planned to remove this pragma in a future Perl
+version.  New code should be written in UTF\-8, and the
+\&\f(CW\*(C`use utf8\*(C'\fR pragma used instead (see perluniintro and utf8 for details).
+Old code should be converted to UTF\-8, via something like the recipe in the
+"SYNOPSIS" (though this simple approach may require manual adjustments
+afterwards).
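The one-time conversion to UTF-8 recommended above can also be done without the pragma, using the core Encode module. This is a minimal sketch, assuming the legacy source is already held in a string as EUC-JP octets; the helper name `to_utf8_octets` is hypothetical, and a real conversion would read and write whole files in `:raw` mode and may still need manual review.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of the encoding pragma): re-encode
# legacy-encoded source text as UTF-8 octets for a "use utf8" file.
sub to_utf8_octets {
    my ($octets, $legacy_encoding) = @_;
    # FB_CROAK makes decode() die on bytes that are invalid in the
    # legacy encoding instead of silently substituting them.
    my $chars = decode($legacy_encoding, $octets, Encode::FB_CROAK());
    return encode('UTF-8', $chars);
}

# HIRAGANA LETTER SMALL A (U+3041) is "\xA4\xA1" in EUC-JP.
my $utf8 = to_utf8_octets("\xA4\xA1", 'euc-jp');
```

Decoding and re-encoding in two explicit steps keeps the character data visible in between, which is where any manual adjustments would be checked.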
+.PP
+If UTF\-8 is not an option, it is recommended that one use a simple source
+filter, such as that provided by Filter::Encoding on CPAN or this
+pragma's own \f(CW\*(C`Filter\*(C'\fR option (see below).
+.PP
+Almost certainly, the only legitimate use of this pragma is a single one per
+file, near the top and with file scope, since a file is almost always written
+in only one encoding.  Further restrictions apply in Perls before v5.22 (see
+"Prior to Perl v5.22").
+.PP
+There are two basic modes of operation (plus turning it off):
+.ie n .IP """use encoding [\*(Aq\fIENCNAME\fR\*(Aq] ;""" 4
+.el .IP "\f(CWuse encoding [\*(Aq\fR\f(CIENCNAME\fR\f(CW\*(Aq] ;\fR" 4
+.IX Item "use encoding [ENCNAME] ;"
+Please note: This mode of operation is no longer supported as of Perl
+v5.26.
+.Sp
+This is the normal operation.  It translates various literals encountered in
+the Perl source file from the encoding \fIENCNAME\fR into UTF\-8, and similarly
+converts character code points.  This is intended for scripts whose variable
+names and punctuation, \fIetc.\fR, are in ASCII but whose literal data is in
+the specified encoding.
+.Sp
+\&\fIENCNAME\fR is optional.  If omitted, the encoding specified in the environment
+variable \f(CW\*(C`PERL_ENCODING\*(C'\fR is used.  If this isn't
+set, or the resolved-to encoding is not known to \f(CW\*(C`Encode\*(C'\fR, the error
+\&\f(CW\*(C`Unknown encoding \*(Aq\fR\f(CIENCNAME\fR\f(CW\*(Aq\*(C'\fR will be thrown.
+.Sp
+Starting in Perl v5.8.6 (\f(CW\*(C`Encode\*(C'\fR version 2.0.1), \fIENCNAME\fR may be the
+name \f(CW\*(C`:locale\*(C'\fR.  This is for very specialized applications, and is documented
+in "The \f(CW\*(C`:locale\*(C'\fR sub-pragma" below.
+.Sp
+The literals that are converted are \f(CW\*(C`q//, qq//, qr//, qw//, qx//\*(C'\fR, and
+starting in v5.8.1, \f(CW\*(C`tr///\*(C'\fR.  Operations that do conversions include \f(CW\*(C`chr\*(C'\fR,
+\&\f(CW\*(C`ord\*(C'\fR, \f(CW\*(C`utf8::upgrade\*(C'\fR (but not \f(CW\*(C`utf8::downgrade\*(C'\fR), and \f(CW\*(C`chomp\*(C'\fR.
+.Sp
+Also starting in v5.8.1, the \f(CW\*(C`DATA\*(C'\fR pseudo-filehandle is translated from the
+encoding into UTF\-8.
+.Sp
+For example, you can write code in EUC-JP as follows:
+.Sp
+.Vb 3
+\&  my $Rakuda = "\exF1\exD1\exF1\exCC"; # Camel in Kanji
+\&               #<\-char\-><\-char\->   # 4 octets
+\&  s/\ebCamel\eb/$Rakuda/;
+.Ve
+.Sp
+And with \f(CW\*(C`use encoding "euc\-jp"\*(C'\fR in effect, it is the same thing as
+that code in UTF\-8:
+.Sp
+.Vb 2
+\&  my $Rakuda = "\ex{99F1}\ex{99DD}"; # two Unicode characters
+\&  s/\ebCamel\eb/$Rakuda/;
+.Ve
+.Sp
+See "EXAMPLE" below for a more complete example.
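Since current Perls reject this mode of the pragma, the equivalence described above can instead be made explicit by decoding the EUC-JP octets with Encode. A sketch; the octets and the two expected code points are the ones given in the text above, and the sample string "Camel" is only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The four EUC-JP octets for "camel" in Kanji, as in the example above.
my $euc_octets = "\xF1\xD1\xF1\xCC";

# Decoding yields two Perl characters, not four bytes.
my $rakuda = decode('euc-jp', $euc_octets);

my $text = "Camel";
$text =~ s/\bCamel\b/$rakuda/;
```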
+.Sp
+Unless \f(CW\*(C`${^UNICODE}\*(C'\fR (available starting in v5.8.2) exists and is non-zero, the
+PerlIO layers of \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR are set to "\f(CW:encoding(\fR\f(CIENCNAME\fR\f(CW)\fR".
+Therefore,
+.Sp
+.Vb 5
+\&  use encoding "euc\-jp";
+\&  my $message = "Camel is the symbol of perl.\en";
+\&  my $Rakuda = "\exF1\exD1\exF1\exCC"; # Camel in Kanji
+\&  $message =~ s/\ebCamel\eb/$Rakuda/;
+\&  print $message;
+.Ve
+.Sp
+will print
+.Sp
+.Vb 1
+\& "\exF1\exD1\exF1\exCC is the symbol of perl.\en"
+.Ve
+.Sp
+not
+.Sp
+.Vb 1
+\& "\ex{99F1}\ex{99DD} is the symbol of perl.\en"
+.Ve
+.Sp
+You can override this by giving extra arguments; see below.
+.Sp
+Note that \f(CW\*(C`STDERR\*(C'\fR WILL NOT be changed, regardless.
+.Sp
+Also note that non-STD file handles remain unaffected.  Use \f(CW\*(C`use
+open\*(C'\fR or \f(CW\*(C`binmode\*(C'\fR to change the layers of those.
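The effect on STDOUT described above is that of an `:encoding` output layer. As a self-contained sketch, the following writes through `:encoding(euc-jp)` to an in-memory filehandle, which is assumed here to stand in for STDOUT, and shows that a Unicode string comes out as EUC-JP octets.

```perl
use strict;
use warnings;

my $message = "\x{3041} is HIRAGANA LETTER SMALL A\n";

# An in-memory filehandle stands in for STDOUT; the :encoding layer
# converts Perl characters to EUC-JP octets on output.
my $buffer = '';
open(my $out, '>:encoding(euc-jp)', \$buffer) or die "open: $!";
print {$out} $message;
close($out) or die "close: $!";
```

The same layer can be put on any other handle with `binmode` or the `open` pragma, which is the route the text above suggests for non-STD handles.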
+.ie n .IP """use encoding \fIENCNAME\fR, Filter=>1;""" 4
+.el .IP "\f(CWuse encoding \fR\f(CIENCNAME\fR\f(CW, Filter=>1;\fR" 4
+.IX Item "use encoding ENCNAME, Filter=>1;"
+This operates as above, but the \f(CW\*(C`Filter\*(C'\fR argument with a non-zero
+value causes the entire script, and not just literals, to be translated from
+the encoding into UTF\-8.  This allows identifiers in the source to be in that
+encoding as well.  (Problems may occur if the encoding is not a superset of
+ASCII; imagine all your semicolons being translated into something
+different.)  One can use this form to make
+.Sp
+.Vb 1
+\& ${"\ex{4eba}"}++
+.Ve
+.Sp
+work.  (This is equivalent to \f(CW\*(C`$\fR\f(CIhuman\fR\f(CW++\*(C'\fR, where \fIhuman\fR is a single Han
+ideograph).
+.Sp
+This effectively means that your source code behaves as if it were written in
+UTF\-8 with \f(CW\*(C`use utf8\*(C'\fR in effect.  So even if your editor only supports
+Shift_JIS, for example, you can still try examples in Chapter 15 of
+\&\f(CW\*(C`Programming Perl, 3rd Ed.\*(C'\fR.
+.Sp
+This option is significantly slower than the other one.
+.ie n .IP """no encoding;""" 4
+.el .IP "\f(CWno encoding;\fR" 4
+.IX Item "no encoding;"
+Unsets the script encoding.  The layers of \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR are
+reset to "\f(CW\*(C`:raw\*(C'\fR" (the default unprocessed raw stream of bytes).
+.SH OPTIONS
+.IX Header "OPTIONS"
+.ie n .SS "Setting ""STDIN"" and/or ""STDOUT"" individually"
+.el .SS "Setting \f(CWSTDIN\fP and/or \f(CWSTDOUT\fP individually"
+.IX Subsection "Setting STDIN and/or STDOUT individually"
+The encodings of \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR are individually settable by parameters to
+the pragma:
+.PP
+.Vb 1
+\& use encoding \*(Aqeuc\-tw\*(Aq, STDIN => \*(Aqgreek\*(Aq  ...;
+.Ve
+.PP
+In this case, you cannot omit the first \fIENCNAME\fR.  \f(CW\*(C`STDIN => undef\*(C'\fR
+turns the I/O transcoding completely off for that filehandle.
+.PP
+When \f(CW\*(C`${^UNICODE}\*(C'\fR (available starting in v5.8.2) exists and is non-zero,
+these options will be completely ignored.  See "\f(CW\*(C`${^UNICODE}\*(C'\fR" in perlvar and
+"\f(CW\*(C`\-C\*(C'\fR" in perlrun for details.
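Giving STDIN and STDOUT different encodings amounts to pushing different `:encoding` layers on the input and output handles. A minimal sketch in which in-memory handles stand in for STDIN and STDOUT (an assumption made so the example runs anywhere): input is read as ISO 8859-7 ("greek") and written back out as UTF-8.

```perl
use strict;
use warnings;

# "\xDF" is GREEK SMALL LETTER IOTA WITH TONOS in ISO 8859-7.
my $greek_input = "\xDF\n";
my $utf8_output = '';

open(my $in,  '<:encoding(iso-8859-7)', \$greek_input) or die "open in: $!";
open(my $out, '>:encoding(UTF-8)',      \$utf8_output) or die "open out: $!";

while (my $line = <$in>) {
    print {$out} $line;    # $line holds the character U+03AF here
}
close($in);
close($out) or die "close: $!";
```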
+.ie n .SS "The "":locale"" sub-pragma"
+.el .SS "The \f(CW:locale\fP sub-pragma"
+.IX Subsection "The :locale sub-pragma"
+Starting in v5.8.6, the encoding name may be \f(CW\*(C`:locale\*(C'\fR.  This means that the
+encoding is taken from the current locale, and not hard-coded by the pragma.
+Since a script really can only be encoded in exactly one encoding, this option
+is dangerous.  It makes sense only if the script itself is written in ASCII,
+and all the possible locales that will be in use when the script is executed
+are supersets of ASCII.  That means that the script itself doesn't get
+changed, but the I/O handles have the specified encoding added, and the
+operations like \f(CW\*(C`chr\*(C'\fR and \f(CW\*(C`ord\*(C'\fR use that encoding.
+.PP
+The logic for determining which encoding \f(CW\*(C`:locale\*(C'\fR uses is as follows:
+.IP 1. 4
+If the platform supports the \f(CWlanginfo(CODESET)\fR interface, the codeset
+returned is used as the default encoding for the open pragma.
+.IP 2. 4
+If 1. didn't work but we are under the locale pragma, the environment
+variables \f(CW\*(C`LC_ALL\*(C'\fR and \f(CW\*(C`LANG\*(C'\fR (in that order) are matched for encodings
+(the part after "\f(CW\*(C`.\*(C'\fR", if any), and if any found, that is used
+as the default encoding for the open pragma.
+.IP 3. 4
+If 1. and 2. didn't work, the environment variables \f(CW\*(C`LC_ALL\*(C'\fR and \f(CW\*(C`LANG\*(C'\fR
+(in that order) are matched for anything looking like UTF\-8, and if
+any found, \f(CW\*(C`:utf8\*(C'\fR is used as the default encoding for the open
+pragma.
+.PP
+If your locale environment variables (\f(CW\*(C`LC_ALL\*(C'\fR, \f(CW\*(C`LC_CTYPE\*(C'\fR, \f(CW\*(C`LANG\*(C'\fR)
+contain the strings 'UTF\-8' or 'UTF8' (case-insensitive matching),
+the default encoding of your \f(CW\*(C`STDIN\*(C'\fR, \f(CW\*(C`STDOUT\*(C'\fR, and \f(CW\*(C`STDERR\*(C'\fR, and of
+\&\fBany subsequent file open\fR, is UTF\-8.
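The three numbered fallback steps above can be summarized as a small pure function. This is only a sketch of the described order, not the pragma's actual code: the name `guess_locale_encoding` is hypothetical, the `langinfo(CODESET)` result and the "under `use locale`" flag are passed in as arguments so the logic can be exercised on any platform, and dropping a trailing `@modifier` from the locale name is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of the lookup order for ":locale".
#   $env       - hash ref of environment variables (e.g. \%ENV)
#   $codeset   - result of langinfo(CODESET), or undef if unsupported
#   $in_locale - true if "use locale" is in effect
sub guess_locale_encoding {
    my ($env, $codeset, $in_locale) = @_;

    # Step 1: use langinfo(CODESET) when the platform provides it.
    return $codeset if defined $codeset && length $codeset;

    # LC_ALL is consulted before LANG.
    my @values = grep { defined } @{$env}{qw(LC_ALL LANG)};

    # Step 2: under "use locale", take the part after "." (dropping
    # any "@modifier", an assumption of this sketch).
    if ($in_locale) {
        for my $value (@values) {
            return $1 if $value =~ /\.([^.\@]+)/;
        }
    }

    # Step 3: fall back to utf8 if anything looks like UTF-8.
    for my $value (@values) {
        return 'utf8' if $value =~ /utf-?8/i;
    }
    return undef;    # nothing usable found
}
```

On a real system, the second argument could come from the core I18N::Langinfo module via `langinfo(CODESET)`.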
+.SH CAVEATS
+.IX Header "CAVEATS"
+.SS "SIDE EFFECTS"
+.IX Subsection "SIDE EFFECTS"
+.IP \(bu 4
+If the \f(CW\*(C`encoding\*(C'\fR pragma is in scope, then the lengths returned by \f(CW\*(C`chomp\*(C'\fR are
+calculated from the length of \f(CW$/\fR in Unicode characters, which is not
+always the same as the length of \f(CW$/\fR in the native encoding.
+.IP \(bu 4
+Without this pragma, if strings operating under byte semantics and strings
+with Unicode character data are concatenated, the new string will
+be created by decoding the byte strings as \fIISO 8859\-1 (Latin\-1)\fR.
+.Sp
+The \fBencoding\fR pragma changes this to use the specified encoding
+instead.  For example:
+.Sp
+.Vb 5
+\&    use encoding \*(Aqutf8\*(Aq;
+\&    my $string = chr(20000); # a Unicode string
+\&    utf8::encode($string);   # now it\*(Aqs a UTF\-8 encoded byte string
+\&    # concatenate with another Unicode string
+\&    print length($string . chr(20000));
+.Ve
+.Sp
+This prints \f(CW2\fR, because \f(CW$string\fR is upgraded as UTF\-8.  Without
+\&\f(CW\*(C`use encoding \*(Aqutf8\*(Aq;\*(C'\fR, it will print \f(CW4\fR instead, since \f(CW$string\fR
+is three octets when interpreted as Latin\-1.
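The length difference described above can be reproduced on any current Perl without the pragma: leaving the byte string alone gives the Latin-1 interpretation, while decoding it explicitly with `utf8::decode` gives the result that `use encoding 'utf8'` used to produce implicitly. A sketch:

```perl
use strict;
use warnings;

my $string = chr(20000);    # one character, U+4E20
utf8::encode($string);      # now three UTF-8 octets

# Implicit upgrade: the three octets are read as three Latin-1 characters.
my $implicit = length($string . chr(20000));

# Explicit decode: the octets become one character again first.
my $decoded = $string;
utf8::decode($decoded) or die "not valid UTF-8";
my $explicit = length($decoded . chr(20000));
```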
+.SS "DO NOT MIX MULTIPLE ENCODINGS"
+.IX Subsection "DO NOT MIX MULTIPLE ENCODINGS"
+Notice that only literals (string or regular expression) having only
+legacy code points are affected: if you mix data like this
+.PP
+.Vb 2
+\&    \ex{100}\exDF
+\&    \exDF\ex{100}
+.Ve
+.PP
+the data is assumed to be in (Latin 1 and) Unicode, not in your native
+encoding.  In other words, this will match in "greek":
+.PP
+.Vb 1
+\&    "\exDF" =~ /\ex{3af}/
+.Ve
+.PP
+but this will not
+.PP
+.Vb 1
+\&    "\exDF\ex{100}" =~ /\ex{3af}\ex{100}/
+.Ve
+.PP
+since the \f(CW\*(C`\exDF\*(C'\fR (ISO 8859\-7 GREEK SMALL LETTER IOTA WITH TONOS) on
+the left will \fBnot\fR be upgraded to \f(CW\*(C`\ex{3af}\*(C'\fR (Unicode GREEK SMALL
+LETTER IOTA WITH TONOS) because of the \f(CW\*(C`\ex{100}\*(C'\fR on the left.  You
+should not be mixing your legacy data and Unicode in the same string.
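One way to follow this rule is to decode legacy octets into characters before they ever meet Unicode data, instead of relying on an implicit upgrade. A sketch using the ISO 8859-7 byte from the text above; because Encode does the conversion explicitly, the result does not depend on any pragma.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $legacy = "\xDF";    # ISO 8859-7 GREEK SMALL LETTER IOTA WITH TONOS

# Mixing without decoding: the byte is taken as Latin-1 U+00DF.
my $mixed = $legacy . "\x{100}";

# Decoding first: the byte becomes U+03AF before concatenation.
my $clean = decode('iso-8859-7', $legacy) . "\x{100}";
```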
+.PP
+This pragma also affects encoding of the 0x80..0xFF code point range:
+normally characters in that range are left as eight-bit bytes (unless
+they are combined with characters with code points 0x100 or larger,
+in which case all characters need to become UTF\-8 encoded), but if
+the \f(CW\*(C`encoding\*(C'\fR pragma is present, even the 0x80..0xFF range always
+gets UTF\-8 encoded.
+.PP
+After all, the best thing about this pragma is that you don't have to
+resort to \ex{....} just to spell your name in a native encoding.
+So feel free to put your strings in your encoding in quotes and
+regexes.
+.SS "Prior to Perl v5.22"
+.IX Subsection "Prior to Perl v5.22"
+The pragma was per script, not lexically scoped per block.  Only the last
+\&\f(CW\*(C`use encoding\*(C'\fR or \f(CW\*(C`no encoding\*(C'\fR mattered, and it affected
+\&\fBthe whole script\fR.  However, the \f(CW\*(C`no encoding\*(C'\fR pragma was supported and
+\&\f(CW\*(C`use encoding\*(C'\fR could appear as many times as you want in a given script
+(though only the last was effective).
+.PP
+Since the scope wasn't lexical, other modules' use of \f(CW\*(C`chr\*(C'\fR, \f(CW\*(C`ord\*(C'\fR, \fIetc.\fR
+was affected.  This led to spooky, incorrect action at a distance that was
+hard to debug.
+.PP
+This meant you had to be very careful about load order:
+.PP
+.Vb 5
+\&  # called module
+\&  package Module_IN_BAR;
+\&  use encoding "bar";
+\&  # stuff in "bar" encoding here
+\&  1;
+\&
+\&  # caller script
+\&  use encoding "foo";
+\&  use Module_IN_BAR;
+\&  # surprise! use encoding "bar" is in effect.
+.Ve
+.PP
+The best way to avoid this oddity is to use this pragma RIGHT AFTER
+other modules are loaded, that is:
+.PP
+.Vb 2
+\&  use Module_IN_BAR;
+\&  use encoding "foo";
+.Ve
+.SS "Prior to Encode version 1.87"
+.IX Subsection "Prior to Encode version 1.87"
+.IP \(bu 4
+\&\f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR were not set under the filter option.
+Also, \f(CW\*(C`STDIN=>\fR\f(CIENCODING\fR\f(CW\*(C'\fR and \f(CW\*(C`STDOUT=>\fR\f(CIENCODING\fR\f(CW\*(C'\fR didn't work as they
+did in the non-filter version.
+.IP \(bu 4
+\&\f(CW\*(C`use utf8\*(C'\fR wasn't implicitly declared, so you had to \f(CW\*(C`use utf8\*(C'\fR yourself to do
+.Sp
+.Vb 1
+\& ${"\ex{4eba}"}++
+.Ve
+.SS "Prior to Perl v5.8.1"
+.IX Subsection "Prior to Perl v5.8.1"
+.IP """NON-EUC"" doublebyte encodings" 4
+.IX Item """NON-EUC"" doublebyte encodings"
+Because perl needs to parse the script before applying this pragma, such
+encodings as Shift_JIS and Big\-5 that may contain \f(CW\*(Aq\e\*(Aq\fR (BACKSLASH;
+\&\f(CW\*(C`\ex5c\*(C'\fR) in the second byte fail because the second byte may
+accidentally escape the quoting character that follows.
+.ie n .IP """tr///""" 4
+.el .IP \f(CWtr///\fR 4
+.IX Item "tr///"
+The \fBencoding\fR pragma works by decoding string literals in
+\&\f(CW\*(C`q//, qq//, qr//, qw//, qx//\*(C'\fR and so forth.  In perl v5.8.0, this
+does not apply to \f(CW\*(C`tr///\*(C'\fR.  Therefore,
+.Sp
+.Vb 4
+\&  use encoding \*(Aqeuc\-jp\*(Aq;
+\&  #....
+\&  $kana =~ tr/\exA4\exA1\-\exA4\exF3/\exA5\exA1\-\exA5\exF3/;
+\&  #           \-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-
+.Ve
+.Sp
+does not work as
+.Sp
+.Vb 1
+\&  $kana =~ tr/\ex{3041}\-\ex{3093}/\ex{30a1}\-\ex{30f3}/;
+.Ve
+.RS 4
+.IP "Legend of characters above" 4
+.IX Item "Legend of characters above"
+.Vb 6
+\&  utf8     euc\-jp   charnames::viacode()
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  \ex{3041} \exA4\exA1 HIRAGANA LETTER SMALL A
+\&  \ex{3093} \exA4\exF3 HIRAGANA LETTER N
+\&  \ex{30a1} \exA5\exA1 KATAKANA LETTER SMALL A
+\&  \ex{30f3} \exA5\exF3 KATAKANA LETTER N
+.Ve
+.RE
+.RS 4
+.Sp
+This counterintuitive behavior has been fixed in perl v5.8.1.
+.Sp
+In perl v5.8.0, you can work around this as follows:
+.Sp
+.Vb 3
+\&  use encoding \*(Aqeuc\-jp\*(Aq;
+\&  #  ....
+\&  eval qq{ \e$kana =~ tr/\exA4\exA1\-\exA4\exF3/\exA5\exA1\-\exA5\exF3/ };
+.Ve
+.Sp
+Note the \f(CW\*(C`tr//\*(C'\fR expression is surrounded by \f(CW\*(C`qq{}\*(C'\fR.  The idea behind
+this is the same as the classic idiom that makes \f(CW\*(C`tr///\*(C'\fR 'interpolate':
+.Sp
+.Vb 2
+\&   tr/$from/$to/;            # wrong!
+\&   eval qq{ tr/$from/$to/ }; # workaround.
+.Ve
+.RE
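The `eval qq{}` idiom above still works in current Perls for building a `tr///` from run-time values. A sketch with a hypothetical `transliterate` helper; `$from` and `$to` are assumed to be trusted, already-escaped `tr///` lists, because anything placed in the eval string is compiled as Perl code.

```perl
use strict;
use warnings;

# Hypothetical helper: tr/// does not interpolate, so the operator is
# assembled as a string and compiled with eval.
sub transliterate {
    my ($text, $from, $to) = @_;
    eval "\$text =~ tr/$from/$to/; 1" or die "tr failed: $@";
    return $text;
}

# Hiragana U+3041..U+3093 to katakana U+30A1..U+30F3, as in the legend.
# The ranges are passed as escape sequences, which tr/// interprets.
my $katakana = transliterate("\x{3041}\x{3093}",
                             '\x{3041}-\x{3093}', '\x{30a1}-\x{30f3}');
my $upper = transliterate('camel', 'a-z', 'A-Z');
```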
+.SH "EXAMPLE \- Greekperl"
+.IX Header "EXAMPLE - Greekperl"
+.Vb 1
+\&    use encoding "iso 8859\-7";
+\&
+\&    # \exDF in ISO 8859\-7 (Greek) is \ex{3af} in Unicode.
+\&
+\&    $a = "\exDF";
+\&    $b = "\ex{100}";
+\&
+\&    printf "%#x\en", ord($a); # will print 0x3af, not 0xdf
+\&
+\&    $c = $a . $b;
+\&
+\&    # $c will be "\ex{3af}\ex{100}", not "\ex{df}\ex{100}".
+\&
+\&    # chr() is affected, and ...
+\&
+\&    print "mega\en"  if ord(chr(0xdf)) == 0x3af;
+\&
+\&    # ... ord() is affected by the encoding pragma ...
+\&
+\&    print "tera\en" if ord(pack("C", 0xdf)) == 0x3af;
+\&
+\&    # ... as are eq and cmp ...
+\&
+\&    print "peta\en" if "\ex{3af}" eq  pack("C", 0xdf);
+\&    print "exa\en"  if ("\ex{3af}" cmp pack("C", 0xdf)) == 0;
+\&
+\&    # ... but pack/unpack C are not affected, in case you still
+\&    # want to go back to your native encoding
+\&
+\&    print "zetta\en" if unpack("C", (pack("C", 0xdf))) == 0xdf;
+.Ve
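Because this mode of the pragma no longer works on current Perls, the main observations of the example above can be reproduced with explicit decoding. A sketch that replaces the implicit conversion with `Encode::decode`; it uses lexicals `$x` and `$y` instead of `$a` and `$b`, which are special to `sort`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# \xDF in ISO 8859-7 (Greek) is \x{3af} in Unicode.
my $x = decode('iso-8859-7', "\xDF");
my $y = "\x{100}";

my $ord = ord($x);       # 0x3af, not 0xdf
my $c   = $x . $y;       # "\x{3af}\x{100}"

# Without the pragma, chr() is not remapped: chr(0xdf) stays U+00DF.
my $chr_is_native = ord(chr(0xdf)) == 0xdf;

# pack/unpack "C" round-trip the octet unchanged.
my $roundtrip = unpack("C", pack("C", 0xdf));
```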
+.SH BUGS
+.IX Header "BUGS"
+.IP "Thread safety" 4
+.IX Item "Thread safety"
+\&\f(CW\*(C`use encoding ...\*(C'\fR is not thread-safe (i.e., do not use in threaded
+applications).
+.IP "Can't be used by more than one module in a single program." 4
+.IX Item "Can't be used by more than one module in a single program."
+Only one encoding is allowed.  If you combine modules in a program that have
+different encodings, only one will actually be used.
+.ie n .IP "Other modules using ""STDIN"" and ""STDOUT"" get the encoded stream" 4
+.el .IP "Other modules using \f(CWSTDIN\fR and \f(CWSTDOUT\fR get the encoded stream" 4
+.IX Item "Other modules using STDIN and STDOUT get the encoded stream"
+They may be expecting something completely different.
+.IP "literals in regex that are longer than 127 bytes" 4
+.IX Item "literals in regex that are longer than 127 bytes"
+For native multibyte encodings (either fixed or variable length),
+the current implementation of the regular expressions may introduce
+recoding errors for regular expression literals longer than 127 bytes.
+.IP EBCDIC 4
+.IX Item "EBCDIC"
+The encoding pragma is not supported on EBCDIC platforms.
+.ie n .IP """format""" 4
+.el .IP \f(CWformat\fR 4
+.IX Item "format"
+This pragma doesn't work well with \f(CW\*(C`format\*(C'\fR because PerlIO does not
+get along very well with it.  When \f(CW\*(C`format\*(C'\fR contains non-ASCII
+characters, it prints garbled output or emits "Wide character" warnings.
+To see this, try the code below.
+.Sp
+.Vb 11
+\&  # Save this one in utf8
+\&  # replace *non\-ascii* with a non\-ascii string
+\&  my $camel;
+\&  format STDOUT =
+\&  *non\-ascii*@>>>>>>>
+\&  $camel
+\&  .
+\&  $camel = "*non\-ascii*";
+\&  binmode(STDOUT=>\*(Aq:encoding(utf8)\*(Aq); # bang!
+\&  write;              # funny
+\&  print $camel, "\en"; # fine
+.Ve
+.Sp
+Without the \fBbinmode()\fR call, \fBwrite()\fR happens to work, but \fBprint()\fR
+fails instead.
+.Sp
+At any rate, the very use of \f(CW\*(C`format\*(C'\fR is questionable when it comes to
+Unicode characters, since you have to consider such things as character
+width (e.g., double width for ideographs) and direction (e.g., BIDI for
+Arabic and Hebrew).
+.IP "See also ""CAVEATS""" 4
+.IX Item "See also ""CAVEATS"""
+.SH HISTORY
+.IX Header "HISTORY"
+This pragma first appeared in Perl v5.8.0.  It has been enhanced in later
+releases as specified above.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+perlunicode, Encode, open, Filter::Util::Call
+.PP
+Ch. 15 of \f(CW\*(C`Programming Perl (3rd Edition)\*(C'\fR
+by Larry Wall, Tom Christiansen, Jon Orwant;
+O'Reilly & Associates; ISBN 0\-596\-00027\-8