Adding upstream version 4.22.0. (upstream/4.22.0)

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit: fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch)
tree: ce1e3bce06471410239a6f41282e328770aa404a /upstream/mageia-cauldron/man3pm/Encode.3pm
parent: Initial commit. (diff)
1 files changed, 881 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man3pm/Encode.3pm b/upstream/mageia-cauldron/man3pm/Encode.3pm
new file mode 100644
index 00000000..e5ec2aec
--- /dev/null
+++ b/upstream/mageia-cauldron/man3pm/Encode.3pm
@@ -0,0 +1,881 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+.    ds C` ""
+.    ds C' ""
+'br\}
+.el\{\
+.    ds C`
+.    ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD.  Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+.    if \nF \{\
+.        de IX
+.        tm Index:\\$1\t\\n%\t"\\$2"
+..
+.        if !\nF==2 \{\
+.            nr % 0
+.            nr F 2
+.        \}
+.    \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "Encode 3pm"
+.TH Encode 3pm 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+Encode \- character encodings in Perl
+.SH SYNOPSIS
+.IX Header "SYNOPSIS"
+.Vb 3
+\&    use Encode qw(decode encode);
+\&    $characters = decode(\*(AqUTF\-8\*(Aq, $octets,     Encode::FB_CROAK);
+\&    $octets     = encode(\*(AqUTF\-8\*(Aq, $characters, Encode::FB_CROAK);
+.Ve
+.SS "Table of Contents"
+.IX Subsection "Table of Contents"
+Encode consists of a collection of modules whose details are too extensive
+to fit in one document.  This one itself explains the top-level APIs
+and general topics at a glance.  For other topics and more details,
+see the documentation for these modules:
+.IP "Encode::Alias \- Alias definitions to encodings" 2
+.IX Item "Encode::Alias - Alias definitions to encodings"
+.PD 0
+.IP "Encode::Encoding \- Encode Implementation Base Class" 2
+.IX Item "Encode::Encoding - Encode Implementation Base Class"
+.IP "Encode::Supported \- List of Supported Encodings" 2
+.IX Item "Encode::Supported - List of Supported Encodings"
+.IP "Encode::CN \- Simplified Chinese Encodings" 2
+.IX Item "Encode::CN - Simplified Chinese Encodings"
+.IP "Encode::JP \- Japanese Encodings" 2
+.IX Item "Encode::JP - Japanese Encodings"
+.IP "Encode::KR \- Korean Encodings" 2
+.IX Item "Encode::KR - Korean Encodings"
+.IP "Encode::TW \- Traditional Chinese Encodings" 2
+.IX Item "Encode::TW - Traditional Chinese Encodings"
+.PD
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+The \f(CW\*(C`Encode\*(C'\fR module provides the interface between Perl strings
+and the rest of the system.  Perl strings are sequences of
+\&\fIcharacters\fR.
+.PP
+The repertoire of characters that Perl can represent is a superset of those
+defined by the Unicode Consortium. On most platforms the ordinal
+value of a character as returned by \f(CWord(\fR\f(CIS\fR\f(CW)\fR is the \fIUnicode
+codepoint\fR for that character. The exceptions are platforms where
+the legacy encoding is some variant of EBCDIC rather than a superset
+of ASCII; see perlebcdic.
+.PP
+In recent history, data has been moved around a computer in 8\-bit chunks,
+often called "bytes" but also known as "octets" in standards documents.
+Perl is widely used to manipulate data of many types: not only strings of
+characters representing human or computer languages, but also "binary"
+data, being the machine's representation of numbers, pixels in an image, or
+just about anything.
+.PP
+When Perl is processing "binary data", the programmer wants Perl to
+process "sequences of bytes". This is not a problem for Perl: because a
+byte has 256 possible values, it easily fits in Perl's much larger
+"logical character".
+.PP
+This document mostly explains the \fIhow\fR. perlunitut and perlunifaq
+explain the \fIwhy\fR.
+.SS TERMINOLOGY
+.IX Subsection "TERMINOLOGY"
+\fIcharacter\fR
+.IX Subsection "character"
+.PP
+A character in the range 0 .. 2**32\-1 (or more);
+what Perl's strings are made of.
+.PP
+\fIbyte\fR
+.IX Subsection "byte"
+.PP
+A character in the range 0..255;
+a special case of a Perl character.
+.PP
+\fIoctet\fR
+.IX Subsection "octet"
+.PP
+8 bits of data, with ordinal values 0..255;
+term for bytes passed to or from a non-Perl context, such as a disk file,
+standard I/O stream, database, command-line argument, environment variable,
+socket etc.
+.SH "THE PERL ENCODING API"
+.IX Header "THE PERL ENCODING API"
+.SS "Basic methods"
+.IX Subsection "Basic methods"
+\fIencode\fR
+.IX Subsection "encode"
+.PP
+.Vb 1
+\&  $octets  = encode(ENCODING, STRING[, CHECK])
+.Ve
+.PP
+Encodes the scalar value \fISTRING\fR from Perl's internal form into
+\&\fIENCODING\fR and returns a sequence of octets.  \fIENCODING\fR can be either a
+canonical name or an alias.  For encoding names and aliases, see
+"Defining Aliases".  For CHECK, see "Handling Malformed Data".
+.PP
+\&\fBCAVEAT\fR: the input scalar \fISTRING\fR might be modified in-place depending
+on what is set in CHECK. See "LEAVE_SRC" if you want your inputs to be
+left unchanged.
+.PP
+For example, to convert a string from Perl's internal format into
+ISO\-8859\-1, also known as Latin1:
+.PP
+.Vb 1
+\&  $octets = encode("iso\-8859\-1", $string);
+.Ve
+.PP
+\&\fBCAVEAT\fR: When you run \f(CW\*(C`$octets = encode("UTF\-8", $string)\*(C'\fR, then
+\&\f(CW$octets\fR \fImight not be equal to\fR \f(CW$string\fR.  Though both contain the
+same data, the UTF8 flag for \f(CW$octets\fR is \fIalways\fR off.  When you
+encode anything, the UTF8 flag on the result is always off, even when it
+contains a completely valid UTF\-8 string. See "The UTF8 flag" below.
+.PP
+If the \f(CW$string\fR is \f(CW\*(C`undef\*(C'\fR, then \f(CW\*(C`undef\*(C'\fR is returned.
+.PP
+\&\f(CW\*(C`str2bytes\*(C'\fR may be used as an alias for \f(CW\*(C`encode\*(C'\fR.
+.PP
+\fIdecode\fR
+.IX Subsection "decode"
+.PP
+.Vb 1
+\&  $string = decode(ENCODING, OCTETS[, CHECK])
+.Ve
+.PP
+This function returns the string that results from decoding the scalar
+value \fIOCTETS\fR, assumed to be a sequence of octets in \fIENCODING\fR, into
+Perl's internal form.  As with \fBencode()\fR,
+\&\fIENCODING\fR can be either a canonical name or an alias. For encoding names
+and aliases, see "Defining Aliases"; for \fICHECK\fR, see "Handling
+Malformed Data".
+.PP
+\&\fBCAVEAT\fR: the input scalar \fIOCTETS\fR might be modified in-place depending
+on what is set in CHECK. See "LEAVE_SRC" if you want your inputs to be
+left unchanged.
+.PP
+For example, to convert ISO\-8859\-1 data into a string in Perl's
+internal format:
+.PP
+.Vb 1
+\&  $string = decode("iso\-8859\-1", $octets);
+.Ve
+.PP
+\&\fBCAVEAT\fR: When you run \f(CW\*(C`$string = decode("UTF\-8", $octets)\*(C'\fR, then \f(CW$string\fR
+\&\fImight not be equal to\fR \f(CW$octets\fR.  Though both contain the same data, the
+UTF8 flag for \f(CW$string\fR is on.  See "The UTF8 flag"
+below.
+.PP
+If \f(CW$octets\fR is \f(CW\*(C`undef\*(C'\fR, then \f(CW\*(C`undef\*(C'\fR is returned.
+.PP
+\&\f(CW\*(C`bytes2str\*(C'\fR may be used as an alias for \f(CW\*(C`decode\*(C'\fR.
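+.PP
+As a rough illustration of the two functions above, the following sketch
+round-trips one character through UTF\-8.  The variable names and the sample
+character (U+263A) are arbitrary choices made for this example, not part of
+the API:
+.PP
+.Vb 8
+\&  use Encode qw(decode encode);
+\&
+\&  my $string = "\ex{263A}";                # one character
+\&  my $octets = encode("UTF\-8", $string);  # three octets: E2 98 BA
+\&  my $back   = decode("UTF\-8", $octets);  # one character again
+\&
+\&  printf "%d character, %d octets\en", length($string), length($octets);
+\&  print $back eq $string ? "round trip ok\en" : "round trip failed\en";
+.Ve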
+.PP
+\fIfind_encoding\fR
+.IX Subsection "find_encoding"
+.PP
+.Vb 1
+\&  [$obj =] find_encoding(ENCODING)
+.Ve
+.PP
+Returns the \fIencoding object\fR corresponding to \fIENCODING\fR.  Returns
+\&\f(CW\*(C`undef\*(C'\fR if no matching \fIENCODING\fR is found.  The returned object is
+what does the actual encoding or decoding.
+.PP
+.Vb 1
+\&  $string = decode($name, $bytes);
+.Ve
+.PP
+is in fact
+.PP
+.Vb 5
+\&    $string = do {
+\&        $obj = find_encoding($name);
+\&        croak qq(encoding "$name" not found) unless ref $obj;
+\&        $obj\->decode($bytes);
+\&    };
+.Ve
+.PP
+with more error checking.
+.PP
+You can therefore save time by reusing this object as follows:
+.PP
+.Vb 5
+\&    my $enc = find_encoding("iso\-8859\-1");
+\&    while(<>) {
+\&        my $string = $enc\->decode($_);
+\&        ... # now do something with $string;
+\&    }
+.Ve
+.PP
+Besides "decode" and "encode", other methods are
+available as well.  For instance, \f(CWname()\fR returns the canonical
+name of the encoding object.
+.PP
+.Vb 1
+\&  find_encoding("latin1")\->name; # iso\-8859\-1
+.Ve
+.PP
+See Encode::Encoding for details.
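+.PP
+Because \f(CWfind_encoding()\fR returns \f(CW\*(C`undef\*(C'\fR for an unknown name, it is
+usually worth checking the result before calling methods on it.  A minimal
+sketch (the variable names are illustrative only):
+.PP
+.Vb 6
+\&  use Encode qw(find_encoding);
+\&
+\&  my $name = "latin1";
+\&  my $enc  = find_encoding($name)
+\&      or die "unknown encoding: $name";
+\&  print $enc\->name, "\en";   # iso\-8859\-1
+.Ve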
+.PP
+\fIfind_mime_encoding\fR
+.IX Subsection "find_mime_encoding"
+.PP
+.Vb 1
+\&  [$obj =] find_mime_encoding(MIME_ENCODING)
+.Ve
+.PP
+Returns the \fIencoding object\fR corresponding to \fIMIME_ENCODING\fR.  Acts
+the same as \f(CWfind_encoding()\fR, except that the \f(CWmime_name()\fR of the returned
+object must match \fIMIME_ENCODING\fR.  So, unlike \f(CWfind_encoding()\fR,
+canonical names and aliases are not used when searching for the object.
+.PP
+.Vb 4
+\&    find_mime_encoding("utf8"); # returns undef because "utf8" is not a valid MIME_ENCODING
+\&    find_mime_encoding("utf\-8"); # returns the encoding object "utf\-8\-strict"
+\&    find_mime_encoding("UTF\-8"); # same as "utf\-8" because MIME_ENCODING is case insensitive
+\&    find_mime_encoding("utf\-8\-strict"); # returns undef because "utf\-8\-strict" is not a valid MIME_ENCODING
+.Ve
+.PP
+\fIfrom_to\fR
+.IX Subsection "from_to"
+.PP
+.Vb 1
+\&  [$length =] from_to($octets, FROM_ENC, TO_ENC [, CHECK])
+.Ve
+.PP
+Converts \fIin-place\fR data between two encodings. The data in \f(CW$octets\fR
+must be encoded as octets and \fInot\fR as characters in Perl's internal
+format. For example, to convert ISO\-8859\-1 data into Microsoft's CP1250
+encoding:
+.PP
+.Vb 1
+\&  from_to($octets, "iso\-8859\-1", "cp1250");
+.Ve
+.PP
+and to convert it back:
+.PP
+.Vb 1
+\&  from_to($octets, "cp1250", "iso\-8859\-1");
+.Ve
+.PP
+Because the conversion happens in place, the data to be
+converted cannot be a string constant: it must be a scalar variable.
+.PP
+\&\f(CWfrom_to()\fR returns the length of the converted string in octets on success,
+and \f(CW\*(C`undef\*(C'\fR on error.
+.PP
+\&\fBCAVEAT\fR: The following operations may look the same, but are not:
+.PP
+.Vb 2
+\&  from_to($data, "iso\-8859\-1", "UTF\-8"); #1
+\&  $data = decode("iso\-8859\-1", $data);  #2
+.Ve
+.PP
+Both #1 and #2 make \f(CW$data\fR consist of a completely valid UTF\-8 string,
+but only #2 turns the UTF8 flag on.  #1 is equivalent to:
+.PP
+.Vb 1
+\&  $data = encode("UTF\-8", decode("iso\-8859\-1", $data));
+.Ve
+.PP
+See "The UTF8 flag" below.
+.PP
+Also note that:
+.PP
+.Vb 1
+\&  from_to($octets, $from, $to, $check);
+.Ve
+.PP
+is equivalent to:
+.PP
+.Vb 1
+\&  $octets = encode($to, decode($from, $octets), $check);
+.Ve
+.PP
+Yes, it does \fInot\fR respect the \f(CW$check\fR during decoding.  It is
+deliberately done that way.  If you need minute control, use \f(CW\*(C`decode\*(C'\fR
+followed by \f(CW\*(C`encode\*(C'\fR as follows:
+.PP
+.Vb 1
+\&  $octets = encode($to, decode($from, $octets, $check_from), $check_to);
+.Ve
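+.PP
+A small sketch of the in-place behaviour and the return value, assuming
+\f(CW$octets\fR holds ISO\-8859\-1 data (the sample octet 0xE9, "e" with an acute
+accent, is chosen only for illustration):
+.PP
+.Vb 6
+\&  use Encode qw(from_to);
+\&
+\&  my $octets = "\exE9";                       # one ISO\-8859\-1 octet
+\&  my $length = from_to($octets, "iso\-8859\-1", "UTF\-8");
+\&  # $octets now holds the two octets C3 A9, so $length is 2
+\&  print $length, " ", unpack("H*", $octets), "\en";
+.Ve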
+.PP
+\fIencode_utf8\fR
+.IX Subsection "encode_utf8"
+.PP
+.Vb 1
+\&  $octets = encode_utf8($string);
+.Ve
+.PP
+\&\fBWARNING\fR: This function can produce invalid UTF\-8!
+Do not use it for data exchange.
+Unless you want Perl's older "lax" mode, prefer
+\&\f(CW\*(C`$octets = encode("UTF\-8", $string)\*(C'\fR.
+.PP
+Equivalent to \f(CW\*(C`$octets = encode("utf8", $string)\*(C'\fR.  The characters in
+\&\f(CW$string\fR are encoded in Perl's internal format, and the result is returned
+as a sequence of octets.  Because all possible characters in Perl have a
+(loose, not strict) utf8 representation, this function cannot fail.
+.PP
+\fIdecode_utf8\fR
+.IX Subsection "decode_utf8"
+.PP
+.Vb 1
+\&  $string = decode_utf8($octets [, CHECK]);
+.Ve
+.PP
+\&\fBWARNING\fR: This function accepts invalid UTF\-8!
+Do not use it for data exchange.
+Unless you want Perl's older "lax" mode, prefer
+\&\f(CW\*(C`$string = decode("UTF\-8", $octets [, CHECK])\*(C'\fR.
+.PP
+Equivalent to \f(CW\*(C`$string = decode("utf8", $octets [, CHECK])\*(C'\fR.
+The sequence of octets represented by \f(CW$octets\fR is decoded
+from (loose, not strict) utf8 into a sequence of logical characters.
+Because not all sequences of octets are valid utf8, even in this loose sense,
+it is quite possible for this function to fail.
+For CHECK, see "Handling Malformed Data".
+.PP
+\&\fBCAVEAT\fR: the input \fR\f(CI$octets\fR\fI\fR might be modified in-place depending on
+what is set in CHECK. See "LEAVE_SRC" if you want your inputs to be
+left unchanged.
+.SS "Listing available encodings"
+.IX Subsection "Listing available encodings"
+.Vb 2
+\&  use Encode;
+\&  @list = Encode\->encodings();
+.Ve
+.PP
+Returns a list of canonical names of available encodings that have already
+been loaded.  To get a list of all available encodings including those that
+have not yet been loaded, say:
+.PP
+.Vb 1
+\&  @all_encodings = Encode\->encodings(":all");
+.Ve
+.PP
+Or you can give the name of a specific module:
+.PP
+.Vb 1
+\&  @with_jp = Encode\->encodings("Encode::JP");
+.Ve
+.PP
+When "\f(CW\*(C`::\*(C'\fR" is not in the name, "\f(CW\*(C`Encode::\*(C'\fR" is assumed.
+.PP
+.Vb 1
+\&  @ebcdic = Encode\->encodings("EBCDIC");
+.Ve
+.PP
+To find out in detail which encodings are supported by this package,
+see Encode::Supported.
+.SS "Defining Aliases"
+.IX Subsection "Defining Aliases"
+To add a new alias to a given encoding, use:
+.PP
+.Vb 3
+\&  use Encode;
+\&  use Encode::Alias;
+\&  define_alias(NEWNAME => ENCODING);
+.Ve
+.PP
+After that, \fINEWNAME\fR can be used as an alias for \fIENCODING\fR.
+\&\fIENCODING\fR may be either the name of an encoding or an
+\&\fIencoding object\fR.
+.PP
+Before you do that, first make sure the alias is nonexistent using
+\&\f(CWresolve_alias()\fR, which returns the canonical name thereof.
+For example:
+.PP
+.Vb 3
+\&  Encode::resolve_alias("latin1") eq "iso\-8859\-1" # true
+\&  Encode::resolve_alias("iso\-8859\-12")   # false; nonexistent
+\&  Encode::resolve_alias($name) eq $name  # true if $name is canonical
+.Ve
+.PP
+\&\f(CWresolve_alias()\fR does not need \f(CW\*(C`use Encode::Alias\*(C'\fR; it can be
+imported via \f(CW\*(C`use Encode qw(resolve_alias)\*(C'\fR.
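+.PP
+Putting the two steps together, a hedged sketch might look like this; the
+alias name \f(CW\*(C`MyLatin\*(C'\fR is made up for the example:
+.PP
+.Vb 8
+\&  use Encode qw(resolve_alias);
+\&  use Encode::Alias;
+\&
+\&  # Only define the alias if the name does not resolve to anything yet.
+\&  define_alias(MyLatin => "iso\-8859\-1")
+\&      unless resolve_alias("MyLatin");
+\&
+\&  print resolve_alias("MyLatin"), "\en";   # iso\-8859\-1
+.Ve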
+.PP
+See Encode::Alias for details.
+.SS "Finding IANA Character Set Registry names"
+.IX Subsection "Finding IANA Character Set Registry names"
+The canonical name of a given encoding does not necessarily agree with
+IANA Character Set Registry, commonly seen as \f(CW\*(C`Content\-Type:
+text/plain; charset=\fR\f(CIWHATEVER\fR\f(CW\*(C'\fR.  For most cases, the canonical name
+works, but sometimes it does not, most notably with "utf\-8\-strict".
+.PP
+As of \f(CW\*(C`Encode\*(C'\fR version 2.21, a new method \f(CWmime_name()\fR is therefore added.
+.PP
+.Vb 4
+\&  use Encode;
+\&  my $enc = find_encoding("UTF\-8");
+\&  warn $enc\->name;      # utf\-8\-strict
+\&  warn $enc\->mime_name; # UTF\-8
+.Ve
+.PP
+See also:  Encode::Encoding
+.SH "Encoding via PerlIO"
+.IX Header "Encoding via PerlIO"
+If your perl supports \f(CW\*(C`PerlIO\*(C'\fR (which is the default), you can use a
+\&\f(CW\*(C`PerlIO\*(C'\fR layer to decode and encode directly via a filehandle.  The
+following two examples are fully identical in functionality:
+.PP
+.Vb 10
+\&  ### Version 1 via PerlIO
+\&    open(INPUT,  "< :encoding(shiftjis)", $infile)
+\&        || die "Can\*(Aqt open < $infile for reading: $!";
+\&    open(OUTPUT, "> :encoding(euc\-jp)",  $outfile)
+\&        || die "Can\*(Aqt open > $outfile for writing: $!";
+\&    while (<INPUT>) {   # auto decodes $_
+\&        print OUTPUT;   # auto encodes $_
+\&    }
+\&    close(INPUT)   || die "can\*(Aqt close $infile: $!";
+\&    close(OUTPUT)  || die "can\*(Aqt close $outfile: $!";
+\&
+\&  ### Version 2 via from_to()
+\&    open(INPUT,  "< :raw", $infile)
+\&        || die "Can\*(Aqt open < $infile for reading: $!";
+\&    open(OUTPUT, "> :raw",  $outfile)
+\&        || die "Can\*(Aqt open > $outfile for writing: $!";
+\&
+\&    while (<INPUT>) {
+\&        from_to($_, "shiftjis", "euc\-jp", 1);  # switch encoding
+\&        print OUTPUT;   # emit raw (but properly encoded) data
+\&    }
+\&    close(INPUT)   || die "can\*(Aqt close $infile: $!";
+\&    close(OUTPUT)  || die "can\*(Aqt close $outfile: $!";
+.Ve
+.PP
+In the first version above, you let the appropriate encoding layer
+handle the conversion.  In the second, you explicitly translate
+from one encoding to the other.
+.PP
+Unfortunately, some encodings may not be \f(CW\*(C`PerlIO\*(C'\fR\-savvy.  You can check
+to see whether your encoding is supported by \f(CW\*(C`PerlIO\*(C'\fR by invoking the
+\&\f(CW\*(C`perlio_ok\*(C'\fR method on it:
+.PP
+.Vb 2
+\&  Encode::perlio_ok("hz");             # false
+\&  find_encoding("euc\-cn")\->perlio_ok;  # true wherever PerlIO is available
+\&
+\&  use Encode qw(perlio_ok);            # imported upon request
+\&  perlio_ok("euc\-jp")
+.Ve
+.PP
+Fortunately, all encodings that come with \f(CW\*(C`Encode\*(C'\fR core are \f(CW\*(C`PerlIO\*(C'\fR\-savvy
+except for \f(CW\*(C`hz\*(C'\fR and \f(CW\*(C`ISO\-2022\-kr\*(C'\fR.  For the gory details, see
+Encode::Encoding and Encode::PerlIO.
+.SH "Handling Malformed Data"
+.IX Header "Handling Malformed Data"
+The optional \fICHECK\fR argument tells \f(CW\*(C`Encode\*(C'\fR what to do when
+encountering malformed data.  Without \fICHECK\fR, \f(CW\*(C`Encode::FB_DEFAULT\*(C'\fR
+(== 0) is assumed.
+.PP
+As of version 2.12, \f(CW\*(C`Encode\*(C'\fR supports coderef values for \f(CW\*(C`CHECK\*(C'\fR;
+see below.
+.PP
+\&\fBNOTE:\fR Not all encodings support this feature.
+Some encodings ignore the \fICHECK\fR argument.  For example,
+Encode::Unicode ignores \fICHECK\fR and it always croaks on error.
+.SS "List of \fICHECK\fP values"
+.IX Subsection "List of CHECK values"
+\fIFB_DEFAULT\fR
+.IX Subsection "FB_DEFAULT"
+.PP
+.Vb 1
+\&  I<CHECK> = Encode::FB_DEFAULT ( == 0)
+.Ve
+.PP
+If \fICHECK\fR is 0, encoding and decoding replace any malformed character
+with a \fIsubstitution character\fR.  When you encode, \fISUBCHAR\fR is used.
+When you decode, the Unicode REPLACEMENT CHARACTER, code point U+FFFD, is
+used.  If the data is supposed to be UTF\-8, an optional lexical warning of
+warning category \f(CW"utf8"\fR is given.
+.PP
+\fIFB_CROAK\fR
+.IX Subsection "FB_CROAK"
+.PP
+.Vb 1
+\&  I<CHECK> = Encode::FB_CROAK ( == 1)
+.Ve
+.PP
+If \fICHECK\fR is 1, methods immediately die with an error
+message.  Therefore, when \fICHECK\fR is 1, you should trap
+exceptions with \f(CW\*(C`eval{}\*(C'\fR, unless you really want to let it \f(CW\*(C`die\*(C'\fR.
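+.PP
+One possible way to trap the exception is sketched below.  The sample octet
+0xFF is used only because it can never appear in well-formed UTF\-8, and
+\f(CW\*(C`LEAVE_SRC\*(C'\fR is added so that the input variable is left untouched:
+.PP
+.Vb 9
+\&  use Encode qw(decode);
+\&
+\&  my $octets = "\exFF";
+\&  my $string = eval {
+\&      decode("UTF\-8", $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
+\&  };
+\&  if (!defined $string) {
+\&      warn "decode failed: $@";
+\&  }
+.Ve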
+.PP
+\fIFB_QUIET\fR
+.IX Subsection "FB_QUIET"
+.PP
+.Vb 1
+\&  I<CHECK> = Encode::FB_QUIET
+.Ve
+.PP
+If \fICHECK\fR is set to \f(CW\*(C`Encode::FB_QUIET\*(C'\fR, encoding and decoding immediately
+return the portion of the data that has been processed so far when an
+error occurs. The data argument is overwritten with everything
+after that point; that is, the unprocessed portion of the data.  This is
+handy when you have to call \f(CW\*(C`decode\*(C'\fR repeatedly in the case where your
+source data may contain partial multi-byte character sequences,
+(that is, you are reading with a fixed-width buffer). Here's some sample
+code to do exactly that:
+.PP
+.Vb 5
+\&    my($buffer, $string) = ("", "");
+\&    while (read($fh, $buffer, 256, length($buffer))) {
+\&        $string .= decode($encoding, $buffer, Encode::FB_QUIET);
+\&        # $buffer now contains the unprocessed partial character
+\&    }
+.Ve
+.PP
+\fIFB_WARN\fR
+.IX Subsection "FB_WARN"
+.PP
+.Vb 1
+\&  I<CHECK> = Encode::FB_WARN
+.Ve
+.PP
+This is the same as \f(CW\*(C`FB_QUIET\*(C'\fR above, except that instead of being silent
+on errors, it issues a warning.  This is handy for when you are debugging.
+.PP
+\&\fBCAVEAT\fR: All warnings from the Encode module are reported, independently of
+the settings of the \f(CW\*(C`warnings\*(C'\fR pragma.  If you want Encode to follow the
+lexical warnings configured by that pragma, also add the check value
+\&\f(CW\*(C`Encode::ONLY_PRAGMA_WARNINGS\*(C'\fR.  This value is available
+since Encode version 2.99.
+.PP
+\fIFB_PERLQQ FB_HTMLCREF FB_XMLCREF\fR
+.IX Subsection "FB_PERLQQ FB_HTMLCREF FB_XMLCREF"
+.IP "perlqq mode (\fICHECK\fR = Encode::FB_PERLQQ)" 2
+.IX Item "perlqq mode (CHECK = Encode::FB_PERLQQ)"
+.PD 0
+.IP "HTML charref mode (\fICHECK\fR = Encode::FB_HTMLCREF)" 2
+.IX Item "HTML charref mode (CHECK = Encode::FB_HTMLCREF)"
+.IP "XML charref mode (\fICHECK\fR = Encode::FB_XMLCREF)" 2
+.IX Item "XML charref mode (CHECK = Encode::FB_XMLCREF)"
+.PD
+.PP
+For encodings that are implemented by the \f(CW\*(C`Encode::XS\*(C'\fR module, \f(CW\*(C`CHECK\*(C'\fR \f(CW\*(C`==\*(C'\fR
+\&\f(CW\*(C`Encode::FB_PERLQQ\*(C'\fR puts \f(CW\*(C`encode\*(C'\fR and \f(CW\*(C`decode\*(C'\fR into \f(CW\*(C`perlqq\*(C'\fR fallback mode.
+.PP
+When you decode, \f(CW\*(C`\ex\fR\f(CIHH\fR\f(CW\*(C'\fR is inserted for a malformed character, where
+\&\fIHH\fR is the hex representation of the octet that could not be decoded to
+utf8.  When you encode, \f(CW\*(C`\ex{\fR\f(CIHHHH\fR\f(CW}\*(C'\fR will be inserted, where \fIHHHH\fR is
+the Unicode code point (in any number of hex digits) of the character that
+cannot be found in the character repertoire of the encoding.
+.PP
+The HTML/XML character reference modes are about the same. In place of
+\&\f(CW\*(C`\ex{\fR\f(CIHHHH\fR\f(CW}\*(C'\fR, HTML uses \f(CW\*(C`&#\fR\f(CINNN\fR\f(CW;\*(C'\fR where \fINNN\fR is a decimal number, and
+XML uses \f(CW\*(C`&#x\fR\f(CIHHHH\fR\f(CW;\*(C'\fR where \fIHHHH\fR is the hexadecimal number.
+.PP
+In \f(CW\*(C`Encode\*(C'\fR 2.10 or later, \f(CW\*(C`LEAVE_SRC\*(C'\fR is also implied.
+.PP
+\fIThe bitmask\fR
+.IX Subsection "The bitmask"
+.PP
+These modes are all actually set via a bitmask.  Here is how the \f(CW\*(C`FB_\fR\f(CIXXX\fR\f(CW\*(C'\fR
+constants are laid out.  You can import the \f(CW\*(C`FB_\fR\f(CIXXX\fR\f(CW\*(C'\fR constants via
+\&\f(CW\*(C`use Encode qw(:fallbacks)\*(C'\fR, and you can import the generic bitmask
+constants via \f(CW\*(C`use Encode qw(:fallback_all)\*(C'\fR.
+.PP
+.Vb 8
+\&                     FB_DEFAULT FB_CROAK FB_QUIET FB_WARN  FB_PERLQQ
+\& DIE_ON_ERR    0x0001             X
+\& WARN_ON_ERR   0x0002                               X
+\& RETURN_ON_ERR 0x0004                      X        X
+\& LEAVE_SRC     0x0008                                        X
+\& PERLQQ        0x0100                                        X
+\& HTMLCREF      0x0200
+\& XMLCREF       0x0400
+.Ve
+.PP
+\fILEAVE_SRC\fR
+.IX Subsection "LEAVE_SRC"
+.PP
+.Vb 1
+\&  Encode::LEAVE_SRC
+.Ve
+.PP
+If the \f(CW\*(C`Encode::LEAVE_SRC\*(C'\fR bit is \fInot\fR set but \fICHECK\fR is set, then the
+source string to \fBencode()\fR or \fBdecode()\fR will be overwritten in place.
+If you do not want that, bitwise-OR \f(CW\*(C`Encode::LEAVE_SRC\*(C'\fR into \fICHECK\fR.
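+.PP
+The difference can be sketched as follows, assuming a plain ASCII input so
+that no decoding error occurs (the variable names are illustrative):
+.PP
+.Vb 9
+\&  use Encode qw(decode);
+\&
+\&  my $consumed = "abc";
+\&  decode("UTF\-8", $consumed, Encode::FB_QUIET);
+\&  # $consumed has been overwritten; only unprocessed data (none) remains
+\&
+\&  my $kept = "abc";
+\&  decode("UTF\-8", $kept, Encode::FB_QUIET | Encode::LEAVE_SRC);
+\&  # $kept still holds "abc"
+.Ve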
+.SS "coderef for CHECK"
+.IX Subsection "coderef for CHECK"
+As of \f(CW\*(C`Encode\*(C'\fR 2.12, \f(CW\*(C`CHECK\*(C'\fR can also be a code reference which takes the
+ordinal value of the unmapped character as an argument and returns
+octets that represent the fallback character.  For instance:
+.PP
+.Vb 1
+\&  $ascii = encode("ascii", $utf8, sub{ sprintf "<U+%04X>", shift });
+.Ve
+.PP
+Acts like \f(CW\*(C`FB_PERLQQ\*(C'\fR but U+\fIXXXX\fR is used instead of \f(CW\*(C`\ex{\fR\f(CIXXXX\fR\f(CW}\*(C'\fR.
+.PP
+A fallback for \f(CW\*(C`decode\*(C'\fR must return a decoded string (a sequence of
+characters); it takes a list of ordinal values as its arguments.  So, for
+example, if you wish to decode octets as UTF\-8 and use ISO\-8859\-15 as
+a fallback for bytes that are not valid UTF\-8, you could write:
+.PP
+.Vb 4
+\&    $str = decode \*(AqUTF\-8\*(Aq, $octets, sub {
+\&        my $tmp = join \*(Aq\*(Aq, map chr, @_;
+\&        return decode \*(AqISO\-8859\-15\*(Aq, $tmp;
+\&    };
+.Ve
+.SH "Defining Encodings"
+.IX Header "Defining Encodings"
+To define a new encoding, use:
+.PP
+.Vb 2
+\&    use Encode qw(define_encoding);
+\&    define_encoding($object, CANONICAL_NAME [, alias...]);
+.Ve
+.PP
+\&\fICANONICAL_NAME\fR will be associated with \fR\f(CI$object\fR\fI\fR.  The object
+should provide the interface described in Encode::Encoding.
+If more than two arguments are provided, additional
+arguments are considered aliases for \fI\fR\f(CI$object\fR\fI\fR.
+.PP
+See Encode::Encoding for details.
+.SH "The UTF8 flag"
+.IX Header "The UTF8 flag"
+Before the introduction of Unicode support in Perl, the \f(CW\*(C`eq\*(C'\fR operator
+just compared the strings represented by two scalars. Beginning with
+Perl 5.8, \f(CW\*(C`eq\*(C'\fR compares two strings with simultaneous consideration of
+\&\fIthe UTF8 flag\fR. To explain why we made it so, I quote from page 402 of
+\&\fIProgramming Perl, 3rd ed.\fR
+.IP "Goal #1:" 2
+.IX Item "Goal #1:"
+Old byte-oriented programs should not spontaneously break on the old
+byte-oriented data they used to work on.
+.IP "Goal #2:" 2
+.IX Item "Goal #2:"
+Old byte-oriented programs should magically start working on the new
+character-oriented data when appropriate.
+.IP "Goal #3:" 2
+.IX Item "Goal #3:"
+Programs should run just as fast in the new character-oriented mode
+as in the old byte-oriented mode.
+.IP "Goal #4:" 2
+.IX Item "Goal #4:"
+Perl should remain one language, rather than forking into a
+byte-oriented Perl and a character-oriented Perl.
+.PP
+When \fIProgramming Perl, 3rd ed.\fR was written, not even Perl 5.6.0 had been
+born yet, and many features documented in the book remained unimplemented for a
+long time.  Perl 5.8 corrected much of this, and the introduction of the
+UTF8 flag is one of them.  You can think of there being two fundamentally
+different kinds of strings and string-operations in Perl: one a
+byte-oriented mode for when the internal UTF8 flag is off, and the other a
+character-oriented mode for when the internal UTF8 flag is on.
+.PP
+This UTF8 flag is not visible in Perl scripts, exactly for the same reason
+you cannot (or rather, you \fIdon't have to\fR) see whether a scalar contains
+a string, an integer, or a floating-point number.   But you can still peek
+and poke these if you will.  See the next section.
+.SS "Messing with Perl's Internals"
+.IX Subsection "Messing with Perl's Internals"
+The following functions use parts of Perl's internals in the current
+implementation.  As such, they are efficient but may change in a future
+release.
+.PP
+\fIis_utf8\fR
+.IX Subsection "is_utf8"
+.PP
+.Vb 1
+\&  is_utf8(STRING [, CHECK])
+.Ve
+.PP
+[INTERNAL] Tests whether the UTF8 flag is turned on in the \fISTRING\fR.
+If \fICHECK\fR is true, also checks whether \fISTRING\fR contains well-formed
+UTF\-8.  Returns true if successful, false otherwise.
+.PP
+Typically only necessary for debugging and testing.  Don't use this flag as
+a marker to distinguish character and binary data; that should be decided
+for each variable when you write your code.
+.PP
+\&\fBCAVEAT\fR: If \fISTRING\fR has the UTF8 flag set, it does \fBNOT\fR mean that
+\&\fISTRING\fR is UTF\-8 encoded and vice-versa.
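+.PP
+A short sketch of inspecting the flag while debugging, based on the
+behaviour of \f(CW\*(C`encode\*(C'\fR and \f(CW\*(C`decode\*(C'\fR described earlier (the sample
+data is arbitrary):
+.PP
+.Vb 7
+\&  use Encode qw(decode encode is_utf8);
+\&
+\&  my $string = decode("iso\-8859\-1", "\exE9");  # characters
+\&  my $octets = encode("UTF\-8", $string);       # octets
+\&
+\&  print is_utf8($string) ? "on" : "off", "\en";  # on
+\&  print is_utf8($octets) ? "on" : "off", "\en";  # off
+.Ve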
+.PP
+As of Perl 5.8.1, utf8 also has the \f(CW\*(C`utf8::is_utf8\*(C'\fR function.
+.PP
+\fI_utf8_on\fR
+.IX Subsection "_utf8_on"
+.PP
+.Vb 1
+\&  _utf8_on(STRING)
+.Ve
+.PP
+[INTERNAL] Turns the \fISTRING\fR's internal UTF8 flag \fBon\fR.  The \fISTRING\fR
+is \fInot\fR checked for containing only well-formed UTF\-8.  Do not use this
+unless you \fIknow with absolute certainty\fR that the STRING holds only
+well-formed UTF\-8.  Returns the previous state of the UTF8 flag (so please
+don't treat the return value as indicating success or failure), or \f(CW\*(C`undef\*(C'\fR
+if \fISTRING\fR is not a string.
+.PP
+\&\fBNOTE\fR: For security reasons, this function does not work on tainted values.
+.PP
+\fI_utf8_off\fR
+.IX Subsection "_utf8_off"
+.PP
+.Vb 1
+\&  _utf8_off(STRING)
+.Ve
+.PP
+[INTERNAL] Turns the \fISTRING\fR's internal UTF8 flag \fBoff\fR.  Do not use
+frivolously.  Returns the previous state of the UTF8 flag, or \f(CW\*(C`undef\*(C'\fR if
+\&\fISTRING\fR is not a string.  Do not treat the return value as indicative of
+success or failure, because that isn't what it means: it is only the
+previous setting.
+.PP
+\&\fBNOTE\fR: For security reasons, this function does not work on tainted values.
+.SH "UTF\-8 vs. utf8 vs. UTF8"
+.IX Header "UTF-8 vs. utf8 vs. UTF8"
+.Vb 3
+\&  ....We now view strings not as sequences of bytes, but as sequences
+\&  of numbers in the range 0 .. 2**32\-1 (or in the case of 64\-bit
+\&  computers, 0 .. 2**64\-1) \-\- Programming Perl, 3rd ed.
+.Ve
+.PP
+That has historically been Perl's notion of UTF\-8, as that is how UTF\-8 was
+first conceived by Ken Thompson when he invented it. However, thanks to
+later revisions to the applicable standards, official UTF\-8 is now rather
+stricter than that. For example, its range is much narrower (0 .. 0x10_FFFF
+to cover only 21 bits instead of 32 or 64 bits) and some sequences
+are not allowed, like those used in surrogate pairs, the 32 non-character
+code points 0xFDD0 .. 0xFDEF, the last two code points in \fIany\fR plane
+(0x\fIXX\fR_FFFE and 0x\fIXX\fR_FFFF), all non-shortest encodings, etc.
+.PP
+The former default in which Perl would always use a loose interpretation of
+UTF\-8 has now been overruled:
+.PP
+.Vb 5
+\&  From: Larry Wall <larry@wall.org>
+\&  Date: December 04, 2004 11:51:58 JST
+\&  To: perl\-unicode@perl.org
+\&  Subject: Re: Make Encode.pm support the real UTF\-8
+\&  Message\-Id: <20041204025158.GA28754@wall.org>
+\&
+\&  On Fri, Dec 03, 2004 at 10:12:12PM +0000, Tim Bunce wrote:
+\&  : I\*(Aqve no problem with \*(Aqutf8\*(Aq being perl\*(Aqs unrestricted uft8 encoding,
+\&  : but "UTF\-8" is the name of the standard and should give the
+\&  : corresponding behaviour.
+\&
+\&  For what it\*(Aqs worth, that\*(Aqs how I\*(Aqve always kept them straight in my
+\&  head.
+\&
+\&  Also for what it\*(Aqs worth, Perl 6 will mostly default to strict but
+\&  make it easy to switch back to lax.
+\&
+\&  Larry
+.Ve
+.PP
+Got that?  As of Perl 5.8.7, \fB"UTF\-8"\fR means UTF\-8 in its current
+sense, which is conservative and strict and security-conscious, whereas
+\&\fB"utf8"\fR means UTF\-8 in its former sense, which was liberal and loose and
+lax.  \f(CW\*(C`Encode\*(C'\fR version 2.10 or later thus groks this subtle but critically
+important distinction between \f(CW"UTF\-8"\fR and \f(CW"utf8"\fR.
+.PP
+.Vb 2
+\&  encode("utf8",  "\ex{FFFF_FFFF}", 1); # okay
+\&  encode("UTF\-8", "\ex{FFFF_FFFF}", 1); # croaks
+.Ve
+.PP
+This distinction is also important for decoding. In the following,
+\&\f(CW$s\fR stores character U+200000, which exceeds UTF\-8's allowed range.
+\&\f(CW$s\fR thus stores an invalid Unicode code point:
+.PP
+.Vb 1
+\&  $s = decode("utf8", "\exf8\ex88\ex80\ex80\ex80");
+.Ve
+.PP
+\&\f(CW"UTF\-8"\fR, by contrast, will either coerce the input to something valid:
+.PP
+.Vb 1
+\&    $s = decode("UTF\-8", "\exf8\ex88\ex80\ex80\ex80"); # U+FFFD
+.Ve
+.PP
+\&.. or croak:
+.PP
+.Vb 1
+\&    decode("UTF\-8", "\exf8\ex88\ex80\ex80\ex80", FB_CROAK|LEAVE_SRC);
+.Ve
+.PP
+In the \f(CW\*(C`Encode\*(C'\fR module, \f(CW"UTF\-8"\fR is actually an alias for the canonical name
+\&\f(CW"utf\-8\-strict"\fR.  That hyphen between the \f(CW"UTF"\fR and the \f(CW"8"\fR is
+critical; without it, \f(CW\*(C`Encode\*(C'\fR goes "liberal" and (perhaps overly\-)permissive:
+.PP
+.Vb 4
+\&  find_encoding("UTF\-8")\->name # is \*(Aqutf\-8\-strict\*(Aq
+\&  find_encoding("utf\-8")\->name # ditto. names are case insensitive
+\&  find_encoding("utf_8")\->name # ditto. "_" are treated as "\-"
+\&  find_encoding("UTF8")\->name  # is \*(Aqutf8\*(Aq.
+.Ve
+.PP
+Perl's internal UTF8 flag is called "UTF8", without a hyphen. It indicates
+whether a string is internally encoded as "utf8", also without a hyphen.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+Encode::Encoding,
+Encode::Supported,
+Encode::PerlIO,
+encoding,
+perlebcdic,
+"open" in perlfunc,
+perlunicode, perluniintro, perlunifaq, perlunitut,
+utf8,
+the Perl Unicode Mailing List <http://lists.perl.org/list/perl\-unicode.html>
+.SH MAINTAINER
+.IX Header "MAINTAINER"
+This project was originated by the late Nick Ing-Simmons and later
+maintained by Dan Kogai \fI<dankogai@cpan.org>\fR.  See AUTHORS
+for a full list of people involved.  For any questions, send mail to
+\&\fI<perl\-unicode@perl.org>\fR so that we can all share.
+.PP
+While Dan Kogai retains the copyright as a maintainer, credit
+should go to all those involved.  See AUTHORS for a list of those
+who submitted code to the project.
+.SH COPYRIGHT
+.IX Header "COPYRIGHT"
+Copyright 2002\-2014 Dan Kogai \fI<dankogai@cpan.org>\fR.
+.PP
+This library is free software; you can redistribute it and/or modify
+it under the same terms as Perl itself.