author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit fc22b3d6507c6745911b9dfcc68f1e665ae13dbc
tree ce1e3bce06471410239a6f41282e328770aa404a /upstream/debian-bookworm/man3/Encode::PerlIO.3perl
parent Initial commit.
Adding upstream version 4.22.0. upstream/4.22.0
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/debian-bookworm/man3/Encode::PerlIO.3perl')
-rw-r--r-- upstream/debian-bookworm/man3/Encode::PerlIO.3perl 254
1 files changed, 254 insertions, 0 deletions
diff --git a/upstream/debian-bookworm/man3/Encode::PerlIO.3perl b/upstream/debian-bookworm/man3/Encode::PerlIO.3perl
new file mode 100644
index 00000000..30448e4b
--- /dev/null
+++ b/upstream/debian-bookworm/man3/Encode::PerlIO.3perl
@@ -0,0 +1,254 @@
+.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" Set up some character translations and predefined strings. \*(-- will
+.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
+.\" double quote, and \*(R" will give a right double quote. \*(C+ will
+.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
+.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
+.\" nothing in troff, for use with C<>.
+.tr \(*W-
+.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
+.ie n \{\
+. ds -- \(*W-
+. ds PI pi
+. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
+. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
+. ds L" ""
+. ds R" ""
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds -- \|\(em\|
+. ds PI \(*p
+. ds L" ``
+. ds R" ''
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "Encode::PerlIO 3perl"
+.TH Encode::PerlIO 3perl "2023-11-25" "perl v5.36.0" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH "NAME"
+Encode::PerlIO \-\- a detailed document on Encode and PerlIO
+.SH "Overview"
+.IX Header "Overview"
+It is very common to want to do encoding transformations when
+reading or writing files, network connections, pipes etc.
+If Perl is configured to use the new 'perlio' \s-1IO\s0 system then
+\&\f(CW\*(C`Encode\*(C'\fR provides a \*(L"layer\*(R" (see PerlIO) which can transform
+data as it is read or written.
+.PP
+Here is how the blind poet would modernise the encoding:
+.PP
+.Vb 7
+\& use Encode;
+\& open(my $iliad,\*(Aq<:encoding(iso\-8859\-7)\*(Aq,\*(Aqiliad.greek\*(Aq);
+\& open(my $utf8,\*(Aq>:utf8\*(Aq,\*(Aqiliad.utf8\*(Aq);
+\& my @epic = <$iliad>;
+\& print $utf8 @epic;
+\& close($utf8);
+\& close($iliad);
+.Ve
+.PP
+The new \s-1IO\s0 system can also be configured to read/write
+\&\s-1UTF\-8\s0 encoded characters directly, which is efficient:
+.PP
+.Vb 2
+\& open(my $fh,\*(Aq>:utf8\*(Aq,\*(Aqanything\*(Aq);
+\& print $fh "Any \ex{0021} string \eN{WHITE SMILING FACE}\en";
+.Ve
+.PP
+Either of the above forms of \*(L"layer\*(R" specifications can be made the default
+for a lexical scope with the \f(CW\*(C`use open ...\*(C'\fR pragma. See open.
+.PP
+Once a handle is open, its layers can be altered using \f(CW\*(C`binmode\*(C'\fR.
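+.PP
+As a minimal sketch (the temporary file and its one-byte content are
+assumptions made only for illustration), an already-open handle can be
+given an encoding layer like this:
+.PP
+.Vb 14
+\& use strict;
+\& use warnings;
+\& use File::Temp qw(tempfile);
+\&
+\& # Hypothetical sample data: 0xE1 is GREEK SMALL LETTER ALPHA in iso\-8859\-7.
+\& my ($tmp, $path) = tempfile(UNLINK => 1);
+\& binmode($tmp);                        # write raw bytes
+\& print {$tmp} "\exE1\en";
+\& close($tmp) or die "close: $!";
+\&
+\& open(my $fh, \*(Aq<\*(Aq, $path) or die "$path: $!";
+\& binmode($fh, \*(Aq:encoding(iso\-8859\-7)\*(Aq) or die "binmode: $!";
+\& my $line = <$fh>;                     # octets are now decoded to characters
+\& close($fh);
+.Ve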
+.PP
+Without any such configuration, or if Perl itself is built using the
+system's own \s-1IO,\s0 then write operations assume that the file handle
+accepts only \fIbytes\fR and will \f(CW\*(C`die\*(C'\fR if a character larger than 255 is
+written to the handle. When reading, each octet from the handle becomes
+a byte-in-a-character. Note that this default is the same behaviour
+as bytes-only languages (including Perl before v5.6) would have,
+and is sufficient to handle native 8\-bit encodings e.g. iso\-8859\-1,
+\&\s-1EBCDIC\s0 etc. and any legacy mechanisms for handling other encodings
+and binary data.
+.PP
+In other cases, it is the program's responsibility to transform
+characters into bytes using the \f(CW\*(C`Encode\*(C'\fR \s-1API\s0 (see Encode) before doing writes, and to
+transform the bytes read from a handle into characters before doing
+\&\*(L"character operations\*(R" (e.g. \f(CW\*(C`lc\*(C'\fR, \f(CW\*(C`/\eW+/\*(C'\fR, ...).
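+.PP
+A minimal sketch of that manual approach, assuming the \f(CW\*(C`encode\*(C'\fR and
+\&\f(CW\*(C`decode\*(C'\fR functions exported by Encode; the sample string and the
+in-memory file are illustrative stand-ins, not part of any real program:
+.PP
+.Vb 13
+\& use strict;
+\& use warnings;
+\& use Encode qw(encode decode);
+\&
+\& my $chars = "caf\ex{e9} \ex{263a}";            # hypothetical character string
+\&
+\& # Writing: turn characters into octets before they reach a raw handle.
+\& open(my $out, \*(Aq>:raw\*(Aq, \emy $file) or die "open: $!";
+\& print {$out} encode(\*(AqUTF\-8\*(Aq, $chars);
+\& close($out) or die "close: $!";
+\&
+\& # Reading: turn octets back into characters before character operations.
+\& my $again = decode(\*(AqUTF\-8\*(Aq, $file);
+.Ve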
+.PP
+You can also use PerlIO to convert larger amounts of data you don't
+want to bring into memory. For example, to convert between \s-1ISO\-8859\-1\s0
+(Latin 1) and \s-1UTF\-8\s0 (or UTF-EBCDIC in \s-1EBCDIC\s0 machines):
+.PP
+.Vb 3
+\& open(F, "<:encoding(iso\-8859\-1)", "data.txt") or die $!;
+\& open(G, ">:utf8", "data.utf") or die $!;
+\& while (<F>) { print G }
+\&
+\& # Could also do "print G <F>" but that would pull
+\& # the whole file into memory just to write it out again.
+.Ve
+.PP
+More examples:
+.PP
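+.Vb 4
+\& # $in, $out and $out9 stand for file names of your choice
+\& open(my $f, "<:encoding(cp1252)", $in) or die $!;
+\& open(my $g, ">:encoding(iso\-8859\-2)", $out) or die $!;
+\& open(my $h, ">:encoding(latin9)", $out9) or die $!; # iso\-8859\-15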
+.Ve
+.PP
+See also encoding for how to change the default encoding of the
+data in your script.
+.SH "How does it work?"
+.IX Header "How does it work?"
+Here is a crude diagram of how filehandle, PerlIO, and Encode
+interact.
+.PP
+.Vb 3
+\&   filehandle <\-> PerlIO        PerlIO <\-> scalar (read/printed)
+\&                        \e      /
+\&                         Encode
+.Ve
+.PP
+When PerlIO receives data from either direction, it fills a buffer
+(currently with 1024 bytes) and passes the buffer to Encode.
+Encode tries to convert the valid part and passes it back to PerlIO,
+leaving invalid parts (usually a partial character) in the buffer.
+PerlIO then appends more data to the buffer, calls Encode again,
+and so on until the data stream ends.
+.PP
+To do so, PerlIO always calls the (de|en)code methods with \s-1CHECK\s0 set to 1.
+This ensures that the method stops at the right place when it
+encounters a partial character. The following is what happens when
+PerlIO and Encode try to encode (from utf8) more than 1024 bytes
+and the buffer boundary happens to fall in the middle of a character.
+.PP
+.Vb 5
+\&  A   B   C   ....   ~     \ex{3000}    ....
+\& 41  42  43   ....  7E   e3   80   80  ....
+\& <\- buffer \-\-\-\-\-\-\-\-\-\-\-\-\-\-\->
+\& << encoded >>>>>>>>>>
+\&                         <\- next buffer \-\-\-\-\-\-
+.Ve
+.PP
+Encode converts from the beginning to \ex7E, leaving \exe3 in the buffer
+because it is invalid (partial character).
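+.PP
+The same stop-and-resume behaviour can be imitated outside PerlIO. The
+following is only a sketch: it decodes rather than encodes, it assumes
+\&\f(CW\*(C`Encode::FB_QUIET\*(C'\fR as the \s-1CHECK\s0 value (not necessarily what PerlIO
+itself passes), and the two hand-made chunks merely stand in for
+consecutive buffers:
+.PP
+.Vb 14
+\& use strict;
+\& use warnings;
+\& use Encode ();
+\&
+\& # Two chunks that split U+3000 (e3 80 80) after its first octet.
+\& my @chunks = ("ABC~\exe3", "\ex80\ex80");
+\& my $utf8 = Encode::find_encoding(\*(AqUTF\-8\*(Aq);
+\& my ($buffer, $text) = ("", "");
+\&
+\& for my $chunk (@chunks) {
+\&     $buffer .= $chunk;
+\&     # FB_QUIET: return what could be converted, leave the rest in $buffer.
+\&     $text .= $utf8\->decode($buffer, Encode::FB_QUIET);
+\& }
+.Ve
+.PP
+After the first pass the lone \exe3 stays in \f(CW$buffer\fR; the second
+chunk completes the character.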
+.PP
+Unfortunately, this scheme does not work well with escape-based
+encodings such as \s-1ISO\-2022\-JP.\s0
+.SH "Line Buffering"
+.IX Header "Line Buffering"
+Now let's see what happens when you try to decode from \s-1ISO\-2022\-JP\s0 and
+the buffer ends in the middle of a character.
+.PP
+.Vb 5
+\&                         JIS208\-ESC \ex{5f3e}
+\&  A   B   C   ....   ~   \ee   $   B  |DAN | ....
+\& 41  42  43   ....  7E   1b  24  42  43  46 ....
+\& <\- buffer \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\->
+\& << decoded >>>>>>>>>>>>>>>>>>>>>>>
+.Ve
+.PP
+As you see, the next buffer begins with \ex43. But \ex43 is 'C' in
+\&\s-1ASCII,\s0 which is wrong in this case because we are now in the \s-1JIS X 0208\s0
+area, so Encode has to convert \ex43\ex46 as a unit, not \ex43 alone. Unlike utf8 and \s-1EUC,\s0
+in escape-based encodings you can't tell whether a given octet is a whole
+character or just part of one.
+.PP
+Fortunately PerlIO also supports line buffering if you tell it to use
+that instead of a fixed-size buffer. Since \s-1ISO\-2022\-JP\s0 is guaranteed
+to revert to \s-1ASCII\s0 at the end of each line, a partial
+character will never occur when line buffering is used.
+.PP
+To tell PerlIO to use line buffering, implement a \f(CW\*(C`needs_lines\*(C'\fR method
+that returns true for your encoding object. See Encode::Encoding for details.
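+.PP
+A sketch of such an encoding class, loosely modelled on the \s-1ROT13\s0
+example in Encode::Encoding. The package name \f(CW\*(C`My::Rot13\*(C'\fR and the
+encoding name \f(CW\*(C`x\-my\-rot13\*(C'\fR are hypothetical, and \s-1ROT13\s0 does not
+really need line buffering; it only keeps the example short:
+.PP
+.Vb 21
+\& package My::Rot13;                        # hypothetical encoding class
+\& use strict;
+\& use warnings;
+\& use Encode ();
+\& use parent qw(Encode::Encoding);
+\&
+\& _\|_PACKAGE_\|_\->Define(\*(Aqx\-my\-rot13\*(Aq);       # hypothetical encoding name
+\&
+\& sub needs_lines { 1 }                     # ask PerlIO for line buffering
+\&
+\& sub encode {
+\&     my ($obj, $str, $chk) = @_;
+\&     $str =~ tr/A\-Za\-z/N\-ZA\-Mn\-za\-m/;
+\&     $_[1] = "" if $chk;                   # consume the source when CHECK is set
+\&     return $str;
+\& }
+\& sub decode { goto &encode }               # ROT13 is its own inverse
+\&
+\& package main;
+\& my $enc = Encode::find_encoding(\*(Aqx\-my\-rot13\*(Aq);
+\& print $enc\->needs_lines ? "line buffered\en" : "fixed buffer\en";
+.Ve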
+.PP
+Thanks to these efforts, most encodings that come with Encode support
+PerlIO, but the following encodings still do not:
+.PP
+.Vb 4
+\& iso\-2022\-kr
+\& MIME\-B
+\& MIME\-Header
+\& MIME\-Q
+.Ve
+.PP
+Fortunately iso\-2022\-kr is hardly used (according to Jungshik) and
+MIME\-* are very unlikely to be fed to PerlIO because they are for mail
+headers. See Encode::MIME::Header for details.
+.SS "How can I tell whether my encoding fully supports PerlIO ?"
+.IX Subsection "How can I tell whether my encoding fully supports PerlIO ?"
+As of this writing, any encoding whose class belongs to Encode::XS or
+Encode::Unicode works. The Encode module has a \f(CW\*(C`perlio_ok\*(C'\fR method
+which you can use before applying PerlIO encoding to the filehandle.
+Here is an example:
+.PP
+.Vb 8
+\& use Encode qw(decode perlio_ok);
+\& my $use_perlio = perlio_ok($enc);
+\& my $layer = $use_perlio ? "<:encoding($enc)" : "<:raw";
+\& open my $fh, $layer, $file or die "$file : $!";
+\& while(<$fh>){
+\&     $_ = decode($enc, $_) unless $use_perlio;
+\&     # ....
+\& }
+.Ve
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+Encode::Encoding,
+Encode::Supported,
+Encode,
+encoding,
+perlebcdic,
+\&\*(L"open\*(R" in perlfunc,
+perlunicode,
+utf8,
+the Perl Unicode Mailing List <perl\-unicode@perl.org>