author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
commit | fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch) | |
tree | ce1e3bce06471410239a6f41282e328770aa404a /upstream/debian-bookworm/man3/Encode::PerlIO.3perl | |
parent | Initial commit. (diff) | |
Adding upstream version 4.22.0.upstream/4.22.0
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/debian-bookworm/man3/Encode::PerlIO.3perl')
-rw-r--r-- | upstream/debian-bookworm/man3/Encode::PerlIO.3perl | 254 |
1 files changed, 254 insertions, 0 deletions
diff --git a/upstream/debian-bookworm/man3/Encode::PerlIO.3perl b/upstream/debian-bookworm/man3/Encode::PerlIO.3perl
new file mode 100644
index 00000000..30448e4b
--- /dev/null
+++ b/upstream/debian-bookworm/man3/Encode::PerlIO.3perl
@@ -0,0 +1,254 @@
+.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" Set up some character translations and predefined strings. \*(-- will
+.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
+.\" double quote, and \*(R" will give a right double quote. \*(C+ will
+.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
+.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
+.\" nothing in troff, for use with C<>.
+.tr \(*W-
+.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
+.ie n \{\
+. ds -- \(*W-
+. ds PI pi
+. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
+. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
+. ds L" ""
+. ds R" ""
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds -- \|\(em\|
+. ds PI \(*p
+. ds L" ``
+. ds R" ''
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "Encode::PerlIO 3perl"
+.TH Encode::PerlIO 3perl "2023-11-25" "perl v5.36.0" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH "NAME"
+Encode::PerlIO \-\- a detailed document on Encode and PerlIO
+.SH "Overview"
+.IX Header "Overview"
+It is very common to want to do encoding transformations when
+reading or writing files, network connections, pipes etc.
+If Perl is configured to use the new 'perlio' \s-1IO\s0 system then
+\&\f(CW\*(C`Encode\*(C'\fR provides a \*(L"layer\*(R" (see PerlIO) which can transform
+data as it is read or written.
+.PP
+Here is how the blind poet would modernise the encoding:
+.PP
+.Vb 7
+\& use Encode;
+\& open(my $iliad,\*(Aq<:encoding(iso\-8859\-7)\*(Aq,\*(Aqiliad.greek\*(Aq) or die $!;
+\& open(my $utf8,\*(Aq>:utf8\*(Aq,\*(Aqiliad.utf8\*(Aq) or die $!;
+\& my @epic = <$iliad>;
+\& print $utf8 @epic;
+\& close($utf8);
+\& close($iliad);
+.Ve
+.PP
+In addition, the new \s-1IO\s0 system can also be configured to read/write
+\&\s-1UTF\-8\s0 encoded characters (this is efficient, since Perl already stores wide-character strings internally as \s-1UTF\-8\s0):
+.PP
+.Vb 2
+\& open(my $fh,\*(Aq>:utf8\*(Aq,\*(Aqanything\*(Aq) or die $!;
+\& print $fh "Any \ex{0021} string \eN{WHITE SMILING FACE}\en";
+.Ve
+.PP
+Either of the above forms of \*(L"layer\*(R" specifications can be made the default
+for a lexical scope with the \f(CW\*(C`use open ...\*(C'\fR pragma. See open.
+.PP
+Once a handle is open, its layers can be altered using \f(CW\*(C`binmode\*(C'\fR.
+.PP
+Without any such configuration, or if Perl itself is built using the
+system's own \s-1IO,\s0 then write operations assume that the file handle
+accepts only \fIbytes\fR and will \f(CW\*(C`die\*(C'\fR if a character larger than 255 is
+written to the handle. When reading, each octet from the handle becomes
+a byte-in-a-character. Note that this default is the same behaviour
+as bytes-only languages (including Perl before v5.6) would have,
+and is sufficient to handle native 8\-bit encodings e.g. iso\-8859\-1,
+\&\s-1EBCDIC\s0 etc. and any legacy mechanisms for handling other encodings
+and binary data.
+.PP
+In other cases, it is the program's responsibility to transform
+characters into bytes with Encode's \f(CW\*(C`encode\*(C'\fR before doing writes, and to
+transform the bytes read from a handle into characters with \f(CW\*(C`decode\*(C'\fR before doing
+\&\*(L"character operations\*(R" (e.g. \f(CW\*(C`lc\*(C'\fR, \f(CW\*(C`/\eW+/\*(C'\fR, ...).
+.PP
+You can also use PerlIO to convert larger amounts of data you don't
+want to bring into memory. For example, to convert between \s-1ISO\-8859\-1\s0
+(Latin 1) and \s-1UTF\-8\s0 (or UTF-EBCDIC on \s-1EBCDIC\s0 machines):
+.PP
+.Vb 3
+\& open(F, "<:encoding(iso\-8859\-1)", "data.txt") or die $!;
+\& open(G, ">:utf8", "data.utf") or die $!;
+\& while (<F>) { print G }
+\&
+\& # Could also do "print G <F>" but that would pull
+\& # the whole file into memory just to write it out again.
+.Ve
+.PP
+More examples:
+.PP
+.Vb 3
+\& open(my $f, "<:encoding(cp1252)", "in.txt") or die $!;
+\& open(my $g, ">:encoding(iso\-8859\-2)", "out.txt") or die $!;
+\& open(my $h, ">:encoding(latin9)", "out9.txt") or die $!; # iso\-8859\-15
+.Ve
+.PP
+See also encoding for how to change the default encoding of the
+data in your script.
+.SH "How does it work?"
+.IX Header "How does it work?"
+Here is a crude diagram of how filehandle, PerlIO, and Encode
+interact.
+.PP
+.Vb 3
+\&   filehandle <\-> PerlIO        PerlIO <\-> scalar (read/printed)
+\&                        \e      /
+\&                         Encode
+.Ve
+.PP
+When PerlIO receives data from either direction, it fills a buffer
+(currently 1024 bytes) and passes the buffer to Encode.
+Encode tries to convert the valid part and passes it back to PerlIO,
+leaving invalid parts (usually a partial character) in the buffer.
+PerlIO then appends more data to the buffer, calls Encode again,
+and so on until the data stream ends.
+.PP
+To do so, PerlIO always calls (de|en)code methods with \s-1CHECK\s0 set to 1.
+This ensures that the method stops at the right place when it
+encounters a partial character. The following is what happens when
+PerlIO and Encode try to encode (from utf8) more than 1024 bytes
+and the buffer boundary happens to be in the middle of a character.
+.PP
+.Vb 5
+\&   A  B  C ....  ~ \ex{3000} ....
+\&  41 42 43 .... 7E e3 80 80 ....
+\&  <\- buffer \-\-\-\-\-\-\->
+\&  << encoded >>>>>
+\&                   <\- next buffer \-\-\-\-\-\-
+.Ve
+.PP
+Encode converts from the beginning to \ex7E, leaving \exe3 in the buffer
+because it is an incomplete (partial) character.
+.PP
+Unfortunately, this scheme does not work well with escape-based
+encodings such as \s-1ISO\-2022\-JP.\s0
+.SH "Line Buffering"
+.IX Header "Line Buffering"
+Now let's see what happens when you try to decode from \s-1ISO\-2022\-JP\s0 and
+the buffer ends in the middle of a character.
+.PP
+.Vb 5
+\&                 JIS208\-ESC \ex{5f3e}
+\&   A  B  C ....  ~ \ee  $  B |DAN | ....
+\&  41 42 43 .... 7E 1b 24 42 43 46  ....
+\&  <\- buffer \-\-\-\-\-\-\-\-\-\-\-\-\-\->
+\&  << encoded >>>>>>>>>>>>>>
+.Ve
+.PP
+As you see, the next buffer begins with \ex43. But \ex43 is 'C' in
+\&\s-1ASCII,\s0 which is wrong in this case because we are now in the \s-1JIS X 0208\s0
+area, so it has to convert \ex43\ex46 as a pair, not \ex43 alone. Unlike utf8 and \s-1EUC,\s0
+in escape-based encodings you can't tell if a given octet is a whole
+character or just part of it.
+.PP
+Fortunately, PerlIO can also use a line buffer instead of a fixed-size
+buffer. Since \s-1ISO\-2022\-JP\s0 is guaranteed to revert to \s-1ASCII\s0 at the end of each line, a partial
+character can never occur when a line buffer is used.
+.PP
+To tell PerlIO to use a line buffer, implement a \->needs_lines method that returns true
+for your encoding class. See Encode::Encoding for details.
+.PP
+Thanks to these efforts, most encodings that come with Encode support
+PerlIO, but the following encodings still do not:
+.PP
+.Vb 4
+\& iso\-2022\-kr
+\& MIME\-B
+\& MIME\-Header
+\& MIME\-Q
+.Ve
+.PP
+Fortunately, iso\-2022\-kr is hardly used (according to Jungshik), and
+the MIME\-* encodings are very unlikely to be fed to PerlIO because they are for mail
+headers. See Encode::MIME::Header for details.
+.SS "How can I tell whether my encoding fully supports PerlIO?"
+.IX Subsection "How can I tell whether my encoding fully supports PerlIO?"
+As of this writing, any encoding whose class is Encode::XS or
+Encode::Unicode works. The Encode module provides a \f(CW\*(C`perlio_ok\*(C'\fR function (encoding objects also have a \f(CW\*(C`perlio_ok\*(C'\fR method)
+which you can call before applying a PerlIO encoding layer to the filehandle.
+Here is an example:
+.PP
+.Vb 7
+\& my $use_perlio = Encode::perlio_ok($enc);
+\& my $layer = $use_perlio ? "<:encoding($enc)" : "<:raw";
+\& open my $fh, $layer, $file or die "$file : $!";
+\& while (<$fh>) {
+\&     $_ = decode($enc, $_) unless $use_perlio;
+\&     # ....
+\& }
+.Ve
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+Encode::Encoding,
+Encode::Supported,
+Encode::PerlIO,
+encoding,
+perlebcdic,
+\&\*(L"open\*(R" in perlfunc,
+perlunicode,
+utf8,
+the Perl Unicode Mailing List <perl\-unicode@perl.org>
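The fixed-buffer scheme described under "How does it work?" can be imitated directly with Encode. The Perl sketch below is for illustration only and is not the code PerlIO::encoding actually runs. It relies on Encode's documented FB_QUIET check mode. In that mode, decode() returns the characters it could convert and overwrites its source argument with the bytes it did not process. The sample text, the 5-byte chunk size and all variable names are hypothetical. The chunk size is chosen so that the first chunk ends right after the lead byte 0xe3 of U+3000.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample text whose UTF-8 form contains a 3-byte
# character (U+3000), mirroring the "A B C .... ~ \x{3000}" diagram.
my $text  = "ABC~\x{3000}XYZ";
my $bytes = encode('UTF-8', $text);    # 10 bytes: 41 42 43 7e e3 80 80 58 59 5a

# Hypothetical fixed buffer size: the first chunk ends right after
# the lead byte 0xe3, i.e. in the middle of U+3000.
my $chunk_size = 5;

my $buffer  = '';    # undecoded bytes carried over between chunks
my $decoded = '';    # characters produced so far
my @leftovers;       # contents of $buffer after each decode() call

for (my $pos = 0; $pos < length $bytes; $pos += $chunk_size) {
    # Append the next fixed-size chunk to whatever was left over.
    $buffer .= substr($bytes, $pos, $chunk_size);

    # With FB_QUIET, decode() stops at the first incomplete or invalid
    # sequence, returns what it converted, and overwrites $buffer with
    # the unprocessed tail so the next chunk can be appended to it.
    $decoded .= decode('UTF-8', $buffer, Encode::FB_QUIET());
    push @leftovers, $buffer;
}

# Show the leftover bytes (as hex) after each chunk.
for my $i (0 .. $#leftovers) {
    printf "leftover after chunk %d: '%s'\n", $i + 1, unpack('H*', $leftovers[$i]);
}
print "decoded: $decoded\n";
```

Run as is, the sketch should report e3 as the leftover after the first chunk, which is the partial character in the diagram. It should report an empty buffer after the second chunk and then print the full string. As the "Line Buffering" section explains, this byte-level approach is not reliable for escape-based encodings such as ISO-2022-JP.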