Adding upstream version 4.22.0.upstream/4.22.0

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit: fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch)
tree: ce1e3bce06471410239a6f41282e328770aa404a /upstream/mageia-cauldron/man3pm/Encode::Guess.3pm
parent: Initial commit. (diff)
download: manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.tar.xz
manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.zip
1 files changed, 245 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man3pm/Encode::Guess.3pm b/upstream/mageia-cauldron/man3pm/Encode::Guess.3pm
new file mode 100644
index 00000000..2fcfac1d
--- /dev/null
+++ b/upstream/mageia-cauldron/man3pm/Encode::Guess.3pm
@@ -0,0 +1,245 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+.    ds C` ""
+.    ds C' ""
+'br\}
+.el\{\
+.    ds C`
+.    ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD.  Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+.    if \nF \{\
+.        de IX
+.        tm Index:\\$1\t\\n%\t"\\$2"
+..
+.        if !\nF==2 \{\
+.            nr % 0
+.            nr F 2
+.        \}
+.    \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "Encode::Guess 3pm"
+.TH Encode::Guess 3pm 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+Encode::Guess \-\- Guesses encoding from data
+.SH SYNOPSIS
+.IX Header "SYNOPSIS"
+.Vb 1
+\&  # if you are sure $data won\*(Aqt contain anything bogus
+\&
+\&  use Encode;
+\&  use Encode::Guess qw/euc\-jp shiftjis 7bit\-jis/;
+\&  my $utf8 = decode("Guess", $data);
+\&  my $data = encode("Guess", $utf8);   # this doesn\*(Aqt work!
+\&
+\&  # more elaborate way
+\&  use Encode::Guess;
+\&  my $enc = guess_encoding($data, qw/euc\-jp shiftjis 7bit\-jis/);
+\&  ref($enc) or die "Can\*(Aqt guess: $enc"; # trap error this way
+\&  $utf8 = $enc\->decode($data);
+\&  # or
+\&  $utf8 = decode($enc\->name, $data)
+.Ve
+.SH ABSTRACT
+.IX Header "ABSTRACT"
+Encode::Guess enables you to guess in what encoding a given data is
+encoded, or at least tries to.
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+By default, it checks only ascii, utf8 and UTF\-16/32 with BOM.
+.PP
+.Vb 1
+\&  use Encode::Guess; # ascii/utf8/BOMed UTF
+.Ve
+.PP
+To use it more practically, you have to give the names of encodings to
+check (\fIsuspects\fR as follows).  The name of suspects can either be
+canonical names or aliases.
+.PP
+CAVEAT: Unlike UTF\-(16|32), BOM in utf8 is NOT AUTOMATICALLY STRIPPED.
+.PP
+.Vb 2
+\& # tries all major Japanese Encodings as well
+\&  use Encode::Guess qw/euc\-jp shiftjis 7bit\-jis/;
+.Ve
+.PP
+If the \f(CW$Encode::Guess::NoUTFAutoGuess\fR variable is set to a true
+value, no heuristics will be applied to UTF8/16/32, and the result
+will be limited to the suspects and \f(CW\*(C`ascii\*(C'\fR.
+.IP Encode::Guess\->set_suspects 4
+.IX Item "Encode::Guess->set_suspects"
+You can also change the internal suspects list via \f(CW\*(C`set_suspects\*(C'\fR
+method.
+.Sp
+.Vb 2
+\&  use Encode::Guess;
+\&  Encode::Guess\->set_suspects(qw/euc\-jp shiftjis 7bit\-jis/);
+.Ve
+.IP Encode::Guess\->add_suspects 4
+.IX Item "Encode::Guess->add_suspects"
+Or you can use \f(CW\*(C`add_suspects\*(C'\fR method.  The difference is that
+\&\f(CW\*(C`set_suspects\*(C'\fR flushes the current suspects list while
+\&\f(CW\*(C`add_suspects\*(C'\fR adds.
+.Sp
+.Vb 5
+\&  use Encode::Guess;
+\&  Encode::Guess\->add_suspects(qw/euc\-jp shiftjis 7bit\-jis/);
+\&  # now the suspects are euc\-jp,shiftjis,7bit\-jis, AND
+\&  # euc\-kr,euc\-cn, and big5\-eten
+\&  Encode::Guess\->add_suspects(qw/euc\-kr euc\-cn big5\-eten/);
+.Ve
+.IP "Encode::decode(""Guess"" ...)" 4
+.IX Item "Encode::decode(""Guess"" ...)"
+When you are content with suspects list, you can now
+.Sp
+.Vb 1
+\&  my $utf8 = Encode::decode("Guess", $data);
+.Ve
+.IP Encode::Guess\->guess($data) 4
+.IX Item "Encode::Guess->guess($data)"
+But it will croak if:
+.RS 4
+.IP \(bu 4
+Two or more suspects remain
+.IP \(bu 4
+No suspects left
+.RE
+.RS 4
+.Sp
+So you should instead try this;
+.Sp
+.Vb 1
+\&  my $decoder = Encode::Guess\->guess($data);
+.Ve
+.Sp
+On success, \f(CW$decoder\fR is an object that is documented in
+Encode::Encoding.  So you can now do this;
+.Sp
+.Vb 1
+\&  my $utf8 = $decoder\->decode($data);
+.Ve
+.Sp
+On failure, \f(CW$decoder\fR now contains an error message so the whole thing
+would be as follows;
+.Sp
+.Vb 3
+\&  my $decoder = Encode::Guess\->guess($data);
+\&  die $decoder unless ref($decoder);
+\&  my $utf8 = $decoder\->decode($data);
+.Ve
+.RE
+.IP "guess_encoding($data, [, \fIlist of suspects\fR])" 4
+.IX Item "guess_encoding($data, [, list of suspects])"
+You can also try \f(CW\*(C`guess_encoding\*(C'\fR function which is exported by
+default.  It takes \f(CW$data\fR to check and it also takes the list of
+suspects by option.  The optional suspect list is \fInot reflected\fR to
+the internal suspects list.
+.Sp
+.Vb 5
+\&  my $decoder = guess_encoding($data, qw/euc\-jp euc\-kr euc\-cn/);
+\&  die $decoder unless ref($decoder);
+\&  my $utf8 = $decoder\->decode($data);
+\&  # check only ascii, utf8 and UTF\-(16|32) with BOM
+\&  my $decoder = guess_encoding($data);
+.Ve
+.SH CAVEATS
+.IX Header "CAVEATS"
+.IP \(bu 4
+Because of the algorithm used, ISO\-8859 series and other single-byte
+encodings do not work well unless either one of ISO\-8859 is the only
+one suspect (besides ascii and utf8).
+.Sp
+.Vb 5
+\&  use Encode::Guess;
+\&  # perhaps ok
+\&  my $decoder = guess_encoding($data, \*(Aqlatin1\*(Aq);
+\&  # definitely NOT ok
+\&  my $decoder = guess_encoding($data, qw/latin1 greek/);
+.Ve
+.Sp
+The reason is that Encode::Guess guesses encoding by trial and error.
+It first splits \f(CW$data\fR into lines and tries to decode the line for each
+suspect.  It keeps it going until all but one encoding is eliminated
+out of suspects list.  ISO\-8859 series is just too successful for most
+cases (because it fills almost all code points in \ex00\-\exff).
+.IP \(bu 4
+Do not mix national standard encodings and the corresponding vendor
+encodings.
+.Sp
+.Vb 3
+\&  # a very bad idea
+\&  my $decoder
+\&     = guess_encoding($data, qw/shiftjis MacJapanese cp932/);
+.Ve
+.Sp
+The reason is that vendor encoding is usually a superset of national
+standard so it becomes too ambiguous for most cases.
+.IP \(bu 4
+On the other hand, mixing various national standard encodings
+automagically works unless \f(CW$data\fR is too short to allow for guessing.
+.Sp
+.Vb 6
+\& # This is ok if $data is long enough
+\& my $decoder =  
+\&  guess_encoding($data, qw/euc\-cn
+\&                           euc\-jp shiftjis 7bit\-jis
+\&                           euc\-kr
+\&                           big5\-eten/);
+.Ve
+.IP \(bu 4
+DO NOT PUT TOO MANY SUSPECTS!  Don't you try something like this!
+.Sp
+.Vb 2
+\&  my $decoder = guess_encoding($data, 
+\&                               Encode\->encodings(":all"));
+.Ve
+.PP
+It is, after all, just a guess.  You should alway be explicit when it
+comes to encodings.  But there are some, especially Japanese,
+environment that guess-coding is a must.  Use this module with care.
+.SH "TO DO"
+.IX Header "TO DO"
+Encode::Guess does not work on EBCDIC platforms.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+Encode, Encode::Encoding
author	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-15 19:43:11 +0000
committer	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-15 19:43:11 +0000
commit	fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch)
tree	ce1e3bce06471410239a6f41282e328770aa404a /upstream/mageia-cauldron/man3pm/Encode::Guess.3pm
parent	Initial commit. (diff)
download	manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.tar.xz manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.zip