Adding upstream version 4.22.0.
upstream/4.22.0

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit: fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch)
tree: ce1e3bce06471410239a6f41282e328770aa404a /upstream/mageia-cauldron/man1/enc2xs.1
parent: Initial commit. (diff)
1 files changed, 351 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man1/enc2xs.1 b/upstream/mageia-cauldron/man1/enc2xs.1
new file mode 100644
index 00000000..8bac403c
--- /dev/null
+++ b/upstream/mageia-cauldron/man1/enc2xs.1
@@ -0,0 +1,351 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+.    ds C` ""
+.    ds C' ""
+'br\}
+.el\{\
+.    ds C`
+.    ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD.  Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+.    if \nF \{\
+.        de IX
+.        tm Index:\\$1\t\\n%\t"\\$2"
+..
+.        if !\nF==2 \{\
+.            nr % 0
+.            nr F 2
+.        \}
+.    \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "ENC2XS 1"
+.TH ENC2XS 1 2023-12-15 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+enc2xs \-\- Perl Encode Module Generator
+.SH SYNOPSIS
+.IX Header "SYNOPSIS"
+.Vb 3
+\&  enc2xs \-[options]
+\&  enc2xs \-M ModName mapfiles...
+\&  enc2xs \-C
+.Ve
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+\&\fIenc2xs\fR builds a Perl extension for use by Encode from either
+Unicode Character Mapping files (.ucm) or Tcl Encoding Files (.enc).
+Besides being used internally during the build process of the Encode
+module, you can use \fIenc2xs\fR to add your own encoding to perl.
+No knowledge of XS is necessary.
+.SH "Quick Guide"
+.IX Header "Quick Guide"
+If you want to know as little about Perl as possible but need to
+add a new encoding, just read this chapter and forget the rest.
+.IP 0. 4
+.IX Item "0."
+Have a .ucm file ready.  You can get it from somewhere or you can write
+your own from scratch or you can grab one from the Encode distribution
+and customize it.  For the UCM format, see the next Chapter.  In the
+example below, I'll call my theoretical encoding myascii, defined
+in \fImy.ucm\fR.  \f(CW\*(C`$\*(C'\fR is a shell prompt.
+.Sp
+.Vb 2
+\&  $ ls \-F
+\&  my.ucm
+.Ve
+.IP 1. 4
+.IX Item "1."
+Issue a command as follows:
+.Sp
+.Vb 5
+\&  $ enc2xs \-M My my.ucm
+\&  generating Makefile.PL
+\&  generating My.pm
+\&  generating README
+\&  generating Changes
+.Ve
+.Sp
+Now take a look at your current directory.  It should look like this.
+.Sp
+.Vb 2
+\&  $ ls \-F
+\&  Makefile.PL   My.pm         my.ucm        t/
+.Ve
+.Sp
+The following files were created.
+.Sp
+.Vb 3
+\&  Makefile.PL \- MakeMaker script
+\&  My.pm       \- Encode submodule
+\&  t/My.t      \- test file
+.Ve
+.RS 4
+.IP 1.1. 4
+.IX Item "1.1."
+If you want *.ucm installed together with the modules, do as follows:
+.Sp
+.Vb 3
+\&  $ mkdir Encode
+\&  $ mv *.ucm Encode
+\&  $ enc2xs \-M My Encode/*ucm
+.Ve
+.RE
+.RS 4
+.RE
+.IP 2. 4
+.IX Item "2."
+Edit the files generated.  You don't have to if you have no time AND no
+intention to give it to someone else.  But it is a good idea to edit
+the pod and to add more tests.
+.IP 3. 4
+.IX Item "3."
+Now issue a command all Perl Mongers love:
+.Sp
+.Vb 2
+\&  $ perl Makefile.PL
+\&  Writing Makefile for Encode::My
+.Ve
+.IP 4. 4
+.IX Item "4."
+Now all you have to do is make.
+.Sp
+.Vb 12
+\&  $ make
+\&  cp My.pm blib/lib/Encode/My.pm
+\&  /usr/local/bin/perl /usr/local/bin/enc2xs \-Q \-O \e
+\&    \-o encode_t.c \-f encode_t.fnm
+\&  Reading myascii (myascii)
+\&  Writing compiled form
+\&  128 bytes in string tables
+\&  384 bytes (75%) saved spotting duplicates
+\&  1 bytes (0.775%) saved using substrings
+\&  ....
+\&  chmod 644 blib/arch/auto/Encode/My/My.bs
+\&  $
+.Ve
+.Sp
+The time it takes varies depending on how fast your machine is and
+how large your encoding is.  Unless you are working on something big
+like euc-tw, it won't take too long.
+.IP 5. 4
+.IX Item "5."
+You can "make install" already but you should test first.
+.Sp
+.Vb 8
+\&  $ make test
+\&  PERL_DL_NONLAZY=1 /usr/local/bin/perl \-Iblib/arch \-Iblib/lib \e
+\&    \-e \*(Aquse Test::Harness  qw(&runtests $verbose); \e
+\&    $verbose=0; runtests @ARGV;\*(Aq t/*.t
+\&  t/My....ok
+\&  All tests successful.
+\&  Files=1, Tests=2,  0 wallclock secs
+\&   ( 0.09 cusr + 0.01 csys = 0.09 CPU)
+.Ve
+.IP 6. 4
+.IX Item "6."
+If you are content with the test result, just "make install".
+.IP 7. 4
+.IX Item "7."
+If you want to add your encoding to Encode's demand-loading list
+(so you don't have to "use Encode::YourEncoding"), run
+.Sp
+.Vb 1
+\&  enc2xs \-C
+.Ve
+.Sp
+to update Encode::ConfigLocal, a module that controls local settings.
+After that, "use Encode;" is enough to load your encodings on demand.
+.SH "The Unicode Character Map"
+.IX Header "The Unicode Character Map"
+Encode uses the Unicode Character Map (UCM) format for source character
+mappings.  This format is used by IBM's ICU package and was adopted
+by Nick Ing-Simmons for use with the Encode module.  Since UCM is
+more flexible than Tcl's Encoding Map and far more user-friendly,
+this is the recommended format for Encode now.
+.PP
+A UCM file looks like this.
+.PP
+.Vb 10
+\&  #
+\&  # Comments
+\&  #
+\&  <code_set_name> "US\-ascii" # Required
+\&  <code_set_alias> "ascii"   # Optional
+\&  <mb_cur_min> 1             # Required; usually 1
+\&  <mb_cur_max> 1             # Max. # of bytes/char
+\&  <subchar> \ex3F             # Substitution char
+\&  #
+\&  CHARMAP
+\&  <U0000> \ex00 |0 # <control>
+\&  <U0001> \ex01 |0 # <control>
+\&  <U0002> \ex02 |0 # <control>
+\&  ....
+\&  <U007C> \ex7C |0 # VERTICAL LINE
+\&  <U007D> \ex7D |0 # RIGHT CURLY BRACKET
+\&  <U007E> \ex7E |0 # TILDE
+\&  <U007F> \ex7F |0 # <control>
+\&  END CHARMAP
+.Ve
+.IP \(bu 4
+Anything that follows \f(CW\*(C`#\*(C'\fR is treated as a comment.
+.IP \(bu 4
+The header section continues until a line containing the word
+CHARMAP. This section has a form of \fI<keyword> value\fR, one
+pair per line.  Strings used as values must be quoted. Barewords are
+treated as numbers.  \fI\exXX\fR represents a byte.
+.Sp
+Most of the keywords are self-explanatory. \fIsubchar\fR means
+substitution character, not subcharacter.  When you encode a Unicode
+sequence to this encoding but no matching character is found, the byte
+sequence defined here will be used.  In most cases, the value here is
+\&\ex3F; in ASCII, this is a question mark.
+.IP \(bu 4
+CHARMAP starts the character map section.  Each line has a form as
+follows:
+.Sp
+.Vb 5
+\&  <UXXXX> \exXX.. |0 # comment
+\&    ^     ^      ^
+\&    |     |      +\- Fallback flag
+\&    |     +\-\-\-\-\-\-\-\- Encoded byte sequence
+\&    +\-\-\-\-\-\-\-\-\-\-\-\-\-\- Unicode Character ID in hex
+.Ve
+.Sp
+The format is roughly the same as a header section except for the
+fallback flag: | followed by 0..3.   The meaning of the possible
+values is as follows:
+.RS 4
+.IP |0 4
+.IX Item "|0"
+Round trip safe.  A character decoded to Unicode encodes back to the
+same byte sequence.  Most characters have this flag.
+.IP |1 4
+.IX Item "|1"
+Fallback for unicode \-> encoding.  When seen, enc2xs adds this
+character for the encode map only.
+.IP |2 4
+.IX Item "|2"
+Skip sub-char mapping should there be no code point.
+.IP |3 4
+.IX Item "|3"
+Fallback for encoding \-> unicode.  When seen, enc2xs adds this
+character for the decode map only.
+.RE
+.RS 4
+.RE
+.IP \(bu 4
+And finally, END CHARMAP ends the section.
+.PP
+When you are manually creating a UCM file, you should copy ascii.ucm
+or an existing encoding which is close to yours, rather than write
+your own from scratch.
+.PP
+When you do so, make sure you leave at least \fBU0000\fR to \fBU0020\fR as
+is, unless your environment is EBCDIC.
+.PP
+\&\fBCAVEAT\fR: not all features in UCM are implemented.  For example,
+icu:state is not used.  Because of that, you need to write a Perl
+module if you want to support algorithmic encodings, notably
+the ISO\-2022 series.  Such modules include Encode::JP::2022_JP,
+Encode::KR::2022_KR, and Encode::TW::HZ.
+.SS "Coping with duplicate mappings"
+.IX Subsection "Coping with duplicate mappings"
+When you create a map, you SHOULD make your mappings round-trip safe.
+That is, \f(CWencode(\*(Aqyour\-encoding\*(Aq, decode(\*(Aqyour\-encoding\*(Aq, $data)) eq
+$data\fR holds for all characters that are marked as \f(CW\*(C`|0\*(C'\fR.  Here is
+how to make sure:
+.IP \(bu 4
+Sort your map in Unicode order.
+.IP \(bu 4
+When you have a duplicate entry, mark one of the two with '|1' or '|3'.
+.IP \(bu 4
+And make sure the '|1' or '|3' entry FOLLOWS the '|0' entry.
+.PP
+Here is an example from big5\-eten.
+.PP
+.Vb 2
+\&  <U2550> \exF9\exF9 |0
+\&  <U2550> \exA2\exA4 |3
+.Ve
+.PP
+Internally, the Encoding \-> Unicode and Unicode \-> Encoding maps look
+like this:
+.PP
+.Vb 4
+\&  E to U               U to E
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  \exF9\exF9 => U2550    U2550 => \exF9\exF9
+\&  \exA2\exA4 => U2550
+.Ve
+.PP
+So it is round-trip safe for \exF9\exF9.  But if the two lines above are
+swapped, here is what happens.
+.PP
+.Vb 4
+\&  E to U               U to E
+\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&  \exA2\exA4 => U2550    U2550 => \exF9\exF9
+\&  (\exF9\exF9 => U2550 is now overwritten!)
+.Ve
+.PP
+The Encode package comes with \fIucmlint\fR, a crude but sufficient
+utility to check the integrity of a UCM file.  Check under the
+Encode/bin directory for this.
+.PP
+When in doubt, you can use \fIucmsort\fR, yet another utility under
+the Encode/bin directory.
+.SH Bookmarks
+.IX Header "Bookmarks"
+.IP \(bu 4
+ICU Home Page 
+<http://www.icu\-project.org/>
+.IP \(bu 4
+ICU Character Mapping Tables
+<http://site.icu\-project.org/charts/charset>
+.IP \(bu 4
+ICU:Conversion Data
+<http://www.icu\-project.org/userguide/conversion\-data.html>
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+Encode,
+perlmod,
+perlpod
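
The fallback flags described in the man page above can be illustrated with a small sketch. This is not how enc2xs is implemented (enc2xs is a Perl program and its internals are not shown here); it is a minimal Python illustration, assuming only the rules stated in the text: |0 entries go into both maps, |1 entries into the encode map only, |3 entries into the decode map only, and |2 entries into neither. The names parse_charmap, ENTRY and SAMPLE, as well as the header values in SAMPLE, are hypothetical. The order-sensitive overwriting of duplicates mentioned in the text is deliberately not modelled.

```python
import re

# One CHARMAP entry: <UXXXX> \xNN[\xNN...] |flag   (a trailing comment is stripped first)
ENTRY = re.compile(r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-3])")


def parse_charmap(text):
    """Build (decode_map, encode_map) from the CHARMAP section of a UCM text.

    Hypothetical helper for illustration; it only applies the flag rules
    quoted in the man page and ignores the header section.
    """
    decode_map = {}  # byte sequence -> code point (encoding -> Unicode)
    encode_map = {}  # code point -> byte sequence (Unicode -> encoding)
    in_charmap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # anything after '#' is a comment
        if line == "CHARMAP":
            in_charmap = True
        elif line == "END CHARMAP":
            break
        elif in_charmap:
            m = ENTRY.match(line)
            if m is None:
                continue  # blank line or something this sketch does not understand
            code_point = int(m.group(1), 16)
            byte_seq = bytes.fromhex(m.group(2).replace("\\x", ""))
            flag = m.group(3)
            if flag in ("0", "3"):  # round trip, or fallback encoding -> Unicode
                decode_map[byte_seq] = code_point
            if flag in ("0", "1"):  # round trip, or fallback Unicode -> encoding
                encode_map[code_point] = byte_seq
    return decode_map, encode_map


# Hypothetical input reusing the big5-eten lines quoted in the man page.
SAMPLE = r"""
<code_set_name> "myascii"
<mb_cur_min> 1
<mb_cur_max> 2
<subchar> \x3F
CHARMAP
<U0041> \x41 |0 # LATIN CAPITAL LETTER A
<U2550> \xF9\xF9 |0
<U2550> \xA2\xA4 |3
END CHARMAP
"""

decode_map, encode_map = parse_charmap(SAMPLE)

# A byte sequence is round-trip safe if decoding and re-encoding returns it unchanged.
round_trip = {b: encode_map.get(cp) == b for b, cp in decode_map.items()}

for byte_seq, ok in round_trip.items():
    print(byte_seq.hex().upper(), "round-trip safe" if ok else "decode only")
```

With this input, both F9F9 and A2A4 decode to U+2550, but U+2550 encodes only to F9F9, so A2A4 is reported as decode only. This matches the "E to U / U to E" table in the man page for the correctly ordered case.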