Diffstat (limited to 'upstream/mageia-cauldron/man1/enc2xs.1')
-rw-r--r-- upstream/mageia-cauldron/man1/enc2xs.1 351
1 files changed, 351 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man1/enc2xs.1 b/upstream/mageia-cauldron/man1/enc2xs.1
new file mode 100644
index 00000000..8bac403c
--- /dev/null
+++ b/upstream/mageia-cauldron/man1/enc2xs.1
@@ -0,0 +1,351 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "ENC2XS 1"
+.TH ENC2XS 1 2023-12-15 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+enc2xs \-\- Perl Encode Module Generator
+.SH SYNOPSIS
+.IX Header "SYNOPSIS"
+.Vb 3
+\& enc2xs \-[options]
+\& enc2xs \-M ModName mapfiles...
+\& enc2xs \-C
+.Ve
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+\&\fIenc2xs\fR builds a Perl extension for use by Encode from either
+Unicode Character Mapping files (.ucm) or Tcl Encoding Files (.enc).
+Besides being used internally during the build process of the Encode
+module, you can use \fIenc2xs\fR to add your own encoding to perl.
+No knowledge of XS is necessary.
+.SH "Quick Guide"
+.IX Header "Quick Guide"
+If you want to know as little about Perl as possible but need to
+add a new encoding, just read this chapter and forget the rest.
+.IP 0. 4
+.IX Item "0."
+Have a .ucm file ready. You can get it from somewhere or you can write
+your own from scratch or you can grab one from the Encode distribution
+and customize it. For the UCM format, see "The Unicode Character Map" below. In the
+example below, I'll call my theoretical encoding myascii, defined
+in \fImy.ucm\fR. \f(CW\*(C`$\*(C'\fR is a shell prompt.
+.Sp
+.Vb 2
+\& $ ls \-F
+\& my.ucm
+.Ve
+.IP 1. 4
+.IX Item "1."
+Issue a command as follows:
+.Sp
+.Vb 5
+\& $ enc2xs \-M My my.ucm
+\& generating Makefile.PL
+\& generating My.pm
+\& generating README
+\& generating Changes
+.Ve
+.Sp
+Now take a look at your current directory. It should look like this.
+.Sp
+.Vb 2
+\& $ ls \-F
+\& Makefile.PL My.pm my.ucm t/
+.Ve
+.Sp
+The following files were created.
+.Sp
+.Vb 3
+\& Makefile.PL \- MakeMaker script
+\& My.pm \- Encode submodule
+\& t/My.t \- test file
+.Ve
+.RS 4
+.IP 1.1. 4
+.IX Item "1.1."
+If you want *.ucm installed together with the modules, do as follows:
+.Sp
+.Vb 3
+\& $ mkdir Encode
+\& $ mv *.ucm Encode
+\& $ enc2xs \-M My Encode/*ucm
+.Ve
+.RE
+.RS 4
+.RE
+.IP 2. 4
+.IX Item "2."
+Edit the files generated. You don't have to if you have no time AND no
+intention to give it to someone else. But it is a good idea to edit
+the pod and to add more tests.
+.IP 3. 4
+.IX Item "3."
+Now issue a command all Perl Mongers love:
+.Sp
+.Vb 2
+\& $ perl Makefile.PL
+\& Writing Makefile for Encode::My
+.Ve
+.IP 4. 4
+.IX Item "4."
+Now all you have to do is make.
+.Sp
+.Vb 12
+\& $ make
+\& cp My.pm blib/lib/Encode/My.pm
+\& /usr/local/bin/perl /usr/local/bin/enc2xs \-Q \-O \e
+\& \-o encode_t.c \-f encode_t.fnm
+\& Reading myascii (myascii)
+\& Writing compiled form
+\& 128 bytes in string tables
+\& 384 bytes (75%) saved spotting duplicates
+\& 1 bytes (0.775%) saved using substrings
+\& ....
+\& chmod 644 blib/arch/auto/Encode/My/My.bs
+\& $
+.Ve
+.Sp
+The time it takes varies depending on how fast your machine is and
+how large your encoding is. Unless you are working on something big
+like euc-tw, it won't take too long.
+.IP 5. 4
+.IX Item "5."
+You can "make install" already but you should test first.
+.Sp
+.Vb 8
+\& $ make test
+\& PERL_DL_NONLAZY=1 /usr/local/bin/perl \-Iblib/arch \-Iblib/lib \e
+\& \-e \*(Aquse Test::Harness qw(&runtests $verbose); \e
+\& $verbose=0; runtests @ARGV;\*(Aq t/*.t
+\& t/My....ok
+\& All tests successful.
+\& Files=1, Tests=2, 0 wallclock secs
+\& ( 0.09 cusr + 0.01 csys = 0.09 CPU)
+.Ve
+.IP 6. 4
+.IX Item "6."
+If you are content with the test result, just "make install".
+.IP 7. 4
+.IX Item "7."
+If you want to add your encoding to Encode's demand-loading list
+(so you don't have to "use Encode::YourEncoding"), run
+.Sp
+.Vb 1
+\& enc2xs \-C
+.Ve
+.Sp
+to update Encode::ConfigLocal, a module that controls local settings.
+After that, "use Encode;" is enough to load your encodings on demand.
+.SH "The Unicode Character Map"
+.IX Header "The Unicode Character Map"
+Encode uses the Unicode Character Map (UCM) format for source character
+mappings. This format is used by IBM's ICU package and was adopted
+by Nick Ing-Simmons for use with the Encode module. Since UCM is
+more flexible than Tcl's Encoding Map and far more user-friendly,
+this is the recommended format for Encode now.
+.PP
+A UCM file looks like this.
+.PP
+.Vb 10
+\& #
+\& # Comments
+\& #
+\& <code_set_name> "US\-ascii" # Required
+\& <code_set_alias> "ascii" # Optional
+\& <mb_cur_min> 1 # Required; usually 1
+\& <mb_cur_max> 1 # Max. # of bytes/char
+\& <subchar> \ex3F # Substitution char
+\& #
+\& CHARMAP
+\& <U0000> \ex00 |0 # <control>
+\& <U0001> \ex01 |0 # <control>
+\& <U0002> \ex02 |0 # <control>
+\& ....
+\& <U007C> \ex7C |0 # VERTICAL LINE
+\& <U007D> \ex7D |0 # RIGHT CURLY BRACKET
+\& <U007E> \ex7E |0 # TILDE
+\& <U007F> \ex7F |0 # <control>
+\& END CHARMAP
+.Ve
+.IP \(bu 4
+Anything that follows \f(CW\*(C`#\*(C'\fR is treated as a comment.
+.IP \(bu 4
+The header section continues until a line containing the word
+CHARMAP. Each header line has the form \fI<keyword> value\fR, one
+pair per line. Strings used as values must be quoted. Barewords are
+treated as numbers. \fI\exXX\fR represents a byte.
+.Sp
+Most of the keywords are self-explanatory. \fIsubchar\fR means
+substitution character, not subcharacter. When you encode a Unicode
+sequence to this encoding but no matching character is found, the byte
+sequence defined here will be used. In most cases, the value here is
+\&\ex3F; in ASCII, this is a question mark.
+.IP \(bu 4
+CHARMAP starts the character map section. Each line has the
+following form:
+.Sp
+.Vb 5
+\& <UXXXX> \exXX.. |0 # comment
+\& ^ ^ ^
+\& | | +\- Fallback flag
+\& | +\-\-\-\-\-\-\-\- Encoded byte sequence
+\& +\-\-\-\-\-\-\-\-\-\-\-\-\-\- Unicode Character ID in hex
+.Ve
+.Sp
+The format is roughly the same as a header section except for the
+fallback flag: | followed by 0..3. The meaning of the possible
+values is as follows:
+.RS 4
+.IP |0 4
+.IX Item "|0"
+Round trip safe. A character decoded to Unicode encodes back to the
+same byte sequence. Most characters have this flag.
+.IP |1 4
+.IX Item "|1"
+Fallback for Unicode \-> encoding. When seen, enc2xs adds this
+character for the encode map only.
+.IP |2 4
+.IX Item "|2"
+Skip sub-char mapping should there be no code point.
+.IP |3 4
+.IX Item "|3"
+Fallback for encoding \-> Unicode. When seen, enc2xs adds this
+character for the decode map only.
+.RE
+.RS 4
+.RE
+.IP \(bu 4
+And finally, END CHARMAP ends the section.
+.PP
+When you are manually creating a UCM file, you should copy ascii.ucm
+or an existing encoding which is close to yours, rather than write
+your own from scratch.
+.PP
+When you do so, make sure you leave at least \fBU0000\fR to \fBU0020\fR as
+is, unless your environment is EBCDIC.
+.PP
+\&\fBCAVEAT\fR: not all features in UCM are implemented. For example,
+icu:state is not used. Because of that, you need to write a Perl
+module if you want to support algorithmic encodings, notably
+the ISO\-2022 series. Such modules include Encode::JP::2022_JP,
+Encode::KR::2022_KR, and Encode::TW::HZ.
+.SS "Coping with duplicate mappings"
+.IX Subsection "Coping with duplicate mappings"
+When you create a map, you SHOULD make your mappings round-trip safe.
+That is, \f(CWencode(\*(Aqyour\-encoding\*(Aq, decode(\*(Aqyour\-encoding\*(Aq, $data)) eq
+$data\fR holds for all characters that are marked as \f(CW\*(C`|0\*(C'\fR. Here is
+how to make sure:
+.IP \(bu 4
+Sort your map in Unicode order.
+.IP \(bu 4
+When you have a duplicate entry, mark either one with '|1' or '|3'.
+.IP \(bu 4
+And make sure the '|1' or '|3' entry FOLLOWS the '|0' entry.
+.PP
+Here is an example from big5\-eten.
+.PP
+.Vb 2
+\& <U2550> \exF9\exF9 |0
+\& <U2550> \exA2\exA4 |3
+.Ve
+.PP
+Internally, the Encoding \-> Unicode and Unicode \-> Encoding maps look
+like this:
+.PP
+.Vb 4
+\& E to U U to E
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& \exF9\exF9 => U2550 U2550 => \exF9\exF9
+\& \exA2\exA4 => U2550
+.Ve
+.PP
+So it is round-trip safe for \exF9\exF9. But if the two lines above are
+swapped, here is what happens.
+.PP
+.Vb 4
+\& E to U U to E
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& \exA2\exA4 => U2550 U2550 => \exF9\exF9
+\& (\exF9\exF9 => U2550 is now overwritten!)
+.Ve
+.PP
+The Encode package comes with \fIucmlint\fR, a crude but sufficient
+utility to check the integrity of a UCM file. Check under the
+Encode/bin directory for this.
+.PP
+When in doubt, you can use \fIucmsort\fR, yet another utility under
+the Encode/bin directory.
+.SH Bookmarks
+.IX Header "Bookmarks"
+.IP \(bu 4
+ICU Home Page
+<http://www.icu\-project.org/>
+.IP \(bu 4
+ICU Character Mapping Tables
+<http://site.icu\-project.org/charts/charset>
+.IP \(bu 4
+ICU:Conversion Data
+<http://www.icu\-project.org/userguide/conversion\-data.html>
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+Encode,
+perlmod,
+perlpod
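
Note on the CHARMAP fallback flags described in the page above: the following is a minimal Python sketch of how the flags could be modelled, not a rendering of enc2xs's actual implementation. It assumes the simplified reading given in the text (|0 feeds both maps, |1 the encode map only, |3 the decode map only) and, as a further assumption, ignores |2 entries. The names parse_charmap, build_maps and round_trip_failures are hypothetical helpers, and the sketch does not reproduce the ordering sensitivity the page warns about.

```python
import re

# One CHARMAP entry with any trailing comment already stripped:
#   <UXXXX> \xXX... |flag
ENTRY = re.compile(r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-3])")


def parse_charmap(text):
    """Return a list of (code_point, encoded_bytes, flag) from the CHARMAP section."""
    entries = []
    in_charmap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # anything after '#' is a comment
        if line == "CHARMAP":
            in_charmap = True
        elif line == "END CHARMAP":
            break
        elif in_charmap and line:
            m = ENTRY.fullmatch(line)
            if m is None:
                raise ValueError(f"unrecognized CHARMAP line: {raw!r}")
            encoded = bytes.fromhex(m.group(2).replace("\\x", ""))
            entries.append((int(m.group(1), 16), encoded, int(m.group(3))))
    return entries


def build_maps(entries):
    """Build (decode_map, encode_map) under the simplified flag model."""
    decode_map = {}  # bytes -> code point (E to U)
    encode_map = {}  # code point -> bytes (U to E)
    for code_point, encoded, flag in entries:
        if flag in (0, 1):  # |0 and |1 contribute to the encode map
            encode_map[code_point] = encoded
        if flag in (0, 3):  # |0 and |3 contribute to the decode map
            decode_map[encoded] = code_point
        # |2 entries are ignored in this sketch (assumption)
    return decode_map, encode_map


def round_trip_failures(entries):
    """List the |0 entries whose bytes do not survive decode followed by encode."""
    decode_map, encode_map = build_maps(entries)
    return [
        (code_point, encoded)
        for code_point, encoded, flag in entries
        if flag == 0 and encode_map.get(decode_map.get(encoded)) != encoded
    ]


# The two big5-eten lines quoted in the page; the header line is made up.
SAMPLE = r"""
<code_set_name> "big5-eten-excerpt"
CHARMAP
<U2550> \xF9\xF9 |0
<U2550> \xA2\xA4 |3
END CHARMAP
"""

entries = parse_charmap(SAMPLE)
decode_map, encode_map = build_maps(entries)
print(round_trip_failures(entries))
```

With the |0 and |3 marking from the page, both byte sequences decode to U+2550 while U+2550 encodes only to \xF9\xF9, so no |0 entry fails. Marking both lines |0 instead makes the later line win the encode map in this model, and the earlier one is reported as a failure. For real maps, ucmlint remains the tool the page recommends.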