Adding upstream version 4.22.0.upstream/4.22.0

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit: fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch)
tree: ce1e3bce06471410239a6f41282e328770aa404a /upstream/mageia-cauldron/man1/perlebcdic.1
parent: Initial commit. (diff)
download: manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.tar.xz
manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.zip
1 files changed, 1998 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man1/perlebcdic.1 b/upstream/mageia-cauldron/man1/perlebcdic.1
new file mode 100644
index 00000000..b23c5fd5
--- /dev/null
+++ b/upstream/mageia-cauldron/man1/perlebcdic.1
@@ -0,0 +1,1998 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+.    ds C` ""
+.    ds C' ""
+'br\}
+.el\{\
+.    ds C`
+.    ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD.  Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+.    if \nF \{\
+.        de IX
+.        tm Index:\\$1\t\\n%\t"\\$2"
+..
+.        if !\nF==2 \{\
+.            nr % 0
+.            nr F 2
+.        \}
+.    \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLEBCDIC 1"
+.TH PERLEBCDIC 1 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlebcdic \- Considerations for running Perl on EBCDIC platforms
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+An exploration of some of the issues facing Perl programmers
+on EBCDIC based computers.
+.PP
+Portions of this document that are still incomplete are marked with XXX.
+.PP
+Early Perl versions worked on some EBCDIC machines, but the last known
+version that ran on EBCDIC was v5.8.7, until v5.22, when the Perl core
+again works on z/OS.  Theoretically, it could work on OS/400 or Siemens'
+BS2000  (or their successors), but this is untested.  In v5.22 and 5.24,
+not all
+the modules found on CPAN but shipped with core Perl work on z/OS.
+.PP
+If you want to use Perl on a non\-z/OS EBCDIC machine, please let us know
+at <https://github.com/Perl/perl5/issues>.
+.PP
+Writing Perl on an EBCDIC platform is really no different than writing
+on an "ASCII" one, but with different underlying numbers, as we'll see
+shortly.  You'll have to know something about those "ASCII" platforms
+because the documentation is biased and will frequently use example
+numbers that don't apply to EBCDIC.  There are also very few CPAN
+modules that are written for EBCDIC and which don't work on ASCII;
+instead the vast majority of CPAN modules are written for ASCII, and
+some may happen to work on EBCDIC, while a few have been designed to
+portably work on both.
+.PP
+If your code just uses the 52 letters A\-Z and a\-z, plus SPACE, the
+digits 0\-9, and the punctuation characters that Perl uses, plus a few
+controls that are denoted by escape sequences like \f(CW\*(C`\en\*(C'\fR and \f(CW\*(C`\et\*(C'\fR, then
+there's nothing special about using Perl, and your code may very well
+work on an ASCII machine without change.
+.PP
+But if you write code that uses \f(CW\*(C`\e005\*(C'\fR to mean a TAB or \f(CW\*(C`\exC1\*(C'\fR to mean
+an "A", or \f(CW\*(C`\exDF\*(C'\fR to mean a "ÿ" (small \f(CW"y"\fR with a diaeresis),
+then your code may well work on your EBCDIC platform, but not on an
+ASCII one.  That's fine to do if no one will ever want to run your code
+on an ASCII platform; but the bias in this document will be towards writing
+code portable between EBCDIC and ASCII systems.  Again, if every
+character you care about is easily enterable from your keyboard, you
+don't have to know anything about ASCII, but many keyboards don't easily
+allow you to directly enter, say, the character \f(CW\*(C`\exDF\*(C'\fR, so you have to
+specify it indirectly, such as by using the \f(CW"\exDF"\fR escape sequence.
+In those cases it's easiest to know something about the ASCII/Unicode
+character sets.  If you know that the small "ÿ" is \f(CW\*(C`U+00FF\*(C'\fR, then
+you can instead specify it as \f(CW"\eN{U+FF}"\fR, and have the computer
+automatically translate it to \f(CW\*(C`\exDF\*(C'\fR on your platform, and leave it as
+\&\f(CW\*(C`\exFF\*(C'\fR on ASCII ones.  Or you could specify it by name, \f(CW\*(C`\eN{LATIN
+SMALL LETTER Y WITH DIAERESIS\*(C'\fR and not have to know the  numbers.
+Either way works, but both require familiarity with Unicode.
+.SH "COMMON CHARACTER CODE SETS"
+.IX Header "COMMON CHARACTER CODE SETS"
+.SS ASCII
+.IX Subsection "ASCII"
+The American Standard Code for Information Interchange (ASCII or
+US-ASCII) is a set of
+integers running from 0 to 127 (decimal) that have standardized
+interpretations by the computers which use ASCII.  For example, 65 means
+the letter "A".
+The range 0..127 can be covered by setting various bits in a 7\-bit binary
+digit, hence the set is sometimes referred to as "7\-bit ASCII".
+ASCII was described by the American National Standards Institute
+document ANSI X3.4\-1986.  It was also described by ISO 646:1991
+(with localization for currency symbols).  The full ASCII set is
+given in the table below as the first 128 elements.
+Languages that
+can be written adequately with the characters in ASCII include
+English, Hawaiian, Indonesian, Swahili and some Native American
+languages.
+.PP
+Most non-EBCDIC character sets are supersets of ASCII.  That is the
+integers 0\-127 mean what ASCII says they mean.  But integers 128 and
+above are specific to the character set.
+.PP
+Many of these fit entirely into 8 bits, using ASCII as 0\-127, while
+specifying what 128\-255 mean, and not using anything above 255.
+Thus, these are single-byte (or octet if you prefer) character sets.
+One important one (since Unicode is a superset of it) is the ISO 8859\-1
+character set.
+.SS "ISO 8859"
+.IX Subsection "ISO 8859"
+The ISO 8859\-\fR\f(CB$n\fR\f(BI\fR\fI\fR are a collection of character code sets from the
+International Organization for Standardization (ISO), each of which adds
+characters to the ASCII set that are typically found in various
+languages, many of which are based on the Roman, or Latin, alphabet.
+Most are for European languages, but there are also ones for Arabic,
+Greek, Hebrew, and Thai.  There are good references on the web about
+all these.
+.SS "Latin 1 (ISO 8859\-1)"
+.IX Subsection "Latin 1 (ISO 8859-1)"
+A particular 8\-bit extension to ASCII that includes grave and acute
+accented Latin characters.  Languages that can employ ISO 8859\-1
+include all the languages covered by ASCII as well as Afrikaans,
+Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian,
+Portuguese, Spanish, and Swedish.  Dutch is covered albeit without
+the ij ligature.  French is covered too but without the oe ligature.
+German can use ISO 8859\-1 but must do so without German-style
+quotation marks.  This set is based on Western European extensions
+to ASCII and is commonly encountered in world wide web work.
+In IBM character code set identification terminology, ISO 8859\-1 is
+also known as CCSID 819 (or sometimes 0819 or even 00819).
+.SS EBCDIC
+.IX Subsection "EBCDIC"
+The Extended Binary Coded Decimal Interchange Code refers to a
+large collection of single\- and multi-byte coded character sets that are
+quite different from ASCII and ISO 8859\-1, and are all slightly
+different from each other; they typically run on host computers.  The
+EBCDIC encodings derive from 8\-bit byte extensions of Hollerith punched
+card encodings, which long predate ASCII.  The layout on the
+cards was such that high bits were set for the upper and lower case
+alphabetic
+characters \f(CW\*(C`[a\-z]\*(C'\fR and \f(CW\*(C`[A\-Z]\*(C'\fR, but there were gaps within each Latin
+alphabet range, visible in the table below.  These gaps can
+cause complications.
+.PP
+Some IBM EBCDIC character sets may be known by character code set
+identification numbers (CCSID numbers) or code page numbers.
+.PP
+Perl can be compiled on platforms that run any of three commonly used EBCDIC
+character sets, listed below.
+.PP
+\fIThe 13 variant characters\fR
+.IX Subsection "The 13 variant characters"
+.PP
+Among IBM EBCDIC character code sets there are 13 characters that
+are often mapped to different integer values.  Those characters
+are known as the 13 "variant" characters and are:
+.PP
+.Vb 1
+\&    \e [ ] { } ^ ~ ! # | $ @ \`
+.Ve
+.PP
+When Perl is compiled for a platform, it looks at all of these characters to
+guess which EBCDIC character set the platform uses, and adapts itself
+accordingly to that platform.  If the platform uses a character set that is not
+one of the three Perl knows about, Perl will either fail to compile, or
+mistakenly and silently choose one of the three.
+.PP
+The Line Feed (LF) character is actually a 14th variant character, and
+Perl checks for that as well.
+.PP
+\fIEBCDIC code sets recognized by Perl\fR
+.IX Subsection "EBCDIC code sets recognized by Perl"
+.IP \fB0037\fR 4
+.IX Item "0037"
+Character code set ID 0037 is a mapping of the ASCII plus Latin\-1
+characters (i.e. ISO 8859\-1) to an EBCDIC set.  0037 is used
+in North American English locales on the OS/400 operating system
+that runs on AS/400 computers.  CCSID 0037 differs from ISO 8859\-1
+in 236 places; in other words they agree on only 20 code point values.
+.IP \fB1047\fR 4
+.IX Item "1047"
+Character code set ID 1047 is also a mapping of the ASCII plus
+Latin\-1 characters (i.e. ISO 8859\-1) to an EBCDIC set.  1047 is
+used under Unix System Services for OS/390 or z/OS, and OpenEdition
+for VM/ESA.  CCSID 1047 differs from CCSID 0037 in eight places,
+and from ISO 8859\-1 in 236.
+.IP \fBPOSIX-BC\fR 4
+.IX Item "POSIX-BC"
+The EBCDIC code page in use on Siemens' BS2000 system is distinct from
+1047 and 0037.  It is identified below as the POSIX-BC set.
+Like 0037 and 1047, it is the same as ISO 8859\-1 in 20 code point
+values.
+.SS "Unicode code points versus EBCDIC code points"
+.IX Subsection "Unicode code points versus EBCDIC code points"
+In Unicode terminology a \fIcode point\fR is the number assigned to a
+character: for example, in EBCDIC the character "A" is usually assigned
+the number 193.  In Unicode, the character "A" is assigned the number 65.
+All the code points in ASCII and Latin\-1 (ISO 8859\-1) have the same
+meaning in Unicode.  All three of the recognized EBCDIC code sets have
+256 code points, and in each code set, all 256 code points are mapped to
+equivalent Latin1 code points.  Obviously, "A" will map to "A", "B" =>
+"B", "%" => "%", etc., for all printable characters in Latin1 and these
+code pages.
+.PP
+It also turns out that EBCDIC has nearly precise equivalents for the
+ASCII/Latin1 C0 controls and the DELETE control.  (The C0 controls are
+those whose ASCII code points are 0..0x1F; things like TAB, ACK, BEL,
+etc.)  A mapping is set up between these ASCII/EBCDIC controls.  There
+isn't such a precise mapping between the C1 controls on ASCII platforms
+and the remaining EBCDIC controls.  What has been done is to map these
+controls, mostly arbitrarily, to some otherwise unmatched character in
+the other character set.  Most of these are very very rarely used
+nowadays in EBCDIC anyway, and their names have been dropped, without
+much complaint.  For example the EO (Eight Ones) EBCDIC control
+(consisting of eight one bits = 0xFF) is mapped to the C1 APC control
+(0x9F), and you can't use the name "EO".
+.PP
+The EBCDIC controls provide three possible line terminator characters,
+CR (0x0D), LF (0x25), and NL (0x15).  On ASCII platforms, the symbols
+"NL" and "LF" refer to the same character, but in strict EBCDIC
+terminology they are different ones.  The EBCDIC NL is mapped to the C1
+control called "NEL" ("Next Line"; here's a case where the mapping makes
+quite a bit of sense, and hence isn't just arbitrary).  On some EBCDIC
+platforms, this NL or NEL is the typical line terminator.  This is true
+of z/OS and BS2000.  In these platforms, the C compilers will swap the
+LF and NEL code points, so that \f(CW"\en"\fR is 0x15, and refers to NL.  Perl
+does that too; you can see it in the code chart below.
+This makes things generally "just work" without you even having to be
+aware that there is a swap.
+.SS "Unicode and UTF"
+.IX Subsection "Unicode and UTF"
+UTF stands for "Unicode Transformation Format".
+UTF\-8 is an encoding of Unicode into a sequence of 8\-bit byte chunks, based on
+ASCII and Latin\-1.
+The length of a sequence required to represent a Unicode code point
+depends on the ordinal number of that code point,
+with larger numbers requiring more bytes.
+UTF-EBCDIC is like UTF\-8, but based on EBCDIC.
+They are enough alike that often, casual usage will conflate the two
+terms, and use "UTF\-8" to mean both the UTF\-8 found on ASCII platforms,
+and the UTF-EBCDIC found on EBCDIC ones.
+.PP
+You may see the term "invariant" character or code point.
+This simply means that the character has the same numeric
+value and representation when encoded in UTF\-8 (or UTF-EBCDIC) as when
+not.  (Note that this is a very different concept from "The 13 variant
+characters" mentioned above.  Careful prose will use the term "UTF\-8
+invariant" instead of just "invariant", but most often you'll see just
+"invariant".) For example, the ordinal value of "A" is 193 in most
+EBCDIC code pages, and also is 193 when encoded in UTF-EBCDIC.  All
+UTF\-8 (or UTF-EBCDIC) variant code points occupy at least two bytes when
+encoded in UTF\-8 (or UTF-EBCDIC); by definition, the UTF\-8 (or
+UTF-EBCDIC) invariant code points are exactly one byte whether encoded
+in UTF\-8 (or UTF-EBCDIC), or not.  (By now you see why people typically
+just say "UTF\-8" when they also mean "UTF-EBCDIC".  For the rest of this
+document, we'll mostly be casual about it too.)
+In ASCII UTF\-8, the code points corresponding to the lowest 128
+ordinal numbers (0 \- 127: the ASCII characters) are invariant.
+In UTF-EBCDIC, there are 160 invariant characters.
+(If you care, the EBCDIC invariants are those characters
+which have ASCII equivalents, plus those that correspond to
+the C1 controls (128 \- 159 on ASCII platforms).)
+.PP
+A string encoded in UTF-EBCDIC may be longer (very rarely shorter) than
+one encoded in UTF\-8.  Perl extends both UTF\-8 and UTF-EBCDIC so that
+they can encode code points above the Unicode maximum of U+10FFFF.  Both
+extensions are constructed to allow encoding of any code point that fits
+in a 64\-bit word.
+.PP
+UTF-EBCDIC is defined by
+Unicode Technical Report #16 <https://www.unicode.org/reports/tr16>
+(often referred to as just TR16).
+It is defined based on CCSID 1047, not allowing for the differences for
+other code pages.  This allows for easy interchange of text between
+computers running different code pages, but makes it unusable, without
+adaptation, for Perl on those other code pages.
+.PP
+The reason for this unusability is that a fundamental assumption of Perl
+is that the characters it cares about for parsing and lexical analysis
+are the same whether or not the text is in UTF\-8.  For example, Perl
+expects the character \f(CW"["\fR to have the same representation, no matter
+if the string containing it (or program text) is UTF\-8 encoded or not.
+To ensure this, Perl adapts UTF-EBCDIC to the particular code page so
+that all characters it expects to be UTF\-8 invariant are in fact UTF\-8
+invariant.  This means that text generated on a computer running one
+version of Perl's UTF-EBCDIC has to be translated to be intelligible to
+a computer running another.
+.PP
+TR16 implies a method to extend UTF-EBCDIC to encode points up through
+\&\f(CW\*(C`2\ **\ 31\ \-\ 1\*(C'\fR.  Perl uses this method for code points up through
+\&\f(CW\*(C`2\ **\ 30\ \-\ 1\*(C'\fR, but uses an incompatible method for larger ones, to
+enable it to handle much larger code points than otherwise.
+.SS "Using Encode"
+.IX Subsection "Using Encode"
+Starting from Perl 5.8 you can use the standard module Encode
+to translate from EBCDIC to Latin\-1 code points.
+Encode knows about more EBCDIC character sets than Perl can currently
+be compiled to run on.
+.PP
+.Vb 1
+\&   use Encode \*(Aqfrom_to\*(Aq;
+\&
+\&   my %ebcdic = ( 176 => \*(Aqcp37\*(Aq, 95 => \*(Aqcp1047\*(Aq, 106 => \*(Aqposix\-bc\*(Aq );
+\&
+\&   # $a is in EBCDIC code points
+\&   from_to($a, $ebcdic{ord \*(Aq^\*(Aq}, \*(Aqlatin1\*(Aq);
+\&   # $a is ISO 8859\-1 code points
+.Ve
+.PP
+and from Latin\-1 code points to EBCDIC code points
+.PP
+.Vb 1
+\&   use Encode \*(Aqfrom_to\*(Aq;
+\&
+\&   my %ebcdic = ( 176 => \*(Aqcp37\*(Aq, 95 => \*(Aqcp1047\*(Aq, 106 => \*(Aqposix\-bc\*(Aq );
+\&
+\&   # $a is ISO 8859\-1 code points
+\&   from_to($a, \*(Aqlatin1\*(Aq, $ebcdic{ord \*(Aq^\*(Aq});
+\&   # $a is in EBCDIC code points
+.Ve
+.PP
+For doing I/O it is suggested that you use the autotranslating features
+of PerlIO, see perluniintro.
+.PP
+Since version 5.8 Perl uses the PerlIO I/O library.  This enables
+you to use different encodings per IO channel.  For example you may use
+.PP
+.Vb 9
+\&    use Encode;
+\&    open($f, ">:encoding(ascii)", "test.ascii");
+\&    print $f "Hello World!\en";
+\&    open($f, ">:encoding(cp37)", "test.ebcdic");
+\&    print $f "Hello World!\en";
+\&    open($f, ">:encoding(latin1)", "test.latin1");
+\&    print $f "Hello World!\en";
+\&    open($f, ">:encoding(utf8)", "test.utf8");
+\&    print $f "Hello World!\en";
+.Ve
+.PP
+to get four files containing "Hello World!\en" in ASCII, CP 0037 EBCDIC,
+ISO 8859\-1 (Latin\-1) (in this example identical to ASCII since only ASCII
+characters were printed), and
+UTF-EBCDIC (in this example identical to normal EBCDIC since only characters
+that don't differ between EBCDIC and UTF-EBCDIC were printed).  See the
+documentation of Encode::PerlIO for details.
+.PP
+As the PerlIO layer uses raw IO (bytes) internally, all this totally
+ignores things like the type of your filesystem (ASCII or EBCDIC).
+.SH "SINGLE OCTET TABLES"
+.IX Header "SINGLE OCTET TABLES"
+The following tables list the ASCII and Latin 1 ordered sets including
+the subsets: C0 controls (0..31), ASCII graphics (32..7e), delete (7f),
+C1 controls (80..9f), and Latin\-1 (a.k.a. ISO 8859\-1) (a0..ff).  In the
+table names of the Latin 1
+extensions to ASCII have been labelled with character names roughly
+corresponding to \fIThe Unicode Standard, Version 6.1\fR albeit with
+substitutions such as \f(CW\*(C`s/LATIN//\*(C'\fR and \f(CW\*(C`s/VULGAR//\*(C'\fR in all cases;
+\&\f(CW\*(C`s/CAPITAL\ LETTER//\*(C'\fR in some cases; and
+\&\f(CW\*(C`s/SMALL\ LETTER\ ([A\-Z])/\el$1/\*(C'\fR in some other
+cases.  Controls are listed using their Unicode 6.2 abbreviations.
+The differences between the 0037 and 1047 sets are
+flagged with \f(CW\*(C`**\*(C'\fR.  The differences between the 1047 and POSIX-BC sets
+are flagged with \f(CW\*(C`##.\*(C'\fR  All \f(CWord()\fR numbers listed are decimal.  If you
+would rather see this table listing octal values, then run the table
+(that is, the pod source text of this document, since this recipe may not
+work with a pod2_other_format translation) through:
+.IP "recipe 0" 4
+.IX Item "recipe 0"
+.PP
+.Vb 3
+\&    perl \-ne \*(Aqif(/(.{29})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)/)\*(Aq \e
+\&     \-e \*(Aq{printf("%s%\-5.03o%\-5.03o%\-5.03o%.03o\en",$1,$2,$3,$4,$5)}\*(Aq \e
+\&     perlebcdic.pod
+.Ve
+.PP
+If you want to retain the UTF-x code points then in script form you
+might want to write:
+.IP "recipe 1" 4
+.IX Item "recipe 1"
+.PP
+.Vb 10
+\& open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
+\& while (<FH>) {
+\&     if (/(.{29})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\e.?(\ed*)
+\&                                                     \es+(\ed+)\e.?(\ed*)/x)
+\&     {
+\&         if ($7 ne \*(Aq\*(Aq && $9 ne \*(Aq\*(Aq) {
+\&             printf(
+\&                "%s%\-5.03o%\-5.03o%\-5.03o%\-5.03o%\-3o.%\-5o%\-3o.%.03o\en",
+\&                                            $1,$2,$3,$4,$5,$6,$7,$8,$9);
+\&         }
+\&         elsif ($7 ne \*(Aq\*(Aq) {
+\&             printf("%s%\-5.03o%\-5.03o%\-5.03o%\-5.03o%\-3o.%\-5o%.03o\en",
+\&                                           $1,$2,$3,$4,$5,$6,$7,$8);
+\&         }
+\&         else {
+\&             printf("%s%\-5.03o%\-5.03o%\-5.03o%\-5.03o%\-5.03o%.03o\en",
+\&                                                $1,$2,$3,$4,$5,$6,$8);
+\&         }
+\&     }
+\& }
+.Ve
+.PP
+If you would rather see this table listing hexadecimal values then
+run the table through:
+.IP "recipe 2" 4
+.IX Item "recipe 2"
+.PP
+.Vb 3
+\&    perl \-ne \*(Aqif(/(.{29})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)/)\*(Aq \e
+\&     \-e \*(Aq{printf("%s%\-5.02X%\-5.02X%\-5.02X%.02X\en",$1,$2,$3,$4,$5)}\*(Aq \e
+\&     perlebcdic.pod
+.Ve
+.PP
+Or, in order to retain the UTF-x code points in hexadecimal:
+.IP "recipe 3" 4
+.IX Item "recipe 3"
+.PP
+.Vb 10
+\& open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
+\& while (<FH>) {
+\&     if (/(.{29})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\e.?(\ed*)
+\&                                                     \es+(\ed+)\e.?(\ed*)/x)
+\&     {
+\&         if ($7 ne \*(Aq\*(Aq && $9 ne \*(Aq\*(Aq) {
+\&             printf(
+\&                "%s%\-5.02X%\-5.02X%\-5.02X%\-5.02X%\-2X.%\-6.02X%02X.%02X\en",
+\&                                           $1,$2,$3,$4,$5,$6,$7,$8,$9);
+\&         }
+\&         elsif ($7 ne \*(Aq\*(Aq) {
+\&             printf("%s%\-5.02X%\-5.02X%\-5.02X%\-5.02X%\-2X.%\-6.02X%02X\en",
+\&                                              $1,$2,$3,$4,$5,$6,$7,$8);
+\&         }
+\&         else {
+\&             printf("%s%\-5.02X%\-5.02X%\-5.02X%\-5.02X%\-5.02X%02X\en",
+\&                                                  $1,$2,$3,$4,$5,$6,$8);
+\&         }
+\&     }
+\& }
+\&
+\&
+\&                          ISO
+\&                         8859\-1             POS\-         CCSID
+\&                         CCSID  CCSID CCSID IX\-          1047
+\&  chr                     0819   0037 1047  BC  UTF\-8  UTF\-EBCDIC
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& <NUL>                       0    0    0    0    0        0
+\& <SOH>                       1    1    1    1    1        1
+\& <STX>                       2    2    2    2    2        2
+\& <ETX>                       3    3    3    3    3        3
+\& <EOT>                       4    55   55   55   4        55
+\& <ENQ>                       5    45   45   45   5        45
+\& <ACK>                       6    46   46   46   6        46
+\& <BEL>                       7    47   47   47   7        47
+\& <BS>                        8    22   22   22   8        22
+\& <HT>                        9    5    5    5    9        5
+\& <LF>                        10   37   21   21   10       21  **
+\& <VT>                        11   11   11   11   11       11
+\& <FF>                        12   12   12   12   12       12
+\& <CR>                        13   13   13   13   13       13
+\& <SO>                        14   14   14   14   14       14
+\& <SI>                        15   15   15   15   15       15
+\& <DLE>                       16   16   16   16   16       16
+\& <DC1>                       17   17   17   17   17       17
+\& <DC2>                       18   18   18   18   18       18
+\& <DC3>                       19   19   19   19   19       19
+\& <DC4>                       20   60   60   60   20       60
+\& <NAK>                       21   61   61   61   21       61
+\& <SYN>                       22   50   50   50   22       50
+\& <ETB>                       23   38   38   38   23       38
+\& <CAN>                       24   24   24   24   24       24
+\& <EOM>                       25   25   25   25   25       25
+\& <SUB>                       26   63   63   63   26       63
+\& <ESC>                       27   39   39   39   27       39
+\& <FS>                        28   28   28   28   28       28
+\& <GS>                        29   29   29   29   29       29
+\& <RS>                        30   30   30   30   30       30
+\& <US>                        31   31   31   31   31       31
+\& <SPACE>                     32   64   64   64   32       64
+\& !                           33   90   90   90   33       90
+\& "                           34   127  127  127  34       127
+\& #                           35   123  123  123  35       123
+\& $                           36   91   91   91   36       91
+\& %                           37   108  108  108  37       108
+\& &                           38   80   80   80   38       80
+\& \*(Aq                           39   125  125  125  39       125
+\& (                           40   77   77   77   40       77
+\& )                           41   93   93   93   41       93
+\& *                           42   92   92   92   42       92
+\& +                           43   78   78   78   43       78
+\& ,                           44   107  107  107  44       107
+\& \-                           45   96   96   96   45       96
+\& .                           46   75   75   75   46       75
+\& /                           47   97   97   97   47       97
+\& 0                           48   240  240  240  48       240
+\& 1                           49   241  241  241  49       241
+\& 2                           50   242  242  242  50       242
+\& 3                           51   243  243  243  51       243
+\& 4                           52   244  244  244  52       244
+\& 5                           53   245  245  245  53       245
+\& 6                           54   246  246  246  54       246
+\& 7                           55   247  247  247  55       247
+\& 8                           56   248  248  248  56       248
+\& 9                           57   249  249  249  57       249
+\& :                           58   122  122  122  58       122
+\& ;                           59   94   94   94   59       94
+\& <                           60   76   76   76   60       76
+\& =                           61   126  126  126  61       126
+\& >                           62   110  110  110  62       110
+\& ?                           63   111  111  111  63       111
+\& @                           64   124  124  124  64       124
+\& A                           65   193  193  193  65       193
+\& B                           66   194  194  194  66       194
+\& C                           67   195  195  195  67       195
+\& D                           68   196  196  196  68       196
+\& E                           69   197  197  197  69       197
+\& F                           70   198  198  198  70       198
+\& G                           71   199  199  199  71       199
+\& H                           72   200  200  200  72       200
+\& I                           73   201  201  201  73       201
+\& J                           74   209  209  209  74       209
+\& K                           75   210  210  210  75       210
+\& L                           76   211  211  211  76       211
+\& M                           77   212  212  212  77       212
+\& N                           78   213  213  213  78       213
+\& O                           79   214  214  214  79       214
+\& P                           80   215  215  215  80       215
+\& Q                           81   216  216  216  81       216
+\& R                           82   217  217  217  82       217
+\& S                           83   226  226  226  83       226
+\& T                           84   227  227  227  84       227
+\& U                           85   228  228  228  85       228
+\& V                           86   229  229  229  86       229
+\& W                           87   230  230  230  87       230
+\& X                           88   231  231  231  88       231
+\& Y                           89   232  232  232  89       232
+\& Z                           90   233  233  233  90       233
+\& [                           91   186  173  187  91       173  ** ##
+\& \e                           92   224  224  188  92       224  ##
+\& ]                           93   187  189  189  93       189  **
+\& ^                           94   176  95   106  94       95   ** ##
+\& _                           95   109  109  109  95       109
+\& \`                           96   121  121  74   96       121  ##
+\& a                           97   129  129  129  97       129
+\& b                           98   130  130  130  98       130
+\& c                           99   131  131  131  99       131
+\& d                           100  132  132  132  100      132
+\& e                           101  133  133  133  101      133
+\& f                           102  134  134  134  102      134
+\& g                           103  135  135  135  103      135
+\& h                           104  136  136  136  104      136
+\& i                           105  137  137  137  105      137
+\& j                           106  145  145  145  106      145
+\& k                           107  146  146  146  107      146
+\& l                           108  147  147  147  108      147
+\& m                           109  148  148  148  109      148
+\& n                           110  149  149  149  110      149
+\& o                           111  150  150  150  111      150
+\& p                           112  151  151  151  112      151
+\& q                           113  152  152  152  113      152
+\& r                           114  153  153  153  114      153
+\& s                           115  162  162  162  115      162
+\& t                           116  163  163  163  116      163
+\& u                           117  164  164  164  117      164
+\& v                           118  165  165  165  118      165
+\& w                           119  166  166  166  119      166
+\& x                           120  167  167  167  120      167
+\& y                           121  168  168  168  121      168
+\& z                           122  169  169  169  122      169
+\& {                           123  192  192  251  123      192  ##
+\& |                           124  79   79   79   124      79
+\& }                           125  208  208  253  125      208  ##
+\& ~                           126  161  161  255  126      161  ##
+\& <DEL>                       127  7    7    7    127      7
+\& <PAD>                       128  32   32   32   194.128  32
+\& <HOP>                       129  33   33   33   194.129  33
+\& <BPH>                       130  34   34   34   194.130  34
+\& <NBH>                       131  35   35   35   194.131  35
+\& <IND>                       132  36   36   36   194.132  36
+\& <NEL>                       133  21   37   37   194.133  37   **
+\& <SSA>                       134  6    6    6    194.134  6
+\& <ESA>                       135  23   23   23   194.135  23
+\& <HTS>                       136  40   40   40   194.136  40
+\& <HTJ>                       137  41   41   41   194.137  41
+\& <VTS>                       138  42   42   42   194.138  42
+\& <PLD>                       139  43   43   43   194.139  43
+\& <PLU>                       140  44   44   44   194.140  44
+\& <RI>                        141  9    9    9    194.141  9
+\& <SS2>                       142  10   10   10   194.142  10
+\& <SS3>                       143  27   27   27   194.143  27
+\& <DCS>                       144  48   48   48   194.144  48
+\& <PU1>                       145  49   49   49   194.145  49
+\& <PU2>                       146  26   26   26   194.146  26
+\& <STS>                       147  51   51   51   194.147  51
+\& <CCH>                       148  52   52   52   194.148  52
+\& <MW>                        149  53   53   53   194.149  53
+\& <SPA>                       150  54   54   54   194.150  54
+\& <EPA>                       151  8    8    8    194.151  8
+\& <SOS>                       152  56   56   56   194.152  56
+\& <SGC>                       153  57   57   57   194.153  57
+\& <SCI>                       154  58   58   58   194.154  58
+\& <CSI>                       155  59   59   59   194.155  59
+\& <ST>                        156  4    4    4    194.156  4
+\& <OSC>                       157  20   20   20   194.157  20
+\& <PM>                        158  62   62   62   194.158  62
+\& <APC>                       159  255  255  95   194.159  255      ##
+\& <NON\-BREAKING SPACE>        160  65   65   65   194.160  128.65
+\& <INVERTED "!" >             161  170  170  170  194.161  128.66
+\& <CENT SIGN>                 162  74   74   176  194.162  128.67   ##
+\& <POUND SIGN>                163  177  177  177  194.163  128.68
+\& <CURRENCY SIGN>             164  159  159  159  194.164  128.69
+\& <YEN SIGN>                  165  178  178  178  194.165  128.70
+\& <BROKEN BAR>                166  106  106  208  194.166  128.71   ##
+\& <SECTION SIGN>              167  181  181  181  194.167  128.72
+\& <DIAERESIS>                 168  189  187  121  194.168  128.73   ** ##
+\& <COPYRIGHT SIGN>            169  180  180  180  194.169  128.74
+\& <FEMININE ORDINAL>          170  154  154  154  194.170  128.81
+\& <LEFT POINTING GUILLEMET>   171  138  138  138  194.171  128.82
+\& <NOT SIGN>                  172  95   176  186  194.172  128.83   ** ##
+\& <SOFT HYPHEN>               173  202  202  202  194.173  128.84
+\& <REGISTERED TRADE MARK>     174  175  175  175  194.174  128.85
+\& <MACRON>                    175  188  188  161  194.175  128.86   ##
+\& <DEGREE SIGN>               176  144  144  144  194.176  128.87
+\& <PLUS\-OR\-MINUS SIGN>        177  143  143  143  194.177  128.88
+\& <SUPERSCRIPT TWO>           178  234  234  234  194.178  128.89
+\& <SUPERSCRIPT THREE>         179  250  250  250  194.179  128.98
+\& <ACUTE ACCENT>              180  190  190  190  194.180  128.99
+\& <MICRO SIGN>                181  160  160  160  194.181  128.100
+\& <PARAGRAPH SIGN>            182  182  182  182  194.182  128.101
+\& <MIDDLE DOT>                183  179  179  179  194.183  128.102
+\& <CEDILLA>                   184  157  157  157  194.184  128.103
+\& <SUPERSCRIPT ONE>           185  218  218  218  194.185  128.104
+\& <MASC. ORDINAL INDICATOR>   186  155  155  155  194.186  128.105
+\& <RIGHT POINTING GUILLEMET>  187  139  139  139  194.187  128.106
+\& <FRACTION ONE QUARTER>      188  183  183  183  194.188  128.112
+\& <FRACTION ONE HALF>         189  184  184  184  194.189  128.113
+\& <FRACTION THREE QUARTERS>   190  185  185  185  194.190  128.114
+\& <INVERTED QUESTION MARK>    191  171  171  171  194.191  128.115
+\& <A WITH GRAVE>              192  100  100  100  195.128  138.65
+\& <A WITH ACUTE>              193  101  101  101  195.129  138.66
+\& <A WITH CIRCUMFLEX>         194  98   98   98   195.130  138.67
+\& <A WITH TILDE>              195  102  102  102  195.131  138.68
+\& <A WITH DIAERESIS>          196  99   99   99   195.132  138.69
+\& <A WITH RING ABOVE>         197  103  103  103  195.133  138.70
+\& <CAPITAL LIGATURE AE>       198  158  158  158  195.134  138.71
+\& <C WITH CEDILLA>            199  104  104  104  195.135  138.72
+\& <E WITH GRAVE>              200  116  116  116  195.136  138.73
+\& <E WITH ACUTE>              201  113  113  113  195.137  138.74
+\& <E WITH CIRCUMFLEX>         202  114  114  114  195.138  138.81
+\& <E WITH DIAERESIS>          203  115  115  115  195.139  138.82
+\& <I WITH GRAVE>              204  120  120  120  195.140  138.83
+\& <I WITH ACUTE>              205  117  117  117  195.141  138.84
+\& <I WITH CIRCUMFLEX>         206  118  118  118  195.142  138.85
+\& <I WITH DIAERESIS>          207  119  119  119  195.143  138.86
+\& <CAPITAL LETTER ETH>        208  172  172  172  195.144  138.87
+\& <N WITH TILDE>              209  105  105  105  195.145  138.88
+\& <O WITH GRAVE>              210  237  237  237  195.146  138.89
+\& <O WITH ACUTE>              211  238  238  238  195.147  138.98
+\& <O WITH CIRCUMFLEX>         212  235  235  235  195.148  138.99
+\& <O WITH TILDE>              213  239  239  239  195.149  138.100
+\& <O WITH DIAERESIS>          214  236  236  236  195.150  138.101
+\& <MULTIPLICATION SIGN>       215  191  191  191  195.151  138.102
+\& <O WITH STROKE>             216  128  128  128  195.152  138.103
+\& <U WITH GRAVE>              217  253  253  224  195.153  138.104  ##
+\& <U WITH ACUTE>              218  254  254  254  195.154  138.105
+\& <U WITH CIRCUMFLEX>         219  251  251  221  195.155  138.106  ##
+\& <U WITH DIAERESIS>          220  252  252  252  195.156  138.112
+\& <Y WITH ACUTE>              221  173  186  173  195.157  138.113  ** ##
+\& <CAPITAL LETTER THORN>      222  174  174  174  195.158  138.114
+\& <SMALL LETTER SHARP S>      223  89   89   89   195.159  138.115
+\& <a WITH GRAVE>              224  68   68   68   195.160  139.65
+\& <a WITH ACUTE>              225  69   69   69   195.161  139.66
+\& <a WITH CIRCUMFLEX>         226  66   66   66   195.162  139.67
+\& <a WITH TILDE>              227  70   70   70   195.163  139.68
+\& <a WITH DIAERESIS>          228  67   67   67   195.164  139.69
+\& <a WITH RING ABOVE>         229  71   71   71   195.165  139.70
+\& <SMALL LIGATURE ae>         230  156  156  156  195.166  139.71
+\& <c WITH CEDILLA>            231  72   72   72   195.167  139.72
+\& <e WITH GRAVE>              232  84   84   84   195.168  139.73
+\& <e WITH ACUTE>              233  81   81   81   195.169  139.74
+\& <e WITH CIRCUMFLEX>         234  82   82   82   195.170  139.81
+\& <e WITH DIAERESIS>          235  83   83   83   195.171  139.82
+\& <i WITH GRAVE>              236  88   88   88   195.172  139.83
+\& <i WITH ACUTE>              237  85   85   85   195.173  139.84
+\& <i WITH CIRCUMFLEX>         238  86   86   86   195.174  139.85
+\& <i WITH DIAERESIS>          239  87   87   87   195.175  139.86
+\& <SMALL LETTER eth>          240  140  140  140  195.176  139.87
+\& <n WITH TILDE>              241  73   73   73   195.177  139.88
+\& <o WITH GRAVE>              242  205  205  205  195.178  139.89
+\& <o WITH ACUTE>              243  206  206  206  195.179  139.98
+\& <o WITH CIRCUMFLEX>         244  203  203  203  195.180  139.99
+\& <o WITH TILDE>              245  207  207  207  195.181  139.100
+\& <o WITH DIAERESIS>          246  204  204  204  195.182  139.101
+\& <DIVISION SIGN>             247  225  225  225  195.183  139.102
+\& <o WITH STROKE>             248  112  112  112  195.184  139.103
+\& <u WITH GRAVE>              249  221  221  192  195.185  139.104  ##
+\& <u WITH ACUTE>              250  222  222  222  195.186  139.105
+\& <u WITH CIRCUMFLEX>         251  219  219  219  195.187  139.106
+\& <u WITH DIAERESIS>          252  220  220  220  195.188  139.112
+\& <y WITH ACUTE>              253  141  141  141  195.189  139.113
+\& <SMALL LETTER thorn>        254  142  142  142  195.190  139.114
+\& <y WITH DIAERESIS>          255  223  223  223  195.191  139.115
+.Ve
+.PP
+If you would rather see the above table in CCSID 0037 order rather than
+ASCII + Latin\-1 order then run the table through:
+.IP "recipe 4" 4
+.IX Item "recipe 4"
+.PP
+.Vb 6
+\& perl \e
+\&    \-ne \*(Aqif(/.{29}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}/)\*(Aq\e
+\&     \-e \*(Aq{push(@l,$_)}\*(Aq \e
+\&     \-e \*(AqEND{print map{$_\->[0]}\*(Aq \e
+\&     \-e \*(Aq          sort{$a\->[1] <=> $b\->[1]}\*(Aq \e
+\&     \-e \*(Aq          map{[$_,substr($_,34,3)]}@l;}\*(Aq perlebcdic.pod
+.Ve
+.PP
+If you would rather see it in CCSID 1047 order then change the number
+34 in the last line to 39, like this:
+.IP "recipe 5" 4
+.IX Item "recipe 5"
+.PP
+.Vb 6
+\& perl \e
+\&    \-ne \*(Aqif(/.{29}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}/)\*(Aq\e
+\&    \-e \*(Aq{push(@l,$_)}\*(Aq \e
+\&    \-e \*(AqEND{print map{$_\->[0]}\*(Aq \e
+\&    \-e \*(Aq          sort{$a\->[1] <=> $b\->[1]}\*(Aq \e
+\&    \-e \*(Aq          map{[$_,substr($_,39,3)]}@l;}\*(Aq perlebcdic.pod
+.Ve
+.PP
+If you would rather see it in POSIX-BC order then change the number
+34 in the last line to 44, like this:
+.IP "recipe 6" 4
+.IX Item "recipe 6"
+.PP
+.Vb 6
+\& perl \e
+\&    \-ne \*(Aqif(/.{29}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}/)\*(Aq\e
+\&     \-e \*(Aq{push(@l,$_)}\*(Aq \e
+\&     \-e \*(AqEND{print map{$_\->[0]}\*(Aq \e
+\&     \-e \*(Aq          sort{$a\->[1] <=> $b\->[1]}\*(Aq \e
+\&     \-e \*(Aq          map{[$_,substr($_,44,3)]}@l;}\*(Aq perlebcdic.pod
+.Ve
+.SS "Table in hex, sorted in 1047 order"
+.IX Subsection "Table in hex, sorted in 1047 order"
+Since this document was first written, the convention has become more
+and more to use hexadecimal notation for code points.  To do this with
+the recipes and to also sort is a multi-step process, so here, for
+convenience, is the table from above, re-sorted to be in Code Page 1047
+order, and using hex notation.
+.PP
+.Vb 10
+\&                          ISO
+\&                         8859\-1             POS\-         CCSID
+\&                         CCSID  CCSID CCSID IX\-          1047
+\&  chr                     0819   0037 1047  BC  UTF\-8  UTF\-EBCDIC
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& <NUL>                       00   00   00   00   00       00
+\& <SOH>                       01   01   01   01   01       01
+\& <STX>                       02   02   02   02   02       02
+\& <ETX>                       03   03   03   03   03       03
+\& <ST>                        9C   04   04   04   C2.9C    04
+\& <HT>                        09   05   05   05   09       05
+\& <SSA>                       86   06   06   06   C2.86    06
+\& <DEL>                       7F   07   07   07   7F       07
+\& <EPA>                       97   08   08   08   C2.97    08
+\& <RI>                        8D   09   09   09   C2.8D    09
+\& <SS2>                       8E   0A   0A   0A   C2.8E    0A
+\& <VT>                        0B   0B   0B   0B   0B       0B
+\& <FF>                        0C   0C   0C   0C   0C       0C
+\& <CR>                        0D   0D   0D   0D   0D       0D
+\& <SO>                        0E   0E   0E   0E   0E       0E
+\& <SI>                        0F   0F   0F   0F   0F       0F
+\& <DLE>                       10   10   10   10   10       10
+\& <DC1>                       11   11   11   11   11       11
+\& <DC2>                       12   12   12   12   12       12
+\& <DC3>                       13   13   13   13   13       13
+\& <OSC>                       9D   14   14   14   C2.9D    14
+\& <LF>                        0A   25   15   15   0A       15    **
+\& <BS>                        08   16   16   16   08       16
+\& <ESA>                       87   17   17   17   C2.87    17
+\& <CAN>                       18   18   18   18   18       18
+\& <EOM>                       19   19   19   19   19       19
+\& <PU2>                       92   1A   1A   1A   C2.92    1A
+\& <SS3>                       8F   1B   1B   1B   C2.8F    1B
+\& <FS>                        1C   1C   1C   1C   1C       1C
+\& <GS>                        1D   1D   1D   1D   1D       1D
+\& <RS>                        1E   1E   1E   1E   1E       1E
+\& <US>                        1F   1F   1F   1F   1F       1F
+\& <PAD>                       80   20   20   20   C2.80    20
+\& <HOP>                       81   21   21   21   C2.81    21
+\& <BPH>                       82   22   22   22   C2.82    22
+\& <NBH>                       83   23   23   23   C2.83    23
+\& <IND>                       84   24   24   24   C2.84    24
+\& <NEL>                       85   15   25   25   C2.85    25     **
+\& <ETB>                       17   26   26   26   17       26
+\& <ESC>                       1B   27   27   27   1B       27
+\& <HTS>                       88   28   28   28   C2.88    28
+\& <HTJ>                       89   29   29   29   C2.89    29
+\& <VTS>                       8A   2A   2A   2A   C2.8A    2A
+\& <PLD>                       8B   2B   2B   2B   C2.8B    2B
+\& <PLU>                       8C   2C   2C   2C   C2.8C    2C
+\& <ENQ>                       05   2D   2D   2D   05       2D
+\& <ACK>                       06   2E   2E   2E   06       2E
+\& <BEL>                       07   2F   2F   2F   07       2F
+\& <DCS>                       90   30   30   30   C2.90    30
+\& <PU1>                       91   31   31   31   C2.91    31
+\& <SYN>                       16   32   32   32   16       32
+\& <STS>                       93   33   33   33   C2.93    33
+\& <CCH>                       94   34   34   34   C2.94    34
+\& <MW>                        95   35   35   35   C2.95    35
+\& <SPA>                       96   36   36   36   C2.96    36
+\& <EOT>                       04   37   37   37   04       37
+\& <SOS>                       98   38   38   38   C2.98    38
+\& <SGC>                       99   39   39   39   C2.99    39
+\& <SCI>                       9A   3A   3A   3A   C2.9A    3A
+\& <CSI>                       9B   3B   3B   3B   C2.9B    3B
+\& <DC4>                       14   3C   3C   3C   14       3C
+\& <NAK>                       15   3D   3D   3D   15       3D
+\& <PM>                        9E   3E   3E   3E   C2.9E    3E
+\& <SUB>                       1A   3F   3F   3F   1A       3F
+\& <SPACE>                     20   40   40   40   20       40
+\& <NON\-BREAKING SPACE>        A0   41   41   41   C2.A0    80.41
+\& <a WITH CIRCUMFLEX>         E2   42   42   42   C3.A2    8B.43
+\& <a WITH DIAERESIS>          E4   43   43   43   C3.A4    8B.45
+\& <a WITH GRAVE>              E0   44   44   44   C3.A0    8B.41
+\& <a WITH ACUTE>              E1   45   45   45   C3.A1    8B.42
+\& <a WITH TILDE>              E3   46   46   46   C3.A3    8B.44
+\& <a WITH RING ABOVE>         E5   47   47   47   C3.A5    8B.46
+\& <c WITH CEDILLA>            E7   48   48   48   C3.A7    8B.48
+\& <n WITH TILDE>              F1   49   49   49   C3.B1    8B.58
+\& <CENT SIGN>                 A2   4A   4A   B0   C2.A2    80.43  ##
+\& .                           2E   4B   4B   4B   2E       4B
+\& <                           3C   4C   4C   4C   3C       4C
+\& (                           28   4D   4D   4D   28       4D
+\& +                           2B   4E   4E   4E   2B       4E
+\& |                           7C   4F   4F   4F   7C       4F
+\& &                           26   50   50   50   26       50
+\& <e WITH ACUTE>              E9   51   51   51   C3.A9    8B.4A
+\& <e WITH CIRCUMFLEX>         EA   52   52   52   C3.AA    8B.51
+\& <e WITH DIAERESIS>          EB   53   53   53   C3.AB    8B.52
+\& <e WITH GRAVE>              E8   54   54   54   C3.A8    8B.49
+\& <i WITH ACUTE>              ED   55   55   55   C3.AD    8B.54
+\& <i WITH CIRCUMFLEX>         EE   56   56   56   C3.AE    8B.55
+\& <i WITH DIAERESIS>          EF   57   57   57   C3.AF    8B.56
+\& <i WITH GRAVE>              EC   58   58   58   C3.AC    8B.53
+\& <SMALL LETTER SHARP S>      DF   59   59   59   C3.9F    8A.73
+\& !                           21   5A   5A   5A   21       5A
+\& $                           24   5B   5B   5B   24       5B
+\& *                           2A   5C   5C   5C   2A       5C
+\& )                           29   5D   5D   5D   29       5D
+\& ;                           3B   5E   5E   5E   3B       5E
+\& ^                           5E   B0   5F   6A   5E       5F     ** ##
+\& \-                           2D   60   60   60   2D       60
+\& /                           2F   61   61   61   2F       61
+\& <A WITH CIRCUMFLEX>         C2   62   62   62   C3.82    8A.43
+\& <A WITH DIAERESIS>          C4   63   63   63   C3.84    8A.45
+\& <A WITH GRAVE>              C0   64   64   64   C3.80    8A.41
+\& <A WITH ACUTE>              C1   65   65   65   C3.81    8A.42
+\& <A WITH TILDE>              C3   66   66   66   C3.83    8A.44
+\& <A WITH RING ABOVE>         C5   67   67   67   C3.85    8A.46
+\& <C WITH CEDILLA>            C7   68   68   68   C3.87    8A.48
+\& <N WITH TILDE>              D1   69   69   69   C3.91    8A.58
+\& <BROKEN BAR>                A6   6A   6A   D0   C2.A6    80.47  ##
+\& ,                           2C   6B   6B   6B   2C       6B
+\& %                           25   6C   6C   6C   25       6C
+\& _                           5F   6D   6D   6D   5F       6D
+\& >                           3E   6E   6E   6E   3E       6E
+\& ?                           3F   6F   6F   6F   3F       6F
+\& <o WITH STROKE>             F8   70   70   70   C3.B8    8B.67
+\& <E WITH ACUTE>              C9   71   71   71   C3.89    8A.4A
+\& <E WITH CIRCUMFLEX>         CA   72   72   72   C3.8A    8A.51
+\& <E WITH DIAERESIS>          CB   73   73   73   C3.8B    8A.52
+\& <E WITH GRAVE>              C8   74   74   74   C3.88    8A.49
+\& <I WITH ACUTE>              CD   75   75   75   C3.8D    8A.54
+\& <I WITH CIRCUMFLEX>         CE   76   76   76   C3.8E    8A.55
+\& <I WITH DIAERESIS>          CF   77   77   77   C3.8F    8A.56
+\& <I WITH GRAVE>              CC   78   78   78   C3.8C    8A.53
+\& \`                           60   79   79   4A   60       79     ##
+\& :                           3A   7A   7A   7A   3A       7A
+\& #                           23   7B   7B   7B   23       7B
+\& @                           40   7C   7C   7C   40       7C
+\& \*(Aq                           27   7D   7D   7D   27       7D
+\& =                           3D   7E   7E   7E   3D       7E
+\& "                           22   7F   7F   7F   22       7F
+\& <O WITH STROKE>             D8   80   80   80   C3.98    8A.67
+\& a                           61   81   81   81   61       81
+\& b                           62   82   82   82   62       82
+\& c                           63   83   83   83   63       83
+\& d                           64   84   84   84   64       84
+\& e                           65   85   85   85   65       85
+\& f                           66   86   86   86   66       86
+\& g                           67   87   87   87   67       87
+\& h                           68   88   88   88   68       88
+\& i                           69   89   89   89   69       89
+\& <LEFT POINTING GUILLEMET>   AB   8A   8A   8A   C2.AB    80.52
+\& <RIGHT POINTING GUILLEMET>  BB   8B   8B   8B   C2.BB    80.6A
+\& <SMALL LETTER eth>          F0   8C   8C   8C   C3.B0    8B.57
+\& <y WITH ACUTE>              FD   8D   8D   8D   C3.BD    8B.71
+\& <SMALL LETTER thorn>        FE   8E   8E   8E   C3.BE    8B.72
+\& <PLUS\-OR\-MINUS SIGN>        B1   8F   8F   8F   C2.B1    80.58
+\& <DEGREE SIGN>               B0   90   90   90   C2.B0    80.57
+\& j                           6A   91   91   91   6A       91
+\& k                           6B   92   92   92   6B       92
+\& l                           6C   93   93   93   6C       93
+\& m                           6D   94   94   94   6D       94
+\& n                           6E   95   95   95   6E       95
+\& o                           6F   96   96   96   6F       96
+\& p                           70   97   97   97   70       97
+\& q                           71   98   98   98   71       98
+\& r                           72   99   99   99   72       99
+\& <FEMININE ORDINAL>          AA   9A   9A   9A   C2.AA    80.51
+\& <MASC. ORDINAL INDICATOR>   BA   9B   9B   9B   C2.BA    80.69
+\& <SMALL LIGATURE ae>         E6   9C   9C   9C   C3.A6    8B.47
+\& <CEDILLA>                   B8   9D   9D   9D   C2.B8    80.67
+\& <CAPITAL LIGATURE AE>       C6   9E   9E   9E   C3.86    8A.47
+\& <CURRENCY SIGN>             A4   9F   9F   9F   C2.A4    80.45
+\& <MICRO SIGN>                B5   A0   A0   A0   C2.B5    80.64
+\& ~                           7E   A1   A1   FF   7E       A1     ##
+\& s                           73   A2   A2   A2   73       A2
+\& t                           74   A3   A3   A3   74       A3
+\& u                           75   A4   A4   A4   75       A4
+\& v                           76   A5   A5   A5   76       A5
+\& w                           77   A6   A6   A6   77       A6
+\& x                           78   A7   A7   A7   78       A7
+\& y                           79   A8   A8   A8   79       A8
+\& z                           7A   A9   A9   A9   7A       A9
+\& <INVERTED "!" >             A1   AA   AA   AA   C2.A1    80.42
+\& <INVERTED QUESTION MARK>    BF   AB   AB   AB   C2.BF    80.73
+\& <CAPITAL LETTER ETH>        D0   AC   AC   AC   C3.90    8A.57
+\& [                           5B   BA   AD   BB   5B       AD     ** ##
+\& <CAPITAL LETTER THORN>      DE   AE   AE   AE   C3.9E    8A.72
+\& <REGISTERED TRADE MARK>     AE   AF   AF   AF   C2.AE    80.55
+\& <NOT SIGN>                  AC   5F   B0   BA   C2.AC    80.53  ** ##
+\& <POUND SIGN>                A3   B1   B1   B1   C2.A3    80.44
+\& <YEN SIGN>                  A5   B2   B2   B2   C2.A5    80.46
+\& <MIDDLE DOT>                B7   B3   B3   B3   C2.B7    80.66
+\& <COPYRIGHT SIGN>            A9   B4   B4   B4   C2.A9    80.4A
+\& <SECTION SIGN>              A7   B5   B5   B5   C2.A7    80.48
+\& <PARAGRAPH SIGN>            B6   B6   B6   B6   C2.B6    80.65
+\& <FRACTION ONE QUARTER>      BC   B7   B7   B7   C2.BC    80.70
+\& <FRACTION ONE HALF>         BD   B8   B8   B8   C2.BD    80.71
+\& <FRACTION THREE QUARTERS>   BE   B9   B9   B9   C2.BE    80.72
+\& <Y WITH ACUTE>              DD   AD   BA   AD   C3.9D    8A.71  ** ##
+\& <DIAERESIS>                 A8   BD   BB   79   C2.A8    80.49  ** ##
+\& <MACRON>                    AF   BC   BC   A1   C2.AF    80.56  ##
+\& ]                           5D   BB   BD   BD   5D       BD     **
+\& <ACUTE ACCENT>              B4   BE   BE   BE   C2.B4    80.63
+\& <MULTIPLICATION SIGN>       D7   BF   BF   BF   C3.97    8A.66
+\& {                           7B   C0   C0   FB   7B       C0     ##
+\& A                           41   C1   C1   C1   41       C1
+\& B                           42   C2   C2   C2   42       C2
+\& C                           43   C3   C3   C3   43       C3
+\& D                           44   C4   C4   C4   44       C4
+\& E                           45   C5   C5   C5   45       C5
+\& F                           46   C6   C6   C6   46       C6
+\& G                           47   C7   C7   C7   47       C7
+\& H                           48   C8   C8   C8   48       C8
+\& I                           49   C9   C9   C9   49       C9
+\& <SOFT HYPHEN>               AD   CA   CA   CA   C2.AD    80.54
+\& <o WITH CIRCUMFLEX>         F4   CB   CB   CB   C3.B4    8B.63
+\& <o WITH DIAERESIS>          F6   CC   CC   CC   C3.B6    8B.65
+\& <o WITH GRAVE>              F2   CD   CD   CD   C3.B2    8B.59
+\& <o WITH ACUTE>              F3   CE   CE   CE   C3.B3    8B.62
+\& <o WITH TILDE>              F5   CF   CF   CF   C3.B5    8B.64
+\& }                           7D   D0   D0   FD   7D       D0     ##
+\& J                           4A   D1   D1   D1   4A       D1
+\& K                           4B   D2   D2   D2   4B       D2
+\& L                           4C   D3   D3   D3   4C       D3
+\& M                           4D   D4   D4   D4   4D       D4
+\& N                           4E   D5   D5   D5   4E       D5
+\& O                           4F   D6   D6   D6   4F       D6
+\& P                           50   D7   D7   D7   50       D7
+\& Q                           51   D8   D8   D8   51       D8
+\& R                           52   D9   D9   D9   52       D9
+\& <SUPERSCRIPT ONE>           B9   DA   DA   DA   C2.B9    80.68
+\& <u WITH CIRCUMFLEX>         FB   DB   DB   DB   C3.BB    8B.6A
+\& <u WITH DIAERESIS>          FC   DC   DC   DC   C3.BC    8B.70
+\& <u WITH GRAVE>              F9   DD   DD   C0   C3.B9    8B.68  ##
+\& <u WITH ACUTE>              FA   DE   DE   DE   C3.BA    8B.69
+\& <y WITH DIAERESIS>          FF   DF   DF   DF   C3.BF    8B.73
+\& \e                           5C   E0   E0   BC   5C       E0     ##
+\& <DIVISION SIGN>             F7   E1   E1   E1   C3.B7    8B.66
+\& S                           53   E2   E2   E2   53       E2
+\& T                           54   E3   E3   E3   54       E3
+\& U                           55   E4   E4   E4   55       E4
+\& V                           56   E5   E5   E5   56       E5
+\& W                           57   E6   E6   E6   57       E6
+\& X                           58   E7   E7   E7   58       E7
+\& Y                           59   E8   E8   E8   59       E8
+\& Z                           5A   E9   E9   E9   5A       E9
+\& <SUPERSCRIPT TWO>           B2   EA   EA   EA   C2.B2    80.59
+\& <O WITH CIRCUMFLEX>         D4   EB   EB   EB   C3.94    8A.63
+\& <O WITH DIAERESIS>          D6   EC   EC   EC   C3.96    8A.65
+\& <O WITH GRAVE>              D2   ED   ED   ED   C3.92    8A.59
+\& <O WITH ACUTE>              D3   EE   EE   EE   C3.93    8A.62
+\& <O WITH TILDE>              D5   EF   EF   EF   C3.95    8A.64
+\& 0                           30   F0   F0   F0   30       F0
+\& 1                           31   F1   F1   F1   31       F1
+\& 2                           32   F2   F2   F2   32       F2
+\& 3                           33   F3   F3   F3   33       F3
+\& 4                           34   F4   F4   F4   34       F4
+\& 5                           35   F5   F5   F5   35       F5
+\& 6                           36   F6   F6   F6   36       F6
+\& 7                           37   F7   F7   F7   37       F7
+\& 8                           38   F8   F8   F8   38       F8
+\& 9                           39   F9   F9   F9   39       F9
+\& <SUPERSCRIPT THREE>         B3   FA   FA   FA   C2.B3    80.62
+\& <U WITH CIRCUMFLEX>         DB   FB   FB   DD   C3.9B    8A.6A  ##
+\& <U WITH DIAERESIS>          DC   FC   FC   FC   C3.9C    8A.70
+\& <U WITH GRAVE>              D9   FD   FD   E0   C3.99    8A.68  ##
+\& <U WITH ACUTE>              DA   FE   FE   FE   C3.9A    8A.69
+\& <APC>                       9F   FF   FF   5F   C2.9F    FF     ##
+.Ve
+.SH "IDENTIFYING CHARACTER CODE SETS"
+.IX Header "IDENTIFYING CHARACTER CODE SETS"
+It is possible to determine which character set you are operating under.
+But first you need to be really really sure you need to do this.  Your
+code will be simpler and probably just as portable if you don't have
+to test the character set and do different things, depending.  There are
+actually only very few circumstances where it's not easy to write
+straight-line code portable to all character sets.  See
+"Unicode and EBCDIC" in perluniintro for how to portably specify
+characters.
+.PP
+But there are some cases where you may want to know which character set
+you are running under.  One possible example is doing
+sorting in inner loops where performance is critical.
+.PP
+To determine if you are running under ASCII or EBCDIC, you can use the
+return value of \f(CWord()\fR or \f(CWchr()\fR to test one or more character
+values.  For example:
+.PP
+.Vb 4
+\&    $is_ascii  = "A" eq chr(65);
+\&    $is_ebcdic = "A" eq chr(193);
+\&    $is_ascii  = ord("A") == 65;
+\&    $is_ebcdic = ord("A") == 193;
+.Ve
+.PP
+There's even less need to distinguish between EBCDIC code pages, but to
+do so try looking at one or more of the characters that differ between
+them.
+.PP
+.Vb 4
+\&    $is_ascii           = ord(\*(Aq[\*(Aq) == 91;
+\&    $is_ebcdic_37       = ord(\*(Aq[\*(Aq) == 186;
+\&    $is_ebcdic_1047     = ord(\*(Aq[\*(Aq) == 173;
+\&    $is_ebcdic_POSIX_BC = ord(\*(Aq[\*(Aq) == 187;
+.Ve
+.PP
+However, it would be unwise to write tests such as:
+.PP
+.Vb 2
+\&    $is_ascii = "\er" ne chr(13);  #  WRONG
+\&    $is_ascii = "\en" ne chr(10);  #  ILL ADVISED
+.Ve
+.PP
+Obviously the first of these will fail to distinguish most ASCII
+platforms from either a CCSID 0037, a 1047, or a POSIX-BC EBCDIC
+platform since \f(CW\*(C`"\er"\ eq\ chr(13)\*(C'\fR under all of those coded character
+sets.  But note too that because \f(CW"\en"\fR is \f(CWchr(13)\fR and \f(CW"\er"\fR is
+\&\f(CWchr(10)\fR on old Macintosh (which is an ASCII platform) the second
+\&\f(CW$is_ascii\fR test will lead to trouble there.
+.PP
+To determine whether or not perl was built under an EBCDIC
+code page you can use the Config module like so:
+.PP
+.Vb 2
+\&    use Config;
+\&    $is_ebcdic = $Config{\*(Aqebcdic\*(Aq} eq \*(Aqdefine\*(Aq;
+.Ve
+.SH CONVERSIONS
+.IX Header "CONVERSIONS"
+.ie n .SS "utf8::unicode_to_native() and utf8::native_to_unicode()"
+.el .SS "\f(CWutf8::unicode_to_native()\fP and \f(CWutf8::native_to_unicode()\fP"
+.IX Subsection "utf8::unicode_to_native() and utf8::native_to_unicode()"
+These functions take an input numeric code point in one encoding and
+return what its equivalent value is in the other.
+.PP
+See utf8.
+.SS tr///
+.IX Subsection "tr///"
+In order to convert a string of characters from one character set to
+another a simple list of numbers, such as in the right columns in the
+above table, along with Perl's \f(CW\*(C`tr///\*(C'\fR operator is all that is needed.
+The data in the table are in ASCII/Latin1 order, hence the EBCDIC columns
+provide easy-to-use ASCII/Latin1 to EBCDIC operations that are also easily
+reversed.
+.PP
+For example, to convert ASCII/Latin1 to code page 037 take the output of the
+second numbers column from the output of recipe 2 (modified to add
+\&\f(CW"\e"\fR characters), and use it in \f(CW\*(C`tr///\*(C'\fR like so:
+.PP
+.Vb 10
+\&    $cp_037 =
+\&    \*(Aq\ex00\ex01\ex02\ex03\ex37\ex2D\ex2E\ex2F\ex16\ex05\ex25\ex0B\ex0C\ex0D\ex0E\ex0F\*(Aq .
+\&    \*(Aq\ex10\ex11\ex12\ex13\ex3C\ex3D\ex32\ex26\ex18\ex19\ex3F\ex27\ex1C\ex1D\ex1E\ex1F\*(Aq .
+\&    \*(Aq\ex40\ex5A\ex7F\ex7B\ex5B\ex6C\ex50\ex7D\ex4D\ex5D\ex5C\ex4E\ex6B\ex60\ex4B\ex61\*(Aq .
+\&    \*(Aq\exF0\exF1\exF2\exF3\exF4\exF5\exF6\exF7\exF8\exF9\ex7A\ex5E\ex4C\ex7E\ex6E\ex6F\*(Aq .
+\&    \*(Aq\ex7C\exC1\exC2\exC3\exC4\exC5\exC6\exC7\exC8\exC9\exD1\exD2\exD3\exD4\exD5\exD6\*(Aq .
+\&    \*(Aq\exD7\exD8\exD9\exE2\exE3\exE4\exE5\exE6\exE7\exE8\exE9\exBA\exE0\exBB\exB0\ex6D\*(Aq .
+\&    \*(Aq\ex79\ex81\ex82\ex83\ex84\ex85\ex86\ex87\ex88\ex89\ex91\ex92\ex93\ex94\ex95\ex96\*(Aq .
+\&    \*(Aq\ex97\ex98\ex99\exA2\exA3\exA4\exA5\exA6\exA7\exA8\exA9\exC0\ex4F\exD0\exA1\ex07\*(Aq .
+\&    \*(Aq\ex20\ex21\ex22\ex23\ex24\ex15\ex06\ex17\ex28\ex29\ex2A\ex2B\ex2C\ex09\ex0A\ex1B\*(Aq .
+\&    \*(Aq\ex30\ex31\ex1A\ex33\ex34\ex35\ex36\ex08\ex38\ex39\ex3A\ex3B\ex04\ex14\ex3E\exFF\*(Aq .
+\&    \*(Aq\ex41\exAA\ex4A\exB1\ex9F\exB2\ex6A\exB5\exBD\exB4\ex9A\ex8A\ex5F\exCA\exAF\exBC\*(Aq .
+\&    \*(Aq\ex90\ex8F\exEA\exFA\exBE\exA0\exB6\exB3\ex9D\exDA\ex9B\ex8B\exB7\exB8\exB9\exAB\*(Aq .
+\&    \*(Aq\ex64\ex65\ex62\ex66\ex63\ex67\ex9E\ex68\ex74\ex71\ex72\ex73\ex78\ex75\ex76\ex77\*(Aq .
+\&    \*(Aq\exAC\ex69\exED\exEE\exEB\exEF\exEC\exBF\ex80\exFD\exFE\exFB\exFC\exAD\exAE\ex59\*(Aq .
+\&    \*(Aq\ex44\ex45\ex42\ex46\ex43\ex47\ex9C\ex48\ex54\ex51\ex52\ex53\ex58\ex55\ex56\ex57\*(Aq .
+\&    \*(Aq\ex8C\ex49\exCD\exCE\exCB\exCF\exCC\exE1\ex70\exDD\exDE\exDB\exDC\ex8D\ex8E\exDF\*(Aq;
+\&
+\&    my $ebcdic_string = $ascii_string;
+\&    eval \*(Aq$ebcdic_string =~ tr/\e000\-\e377/\*(Aq . $cp_037 . \*(Aq/\*(Aq;
+.Ve
+.PP
+To convert from EBCDIC 037 to ASCII just reverse the order of the tr///
+arguments like so:
+.PP
+.Vb 2
+\&    my $ascii_string = $ebcdic_string;
+\&    eval \*(Aq$ascii_string =~ tr/\*(Aq . $cp_037 . \*(Aq/\e000\-\e377/\*(Aq;
+.Ve
+.PP
+Similarly one could take the output of the third numbers column from recipe 2
+to obtain a \f(CW$cp_1047\fR table.  The fourth numbers column of the output from
+recipe 2 could provide a \f(CW$cp_posix_bc\fR table suitable for transcoding as
+well.
+.PP
+If you wanted to see the inverse tables, you would first have to sort on the
+desired numbers column as in recipes 4, 5 or 6, then take the output of the
+first numbers column.
+.SS iconv
+.IX Subsection "iconv"
+XPG operability often implies the presence of an \fIiconv\fR utility
+available from the shell or from the C library.  Consult your system's
+documentation for information on iconv.
+.PP
+On OS/390 or z/OS see the \fBiconv\fR\|(1) manpage.  One way to invoke the \f(CW\*(C`iconv\*(C'\fR
+shell utility from within perl would be to:
+.PP
+.Vb 2
+\&    # OS/390 or z/OS example
+\&    $ascii_data = \`echo \*(Aq$ebcdic_data\*(Aq| iconv \-f IBM\-1047 \-t ISO8859\-1\`
+.Ve
+.PP
+or the inverse map:
+.PP
+.Vb 2
+\&    # OS/390 or z/OS example
+\&    $ebcdic_data = \`echo \*(Aq$ascii_data\*(Aq| iconv \-f ISO8859\-1 \-t IBM\-1047\`
+.Ve
+.PP
+For other Perl-based conversion options see the \f(CW\*(C`Convert::*\*(C'\fR modules on CPAN.
+.SS "C RTL"
+.IX Subsection "C RTL"
+The OS/390 and z/OS C run-time libraries provide \f(CW_atoe()\fR and \f(CW_etoa()\fR functions.
+.SH "OPERATOR DIFFERENCES"
+.IX Header "OPERATOR DIFFERENCES"
+The \f(CW\*(C`..\*(C'\fR range operator treats certain character ranges with
+care on EBCDIC platforms.  For example the following array
+will have twenty six elements on either an EBCDIC platform
+or an ASCII platform:
+.PP
+.Vb 1
+\&    @alphabet = (\*(AqA\*(Aq..\*(AqZ\*(Aq);   #  $#alphabet == 25
+.Ve
+.PP
+The bitwise operators such as & ^ | may return different results
+when operating on string or character data in a Perl program running
+on an EBCDIC platform than when run on an ASCII platform.  Here is
+an example adapted from the one in perlop:
+.PP
+.Vb 5
+\&    # EBCDIC\-based examples
+\&    print "j p \en" ^ " a h";                      # prints "JAPH\en"
+\&    print "JA" | "  ph\en";                        # prints "japh\en"
+\&    print "JAPH\enJunk" & "\e277\e277\e277\e277\e277";  # prints "japh\en";
+\&    print \*(Aqp N$\*(Aq ^ " E<H\en";                      # prints "Perl\en";
+.Ve
+.PP
+An interesting property of the 32 C0 control characters
+in the ASCII table is that they can "literally" be constructed
+as control characters in Perl, e.g. \f(CW\*(C`(chr(0)\*(C'\fR eq \f(CW\*(C`\ec@\*(C'\fR)>
+\&\f(CW\*(C`(chr(1)\*(C'\fR eq \f(CW\*(C`\ecA\*(C'\fR)>, and so on.  Perl on EBCDIC platforms has been
+ported to take \f(CW\*(C`\ec@\*(C'\fR to \f(CWchr(0)\fR and \f(CW\*(C`\ecA\*(C'\fR to \f(CWchr(1)\fR, etc. as well, but the
+characters that result depend on which code page you are
+using.  The table below uses the standard acronyms for the controls.
+The POSIX-BC and 1047 sets are
+identical throughout this range and differ from the 0037 set at only
+one spot (21 decimal).  Note that the line terminator character
+may be generated by \f(CW\*(C`\ecJ\*(C'\fR on ASCII platforms but by \f(CW\*(C`\ecU\*(C'\fR on 1047 or POSIX-BC
+platforms and cannot be generated as a \f(CW"\ec.letter."\fR control character on
+0037 platforms.  Note also that \f(CW\*(C`\ec\e\*(C'\fR cannot be the final element in a string
+or regex, as it will absorb the terminator.   But \f(CW\*(C`\ec\e\fR\f(CIX\fR\f(CW\*(C'\fR is a \f(CW\*(C`FILE
+SEPARATOR\*(C'\fR concatenated with \fIX\fR for all \fIX\fR.
+The outlier \f(CW\*(C`\ec?\*(C'\fR on ASCII, which yields a non\-C0 control \f(CW\*(C`DEL\*(C'\fR,
+yields the outlier control \f(CW\*(C`APC\*(C'\fR on EBCDIC, the one that isn't in the
+block of contiguous controls.  Note that a subtlety of this is that
+\&\f(CW\*(C`\ec?\*(C'\fR on ASCII platforms is an ASCII character, while it isn't
+equivalent to any ASCII character in EBCDIC platforms.
+.PP
+.Vb 10
+\& chr   ord   8859\-1    0037    1047 && POSIX\-BC
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& \ec@     0   <NUL>     <NUL>        <NUL>
+\& \ecA     1   <SOH>     <SOH>        <SOH>
+\& \ecB     2   <STX>     <STX>        <STX>
+\& \ecC     3   <ETX>     <ETX>        <ETX>
+\& \ecD     4   <EOT>     <ST>         <ST>
+\& \ecE     5   <ENQ>     <HT>         <HT>
+\& \ecF     6   <ACK>     <SSA>        <SSA>
+\& \ecG     7   <BEL>     <DEL>        <DEL>
+\& \ecH     8   <BS>      <EPA>        <EPA>
+\& \ecI     9   <HT>      <RI>         <RI>
+\& \ecJ    10   <LF>      <SS2>        <SS2>
+\& \ecK    11   <VT>      <VT>         <VT>
+\& \ecL    12   <FF>      <FF>         <FF>
+\& \ecM    13   <CR>      <CR>         <CR>
+\& \ecN    14   <SO>      <SO>         <SO>
+\& \ecO    15   <SI>      <SI>         <SI>
+\& \ecP    16   <DLE>     <DLE>        <DLE>
+\& \ecQ    17   <DC1>     <DC1>        <DC1>
+\& \ecR    18   <DC2>     <DC2>        <DC2>
+\& \ecS    19   <DC3>     <DC3>        <DC3>
+\& \ecT    20   <DC4>     <OSC>        <OSC>
+\& \ecU    21   <NAK>     <NEL>        <LF>              **
+\& \ecV    22   <SYN>     <BS>         <BS>
+\& \ecW    23   <ETB>     <ESA>        <ESA>
+\& \ecX    24   <CAN>     <CAN>        <CAN>
+\& \ecY    25   <EOM>     <EOM>        <EOM>
+\& \ecZ    26   <SUB>     <PU2>        <PU2>
+\& \ec[    27   <ESC>     <SS3>        <SS3>
+\& \ec\eX   28   <FS>X     <FS>X        <FS>X
+\& \ec]    29   <GS>      <GS>         <GS>
+\& \ec^    30   <RS>      <RS>         <RS>
+\& \ec_    31   <US>      <US>         <US>
+\& \ec?    *    <DEL>     <APC>        <APC>
+.Ve
+.PP
+\&\f(CW\*(C`*\*(C'\fR Note: \f(CW\*(C`\ec?\*(C'\fR maps to ordinal 127 (\f(CW\*(C`DEL\*(C'\fR) on ASCII platforms, but
+since ordinal 127 is a not a control character on EBCDIC machines,
+\&\f(CW\*(C`\ec?\*(C'\fR instead maps on them to \f(CW\*(C`APC\*(C'\fR, which is 255 in 0037 and 1047,
+and 95 in POSIX-BC.
+.SH "FUNCTION DIFFERENCES"
+.IX Header "FUNCTION DIFFERENCES"
+.ie n .IP chr() 8
+.el .IP \f(CWchr()\fR 8
+.IX Item "chr()"
+\&\f(CWchr()\fR must be given an EBCDIC code number argument to yield a desired
+character return value on an EBCDIC platform.  For example:
+.Sp
+.Vb 1
+\&    $CAPITAL_LETTER_A = chr(193);
+.Ve
+.ie n .IP ord() 8
+.el .IP \f(CWord()\fR 8
+.IX Item "ord()"
+\&\f(CWord()\fR will return EBCDIC code number values on an EBCDIC platform.
+For example:
+.Sp
+.Vb 1
+\&    $the_number_193 = ord("A");
+.Ve
+.ie n .IP pack() 8
+.el .IP \f(CWpack()\fR 8
+.IX Item "pack()"
+The \f(CW"c"\fR and \f(CW"C"\fR templates for \f(CWpack()\fR are dependent upon character set
+encoding.  Examples of usage on EBCDIC include:
+.Sp
+.Vb 4
+\&    $foo = pack("CCCC",193,194,195,196);
+\&    # $foo eq "ABCD"
+\&    $foo = pack("C4",193,194,195,196);
+\&    # same thing
+\&
+\&    $foo = pack("ccxxcc",193,194,195,196);
+\&    # $foo eq "AB\e0\e0CD"
+.Ve
+.Sp
+The \f(CW"U"\fR template has been ported to mean "Unicode" on all platforms so
+that
+.Sp
+.Vb 1
+\&    pack("U", 65) eq \*(AqA\*(Aq
+.Ve
+.Sp
+is true on all platforms.  If you want native code points for the low
+256, use the \f(CW"W"\fR template.  This means that the equivalences
+.Sp
+.Vb 2
+\&    pack("W", ord($character)) eq $character
+\&    unpack("W", $character) == ord $character
+.Ve
+.Sp
+will hold.
+.ie n .IP print() 8
+.el .IP \f(CWprint()\fR 8
+.IX Item "print()"
+One must be careful with scalars and strings that are passed to
+print that contain ASCII encodings.  One common place
+for this to occur is in the output of the MIME type header for
+CGI script writing.  For example, many Perl programming guides
+recommend something similar to:
+.Sp
+.Vb 2
+\&    print "Content\-type:\ettext/html\e015\e012\e015\e012";
+\&    # this may be wrong on EBCDIC
+.Ve
+.Sp
+You can instead write
+.Sp
+.Vb 1
+\&    print "Content\-type:\ettext/html\er\en\er\en"; # OK for DGW et al
+.Ve
+.Sp
+and have it work portably.
+.Sp
+That is because the translation from EBCDIC to ASCII is done
+by the web server in this case.  Consult your web server's documentation for
+further details.
+.ie n .IP printf() 8
+.el .IP \f(CWprintf()\fR 8
+.IX Item "printf()"
+The formats that can convert characters to numbers and vice versa
+will be different from their ASCII counterparts when executed
+on an EBCDIC platform.  Examples include:
+.Sp
+.Vb 1
+\&    printf("%c%c%c",193,194,195);  # prints ABC
+.Ve
+.ie n .IP sort() 8
+.el .IP \f(CWsort()\fR 8
+.IX Item "sort()"
+EBCDIC sort results may differ from ASCII sort results especially for
+mixed case strings.  This is discussed in more detail below.
+.ie n .IP sprintf() 8
+.el .IP \f(CWsprintf()\fR 8
+.IX Item "sprintf()"
+See the discussion of \f(CW"printf()"\fR above.  An example of the use
+of sprintf would be:
+.Sp
+.Vb 1
+\&    $CAPITAL_LETTER_A = sprintf("%c",193);
+.Ve
+.ie n .IP unpack() 8
+.el .IP \f(CWunpack()\fR 8
+.IX Item "unpack()"
+See the discussion of \f(CW"pack()"\fR above.
+.PP
+Note that it is possible to write portable code for these by specifying
+things in Unicode numbers, and using a conversion function:
+.PP
+.Vb 3
+\&    printf("%c",utf8::unicode_to_native(65));  # prints A on all
+\&                                               # platforms
+\&    print utf8::native_to_unicode(ord("A"));   # Likewise, prints 65
+.Ve
+.PP
+See "Unicode and EBCDIC" in perluniintro and "CONVERSIONS"
+for other options.
+.SH "REGULAR EXPRESSION DIFFERENCES"
+.IX Header "REGULAR EXPRESSION DIFFERENCES"
+You can write your regular expressions just like someone on an ASCII
+platform would do.  But keep in mind that using octal or hex notation to
+specify a particular code point will give you the character that the
+EBCDIC code page natively maps to it.   (This is also true of all
+double-quoted strings.)  If you want to write portably, just use the
+\&\f(CW\*(C`\eN{U+...}\*(C'\fR notation everywhere where you would have used \f(CW\*(C`\ex{...}\*(C'\fR,
+and don't use octal notation at all.
+.PP
+Starting in Perl v5.22, this applies to ranges in bracketed character
+classes.  If you say, for example, \f(CW\*(C`qr/[\eN{U+20}\-\eN{U+7F}]/\*(C'\fR, it means
+the characters \f(CW\*(C`\eN{U+20}\*(C'\fR, \f(CW\*(C`\eN{U+21}\*(C'\fR, ..., \f(CW\*(C`\eN{U+7F}\*(C'\fR.  This range
+is all the printable characters that the ASCII character set contains.
+.PP
+Prior to v5.22, you couldn't specify any ranges portably, except
+(starting in Perl v5.5.3) all subsets of the \f(CW\*(C`[A\-Z]\*(C'\fR and \f(CW\*(C`[a\-z]\*(C'\fR
+ranges are specially coded to not pick up gap characters.  For example,
+characters such as "ô" (\f(CW\*(C`o WITH CIRCUMFLEX\*(C'\fR) that lie between
+"I" and "J" would not be matched by the regular expression range
+\&\f(CW\*(C`/[H\-K]/\*(C'\fR.  But if either of the range end points is explicitly numeric
+(and neither is specified by \f(CW\*(C`\eN{U+...}\*(C'\fR), the gap characters are
+matched:
+.PP
+.Vb 1
+\&    /[\ex89\-\ex91]/
+.Ve
+.PP
+will match \f(CW\*(C`\ex8e\*(C'\fR, even though \f(CW\*(C`\ex89\*(C'\fR is "i" and \f(CW\*(C`\ex91 \*(C'\fR is "j",
+and \f(CW\*(C`\ex8e\*(C'\fR is a gap character, from the alphabetic viewpoint.
+.PP
+Another construct to be wary of is the inappropriate use of hex (unless
+you use \f(CW\*(C`\eN{U+...}\*(C'\fR) or
+octal constants in regular expressions.  Consider the following
+set of subs:
+.PP
+.Vb 4
+\&    sub is_c0 {
+\&        my $char = substr(shift,0,1);
+\&        $char =~ /[\e000\-\e037]/;
+\&    }
+\&
+\&    sub is_print_ascii {
+\&        my $char = substr(shift,0,1);
+\&        $char =~ /[\e040\-\e176]/;
+\&    }
+\&
+\&    sub is_delete {
+\&        my $char = substr(shift,0,1);
+\&        $char eq "\e177";
+\&    }
+\&
+\&    sub is_c1 {
+\&        my $char = substr(shift,0,1);
+\&        $char =~ /[\e200\-\e237]/;
+\&    }
+\&
+\&    sub is_latin_1 {    # But not ASCII; not C1
+\&        my $char = substr(shift,0,1);
+\&        $char =~ /[\e240\-\e377]/;
+\&    }
+.Ve
+.PP
+These are valid only on ASCII platforms.  Starting in Perl v5.22, simply
+changing the octal constants to equivalent \f(CW\*(C`\eN{U+...}\*(C'\fR values makes
+them portable:
+.PP
+.Vb 4
+\&    sub is_c0 {
+\&        my $char = substr(shift,0,1);
+\&        $char =~ /[\eN{U+00}\-\eN{U+1F}]/;
+\&    }
+\&
+\&    sub is_print_ascii {
+\&        my $char = substr(shift,0,1);
+\&        $char =~ /[\eN{U+20}\-\eN{U+7E}]/;
+\&    }
+\&
+\&    sub is_delete {
+\&        my $char = substr(shift,0,1);
+\&        $char eq "\eN{U+7F}";
+\&    }
+\&
+\&    sub is_c1 {
+\&        my $char = substr(shift,0,1);
+\&        $char =~ /[\eN{U+80}\-\eN{U+9F}]/;
+\&    }
+\&
+\&    sub is_latin_1 {    # But not ASCII; not C1
+\&        my $char = substr(shift,0,1);
+\&        $char =~ /[\eN{U+A0}\-\eN{U+FF}]/;
+\&    }
+.Ve
+.PP
+And here are some alternative portable ways to write them:
+.PP
+.Vb 3
+\&    sub Is_c0 {
+\&        my $char = substr(shift,0,1);
+\&        return $char =~ /[[:cntrl:]]/a && ! Is_delete($char);
+\&
+\&        # Alternatively:
+\&        # return $char =~ /[[:cntrl:]]/
+\&        #        && $char =~ /[[:ascii:]]/
+\&        #        && ! Is_delete($char);
+\&    }
+\&
+\&    sub Is_print_ascii {
+\&        my $char = substr(shift,0,1);
+\&
+\&        return $char =~ /[[:print:]]/a;
+\&
+\&        # Alternatively:
+\&        # return $char =~ /[[:print:]]/ && $char =~ /[[:ascii:]]/;
+\&
+\&        # Or
+\&        # return $char
+\&        #      =~ /[ !"\e#\e$%&\*(Aq()*+,\e\-.\e/0\-9:;<=>?\e@A\-Z[\e\e\e]^_\`a\-z{|}~]/;
+\&    }
+\&
+\&    sub Is_delete {
+\&        my $char = substr(shift,0,1);
+\&        return utf8::native_to_unicode(ord $char) == 0x7F;
+\&    }
+\&
+\&    sub Is_c1 {
+\&        use feature \*(Aqunicode_strings\*(Aq;
+\&        my $char = substr(shift,0,1);
+\&        return $char =~ /[[:cntrl:]]/ && $char !~ /[[:ascii:]]/;
+\&    }
+\&
+\&    sub Is_latin_1 {    # But not ASCII; not C1
+\&        use feature \*(Aqunicode_strings\*(Aq;
+\&        my $char = substr(shift,0,1);
+\&        return ord($char) < 256
+\&               && $char !~ /[[:ascii:]]/
+\&               && $char !~ /[[:cntrl:]]/;
+\&    }
+.Ve
+.PP
+Another way to write \f(CWIs_latin_1()\fR would be
+to use the characters in the range explicitly:
+.PP
+.Vb 5
+\&    sub Is_latin_1 {
+\&        my $char = substr(shift,0,1);
+\&        $char =~ /[\ ¡¢£¤¥¦§¨©ª«¬\%®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ]
+\&                  [ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]/x;
+\&    }
+.Ve
+.PP
+Although that form may run into trouble in network transit (due to the
+presence of 8 bit characters) or on non ISO-Latin character sets.  But
+it does allow \f(CW\*(C`Is_c1\*(C'\fR to be rewritten so it works on Perls that don't
+have \f(CW\*(Aqunicode_strings\*(Aq\fR (earlier than v5.14):
+.PP
+.Vb 6
+\&    sub Is_latin_1 {    # But not ASCII; not C1
+\&        my $char = substr(shift,0,1);
+\&        return ord($char) < 256
+\&               && $char !~ /[[:ascii:]]/
+\&               && ! Is_latin1($char);
+\&    }
+.Ve
+.SH SOCKETS
+.IX Header "SOCKETS"
+Most socket programming assumes ASCII character encodings in network
+byte order.  Exceptions can include CGI script writing under a
+host web server where the server may take care of translation for you.
+Most host web servers convert EBCDIC data to ISO\-8859\-1 or Unicode on
+output.
+.SH SORTING
+.IX Header "SORTING"
+One big difference between ASCII-based character sets and EBCDIC ones
+are the relative positions of the characters when sorted in native
+order.  Of most concern are the upper\- and lowercase letters, the
+digits, and the underscore (\f(CW"_"\fR).  On ASCII platforms the native sort
+order has the digits come before the uppercase letters which come before
+the underscore which comes before the lowercase letters.  On EBCDIC, the
+underscore comes first, then the lowercase letters, then the uppercase
+ones, and the digits last.  If sorted on an ASCII-based platform, the
+two-letter abbreviation for a physician comes before the two letter
+abbreviation for drive; that is:
+.PP
+.Vb 2
+\& @sorted = sort(qw(Dr. dr.));  # @sorted holds (\*(AqDr.\*(Aq,\*(Aqdr.\*(Aq) on ASCII,
+\&                                  # but (\*(Aqdr.\*(Aq,\*(AqDr.\*(Aq) on EBCDIC
+.Ve
+.PP
+The property of lowercase before uppercase letters in EBCDIC is
+even carried to the Latin 1 EBCDIC pages such as 0037 and 1047.
+An example would be that "Ë" (\f(CW\*(C`E WITH DIAERESIS\*(C'\fR, 203) comes
+before "ë" (\f(CW\*(C`e WITH DIAERESIS\*(C'\fR, 235) on an ASCII platform, but
+the latter (83) comes before the former (115) on an EBCDIC platform.
+(Astute readers will note that the uppercase version of "ß"
+\&\f(CW\*(C`SMALL LETTER SHARP S\*(C'\fR is simply "SS" and that the upper case versions
+of "ÿ" (small \f(CW\*(C`y WITH DIAERESIS\*(C'\fR) and "µ" (\f(CW\*(C`MICRO SIGN\*(C'\fR)
+are not in the 0..255 range but are in Unicode, in a Unicode enabled
+Perl).
+.PP
+The sort order will cause differences between results obtained on
+ASCII platforms versus EBCDIC platforms.  What follows are some suggestions
+on how to deal with these differences.
+.SS "Ignore ASCII vs. EBCDIC sort differences."
+.IX Subsection "Ignore ASCII vs. EBCDIC sort differences."
+This is the least computationally expensive strategy.  It may require
+some user education.
+.SS "Use a sort helper function"
+.IX Subsection "Use a sort helper function"
+This is completely general, but the most computationally expensive
+strategy.  Choose one or the other character set and transform to that
+for every sort comparison.  Here's a complete example that transforms
+to ASCII sort order:
+.PP
+.Vb 2
+\& sub native_to_uni($) {
+\&    my $string = shift;
+\&
+\&    # Saves time on an ASCII platform
+\&    return $string if ord \*(AqA\*(Aq ==  65;
+\&
+\&    my $output = "";
+\&    for my $i (0 .. length($string) \- 1) {
+\&        $output
+\&           .= chr(utf8::native_to_unicode(ord(substr($string, $i, 1))));
+\&    }
+\&
+\&    # Preserve utf8ness of input onto the output, even if it didn\*(Aqt need
+\&    # to be utf8
+\&    utf8::upgrade($output) if utf8::is_utf8($string);
+\&
+\&    return $output;
+\& }
+\&
+\& sub ascii_order {   # Sort helper
+\&    return native_to_uni($a) cmp native_to_uni($b);
+\& }
+\&
+\& sort ascii_order @list;
+.Ve
+.SS "MONO CASE then sort data (for non-digits, non-underscore)"
+.IX Subsection "MONO CASE then sort data (for non-digits, non-underscore)"
+If you don't care about where digits and underscore sort to, you can do
+something like this
+.PP
+.Vb 3
+\& sub case_insensitive_order {   # Sort helper
+\&    return lc($a) cmp lc($b)
+\& }
+\&
+\& sort case_insensitive_order @list;
+.Ve
+.PP
+If performance is an issue, and you don't care if the output is in the
+same case as the input, Use \f(CW\*(C`tr///\*(C'\fR to transform to the case most
+employed within the data.  If the data are primarily UPPERCASE
+non\-Latin1, then apply \f(CW\*(C`tr/[a\-z]/[A\-Z]/\*(C'\fR, and then \f(CWsort()\fR.  If the
+data are primarily lowercase non Latin1 then apply \f(CW\*(C`tr/[A\-Z]/[a\-z]/\*(C'\fR
+before sorting.  If the data are primarily UPPERCASE and include Latin\-1
+characters then apply:
+.PP
+.Vb 3
+\&   tr/[a\-z]/[A\-Z]/;
+\&   tr/[àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ]/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ/;
+\&   s/ß/SS/g;
+.Ve
+.PP
+then \f(CWsort()\fR.  If you have a choice, it's better to lowercase things
+to avoid the problems of the two Latin\-1 characters whose uppercase is
+outside Latin\-1: "ÿ" (small \f(CW\*(C`y WITH DIAERESIS\*(C'\fR) and "µ"
+(\f(CW\*(C`MICRO SIGN\*(C'\fR).  If you do need to upppercase, you can; with a
+Unicode-enabled Perl, do:
+.PP
+.Vb 2
+\&    tr/ÿ/\ex{178}/;
+\&    tr/µ/\ex{39C}/;
+.Ve
+.SS "Perform sorting on one type of platform only."
+.IX Subsection "Perform sorting on one type of platform only."
+This strategy can employ a network connection.  As such
+it would be computationally expensive.
+.SH "TRANSFORMATION FORMATS"
+.IX Header "TRANSFORMATION FORMATS"
+There are a variety of ways of transforming data with an intra character set
+mapping that serve a variety of purposes.  Sorting was discussed in the
+previous section and a few of the other more popular mapping techniques are
+discussed next.
+.SS "URL decoding and encoding"
+.IX Subsection "URL decoding and encoding"
+Note that some URLs have hexadecimal ASCII code points in them in an
+attempt to overcome character or protocol limitation issues.  For example
+the tilde character is not on every keyboard hence a URL of the form:
+.PP
+.Vb 1
+\&    http://www.pvhp.com/~pvhp/
+.Ve
+.PP
+may also be expressed as either of:
+.PP
+.Vb 1
+\&    http://www.pvhp.com/%7Epvhp/
+\&
+\&    http://www.pvhp.com/%7epvhp/
+.Ve
+.PP
+where 7E is the hexadecimal ASCII code point for "~".  Here is an example
+of decoding such a URL in any EBCDIC code page:
+.PP
+.Vb 3
+\&    $url = \*(Aqhttp://www.pvhp.com/%7Epvhp/\*(Aq;
+\&    $url =~ s/%([0\-9a\-fA\-F]{2})/
+\&              pack("c",utf8::unicode_to_native(hex($1)))/xge;
+.Ve
+.PP
+Conversely, here is a partial solution for the task of encoding such
+a URL in any EBCDIC code page:
+.PP
+.Vb 5
+\&    $url = \*(Aqhttp://www.pvhp.com/~pvhp/\*(Aq;
+\&    # The following regular expression does not address the
+\&    # mappings for: (\*(Aq.\*(Aq => \*(Aq%2E\*(Aq, \*(Aq/\*(Aq => \*(Aq%2F\*(Aq, \*(Aq:\*(Aq => \*(Aq%3A\*(Aq)
+\&    $url =~ s/([\et "#%&\e(\e),;<=>\e?\e@\e[\e\e\e]^\`{|}~])/
+\&               sprintf("%%%02X",utf8::native_to_unicode(ord($1)))/xge;
+.Ve
+.PP
+where a more complete solution would split the URL into components
+and apply a full s/// substitution only to the appropriate parts.
+.SS "uu encoding and decoding"
+.IX Subsection "uu encoding and decoding"
+The \f(CW\*(C`u\*(C'\fR template to \f(CWpack()\fR or \f(CWunpack()\fR will render EBCDIC data in
+EBCDIC characters equivalent to their ASCII counterparts.  For example,
+the following will print "Yes indeed\en" on either an ASCII or EBCDIC
+computer:
+.PP
+.Vb 10
+\&    $all_byte_chrs = \*(Aq\*(Aq;
+\&    for (0..255) { $all_byte_chrs .= chr($_); }
+\&    $uuencode_byte_chrs = pack(\*(Aqu\*(Aq, $all_byte_chrs);
+\&    ($uu = <<\*(AqENDOFHEREDOC\*(Aq) =~ s/^\es*//gm;
+\&    M\`\`$"\`P0%!@<("0H+#\`T.#Q\`1$A,4%187&!D:&QP=\*(AqA\e@(2(C)"4F)R@I*BLL
+\&    M+2XO,#$R,S0U\-C<X.3H[/#T^/T!!0D\-$149\*(Aq2$E*2TQ\-3D]045)35%565UA9
+\&    M6EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G\-T=79W>\*(AqEZ>WQ]?G^\`@8*#A(6&
+\&    MAXB)BHN,C8Z/D)&2DY25EI>8F9J;G)V>GZ"AHJ.DI::GJ*FJJZRMKJ^PL;*S
+\&    MM+6VM[BYNKN\eO;Z_P,\*(Aq"P\e3%QL?(R<K+S,W.S]#1TM/4U=;7V\-G:V]S=WM_@
+\&    ?X>+CY.7FY^CIZNOL[>[O\e/\*(AqR\e_3U]O?X^?K[_/W^_P\`\`
+\&    ENDOFHEREDOC
+\&    if ($uuencode_byte_chrs eq $uu) {
+\&        print "Yes ";
+\&    }
+\&    $uudecode_byte_chrs = unpack(\*(Aqu\*(Aq, $uuencode_byte_chrs);
+\&    if ($uudecode_byte_chrs eq $all_byte_chrs) {
+\&        print "indeed\en";
+\&    }
+.Ve
+.PP
+Here is a very spartan uudecoder that will work on EBCDIC:
+.PP
+.Vb 10
+\&    #!/usr/local/bin/perl
+\&    $_ = <> until ($mode,$file) = /^begin\es*(\ed*)\es*(\eS*)/;
+\&    open(OUT, "> $file") if $file ne "";
+\&    while(<>) {
+\&        last if /^end/;
+\&        next if /[a\-z]/;
+\&        next unless int((((utf8::native_to_unicode(ord()) \- 32 ) & 077)
+\&                                                               + 2) / 3)
+\&                    == int(length() / 4);
+\&        print OUT unpack("u", $_);
+\&    }
+\&    close(OUT);
+\&    chmod oct($mode), $file;
+.Ve
+.SS "Quoted-Printable encoding and decoding"
+.IX Subsection "Quoted-Printable encoding and decoding"
+On ASCII-encoded platforms it is possible to strip characters outside of
+the printable set using:
+.PP
+.Vb 3
+\&    # This QP encoder works on ASCII only
+\&    $qp_string =~ s/([=\ex00\-\ex1F\ex80\-\exFF])/
+\&                    sprintf("=%02X",ord($1))/xge;
+.Ve
+.PP
+Starting in Perl v5.22, this is trivially changeable to work portably on
+both ASCII and EBCDIC platforms.
+.PP
+.Vb 3
+\&    # This QP encoder works on both ASCII and EBCDIC
+\&    $qp_string =~ s/([=\eN{U+00}\-\eN{U+1F}\eN{U+80}\-\eN{U+FF}])/
+\&                    sprintf("=%02X",ord($1))/xge;
+.Ve
+.PP
+For earlier Perls, a QP encoder that works on both ASCII and EBCDIC
+platforms would look somewhat like the following:
+.PP
+.Vb 4
+\&    $delete = utf8::unicode_to_native(ord("\ex7F"));
+\&    $qp_string =~
+\&      s/([^[:print:]$delete])/
+\&         sprintf("=%02X",utf8::native_to_unicode(ord($1)))/xage;
+.Ve
+.PP
+(although in production code the substitutions might be done
+in the EBCDIC branch with the function call and separately in the
+ASCII branch without the expense of the identity map; in Perl v5.22, the
+identity map is optimized out so there is no expense, but the
+alternative above is simpler and is also available in v5.22).
+.PP
+Such QP strings can be decoded with:
+.PP
+.Vb 3
+\&    # This QP decoder is limited to ASCII only
+\&    $string =~ s/=([[:xdigit:][[:xdigit:])/chr hex $1/ge;
+\&    $string =~ s/=[\en\er]+$//;
+.Ve
+.PP
+Whereas a QP decoder that works on both ASCII and EBCDIC platforms
+would look somewhat like the following:
+.PP
+.Vb 3
+\&    $string =~ s/=([[:xdigit:][:xdigit:]])/
+\&                                chr utf8::native_to_unicode(hex $1)/xge;
+\&    $string =~ s/=[\en\er]+$//;
+.Ve
+.SS "Caesarean ciphers"
+.IX Subsection "Caesarean ciphers"
+The practice of shifting an alphabet one or more characters for encipherment
+dates back thousands of years and was explicitly detailed by Gaius Julius
+Caesar in his \fBGallic Wars\fR text.  A single alphabet shift is sometimes
+referred to as a rotation and the shift amount is given as a number \f(CW$n\fR after
+the string 'rot' or "rot$n".  Rot0 and rot26 would designate identity maps
+on the 26\-letter English version of the Latin alphabet.  Rot13 has the
+interesting property that alternate subsequent invocations are identity maps
+(thus rot13 is its own non-trivial inverse in the group of 26 alphabet
+rotations).  Hence the following is a rot13 encoder and decoder that will
+work on ASCII and EBCDIC platforms:
+.PP
+.Vb 1
+\&    #!/usr/local/bin/perl
+\&
+\&    while(<>){
+\&        tr/n\-za\-mN\-ZA\-M/a\-zA\-Z/;
+\&        print;
+\&    }
+.Ve
+.PP
+In one-liner form:
+.PP
+.Vb 1
+\&    perl \-ne \*(Aqtr/n\-za\-mN\-ZA\-M/a\-zA\-Z/;print\*(Aq
+.Ve
+.SH "Hashing order and checksums"
+.IX Header "Hashing order and checksums"
+Perl deliberately randomizes hash order for security purposes on both
+ASCII and EBCDIC platforms.
+.PP
+EBCDIC checksums will differ for the same file translated into ASCII
+and vice versa.
+.SH "I18N AND L10N"
+.IX Header "I18N AND L10N"
+Internationalization (I18N) and localization (L10N) are supported at least
+in principle even on EBCDIC platforms.  The details are system-dependent
+and discussed under the "OS ISSUES" section below.
+.SH "MULTI-OCTET CHARACTER SETS"
+.IX Header "MULTI-OCTET CHARACTER SETS"
+Perl works with UTF-EBCDIC, a multi-byte encoding.  In Perls earlier
+than v5.22, there may be various bugs in this regard.
+.PP
+Legacy multi byte EBCDIC code pages XXX.
+.SH "OS ISSUES"
+.IX Header "OS ISSUES"
+There may be a few system-dependent issues
+of concern to EBCDIC Perl programmers.
+.SS OS/400
+.IX Subsection "OS/400"
+.IP PASE 8
+.IX Item "PASE"
+The PASE environment is a runtime environment for OS/400 that can run
+executables built for PowerPC AIX in OS/400; see perlos400.  PASE
+is ASCII-based, not EBCDIC-based as the ILE.
+.IP "IFS access" 8
+.IX Item "IFS access"
+XXX.
+.SS "OS/390, z/OS"
+.IX Subsection "OS/390, z/OS"
+Perl runs under Unix Systems Services or USS.
+.ie n .IP """sigaction""" 8
+.el .IP \f(CWsigaction\fR 8
+.IX Item "sigaction"
+\&\f(CW\*(C`SA_SIGINFO\*(C'\fR can have segmentation faults.
+.ie n .IP """chcp""" 8
+.el .IP \f(CWchcp\fR 8
+.IX Item "chcp"
+\&\fBchcp\fR is supported as a shell utility for displaying and changing
+one's code page.  See also \fBchcp\fR\|(1).
+.IP "dataset access" 8
+.IX Item "dataset access"
+For sequential data set access try:
+.Sp
+.Vb 1
+\&    my @ds_records = \`cat //DSNAME\`;
+.Ve
+.Sp
+or:
+.Sp
+.Vb 1
+\&    my @ds_records = \`cat //\*(AqHLQ.DSNAME\*(Aq\`;
+.Ve
+.Sp
+See also the OS390::Stdio module on CPAN.
+.ie n .IP """iconv""" 8
+.el .IP \f(CWiconv\fR 8
+.IX Item "iconv"
+\&\fBiconv\fR is supported as both a shell utility and a C RTL routine.
+See also the \fBiconv\fR\|(1) and \fBiconv\fR\|(3) manual pages.
+.IP locales 8
+.IX Item "locales"
+Locales are supported.  There may be glitches when a locale is another
+EBCDIC code page which has some of the
+code-page variant characters in other
+positions.
+.Sp
+There aren't currently any real UTF\-8 locales, even though some locale
+names contain the string "UTF\-8".
+.Sp
+See perllocale for information on locales.  The L10N files
+are in \fI/usr/nls/locale\fR.  \f(CW$Config{d_setlocale}\fR is \f(CW\*(Aqdefine\*(Aq\fR on
+OS/390 or z/OS.
+.SS POSIX-BC?
+.IX Subsection "POSIX-BC?"
+XXX.
+.SH BUGS
+.IX Header "BUGS"
+.IP \(bu 4
+Not all shells will allow multiple \f(CW\*(C`\-e\*(C'\fR string arguments to perl to
+be concatenated together properly as recipes in this document
+0, 2, 4, 5, and 6 might
+seem to imply.
+.IP \(bu 4
+There are a significant number of test failures in the CPAN modules
+shipped with Perl v5.22 and 5.24.  These are only in modules not primarily
+maintained by Perl 5 porters.  Some of these are failures in the tests
+only: they don't realize that it is proper to get different results on
+EBCDIC platforms.  And some of the failures are real bugs.  If you
+compile and do a \f(CW\*(C`make test\*(C'\fR on Perl, all tests on the \f(CW\*(C`/cpan\*(C'\fR
+directory are skipped.
+.Sp
+Encode partially works.
+.IP \(bu 4
+In earlier Perl versions, when byte and character data were
+concatenated, the new string was sometimes created by
+decoding the byte strings as \fIISO 8859\-1 (Latin\-1)\fR, even if the
+old Unicode string used EBCDIC.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+perllocale, perlfunc, perlunicode, utf8.
+.SH REFERENCES
+.IX Header "REFERENCES"
+<http://std.dkuug.dk/i18n/charmaps>
+.PP
+<https://www.unicode.org/>
+.PP
+<https://www.unicode.org/reports/tr16/>
+.PP
+<https://www.sr\-ix.com/Archive/CharCodeHist/index.html>
+\&\fBASCII: American Standard Code for Information Infiltration\fR Tom Jennings,
+September 1999.
+.PP
+\&\fBThe Unicode Standard, Version 3.0\fR The Unicode Consortium, Lisa Moore ed.,
+ISBN 0\-201\-61633\-5, Addison Wesley Developers Press, February 2000.
+.PP
+\&\fBCDRA: IBM \- Character Data Representation Architecture \-
+Reference and Registry\fR, IBM SC09\-2190\-00, December 1996.
+.PP
+"Demystifying Character Sets", Andrea Vine, Multilingual Computing
+& Technology, \fB#26 Vol. 10 Issue 4\fR, August/September 1999;
+ISSN 1523\-0309; Multilingual Computing Inc. Sandpoint ID, USA.
+.PP
+\&\fBCodes, Ciphers, and Other Cryptic and Clandestine Communication\fR
+Fred B. Wrixon, ISBN 1\-57912\-040\-7, Black Dog & Leventhal Publishers,
+1998.
+.PP
+<http://www.bobbemer.com/P\-BIT.HTM>
+\&\fBIBM \- EBCDIC and the P\-bit; The biggest Computer Goof Ever\fR Robert Bemer.
+.SH HISTORY
+.IX Header "HISTORY"
+15 April 2001: added UTF\-8 and UTF-EBCDIC to main table, pvhp.
+.SH AUTHOR
+.IX Header "AUTHOR"
+Peter Prymmer pvhp@best.com wrote this in 1999 and 2000
+with CCSID 0819 and 0037 help from Chris Leach and
+André Pirard A.Pirard@ulg.ac.be as well as POSIX-BC
+help from Thomas Dorner Thomas.Dorner@start.de.
+Thanks also to Vickie Cooper, Philip Newton, William Raffloer, and
+Joe Smith.  Trademarks, registered trademarks, service marks and
+registered service marks used in this document are the property of
+their respective owners.
+.PP
+Now maintained by Perl5 Porters.
author	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-15 19:43:11 +0000
committer	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-04-15 19:43:11 +0000
commit	fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch)
tree	ce1e3bce06471410239a6f41282e328770aa404a /upstream/mageia-cauldron/man1/perlebcdic.1
parent	Initial commit. (diff)
download	manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.tar.xz manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.zip