author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch)
tree ce1e3bce06471410239a6f41282e328770aa404a /upstream/mageia-cauldron/man1/perlebcdic.1
parent Initial commit. (diff)
Adding upstream version 4.22.0.upstream/4.22.0
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/mageia-cauldron/man1/perlebcdic.1')
-rw-r--r-- upstream/mageia-cauldron/man1/perlebcdic.1 1998
1 files changed, 1998 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man1/perlebcdic.1 b/upstream/mageia-cauldron/man1/perlebcdic.1
new file mode 100644
index 00000000..b23c5fd5
--- /dev/null
+++ b/upstream/mageia-cauldron/man1/perlebcdic.1
@@ -0,0 +1,1998 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLEBCDIC 1"
+.TH PERLEBCDIC 1 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlebcdic \- Considerations for running Perl on EBCDIC platforms
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+An exploration of some of the issues facing Perl programmers
+on EBCDIC based computers.
+.PP
+Portions of this document that are still incomplete are marked with XXX.
+.PP
+Early Perl versions worked on some EBCDIC machines, but the last known
+version that ran on EBCDIC was v5.8.7, until v5.22, when the Perl core
+again began working on z/OS. Theoretically, it could work on OS/400 or
+Siemens' BS2000 (or their successors), but this is untested. In v5.22
+and v5.24, not all of the modules that are found on CPAN but also shipped
+with core Perl work on z/OS.
+.PP
+If you want to use Perl on a non\-z/OS EBCDIC machine, please let us know
+at <https://github.com/Perl/perl5/issues>.
+.PP
+Writing Perl on an EBCDIC platform is really no different than writing
+on an "ASCII" one, but with different underlying numbers, as we'll see
+shortly. You'll have to know something about those "ASCII" platforms
+because the documentation is biased and will frequently use example
+numbers that don't apply to EBCDIC. There are also very few CPAN
+modules that are written for EBCDIC and which don't work on ASCII;
+instead the vast majority of CPAN modules are written for ASCII, and
+some may happen to work on EBCDIC, while a few have been designed to
+portably work on both.
+.PP
+If your code just uses the 52 letters A\-Z and a\-z, plus SPACE, the
+digits 0\-9, and the punctuation characters that Perl uses, plus a few
+controls that are denoted by escape sequences like \f(CW\*(C`\en\*(C'\fR and \f(CW\*(C`\et\*(C'\fR, then
+there's nothing special about using Perl, and your code may very well
+work on an ASCII machine without change.
+.PP
+But if you write code that uses \f(CW\*(C`\e005\*(C'\fR to mean a TAB or \f(CW\*(C`\exC1\*(C'\fR to mean
+an "A", or \f(CW\*(C`\exDF\*(C'\fR to mean a "ÿ" (small \f(CW"y"\fR with a diaeresis),
+then your code may well work on your EBCDIC platform, but not on an
+ASCII one. That's fine to do if no one will ever want to run your code
+on an ASCII platform; but the bias in this document will be towards writing
+code portable between EBCDIC and ASCII systems. Again, if every
+character you care about is easily enterable from your keyboard, you
+don't have to know anything about ASCII, but many keyboards don't easily
+allow you to directly enter, say, the character \f(CW\*(C`\exDF\*(C'\fR, so you have to
+specify it indirectly, such as by using the \f(CW"\exDF"\fR escape sequence.
+In those cases it's easiest to know something about the ASCII/Unicode
+character sets. If you know that the small "ÿ" is \f(CW\*(C`U+00FF\*(C'\fR, then
+you can instead specify it as \f(CW"\eN{U+FF}"\fR, and have the computer
+automatically translate it to \f(CW\*(C`\exDF\*(C'\fR on your platform, and leave it as
+\&\f(CW\*(C`\exFF\*(C'\fR on ASCII ones. Or you could specify it by name, \f(CW\*(C`\eN{LATIN
+SMALL LETTER Y WITH DIAERESIS}\*(C'\fR, and not have to know the numbers.
+Either way works, but both require familiarity with Unicode.
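+.PP
+As a minimal sketch of this advice (the variable names are purely
+illustrative), the following specifies the character both ways without
+hard-coding any platform-specific number. Going by the code chart later
+in this document, the ordinal it reports should be 255 on an ASCII
+platform and 223 on the three EBCDIC code pages that Perl recognizes.
+.PP
+.Vb 10
+\&    use strict;
+\&    use warnings;
+\&    use charnames \*(Aq:full\*(Aq;  # only needed before v5.16; harmless later
+\&
+\&    # Both spellings denote U+00FF on every platform.
+\&    my $by_number = "\eN{U+FF}";
+\&    my $by_name   = "\eN{LATIN SMALL LETTER Y WITH DIAERESIS}";
+\&
+\&    print "same character\en" if $by_number eq $by_name;
+\&    printf "native ordinal: %d\en", ord($by_number);
+.Ve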
+.SH "COMMON CHARACTER CODE SETS"
+.IX Header "COMMON CHARACTER CODE SETS"
+.SS ASCII
+.IX Subsection "ASCII"
+The American Standard Code for Information Interchange (ASCII or
+US-ASCII) is a set of
+integers running from 0 to 127 (decimal) that have standardized
+interpretations by the computers which use ASCII. For example, 65 means
+the letter "A".
+The range 0..127 can be covered by setting various bits in a 7\-bit binary
+number, hence the set is sometimes referred to as "7\-bit ASCII".
+ASCII was described by the American National Standards Institute
+document ANSI X3.4\-1986. It was also described by ISO 646:1991
+(with localization for currency symbols). The full ASCII set is
+given in the table below as the first 128 elements.
+Languages that
+can be written adequately with the characters in ASCII include
+English, Hawaiian, Indonesian, Swahili and some Native American
+languages.
+.PP
+Most non-EBCDIC character sets are supersets of ASCII. That is, the
+integers 0\-127 mean what ASCII says they mean. But integers 128 and
+above are specific to the character set.
+.PP
+Many of these fit entirely into 8 bits, using ASCII as 0\-127, while
+specifying what 128\-255 mean, and not using anything above 255.
+Thus, these are single-byte (or octet if you prefer) character sets.
+One important one (since Unicode is a superset of it) is the ISO 8859\-1
+character set.
+.SS "ISO 8859"
+.IX Subsection "ISO 8859"
+The ISO 8859\-\fR\f(CB$n\fR\f(BI\fR\fI\fR are a collection of character code sets from the
+International Organization for Standardization (ISO), each of which adds
+characters to the ASCII set that are typically found in various
+languages, many of which are based on the Roman, or Latin, alphabet.
+Most are for European languages, but there are also ones for Arabic,
+Greek, Hebrew, and Thai. There are good references on the web about
+all these.
+.SS "Latin 1 (ISO 8859\-1)"
+.IX Subsection "Latin 1 (ISO 8859-1)"
+Latin 1 is a particular 8\-bit extension to ASCII that includes grave
+and acute accented Latin characters. Languages that can employ ISO 8859\-1
+include all the languages covered by ASCII as well as Afrikaans,
+Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian,
+Portuguese, Spanish, and Swedish. Dutch is covered albeit without
+the ij ligature. French is covered too but without the oe ligature.
+German can use ISO 8859\-1 but must do so without German-style
+quotation marks. This set is based on Western European extensions
+to ASCII and is commonly encountered in world wide web work.
+In IBM character code set identification terminology, ISO 8859\-1 is
+also known as CCSID 819 (or sometimes 0819 or even 00819).
+.SS EBCDIC
+.IX Subsection "EBCDIC"
+The Extended Binary Coded Decimal Interchange Code refers to a
+large collection of single\- and multi-byte coded character sets that are
+quite different from ASCII and ISO 8859\-1, and are all slightly
+different from each other; they typically run on host computers. The
+EBCDIC encodings derive from 8\-bit byte extensions of Hollerith punched
+card encodings, which long predate ASCII. The layout on the
+cards was such that high bits were set for the upper and lower case
+alphabetic
+characters \f(CW\*(C`[a\-z]\*(C'\fR and \f(CW\*(C`[A\-Z]\*(C'\fR, but there were gaps within each Latin
+alphabet range, visible in the table below. These gaps can
+cause complications.
+.PP
+Some IBM EBCDIC character sets may be known by character code set
+identification numbers (CCSID numbers) or code page numbers.
+.PP
+Perl can be compiled on platforms that run any of three commonly used EBCDIC
+character sets, listed below.
+.PP
+\fIThe 13 variant characters\fR
+.IX Subsection "The 13 variant characters"
+.PP
+Among IBM EBCDIC character code sets there are 13 characters that
+are often mapped to different integer values. Those characters
+are known as the 13 "variant" characters and are:
+.PP
+.Vb 1
+\& \e [ ] { } ^ ~ ! # | $ @ \`
+.Ve
+.PP
+When Perl is compiled for a platform, it looks at all of these characters to
+guess which EBCDIC character set the platform uses, and adapts itself
+accordingly to that platform. If the platform uses a character set that is not
+one of the three Perl knows about, Perl will either fail to compile, or
+mistakenly and silently choose one of the three.
+.PP
+The Line Feed (LF) character is actually a 14th variant character, and
+Perl checks for that as well.
+.PP
+\fIEBCDIC code sets recognized by Perl\fR
+.IX Subsection "EBCDIC code sets recognized by Perl"
+.IP \fB0037\fR 4
+.IX Item "0037"
+Character code set ID 0037 is a mapping of the ASCII plus Latin\-1
+characters (i.e. ISO 8859\-1) to an EBCDIC set. 0037 is used
+in North American English locales on the OS/400 operating system
+that runs on AS/400 computers. CCSID 0037 differs from ISO 8859\-1
+in 236 places; in other words they agree on only 20 code point values.
+.IP \fB1047\fR 4
+.IX Item "1047"
+Character code set ID 1047 is also a mapping of the ASCII plus
+Latin\-1 characters (i.e. ISO 8859\-1) to an EBCDIC set. 1047 is
+used under Unix System Services for OS/390 or z/OS, and OpenEdition
+for VM/ESA. CCSID 1047 differs from CCSID 0037 in eight places,
+and from ISO 8859\-1 in 236.
+.IP \fBPOSIX-BC\fR 4
+.IX Item "POSIX-BC"
+The EBCDIC code page in use on Siemens' BS2000 system is distinct from
+1047 and 0037. It is identified below as the POSIX-BC set.
+Like 0037 and 1047, it is the same as ISO 8859\-1 in 20 code point
+values.
+.SS "Unicode code points versus EBCDIC code points"
+.IX Subsection "Unicode code points versus EBCDIC code points"
+In Unicode terminology a \fIcode point\fR is the number assigned to a
+character: for example, in EBCDIC the character "A" is usually assigned
+the number 193. In Unicode, the character "A" is assigned the number 65.
+All the code points in ASCII and Latin\-1 (ISO 8859\-1) have the same
+meaning in Unicode. All three of the recognized EBCDIC code sets have
+256 code points, and in each code set, all 256 code points are mapped to
+equivalent Latin1 code points. Obviously, "A" will map to "A", "B" =>
+"B", "%" => "%", etc., for all printable characters in Latin1 and these
+code pages.
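+.PP
+One way to look at this mapping from inside a program is with the
+built-in functions \f(CWutf8::native_to_unicode()\fR and
+\&\f(CWutf8::unicode_to_native()\fR, which convert a code point between the
+platform's native character set and Unicode. The following is only a
+sketch; the choice of sample characters is arbitrary. On an ASCII
+platform the two numbers on each line are the same, while on a CCSID 1047
+platform the code chart below suggests native values of 193, 108, and 81.
+.PP
+.Vb 12
+\&    use strict;
+\&    use warnings;
+\&
+\&    for my $char ("A", "%", "\eN{U+E9}") {
+\&        my $native = ord($char);
+\&        printf "native %3d <=> Unicode %3d\en",
+\&               $native, utf8::native_to_unicode($native);
+\&    }
+\&
+\&    # The two functions are inverses of each other.
+\&    my $round_trip = utf8::unicode_to_native(utf8::native_to_unicode(ord("A")));
+\&    print "round trip ok\en" if $round_trip == ord("A");
+.Ve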
+.PP
+It also turns out that EBCDIC has nearly precise equivalents for the
+ASCII/Latin1 C0 controls and the DELETE control. (The C0 controls are
+those whose ASCII code points are 0..0x1F; things like TAB, ACK, BEL,
+etc.) A mapping is set up between these ASCII/EBCDIC controls. There
+isn't such a precise mapping between the C1 controls on ASCII platforms
+and the remaining EBCDIC controls. What has been done is to map these
+controls, mostly arbitrarily, to some otherwise unmatched character in
+the other character set. Most of these are very rarely used
+nowadays in EBCDIC anyway, and their names have been dropped, without
+much complaint. For example the EO (Eight Ones) EBCDIC control
+(consisting of eight one bits = 0xFF) is mapped to the C1 APC control
+(0x9F), and you can't use the name "EO".
+.PP
+The EBCDIC controls provide three possible line terminator characters,
+CR (0x0D), LF (0x25), and NL (0x15). On ASCII platforms, the symbols
+"NL" and "LF" refer to the same character, but in strict EBCDIC
+terminology they are different ones. The EBCDIC NL is mapped to the C1
+control called "NEL" ("Next Line"; here's a case where the mapping makes
+quite a bit of sense, and hence isn't just arbitrary). On some EBCDIC
+platforms, this NL or NEL is the typical line terminator. This is true
+of z/OS and BS2000. On these platforms, the C compilers will swap the
+LF and NEL code points, so that \f(CW"\en"\fR is 0x15, and refers to NL. Perl
+does that too; you can see it in the code chart below.
+This makes things generally "just work" without you even having to be
+aware that there is a swap.
+.SS "Unicode and UTF"
+.IX Subsection "Unicode and UTF"
+UTF stands for "Unicode Transformation Format".
+UTF\-8 is an encoding of Unicode into a sequence of 8\-bit byte chunks, based on
+ASCII and Latin\-1.
+The length of a sequence required to represent a Unicode code point
+depends on the ordinal number of that code point,
+with larger numbers requiring more bytes.
+UTF-EBCDIC is like UTF\-8, but based on EBCDIC.
+They are enough alike that casual usage will often conflate the two
+terms, and use "UTF\-8" to mean both the UTF\-8 found on ASCII platforms,
+and the UTF-EBCDIC found on EBCDIC ones.
+.PP
+You may see the term "invariant" character or code point.
+This simply means that the character has the same numeric
+value and representation when encoded in UTF\-8 (or UTF-EBCDIC) as when
+not. (Note that this is a very different concept from "The 13 variant
+characters" mentioned above. Careful prose will use the term "UTF\-8
+invariant" instead of just "invariant", but most often you'll see just
+"invariant".) For example, the ordinal value of "A" is 193 in most
+EBCDIC code pages, and also is 193 when encoded in UTF-EBCDIC. All
+UTF\-8 (or UTF-EBCDIC) variant code points occupy at least two bytes when
+encoded in UTF\-8 (or UTF-EBCDIC); by definition, the UTF\-8 (or
+UTF-EBCDIC) invariant code points are exactly one byte whether encoded
+in UTF\-8 (or UTF-EBCDIC), or not. (By now you see why people typically
+just say "UTF\-8" when they also mean "UTF-EBCDIC". For the rest of this
+document, we'll mostly be casual about it too.)
+In ASCII UTF\-8, the code points corresponding to the lowest 128
+ordinal numbers (0 \- 127: the ASCII characters) are invariant.
+In UTF-EBCDIC, there are 160 invariant characters.
+(If you care, the EBCDIC invariants are those characters
+which have ASCII equivalents, plus those that correspond to
+the C1 controls (128 \- 159 on ASCII platforms).)
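+.PP
+A rough way to check whether a given character is invariant is to
+encode a copy of it with \f(CWutf8::encode()\fR and count the resulting bytes.
+This sketch assumes, as the casual usage above does, that "UTF\-8" on an
+EBCDIC platform means UTF-EBCDIC; the sample characters are arbitrary.
+.PP
+.Vb 11
+\&    use strict;
+\&    use warnings;
+\&
+\&    for my $char ("A", "\eN{U+E9}") {
+\&        my $octets = $char;
+\&        utf8::encode($octets);    # in place: characters become encoded bytes
+\&        printf "U+%04X: %d byte(s), %s\en",
+\&               utf8::native_to_unicode(ord($char)),
+\&               length($octets),
+\&               length($octets) == 1 ? "invariant" : "variant";
+\&    }
+.Ve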
+.PP
+A string encoded in UTF-EBCDIC may be longer (very rarely shorter) than
+one encoded in UTF\-8. Perl extends both UTF\-8 and UTF-EBCDIC so that
+they can encode code points above the Unicode maximum of U+10FFFF. Both
+extensions are constructed to allow encoding of any code point that fits
+in a 64\-bit word.
+.PP
+UTF-EBCDIC is defined by
+Unicode Technical Report #16 <https://www.unicode.org/reports/tr16>
+(often referred to as just TR16).
+It is defined based on CCSID 1047, not allowing for the differences for
+other code pages. This allows for easy interchange of text between
+computers running different code pages, but makes it unusable, without
+adaptation, for Perl on those other code pages.
+.PP
+The reason for this unusability is that a fundamental assumption of Perl
+is that the characters it cares about for parsing and lexical analysis
+are the same whether or not the text is in UTF\-8. For example, Perl
+expects the character \f(CW"["\fR to have the same representation, no matter
+if the string containing it (or program text) is UTF\-8 encoded or not.
+To ensure this, Perl adapts UTF-EBCDIC to the particular code page so
+that all characters it expects to be UTF\-8 invariant are in fact UTF\-8
+invariant. This means that text generated on a computer running one
+version of Perl's UTF-EBCDIC has to be translated to be intelligible to
+a computer running another.
+.PP
+TR16 implies a method to extend UTF-EBCDIC to encode points up through
+\&\f(CW\*(C`2\ **\ 31\ \-\ 1\*(C'\fR. Perl uses this method for code points up through
+\&\f(CW\*(C`2\ **\ 30\ \-\ 1\*(C'\fR, but uses an incompatible method for larger ones, to
+enable it to handle much larger code points than otherwise.
+.SS "Using Encode"
+.IX Subsection "Using Encode"
+Starting from Perl 5.8 you can use the standard module Encode
+to translate from EBCDIC to Latin\-1 code points.
+Encode knows about more EBCDIC character sets than Perl can currently
+be compiled to run on.
+.PP
+.Vb 1
+\& use Encode \*(Aqfrom_to\*(Aq;
+\&
+\& my %ebcdic = ( 176 => \*(Aqcp37\*(Aq, 95 => \*(Aqcp1047\*(Aq, 106 => \*(Aqposix\-bc\*(Aq );
+\&
+\& # $a is in EBCDIC code points
+\& from_to($a, $ebcdic{ord \*(Aq^\*(Aq}, \*(Aqlatin1\*(Aq);
+\& # $a is ISO 8859\-1 code points
+.Ve
+.PP
+and from Latin\-1 code points to EBCDIC code points:
+.PP
+.Vb 1
+\& use Encode \*(Aqfrom_to\*(Aq;
+\&
+\& my %ebcdic = ( 176 => \*(Aqcp37\*(Aq, 95 => \*(Aqcp1047\*(Aq, 106 => \*(Aqposix\-bc\*(Aq );
+\&
+\& # $a is ISO 8859\-1 code points
+\& from_to($a, \*(Aqlatin1\*(Aq, $ebcdic{ord \*(Aq^\*(Aq});
+\& # $a is in EBCDIC code points
+.Ve
+.PP
+For doing I/O it is suggested that you use the autotranslating features
+of PerlIO; see perluniintro.
+.PP
+Since version 5.8 Perl uses the PerlIO I/O library. This enables
+you to use different encodings per IO channel. For example you may use
+.PP
+.Vb 9
+\& use Encode;
+\& open($f, ">:encoding(ascii)", "test.ascii");
+\& print $f "Hello World!\en";
+\& open($f, ">:encoding(cp37)", "test.ebcdic");
+\& print $f "Hello World!\en";
+\& open($f, ">:encoding(latin1)", "test.latin1");
+\& print $f "Hello World!\en";
+\& open($f, ">:encoding(utf8)", "test.utf8");
+\& print $f "Hello World!\en";
+.Ve
+.PP
+to get four files containing "Hello World!\en" in ASCII, CP 0037 EBCDIC,
+ISO 8859\-1 (Latin\-1) (in this example identical to ASCII since only ASCII
+characters were printed), and
+UTF-EBCDIC (in this example identical to normal EBCDIC since only characters
+that don't differ between EBCDIC and UTF-EBCDIC were printed). See the
+documentation of Encode::PerlIO for details.
+.PP
+As the PerlIO layer uses raw IO (bytes) internally, all this totally
+ignores things like the type of your filesystem (ASCII or EBCDIC).
+.SH "SINGLE OCTET TABLES"
+.IX Header "SINGLE OCTET TABLES"
+The following tables list the ASCII and Latin 1 ordered sets including
+the subsets: C0 controls (00..1f), ASCII graphics (20..7e), delete (7f),
+C1 controls (80..9f), and Latin\-1 (a.k.a. ISO 8859\-1) (a0..ff), with
+these ranges given in hexadecimal. In the table, the names of the Latin 1
+extensions to ASCII have been labelled with character names roughly
+corresponding to \fIThe Unicode Standard, Version 6.1\fR albeit with
+substitutions such as \f(CW\*(C`s/LATIN//\*(C'\fR and \f(CW\*(C`s/VULGAR//\*(C'\fR in all cases;
+\&\f(CW\*(C`s/CAPITAL\ LETTER//\*(C'\fR in some cases; and
+\&\f(CW\*(C`s/SMALL\ LETTER\ ([A\-Z])/\el$1/\*(C'\fR in some other
+cases. Controls are listed using their Unicode 6.2 abbreviations.
+The differences between the 0037 and 1047 sets are
+flagged with \f(CW\*(C`**\*(C'\fR. The differences between the 1047 and POSIX-BC sets
+are flagged with \f(CW\*(C`##\*(C'\fR. All \f(CWord()\fR numbers listed are decimal. If you
+would rather see this table listing octal values, then run the table
+(that is, the pod source text of this document, since this recipe may not
+work with a pod2_other_format translation) through:
+.IP "recipe 0" 4
+.IX Item "recipe 0"
+.PP
+.Vb 3
+\& perl \-ne \*(Aqif(/(.{29})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)/)\*(Aq \e
+\& \-e \*(Aq{printf("%s%\-5.03o%\-5.03o%\-5.03o%.03o\en",$1,$2,$3,$4,$5)}\*(Aq \e
+\& perlebcdic.pod
+.Ve
+.PP
+If you want to retain the UTF-x code points then in script form you
+might want to write:
+.IP "recipe 1" 4
+.IX Item "recipe 1"
+.PP
+.Vb 10
+\&    open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
+\&    while (<FH>) {
+\&        if (/(.{29})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\e.?(\ed*)
+\&             \es+(\ed+)\e.?(\ed*)/x)
+\&        {
+\&            if ($7 ne \*(Aq\*(Aq && $9 ne \*(Aq\*(Aq) {
+\&                printf(
+\&                    "%s%\-5.03o%\-5.03o%\-5.03o%\-5.03o%\-3o.%\-5o%\-3o.%.03o\en",
+\&                    $1,$2,$3,$4,$5,$6,$7,$8,$9);
+\&            }
+\&            elsif ($7 ne \*(Aq\*(Aq) {
+\&                printf("%s%\-5.03o%\-5.03o%\-5.03o%\-5.03o%\-3o.%\-5o%.03o\en",
+\&                       $1,$2,$3,$4,$5,$6,$7,$8);
+\&            }
+\&            else {
+\&                printf("%s%\-5.03o%\-5.03o%\-5.03o%\-5.03o%\-5.03o%.03o\en",
+\&                       $1,$2,$3,$4,$5,$6,$8);
+\&            }
+\&        }
+\&    }
+.Ve
+.PP
+If you would rather see this table listing hexadecimal values then
+run the table through:
+.IP "recipe 2" 4
+.IX Item "recipe 2"
+.PP
+.Vb 3
+\& perl \-ne \*(Aqif(/(.{29})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)/)\*(Aq \e
+\& \-e \*(Aq{printf("%s%\-5.02X%\-5.02X%\-5.02X%.02X\en",$1,$2,$3,$4,$5)}\*(Aq \e
+\& perlebcdic.pod
+.Ve
+.PP
+Or, in order to retain the UTF-x code points in hexadecimal:
+.IP "recipe 3" 4
+.IX Item "recipe 3"
+.PP
+.Vb 10
+\&    open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
+\&    while (<FH>) {
+\&        if (/(.{29})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\e.?(\ed*)
+\&             \es+(\ed+)\e.?(\ed*)/x)
+\&        {
+\&            if ($7 ne \*(Aq\*(Aq && $9 ne \*(Aq\*(Aq) {
+\&                printf(
+\&                    "%s%\-5.02X%\-5.02X%\-5.02X%\-5.02X%\-2X.%\-6.02X%02X.%02X\en",
+\&                    $1,$2,$3,$4,$5,$6,$7,$8,$9);
+\&            }
+\&            elsif ($7 ne \*(Aq\*(Aq) {
+\&                printf("%s%\-5.02X%\-5.02X%\-5.02X%\-5.02X%\-2X.%\-6.02X%02X\en",
+\&                       $1,$2,$3,$4,$5,$6,$7,$8);
+\&            }
+\&            else {
+\&                printf("%s%\-5.02X%\-5.02X%\-5.02X%\-5.02X%\-5.02X%02X\en",
+\&                       $1,$2,$3,$4,$5,$6,$8);
+\&            }
+\&        }
+\&    }
+\&
+\&
+\& chr   ISO 8859\-1 (CCSID 0819)   CCSID 0037   CCSID 1047   POSIX\-BC   UTF\-8   UTF\-EBCDIC (CCSID 1047)
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& <NUL> 0 0 0 0 0 0
+\& <SOH> 1 1 1 1 1 1
+\& <STX> 2 2 2 2 2 2
+\& <ETX> 3 3 3 3 3 3
+\& <EOT> 4 55 55 55 4 55
+\& <ENQ> 5 45 45 45 5 45
+\& <ACK> 6 46 46 46 6 46
+\& <BEL> 7 47 47 47 7 47
+\& <BS> 8 22 22 22 8 22
+\& <HT> 9 5 5 5 9 5
+\& <LF> 10 37 21 21 10 21 **
+\& <VT> 11 11 11 11 11 11
+\& <FF> 12 12 12 12 12 12
+\& <CR> 13 13 13 13 13 13
+\& <SO> 14 14 14 14 14 14
+\& <SI> 15 15 15 15 15 15
+\& <DLE> 16 16 16 16 16 16
+\& <DC1> 17 17 17 17 17 17
+\& <DC2> 18 18 18 18 18 18
+\& <DC3> 19 19 19 19 19 19
+\& <DC4> 20 60 60 60 20 60
+\& <NAK> 21 61 61 61 21 61
+\& <SYN> 22 50 50 50 22 50
+\& <ETB> 23 38 38 38 23 38
+\& <CAN> 24 24 24 24 24 24
+\& <EOM> 25 25 25 25 25 25
+\& <SUB> 26 63 63 63 26 63
+\& <ESC> 27 39 39 39 27 39
+\& <FS> 28 28 28 28 28 28
+\& <GS> 29 29 29 29 29 29
+\& <RS> 30 30 30 30 30 30
+\& <US> 31 31 31 31 31 31
+\& <SPACE> 32 64 64 64 32 64
+\& ! 33 90 90 90 33 90
+\& " 34 127 127 127 34 127
+\& # 35 123 123 123 35 123
+\& $ 36 91 91 91 36 91
+\& % 37 108 108 108 37 108
+\& & 38 80 80 80 38 80
+\& \*(Aq 39 125 125 125 39 125
+\& ( 40 77 77 77 40 77
+\& ) 41 93 93 93 41 93
+\& * 42 92 92 92 42 92
+\& + 43 78 78 78 43 78
+\& , 44 107 107 107 44 107
+\& \- 45 96 96 96 45 96
+\& . 46 75 75 75 46 75
+\& / 47 97 97 97 47 97
+\& 0 48 240 240 240 48 240
+\& 1 49 241 241 241 49 241
+\& 2 50 242 242 242 50 242
+\& 3 51 243 243 243 51 243
+\& 4 52 244 244 244 52 244
+\& 5 53 245 245 245 53 245
+\& 6 54 246 246 246 54 246
+\& 7 55 247 247 247 55 247
+\& 8 56 248 248 248 56 248
+\& 9 57 249 249 249 57 249
+\& : 58 122 122 122 58 122
+\& ; 59 94 94 94 59 94
+\& < 60 76 76 76 60 76
+\& = 61 126 126 126 61 126
+\& > 62 110 110 110 62 110
+\& ? 63 111 111 111 63 111
+\& @ 64 124 124 124 64 124
+\& A 65 193 193 193 65 193
+\& B 66 194 194 194 66 194
+\& C 67 195 195 195 67 195
+\& D 68 196 196 196 68 196
+\& E 69 197 197 197 69 197
+\& F 70 198 198 198 70 198
+\& G 71 199 199 199 71 199
+\& H 72 200 200 200 72 200
+\& I 73 201 201 201 73 201
+\& J 74 209 209 209 74 209
+\& K 75 210 210 210 75 210
+\& L 76 211 211 211 76 211
+\& M 77 212 212 212 77 212
+\& N 78 213 213 213 78 213
+\& O 79 214 214 214 79 214
+\& P 80 215 215 215 80 215
+\& Q 81 216 216 216 81 216
+\& R 82 217 217 217 82 217
+\& S 83 226 226 226 83 226
+\& T 84 227 227 227 84 227
+\& U 85 228 228 228 85 228
+\& V 86 229 229 229 86 229
+\& W 87 230 230 230 87 230
+\& X 88 231 231 231 88 231
+\& Y 89 232 232 232 89 232
+\& Z 90 233 233 233 90 233
+\& [ 91 186 173 187 91 173 ** ##
+\& \e 92 224 224 188 92 224 ##
+\& ] 93 187 189 189 93 189 **
+\& ^ 94 176 95 106 94 95 ** ##
+\& _ 95 109 109 109 95 109
+\& \` 96 121 121 74 96 121 ##
+\& a 97 129 129 129 97 129
+\& b 98 130 130 130 98 130
+\& c 99 131 131 131 99 131
+\& d 100 132 132 132 100 132
+\& e 101 133 133 133 101 133
+\& f 102 134 134 134 102 134
+\& g 103 135 135 135 103 135
+\& h 104 136 136 136 104 136
+\& i 105 137 137 137 105 137
+\& j 106 145 145 145 106 145
+\& k 107 146 146 146 107 146
+\& l 108 147 147 147 108 147
+\& m 109 148 148 148 109 148
+\& n 110 149 149 149 110 149
+\& o 111 150 150 150 111 150
+\& p 112 151 151 151 112 151
+\& q 113 152 152 152 113 152
+\& r 114 153 153 153 114 153
+\& s 115 162 162 162 115 162
+\& t 116 163 163 163 116 163
+\& u 117 164 164 164 117 164
+\& v 118 165 165 165 118 165
+\& w 119 166 166 166 119 166
+\& x 120 167 167 167 120 167
+\& y 121 168 168 168 121 168
+\& z 122 169 169 169 122 169
+\& { 123 192 192 251 123 192 ##
+\& | 124 79 79 79 124 79
+\& } 125 208 208 253 125 208 ##
+\& ~ 126 161 161 255 126 161 ##
+\& <DEL> 127 7 7 7 127 7
+\& <PAD> 128 32 32 32 194.128 32
+\& <HOP> 129 33 33 33 194.129 33
+\& <BPH> 130 34 34 34 194.130 34
+\& <NBH> 131 35 35 35 194.131 35
+\& <IND> 132 36 36 36 194.132 36
+\& <NEL> 133 21 37 37 194.133 37 **
+\& <SSA> 134 6 6 6 194.134 6
+\& <ESA> 135 23 23 23 194.135 23
+\& <HTS> 136 40 40 40 194.136 40
+\& <HTJ> 137 41 41 41 194.137 41
+\& <VTS> 138 42 42 42 194.138 42
+\& <PLD> 139 43 43 43 194.139 43
+\& <PLU> 140 44 44 44 194.140 44
+\& <RI> 141 9 9 9 194.141 9
+\& <SS2> 142 10 10 10 194.142 10
+\& <SS3> 143 27 27 27 194.143 27
+\& <DCS> 144 48 48 48 194.144 48
+\& <PU1> 145 49 49 49 194.145 49
+\& <PU2> 146 26 26 26 194.146 26
+\& <STS> 147 51 51 51 194.147 51
+\& <CCH> 148 52 52 52 194.148 52
+\& <MW> 149 53 53 53 194.149 53
+\& <SPA> 150 54 54 54 194.150 54
+\& <EPA> 151 8 8 8 194.151 8
+\& <SOS> 152 56 56 56 194.152 56
+\& <SGC> 153 57 57 57 194.153 57
+\& <SCI> 154 58 58 58 194.154 58
+\& <CSI> 155 59 59 59 194.155 59
+\& <ST> 156 4 4 4 194.156 4
+\& <OSC> 157 20 20 20 194.157 20
+\& <PM> 158 62 62 62 194.158 62
+\& <APC> 159 255 255 95 194.159 255 ##
+\& <NON\-BREAKING SPACE> 160 65 65 65 194.160 128.65
+\& <INVERTED "!" > 161 170 170 170 194.161 128.66
+\& <CENT SIGN> 162 74 74 176 194.162 128.67 ##
+\& <POUND SIGN> 163 177 177 177 194.163 128.68
+\& <CURRENCY SIGN> 164 159 159 159 194.164 128.69
+\& <YEN SIGN> 165 178 178 178 194.165 128.70
+\& <BROKEN BAR> 166 106 106 208 194.166 128.71 ##
+\& <SECTION SIGN> 167 181 181 181 194.167 128.72
+\& <DIAERESIS> 168 189 187 121 194.168 128.73 ** ##
+\& <COPYRIGHT SIGN> 169 180 180 180 194.169 128.74
+\& <FEMININE ORDINAL> 170 154 154 154 194.170 128.81
+\& <LEFT POINTING GUILLEMET> 171 138 138 138 194.171 128.82
+\& <NOT SIGN> 172 95 176 186 194.172 128.83 ** ##
+\& <SOFT HYPHEN> 173 202 202 202 194.173 128.84
+\& <REGISTERED TRADE MARK> 174 175 175 175 194.174 128.85
+\& <MACRON> 175 188 188 161 194.175 128.86 ##
+\& <DEGREE SIGN> 176 144 144 144 194.176 128.87
+\& <PLUS\-OR\-MINUS SIGN> 177 143 143 143 194.177 128.88
+\& <SUPERSCRIPT TWO> 178 234 234 234 194.178 128.89
+\& <SUPERSCRIPT THREE> 179 250 250 250 194.179 128.98
+\& <ACUTE ACCENT> 180 190 190 190 194.180 128.99
+\& <MICRO SIGN> 181 160 160 160 194.181 128.100
+\& <PARAGRAPH SIGN> 182 182 182 182 194.182 128.101
+\& <MIDDLE DOT> 183 179 179 179 194.183 128.102
+\& <CEDILLA> 184 157 157 157 194.184 128.103
+\& <SUPERSCRIPT ONE> 185 218 218 218 194.185 128.104
+\& <MASC. ORDINAL INDICATOR> 186 155 155 155 194.186 128.105
+\& <RIGHT POINTING GUILLEMET> 187 139 139 139 194.187 128.106
+\& <FRACTION ONE QUARTER> 188 183 183 183 194.188 128.112
+\& <FRACTION ONE HALF> 189 184 184 184 194.189 128.113
+\& <FRACTION THREE QUARTERS> 190 185 185 185 194.190 128.114
+\& <INVERTED QUESTION MARK> 191 171 171 171 194.191 128.115
+\& <A WITH GRAVE> 192 100 100 100 195.128 138.65
+\& <A WITH ACUTE> 193 101 101 101 195.129 138.66
+\& <A WITH CIRCUMFLEX> 194 98 98 98 195.130 138.67
+\& <A WITH TILDE> 195 102 102 102 195.131 138.68
+\& <A WITH DIAERESIS> 196 99 99 99 195.132 138.69
+\& <A WITH RING ABOVE> 197 103 103 103 195.133 138.70
+\& <CAPITAL LIGATURE AE> 198 158 158 158 195.134 138.71
+\& <C WITH CEDILLA> 199 104 104 104 195.135 138.72
+\& <E WITH GRAVE> 200 116 116 116 195.136 138.73
+\& <E WITH ACUTE> 201 113 113 113 195.137 138.74
+\& <E WITH CIRCUMFLEX> 202 114 114 114 195.138 138.81
+\& <E WITH DIAERESIS> 203 115 115 115 195.139 138.82
+\& <I WITH GRAVE> 204 120 120 120 195.140 138.83
+\& <I WITH ACUTE> 205 117 117 117 195.141 138.84
+\& <I WITH CIRCUMFLEX> 206 118 118 118 195.142 138.85
+\& <I WITH DIAERESIS> 207 119 119 119 195.143 138.86
+\& <CAPITAL LETTER ETH> 208 172 172 172 195.144 138.87
+\& <N WITH TILDE> 209 105 105 105 195.145 138.88
+\& <O WITH GRAVE> 210 237 237 237 195.146 138.89
+\& <O WITH ACUTE> 211 238 238 238 195.147 138.98
+\& <O WITH CIRCUMFLEX> 212 235 235 235 195.148 138.99
+\& <O WITH TILDE> 213 239 239 239 195.149 138.100
+\& <O WITH DIAERESIS> 214 236 236 236 195.150 138.101
+\& <MULTIPLICATION SIGN> 215 191 191 191 195.151 138.102
+\& <O WITH STROKE> 216 128 128 128 195.152 138.103
+\& <U WITH GRAVE> 217 253 253 224 195.153 138.104 ##
+\& <U WITH ACUTE> 218 254 254 254 195.154 138.105
+\& <U WITH CIRCUMFLEX> 219 251 251 221 195.155 138.106 ##
+\& <U WITH DIAERESIS> 220 252 252 252 195.156 138.112
+\& <Y WITH ACUTE> 221 173 186 173 195.157 138.113 ** ##
+\& <CAPITAL LETTER THORN> 222 174 174 174 195.158 138.114
+\& <SMALL LETTER SHARP S> 223 89 89 89 195.159 138.115
+\& <a WITH GRAVE> 224 68 68 68 195.160 139.65
+\& <a WITH ACUTE> 225 69 69 69 195.161 139.66
+\& <a WITH CIRCUMFLEX> 226 66 66 66 195.162 139.67
+\& <a WITH TILDE> 227 70 70 70 195.163 139.68
+\& <a WITH DIAERESIS> 228 67 67 67 195.164 139.69
+\& <a WITH RING ABOVE> 229 71 71 71 195.165 139.70
+\& <SMALL LIGATURE ae> 230 156 156 156 195.166 139.71
+\& <c WITH CEDILLA> 231 72 72 72 195.167 139.72
+\& <e WITH GRAVE> 232 84 84 84 195.168 139.73
+\& <e WITH ACUTE> 233 81 81 81 195.169 139.74
+\& <e WITH CIRCUMFLEX> 234 82 82 82 195.170 139.81
+\& <e WITH DIAERESIS> 235 83 83 83 195.171 139.82
+\& <i WITH GRAVE> 236 88 88 88 195.172 139.83
+\& <i WITH ACUTE> 237 85 85 85 195.173 139.84
+\& <i WITH CIRCUMFLEX> 238 86 86 86 195.174 139.85
+\& <i WITH DIAERESIS> 239 87 87 87 195.175 139.86
+\& <SMALL LETTER eth> 240 140 140 140 195.176 139.87
+\& <n WITH TILDE> 241 73 73 73 195.177 139.88
+\& <o WITH GRAVE> 242 205 205 205 195.178 139.89
+\& <o WITH ACUTE> 243 206 206 206 195.179 139.98
+\& <o WITH CIRCUMFLEX> 244 203 203 203 195.180 139.99
+\& <o WITH TILDE> 245 207 207 207 195.181 139.100
+\& <o WITH DIAERESIS> 246 204 204 204 195.182 139.101
+\& <DIVISION SIGN> 247 225 225 225 195.183 139.102
+\& <o WITH STROKE> 248 112 112 112 195.184 139.103
+\& <u WITH GRAVE> 249 221 221 192 195.185 139.104 ##
+\& <u WITH ACUTE> 250 222 222 222 195.186 139.105
+\& <u WITH CIRCUMFLEX> 251 219 219 219 195.187 139.106
+\& <u WITH DIAERESIS> 252 220 220 220 195.188 139.112
+\& <y WITH ACUTE> 253 141 141 141 195.189 139.113
+\& <SMALL LETTER thorn> 254 142 142 142 195.190 139.114
+\& <y WITH DIAERESIS> 255 223 223 223 195.191 139.115
+.Ve
+.PP
+To see the above table in CCSID 0037 order instead of
+ASCII + Latin\-1 order, run the table through:
+.IP "recipe 4" 4
+.IX Item "recipe 4"
+.PP
+.Vb 6
+\& perl \e
+\& \-ne \*(Aqif(/.{29}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}/)\*(Aq\e
+\& \-e \*(Aq{push(@l,$_)}\*(Aq \e
+\& \-e \*(AqEND{print map{$_\->[0]}\*(Aq \e
+\& \-e \*(Aq sort{$a\->[1] <=> $b\->[1]}\*(Aq \e
+\& \-e \*(Aq map{[$_,substr($_,34,3)]}@l;}\*(Aq perlebcdic.pod
+.Ve
+.PP
+If you would rather see it in CCSID 1047 order then change the number
+34 in the last line to 39, like this:
+.IP "recipe 5" 4
+.IX Item "recipe 5"
+.PP
+.Vb 6
+\& perl \e
+\& \-ne \*(Aqif(/.{29}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}/)\*(Aq\e
+\& \-e \*(Aq{push(@l,$_)}\*(Aq \e
+\& \-e \*(AqEND{print map{$_\->[0]}\*(Aq \e
+\& \-e \*(Aq sort{$a\->[1] <=> $b\->[1]}\*(Aq \e
+\& \-e \*(Aq map{[$_,substr($_,39,3)]}@l;}\*(Aq perlebcdic.pod
+.Ve
+.PP
+If you would rather see it in POSIX-BC order then change the number
+34 in the last line to 44, like this:
+.IP "recipe 6" 4
+.IX Item "recipe 6"
+.PP
+.Vb 6
+\& perl \e
+\& \-ne \*(Aqif(/.{29}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}\es{2,4}\ed{1,3}/)\*(Aq\e
+\& \-e \*(Aq{push(@l,$_)}\*(Aq \e
+\& \-e \*(AqEND{print map{$_\->[0]}\*(Aq \e
+\& \-e \*(Aq sort{$a\->[1] <=> $b\->[1]}\*(Aq \e
+\& \-e \*(Aq map{[$_,substr($_,44,3)]}@l;}\*(Aq perlebcdic.pod
+.Ve
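Recipes 4 to 6 differ only in the substr() offset, so they can be folded into one routine. The following is a minimal sketch, not part of perlebcdic: the helper name sort_rows_by_offset and the sample rows are assumptions for illustration, and the offsets (29, 34, 39, 44) presume the column layout the recipes rely on.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only table rows, then sort them numerically
# by the 3-character field that starts at $offset.
sub sort_rows_by_offset {
    my ($offset, @lines) = @_;
    my $row = qr/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/;
    return map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] }
           map  { [ $_, substr($_, $offset, 3) =~ /(\d+)/ ? $1 : 0 ] }
           grep { $_ =~ $row } @lines;
}

# Sample rows laid out at the offsets the recipes assume
# (29 = 0819, 34 = 0037, 39 = 1047, 44 = POSIX-BC); values are from the table.
my @sample = map { sprintf("%-29s%-5d%-5d%-5d%-5d\n", @$_) }
    [ 'A', 65, 193, 193, 193 ],
    [ '[', 91, 186, 173, 187 ],
    [ 'a', 97, 129, 129, 129 ];

print sort_rows_by_offset(39, @sample);   # CCSID 1047 order
```

Passing 34 or 44 instead of 39 gives CCSID 0037 or POSIX-BC order respectively.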
+.SS "Table in hex, sorted in 1047 order"
+.IX Subsection "Table in hex, sorted in 1047 order"
+Since this document was first written, the convention has become more
+and more to use hexadecimal notation for code points. To do this with
+the recipes and to also sort is a multi-step process, so here, for
+convenience, is the table from above, re-sorted to be in Code Page 1047
+order, and using hex notation.
+.PP
+.Vb 10
+\& ISO
+\& 8859\-1 POS\- CCSID
+\& CCSID CCSID CCSID IX\- 1047
+\& chr 0819 0037 1047 BC UTF\-8 UTF\-EBCDIC
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& <NUL> 00 00 00 00 00 00
+\& <SOH> 01 01 01 01 01 01
+\& <STX> 02 02 02 02 02 02
+\& <ETX> 03 03 03 03 03 03
+\& <ST> 9C 04 04 04 C2.9C 04
+\& <HT> 09 05 05 05 09 05
+\& <SSA> 86 06 06 06 C2.86 06
+\& <DEL> 7F 07 07 07 7F 07
+\& <EPA> 97 08 08 08 C2.97 08
+\& <RI> 8D 09 09 09 C2.8D 09
+\& <SS2> 8E 0A 0A 0A C2.8E 0A
+\& <VT> 0B 0B 0B 0B 0B 0B
+\& <FF> 0C 0C 0C 0C 0C 0C
+\& <CR> 0D 0D 0D 0D 0D 0D
+\& <SO> 0E 0E 0E 0E 0E 0E
+\& <SI> 0F 0F 0F 0F 0F 0F
+\& <DLE> 10 10 10 10 10 10
+\& <DC1> 11 11 11 11 11 11
+\& <DC2> 12 12 12 12 12 12
+\& <DC3> 13 13 13 13 13 13
+\& <OSC> 9D 14 14 14 C2.9D 14
+\& <LF> 0A 25 15 15 0A 15 **
+\& <BS> 08 16 16 16 08 16
+\& <ESA> 87 17 17 17 C2.87 17
+\& <CAN> 18 18 18 18 18 18
+\& <EOM> 19 19 19 19 19 19
+\& <PU2> 92 1A 1A 1A C2.92 1A
+\& <SS3> 8F 1B 1B 1B C2.8F 1B
+\& <FS> 1C 1C 1C 1C 1C 1C
+\& <GS> 1D 1D 1D 1D 1D 1D
+\& <RS> 1E 1E 1E 1E 1E 1E
+\& <US> 1F 1F 1F 1F 1F 1F
+\& <PAD> 80 20 20 20 C2.80 20
+\& <HOP> 81 21 21 21 C2.81 21
+\& <BPH> 82 22 22 22 C2.82 22
+\& <NBH> 83 23 23 23 C2.83 23
+\& <IND> 84 24 24 24 C2.84 24
+\& <NEL> 85 15 25 25 C2.85 25 **
+\& <ETB> 17 26 26 26 17 26
+\& <ESC> 1B 27 27 27 1B 27
+\& <HTS> 88 28 28 28 C2.88 28
+\& <HTJ> 89 29 29 29 C2.89 29
+\& <VTS> 8A 2A 2A 2A C2.8A 2A
+\& <PLD> 8B 2B 2B 2B C2.8B 2B
+\& <PLU> 8C 2C 2C 2C C2.8C 2C
+\& <ENQ> 05 2D 2D 2D 05 2D
+\& <ACK> 06 2E 2E 2E 06 2E
+\& <BEL> 07 2F 2F 2F 07 2F
+\& <DCS> 90 30 30 30 C2.90 30
+\& <PU1> 91 31 31 31 C2.91 31
+\& <SYN> 16 32 32 32 16 32
+\& <STS> 93 33 33 33 C2.93 33
+\& <CCH> 94 34 34 34 C2.94 34
+\& <MW> 95 35 35 35 C2.95 35
+\& <SPA> 96 36 36 36 C2.96 36
+\& <EOT> 04 37 37 37 04 37
+\& <SOS> 98 38 38 38 C2.98 38
+\& <SGC> 99 39 39 39 C2.99 39
+\& <SCI> 9A 3A 3A 3A C2.9A 3A
+\& <CSI> 9B 3B 3B 3B C2.9B 3B
+\& <DC4> 14 3C 3C 3C 14 3C
+\& <NAK> 15 3D 3D 3D 15 3D
+\& <PM> 9E 3E 3E 3E C2.9E 3E
+\& <SUB> 1A 3F 3F 3F 1A 3F
+\& <SPACE> 20 40 40 40 20 40
+\& <NON\-BREAKING SPACE> A0 41 41 41 C2.A0 80.41
+\& <a WITH CIRCUMFLEX> E2 42 42 42 C3.A2 8B.43
+\& <a WITH DIAERESIS> E4 43 43 43 C3.A4 8B.45
+\& <a WITH GRAVE> E0 44 44 44 C3.A0 8B.41
+\& <a WITH ACUTE> E1 45 45 45 C3.A1 8B.42
+\& <a WITH TILDE> E3 46 46 46 C3.A3 8B.44
+\& <a WITH RING ABOVE> E5 47 47 47 C3.A5 8B.46
+\& <c WITH CEDILLA> E7 48 48 48 C3.A7 8B.48
+\& <n WITH TILDE> F1 49 49 49 C3.B1 8B.58
+\& <CENT SIGN> A2 4A 4A B0 C2.A2 80.43 ##
+\& . 2E 4B 4B 4B 2E 4B
+\& < 3C 4C 4C 4C 3C 4C
+\& ( 28 4D 4D 4D 28 4D
+\& + 2B 4E 4E 4E 2B 4E
+\& | 7C 4F 4F 4F 7C 4F
+\& & 26 50 50 50 26 50
+\& <e WITH ACUTE> E9 51 51 51 C3.A9 8B.4A
+\& <e WITH CIRCUMFLEX> EA 52 52 52 C3.AA 8B.51
+\& <e WITH DIAERESIS> EB 53 53 53 C3.AB 8B.52
+\& <e WITH GRAVE> E8 54 54 54 C3.A8 8B.49
+\& <i WITH ACUTE> ED 55 55 55 C3.AD 8B.54
+\& <i WITH CIRCUMFLEX> EE 56 56 56 C3.AE 8B.55
+\& <i WITH DIAERESIS> EF 57 57 57 C3.AF 8B.56
+\& <i WITH GRAVE> EC 58 58 58 C3.AC 8B.53
+\& <SMALL LETTER SHARP S> DF 59 59 59 C3.9F 8A.73
+\& ! 21 5A 5A 5A 21 5A
+\& $ 24 5B 5B 5B 24 5B
+\& * 2A 5C 5C 5C 2A 5C
+\& ) 29 5D 5D 5D 29 5D
+\& ; 3B 5E 5E 5E 3B 5E
+\& ^ 5E B0 5F 6A 5E 5F ** ##
+\& \- 2D 60 60 60 2D 60
+\& / 2F 61 61 61 2F 61
+\& <A WITH CIRCUMFLEX> C2 62 62 62 C3.82 8A.43
+\& <A WITH DIAERESIS> C4 63 63 63 C3.84 8A.45
+\& <A WITH GRAVE> C0 64 64 64 C3.80 8A.41
+\& <A WITH ACUTE> C1 65 65 65 C3.81 8A.42
+\& <A WITH TILDE> C3 66 66 66 C3.83 8A.44
+\& <A WITH RING ABOVE> C5 67 67 67 C3.85 8A.46
+\& <C WITH CEDILLA> C7 68 68 68 C3.87 8A.48
+\& <N WITH TILDE> D1 69 69 69 C3.91 8A.58
+\& <BROKEN BAR> A6 6A 6A D0 C2.A6 80.47 ##
+\& , 2C 6B 6B 6B 2C 6B
+\& % 25 6C 6C 6C 25 6C
+\& _ 5F 6D 6D 6D 5F 6D
+\& > 3E 6E 6E 6E 3E 6E
+\& ? 3F 6F 6F 6F 3F 6F
+\& <o WITH STROKE> F8 70 70 70 C3.B8 8B.67
+\& <E WITH ACUTE> C9 71 71 71 C3.89 8A.4A
+\& <E WITH CIRCUMFLEX> CA 72 72 72 C3.8A 8A.51
+\& <E WITH DIAERESIS> CB 73 73 73 C3.8B 8A.52
+\& <E WITH GRAVE> C8 74 74 74 C3.88 8A.49
+\& <I WITH ACUTE> CD 75 75 75 C3.8D 8A.54
+\& <I WITH CIRCUMFLEX> CE 76 76 76 C3.8E 8A.55
+\& <I WITH DIAERESIS> CF 77 77 77 C3.8F 8A.56
+\& <I WITH GRAVE> CC 78 78 78 C3.8C 8A.53
+\& \` 60 79 79 4A 60 79 ##
+\& : 3A 7A 7A 7A 3A 7A
+\& # 23 7B 7B 7B 23 7B
+\& @ 40 7C 7C 7C 40 7C
+\& \*(Aq 27 7D 7D 7D 27 7D
+\& = 3D 7E 7E 7E 3D 7E
+\& " 22 7F 7F 7F 22 7F
+\& <O WITH STROKE> D8 80 80 80 C3.98 8A.67
+\& a 61 81 81 81 61 81
+\& b 62 82 82 82 62 82
+\& c 63 83 83 83 63 83
+\& d 64 84 84 84 64 84
+\& e 65 85 85 85 65 85
+\& f 66 86 86 86 66 86
+\& g 67 87 87 87 67 87
+\& h 68 88 88 88 68 88
+\& i 69 89 89 89 69 89
+\& <LEFT POINTING GUILLEMET> AB 8A 8A 8A C2.AB 80.52
+\& <RIGHT POINTING GUILLEMET> BB 8B 8B 8B C2.BB 80.6A
+\& <SMALL LETTER eth> F0 8C 8C 8C C3.B0 8B.57
+\& <y WITH ACUTE> FD 8D 8D 8D C3.BD 8B.71
+\& <SMALL LETTER thorn> FE 8E 8E 8E C3.BE 8B.72
+\& <PLUS\-OR\-MINUS SIGN> B1 8F 8F 8F C2.B1 80.58
+\& <DEGREE SIGN> B0 90 90 90 C2.B0 80.57
+\& j 6A 91 91 91 6A 91
+\& k 6B 92 92 92 6B 92
+\& l 6C 93 93 93 6C 93
+\& m 6D 94 94 94 6D 94
+\& n 6E 95 95 95 6E 95
+\& o 6F 96 96 96 6F 96
+\& p 70 97 97 97 70 97
+\& q 71 98 98 98 71 98
+\& r 72 99 99 99 72 99
+\& <FEMININE ORDINAL> AA 9A 9A 9A C2.AA 80.51
+\& <MASC. ORDINAL INDICATOR> BA 9B 9B 9B C2.BA 80.69
+\& <SMALL LIGATURE ae> E6 9C 9C 9C C3.A6 8B.47
+\& <CEDILLA> B8 9D 9D 9D C2.B8 80.67
+\& <CAPITAL LIGATURE AE> C6 9E 9E 9E C3.86 8A.47
+\& <CURRENCY SIGN> A4 9F 9F 9F C2.A4 80.45
+\& <MICRO SIGN> B5 A0 A0 A0 C2.B5 80.64
+\& ~ 7E A1 A1 FF 7E A1 ##
+\& s 73 A2 A2 A2 73 A2
+\& t 74 A3 A3 A3 74 A3
+\& u 75 A4 A4 A4 75 A4
+\& v 76 A5 A5 A5 76 A5
+\& w 77 A6 A6 A6 77 A6
+\& x 78 A7 A7 A7 78 A7
+\& y 79 A8 A8 A8 79 A8
+\& z 7A A9 A9 A9 7A A9
+\& <INVERTED "!" > A1 AA AA AA C2.A1 80.42
+\& <INVERTED QUESTION MARK> BF AB AB AB C2.BF 80.73
+\& <CAPITAL LETTER ETH> D0 AC AC AC C3.90 8A.57
+\& [ 5B BA AD BB 5B AD ** ##
+\& <CAPITAL LETTER THORN> DE AE AE AE C3.9E 8A.72
+\& <REGISTERED TRADE MARK> AE AF AF AF C2.AE 80.55
+\& <NOT SIGN> AC 5F B0 BA C2.AC 80.53 ** ##
+\& <POUND SIGN> A3 B1 B1 B1 C2.A3 80.44
+\& <YEN SIGN> A5 B2 B2 B2 C2.A5 80.46
+\& <MIDDLE DOT> B7 B3 B3 B3 C2.B7 80.66
+\& <COPYRIGHT SIGN> A9 B4 B4 B4 C2.A9 80.4A
+\& <SECTION SIGN> A7 B5 B5 B5 C2.A7 80.48
+\& <PARAGRAPH SIGN> B6 B6 B6 B6 C2.B6 80.65
+\& <FRACTION ONE QUARTER> BC B7 B7 B7 C2.BC 80.70
+\& <FRACTION ONE HALF> BD B8 B8 B8 C2.BD 80.71
+\& <FRACTION THREE QUARTERS> BE B9 B9 B9 C2.BE 80.72
+\& <Y WITH ACUTE> DD AD BA AD C3.9D 8A.71 ** ##
+\& <DIAERESIS> A8 BD BB 79 C2.A8 80.49 ** ##
+\& <MACRON> AF BC BC A1 C2.AF 80.56 ##
+\& ] 5D BB BD BD 5D BD **
+\& <ACUTE ACCENT> B4 BE BE BE C2.B4 80.63
+\& <MULTIPLICATION SIGN> D7 BF BF BF C3.97 8A.66
+\& { 7B C0 C0 FB 7B C0 ##
+\& A 41 C1 C1 C1 41 C1
+\& B 42 C2 C2 C2 42 C2
+\& C 43 C3 C3 C3 43 C3
+\& D 44 C4 C4 C4 44 C4
+\& E 45 C5 C5 C5 45 C5
+\& F 46 C6 C6 C6 46 C6
+\& G 47 C7 C7 C7 47 C7
+\& H 48 C8 C8 C8 48 C8
+\& I 49 C9 C9 C9 49 C9
+\& <SOFT HYPHEN> AD CA CA CA C2.AD 80.54
+\& <o WITH CIRCUMFLEX> F4 CB CB CB C3.B4 8B.63
+\& <o WITH DIAERESIS> F6 CC CC CC C3.B6 8B.65
+\& <o WITH GRAVE> F2 CD CD CD C3.B2 8B.59
+\& <o WITH ACUTE> F3 CE CE CE C3.B3 8B.62
+\& <o WITH TILDE> F5 CF CF CF C3.B5 8B.64
+\& } 7D D0 D0 FD 7D D0 ##
+\& J 4A D1 D1 D1 4A D1
+\& K 4B D2 D2 D2 4B D2
+\& L 4C D3 D3 D3 4C D3
+\& M 4D D4 D4 D4 4D D4
+\& N 4E D5 D5 D5 4E D5
+\& O 4F D6 D6 D6 4F D6
+\& P 50 D7 D7 D7 50 D7
+\& Q 51 D8 D8 D8 51 D8
+\& R 52 D9 D9 D9 52 D9
+\& <SUPERSCRIPT ONE> B9 DA DA DA C2.B9 80.68
+\& <u WITH CIRCUMFLEX> FB DB DB DB C3.BB 8B.6A
+\& <u WITH DIAERESIS> FC DC DC DC C3.BC 8B.70
+\& <u WITH GRAVE> F9 DD DD C0 C3.B9 8B.68 ##
+\& <u WITH ACUTE> FA DE DE DE C3.BA 8B.69
+\& <y WITH DIAERESIS> FF DF DF DF C3.BF 8B.73
+\& \e 5C E0 E0 BC 5C E0 ##
+\& <DIVISION SIGN> F7 E1 E1 E1 C3.B7 8B.66
+\& S 53 E2 E2 E2 53 E2
+\& T 54 E3 E3 E3 54 E3
+\& U 55 E4 E4 E4 55 E4
+\& V 56 E5 E5 E5 56 E5
+\& W 57 E6 E6 E6 57 E6
+\& X 58 E7 E7 E7 58 E7
+\& Y 59 E8 E8 E8 59 E8
+\& Z 5A E9 E9 E9 5A E9
+\& <SUPERSCRIPT TWO> B2 EA EA EA C2.B2 80.59
+\& <O WITH CIRCUMFLEX> D4 EB EB EB C3.94 8A.63
+\& <O WITH DIAERESIS> D6 EC EC EC C3.96 8A.65
+\& <O WITH GRAVE> D2 ED ED ED C3.92 8A.59
+\& <O WITH ACUTE> D3 EE EE EE C3.93 8A.62
+\& <O WITH TILDE> D5 EF EF EF C3.95 8A.64
+\& 0 30 F0 F0 F0 30 F0
+\& 1 31 F1 F1 F1 31 F1
+\& 2 32 F2 F2 F2 32 F2
+\& 3 33 F3 F3 F3 33 F3
+\& 4 34 F4 F4 F4 34 F4
+\& 5 35 F5 F5 F5 35 F5
+\& 6 36 F6 F6 F6 36 F6
+\& 7 37 F7 F7 F7 37 F7
+\& 8 38 F8 F8 F8 38 F8
+\& 9 39 F9 F9 F9 39 F9
+\& <SUPERSCRIPT THREE> B3 FA FA FA C2.B3 80.62
+\& <U WITH CIRCUMFLEX> DB FB FB DD C3.9B 8A.6A ##
+\& <U WITH DIAERESIS> DC FC FC FC C3.9C 8A.70
+\& <U WITH GRAVE> D9 FD FD E0 C3.99 8A.68 ##
+\& <U WITH ACUTE> DA FE FE FE C3.9A 8A.69
+\& <APC> 9F FF FF 5F C2.9F FF ##
+.Ve
+.SH "IDENTIFYING CHARACTER CODE SETS"
+.IX Header "IDENTIFYING CHARACTER CODE SETS"
+It is possible to determine which character set you are operating under.
+But first be sure that you really need to do this. Your
+code will be simpler and probably just as portable if you don't have
+to test the character set and branch on the result. There are
+very few circumstances where it's not easy to write
+straight-line code portable to all character sets. See
+"Unicode and EBCDIC" in perluniintro for how to portably specify
+characters.
+.PP
+But there are some cases where you may want to know which character set
+you are running under. One possible example is doing
+sorting in inner loops where performance is critical.
+.PP
+To determine if you are running under ASCII or EBCDIC, you can use the
+return value of \f(CWord()\fR or \f(CWchr()\fR to test one or more character
+values. For example:
+.PP
+.Vb 4
+\& $is_ascii = "A" eq chr(65);
+\& $is_ebcdic = "A" eq chr(193);
+\& $is_ascii = ord("A") == 65;
+\& $is_ebcdic = ord("A") == 193;
+.Ve
+.PP
+There's even less need to distinguish between EBCDIC code pages, but to
+do so try looking at one or more of the characters that differ between
+them.
+.PP
+.Vb 4
+\& $is_ascii = ord(\*(Aq[\*(Aq) == 91;
+\& $is_ebcdic_37 = ord(\*(Aq[\*(Aq) == 186;
+\& $is_ebcdic_1047 = ord(\*(Aq[\*(Aq) == 173;
+\& $is_ebcdic_POSIX_BC = ord(\*(Aq[\*(Aq) == 187;
+.Ve
+.PP
+However, it would be unwise to write tests such as:
+.PP
+.Vb 2
+\& $is_ascii = "\er" ne chr(13); # WRONG
+\& $is_ascii = "\en" ne chr(10); # ILL ADVISED
+.Ve
+.PP
+Obviously the first of these will fail to distinguish most ASCII
+platforms from either a CCSID 0037, a 1047, or a POSIX-BC EBCDIC
+platform since \f(CW\*(C`"\er"\ eq\ chr(13)\*(C'\fR under all of those coded character
+sets. But note too that because \f(CW"\en"\fR is \f(CWchr(13)\fR and \f(CW"\er"\fR is
+\&\f(CWchr(10)\fR on old Macintosh (which is an ASCII platform) the second
+\&\f(CW$is_ascii\fR test will lead to trouble there.
+.PP
+To determine whether or not perl was built under an EBCDIC
+code page you can use the Config module like so:
+.PP
+.Vb 2
+\& use Config;
+\& $is_ebcdic = $Config{\*(Aqebcdic\*(Aq} eq \*(Aqdefine\*(Aq;
+.Ve
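The tests above can be gathered into one routine. This is a minimal sketch under stated assumptions: the name native_charset_name is hypothetical, and only the four coded character sets from the table are recognised. The Config check defaults a missing value to the empty string so that it stays quiet under warnings.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: name the native coded character set by probing "[",
# whose code point differs in each of the four sets in the table.
sub native_charset_name {
    my %name_for = (
         91 => 'ASCII',
        186 => 'EBCDIC CCSID 0037',
        173 => 'EBCDIC CCSID 1047',
        187 => 'EBCDIC POSIX-BC',
    );
    return $name_for{ ord('[') } // 'unknown';
}

# Cross-check against the build-time setting; the value may be undefined
# on ASCII builds, hence the default.
my $built_for_ebcdic = ($Config{ebcdic} // '') eq 'define';

printf "%s (ebcdic build: %s)\n",
    native_charset_name(), $built_for_ebcdic ? 'yes' : 'no';
```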
+.SH CONVERSIONS
+.IX Header "CONVERSIONS"
+.ie n .SS "utf8::unicode_to_native() and utf8::native_to_unicode()"
+.el .SS "\f(CWutf8::unicode_to_native()\fP and \f(CWutf8::native_to_unicode()\fP"
+.IX Subsection "utf8::unicode_to_native() and utf8::native_to_unicode()"
+These functions take an input numeric code point in one encoding and
+return what its equivalent value is in the other.
+.PP
+See utf8.
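A short sketch of the round trip follows. The variable names are illustrative only; the two functions are callable without loading the utf8 pragma.

```perl
use strict;
use warnings;

# Unicode code point -> native code point -> character.
my $native_A = utf8::unicode_to_native(0x41);    # 65 on ASCII, 193 on EBCDIC
my $char     = chr($native_A);                   # "A" on either platform

# Native code point -> Unicode code point.
my $unicode  = utf8::native_to_unicode(ord('A')); # 0x41 on either platform

printf "%s is U+%04X\n", $char, $unicode;
```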
+.SS tr///
+.IX Subsection "tr///"
+To convert a string of characters from one character set to
+another, a simple list of numbers, such as in the right columns in the
+above table, along with Perl's \f(CW\*(C`tr///\*(C'\fR operator is all that is needed.
+The data in the table are in ASCII/Latin1 order, hence the EBCDIC columns
+provide easy-to-use ASCII/Latin1 to EBCDIC operations that are also easily
+reversed.
+.PP
+For example, to convert ASCII/Latin1 to code page 037 take the output of the
+second numbers column from the output of recipe 2 (modified to add
+\&\f(CW"\e"\fR characters), and use it in \f(CW\*(C`tr///\*(C'\fR like so:
+.PP
+.Vb 10
+\& $cp_037 =
+\& \*(Aq\ex00\ex01\ex02\ex03\ex37\ex2D\ex2E\ex2F\ex16\ex05\ex25\ex0B\ex0C\ex0D\ex0E\ex0F\*(Aq .
+\& \*(Aq\ex10\ex11\ex12\ex13\ex3C\ex3D\ex32\ex26\ex18\ex19\ex3F\ex27\ex1C\ex1D\ex1E\ex1F\*(Aq .
+\& \*(Aq\ex40\ex5A\ex7F\ex7B\ex5B\ex6C\ex50\ex7D\ex4D\ex5D\ex5C\ex4E\ex6B\ex60\ex4B\ex61\*(Aq .
+\& \*(Aq\exF0\exF1\exF2\exF3\exF4\exF5\exF6\exF7\exF8\exF9\ex7A\ex5E\ex4C\ex7E\ex6E\ex6F\*(Aq .
+\& \*(Aq\ex7C\exC1\exC2\exC3\exC4\exC5\exC6\exC7\exC8\exC9\exD1\exD2\exD3\exD4\exD5\exD6\*(Aq .
+\& \*(Aq\exD7\exD8\exD9\exE2\exE3\exE4\exE5\exE6\exE7\exE8\exE9\exBA\exE0\exBB\exB0\ex6D\*(Aq .
+\& \*(Aq\ex79\ex81\ex82\ex83\ex84\ex85\ex86\ex87\ex88\ex89\ex91\ex92\ex93\ex94\ex95\ex96\*(Aq .
+\& \*(Aq\ex97\ex98\ex99\exA2\exA3\exA4\exA5\exA6\exA7\exA8\exA9\exC0\ex4F\exD0\exA1\ex07\*(Aq .
+\& \*(Aq\ex20\ex21\ex22\ex23\ex24\ex15\ex06\ex17\ex28\ex29\ex2A\ex2B\ex2C\ex09\ex0A\ex1B\*(Aq .
+\& \*(Aq\ex30\ex31\ex1A\ex33\ex34\ex35\ex36\ex08\ex38\ex39\ex3A\ex3B\ex04\ex14\ex3E\exFF\*(Aq .
+\& \*(Aq\ex41\exAA\ex4A\exB1\ex9F\exB2\ex6A\exB5\exBD\exB4\ex9A\ex8A\ex5F\exCA\exAF\exBC\*(Aq .
+\& \*(Aq\ex90\ex8F\exEA\exFA\exBE\exA0\exB6\exB3\ex9D\exDA\ex9B\ex8B\exB7\exB8\exB9\exAB\*(Aq .
+\& \*(Aq\ex64\ex65\ex62\ex66\ex63\ex67\ex9E\ex68\ex74\ex71\ex72\ex73\ex78\ex75\ex76\ex77\*(Aq .
+\& \*(Aq\exAC\ex69\exED\exEE\exEB\exEF\exEC\exBF\ex80\exFD\exFE\exFB\exFC\exAD\exAE\ex59\*(Aq .
+\& \*(Aq\ex44\ex45\ex42\ex46\ex43\ex47\ex9C\ex48\ex54\ex51\ex52\ex53\ex58\ex55\ex56\ex57\*(Aq .
+\& \*(Aq\ex8C\ex49\exCD\exCE\exCB\exCF\exCC\exE1\ex70\exDD\exDE\exDB\exDC\ex8D\ex8E\exDF\*(Aq;
+\&
+\& my $ebcdic_string = $ascii_string;
+\& eval \*(Aq$ebcdic_string =~ tr/\e000\-\e377/\*(Aq . $cp_037 . \*(Aq/\*(Aq;
+.Ve
+.PP
+To convert from EBCDIC 037 to ASCII just reverse the order of the tr///
+arguments like so:
+.PP
+.Vb 2
+\& my $ascii_string = $ebcdic_string;
+\& eval \*(Aq$ascii_string =~ tr/\*(Aq . $cp_037 . \*(Aq/\e000\-\e377/\*(Aq;
+.Ve
+.PP
+Similarly one could take the output of the third numbers column from recipe 2
+to obtain a \f(CW$cp_1047\fR table. The fourth numbers column of the output from
+recipe 2 could provide a \f(CW$cp_posix_bc\fR table suitable for transcoding as
+well.
+.PP
+If you wanted to see the inverse tables, you would first have to sort on the
+desired numbers column as in recipes 4, 5 or 6, then take the output of the
+first numbers column.
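An inverse table can also be derived in code rather than by re-sorting. The sketch below is an illustration, not the method the tr/// text describes: it uses array lookups instead of tr/// and string eval. The names invert_table and translate are hypothetical, and the forward table is a placeholder rotation, not a real EBCDIC column.

```perl
use strict;
use warnings;

# Invert a 256-entry translation table: if $forward[$i] == $j,
# then $inverse[$j] == $i.
sub invert_table {
    my @forward = @_;
    my @inverse;
    $inverse[ $forward[$_] ] = $_ for 0 .. $#forward;
    return @inverse;
}

# Translate a string through a table, one character at a time.
sub translate {
    my ($string, @table) = @_;
    return join '', map { chr($table[ ord($_) ]) } split //, $string;
}

# PLACEHOLDER table (a simple rotation) standing in for a real column of
# 256 numbers taken from recipe 2; it is not an EBCDIC mapping.
my @forward = map { ($_ + 1) % 256 } 0 .. 255;
my @inverse = invert_table(@forward);

my $encoded = translate('Hello', @forward);
my $decoded = translate($encoded, @inverse);
print "$decoded\n";
```

Inversion is only well defined when the forward table is a permutation of 0..255, which holds for the code page columns in the table above.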
+.SS iconv
+.IX Subsection "iconv"
+XPG operability often implies the presence of an \fIiconv\fR utility
+available from the shell or from the C library. Consult your system's
+documentation for information on iconv.
+.PP
+On OS/390 or z/OS see the \fBiconv\fR\|(1) manpage. One way to invoke the \f(CW\*(C`iconv\*(C'\fR
+shell utility from within perl is:
+.PP
+.Vb 2
+\& # OS/390 or z/OS example
+\& $ascii_data = \`echo \*(Aq$ebcdic_data\*(Aq| iconv \-f IBM\-1047 \-t ISO8859\-1\`;
+.Ve
+.PP
+or the inverse map:
+.PP
+.Vb 2
+\& # OS/390 or z/OS example
+\& $ebcdic_data = \`echo \*(Aq$ascii_data\*(Aq| iconv \-f ISO8859\-1 \-t IBM\-1047\`;
+.Ve
+.PP
+For other Perl-based conversion options see the \f(CW\*(C`Convert::*\*(C'\fR modules on CPAN.
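Another route that avoids the shell is the core Encode module. This is a different technique from the iconv calls above, shown here only as a sketch. The encoding names cp37, cp1047 and posix-bc are Encode's names and are assumed to be available in the installed Encode. Its tables may not agree with the table above for every control character (LF and NEL are worth checking), so verify before relying on it.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Show the bytes each EBCDIC encoding produces for "A" and "[".
for my $name (qw(cp37 cp1047 posix-bc)) {
    my $bytes = encode($name, 'A[');
    printf "%-8s %s\n", $name,
        join(' ', map { sprintf '%02X', ord($_) } split //, $bytes);
}

# Round trip through CCSID 1047.
my $ebcdic = encode('cp1047', 'Perl');
my $text   = decode('cp1047', $ebcdic);
print "$text\n";
```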
+.SS "C RTL"
+.IX Subsection "C RTL"
+The OS/390 and z/OS C run-time libraries provide \f(CW_atoe()\fR and \f(CW_etoa()\fR functions.
+.SH "OPERATOR DIFFERENCES"
+.IX Header "OPERATOR DIFFERENCES"
+The \f(CW\*(C`..\*(C'\fR range operator treats certain character ranges with
+care on EBCDIC platforms. For example the following array
+will have twenty-six elements on either an EBCDIC platform
+or an ASCII platform:
+.PP
+.Vb 1
+\& @alphabet = (\*(AqA\*(Aq..\*(AqZ\*(Aq); # $#alphabet == 25
+.Ve
+.PP
+The bitwise operators such as & ^ | may return different results
+when operating on string or character data in a Perl program running
+on an EBCDIC platform than when run on an ASCII platform. Here is
+an example adapted from the one in perlop:
+.PP
+.Vb 5
+\& # EBCDIC\-based examples
+\& print "j p \en" ^ " a h"; # prints "JAPH\en"
+\& print "JA" | " ph\en"; # prints "japh\en"
+\& print "JAPH\enJunk" & "\e277\e277\e277\e277\e277"; # prints "japh\en";
+\& print \*(Aqp N$\*(Aq ^ " E<H\en"; # prints "Perl\en";
+.Ve
+.PP
+An interesting property of the 32 C0 control characters
+in the ASCII table is that they can "literally" be constructed
+as control characters in Perl, e.g. (\f(CWchr(0)\fR eq \f(CW\*(C`\ec@\*(C'\fR),
+(\f(CWchr(1)\fR eq \f(CW\*(C`\ecA\*(C'\fR), and so on. Perl on EBCDIC platforms has been
+ported to take \f(CW\*(C`\ec@\*(C'\fR to \f(CWchr(0)\fR and \f(CW\*(C`\ecA\*(C'\fR to \f(CWchr(1)\fR, etc. as well, but the
+characters that result depend on which code page you are
+using. The table below uses the standard acronyms for the controls.
+The POSIX-BC and 1047 sets are
+identical throughout this range and differ from the 0037 set at only
+one spot (21 decimal). Note that the line terminator character
+may be generated by \f(CW\*(C`\ecJ\*(C'\fR on ASCII platforms but by \f(CW\*(C`\ecU\*(C'\fR on 1047 or POSIX-BC
+platforms and cannot be generated as a \f(CW"\ec.letter."\fR control character on
+0037 platforms. Note also that \f(CW\*(C`\ec\e\*(C'\fR cannot be the final element in a string
+or regex, as it will absorb the terminator. But \f(CW\*(C`\ec\e\fR\f(CIX\fR\f(CW\*(C'\fR is a \f(CW\*(C`FILE
+SEPARATOR\*(C'\fR concatenated with \fIX\fR for all \fIX\fR.
+The outlier \f(CW\*(C`\ec?\*(C'\fR on ASCII, which yields a non\-C0 control \f(CW\*(C`DEL\*(C'\fR,
+yields the outlier control \f(CW\*(C`APC\*(C'\fR on EBCDIC, the one that isn't in the
+block of contiguous controls. Note that a subtlety of this is that
+\&\f(CW\*(C`\ec?\*(C'\fR on ASCII platforms is an ASCII character, while it isn't
+equivalent to any ASCII character in EBCDIC platforms.
+.PP
+.Vb 10
+\& chr ord 8859\-1 0037 1047 && POSIX\-BC
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& \ec@ 0 <NUL> <NUL> <NUL>
+\& \ecA 1 <SOH> <SOH> <SOH>
+\& \ecB 2 <STX> <STX> <STX>
+\& \ecC 3 <ETX> <ETX> <ETX>
+\& \ecD 4 <EOT> <ST> <ST>
+\& \ecE 5 <ENQ> <HT> <HT>
+\& \ecF 6 <ACK> <SSA> <SSA>
+\& \ecG 7 <BEL> <DEL> <DEL>
+\& \ecH 8 <BS> <EPA> <EPA>
+\& \ecI 9 <HT> <RI> <RI>
+\& \ecJ 10 <LF> <SS2> <SS2>
+\& \ecK 11 <VT> <VT> <VT>
+\& \ecL 12 <FF> <FF> <FF>
+\& \ecM 13 <CR> <CR> <CR>
+\& \ecN 14 <SO> <SO> <SO>
+\& \ecO 15 <SI> <SI> <SI>
+\& \ecP 16 <DLE> <DLE> <DLE>
+\& \ecQ 17 <DC1> <DC1> <DC1>
+\& \ecR 18 <DC2> <DC2> <DC2>
+\& \ecS 19 <DC3> <DC3> <DC3>
+\& \ecT 20 <DC4> <OSC> <OSC>
+\& \ecU 21 <NAK> <NEL> <LF> **
+\& \ecV 22 <SYN> <BS> <BS>
+\& \ecW 23 <ETB> <ESA> <ESA>
+\& \ecX 24 <CAN> <CAN> <CAN>
+\& \ecY 25 <EOM> <EOM> <EOM>
+\& \ecZ 26 <SUB> <PU2> <PU2>
+\& \ec[ 27 <ESC> <SS3> <SS3>
+\& \ec\eX 28 <FS>X <FS>X <FS>X
+\& \ec] 29 <GS> <GS> <GS>
+\& \ec^ 30 <RS> <RS> <RS>
+\& \ec_ 31 <US> <US> <US>
+\& \ec? * <DEL> <APC> <APC>
+.Ve
+.PP
+\&\f(CW\*(C`*\*(C'\fR Note: \f(CW\*(C`\ec?\*(C'\fR maps to ordinal 127 (\f(CW\*(C`DEL\*(C'\fR) on ASCII platforms, but
+since ordinal 127 is not a control character on EBCDIC machines,
+\&\f(CW\*(C`\ec?\*(C'\fR instead maps on them to \f(CW\*(C`APC\*(C'\fR, which is 255 in 0037 and 1047,
+and 95 in POSIX-BC.
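The table can be spot-checked on whatever platform is at hand. A minimal sketch; the list of escapes is an arbitrary selection for illustration.

```perl
use strict;
use warnings;

# For \c@ through \c_ the ordinal is the same on every platform; only the
# control found at that ordinal changes with the code page. \c? is the
# exception described in the note above.
my @escapes = ( [ '\c@' => "\c@" ], [ '\cA' => "\cA" ], [ '\cJ' => "\cJ" ],
                [ '\cU' => "\cU" ], [ '\c?' => "\c?" ] );

for my $pair (@escapes) {
    my ($label, $char) = @$pair;
    my $note = $char eq "\n" ? ' (line terminator on this platform)' : '';
    printf "%-4s ord %3d%s\n", $label, ord($char), $note;
}
```

On an ASCII platform the line terminator note appears next to \cJ. According to the table it would appear next to \cU on 1047 and POSIX-BC.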
+.SH "FUNCTION DIFFERENCES"
+.IX Header "FUNCTION DIFFERENCES"
+.ie n .IP chr() 8
+.el .IP \f(CWchr()\fR 8
+.IX Item "chr()"
+\&\f(CWchr()\fR must be given an EBCDIC code number argument to yield a desired
+character return value on an EBCDIC platform. For example:
+.Sp
+.Vb 1
+\& $CAPITAL_LETTER_A = chr(193);
+.Ve
+.ie n .IP ord() 8
+.el .IP \f(CWord()\fR 8
+.IX Item "ord()"
+\&\f(CWord()\fR will return EBCDIC code number values on an EBCDIC platform.
+For example:
+.Sp
+.Vb 1
+\& $the_number_193 = ord("A");
+.Ve
+.ie n .IP pack() 8
+.el .IP \f(CWpack()\fR 8
+.IX Item "pack()"
+The \f(CW"c"\fR and \f(CW"C"\fR templates for \f(CWpack()\fR are dependent upon character set
+encoding. Examples of usage on EBCDIC include:
+.Sp
+.Vb 4
+\& $foo = pack("CCCC",193,194,195,196);
+\& # $foo eq "ABCD"
+\& $foo = pack("C4",193,194,195,196);
+\& # same thing
+\&
+\& $foo = pack("ccxxcc",193,194,195,196);
+\& # $foo eq "AB\e0\e0CD"
+.Ve
+.Sp
+The \f(CW"U"\fR template has been ported to mean "Unicode" on all platforms so
+that
+.Sp
+.Vb 1
+\& pack("U", 65) eq \*(AqA\*(Aq
+.Ve
+.Sp
+is true on all platforms. If you want native code points for the low
+256, use the \f(CW"W"\fR template. This means that the equivalences
+.Sp
+.Vb 2
+\& pack("W", ord($character)) eq $character
+\& unpack("W", $character) == ord $character
+.Ve
+.Sp
+will hold.
+.ie n .IP print() 8
+.el .IP \f(CWprint()\fR 8
+.IX Item "print()"
+One must be careful with scalars and strings that are passed to
+print that contain ASCII encodings. One common place
+for this to occur is in the output of the MIME type header for
+CGI script writing. For example, many Perl programming guides
+recommend something similar to:
+.Sp
+.Vb 2
+\& print "Content\-type:\ettext/html\e015\e012\e015\e012";
+\& # this may be wrong on EBCDIC
+.Ve
+.Sp
+You can instead write
+.Sp
+.Vb 1
+\& print "Content\-type:\ettext/html\er\en\er\en"; # OK for DGW et al
+.Ve
+.Sp
+and have it work portably.
+.Sp
+That is because the translation from EBCDIC to ASCII is done
+by the web server in this case. Consult your web server's documentation for
+further details.
+.ie n .IP printf() 8
+.el .IP \f(CWprintf()\fR 8
+.IX Item "printf()"
+The formats that can convert characters to numbers and vice versa
+will be different from their ASCII counterparts when executed
+on an EBCDIC platform. Examples include:
+.Sp
+.Vb 1
+\& printf("%c%c%c",193,194,195); # prints ABC
+.Ve
+.ie n .IP sort() 8
+.el .IP \f(CWsort()\fR 8
+.IX Item "sort()"
+EBCDIC sort results may differ from ASCII sort results especially for
+mixed case strings. This is discussed in more detail below.
+.ie n .IP sprintf() 8
+.el .IP \f(CWsprintf()\fR 8
+.IX Item "sprintf()"
+See the discussion of \f(CW"printf()"\fR above. An example of the use
+of sprintf would be:
+.Sp
+.Vb 1
+\& $CAPITAL_LETTER_A = sprintf("%c",193);
+.Ve
+.ie n .IP unpack() 8
+.el .IP \f(CWunpack()\fR 8
+.IX Item "unpack()"
+See the discussion of \f(CW"pack()"\fR above.
+.PP
+Note that it is possible to write portable code for these by specifying
+things in Unicode numbers, and using a conversion function:
+.PP
+.Vb 3
+\& printf("%c",utf8::unicode_to_native(65)); # prints A on all
+\& # platforms
+\& print utf8::native_to_unicode(ord("A")); # Likewise, prints 65
+.Ve
+.PP
+See "Unicode and EBCDIC" in perluniintro and "CONVERSIONS"
+for other options.
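The same approach works for the pack() and unpack() templates discussed above. A minimal sketch, with variable names chosen only for illustration:

```perl
use strict;
use warnings;

# "U" takes Unicode code points, so this is "Hi" on every platform.
my $from_unicode = pack('U*', 0x48, 0x69);

# "W" takes native code points; converting first gives the same string.
my $from_native = pack('W*', map { utf8::unicode_to_native($_) } 0x48, 0x69);

# unpack "W" agrees with ord() for each character.
my @native = unpack('W*', 'Hi');

print "$from_unicode $from_native @native\n";
```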
+.SH "REGULAR EXPRESSION DIFFERENCES"
+.IX Header "REGULAR EXPRESSION DIFFERENCES"
+You can write your regular expressions just like someone on an ASCII
+platform would do. But keep in mind that using octal or hex notation to
+specify a particular code point will give you the character that the
+EBCDIC code page natively maps to it. (This is also true of all
+double-quoted strings.) If you want to write portably, just use the
+\&\f(CW\*(C`\eN{U+...}\*(C'\fR notation everywhere where you would have used \f(CW\*(C`\ex{...}\*(C'\fR,
+and don't use octal notation at all.
+.PP
+Starting in Perl v5.22, this applies to ranges in bracketed character
+classes. If you say, for example, \f(CW\*(C`qr/[\eN{U+20}\-\eN{U+7E}]/\*(C'\fR, it means
+the characters \f(CW\*(C`\eN{U+20}\*(C'\fR, \f(CW\*(C`\eN{U+21}\*(C'\fR, ..., \f(CW\*(C`\eN{U+7E}\*(C'\fR. This range
+is all the printable characters that the ASCII character set contains.
+.PP
+Prior to v5.22, you couldn't specify any ranges portably, except
+(starting in Perl v5.5.3) all subsets of the \f(CW\*(C`[A\-Z]\*(C'\fR and \f(CW\*(C`[a\-z]\*(C'\fR
+ranges are specially coded to not pick up gap characters. For example,
+characters such as "ô" (\f(CW\*(C`o WITH CIRCUMFLEX\*(C'\fR) that lie between
+"I" and "J" would not be matched by the regular expression range
+\&\f(CW\*(C`/[H\-K]/\*(C'\fR. But if either of the range end points is explicitly numeric
+(and neither is specified by \f(CW\*(C`\eN{U+...}\*(C'\fR), the gap characters are
+matched:
+.PP
+.Vb 1
+\& /[\ex89\-\ex91]/
+.Ve
+.PP
+will match \f(CW\*(C`\ex8e\*(C'\fR, even though \f(CW\*(C`\ex89\*(C'\fR is "i" and \f(CW\*(C`\ex91\*(C'\fR is "j",
+and \f(CW\*(C`\ex8e\*(C'\fR is a gap character, from the alphabetic viewpoint.
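By contrast, a range whose end points are both written as \N{U+...} is taken in Unicode order and so excludes the gap characters. A minimal sketch, assuming Perl v5.22 or later for portability; the helper name is_ascii_upper is hypothetical.

```perl
use strict;
use warnings;

# Both end points use \N{U+...}, so the range covers A-Z and nothing else.
my $upper = qr/\A[\N{U+41}-\N{U+5A}]\z/;

# Hypothetical helper for illustration.
sub is_ascii_upper { return $_[0] =~ $upper ? 1 : 0 }

for my $char ('I', 'J', 'j', "\N{U+F4}") {
    printf "U+%04X %s\n", utf8::native_to_unicode(ord($char)),
        is_ascii_upper($char) ? 'matches' : 'does not match';
}
```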
+.PP
+Another construct to be wary of is the inappropriate use of hex (unless
+you use \f(CW\*(C`\eN{U+...}\*(C'\fR) or
+octal constants in regular expressions. Consider the following
+set of subs:
+.PP
+.Vb 4
+\& sub is_c0 {
+\& my $char = substr(shift,0,1);
+\& $char =~ /[\e000\-\e037]/;
+\& }
+\&
+\& sub is_print_ascii {
+\& my $char = substr(shift,0,1);
+\& $char =~ /[\e040\-\e176]/;
+\& }
+\&
+\& sub is_delete {
+\& my $char = substr(shift,0,1);
+\& $char eq "\e177";
+\& }
+\&
+\& sub is_c1 {
+\& my $char = substr(shift,0,1);
+\& $char =~ /[\e200\-\e237]/;
+\& }
+\&
+\& sub is_latin_1 { # But not ASCII; not C1
+\& my $char = substr(shift,0,1);
+\& $char =~ /[\e240\-\e377]/;
+\& }
+.Ve
+.PP
+These are valid only on ASCII platforms. Starting in Perl v5.22, simply
+changing the octal constants to equivalent \f(CW\*(C`\eN{U+...}\*(C'\fR values makes
+them portable:
+.PP
+.Vb 4
+\& sub is_c0 {
+\& my $char = substr(shift,0,1);
+\& $char =~ /[\eN{U+00}\-\eN{U+1F}]/;
+\& }
+\&
+\& sub is_print_ascii {
+\& my $char = substr(shift,0,1);
+\& $char =~ /[\eN{U+20}\-\eN{U+7E}]/;
+\& }
+\&
+\& sub is_delete {
+\& my $char = substr(shift,0,1);
+\& $char eq "\eN{U+7F}";
+\& }
+\&
+\& sub is_c1 {
+\& my $char = substr(shift,0,1);
+\& $char =~ /[\eN{U+80}\-\eN{U+9F}]/;
+\& }
+\&
+\& sub is_latin_1 { # But not ASCII; not C1
+\& my $char = substr(shift,0,1);
+\& $char =~ /[\eN{U+A0}\-\eN{U+FF}]/;
+\& }
+.Ve
+.PP
+And here are some alternative portable ways to write them:
+.PP
+.Vb 3
+\& sub Is_c0 {
+\& my $char = substr(shift,0,1);
+\& return $char =~ /[[:cntrl:]]/a && ! Is_delete($char);
+\&
+\& # Alternatively:
+\& # return $char =~ /[[:cntrl:]]/
+\& # && $char =~ /[[:ascii:]]/
+\& # && ! Is_delete($char);
+\& }
+\&
+\& sub Is_print_ascii {
+\& my $char = substr(shift,0,1);
+\&
+\& return $char =~ /[[:print:]]/a;
+\&
+\& # Alternatively:
+\& # return $char =~ /[[:print:]]/ && $char =~ /[[:ascii:]]/;
+\&
+\& # Or
+\& # return $char
+\& # =~ /[ !"\e#\e$%&\*(Aq()*+,\e\-.\e/0\-9:;<=>?\e@A\-Z[\e\e\e]^_\`a\-z{|}~]/;
+\& }
+\&
+\& sub Is_delete {
+\& my $char = substr(shift,0,1);
+\& return utf8::native_to_unicode(ord $char) == 0x7F;
+\& }
+\&
+\& sub Is_c1 {
+\& use feature \*(Aqunicode_strings\*(Aq;
+\& my $char = substr(shift,0,1);
+\& return $char =~ /[[:cntrl:]]/ && $char !~ /[[:ascii:]]/;
+\& }
+\&
+\& sub Is_latin_1 { # But not ASCII; not C1
+\& use feature \*(Aqunicode_strings\*(Aq;
+\& my $char = substr(shift,0,1);
+\& return ord($char) < 256
+\& && $char !~ /[[:ascii:]]/
+\& && $char !~ /[[:cntrl:]]/;
+\& }
+.Ve
+.PP
+Another way to write \f(CWIs_latin_1()\fR would be
+to use the characters in the range explicitly:
+.PP
+.Vb 5
+\& sub Is_latin_1 {
+\& my $char = substr(shift,0,1);
+\& $char =~ /[\ ¡¢£¤¥¦§¨©ª«¬\%®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ]
+\& | [ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]/x;
+\& }
+.Ve
+.PP
+That form may run into trouble in network transit (due to the
+presence of 8 bit characters) or on non ISO-Latin character sets, but
+it does allow \f(CW\*(C`Is_c1\*(C'\fR to be rewritten so it works on Perls that don't
+have \f(CW\*(Aqunicode_strings\*(Aq\fR (earlier than v5.14):
+.PP
+.Vb 6
+\& sub Is_c1 { # Not ASCII; not a Latin\-1 graphic
+\& my $char = substr(shift,0,1);
+\& return ord($char) < 256
+\& && $char !~ /[[:ascii:]]/
+\& && ! Is_latin_1($char);
+\& }
+.Ve
+.SH SOCKETS
+.IX Header "SOCKETS"
+Most socket programming assumes ASCII character encodings in network
+byte order. Exceptions can include CGI script writing under a
+host web server where the server may take care of translation for you.
+Most host web servers convert EBCDIC data to ISO\-8859\-1 or Unicode on
+output.
+.SH SORTING
+.IX Header "SORTING"
+One big difference between ASCII-based character sets and EBCDIC ones
+is the relative positions of the characters when sorted in native
+order. Of most concern are the upper\- and lowercase letters, the
+digits, and the underscore (\f(CW"_"\fR). On ASCII platforms the native sort
+order has the digits come before the uppercase letters which come before
+the underscore which comes before the lowercase letters. On EBCDIC, the
+underscore comes first, then the lowercase letters, then the uppercase
+ones, and the digits last. If sorted on an ASCII-based platform, the
+two-letter abbreviation for a physician comes before the two-letter
+abbreviation for drive; that is:
+.PP
+.Vb 2
+\& @sorted = sort(qw(Dr. dr.)); # @sorted holds (\*(AqDr.\*(Aq,\*(Aqdr.\*(Aq) on ASCII,
+\& # but (\*(Aqdr.\*(Aq,\*(AqDr.\*(Aq) on EBCDIC
+.Ve
+.PP
+The property of lowercase before uppercase letters in EBCDIC is
+even carried to the Latin 1 EBCDIC pages such as 0037 and 1047.
+An example would be that "Ë" (\f(CW\*(C`E WITH DIAERESIS\*(C'\fR, 203) comes
+before "ë" (\f(CW\*(C`e WITH DIAERESIS\*(C'\fR, 235) on an ASCII platform, but
+the latter (83) comes before the former (115) on an EBCDIC platform.
+(Astute readers will note that the uppercase version of "ß"
+(\f(CW\*(C`SMALL LETTER SHARP S\*(C'\fR) is simply "SS" and that the upper case versions
+of "ÿ" (small \f(CW\*(C`y WITH DIAERESIS\*(C'\fR) and "µ" (\f(CW\*(C`MICRO SIGN\*(C'\fR)
+are not in the 0..255 range but are in Unicode, in a Unicode enabled
+Perl).
+.PP
+The sort order will cause differences between results obtained on
+ASCII platforms versus EBCDIC platforms. What follows are some suggestions
+on how to deal with these differences.
+.SS "Ignore ASCII vs. EBCDIC sort differences."
+.IX Subsection "Ignore ASCII vs. EBCDIC sort differences."
+This is the least computationally expensive strategy. It may require
+some user education.
+.SS "Use a sort helper function"
+.IX Subsection "Use a sort helper function"
+This is completely general, but the most computationally expensive
+strategy. Choose one or the other character set and transform to that
+for every sort comparison. Here's a complete example that transforms
+to ASCII sort order:
+.PP
+.Vb 2
+\&    sub native_to_uni($) {
+\&        my $string = shift;
+\&
+\&        # Saves time on an ASCII platform
+\&        return $string if ord \*(AqA\*(Aq == 65;
+\&
+\&        my $output = "";
+\&        for my $i (0 .. length($string) \- 1) {
+\&            $output
+\&               .= chr(utf8::native_to_unicode(ord(substr($string, $i, 1))));
+\&        }
+\&
+\&        # Preserve utf8ness of input onto the output, even if it didn\*(Aqt need
+\&        # to be utf8
+\&        utf8::upgrade($output) if utf8::is_utf8($string);
+\&
+\&        return $output;
+\&    }
+\&
+\&    sub ascii_order {   # Sort helper
+\&        return native_to_uni($a) cmp native_to_uni($b);
+\&    }
+\&
+\&    sort ascii_order @list;
+.Ve
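+.PP
+Because the helper above converts both operands on every comparison, the
+cost can be reduced by converting each element only once. The following
+is a minimal sketch of that idea, not a tuned implementation; the name
+\&\f(CW\*(C`sort_in_ascii_order\*(C'\fR is an assumption made for illustration, and
+it relies on the \f(CWnative_to_uni()\fR function defined above:
+.PP
+.Vb 7
+\&    sub sort_in_ascii_order {   # hypothetical helper name
+\&        return map  { $_\->[1] }                 # recover the original strings
+\&               sort { $a\->[0] cmp $b\->[0] }      # compare the cached keys
+\&               map  { [ native_to_uni($_), $_ ] } # convert each element once
+\&               @_;
+\&    }
+\&
+\&    my @sorted = sort_in_ascii_order(@list);
+.Ve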
+.SS "MONO CASE then sort data (for non-digits, non-underscore)"
+.IX Subsection "MONO CASE then sort data (for non-digits, non-underscore)"
+If you don't care about where digits and underscore sort to, you can do
+something like this:
+.PP
+.Vb 3
+\&    sub case_insensitive_order {   # Sort helper
+\&        return lc($a) cmp lc($b);
+\&    }
+\&
+\&    sort case_insensitive_order @list;
+.Ve
+.PP
+If performance is an issue, and you don't care if the output is in the
+same case as the input, use \f(CW\*(C`tr///\*(C'\fR to transform to the case most
+employed within the data. If the data are primarily UPPERCASE
+non\-Latin1, then apply \f(CW\*(C`tr/[a\-z]/[A\-Z]/\*(C'\fR, and then \f(CWsort()\fR. If the
+data are primarily lowercase non\-Latin1 then apply \f(CW\*(C`tr/[A\-Z]/[a\-z]/\*(C'\fR
+before sorting. If the data are primarily UPPERCASE and include Latin\-1
+characters then apply:
+.PP
+.Vb 3
+\& tr/[a\-z]/[A\-Z]/;
+\& tr/[àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ]/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ]/;
+\& s/ß/SS/g;
+.Ve
+.PP
+then \f(CWsort()\fR. If you have a choice, it's better to lowercase things
+to avoid the problems of the two Latin\-1 characters whose uppercase is
+outside Latin\-1: "ÿ" (small \f(CW\*(C`y WITH DIAERESIS\*(C'\fR) and "µ"
+(\f(CW\*(C`MICRO SIGN\*(C'\fR). If you do need to uppercase, you can; with a
+Unicode-enabled Perl, do:
+.PP
+.Vb 2
+\& tr/ÿ/\ex{178}/;
+\& tr/µ/\ex{39C}/;
+.Ve
+.SS "Perform sorting on one type of platform only."
+.IX Subsection "Perform sorting on one type of platform only."
+This strategy may require a network connection to send the data to the
+sorting platform and back. As such it can be expensive.
+.SH "TRANSFORMATION FORMATS"
+.IX Header "TRANSFORMATION FORMATS"
+There are a variety of ways of transforming data with an intra character set
+mapping that serve a variety of purposes. Sorting was discussed in the
+previous section and a few of the other more popular mapping techniques are
+discussed next.
+.SS "URL decoding and encoding"
+.IX Subsection "URL decoding and encoding"
+Note that some URLs have hexadecimal ASCII code points in them in an
+attempt to overcome character or protocol limitation issues. For example,
+the tilde character is not on every keyboard; hence a URL of the form:
+.PP
+.Vb 1
+\& http://www.pvhp.com/~pvhp/
+.Ve
+.PP
+may also be expressed as either of:
+.PP
+.Vb 1
+\& http://www.pvhp.com/%7Epvhp/
+\&
+\& http://www.pvhp.com/%7epvhp/
+.Ve
+.PP
+where 7E is the hexadecimal ASCII code point for "~". Here is an example
+of decoding such a URL in any EBCDIC code page:
+.PP
+.Vb 3
+\& $url = \*(Aqhttp://www.pvhp.com/%7Epvhp/\*(Aq;
+\& $url =~ s/%([0\-9a\-fA\-F]{2})/
+\&       pack("C",utf8::unicode_to_native(hex($1)))/xge;
+.Ve
+.PP
+Conversely, here is a partial solution for the task of encoding such
+a URL in any EBCDIC code page:
+.PP
+.Vb 5
+\& $url = \*(Aqhttp://www.pvhp.com/~pvhp/\*(Aq;
+\& # The following regular expression does not address the
+\& # mappings for: (\*(Aq.\*(Aq => \*(Aq%2E\*(Aq, \*(Aq/\*(Aq => \*(Aq%2F\*(Aq, \*(Aq:\*(Aq => \*(Aq%3A\*(Aq)
+\& $url =~ s/([\et "#%&\e(\e),;<=>\e?\e@\e[\e\e\e]^\`{|}~])/
+\& sprintf("%%%02X",utf8::native_to_unicode(ord($1)))/xge;
+.Ve
+.PP
+where a more complete solution would split the URL into components
+and apply a full s/// substitution only to the appropriate parts.
+.SS "uu encoding and decoding"
+.IX Subsection "uu encoding and decoding"
+The \f(CW\*(C`u\*(C'\fR template to \f(CWpack()\fR or \f(CWunpack()\fR will render EBCDIC data in
+EBCDIC characters equivalent to their ASCII counterparts. For example,
+the following will print "Yes indeed\en" on either an ASCII or EBCDIC
+computer:
+.PP
+.Vb 10
+\& $all_byte_chrs = \*(Aq\*(Aq;
+\& for (0..255) { $all_byte_chrs .= chr($_); }
+\& $uuencode_byte_chrs = pack(\*(Aqu\*(Aq, $all_byte_chrs);
+\& ($uu = <<\*(AqENDOFHEREDOC\*(Aq) =~ s/^\es*//gm;
+\& M\`\`$"\`P0%!@<("0H+#\`T.#Q\`1$A,4%187&!D:&QP=\*(AqA\e@(2(C)"4F)R@I*BLL
+\& M+2XO,#$R,S0U\-C<X.3H[/#T^/T!!0D\-$149\*(Aq2$E*2TQ\-3D]045)35%565UA9
+\& M6EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G\-T=79W>\*(AqEZ>WQ]?G^\`@8*#A(6&
+\& MAXB)BHN,C8Z/D)&2DY25EI>8F9J;G)V>GZ"AHJ.DI::GJ*FJJZRMKJ^PL;*S
+\& MM+6VM[BYNKN\eO;Z_P,\*(Aq"P\e3%QL?(R<K+S,W.S]#1TM/4U=;7V\-G:V]S=WM_@
+\& ?X>+CY.7FY^CIZNOL[>[O\e/\*(AqR\e_3U]O?X^?K[_/W^_P\`\`
+\& ENDOFHEREDOC
+\& if ($uuencode_byte_chrs eq $uu) {
+\& print "Yes ";
+\& }
+\& $uudecode_byte_chrs = unpack(\*(Aqu\*(Aq, $uuencode_byte_chrs);
+\& if ($uudecode_byte_chrs eq $all_byte_chrs) {
+\& print "indeed\en";
+\& }
+.Ve
+.PP
+Here is a very spartan uudecoder that will work on EBCDIC:
+.PP
+.Vb 10
+\&    #!/usr/local/bin/perl
+\&    $_ = <> until ($mode,$file) = /^begin\es*(\ed*)\es*(\eS*)/;
+\&    open(OUT, "> $file") if $file ne "";
+\&    while(<>) {
+\&        last if /^end/;
+\&        next if /[a\-z]/;
+\&        next unless int((((utf8::native_to_unicode(ord()) \- 32 ) & 077)
+\&                                                               + 2) / 3)
+\&                    == int(length() / 4);
+\&        print OUT unpack("u", $_);
+\&    }
+\&    close(OUT);
+\&    chmod oct($mode), $file;
+.Ve
+.SS "Quoted-Printable encoding and decoding"
+.IX Subsection "Quoted-Printable encoding and decoding"
+On ASCII-encoded platforms it is possible to encode characters outside of
+the printable set using:
+.PP
+.Vb 3
+\& # This QP encoder works on ASCII only
+\& $qp_string =~ s/([=\ex00\-\ex1F\ex80\-\exFF])/
+\& sprintf("=%02X",ord($1))/xge;
+.Ve
+.PP
+Starting in Perl v5.22, this is trivially changeable to work portably on
+both ASCII and EBCDIC platforms.
+.PP
+.Vb 3
+\& # This QP encoder works on both ASCII and EBCDIC
+\& $qp_string =~ s/([=\eN{U+00}\-\eN{U+1F}\eN{U+80}\-\eN{U+FF}])/
+\&       sprintf("=%02X",utf8::native_to_unicode(ord($1)))/xge;
+.Ve
+.PP
+For earlier Perls, a QP encoder that works on both ASCII and EBCDIC
+platforms would look somewhat like the following:
+.PP
+.Vb 4
+\& $delete = utf8::unicode_to_native(ord("\ex7F"));
+\& $qp_string =~
+\& s/([^[:print:]$delete])/
+\& sprintf("=%02X",utf8::native_to_unicode(ord($1)))/xage;
+.Ve
+.PP
+(although in production code the substitutions might be done
+in the EBCDIC branch with the function call and separately in the
+ASCII branch without the expense of the identity map; in Perl v5.22, the
+identity map is optimized out so there is no expense, but the
+alternative above is simpler and is also available in v5.22).
+.PP
+Such QP strings can be decoded with:
+.PP
+.Vb 3
+\& # This QP decoder is limited to ASCII only
+\& $string =~ s/=([[:xdigit:]]{2})/chr hex $1/ge;
+\& $string =~ s/=[\en\er]+$//;
+.Ve
+.PP
+Whereas a QP decoder that works on both ASCII and EBCDIC platforms
+would look somewhat like the following:
+.PP
+.Vb 3
+\& $string =~ s/=([[:xdigit:]]{2})/
+\&       chr utf8::unicode_to_native(hex $1)/xge;
+\& $string =~ s/=[\en\er]+$//;
+.Ve
+.SS "Caesarean ciphers"
+.IX Subsection "Caesarean ciphers"
+The practice of shifting an alphabet one or more characters for encipherment
+dates back thousands of years and was explicitly detailed by Gaius Julius
+Caesar in his \fBGallic Wars\fR text. A single alphabet shift is sometimes
+referred to as a rotation, and the shift amount is given as a number \f(CW$n\fR after
+the string 'rot', as in "rot$n". Rot0 and rot26 would designate identity maps
+on the 26\-letter English version of the Latin alphabet. Rot13 has the
+interesting property that alternate subsequent invocations are identity maps
+(thus rot13 is its own non-trivial inverse in the group of 26 alphabet
+rotations). Hence the following is a rot13 encoder and decoder that will
+work on ASCII and EBCDIC platforms:
+.PP
+.Vb 1
+\&    #!/usr/local/bin/perl
+\&
+\&    while(<>){
+\&        tr/n\-za\-mN\-ZA\-M/a\-zA\-Z/;
+\&        print;
+\&    }
+.Ve
+.PP
+In one-liner form:
+.PP
+.Vb 1
+\& perl \-ne \*(Aqtr/n\-za\-mN\-ZA\-M/a\-zA\-Z/;print\*(Aq
+.Ve
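+.PP
+The same approach can be generalized to other shift amounts. The
+following is only a sketch: the name \f(CW\*(C`rot_n\*(C'\fR is an assumption made for
+illustration. It spells out both alphabets explicitly, so it does not
+depend on the letters occupying contiguous code points:
+.PP
+.Vb 10
+\&    sub rot_n {   # hypothetical helper name
+\&        my ($n, $text) = @_;
+\&        my $lower = \*(Aqabcdefghijklmnopqrstuvwxyz\*(Aq;
+\&        my $upper = uc $lower;
+\&        $n %= 26;   # normalize the shift into 0..25
+\&
+\&        # Rotate each alphabet left by $n places
+\&        my $from = $lower . $upper;
+\&        my $to   = substr($lower, $n) . substr($lower, 0, $n)
+\&                 . substr($upper, $n) . substr($upper, 0, $n);
+\&
+\&        # Build a letter\-to\-letter lookup table and apply it
+\&        my %map;
+\&        @map{ split //, $from } = split //, $to;
+\&        $text =~ s/([A\-Za\-z])/$map{$1}/g;
+\&        return $text;
+\&    }
+\&
+\&    print rot_n(13, "Hello"), "\en";   # prints "Uryyb"
+.Ve
+.PP
+Applying \f(CW\*(C`rot_n(26 \- $n, ...)\*(C'\fR to the result undoes a shift of \f(CW$n\fR.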
+.SH "Hashing order and checksums"
+.IX Header "Hashing order and checksums"
+Perl deliberately randomizes hash order for security purposes on both
+ASCII and EBCDIC platforms.
+.PP
+EBCDIC checksums will differ for the same file translated into ASCII
+and vice versa.
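+.PP
+One way to obtain a checksum that agrees across platforms is to compute
+it over Unicode code points rather than native ones. The following is a
+minimal sketch for text data only; the name \f(CW\*(C`portable_checksum\*(C'\fR and
+the simple 16\-bit additive sum are assumptions made for illustration:
+.PP
+.Vb 10
+\&    sub portable_checksum {   # hypothetical helper name
+\&        my $string = shift;
+\&        my $sum = 0;
+\&        for my $char (split //, $string) {
+\&            # Add the Unicode code point, not the native one
+\&            $sum = ($sum + utf8::native_to_unicode(ord $char)) % 65536;
+\&        }
+\&        return $sum;
+\&    }
+.Ve
+.PP
+On an ASCII platform \f(CWutf8::native_to_unicode()\fR is an identity map, so
+this reduces to a plain sum of the code points.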
+.SH "I18N AND L10N"
+.IX Header "I18N AND L10N"
+Internationalization (I18N) and localization (L10N) are supported at least
+in principle even on EBCDIC platforms. The details are system-dependent
+and discussed under the "OS ISSUES" section below.
+.SH "MULTI-OCTET CHARACTER SETS"
+.IX Header "MULTI-OCTET CHARACTER SETS"
+Perl works with UTF-EBCDIC, a multi-byte encoding. In Perls earlier
+than v5.22, there may be various bugs in this regard.
+.PP
+Legacy multi byte EBCDIC code pages XXX.
+.SH "OS ISSUES"
+.IX Header "OS ISSUES"
+There may be a few system-dependent issues
+of concern to EBCDIC Perl programmers.
+.SS OS/400
+.IX Subsection "OS/400"
+.IP PASE 8
+.IX Item "PASE"
+The PASE environment is a runtime environment for OS/400 that can run
+executables built for PowerPC AIX in OS/400; see perlos400. PASE
+is ASCII-based, not EBCDIC-based as the ILE is.
+.IP "IFS access" 8
+.IX Item "IFS access"
+XXX.
+.SS "OS/390, z/OS"
+.IX Subsection "OS/390, z/OS"
+Perl runs under Unix System Services (USS).
+.ie n .IP """sigaction""" 8
+.el .IP \f(CWsigaction\fR 8
+.IX Item "sigaction"
+Using \f(CW\*(C`SA_SIGINFO\*(C'\fR can cause segmentation faults.
+.ie n .IP """chcp""" 8
+.el .IP \f(CWchcp\fR 8
+.IX Item "chcp"
+\&\fBchcp\fR is supported as a shell utility for displaying and changing
+one's code page. See also \fBchcp\fR\|(1).
+.IP "dataset access" 8
+.IX Item "dataset access"
+For sequential data set access try:
+.Sp
+.Vb 1
+\& my @ds_records = \`cat //DSNAME\`;
+.Ve
+.Sp
+or:
+.Sp
+.Vb 1
+\& my @ds_records = \`cat //\*(AqHLQ.DSNAME\*(Aq\`;
+.Ve
+.Sp
+See also the OS390::Stdio module on CPAN.
+.ie n .IP """iconv""" 8
+.el .IP \f(CWiconv\fR 8
+.IX Item "iconv"
+\&\fBiconv\fR is supported as both a shell utility and a C RTL routine.
+See also the \fBiconv\fR\|(1) and \fBiconv\fR\|(3) manual pages.
+.IP locales 8
+.IX Item "locales"
+Locales are supported. There may be glitches when a locale is another
+EBCDIC code page which has some of the
+code-page variant characters in other
+positions.
+.Sp
+There aren't currently any real UTF\-8 locales, even though some locale
+names contain the string "UTF\-8".
+.Sp
+See perllocale for information on locales. The L10N files
+are in \fI/usr/nls/locale\fR. \f(CW$Config{d_setlocale}\fR is \f(CW\*(Aqdefine\*(Aq\fR on
+OS/390 or z/OS.
+.SS POSIX-BC?
+.IX Subsection "POSIX-BC?"
+XXX.
+.SH BUGS
+.IX Header "BUGS"
+.IP \(bu 4
+Not all shells will allow multiple \f(CW\*(C`\-e\*(C'\fR string arguments to perl to
+be concatenated together properly as recipes 0, 2, 4, 5, and 6 in this
+document might seem to imply.
+.IP \(bu 4
+There are a significant number of test failures in the CPAN modules
+shipped with Perl v5.22 and v5.24. These are only in modules not primarily
+maintained by Perl 5 porters. Some of these are failures in the tests
+only: they don't realize that it is proper to get different results on
+EBCDIC platforms. And some of the failures are real bugs. If you
+compile and do a \f(CW\*(C`make test\*(C'\fR on Perl, all tests in the \f(CW\*(C`/cpan\*(C'\fR
+directory are skipped.
+.Sp
+Encode partially works.
+.IP \(bu 4
+In earlier Perl versions, when byte and character data were
+concatenated, the new string was sometimes created by
+decoding the byte strings as \fIISO 8859\-1 (Latin\-1)\fR, even if the
+old Unicode string used EBCDIC.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+perllocale, perlfunc, perlunicode, utf8.
+.SH REFERENCES
+.IX Header "REFERENCES"
+<http://std.dkuug.dk/i18n/charmaps>
+.PP
+<https://www.unicode.org/>
+.PP
+<https://www.unicode.org/reports/tr16/>
+.PP
+<https://www.sr\-ix.com/Archive/CharCodeHist/index.html>
+\&\fBASCII: American Standard Code for Information Infiltration\fR Tom Jennings,
+September 1999.
+.PP
+\&\fBThe Unicode Standard, Version 3.0\fR The Unicode Consortium, Lisa Moore ed.,
+ISBN 0\-201\-61633\-5, Addison Wesley Developers Press, February 2000.
+.PP
+\&\fBCDRA: IBM \- Character Data Representation Architecture \-
+Reference and Registry\fR, IBM SC09\-2190\-00, December 1996.
+.PP
+"Demystifying Character Sets", Andrea Vine, Multilingual Computing
+& Technology, \fB#26 Vol. 10 Issue 4\fR, August/September 1999;
+ISSN 1523\-0309; Multilingual Computing Inc. Sandpoint ID, USA.
+.PP
+\&\fBCodes, Ciphers, and Other Cryptic and Clandestine Communication\fR
+Fred B. Wrixon, ISBN 1\-57912\-040\-7, Black Dog & Leventhal Publishers,
+1998.
+.PP
+<http://www.bobbemer.com/P\-BIT.HTM>
+\&\fBIBM \- EBCDIC and the P\-bit; The biggest Computer Goof Ever\fR Robert Bemer.
+.SH HISTORY
+.IX Header "HISTORY"
+15 April 2001: added UTF\-8 and UTF-EBCDIC to main table, pvhp.
+.SH AUTHOR
+.IX Header "AUTHOR"
+Peter Prymmer pvhp@best.com wrote this in 1999 and 2000
+with CCSID 0819 and 0037 help from Chris Leach and
+André Pirard A.Pirard@ulg.ac.be as well as POSIX-BC
+help from Thomas Dorner Thomas.Dorner@start.de.
+Thanks also to Vickie Cooper, Philip Newton, William Raffloer, and
+Joe Smith. Trademarks, registered trademarks, service marks and
+registered service marks used in this document are the property of
+their respective owners.
+.PP
+Now maintained by Perl5 Porters.