author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit fc22b3d6507c6745911b9dfcc68f1e665ae13dbc
tree ce1e3bce06471410239a6f41282e328770aa404a /upstream/debian-bookworm/man1/perlpodspec.1
parent Initial commit.
Adding upstream version 4.22.0. upstream/4.22.0
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/debian-bookworm/man1/perlpodspec.1')
-rw-r--r-- upstream/debian-bookworm/man1/perlpodspec.1 1912
1 files changed, 1912 insertions, 0 deletions
diff --git a/upstream/debian-bookworm/man1/perlpodspec.1 b/upstream/debian-bookworm/man1/perlpodspec.1
new file mode 100644
index 00000000..bdfd6780
--- /dev/null
+++ b/upstream/debian-bookworm/man1/perlpodspec.1
@@ -0,0 +1,1912 @@
+.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" Set up some character translations and predefined strings. \*(-- will
+.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
+.\" double quote, and \*(R" will give a right double quote. \*(C+ will
+.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
+.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
+.\" nothing in troff, for use with C<>.
+.tr \(*W-
+.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
+.ie n \{\
+. ds -- \(*W-
+. ds PI pi
+. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
+. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
+. ds L" ""
+. ds R" ""
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds -- \|\(em\|
+. ds PI \(*p
+. ds L" ``
+. ds R" ''
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLPODSPEC 1"
+.TH PERLPODSPEC 1 "2023-11-25" "perl v5.36.0" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH "NAME"
+perlpodspec \- Plain Old Documentation: format specification and notes
+.SH "DESCRIPTION"
+.IX Header "DESCRIPTION"
+This document is detailed notes on the Pod markup language. Most
+people will only have to read perlpod to know how to write
+in Pod, but this document may answer some incidental questions to do
+with parsing and rendering Pod.
+.PP
+In this document, \*(L"must\*(R" / \*(L"must not\*(R", \*(L"should\*(R" /
+\&\*(L"should not\*(R", and \*(L"may\*(R" have their conventional (cf. \s-1RFC 2119\s0)
+meanings: \*(L"X must do Y\*(R" means that if X doesn't do Y, it's against
+this specification, and should really be fixed. \*(L"X should do Y\*(R"
+means that it's recommended, but X may fail to do Y, if there's a
+good reason. \*(L"X may do Y\*(R" is merely a note that X can do Y at
+will (although it is up to the reader to detect any connotation of
+\&\*(L"and I think it would be \fInice\fR if X did Y\*(R" versus \*(L"it wouldn't
+really \fIbother\fR me if X did Y\*(R").
+.PP
+Notably, when I say \*(L"the parser should do Y\*(R", the
+parser may fail to do Y, if the calling application explicitly
+requests that the parser \fInot\fR do Y. I often phrase this as
+\&\*(L"the parser should, by default, do Y.\*(R" This doesn't \fIrequire\fR
+the parser to provide an option for turning off whatever
+feature Y is (like expanding tabs in verbatim paragraphs), although
+it implies that such an option \fImay\fR be provided.
+.SH "Pod Definitions"
+.IX Header "Pod Definitions"
+Pod is embedded in files, typically Perl source files, although you
+can write a file that's nothing but Pod.
+.PP
+A \fBline\fR in a file consists of zero or more non-newline characters,
+terminated by either a newline or the end of the file.
+.PP
+A \fBnewline sequence\fR is usually a platform-dependent concept, but
+Pod parsers should understand it to mean any of \s-1CR\s0 (\s-1ASCII 13\s0), \s-1LF\s0
+(\s-1ASCII 10\s0), or a \s-1CRLF\s0 (\s-1ASCII 13\s0 followed immediately by \s-1ASCII 10\s0), in
+addition to any other system-specific meaning. The first \s-1CR/CRLF/LF\s0
+sequence in the file may be used as the basis for identifying the
+newline sequence for parsing the rest of the file.
+.PP
+A \fBblank line\fR is a line consisting entirely of zero or more spaces
+(\s-1ASCII 32\s0) or tabs (\s-1ASCII 9\s0), and terminated by a newline or end-of-file.
+A \fBnon-blank line\fR is a line containing one or more characters other
+than space or tab (and terminated by a newline or end-of-file).
+.PP
+(\fINote:\fR Many older Pod parsers did not accept a line consisting of
+spaces/tabs and then a newline as a blank line. The only lines they
+considered blank were lines consisting of \fIno characters at all\fR,
+terminated by a newline.)
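+.PP
+The line and blank-line definitions above can be sketched as follows. This
+is an illustration in Python, not code from any Pod module; the names
+\f(CW\*(C`split_lines\*(C'\fR and \f(CW\*(C`is_blank\*(C'\fR are hypothetical. The sketch assumes the
+simplest policy, accepting any mix of \s-1CR, LF,\s0 and \s-1CRLF\s0 in one file rather
+than fixing on the first sequence seen.

```python
import re

# Any of CRLF, CR, or LF ends a line. CRLF is listed first so that it is
# consumed as one terminator instead of being split into CR and LF.
NEWLINE = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into lines on CR, LF, or CRLF (terminators removed)."""
    lines = NEWLINE.split(text)
    # A trailing terminator leaves one empty string at the end; drop it,
    # because it does not correspond to a line in the file.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def is_blank(line):
    """True if the line holds only spaces (ASCII 32) and tabs (ASCII 9)."""
    return all(ch in " \t" for ch in line)
```

+.PP
+An older parser of the kind mentioned in the note would instead treat a
+line as blank only when it is the empty string.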
+.PP
+\&\fBWhitespace\fR is used in this document as a blanket term for spaces,
+tabs, and newline sequences. (By itself, this term usually refers
+to literal whitespace. That is, sequences of whitespace characters
+in Pod source, as opposed to \*(L"E<32>\*(R", which is a formatting
+code that \fIdenotes\fR a whitespace character.)
+.PP
+A \fBPod parser\fR is a module meant for parsing Pod (regardless of
+whether this involves calling callbacks or building a parse tree or
+directly formatting it). A \fBPod formatter\fR (or \fBPod translator\fR)
+is a module or program that converts Pod to some other format (\s-1HTML,\s0
+plaintext, TeX, PostScript, \s-1RTF\s0). A \fBPod processor\fR might be a
+formatter or translator, or might be a program that does something
+else with the Pod (like counting words, scanning for index points,
+etc.).
+.PP
+Pod content is contained in \fBPod blocks\fR. A Pod block starts with a
+line that matches \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR, and continues up to the next line
+that matches \f(CW\*(C`m/\eA=cut/\*(C'\fR or up to the end of the file if there is
+no \f(CW\*(C`m/\eA=cut/\*(C'\fR line.
+.PP
+Note that a parser is not expected to distinguish real Pod from something that
+merely looks like Pod but is inside a quoted string, such as a here document.
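+.PP
+A rough sketch of Pod block extraction, in Python for illustration only.
+The function name \f(CW\*(C`pod_blocks\*(C'\fR is hypothetical, and the input is assumed to
+be a list of lines with terminators already removed. Two choices here are
+assumptions rather than requirements of the text above: the closing
+\f(CW\*(C`=cut\*(C'\fR line is kept as part of the block, and a \f(CW\*(C`=cut\*(C'\fR line found outside any
+block raises an exception.

```python
import re

POD_START = re.compile(r"\A=[a-zA-Z]")
POD_CUT = re.compile(r"\A=cut")


def pod_blocks(lines):
    """Return the Pod blocks found in lines, each as a list of lines."""
    blocks = []
    current = None  # lines of the block being collected, or None outside Pod
    for line in lines:
        if current is None:
            if POD_CUT.match(line):
                # Assumption: a "=cut" cannot start a block, so stop here.
                raise ValueError("=cut found outside a Pod block")
            if POD_START.match(line):
                current = [line]
        else:
            current.append(line)
            if POD_CUT.match(line):
                blocks.append(current)
                current = None
    if current is not None:
        # No "=cut" line: the block runs to the end of the file.
        blocks.append(current)
    return blocks
```

+.PP
+Because the test is purely line-based, a line such as \f(CW\*(C`=head1\*(C'\fR inside a
+here document starts a block, as the note above says it may.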
+.PP
+Within a Pod block, there are \fBPod paragraphs\fR. A Pod paragraph
+consists of non-blank lines of text, separated by one or more blank
+lines.
+.PP
+For purposes of Pod processing, there are four types of paragraphs in
+a Pod block:
+.IP "\(bu" 4
+A command paragraph (also called a \*(L"directive\*(R"). The first line of
+this paragraph must match \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR. Command paragraphs are
+typically one line, as in:
+.Sp
+.Vb 1
+\& =head1 NOTES
+\&
+\& =item *
+.Ve
+.Sp
+But they may span several (non-blank) lines:
+.Sp
+.Vb 3
+\& =for comment
+\& Hm, I wonder what it would look like if
+\& you tried to write a BNF for Pod from this.
+\&
+\& =head3 Dr. Strangelove, or: How I Learned to
+\& Stop Worrying and Love the Bomb
+.Ve
+.Sp
+\&\fISome\fR command paragraphs allow formatting codes in their content
+(i.e., after the part that matches \f(CW\*(C`m/\eA=[a\-zA\-Z]\eS*\es*/\*(C'\fR), as in:
+.Sp
+.Vb 1
+\& =head1 Did You Remember to C<use strict;>?
+.Ve
+.Sp
+In other words, the Pod processing handler for \*(L"head1\*(R" will apply the
+same processing to \*(L"Did You Remember to C<use strict;>?\*(R" that it
+would to an ordinary paragraph (i.e., formatting codes like
+\&\*(L"C<...>\*(R" are parsed and presumably formatted appropriately, and
+whitespace in the form of literal spaces and/or tabs is not
+significant).
+.IP "\(bu" 4
+A \fBverbatim paragraph\fR. The first line of this paragraph must begin with a
+literal space or tab, and this paragraph must not be inside a
+\&\*(L"=begin \fIidentifier\fR\*(R", ... \*(L"=end \fIidentifier\fR\*(R" sequence unless
+\&\*(L"\fIidentifier\fR\*(R" begins with a colon (\*(L":\*(R"). That is, if a paragraph
+starts with a literal space or tab, but \fIis\fR inside a
+\&\*(L"=begin \fIidentifier\fR\*(R", ... \*(L"=end \fIidentifier\fR\*(R" region, then it's
+a data paragraph, unless \*(L"\fIidentifier\fR\*(R" begins with a colon.
+.Sp
+Whitespace \fIis\fR significant in verbatim paragraphs (although, in
+processing, tabs are probably expanded).
+.IP "\(bu" 4
+An \fBordinary paragraph\fR. A paragraph is an ordinary paragraph
+if its first line matches neither \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR nor
+\&\f(CW\*(C`m/\eA[ \et]/\*(C'\fR, \fIand\fR if it's not inside a
+\&\*(L"=begin \fIidentifier\fR\*(R", ... \*(L"=end \fIidentifier\fR\*(R" sequence unless
+\&\*(L"\fIidentifier\fR\*(R" begins with a colon (\*(L":\*(R").
+.IP "\(bu" 4
+A \fBdata paragraph\fR. This is a paragraph that \fIis\fR inside a
+\&\*(L"=begin \fIidentifier\fR\*(R" ... \*(L"=end \fIidentifier\fR\*(R" sequence where
+\&\*(L"\fIidentifier\fR\*(R" does \fInot\fR begin with a literal colon (\*(L":\*(R"). In
+some sense, a data paragraph is not part of Pod at all (i.e.,
+effectively it's \*(L"out-of-band\*(R"), since it's not subject to most kinds
+of Pod parsing; but it is specified here, since Pod
+parsers need to be able to call an event for it, or store it in some
+form in a parse tree, or at least just parse \fIaround\fR it.
+.PP
+For example, consider the following paragraphs:
+.PP
+.Vb 1
+\& # <\- that\*(Aqs the 0th column
+\&
+\& =head1 Foo
+\&
+\& Stuff
+\&
+\& $foo\->bar
+\&
+\& =cut
+.Ve
+.PP
+Here, \*(L"=head1 Foo\*(R" and \*(L"=cut\*(R" are command paragraphs because the first
+line of each matches \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR. \*(L"\fI[space][space]\fR\f(CW$foo\fR\->bar\*(R"
+is a verbatim paragraph, because its first line starts with a literal
+whitespace character (and there's no \*(L"=begin\*(R"...\*(L"=end\*(R" region around).
+.PP
+The \*(L"=begin \fIidentifier\fR\*(R" ... \*(L"=end \fIidentifier\fR\*(R" commands stop
+paragraphs that they surround from being parsed as ordinary or verbatim
+paragraphs, if \fIidentifier\fR doesn't begin with a colon. This
+is discussed in detail in the section
+\&\*(L"About Data Paragraphs and \*(R"=begin/=end\*(L" Regions\*(R".
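+.PP
+The four paragraph types can be told apart with a few tests on the first
+line of each paragraph. The following Python sketch is illustrative only;
+\f(CW\*(C`split_paragraphs\*(C'\fR and \f(CW\*(C`classify\*(C'\fR are hypothetical names. As a simplifying
+assumption it looks only at the innermost open \f(CW\*(C`=begin\*(C'\fR region and does not
+check that \f(CW\*(C`=end\*(C'\fR identifiers match.

```python
import re

COMMAND = re.compile(r"\A=[a-zA-Z]")
VERBATIM = re.compile(r"\A[ \t]")


def split_paragraphs(lines):
    """Group consecutive non-blank lines into paragraphs (lists of lines)."""
    paragraphs, current = [], []
    for line in lines:
        if line.strip(" \t") == "":
            # A blank line ends the paragraph being collected, if any.
            if current:
                paragraphs.append(current)
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs


def classify(paragraphs):
    """Return one of 'command', 'verbatim', 'ordinary', 'data' per paragraph."""
    kinds = []
    regions = []  # identifiers of the open =begin regions, innermost last
    for para in paragraphs:
        first = para[0]
        if COMMAND.match(first):
            kinds.append("command")
            words = first.split()
            if words[0] == "=begin" and len(words) > 1:
                regions.append(words[1])
            elif words[0] == "=end" and regions:
                regions.pop()
        elif regions and not regions[-1].startswith(":"):
            # Inside a region whose identifier lacks a leading colon.
            kinds.append("data")
        elif VERBATIM.match(first):
            kinds.append("verbatim")
        else:
            kinds.append("ordinary")
    return kinds
```

+.PP
+Applied to the example above, the paragraphs come out as command, ordinary,
+verbatim, and command, in that order.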
+.SH "Pod Commands"
+.IX Header "Pod Commands"
+This section is intended to supplement and clarify the discussion in
+\&\*(L"Command Paragraph\*(R" in perlpod. These are the currently recognized
+Pod commands:
+.ie n .IP """=head1"", ""=head2"", ""=head3"", ""=head4"", ""=head5"", ""=head6""" 4
+.el .IP "``=head1'', ``=head2'', ``=head3'', ``=head4'', ``=head5'', ``=head6''" 4
+.IX Item "=head1, =head2, =head3, =head4, =head5, =head6"
+This command indicates that the text in the remainder of the paragraph
+is a heading. That text may contain formatting codes. Examples:
+.Sp
+.Vb 1
+\& =head1 Object Attributes
+\&
+\& =head3 What B<Not> to Do!
+.Ve
+.Sp
+Both \f(CW\*(C`=head5\*(C'\fR and \f(CW\*(C`=head6\*(C'\fR were added in 2020 and might not be
+supported by all Pod parsers. Pod::Simple 3.41, released in October
+2020, supports both, which makes them available to all
+Pod::Simple\-based Pod parsers.
+.ie n .IP """=pod""" 4
+.el .IP "``=pod''" 4
+.IX Item "=pod"
+This command indicates that this paragraph begins a Pod block. (If we
+are already in the middle of a Pod block, this command has no effect at
+all.) If there is any text in this command paragraph after \*(L"=pod\*(R",
+it must be ignored. Examples:
+.Sp
+.Vb 1
+\& =pod
+\&
+\& This is a plain Pod paragraph.
+\&
+\& =pod This text is ignored.
+.Ve
+.ie n .IP """=cut""" 4
+.el .IP "``=cut''" 4
+.IX Item "=cut"
+This command indicates that this line is the end of this previously
+started Pod block. If there is any text after \*(L"=cut\*(R" on the line, it must be
+ignored. Examples:
+.Sp
+.Vb 1
+\& =cut
+\&
+\& =cut The documentation ends here.
+\&
+\& =cut
+\& # This is the first line of program text.
+\& sub foo { # This is the second.
+.Ve
+.Sp
+It is an error to try to \fIstart\fR a Pod block with a \*(L"=cut\*(R" command. In
+that case, the Pod processor must halt parsing of the input file, and
+must by default emit a warning.
+.ie n .IP """=over""" 4
+.el .IP "``=over''" 4
+.IX Item "=over"
+This command indicates that this is the start of a list/indent
+region. If there is any text following the \*(L"=over\*(R", it must consist
+of only a nonzero positive numeral. The semantics of this numeral are
+explained in the \*(L"About =over...=back Regions\*(R" section, further
+below. Formatting codes are not expanded. Examples:
+.Sp
+.Vb 1
+\& =over 3
+\&
+\& =over 3.5
+\&
+\& =over
+.Ve
+.ie n .IP """=item""" 4
+.el .IP "``=item''" 4
+.IX Item "=item"
+This command indicates that an item in a list begins here. Formatting
+codes are processed. The semantics of the (optional) text in the
+remainder of this paragraph are
+explained in the \*(L"About =over...=back Regions\*(R" section, further
+below. Examples:
+.Sp
+.Vb 1
+\& =item
+\&
+\& =item *
+\&
+\& =item *
+\&
+\& =item 14
+\&
+\& =item 3.
+\&
+\& =item C<< $thing\->stuff(I<dodad>) >>
+\&
+\& =item For transporting us beyond seas to be tried for pretended
+\& offenses
+\&
+\& =item He is at this time transporting large armies of foreign
+\& mercenaries to complete the works of death, desolation and
+\& tyranny, already begun with circumstances of cruelty and perfidy
+\& scarcely paralleled in the most barbarous ages, and totally
+\& unworthy the head of a civilized nation.
+.Ve
+.ie n .IP """=back""" 4
+.el .IP "``=back''" 4
+.IX Item "=back"
+This command indicates that this is the end of the region begun
+by the most recent \*(L"=over\*(R" command. It permits no text after the
+\&\*(L"=back\*(R" command.
+.ie n .IP """=begin formatname""" 4
+.el .IP "``=begin formatname''" 4
+.IX Item "=begin formatname"
+.PD 0
+.ie n .IP """=begin formatname parameter""" 4
+.el .IP "``=begin formatname parameter''" 4
+.IX Item "=begin formatname parameter"
+.PD
+This marks the following paragraphs (until the matching \*(L"=end
+formatname\*(R") as being for some special kind of processing. Unless
+\&\*(L"formatname\*(R" begins with a colon, the contained non-command
+paragraphs are data paragraphs. But if \*(L"formatname\*(R" \fIdoes\fR begin
+with a colon, then non-command paragraphs are ordinary paragraphs
+or verbatim paragraphs. This is discussed in detail in the section
+\&\*(L"About Data Paragraphs and \*(R"=begin/=end\*(L" Regions\*(R".
+.Sp
+It is advised that formatnames match the regexp
+\&\f(CW\*(C`m/\eA:?[\-a\-zA\-Z0\-9_]+\ez/\*(C'\fR. Everything following whitespace after the
+formatname is a parameter that may be used by the formatter when dealing
+with this region. This parameter must not be repeated in the \*(L"=end\*(R"
+paragraph. Implementors should anticipate future expansion in the
+semantics and syntax of the first parameter to \*(L"=begin\*(R"/\*(L"=end\*(R"/\*(L"=for\*(R".
+.ie n .IP """=end formatname""" 4
+.el .IP "``=end formatname''" 4
+.IX Item "=end formatname"
+This marks the end of the region opened by the matching
+\&\*(L"=begin formatname\*(R" command. If \*(L"formatname\*(R" is not the formatname
+of the most recent open \*(L"=begin formatname\*(R" region, then this
+is an error, and must generate an error message. This
+is discussed in detail in the section
+\&\*(L"About Data Paragraphs and \*(R"=begin/=end\*(L" Regions\*(R".
+.ie n .IP """=for formatname text...""" 4
+.el .IP "``=for formatname text...''" 4
+.IX Item "=for formatname text..."
+This is synonymous with:
+.Sp
+.Vb 1
+\& =begin formatname
+\&
+\& text...
+\&
+\& =end formatname
+.Ve
+.Sp
+That is, it creates a region consisting of a single paragraph; that
+paragraph is to be treated as an ordinary paragraph if \*(L"formatname\*(R"
+begins with a \*(L":\*(R"; if \*(L"formatname\*(R" \fIdoesn't\fR begin with a colon,
+then \*(L"text...\*(R" will constitute a data paragraph. There is no way
+to use \*(L"=for formatname text...\*(R" to express \*(L"text...\*(R" as a verbatim
+paragraph.
+.ie n .IP """=encoding encodingname""" 4
+.el .IP "``=encoding encodingname''" 4
+.IX Item "=encoding encodingname"
+This command, which should occur early in the document (at least
+before any non-US-ASCII data!), declares that this document is
+encoded in the encoding \fIencodingname\fR, which must be
+an encoding name that Encode recognizes. (Encode's list
+of supported encodings, in Encode::Supported, is useful here.)
+If the Pod parser cannot decode the declared encoding, it
+should emit a warning and may abort parsing the document
+altogether.
+.Sp
+A document having more than one \*(L"=encoding\*(R" line should be
+considered an error. Pod processors may silently tolerate this if
+the not-first \*(L"=encoding\*(R" lines are just duplicates of the
+first one (e.g., if there's a \*(L"=encoding utf8\*(R" line, and later on
+another \*(L"=encoding utf8\*(R" line). But Pod processors should complain if
+there are contradictory \*(L"=encoding\*(R" lines in the same document
+(e.g., if there is a \*(L"=encoding utf8\*(R" early in the document and
+\&\*(L"=encoding big5\*(R" later). Pod processors that recognize BOMs
+may also complain if they see an \*(L"=encoding\*(R" line
+that contradicts the \s-1BOM\s0 (e.g., if a document with a \s-1UTF\-16LE
+BOM\s0 has an \*(L"=encoding shiftjis\*(R" line).
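+.Sp
+The duplicate-versus-contradiction rule for repeated \*(L"=encoding\*(R" lines
+can be sketched as below, in Python for illustration. The function name
+\f(CW\*(C`check_encoding_lines\*(C'\fR is hypothetical, and names are compared as exact
+strings, which is an assumption: a real processor might first normalize
+aliases of the same encoding.

```python
def check_encoding_lines(names):
    """Check the encoding names of all =encoding lines, in document order.

    Returns a list of complaint strings; exact duplicates of the first
    declaration are tolerated silently.
    """
    problems = []
    for name in names[1:]:
        if name != names[0]:
            problems.append(
                "=encoding %s contradicts earlier =encoding %s" % (name, names[0])
            )
    return problems
```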
+.PP
+If a Pod processor sees any command other than the ones listed
+above (like \*(L"=head\*(R", or \*(L"=haed1\*(R", or \*(L"=stuff\*(R", or \*(L"=cuttlefish\*(R",
+or \*(L"=w123\*(R"), that processor must by default treat this as an
+error. It must not process the paragraph beginning with that
+command, must by default warn of this as an error, and may
+abort the parse. A Pod parser may allow a way for particular
+applications to add to the above list of known commands, and to
+stipulate, for each additional command, whether formatting
+codes should be processed.
+.PP
+Future versions of this specification may add additional
+commands.
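+.PP
+As an illustration of how a processor might split a command paragraph and
+detect unknown commands, here is a small Python sketch. The names
+\f(CW\*(C`parse_command\*(C'\fR and \f(CW\*(C`extra_commands\*(C'\fR are hypothetical; the pattern is the
+one given earlier for the part of a command paragraph that precedes its
+content.

```python
import re

KNOWN_COMMANDS = {
    "head1", "head2", "head3", "head4", "head5", "head6",
    "pod", "cut", "over", "item", "back",
    "begin", "end", "for", "encoding",
}

# The command name and the whitespace that separates it from its content.
COMMAND_RE = re.compile(r"\A=([a-zA-Z]\S*)\s*")


def parse_command(paragraph, extra_commands=()):
    """Split a command paragraph into (name, content, is_known)."""
    match = COMMAND_RE.match(paragraph)
    if match is None:
        raise ValueError("not a command paragraph")
    name = match.group(1)
    content = paragraph[match.end():]
    # extra_commands models an application adding to the known list.
    is_known = name in KNOWN_COMMANDS or name in extra_commands
    return name, content, is_known
```

+.PP
+A caller would warn about, and skip, any paragraph for which the third
+value is false.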
+.SH "Pod Formatting Codes"
+.IX Header "Pod Formatting Codes"
+(Note that in previous drafts of this document and of perlpod,
+formatting codes were referred to as \*(L"interior sequences\*(R", and
+this term may still be found in the documentation for Pod parsers,
+and in error messages from Pod processors.)
+.PP
+There are two syntaxes for formatting codes:
+.IP "\(bu" 4
+A formatting code starts with a capital letter (just US-ASCII [A\-Z])
+followed by a \*(L"<\*(R", any number of characters, and ending with the first
+matching \*(L">\*(R". Examples:
+.Sp
+.Vb 1
+\& That\*(Aqs what I<you> think!
+\&
+\& What\*(Aqs C<CORE::dump()> for?
+\&
+\& X<C<chmod> and C<unlink()> Under Different Operating Systems>
+.Ve
+.IP "\(bu" 4
+A formatting code starts with a capital letter (just US-ASCII [A\-Z])
+followed by two or more \*(L"<\*(R"'s, one or more whitespace characters,
+any number of characters, one or more whitespace characters,
+and ending with the first matching sequence of two or more \*(L">\*(R"'s, where
+the number of \*(L">\*(R"'s equals the number of \*(L"<\*(R"'s in the opening of this
+formatting code. Examples:
+.Sp
+.Vb 1
+\& That\*(Aqs what I<< you >> think!
+\&
+\& C<<< open(X, ">>thing.dat") || die $! >>>
+\&
+\& B<< $foo\->bar(); >>
+.Ve
+.Sp
+With this syntax, the whitespace character(s) after the \*(L"C<<<\*(R"
+and before the \*(L">>>\*(R" (or whatever letter) are \fInot\fR renderable. They
+do not signify whitespace, but are merely part of the formatting codes
+themselves. That is, these are all synonymous:
+.Sp
+.Vb 7
+\& C<thing>
+\& C<< thing >>
+\& C<<           thing     >>
+\& C<<<   thing >>>
+\& C<<<<
+\& thing
+\&          >>>>
+.Ve
+.Sp
+and so on.
+.Sp
+Finally, the multiple-angle-bracket form does \fInot\fR alter the interpretation
+of nested formatting codes, meaning that the following four example lines are
+identical in meaning:
+.Sp
+.Vb 1
+\& B<example: C<$a E<lt>=E<gt> $b>>
+\&
+\& B<example: C<< $a <=> $b >>>
+\&
+\& B<example: C<< $a E<lt>=E<gt> $b >>>
+\&
+\& B<<< example: C<< $a E<lt>=E<gt> $b >> >>>
+.Ve
+.PP
+In parsing Pod, a notably tricky part is the correct parsing of
+(potentially nested!) formatting codes. Implementors should
+consult the code in the \f(CW\*(C`parse_text\*(C'\fR routine in Pod::Parser as an
+example of a correct implementation.
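+.PP
+To make the two syntaxes concrete, here is a small recursive parser sketched
+in Python. It is not the algorithm used by Pod::Parser or any other module,
+and the names \f(CW\*(C`parse_codes\*(C'\fR and \f(CW\*(C`_parse\*(C'\fR are hypothetical. It assumes the
+input is the text of a single paragraph, returns literal text as strings
+and each code as a (letter, children) pair, and closes an unterminated code
+at the end of the text without complaining.

```python
import re

# An opener is a capital letter followed either by two or more "<" that are
# themselves followed by whitespace, or by a single "<".
OPENER = re.compile(r"([A-Z])(<{2,}(?=\s)|<)")
WHITESPACE = re.compile(r"\s+")


def parse_codes(text):
    """Parse one paragraph's text into a list of nodes.

    A node is either a str (literal text) or a (letter, children) tuple.
    """
    nodes, _ = _parse(text, 0, 0)
    return nodes


def _parse(text, pos, width):
    # width is 0 at top level; otherwise it is the number of "<" characters
    # that opened the code whose content is being read.
    closer = re.compile(r"\s+>{%d}" % width) if width > 1 else None
    nodes, literal = [], []

    def flush():
        if literal:
            nodes.append("".join(literal))
            literal.clear()

    while pos < len(text):
        if width == 1 and text[pos] == ">":
            flush()
            return nodes, pos + 1
        if closer is not None:
            end = closer.match(text, pos)
            if end:
                flush()
                return nodes, end.end()
        opener = OPENER.match(text, pos)
        if opener:
            flush()
            inner_width = len(opener.group(2))
            start = opener.end()
            if inner_width > 1:
                # Whitespace after "X<<" belongs to the delimiter itself.
                start = WHITESPACE.match(text, start).end()
            children, pos = _parse(text, start, inner_width)
            nodes.append((opener.group(1), children))
            continue
        literal.append(text[pos])
        pos += 1
    # End of text: any code still open is closed here without a warning.
    flush()
    return nodes, pos
```

+.PP
+With this sketch, every spelling of the \f(CW\*(C`thing\*(C'\fR example above yields the same
+single C node, and a lone \*(L">\*(R" inside a multiple-bracket code is kept as
+literal text.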
+.ie n .IP """I<text>"" \*(-- italic text" 4
+.el .IP "\f(CWI<text>\fR \*(-- italic text" 4
+.IX Item "I<text> italic text"
+See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod.
+.ie n .IP """B<text>"" \*(-- bold text" 4
+.el .IP "\f(CWB<text>\fR \*(-- bold text" 4
+.IX Item "B<text> bold text"
+See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod.
+.ie n .IP """C<code>"" \*(-- code text" 4
+.el .IP "\f(CWC<code>\fR \*(-- code text" 4
+.IX Item "C<code> code text"
+See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod.
+.ie n .IP """F<filename>"" \*(-- style for filenames" 4
+.el .IP "\f(CWF<filename>\fR \*(-- style for filenames" 4
+.IX Item "F<filename> style for filenames"
+See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod.
+.ie n .IP """X<topic name>"" \*(-- an index entry" 4
+.el .IP "\f(CWX<topic name>\fR \*(-- an index entry" 4
+.IX Item "X<topic name> an index entry"
+See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod.
+.Sp
+This code is unusual in that most formatters completely discard
+this code and its content. Other formatters will render it with
+invisible codes that can be used in building an index of
+the current document.
+.ie n .IP """Z<>"" \*(-- a null (zero-effect) formatting code" 4
+.el .IP "\f(CWZ<>\fR \*(-- a null (zero-effect) formatting code" 4
+.IX Item "Z<> a null (zero-effect) formatting code"
+Discussed briefly in \*(L"Formatting Codes\*(R" in perlpod.
+.Sp
+This code is unusual in that it should have no content. That is,
+a processor may complain if it sees \f(CW\*(C`Z<potatoes>\*(C'\fR. Whether
+or not it complains, the \fIpotatoes\fR text should be ignored.
+.ie n .IP """L<name>"" \*(-- a hyperlink" 4
+.el .IP "\f(CWL<name>\fR \*(-- a hyperlink" 4
+.IX Item "L<name> a hyperlink"
+The complicated syntaxes of this code are discussed at length in
+\&\*(L"Formatting Codes\*(R" in perlpod, and implementation details are
+discussed below, in \*(L"About L<...> Codes\*(R". Parsing the
+contents of L<content> is tricky. Notably, the content has to be
+checked for whether it looks like a \s-1URL,\s0 or whether it has to be split
+on literal \*(L"|\*(R" and/or \*(L"/\*(R" (in the right order!), and so on,
+\&\fIbefore\fR E<...> codes are resolved.
+.ie n .IP """E<escape>"" \*(-- a character escape" 4
+.el .IP "\f(CWE<escape>\fR \*(-- a character escape" 4
+.IX Item "E<escape> a character escape"
+See \*(L"Formatting Codes\*(R" in perlpod, and several points in
+\&\*(L"Notes on Implementing Pod Processors\*(R".
+.ie n .IP """S<text>"" \*(-- text contains non-breaking spaces" 4
+.el .IP "\f(CWS<text>\fR \*(-- text contains non-breaking spaces" 4
+.IX Item "S<text> text contains non-breaking spaces"
+This formatting code is syntactically simple, but semantically
+complex. What it means is that each space in the printable
+content of this code signifies a non-breaking space.
+.Sp
+Consider:
+.Sp
+.Vb 1
+\& C<$x ? $y : $z>
+\&
+\& S<C<$x ? $y : $z>>
+.Ve
+.Sp
+Both signify the monospace (c[ode] style) text consisting of
+\&\*(L"$x\*(R", one space, \*(L"?\*(R", one space, \*(L"$y\*(R", one space, \*(L":\*(R", one space, \*(L"$z\*(R". The
+difference is that in the latter, with the S code, those spaces
+are not \*(L"normal\*(R" spaces, but instead are non-breaking spaces.
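+.Sp
+One way to picture the effect of the S code is as a pass over already
+parsed content that replaces each space in literal text with U+00A0. The
+Python sketch below is illustrative only: the function name \f(CW\*(C`nonbreaking\*(C'\fR
+and the tree shape (strings for text, (letter, children) pairs for nested
+codes) are assumptions, not part of this specification.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def nonbreaking(nodes):
    """Return a copy of nodes with every space in literal text made non-breaking."""
    result = []
    for node in nodes:
        if isinstance(node, str):
            result.append(node.replace(" ", NBSP))
        else:
            # A nested code keeps its letter; only its text content changes.
            letter, children = node
            result.append((letter, nonbreaking(children)))
    return result
```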
+.PP
+If a Pod processor sees any formatting code other than the ones
+listed above (as in \*(L"N<...>\*(R", or \*(L"Q<...>\*(R", etc.), that
+processor must by default treat this as an error.
+A Pod parser may allow a way for particular
+applications to add to the above list of known formatting codes;
+a Pod parser might even allow a way to stipulate, for each additional
+formatting code, whether it requires some form of special processing, as
+L<...> does.
+.PP
+Future versions of this specification may add additional
+formatting codes.
+.PP
+Historical note: A few older Pod processors would not see a \*(L">\*(R" as
+closing a \*(L"C<\*(R" code, if the \*(L">\*(R" was immediately preceded by
+a \*(L"\-\*(R". This was so that this:
+.PP
+.Vb 1
+\& C<$foo\->bar>
+.Ve
+.PP
+would parse as equivalent to this:
+.PP
+.Vb 1
+\& C<$foo\-E<gt>bar>
+.Ve
+.PP
+instead of as equivalent to a \*(L"C\*(R" formatting code containing
+only \*(L"$foo\-\*(R", and then a \*(L"bar>\*(R" outside the \*(L"C\*(R" formatting code. This
+problem has since been solved by the addition of syntaxes like this:
+.PP
+.Vb 1
+\& C<< $foo\->bar >>
+.Ve
+.PP
+Compliant parsers must not treat \*(L"\->\*(R" as special.
+.PP
+Formatting codes absolutely cannot span paragraphs. If a code is
+opened in one paragraph, and no closing code is found by the end of
+that paragraph, the Pod parser must close that formatting code,
+and should complain (as in \*(L"Unterminated I code in the paragraph
+starting at line 123: 'Time objects are not...'\*(R"). So these
+two paragraphs:
+.PP
+.Vb 1
+\& I<I told you not to do this!
+\&
+\& Don\*(Aqt make me say it again!>
+.Ve
+.PP
+\&...must \fInot\fR be parsed as two paragraphs in italics (with the I
+code starting in one paragraph and ending in another). Instead,
+the first paragraph should generate a warning, but that aside, the
+above code must parse as if it were:
+.PP
+.Vb 1
+\& I<I told you not to do this!>
+\&
+\& Don\*(Aqt make me say it again!E<gt>
+.Ve
+.PP
+(In SGMLish jargon, all Pod commands are like block-level
+elements, whereas all Pod formatting codes are like inline-level
+elements.)
+.SH "Notes on Implementing Pod Processors"
+.IX Header "Notes on Implementing Pod Processors"
+The following is a long section of miscellaneous requirements
+and suggestions to do with Pod processing.
+.IP "\(bu" 4
+Pod formatters should tolerate lines in verbatim blocks that are of
+any length, even if that means having to break them (possibly several
+times, for very long lines) to avoid text running off the side of the
+page. Pod formatters may warn of such line-breaking. Such warnings
+are particularly appropriate for lines that are over 100 characters long, which
+are usually not intentional.
+.IP "\(bu" 4
+Pod parsers must recognize \fIall\fR of the three well-known newline
+formats: \s-1CR, LF,\s0 and \s-1CRLF.\s0 See perlport.
+.IP "\(bu" 4
+Pod parsers should accept input lines that are of any length.
+.IP "\(bu" 4
+Since Perl recognizes a Unicode Byte Order Mark at the start of files
+as signaling that the file is Unicode encoded as in \s-1UTF\-16\s0 (whether
+big-endian or little-endian) or \s-1UTF\-8,\s0 Pod parsers should do the
+same. Otherwise, the character encoding should be understood as
+being \s-1UTF\-8\s0 if the first highbit byte sequence in the file seems
+valid as a \s-1UTF\-8\s0 sequence, or otherwise as \s-1CP\-1252\s0 (earlier versions of
+this specification used Latin\-1 instead of \s-1CP\-1252\s0).
+.Sp
+Future versions of this specification may specify
+how Pod can accept other encodings. Presumably treatment of other
+encodings in Pod parsing would be as in \s-1XML\s0 parsing: whatever the
+encoding declared by a particular Pod file, content is to be
+stored in memory as Unicode characters.
+.IP "\(bu" 4
+The well known Unicode Byte Order Marks are as follows: if the
+file begins with the two literal byte values 0xFE 0xFF, this is
+the \s-1BOM\s0 for big-endian \s-1UTF\-16.\s0 If the file begins with the two
+literal byte values 0xFF 0xFE, this is the \s-1BOM\s0 for little-endian
+\&\s-1UTF\-16.\s0 On an \s-1ASCII\s0 platform, if the file begins with the three literal
+byte values
+0xEF 0xBB 0xBF, this is the \s-1BOM\s0 for \s-1UTF\-8.
+A\s0 mechanism portable to \s-1EBCDIC\s0 platforms is to:
+.Sp
+.Vb 2
+\& my $utf8_bom = "\ex{FEFF}";
+\& utf8::encode($utf8_bom);
+.Ve
+.IP "\(bu" 4
+A naive, but often sufficient heuristic on \s-1ASCII\s0 platforms, for testing
+the first highbit
+byte-sequence in a BOM-less file (whether in code or in Pod!), to see
+whether that sequence is valid as \s-1UTF\-8\s0 (\s-1RFC 2279\s0) is to check whether
+the first byte in the sequence is in the range 0xC2 \- 0xFD
+\&\fIand\fR whether the next byte is in the range
+0x80 \- 0xBF. If so, the parser may conclude that this file is in
+\&\s-1UTF\-8,\s0 and all highbit sequences in the file should be assumed to
+be \s-1UTF\-8.\s0 Otherwise the parser should treat the file as being
+in \s-1CP\-1252.\s0 (A better check, which also works on \s-1EBCDIC\s0 platforms,
+is to pass a copy of the sequence to
+\&\fButf8::decode()\fR which performs a full validity check on the
+sequence and returns \s-1TRUE\s0 if it is valid \s-1UTF\-8, FALSE\s0 otherwise. This
+function is always pre-loaded, is fast because it is written in C, and
+will only get called at most once, so you don't need to avoid it out of
+performance concerns.)
+In the unlikely circumstance that the first highbit
+sequence in a truly non\-UTF\-8 file happens to appear to be \s-1UTF\-8,\s0 one
+can cater to our heuristic (as well as any more intelligent heuristic)
+by prefacing that line with a comment line containing a highbit
+sequence that is clearly \fInot\fR valid as \s-1UTF\-8.\s0 A line consisting
+of simply \*(L"#\*(R", an e\-acute, and any non-highbit byte,
+is sufficient to establish this file's encoding.
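+.Sp
+The \s-1BOM\s0 test and the naive heuristic can be combined as in the following
+Python sketch. It is an illustration for \s-1ASCII\s0 platforms only, the
+function name \f(CW\*(C`guess_encoding\*(C'\fR is hypothetical, and the \*(L"US-ASCII\*(R" result for
+input without any highbit byte is an assumption, since the text above does
+not cover that case.

```python
def guess_encoding(data):
    """Guess the encoding of raw file bytes from a BOM or the first highbit byte."""
    if data.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if data.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    for index, byte in enumerate(data):
        if byte >= 0x80:
            # Only the first highbit byte and the byte after it are examined.
            following = data[index + 1] if index + 1 < len(data) else None
            if (0xC2 <= byte <= 0xFD
                    and following is not None
                    and 0x80 <= following <= 0xBF):
                return "UTF-8"
            return "CP-1252"
    # Assumption: with no highbit bytes at all, the question never arises.
    return "US-ASCII"
```

+.Sp
+A comment line holding \*(L"#\*(R", a \s-1CP\-1252\s0 e\-acute (byte 0xE9), and a space makes
+this function report \s-1CP\-1252,\s0 as described above.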
+.IP "\(bu" 4
+Pod processors must treat a \*(L"=for [label] [content...]\*(R" paragraph as
+meaning the same thing as a \*(L"=begin [label]\*(R" paragraph, content, and
+an \*(L"=end [label]\*(R" paragraph. (The parser may conflate these two
+constructs, or may leave them distinct, in the expectation that the
+formatter will nevertheless treat them the same.)
+.IP "\(bu" 4
+When rendering Pod to a format that allows comments (i.e., to nearly
+any format other than plaintext), a Pod formatter must insert comment
+text identifying its name and version number, and the name and
+version numbers of any modules it might be using to process the Pod.
+Minimal examples:
+.Sp
+.Vb 1
+\& %% POD::Pod2PS v3.14159, using POD::Parser v1.92
+\&
+\& <!\-\- Pod::HTML v3.14159, using POD::Parser v1.92 \-\->
+\&
+\& {\edoccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}
+\&
+\& .\e" Pod::Man version 3.14159, using POD::Parser version 1.92
+.Ve
+.Sp
+Formatters may also insert additional comments, including: the
+release date of the Pod formatter program, the contact address for
+the author(s) of the formatter, the current time, the name of the input
+file, the formatting options in effect, the version of Perl used, etc.
+.Sp
+Formatters may also choose to note errors/warnings as comments,
+besides or instead of emitting them otherwise (as in messages to
+\&\s-1STDERR,\s0 or \f(CW\*(C`die\*(C'\fRing).
+.IP "\(bu" 4
+Pod parsers \fImay\fR emit warnings or error messages (\*(L"Unknown E code
+E<zslig>!\*(R") to \s-1STDERR\s0 (whether through printing to \s-1STDERR,\s0 or
+\&\f(CW\*(C`warn\*(C'\fRing/\f(CW\*(C`carp\*(C'\fRing, or \f(CW\*(C`die\*(C'\fRing/\f(CW\*(C`croak\*(C'\fRing), but \fImust\fR allow
+suppressing all such \s-1STDERR\s0 output, and instead allow an option for
+reporting errors/warnings
+in some other way, whether by triggering a callback, or noting errors
+in some attribute of the document object, or some similarly unobtrusive
+mechanism \*(-- or even by appending a \*(L"Pod Errors\*(R" section to the end of
+the parsed form of the document.
+.IP "\(bu" 4
+In cases of exceptionally aberrant documents, Pod parsers may abort the
+parse. Even then, using \f(CW\*(C`die\*(C'\fRing/\f(CW\*(C`croak\*(C'\fRing is to be avoided; where
+possible, the parser library may simply close the input file
+and add text like \*(L"*** Formatting Aborted ***\*(R" to the end of the
+(partial) in-memory document.
+.IP "\(bu" 4
+In paragraphs where formatting codes (like E<...>, B<...>)
+are understood (i.e., \fInot\fR verbatim paragraphs, but \fIincluding\fR
+ordinary paragraphs, and command paragraphs that produce renderable
+text, like \*(L"=head1\*(R"), literal whitespace should generally be considered
+\&\*(L"insignificant\*(R", in that one literal space has the same meaning as any
+(nonzero) number of literal spaces, literal newlines, and literal tabs
+(as long as this produces no blank lines, since those would terminate
+the paragraph). Pod parsers should compact literal whitespace in each
+processed paragraph, but may provide an option for overriding this
+(since some processing tasks do not require it), or may follow
+additional special rules (for example, specially treating
+period-space-space or period-newline sequences).
+.IP "\(bu" 4
+Pod parsers should not, by default, try to coerce apostrophe (') and
+quote (") into smart quotes (little 9's, 66's, 99's, etc.), nor try to
+turn backtick (`) into anything else but a single backtick character
+(distinct from an open quote character!), nor \*(L"\-\-\*(R" into anything but
+two minus signs. They \fImust never\fR do any of those things to text
+in C<...> formatting codes, and never \fIever\fR to text in verbatim
+paragraphs.
+.IP "\(bu" 4
+When rendering Pod to a format that has two kinds of hyphens (\-), one
+that's a non-breaking hyphen, and another that's a breakable hyphen
+(as in \*(L"object-oriented\*(R", which can be split across lines as
+\&\*(L"object\-\*(R", newline, \*(L"oriented\*(R"), formatters are encouraged to
+generally translate \*(L"\-\*(R" to non-breaking hyphen, but may apply
+heuristics to convert some of these to breaking hyphens.
+.IP "\(bu" 4
+Pod formatters should make reasonable efforts to keep words of Perl
+code from being broken across lines. For example, \*(L"Foo::Bar\*(R" in some
+formatting systems is seen as eligible for being broken across lines
+as \*(L"Foo::\*(R" newline \*(L"Bar\*(R" or even \*(L"Foo::\-\*(R" newline \*(L"Bar\*(R". This should
+be avoided where possible, either by disabling all line-breaking in
+mid-word, or by wrapping particular words with internal punctuation
+in \*(L"don't break this across lines\*(R" codes (which in some formats may
+not be a single code, but might be a matter of inserting non-breaking
+zero-width spaces between every pair of characters in a word).
+.IP "\(bu" 4
+Pod parsers should, by default, expand tabs in verbatim paragraphs as
+they are processed, before passing them to the formatter or other
+processor. Parsers may also allow an option for overriding this.
+.IP "\(bu" 4
+Pod parsers should, by default, remove newlines from the end of
+ordinary and verbatim paragraphs before passing them to the
+formatter. For example, while the paragraph you're reading now
+could be considered, in Pod source, to end with (and contain)
+the newline(s) that end it, it should be processed as ending with
+(and containing) the period character that ends this sentence.
+.IP "\(bu" 4
+Pod parsers, when reporting errors, should make some effort to report
+an approximate line number (\*(L"Nested E<>'s in Paragraph #52, near
+line 633 of Thing/Foo.pm!\*(R"), instead of merely noting the paragraph
+number (\*(L"Nested E<>'s in Paragraph #52 of Thing/Foo.pm!\*(R"). Where
+this is problematic, the paragraph number should at least be
+accompanied by an excerpt from the paragraph (\*(L"Nested E<>'s in
+Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
+the C<interest rate> attribute...'\*(R").
+.IP "\(bu" 4
+Pod parsers, when processing a series of verbatim paragraphs one
+after another, should consider them to be one large verbatim
+paragraph that happens to contain blank lines. I.e., these two
+lines, which have a blank line between them:
+.Sp
+.Vb 1
+\& use Foo;
+\&
+\& print Foo\->VERSION
+.Ve
+.Sp
+should be unified into one paragraph (\*(L"\etuse Foo;\en\en\etprint
+Foo\->\s-1VERSION\*(R"\s0) before being passed to the formatter or other
+processor. Parsers may also allow an option for overriding this.
+.Sp
+While this might be too cumbersome to implement in event-based Pod
+parsers, it is straightforward for parsers that return parse trees.
+.IP "\(bu" 4
+Pod formatters, where feasible, are advised to avoid splitting short
+verbatim paragraphs (under twelve lines, say) across pages.
+.IP "\(bu" 4
+Pod parsers must treat a line with only spaces and/or tabs on it as a
+\&\*(L"blank line\*(R" such as separates paragraphs. (Some older parsers
+recognized only two adjacent newlines as a \*(L"blank line\*(R" but would not
+recognize a newline, a space, and a newline, as a blank line. This
+is noncompliant behavior.)
+.IP "\(bu" 4
+Authors of Pod formatters/processors should make every effort to
+avoid writing their own Pod parser. There are already several in
+\&\s-1CPAN,\s0 with a wide range of interface styles \*(-- and one of them,
+Pod::Simple, comes with modern versions of Perl.
+.IP "\(bu" 4
+Characters in Pod documents may be conveyed either as literals, or by
+number in E<n> codes, or by an equivalent mnemonic, as in
+E<eacute> which is exactly equivalent to E<233>. The numbers
+are the Latin1/Unicode values, even on \s-1EBCDIC\s0 platforms.
+.Sp
+When referring to characters by using a E<n> numeric code, numbers
+in the range 32\-126 refer to those well known US-ASCII characters (also
+defined there by Unicode, with the same meaning), which all Pod
+formatters must render faithfully. Characters whose E<> numbers
+are in the ranges 0\-31 and 127\-159 should not be used (neither as
+literals,
+nor as E<number> codes), except for the literal byte-sequences for
+newline (\s-1ASCII 13, ASCII 13 10,\s0 or \s-1ASCII 10\s0), and tab (\s-1ASCII 9\s0).
+.Sp
+Numbers in the range 160\-255 refer to Latin\-1 characters (also
+defined there by Unicode, with the same meaning). Numbers above
+255 should be understood to refer to Unicode characters.
+.IP "\(bu" 4
+Be warned
+that some formatters cannot reliably render characters outside 32\-126;
+and many are able to handle 32\-126 and 160\-255, but nothing above
+255.
+.IP "\(bu" 4
+Besides the well-known \*(L"E<lt>\*(R" and \*(L"E<gt>\*(R" codes for
+less-than and greater-than, Pod parsers must understand \*(L"E<sol>\*(R"
+for \*(L"/\*(R" (solidus, slash), and \*(L"E<verbar>\*(R" for \*(L"|\*(R" (vertical bar,
+pipe). Pod parsers should also understand \*(L"E<lchevron>\*(R" and
+\&\*(L"E<rchevron>\*(R" as legacy codes for characters 171 and 187, i.e.,
+\&\*(L"left-pointing double angle quotation mark\*(R" = \*(L"left pointing
+guillemet\*(R" and \*(L"right-pointing double angle quotation mark\*(R" = \*(L"right
+pointing guillemet\*(R". (These look like little \*(L"<<\*(R" and \*(L">>\*(R", and they
+are now preferably expressed with the \s-1HTML/XHTML\s0 codes \*(L"E<laquo>\*(R"
+and \*(L"E<raquo>\*(R".)
+.IP "\(bu" 4
+Pod parsers should understand all \*(L"E<html>\*(R" codes as defined
+in the entity declarations in the most recent \s-1XHTML\s0 specification at
+\&\f(CW\*(C`www.W3.org\*(C'\fR. Pod parsers must understand at least the entities
+that define characters in the range 160\-255 (Latin\-1). Pod parsers,
+when faced with some unknown \*(L"E<\fIidentifier\fR>\*(R" code,
+shouldn't simply replace it with nullstring (by default, at least),
+but may pass it through as a string consisting of the literal characters
+E, less-than, \fIidentifier\fR, greater-than. Or Pod parsers may offer the
+alternative option of processing such unknown
+\&\*(L"E<\fIidentifier\fR>\*(R" codes by firing an event especially
+for such codes, or by adding a special node-type to the in-memory
+document tree. Such \*(L"E<\fIidentifier\fR>\*(R" may have special meaning
+to some processors, or some processors may choose to add them to
+a special error report.
+.IP "\(bu" 4
+Pod parsers must also support the \s-1XHTML\s0 codes \*(L"E<quot>\*(R" for
+character 34 (doublequote, "), \*(L"E<amp>\*(R" for character 38
+(ampersand, &), and \*(L"E<apos>\*(R" for character 39 (apostrophe, ').
+.IP "\(bu" 4
+Note that in all cases of \*(L"E<whatever>\*(R", \fIwhatever\fR (whether
+an htmlname, or a number in any base) must consist only of
+alphanumeric characters \*(-- that is, \fIwhatever\fR must match
+\&\f(CW\*(C`m/\eA\ew+\ez/\*(C'\fR. So \*(L"E< 0 1 2 3 >\*(R" is invalid, because
+it contains spaces, which aren't alphanumeric characters. This
+presumably does not \fIneed\fR special treatment by a Pod processor;
+\&\*(L" 0 1 2 3 \*(R" doesn't look like a number in any base, so it would
+presumably be looked up in the table of HTML-like names. Since
+there isn't (and cannot be) an HTML-like entity called \*(L" 0 1 2 3 \*(R",
+this will be treated as an error. However, Pod processors may
+treat \*(L"E< 0 1 2 3 >\*(R" or \*(L"E<e\-acute>\*(R" as \fIsyntactically\fR
+invalid, potentially earning a different error message than the
+error message (or warning, or event) generated by a merely unknown
+(but theoretically valid) htmlname, as in \*(L"E<qacute>\*(R"
+[sic]. However, Pod parsers are not required to make this
+distinction.
+.IP "\(bu" 4
+Note that E<number> \fImust not\fR be interpreted as simply
+\&\*(L"codepoint \fInumber\fR in the current/native character set\*(R". It always
+means only \*(L"the character represented by codepoint \fInumber\fR in
+Unicode.\*(R" (This is identical to the semantics of &#\fInumber\fR; in \s-1XML.\s0)
+.Sp
+This will likely require many formatters to have tables mapping from
+treatable Unicode codepoints (such as the \*(L"\exE9\*(R" for the e\-acute
+character) to the escape sequences or codes necessary for conveying
+such sequences in the target output format. A converter to *roff
+would, for example, know that \*(L"\exE9\*(R" (whether conveyed literally, or via
+a E<...> sequence) is to be conveyed as \*(L"e\e\e*'\*(R".
+Similarly, a program rendering Pod in a Mac \s-1OS\s0 application window would
+presumably need to know that \*(L"\exE9\*(R" maps to codepoint 142 in MacRoman
+encoding that (at time of writing) is native for Mac \s-1OS.\s0 Such
+Unicode2whatever mappings are presumably already widely available for
+common output formats. (Such mappings may be incomplete! Implementers
+are not expected to bend over backwards in an attempt to render
+Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
+of the other weird things that Unicode can encode.) And
+if a Pod document uses a character not found in such a mapping, the
+formatter should consider it an unrenderable character.
+.IP "\(bu" 4
+If, surprisingly, the implementor of a Pod formatter can't find a
+satisfactory pre-existing table mapping from Unicode characters to
+escapes in the target format (e.g., a decent table of Unicode
+characters to *roff escapes), it will be necessary to build such a
+table. If you are in this circumstance, you should begin with the
+characters in the range 0x00A0 \- 0x00FF, which is mostly the heavily
+used accented characters. Then proceed (as patience permits and
+fastidiousness compels) through the characters that the (X)HTML
+standards groups judged important enough to merit mnemonics
+for. These are declared in the (X)HTML specifications at the
+www.W3.org site. At time of writing (September 2001), the most recent
+entity declaration files are:
+.Sp
+.Vb 3
+\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-lat1.ent
+\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-special.ent
+\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-symbol.ent
+.Ve
+.Sp
+Then you can progress through any remaining notable Unicode characters
+in the range 0x2000\-0x204D (consult the character tables at
+www.unicode.org), and whatever else strikes your fancy. For example,
+in \fIxhtml\-symbol.ent\fR, there is the entry:
+.Sp
+.Vb 1
+\& <!ENTITY infin "&#8734;"> <!\-\- infinity, U+221E ISOtech \-\->
+.Ve
+.Sp
+While the mapping of \*(L"infin\*(R" to the character \*(L"\ex{221E}\*(R" will (hopefully)
+have been already handled by the Pod parser, the presence of the
+character in this file means that it's important enough to
+include in a formatter's table that maps from notable Unicode characters
+to the codes necessary for rendering them. So for a Unicode\-to\-*roff
+mapping, for example, this would merit the entry:
+.Sp
+.Vb 1
+\& "\ex{221E}" => \*(Aq\e(in\*(Aq,
+.Ve
+.Sp
+It is eagerly hoped that in the future, increasing numbers of formats
+(and formatters) will support Unicode characters directly (as (X)HTML
+does with \f(CW\*(C`&infin;\*(C'\fR, \f(CW\*(C`&#8734;\*(C'\fR, or \f(CW\*(C`&#x221E;\*(C'\fR), reducing the need
+for idiosyncratic mappings of Unicode\-to\-\fImy_escapes\fR.
+.IP "\(bu" 4
+It is up to individual Pod formatters to display good judgement when
+confronted with an unrenderable character (which is distinct from an
+unknown E<thing> sequence that the parser couldn't resolve to
+anything, renderable or not). It is good practice to map Latin letters
+with diacritics (like \*(L"E<eacute>\*(R"/\*(L"E<233>\*(R") to the corresponding
+unaccented US-ASCII letters (like a simple character 101, \*(L"e\*(R"), but
+clearly this is often not feasible, and an unrenderable character may
+be represented as \*(L"?\*(R", or the like. In attempting a sane fallback
+(as from E<233> to \*(L"e\*(R"), Pod formatters may use the
+\&\f(CW%Latin1Code_to_fallback\fR table in Pod::Escapes, or
+Text::Unidecode, if available.
+.Sp
+For example, this Pod text:
+.Sp
+.Vb 1
+\& magic is enabled if you set C<$Currency> to \*(AqE<euro>\*(Aq.
+.Ve
+.Sp
+may be rendered as:
+\&\*(L"magic is enabled if you set \f(CW$Currency\fR to '\fI?\fR'\*(R" or as
+\&\*(L"magic is enabled if you set \f(CW$Currency\fR to '\fB[euro]\fR'\*(R", or as
+\&\*(L"magic is enabled if you set \f(CW$Currency\fR to '[x20AC]'\*(R", etc.
+.Sp
+A Pod formatter may also note, in a comment or warning, a list of what
+unrenderable characters were encountered.
+.IP "\(bu" 4
+E<...> may freely appear in any formatting code (other than
+in another E<...> or in a Z<>). That is, \*(L"X<The
+E<euro>1,000,000 Solution>\*(R" is valid, as is \*(L"L<The
+E<euro>1,000,000 Solution|Million::Euros>\*(R".
+.IP "\(bu" 4
+Some Pod formatters output to formats that implement non-breaking
+spaces as an individual character (which I'll call \*(L"\s-1NBSP\*(R"\s0), and
+others output to formats that implement non-breaking spaces just as
+spaces wrapped in a \*(L"don't break this across lines\*(R" code. Note that
+at the level of Pod, both sorts of codes can occur: Pod can contain a
+\&\s-1NBSP\s0 character (whether as a literal, or as a \*(L"E<160>\*(R" or
+\&\*(L"E<nbsp>\*(R" code); and Pod can contain \*(L"S<foo
+I<bar> baz>\*(R" codes, where \*(L"mere spaces\*(R" (character 32) in
+such codes are taken to represent non-breaking spaces. Pod
+parsers should consider supporting the optional parsing of \*(L"S<foo
+I<bar> baz>\*(R" as if it were
+"foo\fI\s-1NBSP\s0\fRI<bar>\fI\s-1NBSP\s0\fRbaz", and, going the other way, the
+optional parsing of groups of words joined by \s-1NBSP\s0's as if each group
+were in a S<...> code, so that formatters may use the
+representation that maps best to what the output format demands.
+.IP "\(bu" 4
+Some processors may find that the \f(CW\*(C`S<...>\*(C'\fR code is easiest to
+implement by replacing each space in the parse tree under the content
+of the S, with an \s-1NBSP.\s0 But note: the replacement should apply \fInot\fR to
+spaces in \fIall\fR text, but \fIonly\fR to spaces in \fIprintable\fR text. (This
+distinction may or may not be evident in the particular tree/event
+model implemented by the Pod parser.) For example, consider this
+unusual case:
+.Sp
+.Vb 1
+\& S<L</Autoloaded Functions>>
+.Ve
+.Sp
+This means that the space in the middle of the visible link text must
+not be broken across lines. In other words, it's the same as this:
+.Sp
+.Vb 1
+\& L<"AutoloadedE<160>Functions"/Autoloaded Functions>
+.Ve
+.Sp
+However, a misapplied space-to-NBSP replacement could (wrongly)
+produce something equivalent to this:
+.Sp
+.Vb 1
+\& L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>
+.Ve
+.Sp
+\&...which is almost definitely not going to work as a hyperlink (assuming
+this formatter outputs a format supporting hypertext).
+.Sp
+Formatters may choose to just not support the S format code,
+especially in cases where the output format simply has no \s-1NBSP\s0
+character/code and no code for \*(L"don't break this stuff across lines\*(R".
+.IP "\(bu" 4
+Besides the \s-1NBSP\s0 character discussed above, implementors are reminded
+of the existence of the other \*(L"special\*(R" character in Latin\-1, the
+\&\*(L"soft hyphen\*(R" character, also known as \*(L"discretionary hyphen\*(R",
+i.e. \f(CW\*(C`E<173>\*(C'\fR = \f(CW\*(C`E<0xAD>\*(C'\fR =
+\&\f(CW\*(C`E<shy>\*(C'\fR. This character expresses an optional hyphenation
+point. That is, it normally renders as nothing, but may render as a
+\&\*(L"\-\*(R" if a formatter breaks the word at that point. Pod formatters
+should, as appropriate, do one of the following: 1) render this with
+a code with the same meaning (e.g., \*(L"\e\-\*(R" in \s-1RTF\s0), 2) pass it through
+in the expectation that the formatter understands this character as
+such, or 3) delete it.
+.Sp
+For example:
+.Sp
+.Vb 3
+\& sigE<shy>action
+\& manuE<shy>script
+\& JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi
+.Ve
+.Sp
+These signal to a formatter that if it is to hyphenate \*(L"sigaction\*(R"
+or \*(L"manuscript\*(R", then it should be done as
+\&\*(L"sig\-\fI[linebreak]\fRaction\*(R" or \*(L"manu\-\fI[linebreak]\fRscript\*(R"
+(and if it doesn't hyphenate it, then the \f(CW\*(C`E<shy>\*(C'\fR doesn't
+show up at all). And if it is
+to hyphenate \*(L"Jarkko\*(R" and/or \*(L"Hietaniemi\*(R", it can do
+so only at the points where there is a \f(CW\*(C`E<shy>\*(C'\fR code.
+.Sp
+In practice, it is anticipated that this character will not be used
+often, but formatters should either support it, or delete it.
+.IP "\(bu" 4
+If you think that you want to add a new command to Pod (like, say, a
+\&\*(L"=biblio\*(R" command), consider whether you could get the same
+effect with a for or begin/end sequence: \*(L"=for biblio ...\*(R" or \*(L"=begin
+biblio\*(R" ... \*(L"=end biblio\*(R". Pod processors that don't understand
+\&\*(L"=for biblio\*(R", etc, will simply ignore it, whereas they may complain
+loudly if they see \*(L"=biblio\*(R".
+.IP "\(bu" 4
+Throughout this document, \*(L"Pod\*(R" has been the preferred spelling for
+the name of the documentation format. One may also use \*(L"\s-1POD\*(R"\s0 or
+\&\*(L"pod\*(R". For the documentation that is (typically) in the Pod
+format, you may use \*(L"pod\*(R", or \*(L"Pod\*(R", or \*(L"\s-1POD\*(R".\s0 Understanding these
+distinctions is useful; but obsessing over how to spell them usually
+is not.
+.SH "About L<...> Codes"
+.IX Header "About L<...> Codes"
+As you can tell from a glance at perlpod, the L<...>
+code is the most complex of the Pod formatting codes. The points below
+will hopefully clarify what it means and how processors should deal
+with it.
+.IP "\(bu" 4
+In parsing an L<...> code, Pod parsers must distinguish at least
+four attributes:
+.RS 4
+.IP "First:" 4
+.IX Item "First:"
+The link-text. If there is none, this must be \f(CW\*(C`undef\*(C'\fR. (E.g., in
+\&\*(L"L<Perl Functions|perlfunc>\*(R", the link-text is \*(L"Perl Functions\*(R".
+In \*(L"L<Time::HiRes>\*(R" and even \*(L"L<|Time::HiRes>\*(R", there is no
+link text. Note that link text may contain formatting.)
+.IP "Second:" 4
+.IX Item "Second:"
+The possibly inferred link-text; i.e., if there was no real link
+text, then this is the text that we'll infer in its place. (E.g., for
+\&\*(L"L<Getopt::Std>\*(R", the inferred link text is \*(L"Getopt::Std\*(R".)
+.IP "Third:" 4
+.IX Item "Third:"
+The name or \s-1URL,\s0 or \f(CW\*(C`undef\*(C'\fR if none. (E.g., in \*(L"L<Perl
+Functions|perlfunc>\*(R", the name (also sometimes called the page)
+is \*(L"perlfunc\*(R". In \*(L"L</CAVEATS>\*(R", the name is \f(CW\*(C`undef\*(C'\fR.)
+.IP "Fourth:" 4
+.IX Item "Fourth:"
+The section (\s-1AKA\s0 \*(L"item\*(R" in older perlpods), or \f(CW\*(C`undef\*(C'\fR if none. E.g.,
+in \*(L"L<Getopt::Std/DESCRIPTION>\*(R", \*(L"\s-1DESCRIPTION\*(R"\s0 is the section. (Note
+that this is not the same as a manpage section like the \*(L"5\*(R" in \*(L"man 5
+crontab\*(R". \*(L"Section Foo\*(R" in the Pod sense means the part of the text
+that's introduced by the heading or item whose text is \*(L"Foo\*(R".)
+.RE
+.RS 4
+.Sp
+Pod parsers may also note additional attributes including:
+.IP "Fifth:" 4
+.IX Item "Fifth:"
+A flag for whether item 3 (if present) is a \s-1URL\s0 (like
+\&\*(L"http://lists.perl.org\*(R" is), in which case there should be no section
+attribute; a Pod name (like \*(L"perldoc\*(R" and \*(L"Getopt::Std\*(R" are); or
+possibly a man page name (like \*(L"\fBcrontab\fR\|(5)\*(R" is).
+.IP "Sixth:" 4
+.IX Item "Sixth:"
+The raw original L<...> content, before text is split on
+\&\*(L"|\*(R", \*(L"/\*(R", etc, and before E<...> codes are expanded.
+.RE
+.RS 4
+.Sp
+(The above were numbered only for concise reference below. It is not
+a requirement that these be passed as an actual list or array.)
+.Sp
+For example:
+.Sp
+.Vb 7
+\& L<Foo::Bar>
+\& => undef, # link text
+\& "Foo::Bar", # possibly inferred link text
+\& "Foo::Bar", # name
+\& undef, # section
+\& \*(Aqpod\*(Aq, # what sort of link
+\& "Foo::Bar" # original content
+\&
+\& L<Perlport\*(Aqs section on NL\*(Aqs|perlport/Newlines>
+\& => "Perlport\*(Aqs section on NL\*(Aqs", # link text
+\& "Perlport\*(Aqs section on NL\*(Aqs", # possibly inferred link text
+\& "perlport", # name
+\& "Newlines", # section
+\& \*(Aqpod\*(Aq, # what sort of link
+\& "Perlport\*(Aqs section on NL\*(Aqs|perlport/Newlines"
+\& # original content
+\&
+\& L<perlport/Newlines>
+\& => undef, # link text
+\& \*(Aq"Newlines" in perlport\*(Aq, # possibly inferred link text
+\& "perlport", # name
+\& "Newlines", # section
+\& \*(Aqpod\*(Aq, # what sort of link
+\& "perlport/Newlines" # original content
+\&
+\& L<crontab(5)/"DESCRIPTION">
+\& => undef, # link text
+\& \*(Aq"DESCRIPTION" in crontab(5)\*(Aq, # possibly inferred link text
+\& "crontab(5)", # name
+\& "DESCRIPTION", # section
+\& \*(Aqman\*(Aq, # what sort of link
+\& \*(Aqcrontab(5)/"DESCRIPTION"\*(Aq # original content
+\&
+\& L</Object Attributes>
+\& => undef, # link text
+\& \*(Aq"Object Attributes"\*(Aq, # possibly inferred link text
+\& undef, # name
+\& "Object Attributes", # section
+\& \*(Aqpod\*(Aq, # what sort of link
+\& "/Object Attributes" # original content
+\&
+\& L<https://www.perl.org/>
+\& => undef, # link text
+\& "https://www.perl.org/", # possibly inferred link text
+\& "https://www.perl.org/", # name
+\& undef, # section
+\& \*(Aqurl\*(Aq, # what sort of link
+\& "https://www.perl.org/" # original content
+\&
+\& L<Perl.org|https://www.perl.org/>
+\& => "Perl.org", # link text
+\& "Perl.org", # possibly inferred link text
+\& "https://www.perl.org/", # name
+\& undef, # section
+\& \*(Aqurl\*(Aq, # what sort of link
+\& "Perl.org|https://www.perl.org/" # original content
+.Ve
+.Sp
+Note that you can distinguish URL-links from anything else by the
+fact that they match \f(CW\*(C`m/\eA\ew+:[^:\es]\eS*\ez/\*(C'\fR. So
+\&\f(CW\*(C`L<http://www.perl.com>\*(C'\fR is a \s-1URL,\s0 but
+\&\f(CW\*(C`L<HTTP::Response>\*(C'\fR isn't.
+.RE
+.IP "\(bu" 4
+In case of L<...> codes with no \*(L"text|\*(R" part in them,
+older formatters have exhibited great variation in actually displaying
+the link or cross reference. For example, L<\fBcrontab\fR\|(5)> would render
+as \*(L"the \f(CWcrontab(5)\fR manpage\*(R", or \*(L"in the \f(CWcrontab(5)\fR manpage\*(R"
+or just \*(L"\f(CWcrontab(5)\fR\*(R".
+.Sp
+Pod processors must now treat \*(L"text|\*(R"\-less links as follows:
+.Sp
+.Vb 3
+\& L<name> => L<name|name>
+\& L</section> => L<"section"|/section>
+\& L<name/section> => L<"section" in name|name/section>
+.Ve
+.IP "\(bu" 4
+Note that section names might contain markup. I.e., if a section
+starts with:
+.Sp
+.Vb 1
+\& =head2 About the C<\-M> Operator
+.Ve
+.Sp
+or with:
+.Sp
+.Vb 1
+\& =item About the C<\-M> Operator
+.Ve
+.Sp
+then a link to it would look like this:
+.Sp
+.Vb 1
+\& L<somedoc/About the C<\-M> Operator>
+.Ve
+.Sp
+Formatters may choose to ignore the markup for purposes of resolving
+the link and use only the renderable characters in the section name,
+as in:
+.Sp
+.Vb 2
+\& <h1><a name="About_the_\-M_Operator">About the <code>\-M</code>
+\& Operator</a></h1>
+\&
+\& ...
+\&
+\& <a href="somedoc#About_the_\-M_Operator">"About the <code>\-M</code>
+\& Operator" in somedoc</a>
+.Ve
+.IP "\(bu" 4
+Previous versions of perlpod distinguished \f(CW\*(C`L<name/"section">\*(C'\fR
+links from \f(CW\*(C`L<name/item>\*(C'\fR links (and their targets). These
+have been merged syntactically and semantically in the current
+specification, and \fIsection\fR can refer either to a \*(L"=head\fIn\fR Heading
+Content\*(R" command or to a \*(L"=item Item Content\*(R" command. This
+specification does not specify what the behavior should be in the case
+of a given document having several things all seeming to produce the
+same \fIsection\fR identifier (e.g., in \s-1HTML,\s0 several things all producing
+the same \fIanchorname\fR in <a name="\fIanchorname\fR">...</a>
+elements). Where Pod processors can control this behavior, they should
+use the first such anchor. That is, \f(CW\*(C`L<Foo/Bar>\*(C'\fR refers to the
+\&\fIfirst\fR \*(L"Bar\*(R" section in Foo.
+.Sp
+But for some processors/formats this cannot be easily controlled; as
+with the \s-1HTML\s0 example, the behavior of multiple ambiguous
+<a name="\fIanchorname\fR">...</a> is most easily just left up to
+browsers to decide.
+.IP "\(bu" 4
+In a \f(CW\*(C`L<text|...>\*(C'\fR code, text may contain formatting codes
+for formatting or for E<...> escapes, as in:
+.Sp
+.Vb 1
+\& L<B<ummE<234>stuff>|...>
+.Ve
+.Sp
+For \f(CW\*(C`L<...>\*(C'\fR codes without a \*(L"text|\*(R" part, only
+\&\f(CW\*(C`E<...>\*(C'\fR and \f(CW\*(C`Z<>\*(C'\fR codes may occur. That is,
+authors should not use "\f(CW\*(C`L<B<Foo::Bar>>\*(C'\fR".
+.Sp
+Note, however, that formatting codes and Z<>'s can occur in any
+and all parts of an L<...> (i.e., in \fIname\fR, \fIsection\fR, \fItext\fR,
+and \fIurl\fR).
+.Sp
+Authors must not nest L<...> codes. For example, \*(L"L<The
+L<Foo::Bar> man page>\*(R" should be treated as an error.
+.IP "\(bu" 4
+Note that Pod authors may use formatting codes inside the \*(L"text\*(R"
+part of \*(L"L<text|name>\*(R" (and so on for L<text|/\*(L"sec\*(R">).
+.Sp
+In other words, this is valid:
+.Sp
+.Vb 1
+\& Go read L<the docs on C<$.>|perlvar/"$.">
+.Ve
+.Sp
+Some output formats that do allow rendering \*(L"L<...>\*(R" codes as
+hypertext might not allow the link-text to be formatted; in
+that case, formatters will have to just ignore that formatting.
+.IP "\(bu" 4
+At time of writing, \f(CW\*(C`L<name>\*(C'\fR values are of two types:
+either the name of a Pod page like \f(CW\*(C`L<Foo::Bar>\*(C'\fR (which
+might be a real Perl module or program in an \f(CW@INC\fR / \s-1PATH\s0
+directory, or a .pod file in those places); or the name of a Unix
+man page, like \f(CW\*(C`L<crontab(5)>\*(C'\fR. In theory, \f(CW\*(C`L<chmod>\*(C'\fR
+is ambiguous between a Pod page called \*(L"chmod\*(R", or the Unix man page
+\&\*(L"chmod\*(R" (in whatever man-section). However, the presence of a string
+in parens, as in \*(L"\fBcrontab\fR\|(5)\*(R", is sufficient to signal that what
+is being discussed is not a Pod page, and so is presumably a
+Unix man page. The distinction is of no importance to many
+Pod processors, but some processors that render to hypertext formats
+may need to distinguish them in order to know how to render a
+given \f(CW\*(C`L<foo>\*(C'\fR code.
+.IP "\(bu" 4
+Previous versions of perlpod allowed for a \f(CW\*(C`L<section>\*(C'\fR syntax (as in
+\&\f(CW\*(C`L<Object Attributes>\*(C'\fR), which was not easily distinguishable from
+\&\f(CW\*(C`L<name>\*(C'\fR syntax and for \f(CW\*(C`L<"section">\*(C'\fR which was only
+slightly less ambiguous. This syntax is no longer in the specification, and
+has been replaced by the \f(CW\*(C`L</section>\*(C'\fR syntax (where the slash was
+formerly optional). Pod parsers should tolerate the \f(CW\*(C`L<"section">\*(C'\fR
+syntax, for a while at least. The suggested heuristic for distinguishing
+\&\f(CW\*(C`L<section>\*(C'\fR from \f(CW\*(C`L<name>\*(C'\fR is that if it contains any
+whitespace, it's a \fIsection\fR. Pod processors should warn about this being
+deprecated syntax.
+.SH "About =over...=back Regions"
+.IX Header "About =over...=back Regions"
+\&\*(L"=over\*(R"...\*(L"=back\*(R" regions are used for various kinds of list-like
+structures. (I use the term \*(L"region\*(R" here simply as a collective
+term for everything from the \*(L"=over\*(R" to the matching \*(L"=back\*(R".)
+.IP "\(bu" 4
+The non-zero numeric \fIindentlevel\fR in \*(L"=over \fIindentlevel\fR\*(R" ...
+\&\*(L"=back\*(R" is used for giving the formatter a clue as to how many
+\&\*(L"spaces\*(R" (ems, or roughly equivalent units) it should tab over,
+although many formatters will have to convert this to an absolute
+measurement that may not exactly match with the size of spaces (or M's)
+in the document's base font. Other formatters may have to completely
+ignore the number. The lack of any explicit \fIindentlevel\fR parameter is
+equivalent to an \fIindentlevel\fR value of 4. Pod processors may
+complain if \fIindentlevel\fR is present but is not a positive number
+matching \f(CW\*(C`m/\eA(\ed*\e.)?\ed+\ez/\*(C'\fR.
+.IP "\(bu" 4
+Authors of Pod formatters are reminded that \*(L"=over\*(R" ... \*(L"=back\*(R" may
+map to several different constructs in the output format. For
+example, in converting Pod to (X)HTML, it can map to any of
+<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
+<blockquote>...</blockquote>. Similarly, \*(L"=item\*(R" can map to <li> or
+<dt>.
+.IP "\(bu" 4
+Each \*(L"=over\*(R" ... \*(L"=back\*(R" region should be one of the following:
+.RS 4
+.IP "\(bu" 4
+An \*(L"=over\*(R" ... \*(L"=back\*(R" region containing only \*(L"=item *\*(R" commands,
+each followed by some number of ordinary/verbatim paragraphs, other
+nested \*(L"=over\*(R" ... \*(L"=back\*(R" regions, \*(L"=for...\*(R" paragraphs, and
+\&\*(L"=begin\*(R"...\*(L"=end\*(R" regions.
+.Sp
+(Pod processors must tolerate a bare \*(L"=item\*(R" as if it were \*(L"=item
+*\*(R".) Whether \*(L"*\*(R" is rendered as a literal asterisk, an \*(L"o\*(R", or as
+some kind of real bullet character, is left up to the Pod formatter,
+and may depend on the level of nesting.
+.IP "\(bu" 4
+An \*(L"=over\*(R" ... \*(L"=back\*(R" region containing only
+\&\f(CW\*(C`m/\eA=item\es+\ed+\e.?\es*\ez/\*(C'\fR paragraphs, each one (or each group of them)
+followed by some number of ordinary/verbatim paragraphs, other nested
+\&\*(L"=over\*(R" ... \*(L"=back\*(R" regions, \*(L"=for...\*(R" paragraphs, and/or
+\&\*(L"=begin\*(R"...\*(L"=end\*(R" regions. Note that the numbers must start at 1
+in each section, and must proceed in order and without skipping
+numbers.
+.Sp
+(Pod processors must tolerate lines like \*(L"=item 1\*(R" as if they were
+\&\*(L"=item 1.\*(R", with the period.)
+.IP "\(bu" 4
+An \*(L"=over\*(R" ... \*(L"=back\*(R" region containing only \*(L"=item [text]\*(R"
+commands, each one (or each group of them) followed by some number of
+ordinary/verbatim paragraphs, other nested \*(L"=over\*(R" ... \*(L"=back\*(R"
+regions, \*(L"=for...\*(R" paragraphs, and \*(L"=begin\*(R"...\*(L"=end\*(R" regions.
+.Sp
+The \*(L"=item [text]\*(R" paragraph should not match
+\&\f(CW\*(C`m/\eA=item\es+\ed+\e.?\es*\ez/\*(C'\fR or \f(CW\*(C`m/\eA=item\es+\e*\es*\ez/\*(C'\fR, nor should it
+match just \f(CW\*(C`m/\eA=item\es*\ez/\*(C'\fR.
+.IP "\(bu" 4
+An \*(L"=over\*(R" ... \*(L"=back\*(R" region containing no \*(L"=item\*(R" paragraphs at
+all, and containing only some number of
+ordinary/verbatim paragraphs, and possibly also some nested \*(L"=over\*(R"
+\&... \*(L"=back\*(R" regions, \*(L"=for...\*(R" paragraphs, and \*(L"=begin\*(R"...\*(L"=end\*(R"
+regions. Such an itemless \*(L"=over\*(R" ... \*(L"=back\*(R" region in Pod is
+equivalent in meaning to a \*(L"<blockquote>...</blockquote>\*(R" element in
+\&\s-1HTML.\s0
+.RE
+.RS 4
+.Sp
+In all of the above cases, you can determine which type of
+\&\*(L"=over\*(R" ... \*(L"=back\*(R" region you have by examining the first (non\-\*(L"=cut\*(R",
+non\-\*(L"=pod\*(R") Pod paragraph after the \*(L"=over\*(R" command.
+.RE
+.IP "\(bu" 4
+Pod formatters \fImust\fR tolerate arbitrarily large amounts of text
+in the "=item \fItext...\fR" paragraph. In practice, most such
+paragraphs are short, as in:
+.Sp
+.Vb 1
+\& =item For cutting off our trade with all parts of the world
+.Ve
+.Sp
+But they may be arbitrarily long:
+.Sp
+.Vb 2
+\& =item For transporting us beyond seas to be tried for pretended
+\& offenses
+\&
+\& =item He is at this time transporting large armies of foreign
+\& mercenaries to complete the works of death, desolation and
+\& tyranny, already begun with circumstances of cruelty and perfidy
+\& scarcely paralleled in the most barbarous ages, and totally
+\& unworthy the head of a civilized nation.
+.Ve
+.IP "\(bu" 4
+Pod processors should tolerate \*(L"=item *\*(R" / "=item \fInumber\fR" commands
+with no accompanying paragraph. The middle item is an example:
+.Sp
+.Vb 1
+\& =over
+\&
+\& =item 1
+\&
+\& Pick up dry cleaning.
+\&
+\& =item 2
+\&
+\& =item 3
+\&
+\& Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.
+\&
+\& =back
+.Ve
+.IP "\(bu" 4
+No \*(L"=over\*(R" ... \*(L"=back\*(R" region can contain headings. Processors may
+treat such a heading as an error.
+.IP "\(bu" 4
+Note that an \*(L"=over\*(R" ... \*(L"=back\*(R" region should have some
+content. That is, authors should not have an empty region like this:
+.Sp
+.Vb 1
+\& =over
+\&
+\& =back
+.Ve
+.Sp
+Pod processors seeing such a contentless \*(L"=over\*(R" ... \*(L"=back\*(R" region
+may ignore it, or may report it as an error.
+.IP "\(bu" 4
+Processors must tolerate an \*(L"=over\*(R" list that goes off the end of the
+document (i.e., which has no matching \*(L"=back\*(R"), but they may warn
+about such a list.
+.IP "\(bu" 4
+Authors of Pod formatters should note that this construct:
+.Sp
+.Vb 1
+\& =item Neque
+\&
+\& =item Porro
+\&
+\& =item Quisquam Est
+\&
+\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
+\& velit, sed quia non numquam eius modi tempora incidunt ut
+\& labore et dolore magnam aliquam quaerat voluptatem.
+\&
+\& =item Ut Enim
+.Ve
+.Sp
+is semantically ambiguous, in a way that makes formatting decisions
+a bit difficult. On the one hand, it could be a mention of an item
+\&\*(L"Neque\*(R", a mention of another item \*(L"Porro\*(R", and a mention of another
+item \*(L"Quisquam Est\*(R", with just the last one requiring the explanatory
+paragraph \*(L"Qui dolorem ipsum quia dolor...\*(R"; and then an item
+\&\*(L"Ut Enim\*(R". In that case, you'd want to format it like so:
+.Sp
+.Vb 1
+\& Neque
+\&
+\& Porro
+\&
+\& Quisquam Est
+\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
+\& velit, sed quia non numquam eius modi tempora incidunt ut
+\& labore et dolore magnam aliquam quaerat voluptatem.
+\&
+\& Ut Enim
+.Ve
+.Sp
+But it could equally well be a discussion of three (related or equivalent)
+items, \*(L"Neque\*(R", \*(L"Porro\*(R", and \*(L"Quisquam Est\*(R", followed by a paragraph
+explaining them all, and then a new item \*(L"Ut Enim\*(R". In that case, you'd
+probably want to format it like so:
+.Sp
+.Vb 6
+\& Neque
+\& Porro
+\& Quisquam Est
+\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
+\& velit, sed quia non numquam eius modi tempora incidunt ut
+\& labore et dolore magnam aliquam quaerat voluptatem.
+\&
+\& Ut Enim
+.Ve
+.Sp
+But (for the foreseeable future), Pod does not provide any way for Pod
+authors to distinguish which grouping is meant by the above
+\&\*(L"=item\*(R"\-cluster structure. So formatters should format it like so:
+.Sp
+.Vb 1
+\& Neque
+\&
+\& Porro
+\&
+\& Quisquam Est
+\&
+\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
+\& velit, sed quia non numquam eius modi tempora incidunt ut
+\& labore et dolore magnam aliquam quaerat voluptatem.
+\&
+\& Ut Enim
+.Ve
+.Sp
+That is, there should be (at least roughly) equal spacing between
+items as between paragraphs (although that spacing may well be less
+than the full height of a line of text). This leaves it to the reader
+to use (con)textual cues to figure out whether the \*(L"Qui dolorem
+ipsum...\*(R" paragraph applies to the \*(L"Quisquam Est\*(R" item or to all three
+items \*(L"Neque\*(R", \*(L"Porro\*(R", and \*(L"Quisquam Est\*(R". While not an ideal
+situation, this is preferable to providing formatting cues that may
+actually be contrary to the author's intent.
+.ie n .SH "About Data Paragraphs and ""=begin/=end"" Regions"
+.el .SH "About Data Paragraphs and ``=begin/=end'' Regions"
+.IX Header "About Data Paragraphs and =begin/=end Regions"
+Data paragraphs are typically used for inlining non-Pod data that is
+to be used (typically passed through) when rendering the document to
+a specific format:
+.PP
+.Vb 1
+\& =begin rtf
+\&
+\& \epar{\epard\eqr\esa4500{\ei Printed\e~\echdate\e~\echtime}\epar}
+\&
+\& =end rtf
+.Ve
+.PP
+The exact same effect could, incidentally, be achieved with a single
+\&\*(L"=for\*(R" paragraph:
+.PP
+.Vb 1
+\& =for rtf \epar{\epard\eqr\esa4500{\ei Printed\e~\echdate\e~\echtime}\epar}
+.Ve
+.PP
+(Although that is not formally a data paragraph, it has the same
+meaning as one, and Pod parsers may parse it as one.)
+.PP
+Another example of a data paragraph:
+.PP
+.Vb 1
+\& =begin html
+\&
+\& I like <em>PIE</em>!
+\&
+\& <hr>Especially pecan pie!
+\&
+\& =end html
+.Ve
+.PP
+If these were ordinary paragraphs, the Pod parser would try to
+expand the \*(L"E</em>\*(R" (in the first paragraph) as a formatting
+code, just like \*(L"E<lt>\*(R" or \*(L"E<eacute>\*(R". But since this
+is in a \*(L"=begin \fIidentifier\fR\*(R"...\*(L"=end \fIidentifier\fR\*(R" region \fIand\fR
+the identifier \*(L"html\*(R" doesn't have a \*(L":\*(R" prefix, the contents
+of this region are stored as data paragraphs, instead of being
+processed as ordinary paragraphs (or, if they began with spaces
+and/or tabs, as verbatim paragraphs).
+.PP
+As a further example: At time of writing, no \*(L"biblio\*(R" identifier is
+supported, but suppose some processor were written to recognize it as
+a way of (say) denoting a bibliographic reference (necessarily
+containing formatting codes in ordinary paragraphs). The fact that
+\&\*(L"biblio\*(R" paragraphs were meant for ordinary processing would be
+indicated by prefacing each \*(L"biblio\*(R" identifier with a colon:
+.PP
+.Vb 1
+\& =begin :biblio
+\&
+\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
+\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ.
+\&
+\& =end :biblio
+.Ve
+.PP
+This would signal to the parser that paragraphs in this begin...end
+region are subject to normal handling as ordinary/verbatim paragraphs
+(while still tagged as meant only for processors that understand the
+\&\*(L"biblio\*(R" identifier). The same effect could be had with:
+.PP
+.Vb 3
+\& =for :biblio
+\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
+\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ.
+.Ve
+.PP
+The \*(L":\*(R" on these identifiers means simply \*(L"process this stuff
+normally, even though the result will be for some special target\*(R".
+I suggest that parser APIs report \*(L"biblio\*(R" as the target identifier,
+but also report that it had a \*(L":\*(R" prefix. (And similarly, with the
+above \*(L"html\*(R", report \*(L"html\*(R" as the target identifier, and note the
+\&\fIlack\fR of a \*(L":\*(R" prefix.)
+.PP
+Note that a \*(L"=begin \fIidentifier\fR\*(R"...\*(L"=end \fIidentifier\fR\*(R" region where
+\&\fIidentifier\fR begins with a colon \fIcan\fR contain commands. For example:
+.PP
+.Vb 1
+\& =begin :biblio
+\&
+\& Wirth\*(Aqs classic is available in several editions, including:
+\&
+\& =for comment
+\& hm, check abebooks.com for how much used copies cost.
+\&
+\& =over
+\&
+\& =item
+\&
+\& Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
+\& Teubner, Stuttgart. [Yes, it\*(Aqs in German.]
+\&
+\& =item
+\&
+\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
+\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ.
+\&
+\& =back
+\&
+\& =end :biblio
+.Ve
+.PP
+Note, however, that a \*(L"=begin \fIidentifier\fR\*(R"...\*(L"=end \fIidentifier\fR\*(R"
+region where \fIidentifier\fR does \fInot\fR begin with a colon should not
+directly contain \*(L"=head1\*(R" ... \*(L"=head4\*(R" commands, nor \*(L"=over\*(R", nor \*(L"=back\*(R",
+nor \*(L"=item\*(R". For example, this may be considered invalid:
+.PP
+.Vb 1
+\& =begin somedata
+\&
+\& This is a data paragraph.
+\&
+\& =head1 Don\*(Aqt do this!
+\&
+\& This is a data paragraph too.
+\&
+\& =end somedata
+.Ve
+.PP
+A Pod processor may signal that the above (specifically the \*(L"=head1\*(R"
+paragraph) is an error. Note, however, that the following should
+\&\fInot\fR be treated as an error:
+.PP
+.Vb 1
+\& =begin somedata
+\&
+\& This is a data paragraph.
+\&
+\& =cut
+\&
+\& # Yup, this isn\*(Aqt Pod anymore.
+\& sub excl { (rand() > .5) ? "hoo!" : "hah!" }
+\&
+\& =pod
+\&
+\& This is a data paragraph too.
+\&
+\& =end somedata
+.Ve
+.PP
+And this too is valid:
+.PP
+.Vb 1
+\& =begin someformat
+\&
+\& This is a data paragraph.
+\&
+\& And this is a data paragraph.
+\&
+\& =begin someotherformat
+\&
+\& This is a data paragraph too.
+\&
+\& And this is a data paragraph too.
+\&
+\& =begin :yetanotherformat
+\&
+\& =head2 This is a command paragraph!
+\&
+\& This is an ordinary paragraph!
+\&
+\& And this is a verbatim paragraph!
+\&
+\& =end :yetanotherformat
+\&
+\& =end someotherformat
+\&
+\& Another data paragraph!
+\&
+\& =end someformat
+.Ve
+.PP
+The contents of the above \*(L"=begin :yetanotherformat\*(R" ...
+\&\*(L"=end :yetanotherformat\*(R" region \fIaren't\fR data paragraphs, because
+the immediately containing region's identifier (\*(L":yetanotherformat\*(R")
+begins with a colon. In practice, most regions that contain
+data paragraphs will contain \fIonly\fR data paragraphs; however,
+the above nesting is syntactically valid as Pod, even if it is
+rare. Note, though, that the handlers for some formats, like \*(L"html\*(R",
+will accept only data paragraphs, not nested regions; and they may
+complain if they see (targeted for them) nested regions, or commands,
+other than \*(L"=end\*(R", \*(L"=pod\*(R", and \*(L"=cut\*(R".
+.PP
+Also consider this valid structure:
+.PP
+.Vb 1
+\& =begin :biblio
+\&
+\& Wirth\*(Aqs classic is available in several editions, including:
+\&
+\& =over
+\&
+\& =item
+\&
+\& Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
+\& Teubner, Stuttgart. [Yes, it\*(Aqs in German.]
+\&
+\& =item
+\&
+\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
+\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ.
+\&
+\& =back
+\&
+\& Buy buy buy!
+\&
+\& =begin html
+\&
+\& <img src=\*(Aqwirth_spokesmodeling_book.png\*(Aq>
+\&
+\& <hr>
+\&
+\& =end html
+\&
+\& Now now now!
+\&
+\& =end :biblio
+.Ve
+.PP
+There, the \*(L"=begin html\*(R"...\*(L"=end html\*(R" region is nested inside
+the larger \*(L"=begin :biblio\*(R"...\*(L"=end :biblio\*(R" region. Note that the
+content of the \*(L"=begin html\*(R"...\*(L"=end html\*(R" region is data
+paragraph(s), because the immediately containing region's identifier
+(\*(L"html\*(R") \fIdoesn't\fR begin with a colon.
+.PP
+Pod parsers, when processing a series of data paragraphs one
+after another (within a single region), should consider them to
+be one large data paragraph that happens to contain blank lines. So
+the content of the above \*(L"=begin html\*(R"...\*(L"=end html\*(R" \fImay\fR be stored
+as two data paragraphs (one consisting of
+\&\*(L"<img src='wirth_spokesmodeling_book.png'>\en\*(R"
+and another consisting of \*(L"<hr>\en\*(R"), but \fIshould\fR be stored as
+a single data paragraph (consisting of
+\&\*(L"<img src='wirth_spokesmodeling_book.png'>\en\en<hr>\en\*(R").
+.PP
+Pod processors should tolerate empty
+\&\*(L"=begin \fIsomething\fR\*(R"...\*(L"=end \fIsomething\fR\*(R" regions,
+empty \*(L"=begin :\fIsomething\fR\*(R"...\*(L"=end :\fIsomething\fR\*(R" regions, and
+contentless \*(L"=for \fIsomething\fR\*(R" and \*(L"=for :\fIsomething\fR\*(R"
+paragraphs. I.e., these should be tolerated:
+.PP
+.Vb 1
+\& =for html
+\&
+\& =begin html
+\&
+\& =end html
+\&
+\& =begin :biblio
+\&
+\& =end :biblio
+.Ve
+.PP
+Incidentally, note that there's no easy way to express a data
+paragraph starting with something that looks like a command. Consider:
+.PP
+.Vb 1
+\& =begin stuff
+\&
+\& =shazbot
+\&
+\& =end stuff
+.Ve
+.PP
+There, \*(L"=shazbot\*(R" will be parsed as a Pod command \*(L"shazbot\*(R", not as a data
+paragraph \*(L"=shazbot\en\*(R". However, you can express a data paragraph consisting
+of \*(L"=shazbot\en\*(R" using this code:
+.PP
+.Vb 1
+\& =for stuff =shazbot
+.Ve
+.PP
+Situations where this is necessary are presumably quite rare.
+.PP
+Note that \*(L"=end\*(R" commands must match the currently open \*(L"=begin\*(R" command. That
+is, they must properly nest. For example, this is valid:
+.PP
+.Vb 1
+\& =begin outer
+\&
+\& X
+\&
+\& =begin inner
+\&
+\& Y
+\&
+\& =end inner
+\&
+\& Z
+\&
+\& =end outer
+.Ve
+.PP
+while this is invalid:
+.PP
+.Vb 1
+\& =begin outer
+\&
+\& X
+\&
+\& =begin inner
+\&
+\& Y
+\&
+\& =end outer
+\&
+\& Z
+\&
+\& =end inner
+.Ve
+.PP
+This latter is improper because when the \*(L"=end outer\*(R" command is seen, the
+currently open region has the format name \*(L"inner\*(R", not \*(L"outer\*(R". (It just
+happens that \*(L"outer\*(R" is the format name of a higher-up region.) This is
+an error. Processors must by default report this as an error, and may halt
+processing the document containing that error. A corollary of this is that
+regions cannot \*(L"overlap\*(R". That is, the latter block above does not represent
+a region called \*(L"outer\*(R" which contains X and Y, overlapping a region called
+\&\*(L"inner\*(R" which contains Y and Z. Because it is invalid (as all
+apparently overlapping regions would be), it doesn't represent that, or
+anything at all.
+.PP
+Similarly, this is invalid:
+.PP
+.Vb 1
+\& =begin thing
+\&
+\& =end hting
+.Ve
+.PP
+This is an error because the region is opened by \*(L"thing\*(R", and the \*(L"=end\*(R"
+tries to close \*(L"hting\*(R" [sic].
+.PP
+This is also invalid:
+.PP
+.Vb 1
+\& =begin thing
+\&
+\& =end
+.Ve
+.PP
+This is invalid because every \*(L"=end\*(R" command must have a formatname
+parameter.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+perlpod, \*(L"PODs: Embedded Documentation\*(R" in perlsyn,
+podchecker
+.SH "AUTHOR"
+.IX Header "AUTHOR"
+Sean M. Burke