author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 19:43:11 +0000
commit fc22b3d6507c6745911b9dfcc68f1e665ae13dbc
tree ce1e3bce06471410239a6f41282e328770aa404a /upstream/debian-unstable/man1/perlpodspec.1
parent Initial commit.
Adding upstream version 4.22.0. (upstream/4.22.0)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/debian-unstable/man1/perlpodspec.1')
-rw-r--r-- upstream/debian-unstable/man1/perlpodspec.1 1884
1 files changed, 1884 insertions, 0 deletions
diff --git a/upstream/debian-unstable/man1/perlpodspec.1 b/upstream/debian-unstable/man1/perlpodspec.1
new file mode 100644
index 00000000..676a48ef
--- /dev/null
+++ b/upstream/debian-unstable/man1/perlpodspec.1
@@ -0,0 +1,1884 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "PERLPODSPEC 1"
+.TH PERLPODSPEC 1 2024-01-12 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+perlpodspec \- Plain Old Documentation: format specification and notes
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+This document contains detailed notes on the Pod markup language. Most
+people will only have to read perlpod to know how to write
+in Pod, but this document may answer some incidental questions to do
+with parsing and rendering Pod.
+.PP
+In this document, "must" / "must not", "should" /
+"should not", and "may" have their conventional (cf. RFC 2119)
+meanings: "X must do Y" means that if X doesn't do Y, it's against
+this specification, and should really be fixed. "X should do Y"
+means that it's recommended, but X may fail to do Y, if there's a
+good reason. "X may do Y" is merely a note that X can do Y at
+will (although it is up to the reader to detect any connotation of
+"and I think it would be \fInice\fR if X did Y" versus "it wouldn't
+really \fIbother\fR me if X did Y").
+.PP
+Notably, when I say "the parser should do Y", the
+parser may fail to do Y, if the calling application explicitly
+requests that the parser \fInot\fR do Y. I often phrase this as
+"the parser should, by default, do Y." This doesn't \fIrequire\fR
+the parser to provide an option for turning off whatever
+feature Y is (like expanding tabs in verbatim paragraphs), although
+it implies that such an option \fImay\fR be provided.
+.SH "Pod Definitions"
+.IX Header "Pod Definitions"
+Pod is embedded in files, typically Perl source files, although you
+can write a file that's nothing but Pod.
+.PP
+A \fBline\fR in a file consists of zero or more non-newline characters,
+terminated by either a newline or the end of the file.
+.PP
+A \fBnewline sequence\fR is usually a platform-dependent concept, but
+Pod parsers should understand it to mean any of CR (ASCII 13), LF
+(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
+addition to any other system-specific meaning. The first CR/CRLF/LF
+sequence in the file may be used as the basis for identifying the
+newline sequence for parsing the rest of the file.
+.PP
+A \fBblank line\fR is a line consisting entirely of zero or more spaces
+(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
+A \fBnon-blank line\fR is a line containing one or more characters other
+than space or tab (and terminated by a newline or end-of-file).
+.PP
+(\fINote:\fR Many older Pod parsers did not accept a line consisting of
+spaces/tabs and then a newline as a blank line. The only lines they
+considered blank were lines consisting of \fIno characters at all\fR,
+terminated by a newline.)
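+.PP
+The line and blank-line definitions above can be sketched in Perl as
+follows. This is only an illustration: the helper names \f(CWsplit_lines\fR
+and \f(CWis_blank_line\fR are hypothetical, and the sketch assumes that CR,
+LF, and CRLF are all accepted throughout the file, rather than taking
+the first sequence seen as the basis for the rest.

```perl
use strict;
use warnings;

# Split raw file content into lines, accepting CR, LF, or CRLF as the
# newline sequence. CRLF is tried first so it is not seen as CR then LF.
sub split_lines {
    my ($content) = @_;
    my @lines = split /\r\n|\r|\n/, $content, -1;
    # A trailing newline leaves one empty trailing field; drop it, since
    # that newline terminates the last line rather than starting a new one.
    pop @lines if @lines && $lines[-1] eq '' && $content =~ /[\r\n]\z/;
    return @lines;
}

# A blank line consists entirely of zero or more spaces or tabs.
sub is_blank_line {
    my ($line) = @_;
    return $line =~ /\A[ \t]*\z/ ? 1 : 0;
}

# Hypothetical sample input mixing all three newline sequences.
my @lines = split_lines("=head1 NAME\r\n \t\r\nText\rmore\n");
for my $i (0 .. $#lines) {
    printf "%d: %s\n", $i + 1,
        is_blank_line($lines[$i]) ? 'blank' : 'non-blank';
}
```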
+.PP
+\&\fBWhitespace\fR is used in this document as a blanket term for spaces,
+tabs, and newline sequences. (By itself, this term usually refers
+to literal whitespace. That is, sequences of whitespace characters
+in Pod source, as opposed to "E<32>", which is a formatting
+code that \fIdenotes\fR a whitespace character.)
+.PP
+A \fBPod parser\fR is a module meant for parsing Pod (regardless of
+whether this involves calling callbacks or building a parse tree or
+directly formatting it). A \fBPod formatter\fR (or \fBPod translator\fR)
+is a module or program that converts Pod to some other format (HTML,
+plaintext, TeX, PostScript, RTF). A \fBPod processor\fR might be a
+formatter or translator, or might be a program that does something
+else with the Pod (like counting words, scanning for index points,
+etc.).
+.PP
+Pod content is contained in \fBPod blocks\fR. A Pod block starts with a
+line that matches \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR, and continues up to the next line
+that matches \f(CW\*(C`m/\eA=cut/\*(C'\fR or up to the end of the file if there is
+no \f(CW\*(C`m/\eA=cut/\*(C'\fR line.
+.PP
+Note that a parser is not expected to detect that something that
+looks like Pod is actually inside a quoted string, such as a here document.
+.PP
+Within a Pod block, there are \fBPod paragraphs\fR. A Pod paragraph
+consists of non-blank lines of text, separated by one or more blank
+lines.
+.PP
+For purposes of Pod processing, there are four types of paragraphs in
+a Pod block:
+.IP \(bu 4
+A command paragraph (also called a "directive"). The first line of
+this paragraph must match \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR. Command paragraphs are
+typically one line, as in:
+.Sp
+.Vb 1
+\& =head1 NOTES
+\&
+\& =item *
+.Ve
+.Sp
+But they may span several (non-blank) lines:
+.Sp
+.Vb 3
+\& =for comment
+\& Hm, I wonder what it would look like if
+\& you tried to write a BNF for Pod from this.
+\&
+\& =head3 Dr. Strangelove, or: How I Learned to
+\& Stop Worrying and Love the Bomb
+.Ve
+.Sp
+\&\fISome\fR command paragraphs allow formatting codes in their content
+(i.e., after the part that matches \f(CW\*(C`m/\eA=[a\-zA\-Z]\eS*\es*/\*(C'\fR), as in:
+.Sp
+.Vb 1
+\& =head1 Did You Remember to C<use strict;>?
+.Ve
+.Sp
+In other words, the Pod processing handler for "head1" will apply the
+same processing to "Did You Remember to C<use strict;>?" that it
+would to an ordinary paragraph (i.e., formatting codes like
+"C<...>" are parsed and presumably formatted appropriately, and
+whitespace in the form of literal spaces and/or tabs is not
+significant).
+.IP \(bu 4
+A \fBverbatim paragraph\fR. The first line of this paragraph must begin with a
+literal space or tab, and this paragraph must not be inside a "=begin
+\&\fIidentifier\fR", ... "=end \fIidentifier\fR" sequence unless
+"\fIidentifier\fR" begins with a colon (":"). That is, if a paragraph
+starts with a literal space or tab, but \fIis\fR inside a
+"=begin \fIidentifier\fR", ... "=end \fIidentifier\fR" region, then it's
+a data paragraph, unless "\fIidentifier\fR" begins with a colon.
+.Sp
+Whitespace \fIis\fR significant in verbatim paragraphs (although, in
+processing, tabs are probably expanded).
+.IP \(bu 4
+An \fBordinary paragraph\fR. A paragraph is an ordinary paragraph
+if its first line matches neither \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR nor
+\&\f(CW\*(C`m/\eA[ \et]/\*(C'\fR, \fIand\fR if it's not inside a "=begin \fIidentifier\fR",
+\&... "=end \fIidentifier\fR" sequence unless "\fIidentifier\fR" begins with
+a colon (":").
+.IP \(bu 4
+A \fBdata paragraph\fR. This is a paragraph that \fIis\fR inside a "=begin
+\&\fIidentifier\fR" ... "=end \fIidentifier\fR" sequence where
+"\fIidentifier\fR" does \fInot\fR begin with a literal colon (":"). In
+some sense, a data paragraph is not part of Pod at all (i.e.,
+effectively it's "out-of-band"), since it's not subject to most kinds
+of Pod parsing; but it is specified here, since Pod
+parsers need to be able to call an event for it, or store it in some
+form in a parse tree, or at least just parse \fIaround\fR it.
+.PP
+For example: consider the following paragraphs:
+.PP
+.Vb 1
+\& # <\- that\*(Aqs the 0th column
+\&
+\& =head1 Foo
+\&
+\& Stuff
+\&
+\&   $foo\->bar
+\&
+\& =cut
+.Ve
+.PP
+Here, "=head1 Foo" and "=cut" are command paragraphs because the first
+line of each matches \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR. "\fI[space][space]\fR\f(CW$foo\fR\->bar"
+is a verbatim paragraph, because its first line starts with a literal
+whitespace character (and there's no "=begin"..."=end" region around).
+.PP
+The "=begin \fIidentifier\fR" ... "=end \fIidentifier\fR" commands stop
+paragraphs that they surround from being parsed as ordinary or verbatim
+paragraphs, if \fIidentifier\fR doesn't begin with a colon. This
+is discussed in detail in the section
+"About Data Paragraphs and "=begin/=end" Regions".
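+.PP
+A minimal sketch of the four-way classification described above,
+assuming the caller already tracks whether the paragraph lies inside a
+"=begin \fIidentifier\fR" region whose identifier does not begin with a
+colon. The function name \f(CWclassify_paragraph\fR and the
+\f(CW$in_data_region\fR flag are hypothetical.

```perl
use strict;
use warnings;

# Classify one paragraph (a string) found inside a Pod block.
# $in_data_region is true when the paragraph lies inside a
# "=begin identifier" ... "=end identifier" region whose identifier
# does not begin with a colon.
sub classify_paragraph {
    my ($para, $in_data_region) = @_;
    return 'command'  if $para =~ m/\A=[a-zA-Z]/;  # checked first: commands
                                                   # stay commands in any region
    return 'data'     if $in_data_region;
    return 'verbatim' if $para =~ m/\A[ \t]/;
    return 'ordinary';
}

# The paragraphs from the example above, outside any =begin/=end region.
for my $para ("=head1 Foo", "Stuff", "  \$foo->bar", "=cut") {
    printf "%-9s %s\n", classify_paragraph($para, 0), $para;
}
```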
+.SH "Pod Commands"
+.IX Header "Pod Commands"
+This section is intended to supplement and clarify the discussion in
+"Command Paragraph" in perlpod. These are the currently recognized
+Pod commands:
+.IP """=head1"", ""=head2"", ""=head3"", ""=head4"", ""=head5"", ""=head6""" 4
+.IX Item """=head1"", ""=head2"", ""=head3"", ""=head4"", ""=head5"", ""=head6"""
+This command indicates that the text in the remainder of the paragraph
+is a heading. That text may contain formatting codes. Examples:
+.Sp
+.Vb 1
+\& =head1 Object Attributes
+\&
+\& =head3 What B<Not> to Do!
+.Ve
+.Sp
+Both \f(CW\*(C`=head5\*(C'\fR and \f(CW\*(C`=head6\*(C'\fR were added in 2020 and might not be
+supported by all Pod parsers. Pod::Simple 3.41, released in October
+2020, supports both, so all Pod::Simple\-based Pod parsers built on
+that version or later support them too.
+.IP """=pod""" 4
+.IX Item """=pod"""
+This command indicates that this paragraph begins a Pod block. (If we
+are already in the middle of a Pod block, this command has no effect at
+all.) If there is any text in this command paragraph after "=pod",
+it must be ignored. Examples:
+.Sp
+.Vb 1
+\& =pod
+\&
+\& This is a plain Pod paragraph.
+\&
+\& =pod This text is ignored.
+.Ve
+.IP """=cut""" 4
+.IX Item """=cut"""
+This command indicates that this line is the end of this previously
+started Pod block. If there is any text after "=cut" on the line, it must be
+ignored. Examples:
+.Sp
+.Vb 1
+\& =cut
+\&
+\& =cut The documentation ends here.
+\&
+\& =cut
+\& # This is the first line of program text.
+\& sub foo { # This is the second.
+.Ve
+.Sp
+It is an error to try to \fIstart\fR a Pod block with a "=cut" command. In
+that case, the Pod processor must halt parsing of the input file, and
+must by default emit a warning.
+.IP """=over""" 4
+.IX Item """=over"""
+This command indicates that this is the start of a list/indent
+region. If there is any text following the "=over", it must consist
+of only a nonzero positive numeral. The semantics of this numeral are
+explained in the "About =over...=back Regions" section, further
+below. Formatting codes are not expanded. Examples:
+.Sp
+.Vb 1
+\& =over 3
+\&
+\& =over 3.5
+\&
+\& =over
+.Ve
+.IP """=item""" 4
+.IX Item """=item"""
+This command indicates that an item in a list begins here. Formatting
+codes are processed. The semantics of the (optional) text in the
+remainder of this paragraph are
+explained in the "About =over...=back Regions" section, further
+below. Examples:
+.Sp
+.Vb 1
+\& =item
+\&
+\& =item *
+\&
+\& =item *
+\&
+\& =item 14
+\&
+\& =item 3.
+\&
+\& =item C<< $thing\->stuff(I<dodad>) >>
+\&
+\& =item For transporting us beyond seas to be tried for pretended
+\& offenses
+\&
+\& =item He is at this time transporting large armies of foreign
+\& mercenaries to complete the works of death, desolation and
+\& tyranny, already begun with circumstances of cruelty and perfidy
+\& scarcely paralleled in the most barbarous ages, and totally
+\& unworthy the head of a civilized nation.
+.Ve
+.IP """=back""" 4
+.IX Item """=back"""
+This command indicates that this is the end of the region begun
+by the most recent "=over" command. It permits no text after the
+"=back" command.
+.IP """=begin formatname""" 4
+.IX Item """=begin formatname"""
+.PD 0
+.IP """=begin formatname parameter""" 4
+.IX Item """=begin formatname parameter"""
+.PD
+This marks the following paragraphs (until the matching "=end
+formatname") as being for some special kind of processing. Unless
+"formatname" begins with a colon, the contained non-command
+paragraphs are data paragraphs. But if "formatname" \fIdoes\fR begin
+with a colon, then non-command paragraphs are ordinary paragraphs
+or verbatim paragraphs. This is discussed in detail in the section
+"About Data Paragraphs and "=begin/=end" Regions".
+.Sp
+It is advised that formatnames match the regexp
+\&\f(CW\*(C`m/\eA:?[\-a\-zA\-Z0\-9_]+\ez/\*(C'\fR. Everything following whitespace after the
+formatname is a parameter that may be used by the formatter when dealing
+with this region. This parameter must not be repeated in the "=end"
+paragraph. Implementors should anticipate future expansion in the
+semantics and syntax of the first parameter to "=begin"/"=end"/"=for".
+.IP """=end formatname""" 4
+.IX Item """=end formatname"""
+This marks the end of the region opened by the matching
+"=begin formatname" command. If "formatname" is not the formatname
+of the most recent open "=begin formatname" region, then this
+is an error, and must generate an error message. This
+is discussed in detail in the section
+"About Data Paragraphs and "=begin/=end" Regions".
+.IP """=for formatname text...""" 4
+.IX Item """=for formatname text..."""
+This is synonymous with:
+.Sp
+.Vb 1
+\& =begin formatname
+\&
+\& text...
+\&
+\& =end formatname
+.Ve
+.Sp
+That is, it creates a region consisting of a single paragraph; that
+paragraph is to be treated as an ordinary paragraph if "formatname"
+begins with a ":"; if "formatname" \fIdoesn't\fR begin with a colon,
+then "text..." will constitute a data paragraph. There is no way
+to use "=for formatname text..." to express "text..." as a verbatim
+paragraph.
+.IP """=encoding encodingname""" 4
+.IX Item """=encoding encodingname"""
+This command, which should occur early in the document (at least
+before any non-US-ASCII data!), declares that this document is
+encoded in the encoding \fIencodingname\fR, which must be
+an encoding name that Encode recognizes. (Encode's list
+of supported encodings, in Encode::Supported, is useful here.)
+If the Pod parser cannot decode the declared encoding, it
+should emit a warning and may abort parsing the document
+altogether.
+.Sp
+A document having more than one "=encoding" line should be
+considered an error. Pod processors may silently tolerate this if
+the not-first "=encoding" lines are just duplicates of the
+first one (e.g., if there's a "=encoding utf8" line, and later on
+another "=encoding utf8" line). But Pod processors should complain if
+there are contradictory "=encoding" lines in the same document
+(e.g., if there is a "=encoding utf8" early in the document and
+"=encoding big5" later). Pod processors that recognize BOMs
+may also complain if they see an "=encoding" line
+that contradicts the BOM (e.g., if a document with a UTF\-16LE
+BOM has an "=encoding shiftjis" line).
+.PP
+If a Pod processor sees any command other than the ones listed
+above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
+or "=w123"), that processor must by default treat this as an
+error. It must not process the paragraph beginning with that
+command, must by default warn of this as an error, and may
+abort the parse. A Pod parser may allow a way for particular
+applications to add to the above list of known commands, and to
+stipulate, for each additional command, whether formatting
+codes should be processed.
+.PP
+Future versions of this specification may add additional
+commands.
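+.PP
+As an illustration, a processor might separate the command name from
+the rest of a command paragraph and check it against the list above
+along these lines. The names \f(CW%KNOWN\fR and \f(CWparse_command\fR are
+hypothetical, and the sketch only warns about unknown commands; it does
+not implement any command's semantics.

```perl
use strict;
use warnings;

# The commands listed in this section.
my %KNOWN = map { $_ => 1 } qw(
    head1 head2 head3 head4 head5 head6
    pod cut over item back begin end for encoding
);

# Split a command paragraph into its command name and remaining content.
# Returns (name, content, is_known), or nothing if the paragraph is not
# a command paragraph.
sub parse_command {
    my ($para) = @_;
    return unless $para =~ m/\A=([a-zA-Z]\S*)\s*(.*)\z/s;
    my ($name, $content) = ($1, $2);
    return ($name, $content, $KNOWN{$name} ? 1 : 0);
}

for my $para ("=head1 NOTES", "=over 4", "=haed1 Typo", "=cuttlefish") {
    my ($name, $content, $known) = parse_command($para);
    if ($known) {
        print "$name: '$content'\n";
    }
    else {
        # By default an unknown command is reported and not processed.
        warn "Unknown command =$name\n";
    }
}
```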
+.SH "Pod Formatting Codes"
+.IX Header "Pod Formatting Codes"
+(Note that in previous drafts of this document and of perlpod,
+formatting codes were referred to as "interior sequences", and
+this term may still be found in the documentation for Pod parsers,
+and in error messages from Pod processors.)
+.PP
+There are two syntaxes for formatting codes:
+.IP \(bu 4
+A formatting code starts with a capital letter (just US-ASCII [A\-Z])
+followed by a "<", any number of characters, and ending with the first
+matching ">". Examples:
+.Sp
+.Vb 1
+\& That\*(Aqs what I<you> think!
+\&
+\& What\*(Aqs C<CORE::dump()> for?
+\&
+\& X<C<chmod> and C<unlink()> Under Different Operating Systems>
+.Ve
+.IP \(bu 4
+A formatting code starts with a capital letter (just US-ASCII [A\-Z])
+followed by two or more "<"'s, one or more whitespace characters,
+any number of characters, one or more whitespace characters,
+and ending with the first matching sequence of two or more ">"'s, where
+the number of ">"'s equals the number of "<"'s in the opening of this
+formatting code. Examples:
+.Sp
+.Vb 1
+\& That\*(Aqs what I<< you >> think!
+\&
+\& C<<< open(X, ">>thing.dat") || die $! >>>
+\&
+\& B<< $foo\->bar(); >>
+.Ve
+.Sp
+With this syntax, the whitespace character(s) after the "C<<<"
+and before the ">>>" (or whatever letter) are \fInot\fR renderable. They
+do not signify whitespace; they are merely part of the formatting codes
+themselves. That is, these are all synonymous:
+.Sp
+.Vb 7
+\& C<thing>
+\& C<< thing >>
+\& C<<           thing     >>
+\& C<<<   thing >>>
+\& C<<<<
+\& thing
+\&          >>>>
+.Ve
+.Sp
+and so on.
+.Sp
+Finally, the multiple-angle-bracket form does \fInot\fR alter the interpretation
+of nested formatting codes, meaning that the following four example lines are
+identical in meaning:
+.Sp
+.Vb 1
+\& B<example: C<$a E<lt>=E<gt> $b>>
+\&
+\& B<example: C<< $a <=> $b >>>
+\&
+\& B<example: C<< $a E<lt>=E<gt> $b >>>
+\&
+\& B<<< example: C<< $a E<lt>=E<gt> $b >> >>>
+.Ve
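+.PP
+The equivalence of the two syntaxes can be illustrated with a small
+Perl sketch that unwraps a single formatting code. It is deliberately
+limited: the hypothetical \f(CWunwrap_code\fR helper does not handle nested
+codes, which a real parser must.

```perl
use strict;
use warnings;

# Given a single, non-nested formatting code, return its letter and
# content. Handles the simple form, X<...>, and the multiple
# angle-bracket form, X<< ... >>, whose surrounding whitespace belongs
# to the delimiters rather than to the content.
sub unwrap_code {
    my ($code) = @_;
    if ($code =~ m/\A([A-Z])(<{2,})\s+(.*?)\s+(>{2,})\z/s
        && length($2) == length($4)) {
        return ($1, $3);
    }
    if ($code =~ m/\A([A-Z])<([^<>]*)>\z/) {
        return ($1, $2);
    }
    return;    # not a code this sketch understands
}

# Each of these yields the letter "C" and the content "thing".
for my $code ('C<thing>', 'C<< thing >>', "C<<<<\nthing\n>>>>") {
    my ($letter, $content) = unwrap_code($code);
    print "$letter: [$content]\n";
}
```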
+.PP
+In parsing Pod, a notably tricky part is the correct parsing of
+(potentially nested!) formatting codes. Implementors should
+consult the code in the \f(CW\*(C`parse_text\*(C'\fR routine in Pod::Parser as an
+example of a correct implementation.
+.ie n .IP """I<text>"" \-\- italic text" 4
+.el .IP "\f(CWI<text>\fR \-\- italic text" 4
+.IX Item "I<text> -- italic text"
+See the brief discussion in "Formatting Codes" in perlpod.
+.ie n .IP """B<text>"" \-\- bold text" 4
+.el .IP "\f(CWB<text>\fR \-\- bold text" 4
+.IX Item "B<text> -- bold text"
+See the brief discussion in "Formatting Codes" in perlpod.
+.ie n .IP """C<code>"" \-\- code text" 4
+.el .IP "\f(CWC<code>\fR \-\- code text" 4
+.IX Item "C<code> -- code text"
+See the brief discussion in "Formatting Codes" in perlpod.
+.ie n .IP """F<filename>"" \-\- style for filenames" 4
+.el .IP "\f(CWF<filename>\fR \-\- style for filenames" 4
+.IX Item "F<filename> -- style for filenames"
+See the brief discussion in "Formatting Codes" in perlpod.
+.ie n .IP """X<topic name>"" \-\- an index entry" 4
+.el .IP "\f(CWX<topic name>\fR \-\- an index entry" 4
+.IX Item "X<topic name> -- an index entry"
+See the brief discussion in "Formatting Codes" in perlpod.
+.Sp
+This code is unusual in that most formatters completely discard
+this code and its content. Other formatters will render it with
+invisible codes that can be used in building an index of
+the current document.
+.ie n .IP """Z<>"" \-\- a null (zero-effect) formatting code" 4
+.el .IP "\f(CWZ<>\fR \-\- a null (zero-effect) formatting code" 4
+.IX Item "Z<> -- a null (zero-effect) formatting code"
+Discussed briefly in "Formatting Codes" in perlpod.
+.Sp
+This code is unusual in that it should have no content. That is,
+a processor may complain if it sees \f(CW\*(C`Z<potatoes>\*(C'\fR. Whether
+or not it complains, the \fIpotatoes\fR text should be ignored.
+.ie n .IP """L<name>"" \-\- a hyperlink" 4
+.el .IP "\f(CWL<name>\fR \-\- a hyperlink" 4
+.IX Item "L<name> -- a hyperlink"
+The complicated syntaxes of this code are discussed at length in
+"Formatting Codes" in perlpod, and implementation details are
+discussed below, in "About L<...> Codes". Parsing the
+contents of L<content> is tricky. Notably, the content has to be
+checked for whether it looks like a URL, or whether it has to be split
+on literal "|" and/or "/" (in the right order!), and so on,
+\&\fIbefore\fR E<...> codes are resolved.
+.ie n .IP """E<escape>"" \-\- a character escape" 4
+.el .IP "\f(CWE<escape>\fR \-\- a character escape" 4
+.IX Item "E<escape> -- a character escape"
+See "Formatting Codes" in perlpod, and several points in
+"Notes on Implementing Pod Processors".
+.ie n .IP """S<text>"" \-\- text contains non-breaking spaces" 4
+.el .IP "\f(CWS<text>\fR \-\- text contains non-breaking spaces" 4
+.IX Item "S<text> -- text contains non-breaking spaces"
+This formatting code is syntactically simple, but semantically
+complex. What it means is that each space in the printable
+content of this code signifies a non-breaking space.
+.Sp
+Consider:
+.Sp
+.Vb 1
+\& C<$x ? $y : $z>
+\&
+\& S<C<$x ? $y : $z>>
+.Ve
+.Sp
+Both signify the monospace (c[ode] style) text consisting of
+"$x", one space, "?", one space, "$y", one space, ":", one space, "$z". The
+difference is that in the latter, with the S code, those spaces
+are not "normal" spaces, but instead are non-breaking spaces.
+.PP
+If a Pod processor sees any formatting code other than the ones
+listed above (as in "N<...>", or "Q<...>", etc.), that
+processor must by default treat this as an error.
+A Pod parser may allow a way for particular
+applications to add to the above list of known formatting codes;
+a Pod parser might even allow a way to stipulate, for each additional
+formatting code, whether it requires some form of special processing, as
+L<...> does.
+.PP
+Future versions of this specification may add additional
+formatting codes.
+.PP
+Historical note: A few older Pod processors would not see a ">" as
+closing a "C<" code, if the ">" was immediately preceded by
+a "\-". This was so that this:
+.PP
+.Vb 1
+\& C<$foo\->bar>
+.Ve
+.PP
+would parse as equivalent to this:
+.PP
+.Vb 1
+\& C<$foo\-E<gt>bar>
+.Ve
+.PP
+instead of as equivalent to a "C" formatting code containing
+only "$foo\-", and then a "bar>" outside the "C" formatting code. This
+problem has since been solved by the addition of syntaxes like this:
+.PP
+.Vb 1
+\& C<< $foo\->bar >>
+.Ve
+.PP
+Compliant parsers must not treat "\->" as special.
+.PP
+Formatting codes absolutely cannot span paragraphs. If a code is
+opened in one paragraph, and no closing code is found by the end of
+that paragraph, the Pod parser must close that formatting code,
+and should complain (as in "Unterminated I code in the paragraph
+starting at line 123: 'Time objects are not...'"). So these
+two paragraphs:
+.PP
+.Vb 1
+\& I<I told you not to do this!
+\&
+\& Don\*(Aqt make me say it again!>
+.Ve
+.PP
+\&...must \fInot\fR be parsed as two paragraphs in italics (with the I
+code starting in one paragraph and ending in another). Instead,
+the first paragraph should generate a warning, but that aside, the
+above code must parse as if it were:
+.PP
+.Vb 1
+\& I<I told you not to do this!>
+\&
+\& Don\*(Aqt make me say it again!E<gt>
+.Ve
+.PP
+(In SGMLish jargon, all Pod commands are like block-level
+elements, whereas all Pod formatting codes are like inline-level
+elements.)
+.SH "Notes on Implementing Pod Processors"
+.IX Header "Notes on Implementing Pod Processors"
+The following is a long section of miscellaneous requirements
+and suggestions to do with Pod processing.
+.IP \(bu 4
+Pod formatters should tolerate lines in verbatim blocks that are of
+any length, even if that means having to break them (possibly several
+times, for very long lines) to avoid text running off the side of the
+page. Pod formatters may warn of such line-breaking. Such warnings
+are particularly appropriate for lines that are over 100 characters long, which
+are usually not intentional.
+.IP \(bu 4
+Pod parsers must recognize \fIall\fR of the three well-known newline
+formats: CR, LF, and CRLF. See perlport.
+.IP \(bu 4
+Pod parsers should accept input lines that are of any length.
+.IP \(bu 4
+Since Perl recognizes a Unicode Byte Order Mark at the start of files
+as signaling that the file is Unicode encoded as in UTF\-16 (whether
+big-endian or little-endian) or UTF\-8, Pod parsers should do the
+same. Otherwise, the character encoding should be understood as
+being UTF\-8 if the first highbit byte sequence in the file seems
+valid as a UTF\-8 sequence, or otherwise as CP\-1252 (earlier versions of
+this specification used Latin\-1 instead of CP\-1252).
+.Sp
+Future versions of this specification may specify
+how Pod can accept other encodings. Presumably treatment of other
+encodings in Pod parsing would be as in XML parsing: whatever the
+encoding declared by a particular Pod file, content is to be
+stored in memory as Unicode characters.
+.IP \(bu 4
+The well known Unicode Byte Order Marks are as follows: if the
+file begins with the two literal byte values 0xFE 0xFF, this is
+the BOM for big-endian UTF\-16. If the file begins with the two
+literal byte values 0xFF 0xFE, this is the BOM for little-endian
+UTF\-16. On an ASCII platform, if the file begins with the three literal
+byte values
+0xEF 0xBB 0xBF, this is the BOM for UTF\-8.
+A mechanism portable to EBCDIC platforms is to compute the expected byte sequence:
+.Sp
+.Vb 2
+\& my $utf8_bom = "\ex{FEFF}";
+\& utf8::encode($utf8_bom);
+.Ve
+.IP \(bu 4
+A naive, but often sufficient heuristic on ASCII platforms, for testing
+the first highbit
+byte-sequence in a BOM-less file (whether in code or in Pod!), to see
+whether that sequence is valid as UTF\-8 (RFC 2279) is to check whether
+the first byte in the sequence is in the range 0xC2 \- 0xFD
+\&\fIand\fR whether the next byte is in the range
+0x80 \- 0xBF. If so, the parser may conclude that this file is in
+UTF\-8, and all highbit sequences in the file should be assumed to
+be UTF\-8. Otherwise the parser should treat the file as being
+in CP\-1252. (A better check, and which works on EBCDIC platforms as
+well, is to pass a copy of the sequence to
+\&\fButf8::decode()\fR which performs a full validity check on the
+sequence and returns TRUE if it is valid UTF\-8, FALSE otherwise. This
+function is always pre-loaded, is fast because it is written in C, and
+will only get called at most once, so you don't need to avoid it out of
+performance concerns.)
+In the unlikely circumstance that the first highbit
+sequence in a truly non\-UTF\-8 file happens to appear to be UTF\-8, one
+can cater to our heuristic (as well as any more intelligent heuristic)
+by prefacing that line with a comment line containing a highbit
+sequence that is clearly \fInot\fR valid as UTF\-8. A line consisting
+of simply "#", an e\-acute, and any non-highbit byte,
+is sufficient to establish this file's encoding.
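+.Sp
+A minimal sketch of the \fButf8::decode()\fR check, assuming the file
+content has been read as raw bytes and carries no BOM. The function
+name \f(CWguess_encoding\fR is hypothetical, and returning "US-ASCII" for a
+file with no highbit bytes is a choice made for this sketch, not
+something this specification requires.

```perl
use strict;
use warnings;

# Guess the encoding of a BOM-less file from its raw bytes by testing
# the first highbit byte sequence, as described above.
sub guess_encoding {
    my ($bytes) = @_;
    # Find the first run of bytes with the high bit set.
    # Sketch-specific choice: report plain ASCII when there is none.
    return 'US-ASCII' unless $bytes =~ m/([\x80-\xFF]+)/;
    my $copy = $1;    # utf8::decode works in place, so test a copy
    return utf8::decode($copy) ? 'UTF-8' : 'CP-1252';
}

# Hypothetical samples: UTF-8 e-acute, a lone CP-1252 e-acute in a
# comment line, and plain ASCII.
for my $sample ("caf\xC3\xA9 au lait\n", "# \xE9 \n", "plain ASCII\n") {
    print guess_encoding($sample), "\n";
}
```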
+.IP \(bu 4
+Pod processors must treat a "=for [label] [content...]" paragraph as
+meaning the same thing as a "=begin [label]" paragraph, content, and
+an "=end [label]" paragraph. (The parser may conflate these two
+constructs, or may leave them distinct, in the expectation that the
+formatter will nevertheless treat them the same.)
+.IP \(bu 4
+When rendering Pod to a format that allows comments (i.e., to nearly
+any format other than plaintext), a Pod formatter must insert comment
+text identifying its name and version number, and the name and
+version numbers of any modules it might be using to process the Pod.
+Minimal examples:
+.Sp
+.Vb 1
+\& %% POD::Pod2PS v3.14159, using POD::Parser v1.92
+\&
+\& <!\-\- Pod::HTML v3.14159, using POD::Parser v1.92 \-\->
+\&
+\& {\edoccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}
+\&
+\& .\e" Pod::Man version 3.14159, using POD::Parser version 1.92
+.Ve
+.Sp
+Formatters may also insert additional comments, including: the
+release date of the Pod formatter program, the contact address for
+the author(s) of the formatter, the current time, the name of the input
+file, the formatting options in effect, the version of Perl used, etc.
+.Sp
+Formatters may also choose to note errors/warnings as comments,
+besides or instead of emitting them otherwise (as in messages to
+STDERR, or \f(CW\*(C`die\*(C'\fRing).
+.IP \(bu 4
+Pod parsers \fImay\fR emit warnings or error messages ("Unknown E code
+E<zslig>!") to STDERR (whether through printing to STDERR, or
+\&\f(CW\*(C`warn\*(C'\fRing/\f(CW\*(C`carp\*(C'\fRing, or \f(CW\*(C`die\*(C'\fRing/\f(CW\*(C`croak\*(C'\fRing), but \fImust\fR allow
+suppressing all such STDERR output, and instead allow an option for
+reporting errors/warnings
+in some other way, whether by triggering a callback, or noting errors
+in some attribute of the document object, or some similarly unobtrusive
+mechanism \-\- or even by appending a "Pod Errors" section to the end of
+the parsed form of the document.
+.IP \(bu 4
+In cases of exceptionally aberrant documents, Pod parsers may abort the
+parse. Even then, using \f(CW\*(C`die\*(C'\fRing/\f(CW\*(C`croak\*(C'\fRing is to be avoided; where
+possible, the parser library may simply close the input file
+and add text like "*** Formatting Aborted ***" to the end of the
+(partial) in-memory document.
+.IP \(bu 4
+In paragraphs where formatting codes (like E<...>, B<...>)
+are understood (i.e., \fInot\fR verbatim paragraphs, but \fIincluding\fR
+ordinary paragraphs, and command paragraphs that produce renderable
+text, like "=head1"), literal whitespace should generally be considered
+"insignificant", in that one literal space has the same meaning as any
+(nonzero) number of literal spaces, literal newlines, and literal tabs
+(as long as this produces no blank lines, since those would terminate
+the paragraph). Pod parsers should compact literal whitespace in each
+processed paragraph, but may provide an option for overriding this
+(since some processing tasks do not require it), or may follow
+additional special rules (for example, specially treating
+period-space-space or period-newline sequences).
+.IP \(bu 4
+Pod parsers should not, by default, try to coerce apostrophe (') and
+quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
+turn backtick (`) into anything else but a single backtick character
+(distinct from an open quote character!), nor "\-\-" into anything but
+two minus signs. They \fImust never\fR do any of those things to text
+in C<...> formatting codes, and never \fIever\fR to text in verbatim
+paragraphs.
+.IP \(bu 4
+When rendering Pod to a format that has two kinds of hyphens (\-), one
+that's a non-breaking hyphen, and another that's a breakable hyphen
+(as in "object-oriented", which can be split across lines as
+"object\-", newline, "oriented"), formatters are encouraged to
+generally translate "\-" to non-breaking hyphen, but may apply
+heuristics to convert some of these to breaking hyphens.
+.IP \(bu 4
+Pod formatters should make reasonable efforts to keep words of Perl
+code from being broken across lines. For example, "Foo::Bar" in some
+formatting systems is seen as eligible for being broken across lines
+as "Foo::" newline "Bar" or even "Foo::\-" newline "Bar". This should
+be avoided where possible, either by disabling all line-breaking in
+mid-word, or by wrapping particular words with internal punctuation
+in "don't break this across lines" codes (which in some formats may
+not be a single code, but might be a matter of inserting non-breaking
+zero-width spaces between every pair of characters in a word).
+.IP \(bu 4
+Pod parsers should, by default, expand tabs in verbatim paragraphs as
+they are processed, before passing them to the formatter or other
+processor. Parsers may also allow an option for overriding this.
+.IP \(bu 4
+Pod parsers should, by default, remove newlines from the end of
+ordinary and verbatim paragraphs before passing them to the
+formatter. For example, while the paragraph you're reading now
+could be considered, in Pod source, to end with (and contain)
+the newline(s) that end it, it should be processed as ending with
+(and containing) the period character that ends this sentence.
+.IP \(bu 4
+Pod parsers, when reporting errors, should make some effort to report
+an approximate line number ("Nested E<>'s in Paragraph #52, near
+line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
+number ("Nested E<>'s in Paragraph #52 of Thing/Foo.pm!"). Where
+this is problematic, the paragraph number should at least be
+accompanied by an excerpt from the paragraph ("Nested E<>'s in
+Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
+the C<interest rate> attribute...'").
+.IP \(bu 4
+Pod parsers, when processing a series of verbatim paragraphs one
+after another, should consider them to be one large verbatim
+paragraph that happens to contain blank lines. I.e., these two
+lines, which have a blank line between them:
+.Sp
+.Vb 1
+\& use Foo;
+\&
+\& print Foo\->VERSION
+.Ve
+.Sp
+should be unified into one paragraph ("\etuse Foo;\en\en\etprint
+Foo\->VERSION") before being passed to the formatter or other
+processor. Parsers may also allow an option for overriding this.
+.Sp
+While this might be too cumbersome to implement in event-based Pod
+parsers, it is straightforward for parsers that return parse trees.
+.IP \(bu 4
+Pod formatters, where feasible, are advised to avoid splitting short
+verbatim paragraphs (under twelve lines, say) across pages.
+.IP \(bu 4
+Pod parsers must treat a line with only spaces and/or tabs on it as a
+"blank line" such as separates paragraphs. (Some older parsers
+recognized only two adjacent newlines as a "blank line" but would not
+recognize a newline, a space, and a newline, as a blank line. This
+is noncompliant behavior.)
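+.Sp
+A minimal Perl sketch of a compliant test (the helper name
+"is_blank_line" is hypothetical, and the line is assumed to have had
+its line ending removed already):
+.Sp
+.Vb 4
+\& sub is_blank_line {
+\&     my ($line) = @_;
+\&     return $line =~ m/\eA[ \et]*\ez/ ? 1 : 0;
+\& }
+.Ve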
+.IP \(bu 4
+Authors of Pod formatters/processors should make every effort to
+avoid writing their own Pod parser. There are already several in
+CPAN, with a wide range of interface styles \-\- and one of them,
+Pod::Simple, comes with modern versions of Perl.
+.IP \(bu 4
+Characters in Pod documents may be conveyed either as literals, or by
+number in E<n> codes, or by an equivalent mnemonic, as in
+E<eacute> which is exactly equivalent to E<233>. The numbers
+are the Latin1/Unicode values, even on EBCDIC platforms.
+.Sp
+When referring to characters by using a E<n> numeric code, numbers
+in the range 32\-126 refer to those well known US-ASCII characters (also
+defined there by Unicode, with the same meaning), which all Pod
+formatters must render faithfully. Characters whose E<> numbers
+are in the ranges 0\-31 and 127\-159 should not be used (neither as
+literals,
+nor as E<number> codes), except for the literal byte-sequences for
+newline (ASCII 13, ASCII 13 10, or ASCII 10), and tab (ASCII 9).
+.Sp
+Numbers in the range 160\-255 refer to Latin\-1 characters (also
+defined there by Unicode, with the same meaning). Numbers above
+255 should be understood to refer to Unicode characters.
+.IP \(bu 4
+Be warned
+that some formatters cannot reliably render characters outside 32\-126;
+and many are able to handle 32\-126 and 160\-255, but nothing above
+255.
+.IP \(bu 4
+Besides the well-known "E<lt>" and "E<gt>" codes for
+less-than and greater-than, Pod parsers must understand "E<sol>"
+for "/" (solidus, slash), and "E<verbar>" for "|" (vertical bar,
+pipe). Pod parsers should also understand "E<lchevron>" and
+"E<rchevron>" as legacy codes for characters 171 and 187, i.e.,
+"left-pointing double angle quotation mark" = "left pointing
+guillemet" and "right-pointing double angle quotation mark" = "right
+pointing guillemet". (These look like little "<<" and ">>", and they
+are now preferably expressed with the HTML/XHTML codes "E<laquo>"
+and "E<raquo>".)
+.IP \(bu 4
+Pod parsers should understand all "E<html>" codes as defined
+in the entity declarations in the most recent XHTML specification at
+\&\f(CW\*(C`www.W3.org\*(C'\fR. Pod parsers must understand at least the entities
+that define characters in the range 160\-255 (Latin\-1). Pod parsers,
+when faced with some unknown "E<\fIidentifier\fR>" code,
+shouldn't simply replace it with nullstring (by default, at least),
+but may pass it through as a string consisting of the literal characters
+E, less-than, \fIidentifier\fR, greater-than. Or Pod parsers may offer the
+alternative option of processing such unknown
+"E<\fIidentifier\fR>" codes by firing an event especially
+for such codes, or by adding a special node-type to the in-memory
+document tree. Such "E<\fIidentifier\fR>" may have special meaning
+to some processors, or some processors may choose to add them to
+a special error report.
+.IP \(bu 4
+Pod parsers must also support the XHTML codes "E<quot>" for
+character 34 (doublequote, "), "E<amp>" for character 38
+(ampersand, &), and "E<apos>" for character 39 (apostrophe, ').
+.IP \(bu 4
+Note that in all cases of "E<whatever>", \fIwhatever\fR (whether
+an htmlname, or a number in any base) must consist only of
+alphanumeric characters \-\- that is, \fIwhatever\fR must match
+\&\f(CW\*(C`m/\eA\ew+\ez/\*(C'\fR. So "E<\ 0\ 1\ 2\ 3\ >" is invalid, because
+it contains spaces, which aren't alphanumeric characters. This
+presumably does not \fIneed\fR special treatment by a Pod processor;
+"\ 0\ 1\ 2\ 3\ " doesn't look like a number in any base, so it would
+presumably be looked up in the table of HTML-like names. Since
+there isn't (and cannot be) an HTML-like entity called "\ 0\ 1\ 2\ 3\ ",
+this will be treated as an error. However, Pod processors may
+treat "E<\ 0\ 1\ 2\ 3\ >" or "E<e\-acute>" as \fIsyntactically\fR
+invalid, potentially earning a different error message than the
+error message (or warning, or event) generated by a merely unknown
+(but theoretically valid) htmlname, as in "E<qacute>"
+[sic]. However, Pod parsers are not required to make this
+distinction.
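+.Sp
+As a rough sketch in Perl, a processor that does make this distinction
+might classify the content of an E<...> code as follows (the helper
+name and its return values are hypothetical; the numeric forms accepted
+here, decimal or octal digits or a "0x" hexadecimal prefix, are an
+assumption of this sketch):
+.Sp
+.Vb 6
+\& sub classify_escape {
+\&     my ($content) = @_;
+\&     return \*(Aqinvalid\*(Aq unless $content =~ m/\eA\ew+\ez/;
+\&     return \*(Aqnumber\*(Aq if $content =~ m/\eA(?:\ed+|0[xX][0\-9a\-fA\-F]+)\ez/;
+\&     return \*(Aqname\*(Aq;
+\& }
+.Ve
+.Sp
+Under these assumptions, "233" and "0xE9" are numbers, "eacute" and
+"qacute" are names to be looked up, and "e\-acute" is invalid.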
+.IP \(bu 4
+Note that E<number> \fImust not\fR be interpreted as simply
+"codepoint \fInumber\fR in the current/native character set". It always
+means only "the character represented by codepoint \fInumber\fR in
+Unicode." (This is identical to the semantics of &#\fInumber\fR; in XML.)
+.Sp
+This will likely require many formatters to have tables mapping from
+treatable Unicode codepoints (such as the "\exE9" for the e\-acute
+character) to the escape sequences or codes necessary for conveying
+such sequences in the target output format. A converter to *roff
+would, for example, know that "\exE9" (whether conveyed literally, or via
+an E<...> sequence) is to be conveyed as "e\e\e*'".
+Similarly, a program rendering Pod in a Mac OS application window would
+presumably need to know that "\exE9" maps to codepoint 142 in the MacRoman
+encoding that (at time of writing) is native for Mac OS. Such
+Unicode2whatever mappings are presumably already widely available for
+common output formats. (Such mappings may be incomplete! Implementers
+are not expected to bend over backwards in an attempt to render
+Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
+of the other weird things that Unicode can encode.) And
+if a Pod document uses a character not found in such a mapping, the
+formatter should consider it an unrenderable character.
+.IP \(bu 4
+If, surprisingly, the implementor of a Pod formatter can't find a
+satisfactory pre-existing table mapping from Unicode characters to
+escapes in the target format (e.g., a decent table of Unicode
+characters to *roff escapes), it will be necessary to build such a
+table. If you are in this circumstance, you should begin with the
+characters in the range 0x00A0 \- 0x00FF, which are mostly the heavily
+used accented characters. Then proceed (as patience permits and
+fastidiousness compels) through the characters that the (X)HTML
+standards groups judged important enough to merit mnemonics
+for. These are declared in the (X)HTML specifications at the
+www.W3.org site. At time of writing (September 2001), the most recent
+entity declaration files are:
+.Sp
+.Vb 3
+\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-lat1.ent
+\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-special.ent
+\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-symbol.ent
+.Ve
+.Sp
+Then you can progress through any remaining notable Unicode characters
+in the range 0x2000\-0x204D (consult the character tables at
+www.unicode.org), and whatever else strikes your fancy. For example,
+in \fIxhtml\-symbol.ent\fR, there is the entry:
+.Sp
+.Vb 1
+\& <!ENTITY infin "&#8734;"> <!\-\- infinity, U+221E ISOtech \-\->
+.Ve
+.Sp
+While the mapping "infin" to the character "\ex{221E}" will (hopefully)
+have been already handled by the Pod parser, the presence of the
+character in this file means that it's important enough to
+include in a formatter's table that maps from notable Unicode characters
+to the codes necessary for rendering them. So for a Unicode\-to\-*roff
+mapping, for example, this would merit the entry:
+.Sp
+.Vb 1
+\& "\ex{221E}" => \*(Aq\e(in\*(Aq,
+.Ve
+.Sp
+It is eagerly hoped that in the future, increasing numbers of formats
+(and formatters) will support Unicode characters directly (as (X)HTML
+does with \f(CW\*(C`&infin;\*(C'\fR, \f(CW\*(C`&#8734;\*(C'\fR, or \f(CW\*(C`&#x221E;\*(C'\fR), reducing the need
+for idiosyncratic mappings of Unicode\-to\-\fImy_escapes\fR.
+.IP \(bu 4
+It is up to each individual Pod formatter to display good judgement when
+confronted with an unrenderable character (which is distinct from an
+unknown E<thing> sequence that the parser couldn't resolve to
+anything, renderable or not). It is good practice to map Latin letters
+with diacritics (like "E<eacute>"/"E<233>") to the corresponding
+unaccented US-ASCII letters (like a simple character 101, "e"), but
+clearly this is often not feasible, and an unrenderable character may
+be represented as "?", or the like. In attempting a sane fallback
+(as from E<233> to "e"), Pod formatters may use the
+\&\f(CW%Latin1Code_to_fallback\fR table in Pod::Escapes, or
+Text::Unidecode, if available.
+.Sp
+For example, this Pod text:
+.Sp
+.Vb 1
+\& magic is enabled if you set C<$Currency> to \*(AqE<euro>\*(Aq.
+.Ve
+.Sp
+may be rendered as:
+"magic is enabled if you set \f(CW$Currency\fR to '\fI?\fR'" or as
+"magic is enabled if you set \f(CW$Currency\fR to '\fB[euro]\fR'", or as
+"magic is enabled if you set \f(CW$Currency\fR to '[x20AC]'", etc.
+.Sp
+A Pod formatter may also note, in a comment or warning, a list of what
+unrenderable characters were encountered.
+.IP \(bu 4
+E<...> may freely appear in any formatting code (other than
+in another E<...> or in a Z<>). That is, "X<The
+E<euro>1,000,000 Solution>" is valid, as is "L<The
+E<euro>1,000,000 Solution|Million::Euros>".
+.IP \(bu 4
+Some Pod formatters output to formats that implement non-breaking
+spaces as an individual character (which I'll call "NBSP"), and
+others output to formats that implement non-breaking spaces just as
+spaces wrapped in a "don't break this across lines" code. Note that
+at the level of Pod, both sorts of codes can occur: Pod can contain a
+NBSP character (whether as a literal, or as a "E<160>" or
+"E<nbsp>" code); and Pod can contain "S<foo
+I<bar> baz>" codes, where "mere spaces" (character 32) in
+such codes are taken to represent non-breaking spaces. Pod
+parsers should consider supporting the optional parsing of "S<foo
+I<bar> baz>" as if it were
+"foo\fINBSP\fRI<bar>\fINBSP\fRbaz", and, going the other way, the
+optional parsing of groups of words joined by NBSP's as if each group
+were in a S<...> code, so that formatters may use the
+representation that maps best to what the output format demands.
+.IP \(bu 4
+Some processors may find that the \f(CW\*(C`S<...>\*(C'\fR code is easiest to
+implement by replacing each space in the parse tree under the content
+of the S, with an NBSP. But note: the replacement should apply \fInot\fR to
+spaces in \fIall\fR text, but \fIonly\fR to spaces in \fIprintable\fR text. (This
+distinction may or may not be evident in the particular tree/event
+model implemented by the Pod parser.) For example, consider this
+unusual case:
+.Sp
+.Vb 1
+\& S<L</Autoloaded Functions>>
+.Ve
+.Sp
+This means that the space in the middle of the visible link text must
+not be broken across lines. In other words, it's the same as this:
+.Sp
+.Vb 1
+\& L<"AutoloadedE<160>Functions"/Autoloaded Functions>
+.Ve
+.Sp
+However, a misapplied space-to-NBSP replacement could (wrongly)
+produce something equivalent to this:
+.Sp
+.Vb 1
+\& L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>
+.Ve
+.Sp
+\&...which is almost definitely not going to work as a hyperlink (assuming
+this formatter outputs a format supporting hypertext).
+.Sp
+Formatters may choose to just not support the S format code,
+especially in cases where the output format simply has no NBSP
+character/code and no code for "don't break this stuff across lines".
+.IP \(bu 4
+Besides the NBSP character discussed above, implementors are reminded
+of the existence of the other "special" character in Latin\-1, the
+"soft hyphen" character, also known as "discretionary hyphen"
+(i.e., \f(CW\*(C`E<173>\*(C'\fR = \f(CW\*(C`E<0xAD>\*(C'\fR =
+\&\f(CW\*(C`E<shy>\*(C'\fR). This character expresses an optional hyphenation
+point. That is, it normally renders as nothing, but may render as a
+"\-" if a formatter breaks the word at that point. Pod formatters
+should, as appropriate, do one of the following: 1) render this with
+a code with the same meaning (e.g., "\e\-" in RTF), 2) pass it through
+in the expectation that the formatter understands this character as
+such, or 3) delete it.
+.Sp
+For example:
+.Sp
+.Vb 3
+\& sigE<shy>action
+\& manuE<shy>script
+\& JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi
+.Ve
+.Sp
+These signal to a formatter that if it is to hyphenate "sigaction"
+or "manuscript", then it should be done as
+"sig\-\fI[linebreak]\fRaction" or "manu\-\fI[linebreak]\fRscript"
+(and if it doesn't hyphenate it, then the \f(CW\*(C`E<shy>\*(C'\fR doesn't
+show up at all). And if it is
+to hyphenate "Jarkko" and/or "Hietaniemi", it can do
+so only at the points where there is a \f(CW\*(C`E<shy>\*(C'\fR code.
+.Sp
+In practice, it is anticipated that this character will not be used
+often, but formatters should either support it, or delete it.
+.IP \(bu 4
+If you think that you want to add a new command to Pod (like, say, a
+"=biblio" command), consider whether you could get the same
+effect with a for or begin/end sequence: "=for biblio ..." or "=begin
+biblio" ... "=end biblio". Pod processors that don't understand
+"=for biblio", etc, will simply ignore it, whereas they may complain
+loudly if they see "=biblio".
+.IP \(bu 4
+Throughout this document, "Pod" has been the preferred spelling for
+the name of the documentation format. One may also use "POD" or
+"pod". For the documentation that is (typically) in the Pod
+format, you may use "pod", or "Pod", or "POD". Understanding these
+distinctions is useful; but obsessing over how to spell them usually
+is not.
+.SH "About L<...> Codes"
+.IX Header "About L<...> Codes"
+As you can tell from a glance at perlpod, the L<...>
+code is the most complex of the Pod formatting codes. The points below
+will hopefully clarify what it means and how processors should deal
+with it.
+.IP \(bu 4
+In parsing an L<...> code, Pod parsers must distinguish at least
+four attributes:
+.RS 4
+.IP First: 4
+.IX Item "First:"
+The link-text. If there is none, this must be \f(CW\*(C`undef\*(C'\fR. (E.g., in
+"L<Perl Functions|perlfunc>", the link-text is "Perl Functions".
+In "L<Time::HiRes>" and even "L<|Time::HiRes>", there is no
+link text. Note that link text may contain formatting.)
+.IP Second: 4
+.IX Item "Second:"
+The possibly inferred link-text; i.e., if there was no real link
+text, then this is the text that we'll infer in its place. (E.g., for
+"L<Getopt::Std>", the inferred link text is "Getopt::Std".)
+.IP Third: 4
+.IX Item "Third:"
+The name or URL, or \f(CW\*(C`undef\*(C'\fR if none. (E.g., in "L<Perl
+Functions|perlfunc>", the name (also sometimes called the page)
+is "perlfunc". In "L</CAVEATS>", the name is \f(CW\*(C`undef\*(C'\fR.)
+.IP Fourth: 4
+.IX Item "Fourth:"
+The section (AKA "item" in older perlpods), or \f(CW\*(C`undef\*(C'\fR if none. E.g.,
+in "L<Getopt::Std/DESCRIPTION>", "DESCRIPTION" is the section. (Note
+that this is not the same as a manpage section like the "5" in "man 5
+crontab". "Section Foo" in the Pod sense means the part of the text
+that's introduced by the heading or item whose text is "Foo".)
+.RE
+.RS 4
+.Sp
+Pod parsers may also note additional attributes including:
+.IP Fifth: 4
+.IX Item "Fifth:"
+A flag for whether item 3 (if present) is a URL (like
+"http://lists.perl.org" is), in which case there should be no section
+attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or
+possibly a man page name (like "\fBcrontab\fR\|(5)" is).
+.IP Sixth: 4
+.IX Item "Sixth:"
+The raw original L<...> content, before text is split on
+"|", "/", etc, and before E<...> codes are expanded.
+.RE
+.RS 4
+.Sp
+(The above were numbered only for concise reference below. It is not
+a requirement that these be passed as an actual list or array.)
+.Sp
+For example:
+.Sp
+.Vb 7
+\& L<Foo::Bar>
+\& => undef, # link text
+\& "Foo::Bar", # possibly inferred link text
+\& "Foo::Bar", # name
+\& undef, # section
+\& \*(Aqpod\*(Aq, # what sort of link
+\& "Foo::Bar" # original content
+\&
+\& L<Perlport\*(Aqs section on NL\*(Aqs|perlport/Newlines>
+\& => "Perlport\*(Aqs section on NL\*(Aqs", # link text
+\& "Perlport\*(Aqs section on NL\*(Aqs", # possibly inferred link text
+\& "perlport", # name
+\& "Newlines", # section
+\& \*(Aqpod\*(Aq, # what sort of link
+\& "Perlport\*(Aqs section on NL\*(Aqs|perlport/Newlines"
+\& # original content
+\&
+\& L<perlport/Newlines>
+\& => undef, # link text
+\& \*(Aq"Newlines" in perlport\*(Aq, # possibly inferred link text
+\& "perlport", # name
+\& "Newlines", # section
+\& \*(Aqpod\*(Aq, # what sort of link
+\& "perlport/Newlines" # original content
+\&
+\& L<crontab(5)/"DESCRIPTION">
+\& => undef, # link text
+\& \*(Aq"DESCRIPTION" in crontab(5)\*(Aq, # possibly inferred link text
+\& "crontab(5)", # name
+\& "DESCRIPTION", # section
+\& \*(Aqman\*(Aq, # what sort of link
+\& \*(Aqcrontab(5)/"DESCRIPTION"\*(Aq # original content
+\&
+\& L</Object Attributes>
+\& => undef, # link text
+\& \*(Aq"Object Attributes"\*(Aq, # possibly inferred link text
+\& undef, # name
+\& "Object Attributes", # section
+\& \*(Aqpod\*(Aq, # what sort of link
+\& "/Object Attributes" # original content
+\&
+\& L<https://www.perl.org/>
+\& => undef, # link text
+\& "https://www.perl.org/", # possibly inferred link text
+\& "https://www.perl.org/", # name
+\& undef, # section
+\& \*(Aqurl\*(Aq, # what sort of link
+\& "https://www.perl.org/" # original content
+\&
+\& L<Perl.org|https://www.perl.org/>
+\& => "Perl.org", # link text
+\& "https://www.perl.org/", # possibly inferred link text
+\& "https://www.perl.org/", # name
+\& undef, # section
+\& \*(Aqurl\*(Aq, # what sort of link
+\& "Perl.org|https://www.perl.org/" # original content
+.Ve
+.Sp
+Note that you can distinguish URL-links from anything else by the
+fact that they match \f(CW\*(C`m/\eA\ew+:[^:\es]\eS*\ez/\*(C'\fR. So
+\&\f(CW\*(C`L<http://www.perl.com>\*(C'\fR is a URL, but
+\&\f(CW\*(C`L<HTTP::Response>\*(C'\fR isn't.
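+.Sp
+A minimal sketch of that test in Perl (the helper name "link_kind" is
+hypothetical and not part of any module; treating a trailing
+parenthesized string as the sign of a man page is an assumption of this
+sketch):
+.Sp
+.Vb 6
+\& sub link_kind {
+\&     my ($name) = @_;
+\&     return \*(Aqurl\*(Aq if $name =~ m/\eA\ew+:[^:\es]\eS*\ez/;
+\&     return \*(Aqman\*(Aq if $name =~ m/\e(\eS*\e)\ez/;
+\&     return \*(Aqpod\*(Aq;
+\& }
+.Ve
+.Sp
+With this sketch, "http://www.perl.com" is classed as "url",
+"crontab(5)" as "man", and "HTTP::Response" as "pod".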
+.RE
+.IP \(bu 4
+In case of L<...> codes with no "text|" part in them,
+older formatters have exhibited great variation in actually displaying
+the link or cross reference. For example, L<\fBcrontab\fR\|(5)> would render
+as "the \f(CWcrontab(5)\fR manpage", or "in the \f(CWcrontab(5)\fR manpage"
+or just "\f(CWcrontab(5)\fR".
+.Sp
+Pod processors must now treat "text|"\-less links as follows:
+.Sp
+.Vb 3
+\& L<name> => L<name|name>
+\& L</section> => L<"section"|/section>
+\& L<name/section> => L<"section" in name|name/section>
+.Ve
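+.Sp
+As a rough Perl sketch of these three rules ("inferred_text" is a
+hypothetical helper; it assumes the name and section have already been
+split apart and are each passed as a plain string or as undef):
+.Sp
+.Vb 6
+\& sub inferred_text {
+\&     my ($name, $section) = @_;
+\&     return $name unless defined $section;
+\&     return qq{"$section"} unless defined $name;
+\&     return qq{"$section" in $name};
+\& }
+.Ve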
+.IP \(bu 4
+Note that section names might contain markup. I.e., if a section
+starts with:
+.Sp
+.Vb 1
+\& =head2 About the C<\-M> Operator
+.Ve
+.Sp
+or with:
+.Sp
+.Vb 1
+\& =item About the C<\-M> Operator
+.Ve
+.Sp
+then a link to it would look like this:
+.Sp
+.Vb 1
+\& L<somedoc/About the C<\-M> Operator>
+.Ve
+.Sp
+Formatters may choose to ignore the markup for purposes of resolving
+the link and use only the renderable characters in the section name,
+as in:
+.Sp
+.Vb 2
+\& <h1><a name="About_the_\-M_Operator">About the <code>\-M</code>
+\& Operator</a></h1>
+\&
+\& ...
+\&
+\& <a href="somedoc#About_the_\-M_Operator">"About the <code>\-M</code>
+\& Operator" in somedoc</a>
+.Ve
+.IP \(bu 4
+Previous versions of perlpod distinguished \f(CW\*(C`L<name/"section">\*(C'\fR
+links from \f(CW\*(C`L<name/item>\*(C'\fR links (and their targets). These
+have been merged syntactically and semantically in the current
+specification, and \fIsection\fR can refer either to a "=head\fIn\fR Heading
+Content" command or to a "=item Item Content" command. This
+specification does not specify what the behavior should be in the case
+of a given document having several things all seeming to produce the
+same \fIsection\fR identifier (e.g., in HTML, several things all producing
+the same \fIanchorname\fR in <a name="\fIanchorname\fR">...</a>
+elements). Where Pod processors can control this behavior, they should
+use the first such anchor. That is, \f(CW\*(C`L<Foo/Bar>\*(C'\fR refers to the
+\&\fIfirst\fR "Bar" section in Foo.
+.Sp
+But for some processors/formats this cannot be easily controlled; as
+with the HTML example, the behavior of multiple ambiguous
+<a name="\fIanchorname\fR">...</a> is most easily just left up to
+browsers to decide.
+.IP \(bu 4
+In a \f(CW\*(C`L<text|...>\*(C'\fR code, text may contain formatting codes
+for formatting or for E<...> escapes, as in:
+.Sp
+.Vb 1
+\& L<B<ummE<234>stuff>|...>
+.Ve
+.Sp
+For \f(CW\*(C`L<...>\*(C'\fR codes without a "text|" part, only
+\&\f(CW\*(C`E<...>\*(C'\fR and \f(CW\*(C`Z<>\*(C'\fR codes may occur. That is,
+authors should not use "\f(CW\*(C`L<B<Foo::Bar>>\*(C'\fR".
+.Sp
+Note, however, that formatting codes and Z<>'s can occur in any
+and all parts of an L<...> (i.e., in \fIname\fR, \fIsection\fR, \fItext\fR,
+and \fIurl\fR).
+.Sp
+Authors must not nest L<...> codes. For example, "L<The
+L<Foo::Bar> man page>" should be treated as an error.
+.IP \(bu 4
+Note that Pod authors may use formatting codes inside the "text"
+part of "L<text|name>" (and so on for L<text|/"sec">).
+.Sp
+In other words, this is valid:
+.Sp
+.Vb 1
+\& Go read L<the docs on C<$.>|perlvar/"$.">
+.Ve
+.Sp
+Some output formats that do allow rendering "L<...>" codes as
+hypertext, might not allow the link-text to be formatted; in
+that case, formatters will have to just ignore that formatting.
+.IP \(bu 4
+At time of writing, \f(CW\*(C`L<name>\*(C'\fR values are of two types:
+either the name of a Pod page like \f(CW\*(C`L<Foo::Bar>\*(C'\fR (which
+might be a real Perl module or program in an \f(CW@INC\fR / PATH
+directory, or a .pod file in those places); or the name of a Unix
+man page, like \f(CW\*(C`L<crontab(5)>\*(C'\fR. In theory, \f(CW\*(C`L<chmod>\*(C'\fR
+is ambiguous between a Pod page called "chmod", or the Unix man page
+"chmod" (in whatever man-section). However, the presence of a string
+in parens, as in "\fBcrontab\fR\|(5)", is sufficient to signal that what
+is being discussed is not a Pod page, and so is presumably a
+Unix man page. The distinction is of no importance to many
+Pod processors, but some processors that render to hypertext formats
+may need to distinguish them in order to know how to render a
+given \f(CW\*(C`L<foo>\*(C'\fR code.
+.IP \(bu 4
+Previous versions of perlpod allowed for a \f(CW\*(C`L<section>\*(C'\fR syntax (as in
+\&\f(CW\*(C`L<Object Attributes>\*(C'\fR), which was not easily distinguishable from
+\&\f(CW\*(C`L<name>\*(C'\fR syntax and for \f(CW\*(C`L<"section">\*(C'\fR which was only
+slightly less ambiguous. This syntax is no longer in the specification, and
+has been replaced by the \f(CW\*(C`L</section>\*(C'\fR syntax (where the slash was
+formerly optional). Pod parsers should tolerate the \f(CW\*(C`L<"section">\*(C'\fR
+syntax, for a while at least. The suggested heuristic for distinguishing
+\&\f(CW\*(C`L<section>\*(C'\fR from \f(CW\*(C`L<name>\*(C'\fR is that if it contains any
+whitespace, it's a \fIsection\fR. Pod processors should warn about this being
+deprecated syntax.
+.SH "About =over...=back Regions"
+.IX Header "About =over...=back Regions"
+"=over"..."=back" regions are used for various kinds of list-like
+structures. (I use the term "region" here simply as a collective
+term for everything from the "=over" to the matching "=back".)
+.IP \(bu 4
+The non-zero numeric \fIindentlevel\fR in "=over \fIindentlevel\fR" ...
+"=back" is used for giving the formatter a clue as to how many
+"spaces" (ems, or roughly equivalent units) it should tab over,
+although many formatters will have to convert this to an absolute
+measurement that may not exactly match with the size of spaces (or M's)
+in the document's base font. Other formatters may have to completely
+ignore the number. The lack of any explicit \fIindentlevel\fR parameter is
+equivalent to an \fIindentlevel\fR value of 4. Pod processors may
+complain if \fIindentlevel\fR is present but is not a positive number
+matching \f(CW\*(C`m/\eA(\ed*\e.)?\ed+\ez/\*(C'\fR.
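+.Sp
+A sketch of such a check in Perl (the helper name is hypothetical;
+falling back to 4 for a zero or malformed value is an assumption of
+this sketch, not something this document requires):
+.Sp
+.Vb 7
+\& sub indent_level {
+\&     my ($arg) = @_;
+\&     return 4 unless defined $arg and length $arg;
+\&     return $arg if $arg =~ m/\eA(\ed*\e.)?\ed+\ez/ and $arg > 0;
+\&     warn "Bad =over indentlevel: $arg\en";
+\&     return 4;
+\& }
+.Ve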
+.IP \(bu 4
+Authors of Pod formatters are reminded that "=over" ... "=back" may
+map to several different constructs in your output format. For
+example, in converting Pod to (X)HTML, it can map to any of
+<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
+<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or
+<dt>.
+.IP \(bu 4
+Each "=over" ... "=back" region should be one of the following:
+.RS 4
+.IP \(bu 4
+An "=over" ... "=back" region containing only "=item *" commands,
+each followed by some number of ordinary/verbatim paragraphs, other
+nested "=over" ... "=back" regions, "=for..." paragraphs, and
+"=begin"..."=end" regions.
+.Sp
+(Pod processors must tolerate a bare "=item" as if it were "=item
+*".) Whether "*" is rendered as a literal asterisk, an "o", or as
+some kind of real bullet character, is left up to the Pod formatter,
+and may depend on the level of nesting.
+.IP \(bu 4
+An "=over" ... "=back" region containing only
+\&\f(CW\*(C`m/\eA=item\es+\ed+\e.?\es*\ez/\*(C'\fR paragraphs, each one (or each group of them)
+followed by some number of ordinary/verbatim paragraphs, other nested
+"=over" ... "=back" regions, "=for..." paragraphs, and/or
+"=begin"..."=end" regions. Note that the numbers must start at 1
+in each section, and must proceed in order and without skipping
+numbers.
+.Sp
+(Pod processors must tolerate lines like "=item 1" as if they were
+"=item 1.", with the period.)
+.IP \(bu 4
+An "=over" ... "=back" region containing only "=item [text]"
+commands, each one (or each group of them) followed by some number of
+ordinary/verbatim paragraphs, other nested "=over" ... "=back"
+regions, "=for..." paragraphs, and "=begin"..."=end" regions.
+.Sp
+The "=item [text]" paragraph should not match
+\&\f(CW\*(C`m/\eA=item\es+\ed+\e.?\es*\ez/\*(C'\fR or \f(CW\*(C`m/\eA=item\es+\e*\es*\ez/\*(C'\fR, nor should it
+match just \f(CW\*(C`m/\eA=item\es*\ez/\*(C'\fR.
+.IP \(bu 4
+An "=over" ... "=back" region containing no "=item" paragraphs at
+all, and containing only some number of
+ordinary/verbatim paragraphs, and possibly also some nested "=over"
+\&... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
+regions. Such an itemless "=over" ... "=back" region in Pod is
+equivalent in meaning to a "<blockquote>...</blockquote>" element in
+HTML.
+.RE
+.RS 4
+.Sp
+Note that with all the above cases, you can determine which type of
+"=over" ... "=back" you have, by examining the first (non\-"=cut",
+non\-"=pod") Pod paragraph after the "=over" command.
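+.Sp
+One way to sketch that examination in Perl (the function name and the
+four labels are hypothetical; \f(CW$para\fR is assumed to hold the text of
+that first paragraph):
+.Sp
+.Vb 8
+\& sub over_type {
+\&     my ($para) = @_;
+\&     return \*(Aqblock\*(Aq unless $para =~ m/\eA=item\eb/;
+\&     return \*(Aqbullet\*(Aq if $para =~ m/\eA=item\es*\ez/;
+\&     return \*(Aqbullet\*(Aq if $para =~ m/\eA=item\es+\e*\es*\ez/;
+\&     return \*(Aqnumber\*(Aq if $para =~ m/\eA=item\es+\ed+\e.?\es*\ez/;
+\&     return \*(Aqtext\*(Aq;
+\& }
+.Ve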
+.RE
+.IP \(bu 4
+Pod formatters \fImust\fR tolerate arbitrarily large amounts of text
+in the "=item \fItext...\fR" paragraph. In practice, most such
+paragraphs are short, as in:
+.Sp
+.Vb 1
+\& =item For cutting off our trade with all parts of the world
+.Ve
+.Sp
+But they may be arbitrarily long:
+.Sp
+.Vb 2
+\& =item For transporting us beyond seas to be tried for pretended
+\& offenses
+\&
+\& =item He is at this time transporting large armies of foreign
+\& mercenaries to complete the works of death, desolation and
+\& tyranny, already begun with circumstances of cruelty and perfidy
+\& scarcely paralleled in the most barbarous ages, and totally
+\& unworthy the head of a civilized nation.
+.Ve
+.IP \(bu 4
+Pod processors should tolerate "=item *" / "=item \fInumber\fR" commands
+with no accompanying paragraph. The middle item is an example:
+.Sp
+.Vb 1
+\& =over
+\&
+\& =item 1
+\&
+\& Pick up dry cleaning.
+\&
+\& =item 2
+\&
+\& =item 3
+\&
+\& Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.
+\&
+\& =back
+.Ve
+.IP \(bu 4
+No "=over" ... "=back" region can contain headings. Processors may
+treat such a heading as an error.
+.IP \(bu 4
+Note that an "=over" ... "=back" region should have some
+content. That is, authors should not have an empty region like this:
+.Sp
+.Vb 1
+\& =over
+\&
+\& =back
+.Ve
+.Sp
+Pod processors seeing such a contentless "=over" ... "=back" region,
+may ignore it, or may report it as an error.
+.IP \(bu 4
+Processors must tolerate an "=over" list that goes off the end of the
+document (i.e., which has no matching "=back"), but they may warn
+about such a list.
+.IP \(bu 4
+Authors of Pod formatters should note that this construct:
+.Sp
+.Vb 1
+\& =item Neque
+\&
+\& =item Porro
+\&
+\& =item Quisquam Est
+\&
+\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
+\& velit, sed quia non numquam eius modi tempora incidunt ut
+\& labore et dolore magnam aliquam quaerat voluptatem.
+\&
+\& =item Ut Enim
+.Ve
+.Sp
+is semantically ambiguous, in a way that makes formatting decisions
+a bit difficult. On the one hand, it could be a mention of an item
+"Neque", a mention of another item "Porro", and a mention of another
+item "Quisquam Est", with just the last one requiring the explanatory
+paragraph "Qui dolorem ipsum quia dolor..."; and then an item
+"Ut Enim". In that case, you'd want to format it like so:
+.Sp
+.Vb 1
+\& Neque
+\&
+\& Porro
+\&
+\& Quisquam Est
+\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
+\& velit, sed quia non numquam eius modi tempora incidunt ut
+\& labore et dolore magnam aliquam quaerat voluptatem.
+\&
+\& Ut Enim
+.Ve
+.Sp
+But it could equally well be a discussion of three (related or equivalent)
+items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph
+explaining them all, and then a new item "Ut Enim". In that case, you'd
+probably want to format it like so:
+.Sp
+.Vb 6
+\& Neque
+\& Porro
+\& Quisquam Est
+\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
+\& velit, sed quia non numquam eius modi tempora incidunt ut
+\& labore et dolore magnam aliquam quaerat voluptatem.
+\&
+\& Ut Enim
+.Ve
+.Sp
+But (for the foreseeable future), Pod does not provide any way for Pod
+authors to distinguish which grouping is meant by the above
+"=item"\-cluster structure. So formatters should format it like so:
+.Sp
+.Vb 1
+\& Neque
+\&
+\& Porro
+\&
+\& Quisquam Est
+\&
+\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
+\& velit, sed quia non numquam eius modi tempora incidunt ut
+\& labore et dolore magnam aliquam quaerat voluptatem.
+\&
+\& Ut Enim
+.Ve
+.Sp
+That is, there should be (at least roughly) equal spacing between
+items as between paragraphs (although that spacing may well be less
+than the full height of a line of text). This leaves it to the reader
+to use (con)textual cues to figure out whether the "Qui dolorem
+ipsum..." paragraph applies to the "Quisquam Est" item or to all three
+items "Neque", "Porro", and "Quisquam Est". While not an ideal
+situation, this is preferable to providing formatting cues that may
+actually be contrary to the author's intent.
+.SH "About Data Paragraphs and ""=begin/=end"" Regions"
+.IX Header "About Data Paragraphs and ""=begin/=end"" Regions"
+Data paragraphs are typically used for inlining non-Pod data that is
+to be used (typically passed through) when rendering the document to
+a specific format:
+.PP
+.Vb 1
+\& =begin rtf
+\&
+\& \epar{\epard\eqr\esa4500{\ei Printed\e~\echdate\e~\echtime}\epar}
+\&
+\& =end rtf
+.Ve
+.PP
+The exact same effect could, incidentally, be achieved with a single
+"=for" paragraph:
+.PP
+.Vb 1
+\& =for rtf \epar{\epard\eqr\esa4500{\ei Printed\e~\echdate\e~\echtime}\epar}
+.Ve
+.PP
+(Although that is not formally a data paragraph, it has the same
+meaning as one, and Pod parsers may parse it as one.)
+.PP
+Another example of a data paragraph:
+.PP
+.Vb 1
+\& =begin html
+\&
+\& I like <em>PIE</em>!
+\&
+\& <hr>Especially pecan pie!
+\&
+\& =end html
+.Ve
+.PP
+If these were ordinary paragraphs, the Pod parser would try to
+expand the "E</em>" (in the first paragraph) as a formatting
+code, just like "E<lt>" or "E<eacute>". But since this
+is in a "=begin \fIidentifier\fR"..."=end \fIidentifier\fR" region \fIand\fR
+the identifier "html" doesn't have a ":" prefix, the contents
+of this region are stored as data paragraphs, instead of being
+processed as ordinary paragraphs (or if they began with spaces
+and/or tabs, as verbatim paragraphs).
+.PP
+As a further example: At time of writing, no "biblio" identifier is
+supported, but suppose some processor were written to recognize it as
+a way of (say) denoting a bibliographic reference (necessarily
+containing formatting codes in ordinary paragraphs). The fact that
+"biblio" paragraphs were meant for ordinary processing would be
+indicated by prefacing each "biblio" identifier with a colon:
+.PP
+.Vb 1
+\& =begin :biblio
+\&
+\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
+\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ.
+\&
+\& =end :biblio
+.Ve
+.PP
+This would signal to the parser that paragraphs in this begin...end
+region are subject to normal handling as ordinary/verbatim paragraphs
+(while still tagged as meant only for processors that understand the
+"biblio" identifier). The same effect could be had with:
+.PP
+.Vb 3
+\& =for :biblio
+\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
+\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ.
+.Ve
+.PP
+The ":" on these identifiers means simply "process this stuff
+normally, even though the result will be for some special target".
+I suggest that parser APIs report "biblio" as the target identifier,
+but also report that it had a ":" prefix. (And similarly, with the
+above "html", report "html" as the target identifier, and note the
+\&\fIlack\fR of a ":" prefix.)
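+.PP
+A minimal sketch of that suggestion, written in Python purely for
+illustration. The names "Target" and "split_target" are hypothetical
+and do not belong to any existing Pod parser API; the sketch only shows
+one way to report the identifier and its ":" prefix separately:

```python
from typing import NamedTuple


class Target(NamedTuple):
    """Hypothetical record a parser API might hand to a formatter."""
    name: str        # identifier without any leading colon, e.g. "biblio"
    has_colon: bool  # True if the identifier was written with a ":" prefix


def split_target(identifier: str) -> Target:
    """Split a =begin/=for identifier into its name and its colon flag."""
    if identifier.startswith(":"):
        return Target(identifier[1:], True)
    return Target(identifier, False)
```

+.PP
+Under these assumptions, ":biblio" would be reported as the name
+"biblio" with the colon flag set, and "html" as the name "html" with
+the flag unset.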
+.PP
+Note that a "=begin \fIidentifier\fR"..."=end \fIidentifier\fR" region where
+\&\fIidentifier\fR begins with a colon \fIcan\fR contain commands. For example:
+.PP
+.Vb 1
+\& =begin :biblio
+\&
+\& Wirth\*(Aqs classic is available in several editions, including:
+\&
+\& =for comment
+\& hm, check abebooks.com for how much used copies cost.
+\&
+\& =over
+\&
+\& =item
+\&
+\& Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
+\& Teubner, Stuttgart. [Yes, it\*(Aqs in German.]
+\&
+\& =item
+\&
+\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
+\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ.
+\&
+\& =back
+\&
+\& =end :biblio
+.Ve
+.PP
+Note, however, that a "=begin \fIidentifier\fR"..."=end \fIidentifier\fR"
+region where \fIidentifier\fR does \fInot\fR begin with a colon should not
+directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
+nor "=item". For example, this may be considered invalid:
+.PP
+.Vb 1
+\& =begin somedata
+\&
+\& This is a data paragraph.
+\&
+\& =head1 Don\*(Aqt do this!
+\&
+\& This is a data paragraph too.
+\&
+\& =end somedata
+.Ve
+.PP
+A Pod processor may signal that the above (specifically the "=head1"
+paragraph) is an error. Note, however, that the following should
+\&\fInot\fR be treated as an error:
+.PP
+.Vb 1
+\& =begin somedata
+\&
+\& This is a data paragraph.
+\&
+\& =cut
+\&
+\& # Yup, this isn\*(Aqt Pod anymore.
+\& sub excl { (rand() > .5) ? "hoo!" : "hah!" }
+\&
+\& =pod
+\&
+\& This is a data paragraph too.
+\&
+\& =end somedata
+.Ve
+.PP
+And this too is valid:
+.PP
+.Vb 1
+\& =begin someformat
+\&
+\& This is a data paragraph.
+\&
+\& And this is a data paragraph.
+\&
+\& =begin someotherformat
+\&
+\& This is a data paragraph too.
+\&
+\& And this is a data paragraph too.
+\&
+\& =begin :yetanotherformat
+\&
+\& =head2 This is a command paragraph!
+\&
+\& This is an ordinary paragraph!
+\&
+\& And this is a verbatim paragraph!
+\&
+\& =end :yetanotherformat
+\&
+\& =end someotherformat
+\&
+\& Another data paragraph!
+\&
+\& =end someformat
+.Ve
+.PP
+The contents of the above "=begin :yetanotherformat" ...
+"=end :yetanotherformat" region \fIaren't\fR data paragraphs, because
+the immediately containing region's identifier (":yetanotherformat")
+begins with a colon. In practice, most regions that contain
+data paragraphs will contain \fIonly\fR data paragraphs; however,
+the above nesting is syntactically valid as Pod, even if it is
+rare. However, the handlers for some formats, like "html",
+will accept only data paragraphs, not nested regions; and they may
+complain if they see (targeted for them) nested regions, or commands,
+other than "=end", "=pod", and "=cut".
+.PP
+Also consider this valid structure:
+.PP
+.Vb 1
+\& =begin :biblio
+\&
+\& Wirth\*(Aqs classic is available in several editions, including:
+\&
+\& =over
+\&
+\& =item
+\&
+\& Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
+\& Teubner, Stuttgart. [Yes, it\*(Aqs in German.]
+\&
+\& =item
+\&
+\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
+\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ.
+\&
+\& =back
+\&
+\& Buy buy buy!
+\&
+\& =begin html
+\&
+\& <img src=\*(Aqwirth_spokesmodeling_book.png\*(Aq>
+\&
+\& <hr>
+\&
+\& =end html
+\&
+\& Now now now!
+\&
+\& =end :biblio
+.Ve
+.PP
+There, the "=begin html"..."=end html" region is nested inside
+the larger "=begin :biblio"..."=end :biblio" region. Note that the
+content of the "=begin html"..."=end html" region is data
+paragraph(s), because the immediately containing region's identifier
+("html") \fIdoesn't\fR begin with a colon.
+.PP
+Pod parsers, when processing a series of data paragraphs one
+after another (within a single region), should consider them to
+be one large data paragraph that happens to contain blank lines. So
+the content of the above "=begin html"..."=end html" \fImay\fR be stored
+as two data paragraphs (one consisting of
+"<img src='wirth_spokesmodeling_book.png'>\en"
+and another consisting of "<hr>\en"), but \fIshould\fR be stored as
+a single data paragraph (consisting of
+"<img src='wirth_spokesmodeling_book.png'>\en\en<hr>\en").
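+.PP
+The preferred single paragraph storage can be sketched as follows, in
+Python purely for illustration. The function name
+"merge_data_paragraphs" is hypothetical, and the sketch assumes each
+incoming data paragraph is a string ending in exactly one newline:

```python
def merge_data_paragraphs(paragraphs):
    """Join consecutive data paragraphs of one region into a single string.

    Assumption: each paragraph ends with exactly one newline, so joining
    with one more newline reproduces the blank line that separated the
    paragraphs in the source.
    """
    return "\n".join(paragraphs)
```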
+.PP
+Pod processors should tolerate empty
+"=begin \fIsomething\fR"..."=end \fIsomething\fR" regions,
+empty "=begin :\fIsomething\fR"..."=end :\fIsomething\fR" regions, and
+contentless "=for \fIsomething\fR" and "=for :\fIsomething\fR"
+paragraphs. I.e., these should be tolerated:
+.PP
+.Vb 1
+\& =for html
+\&
+\& =begin html
+\&
+\& =end html
+\&
+\& =begin :biblio
+\&
+\& =end :biblio
+.Ve
+.PP
+Incidentally, note that there's no easy way to express a data
+paragraph starting with something that looks like a command. Consider:
+.PP
+.Vb 1
+\& =begin stuff
+\&
+\& =shazbot
+\&
+\& =end stuff
+.Ve
+.PP
+There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
+paragraph "=shazbot\en". However, you can express a data paragraph consisting
+of "=shazbot\en" using this code:
+.PP
+.Vb 1
+\& =for stuff =shazbot
+.Ve
+.PP
+The situation where this is necessary is presumably quite rare.
+.PP
+Note that =end commands must match the currently open =begin command. That
+is, they must properly nest. For example, this is valid:
+.PP
+.Vb 1
+\& =begin outer
+\&
+\& X
+\&
+\& =begin inner
+\&
+\& Y
+\&
+\& =end inner
+\&
+\& Z
+\&
+\& =end outer
+.Ve
+.PP
+while this is invalid:
+.PP
+.Vb 1
+\& =begin outer
+\&
+\& X
+\&
+\& =begin inner
+\&
+\& Y
+\&
+\& =end outer
+\&
+\& Z
+\&
+\& =end inner
+.Ve
+.PP
+This latter is improper because when the "=end outer" command is seen, the
+currently open region has the formatname "inner", not "outer". (It just
+happens that "outer" is the format name of a higher-up region.) This is
+an error. Processors must by default report this as an error, and may halt
+processing the document containing that error. A corollary of this is that
+regions cannot "overlap". That is, the latter block above does not represent
+a region called "outer" which contains X and Y, overlapping a region called
+"inner" which contains Y and Z. But because it is invalid (as all
+apparently overlapping regions would be), it doesn't represent that, or
+anything at all.
+.PP
+Similarly, this is invalid:
+.PP
+.Vb 1
+\& =begin thing
+\&
+\& =end hting
+.Ve
+.PP
+This is an error because the region is opened by "thing", and the "=end"
+tries to close "hting" [sic].
+.PP
+This is also invalid:
+.PP
+.Vb 1
+\& =begin thing
+\&
+\& =end
+.Ve
+.PP
+This is invalid because every "=end" command must have a formatname
+parameter.
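+.PP
+The nesting rules above can be sketched with a stack of open regions.
+This is an illustrative Python sketch, not how any particular Pod
+parser is implemented. The function name "check_begin_end" and the
+input shape (a list of command and formatname pairs, with an empty
+string standing for a missing formatname) are assumptions made for the
+example:

```python
def check_begin_end(commands):
    """Return a list of error messages for badly nested =begin/=end commands.

    `commands` is a list of (command, formatname) pairs such as
    ("begin", "outer") or ("end", ""). This input shape is hypothetical.
    """
    open_regions = []  # format names of currently open regions, innermost last
    errors = []
    for command, formatname in commands:
        if command == "begin":
            open_regions.append(formatname)
        elif command == "end":
            if not formatname:
                # Every =end command must have a formatname parameter.
                errors.append("=end without a formatname")
            elif not open_regions:
                errors.append(f"=end {formatname} with no open region")
            elif open_regions[-1] != formatname:
                # =end must match the currently open (innermost) =begin.
                errors.append(
                    f"=end {formatname} does not match open =begin {open_regions[-1]}"
                )
            else:
                open_regions.pop()
    return errors
```

+.PP
+With this sketch, the properly nested "outer"/"inner" example yields no
+errors, while the "=end outer" that appears while "inner" is still open
+is reported, as are "=end hting" and a bare "=end".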
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+perlpod, "PODs: Embedded Documentation" in perlsyn,
+podchecker
+.SH AUTHOR
+.IX Header "AUTHOR"
+Sean M. Burke