author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
commit | fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch) | |
tree | ce1e3bce06471410239a6f41282e328770aa404a /upstream/debian-unstable/man1/perlpodspec.1 | |
parent | Initial commit. (diff) | |
Adding upstream version 4.22.0. (upstream/4.22.0)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/debian-unstable/man1/perlpodspec.1')
-rw-r--r-- | upstream/debian-unstable/man1/perlpodspec.1 | 1884 |
1 files changed, 1884 insertions, 0 deletions
diff --git a/upstream/debian-unstable/man1/perlpodspec.1 b/upstream/debian-unstable/man1/perlpodspec.1 new file mode 100644 index 00000000..676a48ef --- /dev/null +++ b/upstream/debian-unstable/man1/perlpodspec.1 @@ -0,0 +1,1884 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "PERLPODSPEC 1" +.TH PERLPODSPEC 1 2024-01-12 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH NAME +perlpodspec \- Plain Old Documentation: format specification and notes +.SH DESCRIPTION +.IX Header "DESCRIPTION" +This document is detailed notes on the Pod markup language. 
Most +people will only have to read perlpod to know how to write +in Pod, but this document may answer some incidental questions to do +with parsing and rendering Pod. +.PP +In this document, "must" / "must not", "should" / +"should not", and "may" have their conventional (cf. RFC 2119) +meanings: "X must do Y" means that if X doesn't do Y, it's against +this specification, and should really be fixed. "X should do Y" +means that it's recommended, but X may fail to do Y, if there's a +good reason. "X may do Y" is merely a note that X can do Y at +will (although it is up to the reader to detect any connotation of +"and I think it would be \fInice\fR if X did Y" versus "it wouldn't +really \fIbother\fR me if X did Y"). +.PP +Notably, when I say "the parser should do Y", the +parser may fail to do Y, if the calling application explicitly +requests that the parser \fInot\fR do Y. I often phrase this as +"the parser should, by default, do Y." This doesn't \fIrequire\fR +the parser to provide an option for turning off whatever +feature Y is (like expanding tabs in verbatim paragraphs), although +it implicates that such an option \fImay\fR be provided. +.SH "Pod Definitions" +.IX Header "Pod Definitions" +Pod is embedded in files, typically Perl source files, although you +can write a file that's nothing but Pod. +.PP +A \fBline\fR in a file consists of zero or more non-newline characters, +terminated by either a newline or the end of the file. +.PP +A \fBnewline sequence\fR is usually a platform-dependent concept, but +Pod parsers should understand it to mean any of CR (ASCII 13), LF +(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in +addition to any other system-specific meaning. The first CR/CRLF/LF +sequence in the file may be used as the basis for identifying the +newline sequence for parsing the rest of the file. 
+.PP +A \fBblank line\fR is a line consisting entirely of zero or more spaces +(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file. +A \fBnon-blank line\fR is a line containing one or more characters other +than space or tab (and terminated by a newline or end-of-file). +.PP +(\fINote:\fR Many older Pod parsers did not accept a line consisting of +spaces/tabs and then a newline as a blank line. The only lines they +considered blank were lines consisting of \fIno characters at all\fR, +terminated by a newline.) +.PP +\&\fBWhitespace\fR is used in this document as a blanket term for spaces, +tabs, and newline sequences. (By itself, this term usually refers +to literal whitespace. That is, sequences of whitespace characters +in Pod source, as opposed to "E<32>", which is a formatting +code that \fIdenotes\fR a whitespace character.) +.PP +A \fBPod parser\fR is a module meant for parsing Pod (regardless of +whether this involves calling callbacks or building a parse tree or +directly formatting it). A \fBPod formatter\fR (or \fBPod translator\fR) +is a module or program that converts Pod to some other format (HTML, +plaintext, TeX, PostScript, RTF). A \fBPod processor\fR might be a +formatter or translator, or might be a program that does something +else with the Pod (like counting words, scanning for index points, +etc.). +.PP +Pod content is contained in \fBPod blocks\fR. A Pod block starts with a +line that matches \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR, and continues up to the next line +that matches \f(CW\*(C`m/\eA=cut/\*(C'\fR or up to the end of the file if there is +no \f(CW\*(C`m/\eA=cut/\*(C'\fR line. +.PP +Note that a parser is not expected to distinguish between something that +looks like pod, but is in a quoted string, such as a here document. +.PP +Within a Pod block, there are \fBPod paragraphs\fR. A Pod paragraph +consists of non-blank lines of text, separated by one or more blank +lines. 
+.PP +For purposes of Pod processing, there are four types of paragraphs in +a Pod block: +.IP \(bu 4 +A command paragraph (also called a "directive"). The first line of +this paragraph must match \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR. Command paragraphs are +typically one line, as in: +.Sp +.Vb 1 +\& =head1 NOTES +\& +\& =item * +.Ve +.Sp +But they may span several (non-blank) lines: +.Sp +.Vb 3 +\& =for comment +\& Hm, I wonder what it would look like if +\& you tried to write a BNF for Pod from this. +\& +\& =head3 Dr. Strangelove, or: How I Learned to +\& Stop Worrying and Love the Bomb +.Ve +.Sp +\&\fISome\fR command paragraphs allow formatting codes in their content +(i.e., after the part that matches \f(CW\*(C`m/\eA=[a\-zA\-Z]\eS*\es*/\*(C'\fR), as in: +.Sp +.Vb 1 +\& =head1 Did You Remember to C<use strict;>? +.Ve +.Sp +In other words, the Pod processing handler for "head1" will apply the +same processing to "Did You Remember to C<use strict;>?" that it +would to an ordinary paragraph (i.e., formatting codes like +"C<...>") are parsed and presumably formatted appropriately, and +whitespace in the form of literal spaces and/or tabs is not +significant. +.IP \(bu 4 +A \fBverbatim paragraph\fR. The first line of this paragraph must be a +literal space or tab, and this paragraph must not be inside a "=begin +\&\fIidentifier\fR", ... "=end \fIidentifier\fR" sequence unless +"\fIidentifier\fR" begins with a colon (":"). That is, if a paragraph +starts with a literal space or tab, but \fIis\fR inside a +"=begin \fIidentifier\fR", ... "=end \fIidentifier\fR" region, then it's +a data paragraph, unless "\fIidentifier\fR" begins with a colon. +.Sp +Whitespace \fIis\fR significant in verbatim paragraphs (although, in +processing, tabs are probably expanded). +.IP \(bu 4 +An \fBordinary paragraph\fR. 
A paragraph is an ordinary paragraph +if its first line matches neither \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR nor +\&\f(CW\*(C`m/\eA[ \et]/\*(C'\fR, \fIand\fR if it's not inside a "=begin \fIidentifier\fR", +\&... "=end \fIidentifier\fR" sequence unless "\fIidentifier\fR" begins with +a colon (":"). +.IP \(bu 4 +A \fBdata paragraph\fR. This is a paragraph that \fIis\fR inside a "=begin +\&\fIidentifier\fR" ... "=end \fIidentifier\fR" sequence where +"\fIidentifier\fR" does \fInot\fR begin with a literal colon (":"). In +some sense, a data paragraph is not part of Pod at all (i.e., +effectively it's "out-of-band"), since it's not subject to most kinds +of Pod parsing; but it is specified here, since Pod +parsers need to be able to call an event for it, or store it in some +form in a parse tree, or at least just parse \fIaround\fR it. +.PP +For example: consider the following paragraphs: +.PP +.Vb 1 +\& # <\- that\*(Aqs the 0th column +\& +\& =head1 Foo +\& +\& Stuff +\& +\& $foo\->bar +\& +\& =cut +.Ve +.PP +Here, "=head1 Foo" and "=cut" are command paragraphs because the first +line of each matches \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR. "\fI[space][space]\fR\f(CW$foo\fR\->bar" +is a verbatim paragraph, because its first line starts with a literal +whitespace character (and there's no "=begin"..."=end" region around). +.PP +The "=begin \fIidentifier\fR" ... "=end \fIidentifier\fR" commands stop +paragraphs that they surround from being parsed as ordinary or verbatim +paragraphs, if \fIidentifier\fR doesn't begin with a colon. This +is discussed in detail in the section +"About Data Paragraphs and "=begin/=end" Regions". +.SH "Pod Commands" +.IX Header "Pod Commands" +This section is intended to supplement and clarify the discussion in +"Command Paragraph" in perlpod. 
These are the currently recognized +Pod commands: +.IP """=head1"", ""=head2"", ""=head3"", ""=head4"", ""=head5"", ""=head6""" 4 +.IX Item """=head1"", ""=head2"", ""=head3"", ""=head4"", ""=head5"", ""=head6""" +This command indicates that the text in the remainder of the paragraph +is a heading. That text may contain formatting codes. Examples: +.Sp +.Vb 1 +\& =head1 Object Attributes +\& +\& =head3 What B<Not> to Do! +.Ve +.Sp +Both \f(CW\*(C`=head5\*(C'\fR and \f(CW\*(C`=head6\*(C'\fR were added in 2020 and might not be +supported on all Pod parsers. Pod::Simple 3.41 was released on October +2020 and supports both of these providing support for all +Pod::Simple\-based Pod parsers. +.IP """=pod""" 4 +.IX Item """=pod""" +This command indicates that this paragraph begins a Pod block. (If we +are already in the middle of a Pod block, this command has no effect at +all.) If there is any text in this command paragraph after "=pod", +it must be ignored. Examples: +.Sp +.Vb 1 +\& =pod +\& +\& This is a plain Pod paragraph. +\& +\& =pod This text is ignored. +.Ve +.IP """=cut""" 4 +.IX Item """=cut""" +This command indicates that this line is the end of this previously +started Pod block. If there is any text after "=cut" on the line, it must be +ignored. Examples: +.Sp +.Vb 1 +\& =cut +\& +\& =cut The documentation ends here. +\& +\& =cut +\& # This is the first line of program text. +\& sub foo { # This is the second. +.Ve +.Sp +It is an error to try to \fIstart\fR a Pod block with a "=cut" command. In +that case, the Pod processor must halt parsing of the input file, and +must by default emit a warning. +.IP """=over""" 4 +.IX Item """=over""" +This command indicates that this is the start of a list/indent +region. If there is any text following the "=over", it must consist +of only a nonzero positive numeral. The semantics of this numeral is +explained in the "About =over...=back Regions" section, further +below. Formatting codes are not expanded. 
Examples: +.Sp +.Vb 1 +\& =over 3 +\& +\& =over 3.5 +\& +\& =over +.Ve +.IP """=item""" 4 +.IX Item """=item""" +This command indicates that an item in a list begins here. Formatting +codes are processed. The semantics of the (optional) text in the +remainder of this paragraph are +explained in the "About =over...=back Regions" section, further +below. Examples: +.Sp +.Vb 1 +\& =item +\& +\& =item * +\& +\& =item * +\& +\& =item 14 +\& +\& =item 3. +\& +\& =item C<< $thing\->stuff(I<dodad>) >> +\& +\& =item For transporting us beyond seas to be tried for pretended +\& offenses +\& +\& =item He is at this time transporting large armies of foreign +\& mercenaries to complete the works of death, desolation and +\& tyranny, already begun with circumstances of cruelty and perfidy +\& scarcely paralleled in the most barbarous ages, and totally +\& unworthy the head of a civilized nation. +.Ve +.IP """=back""" 4 +.IX Item """=back""" +This command indicates that this is the end of the region begun +by the most recent "=over" command. It permits no text after the +"=back" command. +.IP """=begin formatname""" 4 +.IX Item """=begin formatname""" +.PD 0 +.IP """=begin formatname parameter""" 4 +.IX Item """=begin formatname parameter""" +.PD +This marks the following paragraphs (until the matching "=end +formatname") as being for some special kind of processing. Unless +"formatname" begins with a colon, the contained non-command +paragraphs are data paragraphs. But if "formatname" \fIdoes\fR begin +with a colon, then non-command paragraphs are ordinary paragraphs +or data paragraphs. This is discussed in detail in the section +"About Data Paragraphs and "=begin/=end" Regions". +.Sp +It is advised that formatnames match the regexp +\&\f(CW\*(C`m/\eA:?[\-a\-zA\-Z0\-9_]+\ez/\*(C'\fR. Everything following whitespace after the +formatname is a parameter that may be used by the formatter when dealing +with this region. This parameter must not be repeated in the "=end" +paragraph. 
Implementors should anticipate future expansion in the +semantics and syntax of the first parameter to "=begin"/"=end"/"=for". +.IP """=end formatname""" 4 +.IX Item """=end formatname""" +This marks the end of the region opened by the matching +"=begin formatname" region. If "formatname" is not the formatname +of the most recent open "=begin formatname" region, then this +is an error, and must generate an error message. This +is discussed in detail in the section +"About Data Paragraphs and "=begin/=end" Regions". +.IP """=for formatname text...""" 4 +.IX Item """=for formatname text...""" +This is synonymous with: +.Sp +.Vb 1 +\& =begin formatname +\& +\& text... +\& +\& =end formatname +.Ve +.Sp +That is, it creates a region consisting of a single paragraph; that +paragraph is to be treated as a normal paragraph if "formatname" +begins with a ":"; if "formatname" \fIdoesn't\fR begin with a colon, +then "text..." will constitute a data paragraph. There is no way +to use "=for formatname text..." to express "text..." as a verbatim +paragraph. +.IP """=encoding encodingname""" 4 +.IX Item """=encoding encodingname""" +This command, which should occur early in the document (at least +before any non-US-ASCII data!), declares that this document is +encoded in the encoding \fIencodingname\fR, which must be +an encoding name that Encode recognizes. (Encode's list +of supported encodings, in Encode::Supported, is useful here.) +If the Pod parser cannot decode the declared encoding, it +should emit a warning and may abort parsing the document +altogether. +.Sp +A document having more than one "=encoding" line should be +considered an error. Pod processors may silently tolerate this if +the not-first "=encoding" lines are just duplicates of the +first one (e.g., if there's a "=encoding utf8" line, and later on +another "=encoding utf8" line). 
But Pod processors should complain if +there are contradictory "=encoding" lines in the same document +(e.g., if there is a "=encoding utf8" early in the document and +"=encoding big5" later). Pod processors that recognize BOMs +may also complain if they see an "=encoding" line +that contradicts the BOM (e.g., if a document with a UTF\-16LE +BOM has an "=encoding shiftjis" line). +.PP +If a Pod processor sees any command other than the ones listed +above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish", +or "=w123"), that processor must by default treat this as an +error. It must not process the paragraph beginning with that +command, must by default warn of this as an error, and may +abort the parse. A Pod parser may allow a way for particular +applications to add to the above list of known commands, and to +stipulate, for each additional command, whether formatting +codes should be processed. +.PP +Future versions of this specification may add additional +commands. +.SH "Pod Formatting Codes" +.IX Header "Pod Formatting Codes" +(Note that in previous drafts of this document and of perlpod, +formatting codes were referred to as "interior sequences", and +this term may still be found in the documentation for Pod parsers, +and in error messages from Pod processors.) +.PP +There are two syntaxes for formatting codes: +.IP \(bu 4 +A formatting code starts with a capital letter (just US-ASCII [A\-Z]) +followed by a "<", any number of characters, and ending with the first +matching ">". Examples: +.Sp +.Vb 1 +\& That\*(Aqs what I<you> think! +\& +\& What\*(Aqs C<CORE::dump()> for? 
+\& +\& X<C<chmod> and C<unlink()> Under Different Operating Systems> +.Ve +.IP \(bu 4 +A formatting code starts with a capital letter (just US-ASCII [A\-Z]) +followed by two or more "<"'s, one or more whitespace characters, +any number of characters, one or more whitespace characters, +and ending with the first matching sequence of two or more ">"'s, where +the number of ">"'s equals the number of "<"'s in the opening of this +formatting code. Examples: +.Sp +.Vb 1 +\& That\*(Aqs what I<< you >> think! +\& +\& C<<< open(X, ">>thing.dat") || die $! >>> +\& +\& B<< $foo\->bar(); >> +.Ve +.Sp +With this syntax, the whitespace character(s) after the "C<<<" +and before the ">>>" (or whatever letter) are \fInot\fR renderable. They +do not signify whitespace, are merely part of the formatting codes +themselves. That is, these are all synonymous: +.Sp +.Vb 7 +\& C<thing> +\& C<< thing >> +\& C<< thing >> +\& C<<< thing >>> +\& C<<<< +\& thing +\& >>>> +.Ve +.Sp +and so on. +.Sp +Finally, the multiple-angle-bracket form does \fInot\fR alter the interpretation +of nested formatting codes, meaning that the following four example lines are +identical in meaning: +.Sp +.Vb 1 +\& B<example: C<$a E<lt>=E<gt> $b>> +\& +\& B<example: C<< $a <=> $b >>> +\& +\& B<example: C<< $a E<lt>=E<gt> $b >>> +\& +\& B<<< example: C<< $a E<lt>=E<gt> $b >> >>> +.Ve +.PP +In parsing Pod, a notably tricky part is the correct parsing of +(potentially nested!) formatting codes. Implementors should +consult the code in the \f(CW\*(C`parse_text\*(C'\fR routine in Pod::Parser as an +example of a correct implementation. +.ie n .IP """I<text>"" \-\- italic text" 4 +.el .IP "\f(CWI<text>\fR \-\- italic text" 4 +.IX Item "I<text> -- italic text" +See the brief discussion in "Formatting Codes" in perlpod. +.ie n .IP """B<text>"" \-\- bold text" 4 +.el .IP "\f(CWB<text>\fR \-\- bold text" 4 +.IX Item "B<text> -- bold text" +See the brief discussion in "Formatting Codes" in perlpod. 
+.ie n .IP """C<code>"" \-\- code text" 4 +.el .IP "\f(CWC<code>\fR \-\- code text" 4 +.IX Item "C<code> -- code text" +See the brief discussion in "Formatting Codes" in perlpod. +.ie n .IP """F<filename>"" \-\- style for filenames" 4 +.el .IP "\f(CWF<filename>\fR \-\- style for filenames" 4 +.IX Item "F<filename> -- style for filenames" +See the brief discussion in "Formatting Codes" in perlpod. +.ie n .IP """X<topic name>"" \-\- an index entry" 4 +.el .IP "\f(CWX<topic name>\fR \-\- an index entry" 4 +.IX Item "X<topic name> -- an index entry" +See the brief discussion in "Formatting Codes" in perlpod. +.Sp +This code is unusual in that most formatters completely discard +this code and its content. Other formatters will render it with +invisible codes that can be used in building an index of +the current document. +.ie n .IP """Z<>"" \-\- a null (zero-effect) formatting code" 4 +.el .IP "\f(CWZ<>\fR \-\- a null (zero-effect) formatting code" 4 +.IX Item "Z<> -- a null (zero-effect) formatting code" +Discussed briefly in "Formatting Codes" in perlpod. +.Sp +This code is unusual in that it should have no content. That is, +a processor may complain if it sees \f(CW\*(C`Z<potatoes>\*(C'\fR. Whether +or not it complains, the \fIpotatoes\fR text should be ignored. +.ie n .IP """L<name>"" \-\- a hyperlink" 4 +.el .IP "\f(CWL<name>\fR \-\- a hyperlink" 4 +.IX Item "L<name> -- a hyperlink" +The complicated syntaxes of this code are discussed at length in +"Formatting Codes" in perlpod, and implementation details are +discussed below, in "About L<...> Codes". Parsing the +contents of L<content> is tricky. Notably, the content has to be +checked for whether it looks like a URL, or whether it has to be split +on literal "|" and/or "/" (in the right order!), and so on, +\&\fIbefore\fR E<...> codes are resolved. 
+.ie n .IP """E<escape>"" \-\- a character escape" 4 +.el .IP "\f(CWE<escape>\fR \-\- a character escape" 4 +.IX Item "E<escape> -- a character escape" +See "Formatting Codes" in perlpod, and several points in +"Notes on Implementing Pod Processors". +.ie n .IP """S<text>"" \-\- text contains non-breaking spaces" 4 +.el .IP "\f(CWS<text>\fR \-\- text contains non-breaking spaces" 4 +.IX Item "S<text> -- text contains non-breaking spaces" +This formatting code is syntactically simple, but semantically +complex. What it means is that each space in the printable +content of this code signifies a non-breaking space. +.Sp +Consider: +.Sp +.Vb 1 +\& C<$x ? $y : $z> +\& +\& S<C<$x ? $y : $z>> +.Ve +.Sp +Both signify the monospace (c[ode] style) text consisting of +"$x", one space, "?", one space, ":", one space, "$z". The +difference is that in the latter, with the S code, those spaces +are not "normal" spaces, but instead are non-breaking spaces. +.PP +If a Pod processor sees any formatting code other than the ones +listed above (as in "N<...>", or "Q<...>", etc.), that +processor must by default treat this as an error. +A Pod parser may allow a way for particular +applications to add to the above list of known formatting codes; +a Pod parser might even allow a way to stipulate, for each additional +command, whether it requires some form of special processing, as +L<...> does. +.PP +Future versions of this specification may add additional +formatting codes. +.PP +Historical note: A few older Pod processors would not see a ">" as +closing a "C<" code, if the ">" was immediately preceded by +a "\-". This was so that this: +.PP +.Vb 1 +\& C<$foo\->bar> +.Ve +.PP +would parse as equivalent to this: +.PP +.Vb 1 +\& C<$foo\-E<gt>bar> +.Ve +.PP +instead of as equivalent to a "C" formatting code containing +only "$foo\-", and then a "bar>" outside the "C" formatting code. 
This +problem has since been solved by the addition of syntaxes like this: +.PP +.Vb 1 +\& C<< $foo\->bar >> +.Ve +.PP +Compliant parsers must not treat "\->" as special. +.PP +Formatting codes absolutely cannot span paragraphs. If a code is +opened in one paragraph, and no closing code is found by the end of +that paragraph, the Pod parser must close that formatting code, +and should complain (as in "Unterminated I code in the paragraph +starting at line 123: 'Time objects are not...'"). So these +two paragraphs: +.PP +.Vb 1 +\& I<I told you not to do this! +\& +\& Don\*(Aqt make me say it again!> +.Ve +.PP +\&...must \fInot\fR be parsed as two paragraphs in italics (with the I +code starting in one paragraph and ending in another.) Instead, +the first paragraph should generate a warning, but that aside, the +above code must parse as if it were: +.PP +.Vb 1 +\& I<I told you not to do this!> +\& +\& Don\*(Aqt make me say it again!E<gt> +.Ve +.PP +(In SGMLish jargon, all Pod commands are like block-level +elements, whereas all Pod formatting codes are like inline-level +elements.) +.SH "Notes on Implementing Pod Processors" +.IX Header "Notes on Implementing Pod Processors" +The following is a long section of miscellaneous requirements +and suggestions to do with Pod processing. +.IP \(bu 4 +Pod formatters should tolerate lines in verbatim blocks that are of +any length, even if that means having to break them (possibly several +times, for very long lines) to avoid text running off the side of the +page. Pod formatters may warn of such line-breaking. Such warnings +are particularly appropriate for lines that are over 100 characters long, which +are usually not intentional. +.IP \(bu 4 +Pod parsers must recognize \fIall\fR of the three well-known newline +formats: CR, LF, and CRLF. See perlport. +.IP \(bu 4 +Pod parsers should accept input lines that are of any length. 
+.IP \(bu 4 +Since Perl recognizes a Unicode Byte Order Mark at the start of files +as signaling that the file is Unicode encoded as in UTF\-16 (whether +big-endian or little-endian) or UTF\-8, Pod parsers should do the +same. Otherwise, the character encoding should be understood as +being UTF\-8 if the first highbit byte sequence in the file seems +valid as a UTF\-8 sequence, or otherwise as CP\-1252 (earlier versions of +this specification used Latin\-1 instead of CP\-1252). +.Sp +Future versions of this specification may specify +how Pod can accept other encodings. Presumably treatment of other +encodings in Pod parsing would be as in XML parsing: whatever the +encoding declared by a particular Pod file, content is to be +stored in memory as Unicode characters. +.IP \(bu 4 +The well known Unicode Byte Order Marks are as follows: if the +file begins with the two literal byte values 0xFE 0xFF, this is +the BOM for big-endian UTF\-16. If the file begins with the two +literal byte values 0xFF 0xFE, this is the BOM for little-endian +UTF\-16. On an ASCII platform, if the file begins with the three literal +byte values +0xEF 0xBB 0xBF, this is the BOM for UTF\-8. +A mechanism portable to EBCDIC platforms is to: +.Sp +.Vb 2 +\& my $utf8_bom = "\ex{FEFF}"; +\& utf8::encode($utf8_bom); +.Ve +.IP \(bu 4 +A naive, but often sufficient heuristic on ASCII platforms, for testing +the first highbit +byte-sequence in a BOM-less file (whether in code or in Pod!), to see +whether that sequence is valid as UTF\-8 (RFC 2279) is to check whether +the first byte in the sequence is in the range 0xC2 \- 0xFD +\&\fIand\fR whether the next byte is in the range +0x80 \- 0xBF. If so, the parser may conclude that this file is in +UTF\-8, and all highbit sequences in the file should be assumed to +be UTF\-8. Otherwise the parser should treat the file as being +in CP\-1252. 
(A better check, and which works on EBCDIC platforms as +well, is to pass a copy of the sequence to +\&\fButf8::decode()\fR which performs a full validity check on the +sequence and returns TRUE if it is valid UTF\-8, FALSE otherwise. This +function is always pre-loaded, is fast because it is written in C, and +will only get called at most once, so you don't need to avoid it out of +performance concerns.) +In the unlikely circumstance that the first highbit +sequence in a truly non\-UTF\-8 file happens to appear to be UTF\-8, one +can cater to our heuristic (as well as any more intelligent heuristic) +by prefacing that line with a comment line containing a highbit +sequence that is clearly \fInot\fR valid as UTF\-8. A line consisting +of simply "#", an e\-acute, and any non-highbit byte, +is sufficient to establish this file's encoding. +.IP \(bu 4 +Pod processors must treat a "=for [label] [content...]" paragraph as +meaning the same thing as a "=begin [label]" paragraph, content, and +an "=end [label]" paragraph. (The parser may conflate these two +constructs, or may leave them distinct, in the expectation that the +formatter will nevertheless treat them the same.) +.IP \(bu 4 +When rendering Pod to a format that allows comments (i.e., to nearly +any format other than plaintext), a Pod formatter must insert comment +text identifying its name and version number, and the name and +version numbers of any modules it might be using to process the Pod. 
+Minimal examples: +.Sp +.Vb 1 +\& %% POD::Pod2PS v3.14159, using POD::Parser v1.92 +\& +\& <!\-\- Pod::HTML v3.14159, using POD::Parser v1.92 \-\-> +\& +\& {\edoccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08} +\& +\& .\e" Pod::Man version 3.14159, using POD::Parser version 1.92 +.Ve +.Sp +Formatters may also insert additional comments, including: the +release date of the Pod formatter program, the contact address for +the author(s) of the formatter, the current time, the name of input +file, the formatting options in effect, version of Perl used, etc. +.Sp +Formatters may also choose to note errors/warnings as comments, +besides or instead of emitting them otherwise (as in messages to +STDERR, or \f(CW\*(C`die\*(C'\fRing). +.IP \(bu 4 +Pod parsers \fImay\fR emit warnings or error messages ("Unknown E code +E<zslig>!") to STDERR (whether through printing to STDERR, or +\&\f(CW\*(C`warn\*(C'\fRing/\f(CW\*(C`carp\*(C'\fRing, or \f(CW\*(C`die\*(C'\fRing/\f(CW\*(C`croak\*(C'\fRing), but \fImust\fR allow +suppressing all such STDERR output, and instead allow an option for +reporting errors/warnings +in some other way, whether by triggering a callback, or noting errors +in some attribute of the document object, or some similarly unobtrusive +mechanism \-\- or even by appending a "Pod Errors" section to the end of +the parsed form of the document. +.IP \(bu 4 +In cases of exceptionally aberrant documents, Pod parsers may abort the +parse. Even then, using \f(CW\*(C`die\*(C'\fRing/\f(CW\*(C`croak\*(C'\fRing is to be avoided; where +possible, the parser library may simply close the input file +and add text like "*** Formatting Aborted ***" to the end of the +(partial) in-memory document. 
+.IP \(bu 4 +In paragraphs where formatting codes (like E<...>, B<...>) +are understood (i.e., \fInot\fR verbatim paragraphs, but \fIincluding\fR +ordinary paragraphs, and command paragraphs that produce renderable +text, like "=head1"), literal whitespace should generally be considered +"insignificant", in that one literal space has the same meaning as any +(nonzero) number of literal spaces, literal newlines, and literal tabs +(as long as this produces no blank lines, since those would terminate +the paragraph). Pod parsers should compact literal whitespace in each +processed paragraph, but may provide an option for overriding this +(since some processing tasks do not require it), or may follow +additional special rules (for example, specially treating +period-space-space or period-newline sequences). +.IP \(bu 4 +Pod parsers should not, by default, try to coerce apostrophe (') and +quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to +turn backtick (`) into anything else but a single backtick character +(distinct from an open quote character!), nor "\-\-" into anything but +two minus signs. They \fImust never\fR do any of those things to text +in C<...> formatting codes, and never \fIever\fR to text in verbatim +paragraphs. +.IP \(bu 4 +When rendering Pod to a format that has two kinds of hyphens (\-), one +that's a non-breaking hyphen, and another that's a breakable hyphen +(as in "object-oriented", which can be split across lines as +"object\-", newline, "oriented"), formatters are encouraged to +generally translate "\-" to non-breaking hyphen, but may apply +heuristics to convert some of these to breaking hyphens. +.IP \(bu 4 +Pod formatters should make reasonable efforts to keep words of Perl +code from being broken across lines. For example, "Foo::Bar" in some +formatting systems is seen as eligible for being broken across lines +as "Foo::" newline "Bar" or even "Foo::\-" newline "Bar". 
This should +be avoided where possible, either by disabling all line-breaking in +mid-word, or by wrapping particular words with internal punctuation +in "don't break this across lines" codes (which in some formats may +not be a single code, but might be a matter of inserting non-breaking +zero-width spaces between every pair of characters in a word.) +.IP \(bu 4 +Pod parsers should, by default, expand tabs in verbatim paragraphs as +they are processed, before passing them to the formatter or other +processor. Parsers may also allow an option for overriding this. +.IP \(bu 4 +Pod parsers should, by default, remove newlines from the end of +ordinary and verbatim paragraphs before passing them to the +formatter. For example, while the paragraph you're reading now +could be considered, in Pod source, to end with (and contain) +the newline(s) that end it, it should be processed as ending with +(and containing) the period character that ends this sentence. +.IP \(bu 4 +Pod parsers, when reporting errors, should make some effort to report +an approximate line number ("Nested E<>'s in Paragraph #52, near +line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph +number ("Nested E<>'s in Paragraph #52 of Thing/Foo.pm!"). Where +this is problematic, the paragraph number should at least be +accompanied by an excerpt from the paragraph ("Nested E<>'s in +Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for +the C<interest rate> attribute...'"). +.IP \(bu 4 +Pod parsers, when processing a series of verbatim paragraphs one +after another, should consider them to be one large verbatim +paragraph that happens to contain blank lines. I.e., these two +lines, which have a blank line between them: +.Sp +.Vb 1 +\& use Foo; +\& +\& print Foo\->VERSION +.Ve +.Sp +should be unified into one paragraph ("\etuse Foo;\en\en\etprint +Foo\->VERSION") before being passed to the formatter or other +processor. Parsers may also allow an option for overriding this. 
+.Sp +While this might be too cumbersome to implement in event-based Pod +parsers, it is straightforward for parsers that return parse trees. +.IP \(bu 4 +Pod formatters, where feasible, are advised to avoid splitting short +verbatim paragraphs (under twelve lines, say) across pages. +.IP \(bu 4 +Pod parsers must treat a line with only spaces and/or tabs on it as a +"blank line" such as separates paragraphs. (Some older parsers +recognized only two adjacent newlines as a "blank line" but would not +recognize a newline, a space, and a newline, as a blank line. This +is noncompliant behavior.) +.IP \(bu 4 +Authors of Pod formatters/processors should make every effort to +avoid writing their own Pod parser. There are already several in +CPAN, with a wide range of interface styles \-\- and one of them, +Pod::Simple, comes with modern versions of Perl. +.IP \(bu 4 +Characters in Pod documents may be conveyed either as literals, or by +number in E<n> codes, or by an equivalent mnemonic, as in +E<eacute> which is exactly equivalent to E<233>. The numbers +are the Latin1/Unicode values, even on EBCDIC platforms. +.Sp +When referring to characters by using a E<n> numeric code, numbers +in the range 32\-126 refer to those well known US-ASCII characters (also +defined there by Unicode, with the same meaning), which all Pod +formatters must render faithfully. Characters whose E<> numbers +are in the ranges 0\-31 and 127\-159 should not be used (neither as +literals, +nor as E<number> codes), except for the literal byte-sequences for +newline (ASCII 13, ASCII 13 10, or ASCII 10), and tab (ASCII 9). +.Sp +Numbers in the range 160\-255 refer to Latin\-1 characters (also +defined there by Unicode, with the same meaning). Numbers above +255 should be understood to refer to Unicode characters. +.IP \(bu 4 +Be warned +that some formatters cannot reliably render characters outside 32\-126; +and many are able to handle 32\-126 and 160\-255, but nothing above +255. 
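The numeric ranges described above can be summarized in a small helper. This is only a sketch: the function name classify_charnum and the labels it returns are hypothetical, and it assumes the E<n> value has already been converted to a plain decimal integer.

```perl
use strict;
use warnings;

# Hypothetical helper: name the range that an E<n> value falls in.
sub classify_charnum {
    my ($n) = @_;
    # 0-31 and 127-159 should not be written as E<n> codes at all.
    return 'discouraged' if $n <= 31 || ($n >= 127 && $n <= 159);
    return 'us-ascii'    if $n <= 126;    # 32-126: must render faithfully
    return 'latin-1'     if $n <= 255;    # 160-255
    return 'unicode';                     # above 255: support varies
}
```

A formatter might use such a classification to decide when to warn that a character is unlikely to survive in a limited output format.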
+.IP \(bu 4 +Besides the well-known "E<lt>" and "E<gt>" codes for +less-than and greater-than, Pod parsers must understand "E<sol>" +for "/" (solidus, slash), and "E<verbar>" for "|" (vertical bar, +pipe). Pod parsers should also understand "E<lchevron>" and +"E<rchevron>" as legacy codes for characters 171 and 187, i.e., +"left-pointing double angle quotation mark" = "left pointing +guillemet" and "right-pointing double angle quotation mark" = "right +pointing guillemet". (These look like little "<<" and ">>", and they +are now preferably expressed with the HTML/XHTML codes "E<laquo>" +and "E<raquo>".) +.IP \(bu 4 +Pod parsers should understand all "E<html>" codes as defined +in the entity declarations in the most recent XHTML specification at +\&\f(CW\*(C`www.W3.org\*(C'\fR. Pod parsers must understand at least the entities +that define characters in the range 160\-255 (Latin\-1). Pod parsers, +when faced with some unknown "E<\fIidentifier\fR>" code, +shouldn't simply replace it with nullstring (by default, at least), +but may pass it through as a string consisting of the literal characters +E, less-than, \fIidentifier\fR, greater-than. Or Pod parsers may offer the +alternative option of processing such unknown +"E<\fIidentifier\fR>" codes by firing an event especially +for such codes, or by adding a special node-type to the in-memory +document tree. Such "E<\fIidentifier\fR>" may have special meaning +to some processors, or some processors may choose to add them to +a special error report. +.IP \(bu 4 +Pod parsers must also support the XHTML codes "E<quot>" for +character 34 (doublequote, "), "E<amp>" for character 38 +(ampersand, &), and "E<apos>" for character 39 (apostrophe, '). +.IP \(bu 4 +Note that in all cases of "E<whatever>", \fIwhatever\fR (whether +an htmlname, or a number in any base) must consist only of +alphanumeric characters \-\- that is, \fIwhatever\fR must match +\&\f(CW\*(C`m/\eA\ew+\ez/\*(C'\fR. 
So "E<\ 0\ 1\ 2\ 3\ >" is invalid, because +it contains spaces, which aren't alphanumeric characters. This +presumably does not \fIneed\fR special treatment by a Pod processor; +"\ 0\ 1\ 2\ 3\ " doesn't look like a number in any base, so it would +presumably be looked up in the table of HTML-like names. Since +there isn't (and cannot be) an HTML-like entity called "\ 0\ 1\ 2\ 3\ ", +this will be treated as an error. However, Pod processors may +treat "E<\ 0\ 1\ 2\ 3\ >" or "E<e\-acute>" as \fIsyntactically\fR +invalid, potentially earning a different error message than the +error message (or warning, or event) generated by a merely unknown +(but theoretically valid) htmlname, as in "E<qacute>" +[sic]. However, Pod parsers are not required to make this +distinction. +.IP \(bu 4 +Note that E<number> \fImust not\fR be interpreted as simply +"codepoint \fInumber\fR in the current/native character set". It always +means only "the character represented by codepoint \fInumber\fR in +Unicode." (This is identical to the semantics of &#\fInumber\fR; in XML.) +.Sp +This will likely require many formatters to have tables mapping from +treatable Unicode codepoints (such as the "\exE9" for the e\-acute +character) to the escape sequences or codes necessary for conveying +such sequences in the target output format. A converter to *roff +would, for example know that "\exE9" (whether conveyed literally, or via +a E<...> sequence) is to be conveyed as "e\e\e*'". +Similarly, a program rendering Pod in a Mac OS application window, would +presumably need to know that "\exE9" maps to codepoint 142 in MacRoman +encoding that (at time of writing) is native for Mac OS. Such +Unicode2whatever mappings are presumably already widely available for +common output formats. (Such mappings may be incomplete! 
Implementers +are not expected to bend over backwards in an attempt to render +Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any +of the other weird things that Unicode can encode.) And +if a Pod document uses a character not found in such a mapping, the +formatter should consider it an unrenderable character. +.IP \(bu 4 +If, surprisingly, the implementor of a Pod formatter can't find a +satisfactory pre-existing table mapping from Unicode characters to +escapes in the target format (e.g., a decent table of Unicode +characters to *roff escapes), it will be necessary to build such a +table. If you are in this circumstance, you should begin with the +characters in the range 0x00A0 \- 0x00FF, which is mostly the heavily +used accented characters. Then proceed (as patience permits and +fastidiousness compels) through the characters that the (X)HTML +standards groups judged important enough to merit mnemonics +for. These are declared in the (X)HTML specifications at the +www.W3.org site. At time of writing (September 2001), the most recent +entity declaration files are: +.Sp +.Vb 3 +\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-lat1.ent +\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-special.ent +\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-symbol.ent +.Ve +.Sp +Then you can progress through any remaining notable Unicode characters +in the range 0x2000\-0x204D (consult the character tables at +www.unicode.org), and whatever else strikes your fancy. For example, +in \fIxhtml\-symbol.ent\fR, there is the entry: +.Sp +.Vb 1 +\& <!ENTITY infin "∞"> <!\-\- infinity, U+221E ISOtech \-\-> +.Ve +.Sp +While the mapping "infin" to the character "\ex{221E}" will (hopefully) +have been already handled by the Pod parser, the presence of the +character in this file means that it's reasonably important enough to +include in a formatter's table that maps from notable Unicode characters +to the codes necessary for rendering them. 
So for a Unicode\-to\-*roff +mapping, for example, this would merit the entry: +.Sp +.Vb 1 +\& "\ex{221E}" => \*(Aq\e(in\*(Aq, +.Ve +.Sp +It is eagerly hoped that in the future, increasing numbers of formats +(and formatters) will support Unicode characters directly (as (X)HTML +does with \f(CW\*(C`&infin;\*(C'\fR, \f(CW\*(C`&#8734;\*(C'\fR, or \f(CW\*(C`&#x221E;\*(C'\fR), reducing the need +for idiosyncratic mappings of Unicode\-to\-\fImy_escapes\fR. +.IP \(bu 4 +It is up to the individual Pod formatter to display good judgement when +confronted with an unrenderable character (which is distinct from an +unknown E<thing> sequence that the parser couldn't resolve to +anything, renderable or not). It is good practice to map Latin letters +with diacritics (like "E<eacute>"/"E<233>") to the corresponding +unaccented US-ASCII letters (like a simple character 101, "e"), but +clearly this is often not feasible, and an unrenderable character may +be represented as "?", or the like. In attempting a sane fallback +(as from E<233> to "e"), Pod formatters may use the +\&\f(CW%Latin1Code_to_fallback\fR table in Pod::Escapes, or +Text::Unidecode, if available. +.Sp +For example, this Pod text: +.Sp +.Vb 1 +\& magic is enabled if you set C<$Currency> to \*(AqE<euro>\*(Aq. +.Ve +.Sp +may be rendered as: +"magic is enabled if you set \f(CW$Currency\fR to '\fI?\fR'" or as +"magic is enabled if you set \f(CW$Currency\fR to '\fB[euro]\fR'", or as +"magic is enabled if you set \f(CW$Currency\fR to '[x20AC]'", etc. +.Sp +A Pod formatter may also note, in a comment or warning, a list of what +unrenderable characters were encountered. +.IP \(bu 4 +E<...> may freely appear in any formatting code (other than +in another E<...> or in a Z<>). That is, "X<The +E<euro>1,000,000 Solution>" is valid, as is "L<The +E<euro>1,000,000 Solution|Million::Euros>".
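The fallback strategy for unrenderable characters described above could look roughly like this. It is a sketch under stated assumptions: the tiny %fallback table and the render_char function are hypothetical stand-ins (a real formatter might consult the Pod::Escapes table mentioned above instead), and the caller supplies a predicate saying which code points the output format can show.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny table of unaccented substitutes.
my %fallback = (
    0xE8 => 'e',    # e-grave
    0xE9 => 'e',    # e-acute
    0xFC => 'u',    # u-diaeresis
);

# Return the character itself if the output format can show it,
# otherwise a plain-letter substitute, otherwise a visible placeholder.
sub render_char {
    my ($codepoint, $can_render) = @_;
    return chr $codepoint if $can_render->($codepoint);
    return $fallback{$codepoint} // sprintf('[x%04X]', $codepoint);
}
```

With a predicate that accepts only US-ASCII, E<eacute> comes out as "e" and E<euro> as "[x20AC]", matching one of the renderings suggested above.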
+.IP \(bu 4 +Some Pod formatters output to formats that implement non-breaking +spaces as an individual character (which I'll call "NBSP"), and +others output to formats that implement non-breaking spaces just as +spaces wrapped in a "don't break this across lines" code. Note that +at the level of Pod, both sorts of codes can occur: Pod can contain a +NBSP character (whether as a literal, or as a "E<160>" or +"E<nbsp>" code); and Pod can contain "S<foo +I<bar> baz>" codes, where "mere spaces" (character 32) in +such codes are taken to represent non-breaking spaces. Pod +parsers should consider supporting the optional parsing of "S<foo +I<bar> baz>" as if it were +"foo\fINBSP\fRI<bar>\fINBSP\fRbaz", and, going the other way, the +optional parsing of groups of words joined by NBSP's as if each group +were in a S<...> code, so that formatters may use the +representation that maps best to what the output format demands. +.IP \(bu 4 +Some processors may find that the \f(CW\*(C`S<...>\*(C'\fR code is easiest to +implement by replacing each space in the parse tree under the content +of the S, with an NBSP. But note: the replacement should apply \fInot\fR to +spaces in \fIall\fR text, but \fIonly\fR to spaces in \fIprintable\fR text. (This +distinction may or may not be evident in the particular tree/event +model implemented by the Pod parser.) For example, consider this +unusual case: +.Sp +.Vb 1 +\& S<L</Autoloaded Functions>> +.Ve +.Sp +This means that the space in the middle of the visible link text must +not be broken across lines. In other words, it's the same as this: +.Sp +.Vb 1 +\& L<"AutoloadedE<160>Functions"/Autoloaded Functions> +.Ve +.Sp +However, a misapplied space-to-NBSP replacement could (wrongly) +produce something equivalent to this: +.Sp +.Vb 1 +\& L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions> +.Ve +.Sp +\&...which is almost definitely not going to work as a hyperlink (assuming +this formatter outputs a format supporting hypertext). 
+.Sp +Formatters may choose to just not support the S format code, +especially in cases where the output format simply has no NBSP +character/code and no code for "don't break this stuff across lines". +.IP \(bu 4 +Besides the NBSP character discussed above, implementors are reminded +of the existence of the other "special" character in Latin\-1, the +"soft hyphen" character, also known as "discretionary hyphen" +(i.e., \f(CW\*(C`E<173>\*(C'\fR = \f(CW\*(C`E<0xAD>\*(C'\fR = +\&\f(CW\*(C`E<shy>\*(C'\fR). This character expresses an optional hyphenation +point. That is, it normally renders as nothing, but may render as a +"\-" if a formatter breaks the word at that point. Pod formatters +should, as appropriate, do one of the following: 1) render this with +a code with the same meaning (e.g., "\e\-" in RTF), 2) pass it through +in the expectation that the formatter understands this character as +such, or 3) delete it. +.Sp +For example: +.Sp +.Vb 3 +\& sigE<shy>action +\& manuE<shy>script +\& JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi +.Ve +.Sp +These signal to a formatter that if it is to hyphenate "sigaction" +or "manuscript", then it should be done as +"sig\-\fI[linebreak]\fRaction" or "manu\-\fI[linebreak]\fRscript" +(and if it doesn't hyphenate it, then the \f(CW\*(C`E<shy>\*(C'\fR doesn't +show up at all). And if it is +to hyphenate "Jarkko" and/or "Hietaniemi", it can do +so only at the points where there is a \f(CW\*(C`E<shy>\*(C'\fR code. +.Sp +In practice, it is anticipated that this character will not be used +often, but formatters should either support it, or delete it. +.IP \(bu 4 +If you think that you want to add a new command to Pod (like, say, a +"=biblio" command), consider whether you could get the same +effect with a for or begin/end sequence: "=for biblio ..." or "=begin +biblio" ... "=end biblio". Pod processors that don't understand +"=for biblio", etc, will simply ignore it, whereas they may complain +loudly if they see "=biblio".
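The three soft-hyphen options listed above can be sketched as a single helper. The function name handle_soft_hyphens and its policy names are hypothetical; the text is assumed to be already decoded, so that E<shy> appears as the character with code 0xAD (on an ASCII-based platform).

```perl
use strict;
use warnings;

# Hypothetical helper: apply one soft-hyphen policy to decoded text.
# $escape is the target format's own code for an optional hyphenation
# point (for RTF, the two characters backslash and hyphen).
sub handle_soft_hyphens {
    my ($text, $policy, $escape) = @_;
    my $shy = "\x{AD}";    # E<173> = E<0xAD> = E<shy>

    if ($policy eq 'translate') {
        $text =~ s/$shy/$escape/g;    # option 1: equivalent code
    }
    elsif ($policy eq 'delete') {
        $text =~ s/$shy//g;           # option 3: drop it
    }
    # Any other policy (option 2) passes the character through as is.
    return $text;
}
```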
+.IP \(bu 4 +Throughout this document, "Pod" has been the preferred spelling for +the name of the documentation format. One may also use "POD" or +"pod". For the documentation that is (typically) in the Pod +format, you may use "pod", or "Pod", or "POD". Understanding these +distinctions is useful; but obsessing over how to spell them, usually +is not. +.SH "About L<...> Codes" +.IX Header "About L<...> Codes" +As you can tell from a glance at perlpod, the L<...> +code is the most complex of the Pod formatting codes. The points below +will hopefully clarify what it means and how processors should deal +with it. +.IP \(bu 4 +In parsing an L<...> code, Pod parsers must distinguish at least +four attributes: +.RS 4 +.IP First: 4 +.IX Item "First:" +The link-text. If there is none, this must be \f(CW\*(C`undef\*(C'\fR. (E.g., in +"L<Perl Functions|perlfunc>", the link-text is "Perl Functions". +In "L<Time::HiRes>" and even "L<|Time::HiRes>", there is no +link text. Note that link text may contain formatting.) +.IP Second: 4 +.IX Item "Second:" +The possibly inferred link-text; i.e., if there was no real link +text, then this is the text that we'll infer in its place. (E.g., for +"L<Getopt::Std>", the inferred link text is "Getopt::Std".) +.IP Third: 4 +.IX Item "Third:" +The name or URL, or \f(CW\*(C`undef\*(C'\fR if none. (E.g., in "L<Perl +Functions|perlfunc>", the name (also sometimes called the page) +is "perlfunc". In "L</CAVEATS>", the name is \f(CW\*(C`undef\*(C'\fR.) +.IP Fourth: 4 +.IX Item "Fourth:" +The section (AKA "item" in older perlpods), or \f(CW\*(C`undef\*(C'\fR if none. E.g., +in "L<Getopt::Std/DESCRIPTION>", "DESCRIPTION" is the section. (Note +that this is not the same as a manpage section like the "5" in "man 5 +crontab". "Section Foo" in the Pod sense means the part of the text +that's introduced by the heading or item whose text is "Foo".) 
+.RE +.RS 4 +.Sp +Pod parsers may also note additional attributes including: +.IP Fifth: 4 +.IX Item "Fifth:" +A flag for whether item 3 (if present) is a URL (like +"http://lists.perl.org" is), in which case there should be no section +attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or +possibly a man page name (like "\fBcrontab\fR\|(5)" is). +.IP Sixth: 4 +.IX Item "Sixth:" +The raw original L<...> content, before text is split on +"|", "/", etc, and before E<...> codes are expanded. +.RE +.RS 4 +.Sp +(The above were numbered only for concise reference below. It is not +a requirement that these be passed as an actual list or array.) +.Sp +For example: +.Sp +.Vb 7 +\& L<Foo::Bar> +\& => undef, # link text +\& "Foo::Bar", # possibly inferred link text +\& "Foo::Bar", # name +\& undef, # section +\& \*(Aqpod\*(Aq, # what sort of link +\& "Foo::Bar" # original content +\& +\& L<Perlport\*(Aqs section on NL\*(Aqs|perlport/Newlines> +\& => "Perlport\*(Aqs section on NL\*(Aqs", # link text +\& "Perlport\*(Aqs section on NL\*(Aqs", # possibly inferred link text +\& "perlport", # name +\& "Newlines", # section +\& \*(Aqpod\*(Aq, # what sort of link +\& "Perlport\*(Aqs section on NL\*(Aqs|perlport/Newlines" +\& # original content +\& +\& L<perlport/Newlines> +\& => undef, # link text +\& \*(Aq"Newlines" in perlport\*(Aq, # possibly inferred link text +\& "perlport", # name +\& "Newlines", # section +\& \*(Aqpod\*(Aq, # what sort of link +\& "perlport/Newlines" # original content +\& +\& L<crontab(5)/"DESCRIPTION"> +\& => undef, # link text +\& \*(Aq"DESCRIPTION" in crontab(5)\*(Aq, # possibly inferred link text +\& "crontab(5)", # name +\& "DESCRIPTION", # section +\& \*(Aqman\*(Aq, # what sort of link +\& \*(Aqcrontab(5)/"DESCRIPTION"\*(Aq # original content +\& +\& L</Object Attributes> +\& => undef, # link text +\& \*(Aq"Object Attributes"\*(Aq, # possibly inferred link text +\& undef, # name +\& "Object Attributes", # section +\& \*(Aqpod\*(Aq, # what 
sort of link +\& "/Object Attributes" # original content +\& +\& L<https://www.perl.org/> +\& => undef, # link text +\& "https://www.perl.org/", # possibly inferred link text +\& "https://www.perl.org/", # name +\& undef, # section +\& \*(Aqurl\*(Aq, # what sort of link +\& "https://www.perl.org/" # original content +\& +\& L<Perl.org|https://www.perl.org/> +\& => "Perl.org", # link text +\& "https://www.perl.org/", # possibly inferred link text +\& "https://www.perl.org/", # name +\& undef, # section +\& \*(Aqurl\*(Aq, # what sort of link +\& "Perl.org|https://www.perl.org/" # original content +.Ve +.Sp +Note that you can distinguish URL-links from anything else by the +fact that they match \f(CW\*(C`m/\eA\ew+:[^:\es]\eS*\ez/\*(C'\fR. So +\&\f(CW\*(C`L<http://www.perl.com>\*(C'\fR is a URL, but +\&\f(CW\*(C`L<HTTP::Response>\*(C'\fR isn't. +.RE +.IP \(bu 4 +In case of L<...> codes with no "text|" part in them, +older formatters have exhibited great variation in actually displaying +the link or cross reference. For example, L<\fBcrontab\fR\|(5)> would render +as "the \f(CWcrontab(5)\fR manpage", or "in the \f(CWcrontab(5)\fR manpage" +or just "\f(CWcrontab(5)\fR". +.Sp +Pod processors must now treat "text|"\-less links as follows: +.Sp +.Vb 3 +\& L<name> => L<name|name> +\& L</section> => L<"section"|/section> +\& L<name/section> => L<"section" in name|name/section> +.Ve +.IP \(bu 4 +Note that section names might contain markup. I.e., if a section +starts with: +.Sp +.Vb 1 +\& =head2 About the C<\-M> Operator +.Ve +.Sp +or with: +.Sp +.Vb 1 +\& =item About the C<\-M> Operator +.Ve +.Sp +then a link to it would look like this: +.Sp +.Vb 1 +\& L<somedoc/About the C<\-M> Operator> +.Ve +.Sp +Formatters may choose to ignore the markup for purposes of resolving +the link and use only the renderable characters in the section name, +as in: +.Sp +.Vb 2 +\& <h1><a name="About_the_\-M_Operator">About the <code>\-M</code> +\& Operator</h1> +\& +\& ... 
+\& +\& <a href="somedoc#About_the_\-M_Operator">About the <code>\-M</code> +\& Operator" in somedoc</a> +.Ve +.IP \(bu 4 +Previous versions of perlpod distinguished \f(CW\*(C`L<name/"section">\*(C'\fR +links from \f(CW\*(C`L<name/item>\*(C'\fR links (and their targets). These +have been merged syntactically and semantically in the current +specification, and \fIsection\fR can refer either to a "=head\fIn\fR Heading +Content" command or to a "=item Item Content" command. This +specification does not specify what behavior should be in the case +of a given document having several things all seeming to produce the +same \fIsection\fR identifier (e.g., in HTML, several things all producing +the same \fIanchorname\fR in <a name="\fIanchorname\fR">...</a> +elements). Where Pod processors can control this behavior, they should +use the first such anchor. That is, \f(CW\*(C`L<Foo/Bar>\*(C'\fR refers to the +\&\fIfirst\fR "Bar" section in Foo. +.Sp +But for some processors/formats this cannot be easily controlled; as +with the HTML example, the behavior of multiple ambiguous +<a name="\fIanchorname\fR">...</a> is most easily just left up to +browsers to decide. +.IP \(bu 4 +In a \f(CW\*(C`L<text|...>\*(C'\fR code, text may contain formatting codes +for formatting or for E<...> escapes, as in: +.Sp +.Vb 1 +\& L<B<ummE<234>stuff>|...> +.Ve +.Sp +For \f(CW\*(C`L<...>\*(C'\fR codes without a "name|" part, only +\&\f(CW\*(C`E<...>\*(C'\fR and \f(CW\*(C`Z<>\*(C'\fR codes may occur. That is, +authors should not use "\f(CW\*(C`L<B<Foo::Bar>>\*(C'\fR". +.Sp +Note, however, that formatting codes and Z<>'s can occur in any +and all parts of an L<...> (i.e., in \fIname\fR, \fIsection\fR, \fItext\fR, +and \fIurl\fR). +.Sp +Authors must not nest L<...> codes. For example, "L<The +L<Foo::Bar> man page>" should be treated as an error. +.IP \(bu 4 +Note that Pod authors may use formatting codes inside the "text" +part of "L<text|name>" (and so on for L<text|/"sec">). 
+.Sp +In other words, this is valid: +.Sp +.Vb 1 +\& Go read L<the docs on C<$.>|perlvar/"$."> +.Ve +.Sp +Some output formats that do allow rendering "L<...>" codes as +hypertext, might not allow the link-text to be formatted; in +that case, formatters will have to just ignore that formatting. +.IP \(bu 4 +At time of writing, \f(CW\*(C`L<name>\*(C'\fR values are of two types: +either the name of a Pod page like \f(CW\*(C`L<Foo::Bar>\*(C'\fR (which +might be a real Perl module or program in an \f(CW@INC\fR / PATH +directory, or a .pod file in those places); or the name of a Unix +man page, like \f(CW\*(C`L<crontab(5)>\*(C'\fR. In theory, \f(CW\*(C`L<chmod>\*(C'\fR +is ambiguous between a Pod page called "chmod", or the Unix man page +"chmod" (in whatever man-section). However, the presence of a string +in parens, as in "\fBcrontab\fR\|(5)", is sufficient to signal that what +is being discussed is not a Pod page, and so is presumably a +Unix man page. The distinction is of no importance to many +Pod processors, but some processors that render to hypertext formats +may need to distinguish them in order to know how to render a +given \f(CW\*(C`L<foo>\*(C'\fR code. +.IP \(bu 4 +Previous versions of perlpod allowed for a \f(CW\*(C`L<section>\*(C'\fR syntax (as in +\&\f(CW\*(C`L<Object Attributes>\*(C'\fR), which was not easily distinguishable from +\&\f(CW\*(C`L<name>\*(C'\fR syntax and for \f(CW\*(C`L<"section">\*(C'\fR which was only +slightly less ambiguous. This syntax is no longer in the specification, and +has been replaced by the \f(CW\*(C`L</section>\*(C'\fR syntax (where the slash was +formerly optional). Pod parsers should tolerate the \f(CW\*(C`L<"section">\*(C'\fR +syntax, for a while at least. The suggested heuristic for distinguishing +\&\f(CW\*(C`L<section>\*(C'\fR from \f(CW\*(C`L<name>\*(C'\fR is that if it contains any +whitespace, it's a \fIsection\fR. Pod processors should warn about this being +deprecated syntax. 
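The attribute breakdown, the inferred-text rules, and the URL test given in this section can be sketched together as follows. This is a minimal illustration rather than the behavior of any particular parser: parse_link and its hash keys are hypothetical names, the input is assumed to be the raw content of an L<...> code with no nested formatting codes, and the legacy L<"section"> and L<section> forms are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: split raw L<...> content into link attributes.
sub parse_link {
    my ($raw) = @_;
    my ($text, $target) = (undef, $raw);

    # A leading "text|" part supplies explicit link text; an empty one
    # (as in "L<|Time::HiRes>") counts as no link text.
    if ($raw =~ m/\A([^|]*)\|(.*)\z/s) {
        ($text, $target) = ($1, $2);
        $text = undef if $text eq '';
    }

    my ($name, $section, $type);
    if ($target =~ m/\A\w+:[^:\s]\S*\z/) {
        # URL test quoted from the text above; URLs carry no section.
        ($name, $type) = ($target, 'url');
    }
    else {
        ($name, $section) = split m{/}, $target, 2;
        $name = undef if defined $name && $name eq '';
        $section =~ s/\A"(.*)"\z/$1/s if defined $section;
        # A parenthesized part, as in "crontab(5)", signals a man page.
        $type = (defined $name && $name =~ m/\(\S*\)/) ? 'man' : 'pod';
    }

    # Rules for "text|"-less links: name, "section", or "section" in name.
    my $inferred =
          defined $text     ? $text
        : !defined $section ? $name
        : defined $name     ? qq{"$section" in $name}
        :                     qq{"$section"};

    return {
        text    => $text,    inferred => $inferred,
        name    => $name,    section  => $section,
        type    => $type,    raw      => $raw,
    };
}
```

For instance, parse_link('perlport/Newlines') yields the name "perlport", the section "Newlines", and the inferred text '"Newlines" in perlport', while 'HTTP::Response' is classified as a Pod name rather than a URL.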
+.SH "About =over...=back Regions" +.IX Header "About =over...=back Regions" +"=over"..."=back" regions are used for various kinds of list-like +structures. (I use the term "region" here simply as a collective +term for everything from the "=over" to the matching "=back".) +.IP \(bu 4 +The non-zero numeric \fIindentlevel\fR in "=over \fIindentlevel\fR" ... +"=back" is used for giving the formatter a clue as to how many +"spaces" (ems, or roughly equivalent units) it should tab over, +although many formatters will have to convert this to an absolute +measurement that may not exactly match with the size of spaces (or M's) +in the document's base font. Other formatters may have to completely +ignore the number. The lack of any explicit \fIindentlevel\fR parameter is +equivalent to an \fIindentlevel\fR value of 4. Pod processors may +complain if \fIindentlevel\fR is present but is not a positive number +matching \f(CW\*(C`m/\eA(\ed*\e.)?\ed+\ez/\*(C'\fR. +.IP \(bu 4 +Authors of Pod formatters are reminded that "=over" ... "=back" may +map to several different constructs in your output format. For +example, in converting Pod to (X)HTML, it can map to any of +<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or +<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or +<dt>. +.IP \(bu 4 +Each "=over" ... "=back" region should be one of the following: +.RS 4 +.IP \(bu 4 +An "=over" ... "=back" region containing only "=item *" commands, +each followed by some number of ordinary/verbatim paragraphs, other +nested "=over" ... "=back" regions, "=for..." paragraphs, and +"=begin"..."=end" regions. +.Sp +(Pod processors must tolerate a bare "=item" as if it were "=item +*".) Whether "*" is rendered as a literal asterisk, an "o", or as +some kind of real bullet character, is left up to the Pod formatter, +and may depend on the level of nesting. +.IP \(bu 4 +An "=over" ... 
"=back" region containing only +\&\f(CW\*(C`m/\eA=item\es+\ed+\e.?\es*\ez/\*(C'\fR paragraphs, each one (or each group of them) +followed by some number of ordinary/verbatim paragraphs, other nested +"=over" ... "=back" regions, "=for..." paragraphs, and/or +"=begin"..."=end" codes. Note that the numbers must start at 1 +in each section, and must proceed in order and without skipping +numbers. +.Sp +(Pod processors must tolerate lines like "=item 1" as if they were +"=item 1.", with the period.) +.IP \(bu 4 +An "=over" ... "=back" region containing only "=item [text]" +commands, each one (or each group of them) followed by some number of +ordinary/verbatim paragraphs, other nested "=over" ... "=back" +regions, or "=for..." paragraphs, and "=begin"..."=end" regions. +.Sp +The "=item [text]" paragraph should not match +\&\f(CW\*(C`m/\eA=item\es+\ed+\e.?\es*\ez/\*(C'\fR or \f(CW\*(C`m/\eA=item\es+\e*\es*\ez/\*(C'\fR, nor should it +match just \f(CW\*(C`m/\eA=item\es*\ez/\*(C'\fR. +.IP \(bu 4 +An "=over" ... "=back" region containing no "=item" paragraphs at +all, and containing only some number of +ordinary/verbatim paragraphs, and possibly also some nested "=over" +\&... "=back" regions, "=for..." paragraphs, and "=begin"..."=end" +regions. Such an itemless "=over" ... "=back" region in Pod is +equivalent in meaning to a "<blockquote>...</blockquote>" element in +HTML. +.RE +.RS 4 +.Sp +Note that with all the above cases, you can determine which type of +"=over" ... "=back" you have, by examining the first (non\-"=cut", +non\-"=pod") Pod paragraph after the "=over" command. +.RE +.IP \(bu 4 +Pod formatters \fImust\fR tolerate arbitrarily large amounts of text +in the "=item \fItext...\fR" paragraph. 
In practice, most such +paragraphs are short, as in: +.Sp +.Vb 1 +\& =item For cutting off our trade with all parts of the world +.Ve +.Sp +But they may be arbitrarily long: +.Sp +.Vb 2 +\& =item For transporting us beyond seas to be tried for pretended +\& offenses +\& +\& =item He is at this time transporting large armies of foreign +\& mercenaries to complete the works of death, desolation and +\& tyranny, already begun with circumstances of cruelty and perfidy +\& scarcely paralleled in the most barbarous ages, and totally +\& unworthy the head of a civilized nation. +.Ve +.IP \(bu 4 +Pod processors should tolerate "=item *" / "=item \fInumber\fR" commands +with no accompanying paragraph. The middle item is an example: +.Sp +.Vb 1 +\& =over +\& +\& =item 1 +\& +\& Pick up dry cleaning. +\& +\& =item 2 +\& +\& =item 3 +\& +\& Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs. +\& +\& =back +.Ve +.IP \(bu 4 +No "=over" ... "=back" region can contain headings. Processors may +treat such a heading as an error. +.IP \(bu 4 +Note that an "=over" ... "=back" region should have some +content. That is, authors should not have an empty region like this: +.Sp +.Vb 1 +\& =over +\& +\& =back +.Ve +.Sp +Pod processors seeing such a contentless "=over" ... "=back" region, +may ignore it, or may report it as an error. +.IP \(bu 4 +Processors must tolerate an "=over" list that goes off the end of the +document (i.e., which has no matching "=back"), but they may warn +about such a list. +.IP \(bu 4 +Authors of Pod formatters should note that this construct: +.Sp +.Vb 1 +\& =item Neque +\& +\& =item Porro +\& +\& =item Quisquam Est +\& +\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci +\& velit, sed quia non numquam eius modi tempora incidunt ut +\& labore et dolore magnam aliquam quaerat voluptatem. +\& +\& =item Ut Enim +.Ve +.Sp +is semantically ambiguous, in a way that makes formatting decisions +a bit difficult. 
On the one hand, it could be mention of an item +"Neque", mention of another item "Porro", and mention of another +item "Quisquam Est", with just the last one requiring the explanatory +paragraph "Qui dolorem ipsum quia dolor..."; and then an item +"Ut Enim". In that case, you'd want to format it like so: +.Sp +.Vb 1 +\& Neque +\& +\& Porro +\& +\& Quisquam Est +\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci +\& velit, sed quia non numquam eius modi tempora incidunt ut +\& labore et dolore magnam aliquam quaerat voluptatem. +\& +\& Ut Enim +.Ve +.Sp +But it could equally well be a discussion of three (related or equivalent) +items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph +explaining them all, and then a new item "Ut Enim". In that case, you'd +probably want to format it like so: +.Sp +.Vb 6 +\& Neque +\& Porro +\& Quisquam Est +\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci +\& velit, sed quia non numquam eius modi tempora incidunt ut +\& labore et dolore magnam aliquam quaerat voluptatem. +\& +\& Ut Enim +.Ve +.Sp +But (for the foreseeable future), Pod does not provide any way for Pod +authors to distinguish which grouping is meant by the above +"=item"\-cluster structure. So formatters should format it like so: +.Sp +.Vb 1 +\& Neque +\& +\& Porro +\& +\& Quisquam Est +\& +\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci +\& velit, sed quia non numquam eius modi tempora incidunt ut +\& labore et dolore magnam aliquam quaerat voluptatem. +\& +\& Ut Enim +.Ve +.Sp +That is, there should be (at least roughly) equal spacing between +items as between paragraphs (although that spacing may well be less +than the full height of a line of text). This leaves it to the reader +to use (con)textual cues to figure out whether the "Qui dolorem +ipsum..." paragraph applies to the "Quisquam Est" item or to all three +items "Neque", "Porro", and "Quisquam Est". 
While not an ideal +situation, this is preferable to providing formatting cues that may +actually be contrary to the author's intent. +.SH "About Data Paragraphs and ""=begin/=end"" Regions" +.IX Header "About Data Paragraphs and ""=begin/=end"" Regions" +Data paragraphs are typically used for inlining non-Pod data that is +to be used (typically passed through) when rendering the document to +a specific format: +.PP +.Vb 1 +\& =begin rtf +\& +\& \epar{\epard\eqr\esa4500{\ei Printed\e~\echdate\e~\echtime}\epar} +\& +\& =end rtf +.Ve +.PP +The exact same effect could, incidentally, be achieved with a single +"=for" paragraph: +.PP +.Vb 1 +\& =for rtf \epar{\epard\eqr\esa4500{\ei Printed\e~\echdate\e~\echtime}\epar} +.Ve +.PP +(Although that is not formally a data paragraph, it has the same +meaning as one, and Pod parsers may parse it as one.) +.PP +Another example of a data paragraph: +.PP +.Vb 1 +\& =begin html +\& +\& I like <em>PIE</em>! +\& +\& <hr>Especially pecan pie! +\& +\& =end html +.Ve +.PP +If these were ordinary paragraphs, the Pod parser would try to +expand the "E</em>" (in the first paragraph) as a formatting +code, just like "E<lt>" or "E<eacute>". But since this +is in a "=begin \fIidentifier\fR"..."=end \fIidentifier\fR" region \fIand\fR +the identifier "html" doesn't have a ":" prefix, the contents +of this region are stored as data paragraphs, instead of being +processed as ordinary paragraphs (or, if they began with spaces +and/or tabs, as verbatim paragraphs). +.PP +As a further example: At time of writing, no "biblio" identifier is +supported, but suppose some processor were written to recognize it as +a way of (say) denoting a bibliographic reference (necessarily +containing formatting codes in ordinary paragraphs). The fact that +"biblio" paragraphs were meant for ordinary processing would be +indicated by prefacing each "biblio" identifier with a colon: +.PP +.Vb 1 +\& =begin :biblio +\& +\& Wirth, Niklaus. 1976. 
I<Algorithms + Data Structures = +\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ. +\& +\& =end :biblio +.Ve +.PP +This would signal to the parser that paragraphs in this begin...end +region are subject to normal handling as ordinary/verbatim paragraphs +(while still tagged as meant only for processors that understand the +"biblio" identifier). The same effect could be had with: +.PP +.Vb 3 +\& =for :biblio +\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures = +\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ. +.Ve +.PP +The ":" on these identifiers means simply "process this stuff +normally, even though the result will be for some special target". +I suggest that parser APIs report "biblio" as the target identifier, +but also report that it had a ":" prefix. (And similarly, with the +above "html", report "html" as the target identifier, and note the +\&\fIlack\fR of a ":" prefix.) +.PP +Note that a "=begin \fIidentifier\fR"..."=end \fIidentifier\fR" region where +\&\fIidentifier\fR begins with a colon, \fIcan\fR contain commands. For example: +.PP +.Vb 1 +\& =begin :biblio +\& +\& Wirth\*(Aqs classic is available in several editions, including: +\& +\& =for comment +\& hm, check abebooks.com for how much used copies cost. +\& +\& =over +\& +\& =item +\& +\& Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.> +\& Teubner, Stuttgart. [Yes, it\*(Aqs in German.] +\& +\& =item +\& +\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures = +\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ. +\& +\& =back +\& +\& =end :biblio +.Ve +.PP +Note, however, a "=begin \fIidentifier\fR"..."=end \fIidentifier\fR" +region where \fIidentifier\fR does \fInot\fR begin with a colon, should not +directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back", +nor "=item". For example, this may be considered invalid: +.PP +.Vb 1 +\& =begin somedata +\& +\& This is a data paragraph. +\& +\& =head1 Don\*(Aqt do this! +\& +\& This is a data paragraph too. 
+\& +\& =end somedata +.Ve +.PP +A Pod processor may signal that the above (specifically the "=head1" +paragraph) is an error. Note, however, that the following should +\&\fInot\fR be treated as an error: +.PP +.Vb 1 +\& =begin somedata +\& +\& This is a data paragraph. +\& +\& =cut +\& +\& # Yup, this isn\*(Aqt Pod anymore. +\& sub excl { (rand() > .5) ? "hoo!" : "hah!" } +\& +\& =pod +\& +\& This is a data paragraph too. +\& +\& =end somedata +.Ve +.PP +And this too is valid: +.PP +.Vb 1 +\& =begin someformat +\& +\& This is a data paragraph. +\& +\& And this is a data paragraph. +\& +\& =begin someotherformat +\& +\& This is a data paragraph too. +\& +\& And this is a data paragraph too. +\& +\& =begin :yetanotherformat +\& +\& =head2 This is a command paragraph! +\& +\& This is an ordinary paragraph! +\& +\& And this is a verbatim paragraph! +\& +\& =end :yetanotherformat +\& +\& =end someotherformat +\& +\& Another data paragraph! +\& +\& =end someformat +.Ve +.PP +The contents of the above "=begin :yetanotherformat" ... +"=end :yetanotherformat" region \fIaren't\fR data paragraphs, because +the immediately containing region's identifier (":yetanotherformat") +begins with a colon. In practice, most regions that contain +data paragraphs will contain \fIonly\fR data paragraphs; however, +the above nesting is syntactically valid as Pod, even if it is +rare. However, the handlers for some formats, like "html", +will accept only data paragraphs, not nested regions; and they may +complain if they see (targeted for them) nested regions, or commands, +other than "=end", "=pod", and "=cut". +.PP +Also consider this valid structure: +.PP +.Vb 1 +\& =begin :biblio +\& +\& Wirth\*(Aqs classic is available in several editions, including: +\& +\& =over +\& +\& =item +\& +\& Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.> +\& Teubner, Stuttgart. [Yes, it\*(Aqs in German.] +\& +\& =item +\& +\& Wirth, Niklaus. 1976. 
I<Algorithms + Data Structures = +\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ. +\& +\& =back +\& +\& Buy buy buy! +\& +\& =begin html +\& +\& <img src=\*(Aqwirth_spokesmodeling_book.png\*(Aq> +\& +\& <hr> +\& +\& =end html +\& +\& Now now now! +\& +\& =end :biblio +.Ve +.PP +There, the "=begin html"..."=end html" region is nested inside +the larger "=begin :biblio"..."=end :biblio" region. Note that the +content of the "=begin html"..."=end html" region is data +paragraph(s), because the immediately containing region's identifier +("html") \fIdoesn't\fR begin with a colon. +.PP +Pod parsers, when processing a series of data paragraphs one +after another (within a single region), should consider them to +be one large data paragraph that happens to contain blank lines. So +the content of the above "=begin html"..."=end html" \fImay\fR be stored +as two data paragraphs (one consisting of +"<img src='wirth_spokesmodeling_book.png'>\en" +and another consisting of "<hr>\en"), but \fIshould\fR be stored as +a single data paragraph (consisting of +"<img src='wirth_spokesmodeling_book.png'>\en\en<hr>\en"). +.PP +Pod processors should tolerate empty +"=begin \fIsomething\fR"..."=end \fIsomething\fR" regions, +empty "=begin :\fIsomething\fR"..."=end :\fIsomething\fR" regions, and +contentless "=for \fIsomething\fR" and "=for :\fIsomething\fR" +paragraphs. I.e., these should be tolerated: +.PP +.Vb 1 +\& =for html +\& +\& =begin html +\& +\& =end html +\& +\& =begin :biblio +\& +\& =end :biblio +.Ve +.PP +Incidentally, note that there's no easy way to express a data +paragraph starting with something that looks like a command. Consider: +.PP +.Vb 1 +\& =begin stuff +\& +\& =shazbot +\& +\& =end stuff +.Ve +.PP +There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data +paragraph "=shazbot\en". 
However, you can express a data paragraph consisting +of "=shazbot\en" using this code: +.PP +.Vb 1 +\& =for stuff =shazbot +.Ve +.PP +Situations where this is necessary are presumably quite rare. +.PP +Note that =end commands must match the currently open =begin command. That +is, they must properly nest. For example, this is valid: +.PP +.Vb 1 +\& =begin outer +\& +\& X +\& +\& =begin inner +\& +\& Y +\& +\& =end inner +\& +\& Z +\& +\& =end outer +.Ve +.PP +while this is invalid: +.PP +.Vb 1 +\& =begin outer +\& +\& X +\& +\& =begin inner +\& +\& Y +\& +\& =end outer +\& +\& Z +\& +\& =end inner +.Ve +.PP +The latter is improper because when the "=end outer" command is seen, the +currently open region has the formatname "inner", not "outer". (It just +happens that "outer" is the formatname of a higher-up region.) This is +an error. Processors must by default report this as an error, and may halt +processing the document containing that error. A corollary of this is that +regions cannot "overlap". That is, the latter block above does not represent +a region called "outer" which contains X and Y, overlapping a region called +"inner" which contains Y and Z. Because it is invalid (as all +apparently overlapping regions would be), it doesn't represent that, or +anything at all. +.PP +Similarly, this is invalid: +.PP +.Vb 1 +\& =begin thing +\& +\& =end hting +.Ve +.PP +This is an error because the region is opened by "thing", and the "=end" +tries to close "hting" [sic]. +.PP +This is also invalid: +.PP +.Vb 1 +\& =begin thing +\& +\& =end +.Ve +.PP +This is invalid because every "=end" command must have a formatname +parameter. +.SH "SEE ALSO" +.IX Header "SEE ALSO" +perlpod, "PODs: Embedded Documentation" in perlsyn, +podchecker +.SH AUTHOR +.IX Header "AUTHOR" +Sean M. Burke
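As an illustration of the "=begin"/"=end" nesting rules stated in the section on data paragraphs, the following is a minimal sketch in Python. It is not part of any Pod toolchain. The function name check_begin_end and its error messages are hypothetical. It assumes that formatnames are compared as exact strings and that a command is recognized only on a line that starts a paragraph (the first line, or a line preceded by a blank line). It ignores every other Pod construct, including "=cut"/"=pod" switching, so it should be read as a sketch of the stack discipline only, not as a Pod parser.

```python
def check_begin_end(pod_text):
    """Return a list of (line_number, message) for =begin/=end nesting errors.

    Sketch only: tracks open regions on a stack and checks that each
    =end names the most recently opened, still-open =begin region.
    """
    errors = []
    open_regions = []   # formatnames of currently open regions, innermost last
    prev_blank = True   # the first line of the text starts a paragraph

    for lineno, line in enumerate(pod_text.splitlines(), start=1):
        at_paragraph_start = prev_blank
        prev_blank = (line.strip() == "")
        # Commands are only recognized at the start of a paragraph.
        if not (at_paragraph_start and line.startswith("=")):
            continue

        words = line.split()
        command = words[0]
        if command == "=begin" and len(words) > 1:
            open_regions.append(words[1])
        elif command == "=end":
            if len(words) < 2:
                errors.append((lineno, "=end has no formatname"))
            elif not open_regions:
                errors.append((lineno, "=end %s with no open region" % words[1]))
            elif open_regions[-1] != words[1]:
                errors.append((lineno, "=end %s does not match open region %s"
                               % (words[1], open_regions[-1])))
            else:
                open_regions.pop()
    return errors


if __name__ == "__main__":
    overlapping = (
        "=begin outer\n\nX\n\n=begin inner\n\nY\n\n"
        "=end outer\n\nZ\n\n=end inner\n"
    )
    for lineno, message in check_begin_end(overlapping):
        print("line %d: %s" % (lineno, message))
```

Run on the "overlapping" example from the text, the sketch flags the "=end outer" line, because the innermost open region at that point is "inner". A real processor might instead halt at the first such error, which the specification also permits.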