author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
commit | fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch) | |
tree | ce1e3bce06471410239a6f41282e328770aa404a /upstream/debian-bookworm/man1/perlpodspec.1 | |
parent | Initial commit. (diff) | |
Adding upstream version 4.22.0. (upstream/4.22.0)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/debian-bookworm/man1/perlpodspec.1')
-rw-r--r-- | upstream/debian-bookworm/man1/perlpodspec.1 | 1912 |
1 files changed, 1912 insertions, 0 deletions
diff --git a/upstream/debian-bookworm/man1/perlpodspec.1 b/upstream/debian-bookworm/man1/perlpodspec.1 new file mode 100644 index 00000000..bdfd6780 --- /dev/null +++ b/upstream/debian-bookworm/man1/perlpodspec.1 @@ -0,0 +1,1912 @@ +.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" Set up some character translations and predefined strings. \*(-- will +.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left +.\" double quote, and \*(R" will give a right double quote. \*(C+ will +.\" give a nicer C++. Capital omega is used to do unbreakable dashes and +.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, +.\" nothing in troff, for use with C<>. +.tr \(*W- +.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' +.ie n \{\ +. ds -- \(*W- +. ds PI pi +. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch +. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch +. ds L" "" +. ds R" "" +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds -- \|\(em\| +. ds PI \(*p +. ds L" `` +. ds R" '' +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. 
if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "PERLPODSPEC 1" +.TH PERLPODSPEC 1 "2023-11-25" "perl v5.36.0" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH "NAME" +perlpodspec \- Plain Old Documentation: format specification and notes +.SH "DESCRIPTION" +.IX Header "DESCRIPTION" +This document is detailed notes on the Pod markup language. Most +people will only have to read perlpod to know how to write +in Pod, but this document may answer some incidental questions to do +with parsing and rendering Pod. +.PP +In this document, \*(L"must\*(R" / \*(L"must not\*(R", \*(L"should\*(R" / +\&\*(L"should not\*(R", and \*(L"may\*(R" have their conventional (cf. \s-1RFC 2119\s0) +meanings: \*(L"X must do Y\*(R" means that if X doesn't do Y, it's against +this specification, and should really be fixed. \*(L"X should do Y\*(R" +means that it's recommended, but X may fail to do Y, if there's a +good reason. \*(L"X may do Y\*(R" is merely a note that X can do Y at +will (although it is up to the reader to detect any connotation of +"and I think it would be \fInice\fR if X did Y\*(L" versus \*(R"it wouldn't +really \fIbother\fR me if X did Y"). +.PP +Notably, when I say \*(L"the parser should do Y\*(R", the +parser may fail to do Y, if the calling application explicitly +requests that the parser \fInot\fR do Y. I often phrase this as +\&\*(L"the parser should, by default, do Y.\*(R" This doesn't \fIrequire\fR +the parser to provide an option for turning off whatever +feature Y is (like expanding tabs in verbatim paragraphs), although +it implicates that such an option \fImay\fR be provided. 
+.SH "Pod Definitions" +.IX Header "Pod Definitions" +Pod is embedded in files, typically Perl source files, although you +can write a file that's nothing but Pod. +.PP +A \fBline\fR in a file consists of zero or more non-newline characters, +terminated by either a newline or the end of the file. +.PP +A \fBnewline sequence\fR is usually a platform-dependent concept, but +Pod parsers should understand it to mean any of \s-1CR\s0 (\s-1ASCII 13\s0), \s-1LF\s0 +(\s-1ASCII 10\s0), or a \s-1CRLF\s0 (\s-1ASCII 13\s0 followed immediately by \s-1ASCII 10\s0), in +addition to any other system-specific meaning. The first \s-1CR/CRLF/LF\s0 +sequence in the file may be used as the basis for identifying the +newline sequence for parsing the rest of the file. +.PP +A \fBblank line\fR is a line consisting entirely of zero or more spaces +(\s-1ASCII 32\s0) or tabs (\s-1ASCII 9\s0), and terminated by a newline or end-of-file. +A \fBnon-blank line\fR is a line containing one or more characters other +than space or tab (and terminated by a newline or end-of-file). +.PP +(\fINote:\fR Many older Pod parsers did not accept a line consisting of +spaces/tabs and then a newline as a blank line. The only lines they +considered blank were lines consisting of \fIno characters at all\fR, +terminated by a newline.) +.PP +\&\fBWhitespace\fR is used in this document as a blanket term for spaces, +tabs, and newline sequences. (By itself, this term usually refers +to literal whitespace. That is, sequences of whitespace characters +in Pod source, as opposed to \*(L"E<32>\*(R", which is a formatting +code that \fIdenotes\fR a whitespace character.) +.PP +A \fBPod parser\fR is a module meant for parsing Pod (regardless of +whether this involves calling callbacks or building a parse tree or +directly formatting it). A \fBPod formatter\fR (or \fBPod translator\fR) +is a module or program that converts Pod to some other format (\s-1HTML,\s0 +plaintext, TeX, PostScript, \s-1RTF\s0). 
A \fBPod processor\fR might be a +formatter or translator, or might be a program that does something +else with the Pod (like counting words, scanning for index points, +etc.). +.PP +Pod content is contained in \fBPod blocks\fR. A Pod block starts with a +line that matches \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR, and continues up to the next line +that matches \f(CW\*(C`m/\eA=cut/\*(C'\fR or up to the end of the file if there is +no \f(CW\*(C`m/\eA=cut/\*(C'\fR line. +.PP +Note that a parser is not expected to recognize when something that +looks like Pod is actually inside a quoted string, such as a here-document. +.PP +Within a Pod block, there are \fBPod paragraphs\fR. A Pod paragraph +consists of non-blank lines of text, separated by one or more blank +lines. +.PP +For purposes of Pod processing, there are four types of paragraphs in +a Pod block: +.IP "\(bu" 4 +A command paragraph (also called a \*(L"directive\*(R"). The first line of +this paragraph must match \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR. Command paragraphs are +typically one line, as in: +.Sp +.Vb 1 +\& =head1 NOTES +\& +\& =item * +.Ve +.Sp +But they may span several (non-blank) lines: +.Sp +.Vb 3 +\& =for comment +\& Hm, I wonder what it would look like if +\& you tried to write a BNF for Pod from this. +\& +\& =head3 Dr. Strangelove, or: How I Learned to +\& Stop Worrying and Love the Bomb +.Ve +.Sp +\&\fISome\fR command paragraphs allow formatting codes in their content +(i.e., after the part that matches \f(CW\*(C`m/\eA=[a\-zA\-Z]\eS*\es*/\*(C'\fR), as in: +.Sp +.Vb 1 +\& =head1 Did You Remember to C<use strict;>? +.Ve +.Sp +In other words, the Pod processing handler for \*(L"head1\*(R" will apply the +same processing to \*(L"Did You Remember to C<use strict;>?\*(R" that it +would to an ordinary paragraph (i.e., formatting codes like +\&\*(L"C<...>\*(R" are parsed and presumably formatted appropriately, and +whitespace in the form of literal spaces and/or tabs is not +significant). 
+.IP "\(bu" 4 +A \fBverbatim paragraph\fR. The first line of this paragraph must be a +literal space or tab, and this paragraph must not be inside a "=begin +\&\fIidentifier\fR\*(L", ... \*(R"=end \fIidentifier\fR\*(L" sequence unless +\&\*(R"\fIidentifier\fR\*(L" begins with a colon (\*(R":"). That is, if a paragraph +starts with a literal space or tab, but \fIis\fR inside a +"=begin \fIidentifier\fR\*(L", ... \*(R"=end \fIidentifier\fR\*(L" region, then it's +a data paragraph, unless \*(R"\fIidentifier\fR" begins with a colon. +.Sp +Whitespace \fIis\fR significant in verbatim paragraphs (although, in +processing, tabs are probably expanded). +.IP "\(bu" 4 +An \fBordinary paragraph\fR. A paragraph is an ordinary paragraph +if its first line matches neither \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR nor +\&\f(CW\*(C`m/\eA[ \et]/\*(C'\fR, \fIand\fR if it's not inside a "=begin \fIidentifier\fR\*(L", +\&... \*(R"=end \fIidentifier\fR\*(L" sequence unless \*(R"\fIidentifier\fR\*(L" begins with +a colon (\*(R":"). +.IP "\(bu" 4 +A \fBdata paragraph\fR. This is a paragraph that \fIis\fR inside a "=begin +\&\fIidentifier\fR\*(L" ... \*(R"=end \fIidentifier\fR\*(L" sequence where +\&\*(R"\fIidentifier\fR" does \fInot\fR begin with a literal colon (\*(L":\*(R"). In +some sense, a data paragraph is not part of Pod at all (i.e., +effectively it's \*(L"out-of-band\*(R"), since it's not subject to most kinds +of Pod parsing; but it is specified here, since Pod +parsers need to be able to call an event for it, or store it in some +form in a parse tree, or at least just parse \fIaround\fR it. +.PP +For example: consider the following paragraphs: +.PP +.Vb 1 +\& # <\- that\*(Aqs the 0th column +\& +\& =head1 Foo +\& +\& Stuff +\& +\& $foo\->bar +\& +\& =cut +.Ve +.PP +Here, \*(L"=head1 Foo\*(R" and \*(L"=cut\*(R" are command paragraphs because the first +line of each matches \f(CW\*(C`m/\eA=[a\-zA\-Z]/\*(C'\fR. 
"\fI[space][space]\fR\f(CW$foo\fR\->bar\*(L" +is a verbatim paragraph, because its first line starts with a literal +whitespace character (and there's no \*(R"=begin\*(L"...\*(R"=end" region around). +.PP +The "=begin \fIidentifier\fR\*(L" ... \*(R"=end \fIidentifier\fR" commands stop +paragraphs that they surround from being parsed as ordinary or verbatim +paragraphs, if \fIidentifier\fR doesn't begin with a colon. This +is discussed in detail in the section +\&\*(L"About Data Paragraphs and \*(R"=begin/=end\*(L" Regions\*(R". +.SH "Pod Commands" +.IX Header "Pod Commands" +This section is intended to supplement and clarify the discussion in +\&\*(L"Command Paragraph\*(R" in perlpod. These are the currently recognized +Pod commands: +.ie n .IP """=head1"", ""=head2"", ""=head3"", ""=head4"", ""=head5"", ""=head6""" 4 +.el .IP "``=head1'', ``=head2'', ``=head3'', ``=head4'', ``=head5'', ``=head6''" 4 +.IX Item "=head1, =head2, =head3, =head4, =head5, =head6" +This command indicates that the text in the remainder of the paragraph +is a heading. That text may contain formatting codes. Examples: +.Sp +.Vb 1 +\& =head1 Object Attributes +\& +\& =head3 What B<Not> to Do! +.Ve +.Sp +Both \f(CW\*(C`=head5\*(C'\fR and \f(CW\*(C`=head6\*(C'\fR were added in 2020 and might not be +supported on all Pod parsers. Pod::Simple 3.41 was released on October +2020 and supports both of these providing support for all +Pod::Simple\-based Pod parsers. +.ie n .IP """=pod""" 4 +.el .IP "``=pod''" 4 +.IX Item "=pod" +This command indicates that this paragraph begins a Pod block. (If we +are already in the middle of a Pod block, this command has no effect at +all.) If there is any text in this command paragraph after \*(L"=pod\*(R", +it must be ignored. Examples: +.Sp +.Vb 1 +\& =pod +\& +\& This is a plain Pod paragraph. +\& +\& =pod This text is ignored. 
+.Ve +.ie n .IP """=cut""" 4 +.el .IP "``=cut''" 4 +.IX Item "=cut" +This command indicates that this line is the end of this previously +started Pod block. If there is any text after \*(L"=cut\*(R" on the line, it must be +ignored. Examples: +.Sp +.Vb 1 +\& =cut +\& +\& =cut The documentation ends here. +\& +\& =cut +\& # This is the first line of program text. +\& sub foo { # This is the second. +.Ve +.Sp +It is an error to try to \fIstart\fR a Pod block with a \*(L"=cut\*(R" command. In +that case, the Pod processor must halt parsing of the input file, and +must by default emit a warning. +.ie n .IP """=over""" 4 +.el .IP "``=over''" 4 +.IX Item "=over" +This command indicates that this is the start of a list/indent +region. If there is any text following the \*(L"=over\*(R", it must consist +of only a nonzero positive numeral. The semantics of this numeral is +explained in the \*(L"About =over...=back Regions\*(R" section, further +below. Formatting codes are not expanded. Examples: +.Sp +.Vb 1 +\& =over 3 +\& +\& =over 3.5 +\& +\& =over +.Ve +.ie n .IP """=item""" 4 +.el .IP "``=item''" 4 +.IX Item "=item" +This command indicates that an item in a list begins here. Formatting +codes are processed. The semantics of the (optional) text in the +remainder of this paragraph are +explained in the \*(L"About =over...=back Regions\*(R" section, further +below. Examples: +.Sp +.Vb 1 +\& =item +\& +\& =item * +\& +\& =item * +\& +\& =item 14 +\& +\& =item 3. +\& +\& =item C<< $thing\->stuff(I<dodad>) >> +\& +\& =item For transporting us beyond seas to be tried for pretended +\& offenses +\& +\& =item He is at this time transporting large armies of foreign +\& mercenaries to complete the works of death, desolation and +\& tyranny, already begun with circumstances of cruelty and perfidy +\& scarcely paralleled in the most barbarous ages, and totally +\& unworthy the head of a civilized nation. 
+.Ve +.ie n .IP """=back""" 4 +.el .IP "``=back''" 4 +.IX Item "=back" +This command indicates that this is the end of the region begun +by the most recent \*(L"=over\*(R" command. It permits no text after the +\&\*(L"=back\*(R" command. +.ie n .IP """=begin formatname""" 4 +.el .IP "``=begin formatname''" 4 +.IX Item "=begin formatname" +.PD 0 +.ie n .IP """=begin formatname parameter""" 4 +.el .IP "``=begin formatname parameter''" 4 +.IX Item "=begin formatname parameter" +.PD +This marks the following paragraphs (until the matching \*(L"=end +formatname\*(R") as being for some special kind of processing. Unless +\&\*(L"formatname\*(R" begins with a colon, the contained non-command +paragraphs are data paragraphs. But if \*(L"formatname\*(R" \fIdoes\fR begin +with a colon, then non-command paragraphs are ordinary paragraphs +or verbatim paragraphs. This is discussed in detail in the section +\&\*(L"About Data Paragraphs and \*(R"=begin/=end\*(L" Regions\*(R". +.Sp +It is advised that formatnames match the regexp +\&\f(CW\*(C`m/\eA:?[\-a\-zA\-Z0\-9_]+\ez/\*(C'\fR. Everything following whitespace after the +formatname is a parameter that may be used by the formatter when dealing +with this region. This parameter must not be repeated in the \*(L"=end\*(R" +paragraph. Implementors should anticipate future expansion in the +semantics and syntax of the first parameter to \*(L"=begin\*(R"/\*(L"=end\*(R"/\*(L"=for\*(R". +.ie n .IP """=end formatname""" 4 +.el .IP "``=end formatname''" 4 +.IX Item "=end formatname" +This marks the end of the region opened by the matching +\&\*(L"=begin formatname\*(R" command. If \*(L"formatname\*(R" is not the formatname +of the most recent open \*(L"=begin formatname\*(R" region, then this +is an error, and must generate an error message. This +is discussed in detail in the section +\&\*(L"About Data Paragraphs and \*(R"=begin/=end\*(L" Regions\*(R". 
+.ie n .IP """=for formatname text...""" 4 +.el .IP "``=for formatname text...''" 4 +.IX Item "=for formatname text..." +This is synonymous with: +.Sp +.Vb 1 +\& =begin formatname +\& +\& text... +\& +\& =end formatname +.Ve +.Sp +That is, it creates a region consisting of a single paragraph; that +paragraph is to be treated as a normal paragraph if \*(L"formatname\*(R" +begins with a \*(L":\*(R"; if \*(L"formatname\*(R" \fIdoesn't\fR begin with a colon, +then \*(L"text...\*(R" will constitute a data paragraph. There is no way +to use \*(L"=for formatname text...\*(R" to express \*(L"text...\*(R" as a verbatim +paragraph. +.ie n .IP """=encoding encodingname""" 4 +.el .IP "``=encoding encodingname''" 4 +.IX Item "=encoding encodingname" +This command, which should occur early in the document (at least +before any non-US-ASCII data!), declares that this document is +encoded in the encoding \fIencodingname\fR, which must be +an encoding name that Encode recognizes. (Encode's list +of supported encodings, in Encode::Supported, is useful here.) +If the Pod parser cannot decode the declared encoding, it +should emit a warning and may abort parsing the document +altogether. +.Sp +A document having more than one \*(L"=encoding\*(R" line should be +considered an error. Pod processors may silently tolerate this if +the not-first \*(L"=encoding\*(R" lines are just duplicates of the +first one (e.g., if there's a \*(L"=encoding utf8\*(R" line, and later on +another \*(L"=encoding utf8\*(R" line). But Pod processors should complain if +there are contradictory \*(L"=encoding\*(R" lines in the same document +(e.g., if there is a \*(L"=encoding utf8\*(R" early in the document and +\&\*(L"=encoding big5\*(R" later). Pod processors that recognize BOMs +may also complain if they see an \*(L"=encoding\*(R" line +that contradicts the \s-1BOM\s0 (e.g., if a document with a \s-1UTF\-16LE +BOM\s0 has an \*(L"=encoding shiftjis\*(R" line). 
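Editorial illustration (not part of the perlpodspec text or of this commit): the command list above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not how Pod::Simple or any real parser is implemented. The names `KNOWN_COMMANDS`, `parse_command`, and `expand_for` are hypothetical, and raising an exception for an unknown command is a simplification, since the specification only requires that the processor treat it as an error and warn by default.

```python
import re

# Commands listed in this section of the specification.
KNOWN_COMMANDS = {
    "head1", "head2", "head3", "head4", "head5", "head6",
    "pod", "cut", "over", "item", "back",
    "begin", "end", "for", "encoding",
}

# The spec's m/\A=[a-zA-Z]\S*\s*/, with the command name captured.
COMMAND_RE = re.compile(r"\A=([a-zA-Z]\S*)\s*")


def parse_command(paragraph):
    """Split a command paragraph into (command, text).

    Returns None if the paragraph is not a command paragraph.
    Raises ValueError for a command outside KNOWN_COMMANDS
    (a simplification of "treat this as an error").
    """
    match = COMMAND_RE.match(paragraph)
    if match is None:
        return None
    command = match.group(1)
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown Pod command: ={command}")
    return command, paragraph[match.end():].rstrip()


def expand_for(paragraph):
    """Rewrite '=for formatname text...' as =begin / text / =end paragraphs.

    Any other paragraph is returned unchanged, as a one-element list.
    """
    parsed = parse_command(paragraph)
    if parsed is None or parsed[0] != "for":
        return [paragraph]
    parts = parsed[1].split(None, 1)  # formatname, then the remaining text
    if not parts:
        raise ValueError("=for requires a formatname")
    formatname = parts[0]
    text = parts[1] if len(parts) > 1 else ""
    return [f"=begin {formatname}", text, f"=end {formatname}"]
```

For example, `expand_for("=for comment Hm, I wonder")` yields the three paragraphs `=begin comment`, `Hm, I wonder`, and `=end comment`, which a formatter can then treat exactly like an explicit "=begin"/"=end" region.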
+.PP +If a Pod processor sees any command other than the ones listed +above (like \*(L"=head\*(R", or \*(L"=haed1\*(R", or \*(L"=stuff\*(R", or \*(L"=cuttlefish\*(R", +or \*(L"=w123\*(R"), that processor must by default treat this as an +error. It must not process the paragraph beginning with that +command, must by default warn of this as an error, and may +abort the parse. A Pod parser may allow a way for particular +applications to add to the above list of known commands, and to +stipulate, for each additional command, whether formatting +codes should be processed. +.PP +Future versions of this specification may add additional +commands. +.SH "Pod Formatting Codes" +.IX Header "Pod Formatting Codes" +(Note that in previous drafts of this document and of perlpod, +formatting codes were referred to as \*(L"interior sequences\*(R", and +this term may still be found in the documentation for Pod parsers, +and in error messages from Pod processors.) +.PP +There are two syntaxes for formatting codes: +.IP "\(bu" 4 +A formatting code starts with a capital letter (just US-ASCII [A\-Z]) +followed by a \*(L"<\*(R", any number of characters, and ending with the first +matching \*(L">\*(R". Examples: +.Sp +.Vb 1 +\& That\*(Aqs what I<you> think! +\& +\& What\*(Aqs C<CORE::dump()> for? +\& +\& X<C<chmod> and C<unlink()> Under Different Operating Systems> +.Ve +.IP "\(bu" 4 +A formatting code starts with a capital letter (just US-ASCII [A\-Z]) +followed by two or more \*(L"<\*(R"'s, one or more whitespace characters, +any number of characters, one or more whitespace characters, +and ending with the first matching sequence of two or more \*(L">\*(R"'s, where +the number of \*(L">\*(R"'s equals the number of \*(L"<\*(R"'s in the opening of this +formatting code. Examples: +.Sp +.Vb 1 +\& That\*(Aqs what I<< you >> think! +\& +\& C<<< open(X, ">>thing.dat") || die $! 
>>> +\& +\& B<< $foo\->bar(); >> +.Ve +.Sp +With this syntax, the whitespace character(s) after the \*(L"C<<<\*(R" +and before the \*(L">>>\*(R" (or whatever letter) are \fInot\fR renderable. They +do not signify whitespace, are merely part of the formatting codes +themselves. That is, these are all synonymous: +.Sp +.Vb 7 +\& C<thing> +\& C<< thing >> +\& C<< thing >> +\& C<<< thing >>> +\& C<<<< +\& thing +\& >>>> +.Ve +.Sp +and so on. +.Sp +Finally, the multiple-angle-bracket form does \fInot\fR alter the interpretation +of nested formatting codes, meaning that the following four example lines are +identical in meaning: +.Sp +.Vb 1 +\& B<example: C<$a E<lt>=E<gt> $b>> +\& +\& B<example: C<< $a <=> $b >>> +\& +\& B<example: C<< $a E<lt>=E<gt> $b >>> +\& +\& B<<< example: C<< $a E<lt>=E<gt> $b >> >>> +.Ve +.PP +In parsing Pod, a notably tricky part is the correct parsing of +(potentially nested!) formatting codes. Implementors should +consult the code in the \f(CW\*(C`parse_text\*(C'\fR routine in Pod::Parser as an +example of a correct implementation. +.ie n .IP """I<text>"" \*(-- italic text" 4 +.el .IP "\f(CWI<text>\fR \*(-- italic text" 4 +.IX Item "I<text> italic text" +See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod. +.ie n .IP """B<text>"" \*(-- bold text" 4 +.el .IP "\f(CWB<text>\fR \*(-- bold text" 4 +.IX Item "B<text> bold text" +See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod. +.ie n .IP """C<code>"" \*(-- code text" 4 +.el .IP "\f(CWC<code>\fR \*(-- code text" 4 +.IX Item "C<code> code text" +See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod. +.ie n .IP """F<filename>"" \*(-- style for filenames" 4 +.el .IP "\f(CWF<filename>\fR \*(-- style for filenames" 4 +.IX Item "F<filename> style for filenames" +See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod. 
+.ie n .IP """X<topic name>"" \*(-- an index entry" 4 +.el .IP "\f(CWX<topic name>\fR \*(-- an index entry" 4 +.IX Item "X<topic name> an index entry" +See the brief discussion in \*(L"Formatting Codes\*(R" in perlpod. +.Sp +This code is unusual in that most formatters completely discard +this code and its content. Other formatters will render it with +invisible codes that can be used in building an index of +the current document. +.ie n .IP """Z<>"" \*(-- a null (zero-effect) formatting code" 4 +.el .IP "\f(CWZ<>\fR \*(-- a null (zero-effect) formatting code" 4 +.IX Item "Z<> a null (zero-effect) formatting code" +Discussed briefly in \*(L"Formatting Codes\*(R" in perlpod. +.Sp +This code is unusual in that it should have no content. That is, +a processor may complain if it sees \f(CW\*(C`Z<potatoes>\*(C'\fR. Whether +or not it complains, the \fIpotatoes\fR text should be ignored. +.ie n .IP """L<name>"" \*(-- a hyperlink" 4 +.el .IP "\f(CWL<name>\fR \*(-- a hyperlink" 4 +.IX Item "L<name> a hyperlink" +The complicated syntaxes of this code are discussed at length in +\&\*(L"Formatting Codes\*(R" in perlpod, and implementation details are +discussed below, in \*(L"About L<...> Codes\*(R". Parsing the +contents of L<content> is tricky. Notably, the content has to be +checked for whether it looks like a \s-1URL,\s0 or whether it has to be split +on literal \*(L"|\*(R" and/or \*(L"/\*(R" (in the right order!), and so on, +\&\fIbefore\fR E<...> codes are resolved. +.ie n .IP """E<escape>"" \*(-- a character escape" 4 +.el .IP "\f(CWE<escape>\fR \*(-- a character escape" 4 +.IX Item "E<escape> a character escape" +See \*(L"Formatting Codes\*(R" in perlpod, and several points in +\&\*(L"Notes on Implementing Pod Processors\*(R". 
+.ie n .IP """S<text>"" \*(-- text contains non-breaking spaces" 4 +.el .IP "\f(CWS<text>\fR \*(-- text contains non-breaking spaces" 4 +.IX Item "S<text> text contains non-breaking spaces" +This formatting code is syntactically simple, but semantically +complex. What it means is that each space in the printable +content of this code signifies a non-breaking space. +.Sp +Consider: +.Sp +.Vb 1 +\& C<$x ? $y : $z> +\& +\& S<C<$x ? $y : $z>> +.Ve +.Sp +Both signify the monospace (c[ode] style) text consisting of +\&\*(L"$x\*(R", one space, \*(L"?\*(R", one space, \*(L":\*(R", one space, \*(L"$z\*(R". The +difference is that in the latter, with the S code, those spaces +are not \*(L"normal\*(R" spaces, but instead are non-breaking spaces. +.PP +If a Pod processor sees any formatting code other than the ones +listed above (as in \*(L"N<...>\*(R", or \*(L"Q<...>\*(R", etc.), that +processor must by default treat this as an error. +A Pod parser may allow a way for particular +applications to add to the above list of known formatting codes; +a Pod parser might even allow a way to stipulate, for each additional +command, whether it requires some form of special processing, as +L<...> does. +.PP +Future versions of this specification may add additional +formatting codes. +.PP +Historical note: A few older Pod processors would not see a \*(L">\*(R" as +closing a \*(L"C<\*(R" code, if the \*(L">\*(R" was immediately preceded by +a \*(L"\-\*(R". This was so that this: +.PP +.Vb 1 +\& C<$foo\->bar> +.Ve +.PP +would parse as equivalent to this: +.PP +.Vb 1 +\& C<$foo\-E<gt>bar> +.Ve +.PP +instead of as equivalent to a \*(L"C\*(R" formatting code containing +only \*(L"$foo\-\*(R", and then a \*(L"bar>\*(R" outside the \*(L"C\*(R" formatting code. This +problem has since been solved by the addition of syntaxes like this: +.PP +.Vb 1 +\& C<< $foo\->bar >> +.Ve +.PP +Compliant parsers must not treat \*(L"\->\*(R" as special. +.PP +Formatting codes absolutely cannot span paragraphs. 
If a code is +opened in one paragraph, and no closing code is found by the end of +that paragraph, the Pod parser must close that formatting code, +and should complain (as in \*(L"Unterminated I code in the paragraph +starting at line 123: 'Time objects are not...'\*(R"). So these +two paragraphs: +.PP +.Vb 1 +\& I<I told you not to do this! +\& +\& Don\*(Aqt make me say it again!> +.Ve +.PP +\&...must \fInot\fR be parsed as two paragraphs in italics (with the I +code starting in one paragraph and ending in another.) Instead, +the first paragraph should generate a warning, but that aside, the +above code must parse as if it were: +.PP +.Vb 1 +\& I<I told you not to do this!> +\& +\& Don\*(Aqt make me say it again!E<gt> +.Ve +.PP +(In SGMLish jargon, all Pod commands are like block-level +elements, whereas all Pod formatting codes are like inline-level +elements.) +.SH "Notes on Implementing Pod Processors" +.IX Header "Notes on Implementing Pod Processors" +The following is a long section of miscellaneous requirements +and suggestions to do with Pod processing. +.IP "\(bu" 4 +Pod formatters should tolerate lines in verbatim blocks that are of +any length, even if that means having to break them (possibly several +times, for very long lines) to avoid text running off the side of the +page. Pod formatters may warn of such line-breaking. Such warnings +are particularly appropriate for lines that are over 100 characters long, which +are usually not intentional. +.IP "\(bu" 4 +Pod parsers must recognize \fIall\fR of the three well-known newline +formats: \s-1CR, LF,\s0 and \s-1CRLF.\s0 See perlport. +.IP "\(bu" 4 +Pod parsers should accept input lines that are of any length. +.IP "\(bu" 4 +Since Perl recognizes a Unicode Byte Order Mark at the start of files +as signaling that the file is Unicode encoded as in \s-1UTF\-16\s0 (whether +big-endian or little-endian) or \s-1UTF\-8,\s0 Pod parsers should do the +same. 
Otherwise, the character encoding should be understood as +being \s-1UTF\-8\s0 if the first highbit byte sequence in the file seems +valid as a \s-1UTF\-8\s0 sequence, or otherwise as \s-1CP\-1252\s0 (earlier versions of +this specification used Latin\-1 instead of \s-1CP\-1252\s0). +.Sp +Future versions of this specification may specify +how Pod can accept other encodings. Presumably treatment of other +encodings in Pod parsing would be as in \s-1XML\s0 parsing: whatever the +encoding declared by a particular Pod file, content is to be +stored in memory as Unicode characters. +.IP "\(bu" 4 +The well-known Unicode Byte Order Marks are as follows: if the +file begins with the two literal byte values 0xFE 0xFF, this is +the \s-1BOM\s0 for big-endian \s-1UTF\-16.\s0 If the file begins with the two +literal byte values 0xFF 0xFE, this is the \s-1BOM\s0 for little-endian +\&\s-1UTF\-16.\s0 On an \s-1ASCII\s0 platform, if the file begins with the three literal +byte values +0xEF 0xBB 0xBF, this is the \s-1BOM\s0 for \s-1UTF\-8. +A\s0 mechanism portable to \s-1EBCDIC\s0 platforms is to: +.Sp +.Vb 2 +\& my $utf8_bom = "\ex{FEFF}"; +\& utf8::encode($utf8_bom); +.Ve +.IP "\(bu" 4 +A naive, but often sufficient heuristic on \s-1ASCII\s0 platforms, for testing +the first highbit +byte-sequence in a BOM-less file (whether in code or in Pod!), to see +whether that sequence is valid as \s-1UTF\-8\s0 (\s-1RFC 2279\s0) is to check whether +the first byte in the sequence is in the range 0xC2 \- 0xFD +\&\fIand\fR whether the next byte is in the range +0x80 \- 0xBF. 
If so, the parser may conclude that this file is in +\&\s-1UTF\-8,\s0 and all highbit sequences in the file should be assumed to +be \s-1UTF\-8.\s0 Otherwise the parser should treat the file as being +in \s-1CP\-1252.\s0 (A better check, and which works on \s-1EBCDIC\s0 platforms as +well, is to pass a copy of the sequence to +\&\fButf8::decode()\fR which performs a full validity check on the +sequence and returns \s-1TRUE\s0 if it is valid \s-1UTF\-8, FALSE\s0 otherwise. This +function is always pre-loaded, is fast because it is written in C, and +will only get called at most once, so you don't need to avoid it out of +performance concerns.) +In the unlikely circumstance that the first highbit +sequence in a truly non\-UTF\-8 file happens to appear to be \s-1UTF\-8,\s0 one +can cater to our heuristic (as well as any more intelligent heuristic) +by prefacing that line with a comment line containing a highbit +sequence that is clearly \fInot\fR valid as \s-1UTF\-8.\s0 A line consisting +of simply \*(L"#\*(R", an e\-acute, and any non-highbit byte, +is sufficient to establish this file's encoding. +.IP "\(bu" 4 +Pod processors must treat a \*(L"=for [label] [content...]\*(R" paragraph as +meaning the same thing as a \*(L"=begin [label]\*(R" paragraph, content, and +an \*(L"=end [label]\*(R" paragraph. (The parser may conflate these two +constructs, or may leave them distinct, in the expectation that the +formatter will nevertheless treat them the same.) +.IP "\(bu" 4 +When rendering Pod to a format that allows comments (i.e., to nearly +any format other than plaintext), a Pod formatter must insert comment +text identifying its name and version number, and the name and +version numbers of any modules it might be using to process the Pod. 
+Minimal examples: +.Sp +.Vb 1 +\& %% POD::Pod2PS v3.14159, using POD::Parser v1.92 +\& +\& <!\-\- Pod::HTML v3.14159, using POD::Parser v1.92 \-\-> +\& +\& {\edoccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08} +\& +\& .\e" Pod::Man version 3.14159, using POD::Parser version 1.92 +.Ve +.Sp +Formatters may also insert additional comments, including: the +release date of the Pod formatter program, the contact address for +the author(s) of the formatter, the current time, the name of input +file, the formatting options in effect, version of Perl used, etc. +.Sp +Formatters may also choose to note errors/warnings as comments, +besides or instead of emitting them otherwise (as in messages to +\&\s-1STDERR,\s0 or \f(CW\*(C`die\*(C'\fRing). +.IP "\(bu" 4 +Pod parsers \fImay\fR emit warnings or error messages (\*(L"Unknown E code +E<zslig>!\*(R") to \s-1STDERR\s0 (whether through printing to \s-1STDERR,\s0 or +\&\f(CW\*(C`warn\*(C'\fRing/\f(CW\*(C`carp\*(C'\fRing, or \f(CW\*(C`die\*(C'\fRing/\f(CW\*(C`croak\*(C'\fRing), but \fImust\fR allow +suppressing all such \s-1STDERR\s0 output, and instead allow an option for +reporting errors/warnings +in some other way, whether by triggering a callback, or noting errors +in some attribute of the document object, or some similarly unobtrusive +mechanism \*(-- or even by appending a \*(L"Pod Errors\*(R" section to the end of +the parsed form of the document. +.IP "\(bu" 4 +In cases of exceptionally aberrant documents, Pod parsers may abort the +parse. Even then, using \f(CW\*(C`die\*(C'\fRing/\f(CW\*(C`croak\*(C'\fRing is to be avoided; where +possible, the parser library may simply close the input file +and add text like \*(L"*** Formatting Aborted ***\*(R" to the end of the +(partial) in-memory document. 
+.IP "\(bu" 4 +In paragraphs where formatting codes (like E<...>, B<...>) +are understood (i.e., \fInot\fR verbatim paragraphs, but \fIincluding\fR +ordinary paragraphs, and command paragraphs that produce renderable +text, like \*(L"=head1\*(R"), literal whitespace should generally be considered +\&\*(L"insignificant\*(R", in that one literal space has the same meaning as any +(nonzero) number of literal spaces, literal newlines, and literal tabs +(as long as this produces no blank lines, since those would terminate +the paragraph). Pod parsers should compact literal whitespace in each +processed paragraph, but may provide an option for overriding this +(since some processing tasks do not require it), or may follow +additional special rules (for example, specially treating +period-space-space or period-newline sequences). +.IP "\(bu" 4 +Pod parsers should not, by default, try to coerce apostrophe (') and +quote (\*(L") into smart quotes (little 9's, 66's, 99's, etc), nor try to +turn backtick (`) into anything else but a single backtick character +(distinct from an open quote character!), nor \*(R"\-\-" into anything but +two minus signs. They \fImust never\fR do any of those things to text +in C<...> formatting codes, and never \fIever\fR to text in verbatim +paragraphs. +.IP "\(bu" 4 +When rendering Pod to a format that has two kinds of hyphens (\-), one +that's a non-breaking hyphen, and another that's a breakable hyphen +(as in \*(L"object-oriented\*(R", which can be split across lines as +\&\*(L"object\-\*(R", newline, \*(L"oriented\*(R"), formatters are encouraged to +generally translate \*(L"\-\*(R" to non-breaking hyphen, but may apply +heuristics to convert some of these to breaking hyphens. +.IP "\(bu" 4 +Pod formatters should make reasonable efforts to keep words of Perl +code from being broken across lines. 
For example, \*(L"Foo::Bar\*(R" in some +formatting systems is seen as eligible for being broken across lines +as \*(L"Foo::\*(R" newline \*(L"Bar\*(R" or even \*(L"Foo::\-\*(R" newline \*(L"Bar\*(R". This should +be avoided where possible, either by disabling all line-breaking in +mid-word, or by wrapping particular words with internal punctuation +in \*(L"don't break this across lines\*(R" codes (which in some formats may +not be a single code, but might be a matter of inserting non-breaking +zero-width spaces between every pair of characters in a word.) +.IP "\(bu" 4 +Pod parsers should, by default, expand tabs in verbatim paragraphs as +they are processed, before passing them to the formatter or other +processor. Parsers may also allow an option for overriding this. +.IP "\(bu" 4 +Pod parsers should, by default, remove newlines from the end of +ordinary and verbatim paragraphs before passing them to the +formatter. For example, while the paragraph you're reading now +could be considered, in Pod source, to end with (and contain) +the newline(s) that end it, it should be processed as ending with +(and containing) the period character that ends this sentence. +.IP "\(bu" 4 +Pod parsers, when reporting errors, should make some effort to report +an approximate line number (\*(L"Nested E<>'s in Paragraph #52, near +line 633 of Thing/Foo.pm!\*(R"), instead of merely noting the paragraph +number (\*(L"Nested E<>'s in Paragraph #52 of Thing/Foo.pm!\*(R"). Where +this is problematic, the paragraph number should at least be +accompanied by an excerpt from the paragraph (\*(L"Nested E<>'s in +Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for +the C<interest rate> attribute...'\*(R"). +.IP "\(bu" 4 +Pod parsers, when processing a series of verbatim paragraphs one +after another, should consider them to be one large verbatim +paragraph that happens to contain blank lines. 
I.e., these two +lines, which have a blank line between them: +.Sp +.Vb 1 +\& use Foo; +\& +\& print Foo\->VERSION +.Ve +.Sp +should be unified into one paragraph (\*(L"\etuse Foo;\en\en\etprint +Foo\->\s-1VERSION\*(R"\s0) before being passed to the formatter or other +processor. Parsers may also allow an option for overriding this. +.Sp +While this might be too cumbersome to implement in event-based Pod +parsers, it is straightforward for parsers that return parse trees. +.IP "\(bu" 4 +Pod formatters, where feasible, are advised to avoid splitting short +verbatim paragraphs (under twelve lines, say) across pages. +.IP "\(bu" 4 +Pod parsers must treat a line with only spaces and/or tabs on it as a +\&\*(L"blank line\*(R" such as separates paragraphs. (Some older parsers +recognized only two adjacent newlines as a \*(L"blank line\*(R" but would not +recognize a newline, a space, and a newline, as a blank line. This +is noncompliant behavior.) +.IP "\(bu" 4 +Authors of Pod formatters/processors should make every effort to +avoid writing their own Pod parser. There are already several in +\&\s-1CPAN,\s0 with a wide range of interface styles \*(-- and one of them, +Pod::Simple, comes with modern versions of Perl. +.IP "\(bu" 4 +Characters in Pod documents may be conveyed either as literals, or by +number in E<n> codes, or by an equivalent mnemonic, as in +E<eacute> which is exactly equivalent to E<233>. The numbers +are the Latin1/Unicode values, even on \s-1EBCDIC\s0 platforms. +.Sp +When referring to characters by using a E<n> numeric code, numbers +in the range 32\-126 refer to those well known US-ASCII characters (also +defined there by Unicode, with the same meaning), which all Pod +formatters must render faithfully. 
Characters whose E<> numbers +are in the ranges 0\-31 and 127\-159 should not be used (neither as +literals, +nor as E<number> codes), except for the literal byte-sequences for +newline (\s-1ASCII 13, ASCII 13 10,\s0 or \s-1ASCII 10\s0), and tab (\s-1ASCII 9\s0). +.Sp +Numbers in the range 160\-255 refer to Latin\-1 characters (also +defined there by Unicode, with the same meaning). Numbers above +255 should be understood to refer to Unicode characters. +.IP "\(bu" 4 +Be warned +that some formatters cannot reliably render characters outside 32\-126; +and many are able to handle 32\-126 and 160\-255, but nothing above +255. +.IP "\(bu" 4 +Besides the well-known \*(L"E<lt>\*(R" and \*(L"E<gt>\*(R" codes for +less-than and greater-than, Pod parsers must understand \*(L"E<sol>\*(R" +for \*(L"/\*(R" (solidus, slash), and \*(L"E<verbar>\*(R" for \*(L"|\*(R" (vertical bar, +pipe). Pod parsers should also understand \*(L"E<lchevron>\*(R" and +\&\*(L"E<rchevron>\*(R" as legacy codes for characters 171 and 187, i.e., +\&\*(L"left-pointing double angle quotation mark\*(R" = \*(L"left pointing +guillemet\*(R" and \*(L"right-pointing double angle quotation mark\*(R" = \*(L"right +pointing guillemet\*(R". (These look like little \*(L"<<\*(R" and \*(L">>\*(R", and they +are now preferably expressed with the \s-1HTML/XHTML\s0 codes \*(L"E<laquo>\*(R" +and \*(L"E<raquo>\*(R".) +.IP "\(bu" 4 +Pod parsers should understand all \*(L"E<html>\*(R" codes as defined +in the entity declarations in the most recent \s-1XHTML\s0 specification at +\&\f(CW\*(C`www.W3.org\*(C'\fR. Pod parsers must understand at least the entities +that define characters in the range 160\-255 (Latin\-1). Pod parsers, +when faced with some unknown "E<\fIidentifier\fR>" code, +shouldn't simply replace it with nullstring (by default, at least), +but may pass it through as a string consisting of the literal characters +E, less-than, \fIidentifier\fR, greater-than. 
Or Pod parsers may offer the +alternative option of processing such unknown +"E<\fIidentifier\fR>\*(L" codes by firing an event especially +for such codes, or by adding a special node-type to the in-memory +document tree. Such \*(R"E<\fIidentifier\fR>" may have special meaning +to some processors, or some processors may choose to add them to +a special error report. +.IP "\(bu" 4 +Pod parsers must also support the \s-1XHTML\s0 codes \*(L"E<quot>\*(R" for +character 34 (doublequote, \*(L"), \*(R"E<amp>\*(L" for character 38 +(ampersand, &), and \*(R"E<apos>" for character 39 (apostrophe, '). +.IP "\(bu" 4 +Note that in all cases of \*(L"E<whatever>\*(R", \fIwhatever\fR (whether +an htmlname, or a number in any base) must consist only of +alphanumeric characters \*(-- that is, \fIwhatever\fR must match +\&\f(CW\*(C`m/\eA\ew+\ez/\*(C'\fR. So \*(L"E< 0 1 2 3 >\*(R" is invalid, because +it contains spaces, which aren't alphanumeric characters. This +presumably does not \fIneed\fR special treatment by a Pod processor; +\&\*(L" 0 1 2 3 \*(R" doesn't look like a number in any base, so it would +presumably be looked up in the table of HTML-like names. Since +there isn't (and cannot be) an HTML-like entity called \*(L" 0 1 2 3 \*(R", +this will be treated as an error. However, Pod processors may +treat \*(L"E< 0 1 2 3 >\*(R" or \*(L"E<e\-acute>\*(R" as \fIsyntactically\fR +invalid, potentially earning a different error message than the +error message (or warning, or event) generated by a merely unknown +(but theoretically valid) htmlname, as in \*(L"E<qacute>\*(R" +[sic]. However, Pod parsers are not required to make this +distinction. +.IP "\(bu" 4 +Note that E<number> \fImust not\fR be interpreted as simply +"codepoint \fInumber\fR in the current/native character set\*(L". It always +means only \*(R"the character represented by codepoint \fInumber\fR in +Unicode." 
(This is identical to the semantics of &#\fInumber\fR; in \s-1XML.\s0) +.Sp +This will likely require many formatters to have tables mapping from +treatable Unicode codepoints (such as the \*(L"\exE9\*(R" for the e\-acute +character) to the escape sequences or codes necessary for conveying +such sequences in the target output format. A converter to *roff +would, for example know that \*(L"\exE9\*(R" (whether conveyed literally, or via +a E<...> sequence) is to be conveyed as \*(L"e\e\e*'\*(R". +Similarly, a program rendering Pod in a Mac \s-1OS\s0 application window, would +presumably need to know that \*(L"\exE9\*(R" maps to codepoint 142 in MacRoman +encoding that (at time of writing) is native for Mac \s-1OS.\s0 Such +Unicode2whatever mappings are presumably already widely available for +common output formats. (Such mappings may be incomplete! Implementers +are not expected to bend over backwards in an attempt to render +Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any +of the other weird things that Unicode can encode.) And +if a Pod document uses a character not found in such a mapping, the +formatter should consider it an unrenderable character. +.IP "\(bu" 4 +If, surprisingly, the implementor of a Pod formatter can't find a +satisfactory pre-existing table mapping from Unicode characters to +escapes in the target format (e.g., a decent table of Unicode +characters to *roff escapes), it will be necessary to build such a +table. If you are in this circumstance, you should begin with the +characters in the range 0x00A0 \- 0x00FF, which is mostly the heavily +used accented characters. Then proceed (as patience permits and +fastidiousness compels) through the characters that the (X)HTML +standards groups judged important enough to merit mnemonics +for. These are declared in the (X)HTML specifications at the +www.W3.org site. 
At time of writing (September 2001), the most recent +entity declaration files are: +.Sp +.Vb 3 +\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-lat1.ent +\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-special.ent +\& http://www.w3.org/TR/xhtml1/DTD/xhtml\-symbol.ent +.Ve +.Sp +Then you can progress through any remaining notable Unicode characters +in the range 0x2000\-0x204D (consult the character tables at +www.unicode.org), and whatever else strikes your fancy. For example, +in \fIxhtml\-symbol.ent\fR, there is the entry: +.Sp +.Vb 1 +\& <!ENTITY infin "&#8734;"> <!\-\- infinity, U+221E ISOtech \-\-> +.Ve +.Sp +While the mapping \*(L"infin\*(R" to the character \*(L"\ex{221E}\*(R" will (hopefully) +have been already handled by the Pod parser, the presence of the +character in this file means that it's important enough to +include in a formatter's table that maps from notable Unicode characters +to the codes necessary for rendering them. So for a Unicode\-to\-*roff +mapping, for example, this would merit the entry: +.Sp +.Vb 1 +\& "\ex{221E}" => \*(Aq\e(in\*(Aq, +.Ve +.Sp +It is eagerly hoped that in the future, increasing numbers of formats +(and formatters) will support Unicode characters directly (as (X)HTML +does with \f(CW\*(C`&infin;\*(C'\fR, \f(CW\*(C`&#8734;\*(C'\fR, or \f(CW\*(C`&#x221E;\*(C'\fR), reducing the need +for idiosyncratic mappings of Unicode\-to\-\fImy_escapes\fR. +.IP "\(bu" 4 +It is up to individual Pod formatters to display good judgement when +confronted with an unrenderable character (which is distinct from an +unknown E<thing> sequence that the parser couldn't resolve to +anything, renderable or not). It is good practice to map Latin letters +with diacritics (like \*(L"E<eacute>\*(R"/\*(L"E<233>\*(R") to the corresponding +unaccented US-ASCII letters (like a simple character 101, \*(L"e\*(R"), but +clearly this is often not feasible, and an unrenderable character may +be represented as \*(L"?\*(R", or the like. 
In attempting a sane fallback +(as from E<233> to \*(L"e\*(R"), Pod formatters may use the +\&\f(CW%Latin1Code_to_fallback\fR table in Pod::Escapes, or +Text::Unidecode, if available. +.Sp +For example, this Pod text: +.Sp +.Vb 1 +\& magic is enabled if you set C<$Currency> to \*(AqE<euro>\*(Aq. +.Ve +.Sp +may be rendered as: +"magic is enabled if you set \f(CW$Currency\fR to '\fI?\fR'\*(L" or as +\&\*(R"magic is enabled if you set \f(CW$Currency\fR to '\fB[euro]\fR'\*(L", or as +\&\*(R"magic is enabled if you set \f(CW$Currency\fR to '[x20AC]', etc. +.Sp +A Pod formatter may also note, in a comment or warning, a list of what +unrenderable characters were encountered. +.IP "\(bu" 4 +E<...> may freely appear in any formatting code (other than +in another E<...> or in an Z<>). That is, \*(L"X<The +E<euro>1,000,000 Solution>\*(R" is valid, as is \*(L"L<The +E<euro>1,000,000 Solution|Million::Euros>\*(R". +.IP "\(bu" 4 +Some Pod formatters output to formats that implement non-breaking +spaces as an individual character (which I'll call \*(L"\s-1NBSP\*(R"\s0), and +others output to formats that implement non-breaking spaces just as +spaces wrapped in a \*(L"don't break this across lines\*(R" code. Note that +at the level of Pod, both sorts of codes can occur: Pod can contain a +\&\s-1NBSP\s0 character (whether as a literal, or as a \*(L"E<160>\*(R" or +\&\*(L"E<nbsp>\*(R" code); and Pod can contain \*(L"S<foo +I<bar> baz>\*(R" codes, where \*(L"mere spaces\*(R" (character 32) in +such codes are taken to represent non-breaking spaces. Pod +parsers should consider supporting the optional parsing of \*(L"S<foo +I<bar> baz>\*(R" as if it were +"foo\fI\s-1NBSP\s0\fRI<bar>\fI\s-1NBSP\s0\fRbaz", and, going the other way, the +optional parsing of groups of words joined by \s-1NBSP\s0's as if each group +were in a S<...> code, so that formatters may use the +representation that maps best to what the output format demands. 
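The two representations of non-breaking spaces described above can be converted into one another mechanically. The following Python sketch shows both directions on plain strings. It is only an illustration, under the assumption that the text handed to it is already plain renderable text with no nested formatting codes; the function names `s_code_to_nbsp` and `nbsp_to_s_code` are hypothetical.

```python
import re

NBSP = "\u00a0"  # character 160, i.e. E<160> or E<nbsp>


def s_code_to_nbsp(content: str) -> str:
    """Treat the content of an S<...> code as words joined by NBSPs."""
    return content.replace(" ", NBSP)


def nbsp_to_s_code(text: str) -> str:
    """Wrap each group of words joined by NBSPs in an S<...> code."""
    def wrap(match):
        return "S<" + match.group(0).replace(NBSP, " ") + ">"

    # Python's \S does not match NBSP (it counts as whitespace), so a group
    # is a run of non-space pieces joined by explicit NBSP characters.
    return re.sub(rf"\S+(?:{NBSP}\S+)+", wrap, text)
```

A formatter whose output format has a real NBSP character would use the first function; one that only has a "don't break this across lines" code would use the second.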
+.IP "\(bu" 4 +Some processors may find that the \f(CW\*(C`S<...>\*(C'\fR code is easiest to +implement by replacing each space in the parse tree under the content +of the S, with an \s-1NBSP.\s0 But note: the replacement should apply \fInot\fR to +spaces in \fIall\fR text, but \fIonly\fR to spaces in \fIprintable\fR text. (This +distinction may or may not be evident in the particular tree/event +model implemented by the Pod parser.) For example, consider this +unusual case: +.Sp +.Vb 1 +\& S<L</Autoloaded Functions>> +.Ve +.Sp +This means that the space in the middle of the visible link text must +not be broken across lines. In other words, it's the same as this: +.Sp +.Vb 1 +\& L<"AutoloadedE<160>Functions"/Autoloaded Functions> +.Ve +.Sp +However, a misapplied space-to-NBSP replacement could (wrongly) +produce something equivalent to this: +.Sp +.Vb 1 +\& L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions> +.Ve +.Sp +\&...which is almost definitely not going to work as a hyperlink (assuming +this formatter outputs a format supporting hypertext). +.Sp +Formatters may choose to just not support the S format code, +especially in cases where the output format simply has no \s-1NBSP\s0 +character/code and no code for \*(L"don't break this stuff across lines\*(R". +.IP "\(bu" 4 +Besides the \s-1NBSP\s0 character discussed above, implementors are reminded +of the existence of the other \*(L"special\*(R" character in Latin\-1, the +\&\*(L"soft hyphen\*(R" character, also known as \*(L"discretionary hyphen\*(R", +i.e. \f(CW\*(C`E<173>\*(C'\fR = \f(CW\*(C`E<0xAD>\*(C'\fR = +\&\f(CW\*(C`E<shy>\*(C'\fR). This character expresses an optional hyphenation +point. That is, it normally renders as nothing, but may render as a +\&\*(L"\-\*(R" if a formatter breaks the word at that point. 
Pod formatters +should, as appropriate, do one of the following: 1) render this with +a code with the same meaning (e.g., \*(L"\e\-\*(R" in \s-1RTF\s0), 2) pass it through +in the expectation that the formatter understands this character as +such, or 3) delete it. +.Sp +For example: +.Sp +.Vb 3 +\& sigE<shy>action +\& manuE<shy>script +\& JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi +.Ve +.Sp +These signal to a formatter that if it is to hyphenate \*(L"sigaction\*(R" +or \*(L"manuscript\*(R", then it should be done as +"sig\-\fI[linebreak]\fRaction\*(L" or \*(R"manu\-\fI[linebreak]\fRscript" +(and if it doesn't hyphenate it, then the \f(CW\*(C`E<shy>\*(C'\fR doesn't +show up at all). And if it is +to hyphenate \*(L"Jarkko\*(R" and/or \*(L"Hietaniemi\*(R", it can do +so only at the points where there is a \f(CW\*(C`E<shy>\*(C'\fR code. +.Sp +In practice, it is anticipated that this character will not be used +often, but formatters should either support it, or delete it. +.IP "\(bu" 4 +If you think that you want to add a new command to Pod (like, say, a +\&\*(L"=biblio\*(R" command), consider whether you could get the same +effect with a for or begin/end sequence: \*(L"=for biblio ...\*(R" or \*(L"=begin +biblio\*(R" ... \*(L"=end biblio\*(R". Pod processors that don't understand +\&\*(L"=for biblio\*(R", etc, will simply ignore it, whereas they may complain +loudly if they see \*(L"=biblio\*(R". +.IP "\(bu" 4 +Throughout this document, \*(L"Pod\*(R" has been the preferred spelling for +the name of the documentation format. One may also use \*(L"\s-1POD\*(R"\s0 or +\&\*(L"pod\*(R". For the documentation that is (typically) in the Pod +format, you may use \*(L"pod\*(R", or \*(L"Pod\*(R", or \*(L"\s-1POD\*(R".\s0 Understanding these +distinctions is useful; but obsessing over how to spell them, usually +is not. 
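The rules for E<...> codes given in the notes above (the content must be alphanumeric, numbers always denote Unicode codepoints, and unknown names are passed through rather than replaced with nullstring) can be sketched in Python as follows. This is a sketch under stated assumptions: Python's `html.entities.name2codepoint` table stands in for the XHTML entity declaration files, the `EXTRA_NAMES` table and the function name `resolve_e_code` are made up for illustration, and the octal form with a leading "0" is assumed from perlpod rather than stated in this document.

```python
import re
from html.entities import name2codepoint

# Names the text requires that the HTML 4 entity table does not contain.
EXTRA_NAMES = {"apos": 39, "sol": 47, "verbar": 124,
               "lchevron": 171, "rchevron": 187}


def resolve_e_code(content: str) -> str:
    """Return the character for E<content>, or the literal code if unknown."""
    if not re.fullmatch(r"\w+", content):
        # A real parser would report this quietly instead of raising.
        raise ValueError(f"syntactically invalid E<{content}>")
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", content):
        return chr(int(content, 16))      # hex form, as in E<0xAD>
    if re.fullmatch(r"0[0-7]+", content):
        return chr(int(content, 8))       # assumed octal form
    if re.fullmatch(r"[0-9]+", content):
        return chr(int(content))          # decimal, always a Unicode codepoint
    codepoint = EXTRA_NAMES.get(content, name2codepoint.get(content))
    if codepoint is None:
        return f"E<{content}>"            # pass unknown names through
    return chr(codepoint)
```

Because numbers are mapped with `chr`, the result is a Unicode character regardless of the platform's native character set, which is the behaviour the notes require.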
+.SH "About L<...> Codes" +.IX Header "About L<...> Codes" +As you can tell from a glance at perlpod, the L<...> +code is the most complex of the Pod formatting codes. The points below +will hopefully clarify what it means and how processors should deal +with it. +.IP "\(bu" 4 +In parsing an L<...> code, Pod parsers must distinguish at least +four attributes: +.RS 4 +.IP "First:" 4 +.IX Item "First:" +The link-text. If there is none, this must be \f(CW\*(C`undef\*(C'\fR. (E.g., in +\&\*(L"L<Perl Functions|perlfunc>\*(R", the link-text is \*(L"Perl Functions\*(R". +In \*(L"L<Time::HiRes>\*(R" and even \*(L"L<|Time::HiRes>\*(R", there is no +link text. Note that link text may contain formatting.) +.IP "Second:" 4 +.IX Item "Second:" +The possibly inferred link-text; i.e., if there was no real link +text, then this is the text that we'll infer in its place. (E.g., for +\&\*(L"L<Getopt::Std>\*(R", the inferred link text is \*(L"Getopt::Std\*(R".) +.IP "Third:" 4 +.IX Item "Third:" +The name or \s-1URL,\s0 or \f(CW\*(C`undef\*(C'\fR if none. (E.g., in \*(L"L<Perl +Functions|perlfunc>\*(R", the name (also sometimes called the page) +is \*(L"perlfunc\*(R". In \*(L"L</CAVEATS>\*(R", the name is \f(CW\*(C`undef\*(C'\fR.) +.IP "Fourth:" 4 +.IX Item "Fourth:" +The section (\s-1AKA\s0 \*(L"item\*(R" in older perlpods), or \f(CW\*(C`undef\*(C'\fR if none. E.g., +in \*(L"L<Getopt::Std/DESCRIPTION>\*(R", \*(L"\s-1DESCRIPTION\*(R"\s0 is the section. (Note +that this is not the same as a manpage section like the \*(L"5\*(R" in \*(L"man 5 +crontab\*(R". \*(L"Section Foo\*(R" in the Pod sense means the part of the text +that's introduced by the heading or item whose text is \*(L"Foo\*(R".) 
+.RE +.RS 4 +.Sp +Pod parsers may also note additional attributes including: +.IP "Fifth:" 4 +.IX Item "Fifth:" +A flag for whether item 3 (if present) is a \s-1URL\s0 (like +\&\*(L"http://lists.perl.org\*(R" is), in which case there should be no section +attribute; a Pod name (like \*(L"perldoc\*(R" and \*(L"Getopt::Std\*(R" are); or +possibly a man page name (like \*(L"\fBcrontab\fR\|(5)\*(R" is). +.IP "Sixth:" 4 +.IX Item "Sixth:" +The raw original L<...> content, before text is split on +\&\*(L"|\*(R", \*(L"/\*(R", etc, and before E<...> codes are expanded. +.RE +.RS 4 +.Sp +(The above were numbered only for concise reference below. It is not +a requirement that these be passed as an actual list or array.) +.Sp +For example: +.Sp +.Vb 7 +\& L<Foo::Bar> +\& => undef, # link text +\& "Foo::Bar", # possibly inferred link text +\& "Foo::Bar", # name +\& undef, # section +\& \*(Aqpod\*(Aq, # what sort of link +\& "Foo::Bar" # original content +\& +\& L<Perlport\*(Aqs section on NL\*(Aqs|perlport/Newlines> +\& => "Perlport\*(Aqs section on NL\*(Aqs", # link text +\& "Perlport\*(Aqs section on NL\*(Aqs", # possibly inferred link text +\& "perlport", # name +\& "Newlines", # section +\& \*(Aqpod\*(Aq, # what sort of link +\& "Perlport\*(Aqs section on NL\*(Aqs|perlport/Newlines" +\& # original content +\& +\& L<perlport/Newlines> +\& => undef, # link text +\& \*(Aq"Newlines" in perlport\*(Aq, # possibly inferred link text +\& "perlport", # name +\& "Newlines", # section +\& \*(Aqpod\*(Aq, # what sort of link +\& "perlport/Newlines" # original content +\& +\& L<crontab(5)/"DESCRIPTION"> +\& => undef, # link text +\& \*(Aq"DESCRIPTION" in crontab(5)\*(Aq, # possibly inferred link text +\& "crontab(5)", # name +\& "DESCRIPTION", # section +\& \*(Aqman\*(Aq, # what sort of link +\& \*(Aqcrontab(5)/"DESCRIPTION"\*(Aq # original content +\& +\& L</Object Attributes> +\& => undef, # link text +\& \*(Aq"Object Attributes"\*(Aq, # possibly inferred link text +\& undef, # name 
+\& "Object Attributes", # section +\& \*(Aqpod\*(Aq, # what sort of link +\& "/Object Attributes" # original content +\& +\& L<https://www.perl.org/> +\& => undef, # link text +\& "https://www.perl.org/", # possibly inferred link text +\& "https://www.perl.org/", # name +\& undef, # section +\& \*(Aqurl\*(Aq, # what sort of link +\& "https://www.perl.org/" # original content +\& +\& L<Perl.org|https://www.perl.org/> +\& => "Perl.org", # link text +\& "https://www.perl.org/", # possibly inferred link text +\& "https://www.perl.org/", # name +\& undef, # section +\& \*(Aqurl\*(Aq, # what sort of link +\& "Perl.org|https://www.perl.org/" # original content +.Ve +.Sp +Note that you can distinguish URL-links from anything else by the +fact that they match \f(CW\*(C`m/\eA\ew+:[^:\es]\eS*\ez/\*(C'\fR. So +\&\f(CW\*(C`L<http://www.perl.com>\*(C'\fR is a \s-1URL,\s0 but +\&\f(CW\*(C`L<HTTP::Response>\*(C'\fR isn't. +.RE +.IP "\(bu" 4 +In case of L<...> codes with no \*(L"text|\*(R" part in them, +older formatters have exhibited great variation in actually displaying +the link or cross reference. For example, L<\fBcrontab\fR\|(5)> would render +as "the \f(CWcrontab(5)\fR manpage\*(L", or \*(R"in the \f(CWcrontab(5)\fR manpage\*(L" +or just \*(R"\f(CWcrontab(5)\fR". +.Sp +Pod processors must now treat \*(L"text|\*(R"\-less links as follows: +.Sp +.Vb 3 +\& L<name> => L<name|name> +\& L</section> => L<"section"|/section> +\& L<name/section> => L<"section" in name|name/section> +.Ve +.IP "\(bu" 4 +Note that section names might contain markup. 
I.e., if a section +starts with: +.Sp +.Vb 1 +\& =head2 About the C<\-M> Operator +.Ve +.Sp +or with: +.Sp +.Vb 1 +\& =item About the C<\-M> Operator +.Ve +.Sp +then a link to it would look like this: +.Sp +.Vb 1 +\& L<somedoc/About the C<\-M> Operator> +.Ve +.Sp +Formatters may choose to ignore the markup for purposes of resolving +the link and use only the renderable characters in the section name, +as in: +.Sp +.Vb 2 +\& <h1><a name="About_the_\-M_Operator">About the <code>\-M</code> +\& Operator</a></h1> +\& +\& ... +\& +\& <a href="somedoc#About_the_\-M_Operator">"About the <code>\-M</code> +\& Operator" in somedoc</a> +.Ve +.IP "\(bu" 4 +Previous versions of perlpod distinguished \f(CW\*(C`L<name/"section">\*(C'\fR +links from \f(CW\*(C`L<name/item>\*(C'\fR links (and their targets). These +have been merged syntactically and semantically in the current +specification, and \fIsection\fR can refer either to a "=head\fIn\fR Heading +Content\*(L" command or to a \*(R"=item Item Content" command. This +specification does not specify what the behavior should be in the case +of a given document having several things all seeming to produce the +same \fIsection\fR identifier (e.g., in \s-1HTML,\s0 several things all producing +the same \fIanchorname\fR in <a name="\fIanchorname\fR">...</a> +elements). Where Pod processors can control this behavior, they should +use the first such anchor. That is, \f(CW\*(C`L<Foo/Bar>\*(C'\fR refers to the +\&\fIfirst\fR \*(L"Bar\*(R" section in Foo. +.Sp +But for some processors/formats this cannot be easily controlled; as +with the \s-1HTML\s0 example, the behavior of multiple ambiguous +<a name="\fIanchorname\fR">...</a> is most easily just left up to +browsers to decide. 
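The two points above (resolving a link on the renderable characters of a section name only, and preferring the first of several sections that produce the same anchor) can be sketched in Python as follows. This is a minimal sketch, not any formatter's actual algorithm: the function names are hypothetical, the underscore-for-whitespace convention is simply copied from the HTML example above, and only simple, non-nested formatting codes are handled.

```python
import re
from typing import Dict, List


def strip_codes(text: str) -> str:
    """Drop simple, non-nested codes such as C<-M>, keeping their content."""
    return re.sub(r"[A-Z]<([^<>]*)>", r"\1", text)


def anchor_name(section: str) -> str:
    """Build an anchor from the renderable characters of a section title."""
    return re.sub(r"\s+", "_", strip_codes(section).strip())


def index_sections(titles: List[str]) -> Dict[str, int]:
    """Map each anchor to the position of the first section producing it."""
    index: Dict[str, int] = {}
    for position, title in enumerate(titles):
        # setdefault keeps the earliest entry, so the first anchor wins.
        index.setdefault(anchor_name(title), position)
    return index
```

With this index, a link such as L<Foo/Bar> resolves to the first "Bar" section even if later headings or items produce the same anchor.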
+.IP "\(bu" 4 +In a \f(CW\*(C`L<text|...>\*(C'\fR code, text may contain formatting codes +for formatting or for E<...> escapes, as in: +.Sp +.Vb 1 +\& L<B<ummE<234>stuff>|...> +.Ve +.Sp +For \f(CW\*(C`L<...>\*(C'\fR codes without a \*(L"text|\*(R" part, only +\&\f(CW\*(C`E<...>\*(C'\fR and \f(CW\*(C`Z<>\*(C'\fR codes may occur. That is, +authors should not use "\f(CW\*(C`L<B<Foo::Bar>>\*(C'\fR". +.Sp +Note, however, that formatting codes and Z<>'s can occur in any +and all parts of an L<...> (i.e., in \fIname\fR, \fIsection\fR, \fItext\fR, +and \fIurl\fR). +.Sp +Authors must not nest L<...> codes. For example, \*(L"L<The +L<Foo::Bar> man page>\*(R" should be treated as an error. +.IP "\(bu" 4 +Note that Pod authors may use formatting codes inside the \*(L"text\*(R" +part of \*(L"L<text|name>\*(R" (and so on for L<text|/\*(L"sec\*(R">). +.Sp +In other words, this is valid: +.Sp +.Vb 1 +\& Go read L<the docs on C<$.>|perlvar/"$."> +.Ve +.Sp +Some output formats that do allow rendering \*(L"L<...>\*(R" codes as +hypertext, might not allow the link-text to be formatted; in +that case, formatters will have to just ignore that formatting. +.IP "\(bu" 4 +At time of writing, \f(CW\*(C`L<name>\*(C'\fR values are of two types: +either the name of a Pod page like \f(CW\*(C`L<Foo::Bar>\*(C'\fR (which +might be a real Perl module or program in an \f(CW@INC\fR / \s-1PATH\s0 +directory, or a .pod file in those places); or the name of a Unix +man page, like \f(CW\*(C`L<crontab(5)>\*(C'\fR. In theory, \f(CW\*(C`L<chmod>\*(C'\fR +is ambiguous between a Pod page called \*(L"chmod\*(R" and the Unix man page +\&\*(L"chmod\*(R" (in whatever man-section). However, the presence of a string +in parens, as in \*(L"\fBcrontab\fR\|(5)\*(R", is sufficient to signal that what +is being discussed is not a Pod page, and so is presumably a +Unix man page. 
The distinction is of no importance to many +Pod processors, but some processors that render to hypertext formats +may need to distinguish them in order to know how to render a +given \f(CW\*(C`L<foo>\*(C'\fR code. +.IP "\(bu" 4 +Previous versions of perlpod allowed for a \f(CW\*(C`L<section>\*(C'\fR syntax (as in +\&\f(CW\*(C`L<Object Attributes>\*(C'\fR), which was not easily distinguishable from +\&\f(CW\*(C`L<name>\*(C'\fR syntax and for \f(CW\*(C`L<"section">\*(C'\fR which was only +slightly less ambiguous. This syntax is no longer in the specification, and +has been replaced by the \f(CW\*(C`L</section>\*(C'\fR syntax (where the slash was +formerly optional). Pod parsers should tolerate the \f(CW\*(C`L<"section">\*(C'\fR +syntax, for a while at least. The suggested heuristic for distinguishing +\&\f(CW\*(C`L<section>\*(C'\fR from \f(CW\*(C`L<name>\*(C'\fR is that if it contains any +whitespace, it's a \fIsection\fR. Pod processors should warn about this being +deprecated syntax. +.SH "About =over...=back Regions" +.IX Header "About =over...=back Regions" +\&\*(L"=over\*(R"...\*(L"=back\*(R" regions are used for various kinds of list-like +structures. (I use the term \*(L"region\*(R" here simply as a collective +term for everything from the \*(L"=over\*(R" to the matching \*(L"=back\*(R".) +.IP "\(bu" 4 +The non-zero numeric \fIindentlevel\fR in "=over \fIindentlevel\fR\*(L" ... +\&\*(R"=back\*(L" is used for giving the formatter a clue as to how many +\&\*(R"spaces" (ems, or roughly equivalent units) it should tab over, +although many formatters will have to convert this to an absolute +measurement that may not exactly match with the size of spaces (or M's) +in the document's base font. Other formatters may have to completely +ignore the number. The lack of any explicit \fIindentlevel\fR parameter is +equivalent to an \fIindentlevel\fR value of 4. 
Pod processors may +complain if \fIindentlevel\fR is present but is not a positive number +matching \f(CW\*(C`m/\eA(\ed*\e.)?\ed+\ez/\*(C'\fR. +.IP "\(bu" 4 +Authors of Pod formatters are reminded that \*(L"=over\*(R" ... \*(L"=back\*(R" may +map to several different constructs in your output format. For +example, in converting Pod to (X)HTML, it can map to any of +<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or +<blockquote>...</blockquote>. Similarly, \*(L"=item\*(R" can map to <li> or +<dt>. +.IP "\(bu" 4 +Each \*(L"=over\*(R" ... \*(L"=back\*(R" region should be one of the following: +.RS 4 +.IP "\(bu" 4 +An \*(L"=over\*(R" ... \*(L"=back\*(R" region containing only \*(L"=item *\*(R" commands, +each followed by some number of ordinary/verbatim paragraphs, other +nested \*(L"=over\*(R" ... \*(L"=back\*(R" regions, \*(L"=for...\*(R" paragraphs, and +\&\*(L"=begin\*(R"...\*(L"=end\*(R" regions. +.Sp +(Pod processors must tolerate a bare \*(L"=item\*(R" as if it were \*(L"=item +*\*(R".) Whether \*(L"*\*(R" is rendered as a literal asterisk, an \*(L"o\*(R", or as +some kind of real bullet character, is left up to the Pod formatter, +and may depend on the level of nesting. +.IP "\(bu" 4 +An \*(L"=over\*(R" ... \*(L"=back\*(R" region containing only +\&\f(CW\*(C`m/\eA=item\es+\ed+\e.?\es*\ez/\*(C'\fR paragraphs, each one (or each group of them) +followed by some number of ordinary/verbatim paragraphs, other nested +\&\*(L"=over\*(R" ... \*(L"=back\*(R" regions, \*(L"=for...\*(R" paragraphs, and/or +\&\*(L"=begin\*(R"...\*(L"=end\*(R" codes. Note that the numbers must start at 1 +in each section, and must proceed in order and without skipping +numbers. +.Sp +(Pod processors must tolerate lines like \*(L"=item 1\*(R" as if they were +\&\*(L"=item 1.\*(R", with the period.) +.IP "\(bu" 4 +An \*(L"=over\*(R" ... 
\*(L"=back\*(R" region containing only \*(L"=item [text]\*(R" +commands, each one (or each group of them) followed by some number of +ordinary/verbatim paragraphs, other nested \*(L"=over\*(R" ... \*(L"=back\*(R" +regions, or \*(L"=for...\*(R" paragraphs, and \*(L"=begin\*(R"...\*(L"=end\*(R" regions. +.Sp +The \*(L"=item [text]\*(R" paragraph should not match +\&\f(CW\*(C`m/\eA=item\es+\ed+\e.?\es*\ez/\*(C'\fR or \f(CW\*(C`m/\eA=item\es+\e*\es*\ez/\*(C'\fR, nor should it +match just \f(CW\*(C`m/\eA=item\es*\ez/\*(C'\fR. +.IP "\(bu" 4 +An \*(L"=over\*(R" ... \*(L"=back\*(R" region containing no \*(L"=item\*(R" paragraphs at +all, and containing only some number of +ordinary/verbatim paragraphs, and possibly also some nested \*(L"=over\*(R" +\&... \*(L"=back\*(R" regions, \*(L"=for...\*(R" paragraphs, and \*(L"=begin\*(R"...\*(L"=end\*(R" +regions. Such an itemless \*(L"=over\*(R" ... \*(L"=back\*(R" region in Pod is +equivalent in meaning to a \*(L"<blockquote>...</blockquote>\*(R" element in +\&\s-1HTML.\s0 +.RE +.RS 4 +.Sp +Note that with all the above cases, you can determine which type of +\&\*(L"=over\*(R" ... \*(L"=back\*(R" you have, by examining the first (non\-\*(L"=cut\*(R", +non\-\*(L"=pod\*(R") Pod paragraph after the \*(L"=over\*(R" command. +.RE +.IP "\(bu" 4 +Pod formatters \fImust\fR tolerate arbitrarily large amounts of text +in the "=item \fItext...\fR" paragraph. In practice, most such +paragraphs are short, as in: +.Sp +.Vb 1 +\& =item For cutting off our trade with all parts of the world +.Ve +.Sp +But they may be arbitrarily long: +.Sp +.Vb 2 +\& =item For transporting us beyond seas to be tried for pretended +\& offenses +\& +\& =item He is at this time transporting large armies of foreign +\& mercenaries to complete the works of death, desolation and +\& tyranny, already begun with circumstances of cruelty and perfidy +\& scarcely paralleled in the most barbarous ages, and totally +\& unworthy the head of a civilized nation. 
+.Ve +.IP "\(bu" 4 +Pod processors should tolerate \*(L"=item *\*(R" / "=item \fInumber\fR" commands +with no accompanying paragraph. The middle item is an example: +.Sp +.Vb 1 +\& =over +\& +\& =item 1 +\& +\& Pick up dry cleaning. +\& +\& =item 2 +\& +\& =item 3 +\& +\& Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs. +\& +\& =back +.Ve +.IP "\(bu" 4 +No \*(L"=over\*(R" ... \*(L"=back\*(R" region can contain headings. Processors may +treat such a heading as an error. +.IP "\(bu" 4 +Note that an \*(L"=over\*(R" ... \*(L"=back\*(R" region should have some +content. That is, authors should not have an empty region like this: +.Sp +.Vb 1 +\& =over +\& +\& =back +.Ve +.Sp +Pod processors seeing such a contentless \*(L"=over\*(R" ... \*(L"=back\*(R" region, +may ignore it, or may report it as an error. +.IP "\(bu" 4 +Processors must tolerate an \*(L"=over\*(R" list that goes off the end of the +document (i.e., which has no matching \*(L"=back\*(R"), but they may warn +about such a list. +.IP "\(bu" 4 +Authors of Pod formatters should note that this construct: +.Sp +.Vb 1 +\& =item Neque +\& +\& =item Porro +\& +\& =item Quisquam Est +\& +\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci +\& velit, sed quia non numquam eius modi tempora incidunt ut +\& labore et dolore magnam aliquam quaerat voluptatem. +\& +\& =item Ut Enim +.Ve +.Sp +is semantically ambiguous, in a way that makes formatting decisions +a bit difficult. On the one hand, it could be mention of an item +\&\*(L"Neque\*(R", mention of another item \*(L"Porro\*(R", and mention of another +item \*(L"Quisquam Est\*(R", with just the last one requiring the explanatory +paragraph \*(L"Qui dolorem ipsum quia dolor...\*(R"; and then an item +\&\*(L"Ut Enim\*(R". 
In that case, you'd want to format it like so: +.Sp +.Vb 1 +\& Neque +\& +\& Porro +\& +\& Quisquam Est +\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci +\& velit, sed quia non numquam eius modi tempora incidunt ut +\& labore et dolore magnam aliquam quaerat voluptatem. +\& +\& Ut Enim +.Ve +.Sp +But it could equally well be a discussion of three (related or equivalent) +items, \*(L"Neque\*(R", \*(L"Porro\*(R", and \*(L"Quisquam Est\*(R", followed by a paragraph +explaining them all, and then a new item \*(L"Ut Enim\*(R". In that case, you'd +probably want to format it like so: +.Sp +.Vb 6 +\& Neque +\& Porro +\& Quisquam Est +\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci +\& velit, sed quia non numquam eius modi tempora incidunt ut +\& labore et dolore magnam aliquam quaerat voluptatem. +\& +\& Ut Enim +.Ve +.Sp +But (for the foreseeable future), Pod does not provide any way for Pod +authors to distinguish which grouping is meant by the above +\&\*(L"=item\*(R"\-cluster structure. So formatters should format it like so: +.Sp +.Vb 1 +\& Neque +\& +\& Porro +\& +\& Quisquam Est +\& +\& Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci +\& velit, sed quia non numquam eius modi tempora incidunt ut +\& labore et dolore magnam aliquam quaerat voluptatem. +\& +\& Ut Enim +.Ve +.Sp +That is, there should be (at least roughly) equal spacing between +items as between paragraphs (although that spacing may well be less +than the full height of a line of text). This leaves it to the reader +to use (con)textual cues to figure out whether the \*(L"Qui dolorem +ipsum...\*(R" paragraph applies to the \*(L"Quisquam Est\*(R" item or to all three +items \*(L"Neque\*(R", \*(L"Porro\*(R", and \*(L"Quisquam Est\*(R". While not an ideal +situation, this is preferable to providing formatting cues that may +be actually contrary to the author's intent. 
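As an illustration of the rule that the type of an "=over" ... "=back" region can be determined from the first Pod paragraph after the "=over" command, here is a minimal sketch in Python. The function name classify_over_region and the four labels it returns are hypothetical, chosen only for this example; the regular expressions are transcribed from the patterns quoted above, with Perl's \z written as Python's \Z. The caller is assumed to have already skipped any "=cut" and "=pod" paragraphs.

```python
import re

# Patterns transcribed from the spec text (Perl \z becomes Python \Z).
BULLET = re.compile(r"\A=item\s+\*\s*\Z")      # "=item *"
NUMBER = re.compile(r"\A=item\s+\d+\.?\s*\Z")  # "=item 1" or "=item 1."
BARE = re.compile(r"\A=item\s*\Z")             # "=item" alone
TEXT = re.compile(r"\A=item\s")                # "=item" followed by anything else


def classify_over_region(first_paragraph):
    """Guess the kind of an =over region from its first paragraph.

    Returns one of the hypothetical labels "bullet", "number", "text",
    or "block" (an itemless, blockquote-like region).
    """
    # A bare "=item" must be tolerated as if it were "=item *".
    if BARE.match(first_paragraph) or BULLET.match(first_paragraph):
        return "bullet"
    if NUMBER.match(first_paragraph):
        return "number"
    if TEXT.match(first_paragraph):
        return "text"
    return "block"
```

A formatter targeting (X)HTML might, for instance, map these labels to <ul>, <ol>, <dl>, and <blockquote> respectively; that mapping is a design choice of the formatter, not something this sketch decides.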
+.ie n .SH "About Data Paragraphs and ""=begin/=end"" Regions" +.el .SH "About Data Paragraphs and ``=begin/=end'' Regions" +.IX Header "About Data Paragraphs and =begin/=end Regions" +Data paragraphs are typically used for inlining non-Pod data that is +to be used (typically passed through) when rendering the document to +a specific format: +.PP +.Vb 1 +\& =begin rtf +\& +\& \epar{\epard\eqr\esa4500{\ei Printed\e~\echdate\e~\echtime}\epar} +\& +\& =end rtf +.Ve +.PP +The exact same effect could, incidentally, be achieved with a single +\&\*(L"=for\*(R" paragraph: +.PP +.Vb 1 +\& =for rtf \epar{\epard\eqr\esa4500{\ei Printed\e~\echdate\e~\echtime}\epar} +.Ve +.PP +(Although that is not formally a data paragraph, it has the same +meaning as one, and Pod parsers may parse it as one.) +.PP +Another example of a data paragraph: +.PP +.Vb 1 +\& =begin html +\& +\& I like <em>PIE</em>! +\& +\& <hr>Especially pecan pie! +\& +\& =end html +.Ve +.PP +If these were ordinary paragraphs, the Pod parser would try to +expand the \*(L"E</em>\*(R" (in the first paragraph) as a formatting +code, just like \*(L"E<lt>\*(R" or \*(L"E<eacute>\*(R". But since this +is in a "=begin \fIidentifier\fR\*(L"...\*(R"=end \fIidentifier\fR" region \fIand\fR +the identifier \*(L"html\*(R" doesn't begin with a \*(L":\*(R" prefix, the contents +of this region are stored as data paragraphs, instead of being +processed as ordinary paragraphs (or if they began with spaces +and/or tabs, as verbatim paragraphs). +.PP +As a further example: At time of writing, no \*(L"biblio\*(R" identifier is +supported, but suppose some processor were written to recognize it as +a way of (say) denoting a bibliographic reference (necessarily +containing formatting codes in ordinary paragraphs). The fact that +\&\*(L"biblio\*(R" paragraphs were meant for ordinary processing would be +indicated by prefacing each \*(L"biblio\*(R" identifier with a colon: +.PP +.Vb 1 +\& =begin :biblio +\& +\& Wirth, Niklaus. 1976. 
I<Algorithms + Data Structures = +\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ. +\& +\& =end :biblio +.Ve +.PP +This would signal to the parser that paragraphs in this begin...end +region are subject to normal handling as ordinary/verbatim paragraphs +(while still tagged as meant only for processors that understand the +\&\*(L"biblio\*(R" identifier). The same effect could be had with: +.PP +.Vb 3 +\& =for :biblio +\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures = +\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ. +.Ve +.PP +The \*(L":\*(R" on these identifiers means simply \*(L"process this stuff +normally, even though the result will be for some special target\*(R". +I suggest that parser APIs report \*(L"biblio\*(R" as the target identifier, +but also report that it had a \*(L":\*(R" prefix. (And similarly, with the +above \*(L"html\*(R", report \*(L"html\*(R" as the target identifier, and note the +\&\fIlack\fR of a \*(L":\*(R" prefix.) +.PP +Note that a "=begin \fIidentifier\fR\*(L"...\*(R"=end \fIidentifier\fR" region where +\&\fIidentifier\fR begins with a colon, \fIcan\fR contain commands. For example: +.PP +.Vb 1 +\& =begin :biblio +\& +\& Wirth\*(Aqs classic is available in several editions, including: +\& +\& =for comment +\& hm, check abebooks.com for how much used copies cost. +\& +\& =over +\& +\& =item +\& +\& Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.> +\& Teubner, Stuttgart. [Yes, it\*(Aqs in German.] +\& +\& =item +\& +\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures = +\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ. +\& +\& =back +\& +\& =end :biblio +.Ve +.PP +Note, however, a "=begin \fIidentifier\fR\*(L"...\*(R"=end \fIidentifier\fR" +region where \fIidentifier\fR does \fInot\fR begin with a colon, should not +directly contain \*(L"=head1\*(R" ... \*(L"=head4\*(R" commands, nor \*(L"=over\*(R", nor \*(L"=back\*(R", +nor \*(L"=item\*(R". 
For example, this may be considered invalid: +.PP +.Vb 1 +\& =begin somedata +\& +\& This is a data paragraph. +\& +\& =head1 Don\*(Aqt do this! +\& +\& This is a data paragraph too. +\& +\& =end somedata +.Ve +.PP +A Pod processor may signal that the above (specifically the \*(L"=head1\*(R" +paragraph) is an error. Note, however, that the following should +\&\fInot\fR be treated as an error: +.PP +.Vb 1 +\& =begin somedata +\& +\& This is a data paragraph. +\& +\& =cut +\& +\& # Yup, this isn\*(Aqt Pod anymore. +\& sub excl { (rand() > .5) ? "hoo!" : "hah!" } +\& +\& =pod +\& +\& This is a data paragraph too. +\& +\& =end somedata +.Ve +.PP +And this too is valid: +.PP +.Vb 1 +\& =begin someformat +\& +\& This is a data paragraph. +\& +\& And this is a data paragraph. +\& +\& =begin someotherformat +\& +\& This is a data paragraph too. +\& +\& And this is a data paragraph too. +\& +\& =begin :yetanotherformat +\& +\& =head2 This is a command paragraph! +\& +\& This is an ordinary paragraph! +\& +\& And this is a verbatim paragraph! +\& +\& =end :yetanotherformat +\& +\& =end someotherformat +\& +\& Another data paragraph! +\& +\& =end someformat +.Ve +.PP +The contents of the above \*(L"=begin :yetanotherformat\*(R" ... +\&\*(L"=end :yetanotherformat\*(R" region \fIaren't\fR data paragraphs, because +the immediately containing region's identifier (\*(L":yetanotherformat\*(R") +begins with a colon. In practice, most regions that contain +data paragraphs will contain \fIonly\fR data paragraphs; however, +the above nesting is syntactically valid as Pod, even if it is +rare. However, the handlers for some formats, like \*(L"html\*(R", +will accept only data paragraphs, not nested regions; and they may +complain if they see (targeted for them) nested regions, or commands, +other than \*(L"=end\*(R", \*(L"=pod\*(R", and \*(L"=cut\*(R". 
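The rule that only the immediately containing region's identifier decides whether a paragraph is data can be sketched as follows. This is a simplified illustration in Python, not how any existing Pod parser is implemented: the function name label_paragraphs and its labels are hypothetical, the input is assumed to be already split into paragraphs, and "=cut"/"=pod" handling and error reporting are left out.

```python
import re

# A command paragraph starts with "=" and a letter; capture the command
# name and the first word after it (the identifier for =begin/=end).
COMMAND = re.compile(r"\A=([a-zA-Z]\S*)\s*(\S*)")


def label_paragraphs(paragraphs):
    """Label each paragraph as command, data, ordinary, or verbatim."""
    open_regions = []  # identifiers of the currently open =begin regions
    labels = []
    for para in paragraphs:
        match = COMMAND.match(para)
        if match:
            name, identifier = match.group(1), match.group(2)
            if name == "begin":
                open_regions.append(identifier)
            elif name == "end" and open_regions:
                open_regions.pop()
            labels.append("command")
        elif open_regions and not open_regions[-1].startswith(":"):
            # Innermost region has no ":" prefix: everything is data,
            # even paragraphs that start with spaces or tabs.
            labels.append("data")
        elif para[:1] in (" ", "\t"):
            labels.append("verbatim")
        else:
            labels.append("ordinary")
    return labels
```

Run on the nested someformat / someotherformat / :yetanotherformat example above, this labels the paragraphs inside ":yetanotherformat" as command, ordinary, and verbatim, and every other non-command paragraph as data.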
+.PP +Also consider this valid structure: +.PP +.Vb 1 +\& =begin :biblio +\& +\& Wirth\*(Aqs classic is available in several editions, including: +\& +\& =over +\& +\& =item +\& +\& Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.> +\& Teubner, Stuttgart. [Yes, it\*(Aqs in German.] +\& +\& =item +\& +\& Wirth, Niklaus. 1976. I<Algorithms + Data Structures = +\& Programs.> Prentice\-Hall, Englewood Cliffs, NJ. +\& +\& =back +\& +\& Buy buy buy! +\& +\& =begin html +\& +\& <img src=\*(Aqwirth_spokesmodeling_book.png\*(Aq> +\& +\& <hr> +\& +\& =end html +\& +\& Now now now! +\& +\& =end :biblio +.Ve +.PP +There, the \*(L"=begin html\*(R"...\*(L"=end html\*(R" region is nested inside +the larger \*(L"=begin :biblio\*(R"...\*(L"=end :biblio\*(R" region. Note that the +content of the \*(L"=begin html\*(R"...\*(L"=end html\*(R" region is data +paragraph(s), because the immediately containing region's identifier +(\*(L"html\*(R") \fIdoesn't\fR begin with a colon. +.PP +Pod parsers, when processing a series of data paragraphs one +after another (within a single region), should consider them to +be one large data paragraph that happens to contain blank lines. So +the content of the above \*(L"=begin html\*(R"...\*(L"=end html\*(R" \fImay\fR be stored +as two data paragraphs (one consisting of +\&\*(L"<img src='wirth_spokesmodeling_book.png'>\en\*(R" +and another consisting of \*(L"<hr>\en\*(R"), but \fIshould\fR be stored as +a single data paragraph (consisting of +\&\*(L"<img src='wirth_spokesmodeling_book.png'>\en\en<hr>\en\*(R"). +.PP +Pod processors should tolerate empty +"=begin \fIsomething\fR\*(L"...\*(R"=end \fIsomething\fR\*(L" regions, +empty \*(R"=begin :\fIsomething\fR\*(L"...\*(R"=end :\fIsomething\fR\*(L" regions, and +contentless \*(R"=for \fIsomething\fR\*(L" and \*(R"=for :\fIsomething\fR" +paragraphs. 
I.e., these should be tolerated: +.PP +.Vb 1 +\& =for html +\& +\& =begin html +\& +\& =end html +\& +\& =begin :biblio +\& +\& =end :biblio +.Ve +.PP +Incidentally, note that there's no easy way to express a data +paragraph starting with something that looks like a command. Consider: +.PP +.Vb 1 +\& =begin stuff +\& +\& =shazbot +\& +\& =end stuff +.Ve +.PP +There, \*(L"=shazbot\*(R" will be parsed as a Pod command \*(L"shazbot\*(R", not as a data +paragraph \*(L"=shazbot\en\*(R". However, you can express a data paragraph consisting +of \*(L"=shazbot\en\*(R" using this code: +.PP +.Vb 1 +\& =for stuff =shazbot +.Ve +.PP +The situation where this is necessary, is presumably quite rare. +.PP +Note that =end commands must match the currently open =begin command. That +is, they must properly nest. For example, this is valid: +.PP +.Vb 1 +\& =begin outer +\& +\& X +\& +\& =begin inner +\& +\& Y +\& +\& =end inner +\& +\& Z +\& +\& =end outer +.Ve +.PP +while this is invalid: +.PP +.Vb 1 +\& =begin outer +\& +\& X +\& +\& =begin inner +\& +\& Y +\& +\& =end outer +\& +\& Z +\& +\& =end inner +.Ve +.PP +This latter is improper because when the \*(L"=end outer\*(R" command is seen, the +currently open region has the formatname \*(L"inner\*(R", not \*(L"outer\*(R". (It just +happens that \*(L"outer\*(R" is the format name of a higher-up region.) This is +an error. Processors must by default report this as an error, and may halt +processing the document containing that error. A corollary of this is that +regions cannot \*(L"overlap\*(R". That is, the latter block above does not represent +a region called \*(L"outer\*(R" which contains X and Y, overlapping a region called +\&\*(L"inner\*(R" which contains Y and Z. But because it is invalid (as all +apparently overlapping regions would be), it doesn't represent that, or +anything at all. 
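The nesting requirement amounts to a stack discipline, which the following Python sketch illustrates. The function name check_begin_end, the (command, formatname) tuple input, and the error messages are all assumptions made for this example. How a processor recovers after reporting a mismatch is not specified above; this sketch simply leaves the open regions untouched and continues.

```python
def check_begin_end(commands):
    """Check that =end commands match the currently open =begin.

    commands is a list of (command, formatname) tuples, for example
    ("begin", "outer"). Returns a list of (index, message) errors.
    """
    open_regions = []  # formatnames of regions opened and not yet closed
    errors = []
    for index, (command, formatname) in enumerate(commands):
        if command == "begin":
            open_regions.append(formatname)
        elif command == "end":
            if not open_regions:
                errors.append((index, "=end %s with no open region" % formatname))
            elif open_regions[-1] != formatname:
                # Assumed recovery: report and keep the region open.
                errors.append(
                    (index, "=end %s does not match open =begin %s"
                     % (formatname, open_regions[-1]))
                )
            else:
                open_regions.pop()
    return errors
```

For the valid outer/inner example this returns an empty list; for the invalid one it reports a single error at the "=end outer" command, because the currently open region at that point is "inner".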
+.PP +Similarly, this is invalid: +.PP +.Vb 1 +\& =begin thing +\& +\& =end hting +.Ve +.PP +This is an error because the region is opened by \*(L"thing\*(R", and the \*(L"=end\*(R" +tries to close \*(L"hting\*(R" [sic]. +.PP +This is also invalid: +.PP +.Vb 1 +\& =begin thing +\& +\& =end +.Ve +.PP +This is invalid because every \*(L"=end\*(R" command must have a formatname +parameter. +.SH "SEE ALSO" +.IX Header "SEE ALSO" +perlpod, \*(L"PODs: Embedded Documentation\*(R" in perlsyn, +podchecker +.SH "AUTHOR" +.IX Header "AUTHOR" +Sean M. Burke