author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-04 12:15:05 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-04 12:15:05 +0000
commit 46651ce6fe013220ed397add242004d764fc0153
tree 6e5299f990f88e60174a1d3ae6e48eedd2688b2b /doc/src/sgml/sources.sgml
parent Initial commit.
Adding upstream version 14.5.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/sources.sgml')
-rw-r--r-- doc/src/sgml/sources.sgml 1029
1 files changed, 1029 insertions, 0 deletions
diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
new file mode 100644
index 0000000..e6ae02f
--- /dev/null
+++ b/doc/src/sgml/sources.sgml
@@ -0,0 +1,1029 @@
+<!-- doc/src/sgml/sources.sgml -->
+
+ <chapter id="source">
+ <title>PostgreSQL Coding Conventions</title>
+
+ <sect1 id="source-format">
+ <title>Formatting</title>
+
+ <para>
+ Source code formatting uses 4 column tab spacing, with
+ tabs preserved (i.e., tabs are not expanded to spaces).
+ Each logical indentation level is one additional tab stop.
+ </para>
+
+ <para>
+ Layout rules (brace positioning, etc.) follow BSD conventions. In
+ particular, curly braces for the controlled blocks of <literal>if</literal>,
+ <literal>while</literal>, <literal>switch</literal>, etc., go on their own lines.
+ </para>
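+ <para>
+ The layout described above can be sketched as follows. This is only an
+ illustration: <function>count_positive</function> is a hypothetical helper
+ that does not exist in the source tree, and indentation uses one tab per
+ level.
+ </para>

```c
#include <stddef.h>

/*
 * count_positive is a hypothetical helper used only to illustrate layout.
 * It returns the number of elements of "values" that are greater than zero.
 */
static int
count_positive(const int *values, size_t nvalues)
{
	int			count = 0;
	size_t		i;

	for (i = 0; i < nvalues; i++)
	{
		/* Braces of controlled blocks sit on their own lines. */
		if (values[i] > 0)
		{
			count++;
		}
	}

	return count;
}
```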
+
+ <para>
+ Limit line lengths so that the code is readable in an 80-column window.
+ (This doesn't mean that you must never go past 80 columns. For instance,
+ breaking a long error message string in arbitrary places just to keep the
+ code within 80 columns is probably not a net gain in readability.)
+ </para>
+
+ <para>
+ To maintain a consistent coding style, do not use C++ style comments
+ (<literal>//</literal> comments). <application>pgindent</application>
+ will replace them with <literal>/* ... */</literal>.
+ </para>
+
+ <para>
+ The preferred style for multi-line comment blocks is
+<programlisting>
+/*
+ * comment text begins here
+ * and continues here
+ */
+</programlisting>
+ Note that comment blocks that begin in column 1 will be preserved as-is
+ by <application>pgindent</application>, but it will re-flow indented comment blocks
+ as though they were plain text. If you want to preserve the line breaks
+ in an indented block, add dashes like this:
+<programlisting>
+ /*----------
+ * comment text begins here
+ * and continues here
+ *----------
+ */
+</programlisting>
+ </para>
+
+ <para>
+ While submitted patches do not absolutely have to follow these formatting
+ rules, it's a good idea to do so. Your code will get run through
+ <application>pgindent</application> before the next release, so there's no point in
+ making it look nice under some other set of formatting conventions.
+ A good rule of thumb for patches is <quote>make the new code look like
+ the existing code around it</quote>.
+ </para>
+
+ <para>
+ The <filename>src/tools</filename> directory contains sample settings
+ files that can be used with the <productname>emacs</productname>,
+ <productname>xemacs</productname> or <productname>vim</productname>
+ editors to help ensure that they format code according to these
+ conventions.
+ </para>
+
+ <para>
+ The text browsing tools <application>more</application> and
+ <application>less</application> can be invoked as:
+<programlisting>
+more -x4
+less -x4
+</programlisting>
+ to make them show tabs appropriately.
+ </para>
+ </sect1>
+
+ <sect1 id="error-message-reporting">
+ <title>Reporting Errors Within the Server</title>
+
+ <indexterm>
+ <primary>ereport</primary>
+ </indexterm>
+ <indexterm>
+ <primary>elog</primary>
+ </indexterm>
+
+ <para>
+ Error, warning, and log messages generated within the server code
+ should be created using <function>ereport</function>, or its older cousin
+ <function>elog</function>. The use of these functions is complex enough to
+ require some explanation.
+ </para>
+
+ <para>
+ There are two required elements for every message: a severity level
+ (ranging from <literal>DEBUG</literal> to <literal>PANIC</literal>) and a primary
+ message text. In addition there are optional elements, the most
+ common of which is an error identifier code that follows the SQL spec's
+ SQLSTATE conventions.
+ <function>ereport</function> itself is just a shell macro that exists
+ mainly for the syntactic convenience of making message generation
+ look like a single function call in the C source code. The only parameter
+ accepted directly by <function>ereport</function> is the severity level.
+ The primary message text and any optional message elements are
+ generated by calling auxiliary functions, such as <function>errmsg</function>,
+ within the <function>ereport</function> call.
+ </para>
+
+ <para>
+ A typical call to <function>ereport</function> might look like this:
+<programlisting>
+ereport(ERROR,
+ errcode(ERRCODE_DIVISION_BY_ZERO),
+ errmsg("division by zero"));
+</programlisting>
+ This specifies error severity level <literal>ERROR</literal> (a run-of-the-mill
+ error). The <function>errcode</function> call specifies the SQLSTATE error code
+ using a macro defined in <filename>src/include/utils/errcodes.h</filename>. The
+ <function>errmsg</function> call provides the primary message text.
+ </para>
+
+ <para>
+ You will also frequently see this older style, with an extra set of
+ parentheses surrounding the auxiliary function calls:
+<programlisting>
+ereport(ERROR,
+ (errcode(ERRCODE_DIVISION_BY_ZERO),
+ errmsg("division by zero")));
+</programlisting>
+ The extra parentheses were required
+ before <productname>PostgreSQL</productname> version 12, but are now
+ optional.
+ </para>
+
+ <para>
+ Here is a more complex example:
+<programlisting>
+ereport(ERROR,
+ errcode(ERRCODE_AMBIGUOUS_FUNCTION),
+ errmsg("function %s is not unique",
+ func_signature_string(funcname, nargs,
+ NIL, actual_arg_types)),
+ errhint("Unable to choose a best candidate function. "
+ "You might need to add explicit typecasts."));
+</programlisting>
+ This illustrates the use of format codes to embed run-time values into
+ a message text. Also, an optional <quote>hint</quote> message is provided.
+ The auxiliary function calls can be written in any order, but
+ conventionally <function>errcode</function>
+ and <function>errmsg</function> appear first.
+ </para>
+
+ <para>
+ If the severity level is <literal>ERROR</literal> or higher,
+ <function>ereport</function> aborts execution of the current query
+ and does not return to the caller. If the severity level is
+ lower than <literal>ERROR</literal>, <function>ereport</function> returns normally.
+ </para>
+
+ <para>
+ The available auxiliary routines for <function>ereport</function> are:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <function>errcode(sqlerrcode)</function> specifies the SQLSTATE error identifier
+ code for the condition. If this routine is not called, the error
+ identifier defaults to
+ <literal>ERRCODE_INTERNAL_ERROR</literal> when the error severity level is
+ <literal>ERROR</literal> or higher, <literal>ERRCODE_WARNING</literal> when the
+ error level is <literal>WARNING</literal>, otherwise (for <literal>NOTICE</literal>
+ and below) <literal>ERRCODE_SUCCESSFUL_COMPLETION</literal>.
+ While these defaults are often convenient, always consider whether they
+ are appropriate before omitting the <function>errcode()</function> call.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errmsg(const char *msg, ...)</function> specifies the primary error
+ message text, and possibly run-time values to insert into it. Insertions
+ are specified by <function>sprintf</function>-style format codes. In addition to
+ the standard format codes accepted by <function>sprintf</function>, the format
+ code <literal>%m</literal> can be used to insert the error message returned
+ by <function>strerror</function> for the current value of <literal>errno</literal>.
+ <footnote>
+ <para>
+ That is, the value that was current when the <function>ereport</function> call
+ was reached; changes of <literal>errno</literal> within the auxiliary reporting
+ routines will not affect it. That would not be true if you were to
+ write <literal>strerror(errno)</literal> explicitly in <function>errmsg</function>'s
+ parameter list; accordingly, do not do so.
+ </para>
+ </footnote>
+ <literal>%m</literal> does not require any
+ corresponding entry in the parameter list for <function>errmsg</function>.
+ Note that the message string will be run through <function>gettext</function>
+ for possible localization before format codes are processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errmsg_internal(const char *msg, ...)</function> is the same as
+ <function>errmsg</function>, except that the message string will be neither
+ translated nor included in the internationalization message dictionary.
+ This should be used for <quote>cannot happen</quote> cases that are probably
+ not worth expending translation effort on.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errmsg_plural(const char *fmt_singular, const char *fmt_plural,
+ unsigned long n, ...)</function> is like <function>errmsg</function>, but with
+ support for various plural forms of the message.
+ <replaceable>fmt_singular</replaceable> is the English singular format,
+ <replaceable>fmt_plural</replaceable> is the English plural format,
+ <replaceable>n</replaceable> is the integer value that determines which plural
+ form is needed, and the remaining arguments are formatted according
+ to the selected format string. For more information see
+ <xref linkend="nls-guidelines"/>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errdetail(const char *msg, ...)</function> supplies an optional
+ <quote>detail</quote> message; this is to be used when there is additional
+ information that seems inappropriate to put in the primary message.
+ The message string is processed in just the same way as for
+ <function>errmsg</function>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errdetail_internal(const char *msg, ...)</function> is the same
+ as <function>errdetail</function>, except that the message string will be neither
+ translated nor included in the internationalization message dictionary.
+ This should be used for detail messages that are not worth expending
+ translation effort on, for instance because they are too technical to be
+ useful to most users.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errdetail_plural(const char *fmt_singular, const char *fmt_plural,
+ unsigned long n, ...)</function> is like <function>errdetail</function>, but with
+ support for various plural forms of the message.
+ For more information see <xref linkend="nls-guidelines"/>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errdetail_log(const char *msg, ...)</function> is the same as
+ <function>errdetail</function> except that this string goes only to the server
+ log, never to the client. If both <function>errdetail</function> (or one of
+ its equivalents above) and
+ <function>errdetail_log</function> are used then one string goes to the client
+ and the other to the log. This is useful for error details that are
+ too security-sensitive or too bulky to include in the report
+ sent to the client.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errdetail_log_plural(const char *fmt_singular, const char
+ *fmt_plural, unsigned long n, ...)</function> is like
+ <function>errdetail_log</function>, but with support for various plural forms of
+ the message.
+ For more information see <xref linkend="nls-guidelines"/>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errhint(const char *msg, ...)</function> supplies an optional
+ <quote>hint</quote> message; this is to be used when offering suggestions
+ about how to fix the problem, as opposed to factual details about
+ what went wrong.
+ The message string is processed in just the same way as for
+ <function>errmsg</function>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errhint_plural(const char *fmt_singular, const char *fmt_plural,
+ unsigned long n, ...)</function> is like <function>errhint</function>, but with
+ support for various plural forms of the message.
+ For more information see <xref linkend="nls-guidelines"/>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errcontext(const char *msg, ...)</function> is not normally called
+ directly from an <function>ereport</function> message site; rather it is used
+ in <literal>error_context_stack</literal> callback functions to provide
+ information about the context in which an error occurred, such as the
+ current location in a PL function.
+ The message string is processed in just the same way as for
+ <function>errmsg</function>. Unlike the other auxiliary functions, this can
+ be called more than once per <function>ereport</function> call; the successive
+ strings thus supplied are concatenated with separating newlines.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errposition(int cursorpos)</function> specifies the textual location
+ of an error within a query string. Currently it is only useful for
+ errors detected in the lexical and syntactic analysis phases of
+ query processing.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errtable(Relation rel)</function> specifies a relation whose
+ name and schema name should be included as auxiliary fields in the error
+ report.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errtablecol(Relation rel, int attnum)</function> specifies
+ a column whose name, table name, and schema name should be included as
+ auxiliary fields in the error report.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errtableconstraint(Relation rel, const char *conname)</function>
+ specifies a table constraint whose name, table name, and schema name
+ should be included as auxiliary fields in the error report. Indexes
+ should be considered to be constraints for this purpose, whether or
+ not they have an associated <structname>pg_constraint</structname> entry. Be
+ careful to pass the underlying heap relation, not the index itself, as
+ <literal>rel</literal>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errdatatype(Oid datatypeOid)</function> specifies a data
+ type whose name and schema name should be included as auxiliary fields
+ in the error report.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errdomainconstraint(Oid datatypeOid, const char *conname)</function>
+ specifies a domain constraint whose name, domain name, and schema name
+ should be included as auxiliary fields in the error report.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errcode_for_file_access()</function> is a convenience function that
+ selects an appropriate SQLSTATE error identifier for a failure in a
+ file-access-related system call. It uses the saved
+ <literal>errno</literal> to determine which error code to generate.
+ Usually this should be used in combination with <literal>%m</literal> in the
+ primary error message text.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errcode_for_socket_access()</function> is a convenience function that
+ selects an appropriate SQLSTATE error identifier for a failure in a
+ socket-related system call.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errhidestmt(bool hide_stmt)</function> can be called to specify
+ suppression of the <literal>STATEMENT:</literal> portion of a message in the
+ postmaster log. Generally this is appropriate if the message text
+ includes the current statement already.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <function>errhidecontext(bool hide_ctx)</function> can be called to
+ specify suppression of the <literal>CONTEXT:</literal> portion of a message in
+ the postmaster log. This should only be used for verbose debugging
+ messages where the repeated inclusion of context would bloat the log
+ too much.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
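+ <para>
+ The footnote on <literal>%m</literal> says that the value of
+ <literal>errno</literal> is taken as of the moment the report is started,
+ so later changes do not affect it. The stand-alone sketch below mimics that
+ behavior outside the server. It does not use the server's reporting code:
+ <function>noisy_helper</function> and
+ <function>report_open_failure</function> are hypothetical names invented
+ for this illustration.
+ </para>

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an auxiliary routine that happens to change
 * errno while doing its own work.
 */
static void
noisy_helper(void)
{
	errno = EINVAL;
}

/*
 * Hypothetical reporting function.  It captures errno on entry and builds
 * the message from the saved value, so the clobbering done by noisy_helper
 * does not affect the text.
 */
static void
report_open_failure(char *buf, size_t buflen, const char *path)
{
	int			saved_errno = errno;	/* capture before anything else */

	noisy_helper();				/* may change errno */

	snprintf(buf, buflen, "could not open file \"%s\": %s",
			 path, strerror(saved_errno));
}
```

+ <para>
+ Writing <literal>strerror(errno)</literal> at the point where the message
+ is formatted would instead pick up whatever value
+ <literal>errno</literal> holds by then, which is the hazard the footnote
+ warns about.
+ </para>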
+
+ <note>
+ <para>
+ At most one of the functions <function>errtable</function>,
+ <function>errtablecol</function>, <function>errtableconstraint</function>,
+ <function>errdatatype</function>, or <function>errdomainconstraint</function> should
+ be used in an <function>ereport</function> call. These functions exist to
+ allow applications to extract the name of a database object associated
+ with the error condition without having to examine the
+ potentially-localized error message text.
+ These functions should be used in error reports for which it's likely
+ that applications would wish to have automatic error handling. As of
+ <productname>PostgreSQL</productname> 9.3, complete coverage exists only for
+ errors in SQLSTATE class 23 (integrity constraint violation), but this
+ is likely to be expanded in the future.
+ </para>
+ </note>
+
+ <para>
+ There is an older function <function>elog</function> that is still heavily used.
+ An <function>elog</function> call:
+<programlisting>
+elog(level, "format string", ...);
+</programlisting>
+ is exactly equivalent to:
+<programlisting>
+ereport(level, errmsg_internal("format string", ...));
+</programlisting>
+ Notice that the SQLSTATE error code is always defaulted, and the message
+ string is not subject to translation.
+ Therefore, <function>elog</function> should be used only for internal errors and
+ low-level debug logging. Any message that is likely to be of interest to
+ ordinary users should go through <function>ereport</function>. Nonetheless,
+ there are enough internal <quote>cannot happen</quote> error checks in the
+ system that <function>elog</function> is still widely used; it is preferred for
+ those messages for its notational simplicity.
+ </para>
+
+ <para>
+ Advice about writing good error messages can be found in
+ <xref linkend="error-style-guide"/>.
+ </para>
+ </sect1>
+
+ <sect1 id="error-style-guide">
+ <title>Error Message Style Guide</title>
+
+ <para>
+ This style guide is offered in the hope of maintaining a consistent,
+ user-friendly style throughout all the messages generated by
+ <productname>PostgreSQL</productname>.
+ </para>
+
+ <simplesect>
+ <title>What Goes Where</title>
+
+ <para>
+ The primary message should be short, factual, and avoid reference to
+ implementation details such as specific function names.
+ <quote>Short</quote> means <quote>should fit on one line under normal
+ conditions</quote>. Use a detail message if needed to keep the primary
+ message short, or if you feel a need to mention implementation details
+ such as the particular system call that failed. Both primary and detail
+ messages should be factual. Use a hint message for suggestions about what
+ to do to fix the problem, especially if the suggestion might not always be
+ applicable.
+ </para>
+
+ <para>
+ For example, instead of:
+<programlisting>
+IpcMemoryCreate: shmget(key=%d, size=%u, 0%o) failed: %m
+(plus a long addendum that is basically a hint)
+</programlisting>
+ write:
+<programlisting>
+Primary: could not create shared memory segment: %m
+Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
+Hint: the addendum
+</programlisting>
+ </para>
+
+ <para>
+ Rationale: keeping the primary message short helps keep it to the point,
+ and lets clients lay out screen space on the assumption that one line is
+ enough for error messages. Detail and hint messages can be relegated to a
+ verbose mode, or perhaps a pop-up error-details window. Also, details and
+ hints would normally be suppressed from the server log to save
+ space. Reference to implementation details is best avoided since users
+ aren't expected to know the details.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Formatting</title>
+
+ <para>
+ Don't put any specific assumptions about formatting into the message
+ texts. Expect clients and the server log to wrap lines to fit their own
+ needs. In long messages, newline characters (\n) can be used to indicate
+ suggested paragraph breaks. Don't end a message with a newline. Don't
+ use tabs or other formatting characters. (In error context displays,
+ newlines are automatically added to separate levels of context such as
+ function calls.)
+ </para>
+
+ <para>
+ Rationale: Messages are not necessarily displayed on terminal-type
+ displays. In GUI displays or browsers these formatting instructions are
+ at best ignored.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Quotation Marks</title>
+
+ <para>
+ English text should use double quotes when quoting is appropriate.
+ Text in other languages should consistently use one kind of quotation mark,
+ chosen to match publishing customs and the computer output of other programs.
+ </para>
+
+ <para>
+ Rationale: The choice of double quotes over single quotes is somewhat
+ arbitrary, but tends to be the preferred usage. Some have suggested
+ choosing the kind of quotes depending on the type of object according to
+ SQL conventions (namely, strings single quoted, identifiers double
+ quoted). But this is a language-internal technical issue that many users
+ aren't even familiar with, it won't scale to other kinds of quoted terms,
+ it doesn't translate to other languages, and it's pretty pointless, too.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Use of Quotes</title>
+
+ <para>
+ Always use quotes to delimit file names, user-supplied identifiers, and
+ other variables that might contain words. Do not use them to mark up
+ variables that will not contain words (for example, operator names).
+ </para>
+
+ <para>
+ There are functions in the backend that will double-quote their own output
+ as needed (for example, <function>format_type_be()</function>). Do not put
+ additional quotes around the output of such functions.
+ </para>
+
+ <para>
+ Rationale: Objects can have names that create ambiguity when embedded in a
+ message. Be consistent about denoting where a plugged-in name starts and
+ ends. But don't clutter messages with unnecessary or duplicate quote
+ marks.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Grammar and Punctuation</title>
+
+ <para>
+ The rules are different for primary error messages and for detail/hint
+ messages:
+ </para>
+
+ <para>
+ Primary error messages: Do not capitalize the first letter. Do not end a
+ message with a period. Do not even think about ending a message with an
+ exclamation point.
+ </para>
+
+ <para>
+ Detail and hint messages: Use complete sentences, and end each with
+ a period. Capitalize the first word of sentences. Put two spaces after
+ the period if another sentence follows (for English text; might be
+ inappropriate in other languages).
+ </para>
+
+ <para>
+ Error context strings: Do not capitalize the first letter and do
+ not end the string with a period. Context strings should normally
+ not be complete sentences.
+ </para>
+
+ <para>
+ Rationale: Avoiding punctuation makes it easier for client applications to
+ embed the message into a variety of grammatical contexts. Often, primary
+ messages are not grammatically complete sentences anyway. (And if they're
+ long enough to be more than one sentence, they should be split into
+ primary and detail parts.) However, detail and hint messages are longer
+ and might need to include multiple sentences. For consistency, they should
+ follow complete-sentence style even when there's only one sentence.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Upper Case vs. Lower Case</title>
+
+ <para>
+ Use lower case for message wording, including the first letter of a
+ primary error message. Use upper case for SQL commands and key words if
+ they appear in the message.
+ </para>
+
+ <para>
+ Rationale: It's easier to make everything look more consistent this
+ way, since some messages are complete sentences and some not.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Avoid Passive Voice</title>
+
+ <para>
+ Use the active voice. Use complete sentences when there is an acting
+ subject (<quote>A could not do B</quote>). Use telegram style without
+ subject if the subject would be the program itself; do not use
+ <quote>I</quote> for the program.
+ </para>
+
+ <para>
+ Rationale: The program is not human. Don't pretend otherwise.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Present vs. Past Tense</title>
+
+ <para>
+ Use past tense if an attempt to do something failed, but could perhaps
+ succeed next time (perhaps after fixing some problem). Use present tense
+ if the failure is certainly permanent.
+ </para>
+
+ <para>
+ There is a nontrivial semantic difference between sentences of the form:
+<programlisting>
+could not open file "%s": %m
+</programlisting>
+and:
+<programlisting>
+cannot open file "%s"
+</programlisting>
+ The first one means that the attempt to open the file failed. The
+ message should give a reason, such as <quote>disk full</quote> or
+ <quote>file doesn't exist</quote>. The past tense is appropriate because
+ next time the disk might not be full anymore or the file in question might
+ exist.
+ </para>
+
+ <para>
+ The second form indicates that the functionality of opening the named file
+ does not exist at all in the program, or that it's conceptually
+ impossible. The present tense is appropriate because the condition will
+ persist indefinitely.
+ </para>
+
+ <para>
+ Rationale: Granted, the average user will not be able to draw great
+ conclusions merely from the tense of the message, but since the language
+ provides us with a grammar we should use it correctly.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Type of the Object</title>
+
+ <para>
+ When citing the name of an object, state what kind of object it is.
+ </para>
+
+ <para>
+ Rationale: Otherwise no one will know what <quote>foo.bar.baz</quote>
+ refers to.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Brackets</title>
+
+ <para>
+ Square brackets are only to be used (1) in command synopses to denote
+ optional arguments, or (2) to denote an array subscript.
+ </para>
+
+ <para>
+ Rationale: Anything else does not correspond to widely-known customary
+ usage and will confuse people.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Assembling Error Messages</title>
+
+ <para>
+ When a message includes text that is generated elsewhere, embed it in
+ this style:
+<programlisting>
+could not open file %s: %m
+</programlisting>
+ </para>
+
+ <para>
+ Rationale: It would be difficult to account for all possible error codes
+ to paste this into a single smooth sentence, so some sort of punctuation
+ is needed. Putting the embedded text in parentheses has also been
+ suggested, but it's unnatural if the embedded text is likely to be the
+ most important part of the message, as is often the case.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Reasons for Errors</title>
+
+ <para>
+ Messages should always state the reason why an error occurred.
+ For example:
+<programlisting>
+BAD: could not open file %s
+BETTER: could not open file %s (I/O failure)
+</programlisting>
+ If no reason is known, you had better fix the code.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Function Names</title>
+
+ <para>
+ Don't include the name of the reporting routine in the error text. We have
+ other mechanisms for finding that out when needed, and for most users it's
+ not helpful information. If the error text doesn't make as much sense
+ without the function name, reword it.
+<programlisting>
+BAD: pg_strtoint32: error in "z": cannot parse "z"
+BETTER: invalid input syntax for type integer: "z"
+</programlisting>
+ </para>
+
+ <para>
+ Also avoid mentioning the names of called functions; instead say what the code
+ was trying to do:
+<programlisting>
+BAD: open() failed: %m
+BETTER: could not open file %s: %m
+</programlisting>
+ If it really seems necessary, mention the system call in the detail
+ message. (In some cases, providing the actual values passed to the
+ system call might be appropriate information for the detail message.)
+ </para>
+
+ <para>
+ Rationale: Users don't know what all those functions do.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Tricky Words to Avoid</title>
+
+ <formalpara>
+ <title>Unable</title>
+ <para>
+ <quote>Unable</quote> is nearly the passive voice. It is better to use
+ <quote>cannot</quote> or <quote>could not</quote>, as appropriate.
+ </para>
+ </formalpara>
+
+ <formalpara>
+ <title>Bad</title>
+ <para>
+ Error messages like <quote>bad result</quote> are really hard to interpret
+ intelligently. It's better to write why the result is <quote>bad</quote>,
+ e.g., <quote>invalid format</quote>.
+ </para>
+ </formalpara>
+
+ <formalpara>
+ <title>Illegal</title>
+ <para>
+ <quote>Illegal</quote> stands for a violation of the law; everything else is
+ <quote>invalid</quote>. Better yet, say why it's invalid.
+ </para>
+ </formalpara>
+
+ <formalpara>
+ <title>Unknown</title>
+ <para>
+ Try to avoid <quote>unknown</quote>. Consider <quote>error: unknown
+ response</quote>. If you don't know what the response is, how do you know
+ it's erroneous? <quote>Unrecognized</quote> is often a better choice.
+ Also, be sure to include the value being complained of.
+<programlisting>
+BAD: unknown node type
+BETTER: unrecognized node type: 42
+</programlisting>
+ </para>
+ </formalpara>
+
+ <formalpara>
+ <title>Find vs. Exists</title>
+ <para>
+ If the program uses a nontrivial algorithm to locate a resource (e.g., a
+ path search) and that algorithm fails, it is fair to say that the program
+ couldn't <quote>find</quote> the resource. If, on the other hand, the
+ expected location of the resource is known but the program cannot access
+ it there then say that the resource doesn't <quote>exist</quote>. Using
+ <quote>find</quote> in this case sounds weak and confuses the issue.
+ </para>
+ </formalpara>
+
+ <formalpara>
+ <title>May vs. Can vs. Might</title>
+ <para>
+ <quote>May</quote> suggests permission (e.g., "You may borrow my rake."),
+ and has little use in documentation or error messages.
+ <quote>Can</quote> suggests ability (e.g., "I can lift that log."),
+ and <quote>might</quote> suggests possibility (e.g., "It might rain
+ today."). Using the proper word clarifies meaning and assists
+ translation.
+ </para>
+ </formalpara>
+
+ <formalpara>
+ <title>Contractions</title>
+ <para>
+ Avoid contractions, like <quote>can't</quote>; use
+ <quote>cannot</quote> instead.
+ </para>
+ </formalpara>
+
+ <formalpara>
+ <title>Non-negative</title>
+ <para>
+ Avoid <quote>non-negative</quote> as it is ambiguous
+ about whether it accepts zero. It's better to use
+ <quote>greater than zero</quote> or
+ <quote>greater than or equal to zero</quote>.
+ </para>
+ </formalpara>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Proper Spelling</title>
+
+ <para>
+ Spell out words in full. For instance, avoid:
+ <itemizedlist>
+ <listitem>
+ <para>
+ spec
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ stats
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ parens
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ auth
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ xact
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ Rationale: This will improve consistency.
+ </para>
+
+ </simplesect>
+
+ <simplesect>
+ <title>Localization</title>
+
+ <para>
+ Keep in mind that error message texts need to be translated into other
+ languages. Follow the guidelines in <xref linkend="nls-guidelines"/>
+ to avoid making life difficult for translators.
+ </para>
+ </simplesect>
+
+ </sect1>
+
+ <sect1 id="source-conventions">
+ <title>Miscellaneous Coding Conventions</title>
+
+ <simplesect>
+ <title>C Standard</title>
+ <para>
+ Code in <productname>PostgreSQL</productname> should only rely on language
+ features available in the C99 standard. That means a conforming
+ C99 compiler has to be able to compile postgres, at least aside
+ from a few platform-dependent pieces.
+ </para>
+ <para>
+ A few features included in the C99 standard are, at this time, not
+ permitted to be used in core <productname>PostgreSQL</productname>
+ code. This currently includes variable length arrays, intermingled
+ declarations and code, <literal>//</literal> comments, and universal
+ character names. Reasons for that include portability and historical
+ practices.
+ </para>
+ <para>
+ Features from later revisions of the C standard or compiler-specific
+ features can be used, if a fallback is provided.
+ </para>
+ <para>
+ For example, <literal>_Static_assert()</literal> and
+ <literal>__builtin_constant_p</literal> are currently used, even though
+ they are from a newer revision of the C standard and a
+ <productname>GCC</productname> extension, respectively. If
+ <literal>_Static_assert()</literal> is not available, we fall back to a
+ C99-compatible replacement that performs the same checks but emits
+ rather cryptic messages. If <literal>__builtin_constant_p</literal> is
+ not available, we simply do not use it.
+ </para>
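+ <para>
+ The fallback approach for static assertions can be sketched as follows.
+ This is only an illustration: the macro name
+ <literal>MY_STATIC_ASSERT_STMT</literal> and the helper function are
+ hypothetical, and the macros actually used in the source tree differ in
+ detail. The fallback branch relies on the fact that a bit-field of
+ negative width is rejected by any C99 compiler.
+ </para>

```c
/*
 * MY_STATIC_ASSERT_STMT is a hypothetical name used only for this sketch.
 * It can be used wherever a statement is allowed.
 */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 or later: the compiler reports errmessage if the check fails. */
#define MY_STATIC_ASSERT_STMT(condition, errmessage) \
	do { _Static_assert(condition, errmessage); } while (0)
#else
/*
 * C99 fallback: a false condition yields a bit-field of width -1, which
 * fails to compile.  The check is the same, but the compiler's message
 * does not mention errmessage and is rather cryptic.
 */
#define MY_STATIC_ASSERT_STMT(condition, errmessage) \
	((void) sizeof(struct { int static_assert_failure : (condition) ? 1 : -1; }))
#endif

/* Hypothetical caller: verifies a size assumption at compile time. */
static int
bytes_in_two_chars(void)
{
	char		buf[2];

	MY_STATIC_ASSERT_STMT(sizeof(buf) == 2, "buf must hold two chars");
	return (int) sizeof(buf);
}
```

+ <para>
+ Either branch costs nothing at run time. The only difference a
+ developer sees is the quality of the compiler's error message when the
+ condition is false.
+ </para>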
+ </simplesect>
+
+ <simplesect>
+ <title>Function-Like Macros and Inline Functions</title>
+ <para>
+ Both macros with arguments and <literal>static inline</literal>
+ functions may be used. The latter are preferable if there are
+ multiple-evaluation hazards when written as a macro, as is the
+ case with, e.g.,
+<programlisting>
+#define Max(x, y) ((x) > (y) ? (x) : (y))
+</programlisting>
+ or when the macro would be very long. In other cases it is only
+ possible, or at least easier, to use macros, for example because
+ expressions of various types need to be passed to the macro.
+ </para>
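+ <para>
+ The multiple-evaluation hazard can be made visible with a small
+ experiment. In the sketch below, <function>max_int()</function>,
+ <function>next_value()</function>, and the two counting functions are
+ hypothetical names introduced only for illustration. The macro
+ evaluates its first argument twice whenever that argument is the
+ larger one, while the inline function evaluates it exactly once.
+ </para>

```c
#define Max(x, y)		((x) > (y) ? (x) : (y))

/* Inline replacement: each argument is evaluated exactly once. */
static inline int
max_int(int x, int y)
{
	return (x > y) ? x : y;
}

/* Hypothetical helper with a visible side effect: it counts its calls. */
static int	call_count = 0;

static int
next_value(void)
{
	call_count++;
	return 10;
}

/* Returns how often next_value() ran when passed to the macro. */
static int
calls_made_by_macro(void)
{
	call_count = 0;
	(void) Max(next_value(), 5);
	return call_count;
}

/* Returns how often next_value() ran when passed to the inline function. */
static int
calls_made_by_inline(void)
{
	call_count = 0;
	(void) max_int(next_value(), 5);
	return call_count;
}
```

+ <para>
+ Here <function>calls_made_by_macro()</function> returns 2 and
+ <function>calls_made_by_inline()</function> returns 1. The price of the
+ inline function is that it is tied to one argument type, which is the
+ situation in which a macro can still be the easier choice.
+ </para>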
+ <para>
+ When the definition of an inline function references symbols
+ (i.e., variables, functions) that are only available as part of the
+ backend, the function must not be visible when the header is included
+ from frontend code.
+<programlisting>
+#ifndef FRONTEND
+static inline MemoryContext
+MemoryContextSwitchTo(MemoryContext context)
+{
+ MemoryContext old = CurrentMemoryContext;
+
+ CurrentMemoryContext = context;
+ return old;
+}
+#endif /* FRONTEND */
+</programlisting>
+ In this example, <literal>CurrentMemoryContext</literal>, which is only
+ available in the backend, is referenced, and the function is thus
+ hidden with <literal>#ifndef FRONTEND</literal>. This rule
+ exists because some compilers emit references to symbols
+ contained in inline functions even if the function is not used.
+ </para>
+ </simplesect>
+
+ <simplesect>
+ <title>Writing Signal Handlers</title>
+ <para>
+ To be suitable to run inside a signal handler, code has to be
+ written very carefully. The fundamental problem is that, unless
+ blocked, a signal handler can interrupt code at any time. If code
+ inside the signal handler uses the same state as code outside,
+ chaos might ensue. As an example, consider what happens if a signal
+ handler tries to acquire a lock that is already held in the
+ interrupted code.
+ </para>
+ <para>
+ Barring special arrangements, code in signal handlers may only
+ call async-signal-safe functions (as defined in POSIX) and access
+ variables of type <literal>volatile sig_atomic_t</literal>. A few
+ functions in <command>postgres</command> are also deemed signal safe, most importantly
+ <function>SetLatch()</function>.
+ </para>
+ <para>
+ In most cases signal handlers should do nothing more than note
+ that a signal has arrived, and wake up code running outside of
+ the handler using a latch. An example of such a handler is the
+ following:
+<programlisting>
+static void
+handle_sighup(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGHUP = true;
+ SetLatch(MyLatch);
+
+ errno = save_errno;
+}
+</programlisting>
+ <varname>errno</varname> is saved and restored because
+ <function>SetLatch()</function> might change it. If that were not done,
+ interrupted code that is currently inspecting <varname>errno</varname> might see the wrong
+ value.
+ </para>
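+ <para>
+ The same pattern can be tried outside the backend. The following is a
+ minimal standalone sketch that assumes a POSIX system and uses no
+ latch. The names <varname>got_sighup</varname>,
+ <function>install_sighup_handler()</function>, and
+ <function>process_pending_sighup()</function> are hypothetical. The
+ handler only sets a <literal>volatile sig_atomic_t</literal> flag, and
+ all real work happens later in ordinary code that polls the flag.
+ </para>

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <string.h>

/* The only state shared between the handler and the interrupted code. */
static volatile sig_atomic_t got_sighup = 0;

/* Handler: note that the signal arrived and do nothing else. */
static void
handle_sighup(int signo)
{
	int			save_errno = errno;

	(void) signo;				/* unused */
	got_sighup = 1;

	errno = save_errno;
}

/* Installs the handler; returns 0 on success, -1 on failure. */
static int
install_sighup_handler(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_handler = handle_sighup;
	sigemptyset(&act.sa_mask);
	act.sa_flags = SA_RESTART;

	return sigaction(SIGHUP, &act, NULL);
}

/*
 * Called from ordinary code, for example once per main-loop iteration.
 * Returns 1 if a pending SIGHUP was consumed, otherwise 0.
 */
static int
process_pending_sighup(void)
{
	if (!got_sighup)
		return 0;

	got_sighup = 0;
	/* Work that is unsafe inside a handler would be done here. */
	return 1;
}
```

+ <para>
+ Saving <varname>errno</varname> is not strictly needed in this reduced
+ handler because nothing in it can change the value. It is kept so that
+ the handler stays correct if a call such as a wake-up is added later.
+ </para>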
+ </simplesect>
+
+ <simplesect>
+ <title>Calling Function Pointers</title>
+
+ <para>
+ For clarity, it is preferred to explicitly dereference a function pointer
+ when calling the pointed-to function if the pointer is a simple variable,
+ for example:
+<programlisting>
+(*emit_log_hook) (edata);
+</programlisting>
+ (even though <literal>emit_log_hook(edata)</literal> would also work).
+ When the function pointer is part of a structure, the extra
+ punctuation can and usually should be omitted, for example:
+<programlisting>
+paramInfo->paramFetch(paramInfo, paramId);
+</programlisting>
+ </para>
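+ <para>
+ Both call styles are shown side by side in the following self-contained
+ sketch. All names in it (<varname>transform_hook</varname>,
+ <type>Transformer</type>, and the functions) are hypothetical and exist
+ only for illustration. The explicit dereference marks the hook call as
+ an indirect call through a variable, while the structure member access
+ already makes that obvious.
+ </para>

```c
#include <stddef.h>

typedef int (*transform_fn) (int value);

typedef struct Transformer
{
	transform_fn apply;			/* callback stored in a structure */
	int			calls;			/* number of calls made through apply */
} Transformer;

/* A hook kept in a simple variable; NULL means no hook is installed. */
static transform_fn transform_hook = NULL;

static int
double_value(int value)
{
	return value * 2;
}

/* Simple variable: dereference explicitly when calling. */
static int
run_hook(int value)
{
	if (transform_hook != NULL)
		return (*transform_hook) (value);
	return value;
}

/* Structure member: omit the extra punctuation. */
static int
run_transformer(Transformer *t, int value)
{
	t->calls++;
	return t->apply(value);
}
```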
+ </simplesect>
+ </sect1>
+ </chapter>