author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-04 12:15:05 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-04 12:15:05 +0000
commit 46651ce6fe013220ed397add242004d764fc0153
tree 6e5299f990f88e60174a1d3ae6e48eedd2688b2b /doc/src/sgml/custom-scan.sgml
parent Initial commit.
Adding upstream version 14.5.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/custom-scan.sgml')
-rw-r--r-- doc/src/sgml/custom-scan.sgml 383
1 files changed, 383 insertions, 0 deletions
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..239ba29
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,383 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing a Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+ <primary>custom scan provider</primary>
+ <secondary>handler for</secondary>
+ </indexterm>
+
+ <para>
+ <productname>PostgreSQL</productname> supports a set of experimental facilities which
+ are intended to allow extension modules to add new scan types to the system.
+ Unlike a <link linkend="fdwhandler">foreign data wrapper</link>, which is only
+ responsible for knowing how to scan its own foreign tables, a custom scan
+ provider can provide an alternative method of scanning any relation in the
+ system. Typically, the motivation for writing a custom scan provider will
+ be to allow the use of some optimization not supported by the core
+ system, such as caching or some form of hardware acceleration. This chapter
+ outlines how to write a new custom scan provider.
+ </para>
+
+ <para>
+ Implementing a new type of custom scan is a three-step process. First,
+ during planning, it is necessary to generate access paths representing a
+ scan using the proposed strategy. Second, if one of those access paths
+ is selected by the planner as the optimal strategy for scanning a
+ particular relation, the access path must be converted to a plan.
+ Finally, it must be possible to execute the plan and generate the same
+ results that would have been generated for any other access path targeting
+ the same relation.
+ </para>
+
+ <sect1 id="custom-scan-path">
+ <title>Creating Custom Scan Paths</title>
+
+ <para>
+ A custom scan provider will typically add paths for a base relation by
+ setting the following hook, which is called after the core code has
+ generated all the access paths it can for the relation (except for
+ Gather paths, which are made after this call so that they can use
+ partial paths added by the hook):
+<programlisting>
+typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
+                                            RelOptInfo *rel,
+                                            Index rti,
+                                            RangeTblEntry *rte);
+extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+</programlisting>
+ </para>
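Other extensions may also use this hook, so modules conventionally save whatever value it held before and call that function from their own. The standalone C sketch below shows only this chaining pattern. The planner types are opaque stand-ins, the hook variable is a local stand-in for the one the core code declares, and my_set_rel_pathlist, prev_set_rel_pathlist_hook, my_hook_calls and my_module_init are hypothetical names. A real module would include the PostgreSQL headers and install the hook from its _PG_init function.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for planner types; NOT the real PostgreSQL definitions. */
typedef struct PlannerInfo PlannerInfo;
typedef struct RelOptInfo RelOptInfo;
typedef struct RangeTblEntry RangeTblEntry;
typedef unsigned int Index;

typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
                                            RelOptInfo *rel,
                                            Index rti,
                                            RangeTblEntry *rte);

/* Stand-in for the global hook variable owned by the core code. */
set_rel_pathlist_hook_type set_rel_pathlist_hook = NULL;

/* Hypothetical module state: the hook that was installed before ours. */
static set_rel_pathlist_hook_type prev_set_rel_pathlist_hook = NULL;
static int my_hook_calls = 0;

static void
my_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                    Index rti, RangeTblEntry *rte)
{
    /* Let any previously loaded module add its paths first. */
    if (prev_set_rel_pathlist_hook)
        prev_set_rel_pathlist_hook(root, rel, rti, rte);

    /* A real provider would build CustomPath nodes here and call add_path(). */
    my_hook_calls++;
}

/* Hypothetical initializer playing the role of _PG_init(). */
void
my_module_init(void)
{
    /* Installing twice would make the hook call itself forever. */
    assert(set_rel_pathlist_hook != my_set_rel_pathlist);
    prev_set_rel_pathlist_hook = set_rel_pathlist_hook;
    set_rel_pathlist_hook = my_set_rel_pathlist;
}
```

Saving the previous value before overwriting it allows two extensions that both use this hook to be loaded at the same time.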
+
+ <para>
+ Although this hook function can be used to examine, modify, or remove
+ paths generated by the core system, a custom scan provider will typically
+ confine itself to generating <structname>CustomPath</structname> objects and adding
+ them to <literal>rel</literal> using <function>add_path</function>. The custom scan
+ provider is responsible for initializing the <structname>CustomPath</structname>
+ object, which is declared like this:
+<programlisting>
+typedef struct CustomPath
+{
+    Path      path;
+    uint32    flags;
+    List     *custom_paths;
+    List     *custom_private;
+    const CustomPathMethods *methods;
+} CustomPath;
+</programlisting>
+ </para>
+
+ <para>
+ <structfield>path</structfield> must be initialized as for any other path, including
+ the row-count estimate, start and total cost, and sort ordering provided
+ by this path. <structfield>flags</structfield> is a bit mask, which should include
+ <literal>CUSTOMPATH_SUPPORT_BACKWARD_SCAN</literal> if the custom path can support
+ a backward scan and <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</literal> if it
+ can support mark and restore. Both capabilities are optional.
+ <structfield>custom_paths</structfield> is an optional list of <structname>Path</structname>
+ nodes used by this custom-path node; the planner will transform these into
+ <structname>Plan</structname> nodes.
+ <structfield>custom_private</structfield> can be used to store the custom path's
+ private data. Private data should be stored in a form that can be handled
+ by <literal>nodeToString</literal>, so that debugging routines that attempt to
+ print the custom path will work as designed. <structfield>methods</structfield> must
+ point to a (usually statically allocated) object implementing the required
+ custom path methods, of which there is currently only one.
+ </para>
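Because flags is an ordinary bit mask, a provider typically ORs together the capabilities it actually implements and tests individual bits later. The standalone C sketch below uses stand-in definitions of the two macros; take the real ones from the PostgreSQL headers rather than from here. provider_flags and needs_mark_restore_callbacks are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL macros; use the real headers in a module. */
#define CUSTOMPATH_SUPPORT_BACKWARD_SCAN 0x0001
#define CUSTOMPATH_SUPPORT_MARK_RESTORE  0x0002

/* Hypothetical helper: build the flags from what the provider can really do. */
static uint32_t
provider_flags(bool can_scan_backward, bool can_mark_restore)
{
    uint32_t flags = 0;

    if (can_scan_backward)
        flags |= CUSTOMPATH_SUPPORT_BACKWARD_SCAN;
    if (can_mark_restore)
        flags |= CUSTOMPATH_SUPPORT_MARK_RESTORE;
    return flags;
}

/* The mark/restore callbacks only need to exist when this bit is set. */
static bool
needs_mark_restore_callbacks(uint32_t flags)
{
    return (flags & CUSTOMPATH_SUPPORT_MARK_RESTORE) != 0;
}
```

Setting a bit that the execution callbacks cannot honor could let the planner pick this path where that capability is required, so the flags should describe only what is really supported.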
+
+ <para>
+ A custom scan provider can also provide join paths. Just as for base
+ relations, such a path must produce the same output as would normally be
+ produced by the join it replaces. To do this, the join provider should
+ set the following hook, and then within the hook function,
+ create <structname>CustomPath</structname> path(s) for the join relation.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             JoinType jointype,
+                                             JoinPathExtraData *extra);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+
+ This hook will be invoked repeatedly for the same join relation, with
+ different combinations of inner and outer relations; it is the
+ responsibility of the hook to minimize duplicated work.
+ </para>
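One way to meet that responsibility is to perform any analysis that depends only on the join relation once, and repeat only the per-pair work on later calls. The standalone C sketch below uses a placeholder stand-in for RelOptInfo. The fixed-size cache, analyze_joinrel and my_join_hook_body are hypothetical. A real module would also need to reset such a cache for each planning cycle, because relation pointers are not meaningful across queries.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder stand-in; NOT PostgreSQL's RelOptInfo. */
typedef struct RelOptInfo { int dummy; } RelOptInfo;

#define MAX_TRACKED_JOINRELS 64

/* Hypothetical per-query cache of join relations already analyzed. */
static const RelOptInfo *analyzed[MAX_TRACKED_JOINRELS];
static int n_analyzed = 0;
static int analysis_runs = 0;

/* Hypothetical expensive step that depends only on the join relation. */
static void
analyze_joinrel(const RelOptInfo *joinrel)
{
    (void) joinrel;
    analysis_runs++;
}

/*
 * Body of a hypothetical join hook: called once per outer/inner pair,
 * but runs the joinrel-level analysis only the first time it sees joinrel.
 */
static void
my_join_hook_body(const RelOptInfo *joinrel,
                  const RelOptInfo *outerrel,
                  const RelOptInfo *innerrel)
{
    int i;

    for (i = 0; i < n_analyzed; i++)
        if (analyzed[i] == joinrel)
            break;

    if (i == n_analyzed)
    {
        assert(n_analyzed < MAX_TRACKED_JOINRELS);
        analyzed[n_analyzed++] = joinrel;
        analyze_joinrel(joinrel);
    }

    /* Per-pair work, such as costing this outer/inner split, goes here. */
    (void) outerrel;
    (void) innerrel;
}
```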
+
+ <sect2 id="custom-scan-path-callbacks">
+ <title>Custom Scan Path Callbacks</title>
+
+ <para>
+<programlisting>
+Plan *(*PlanCustomPath) (PlannerInfo *root,
+                         RelOptInfo *rel,
+                         CustomPath *best_path,
+                         List *tlist,
+                         List *clauses,
+                         List *custom_plans);
+</programlisting>
+ Convert a custom path to a finished plan. The return value will generally
+ be a <literal>CustomScan</literal> object, which the callback must allocate and
+ initialize. See <xref linkend="custom-scan-plan"/> for more details.
+ </para>
+ </sect2>
+ </sect1>
+
+ <sect1 id="custom-scan-plan">
+ <title>Creating Custom Scan Plans</title>
+
+ <para>
+ A custom scan is represented in a finished plan tree using the following
+ structure:
+<programlisting>
+typedef struct CustomScan
+{
+    Scan      scan;
+    uint32    flags;
+    List     *custom_plans;
+    List     *custom_exprs;
+    List     *custom_private;
+    List     *custom_scan_tlist;
+    Bitmapset *custom_relids;
+    const CustomScanMethods *methods;
+} CustomScan;
+</programlisting>
+ </para>
+
+ <para>
+ <structfield>scan</structfield> must be initialized as for any other scan, including
+ estimated costs, target lists, qualifications, and so on.
+ <structfield>flags</structfield> is a bit mask with the same meaning as in
+ <structname>CustomPath</structname>.
+ <structfield>custom_plans</structfield> can be used to store child
+ <structname>Plan</structname> nodes.
+ <structfield>custom_exprs</structfield> should be used to
+ store expression trees that will need to be fixed up by
+ <filename>setrefs.c</filename> and <filename>subselect.c</filename>, while
+ <structfield>custom_private</structfield> should be used to store other private data
+ that is only used by the custom scan provider itself.
+ <structfield>custom_scan_tlist</structfield> can be NIL when scanning a base
+ relation, indicating that the custom scan returns scan tuples that match
+ the base relation's row type. Otherwise it is a target list describing
+ the actual scan tuples. <structfield>custom_scan_tlist</structfield> must be
+ provided for joins, and could be provided for scans if the custom scan
+ provider can compute some non-Var expressions.
+ <structfield>custom_relids</structfield> is set by the core code to the set of
+ relations (range table indexes) that this scan node handles; except when
+ this scan is replacing a join, it will have only one member.
+ <structfield>methods</structfield> must point to a (usually statically allocated)
+ object implementing the required custom scan methods, which are further
+ detailed below.
+ </para>
+
+ <para>
+ When a <structname>CustomScan</structname> scans a single relation,
+ <structfield>scan.scanrelid</structfield> must be the range table index of the table
+ to be scanned. When it replaces a join, <structfield>scan.scanrelid</structfield>
+ should be zero.
+ </para>
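Combining these rules, a PlanCustomPath callback might allocate the node and fill in the custom fields roughly as shown below. This is standalone C with heavily reduced stand-ins for RelOptInfo, CustomPath, Scan and CustomScan. The is_join field is hypothetical and stands in for a check of the relation kind. The real callback also receives root, tlist and clauses, which it would use to set the target list and qualifications, and it would normally allocate with makeNode(CustomScan) rather than calloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-ins; the real PostgreSQL structs have many more fields. */
typedef struct List List;                  /* opaque list of child plans */
typedef struct RelOptInfo
{
    unsigned int relid;    /* range table index of a base relation */
    bool        is_join;   /* hypothetical: stands in for the rel kind */
} RelOptInfo;
typedef struct CustomPath { uint32_t flags; } CustomPath;
typedef struct CustomScanMethods { const char *CustomName; } CustomScanMethods;
typedef struct Scan { unsigned int scanrelid; } Scan;
typedef struct CustomScan
{
    Scan        scan;
    uint32_t    flags;
    List       *custom_plans;
    const CustomScanMethods *methods;
} CustomScan;

/* Hypothetical provider's statically allocated method table. */
static const CustomScanMethods my_scan_methods = { "MyScan" };

/* Sketch of a PlanCustomPath-style callback (root, tlist, clauses omitted). */
static CustomScan *
my_plan_custom_path(const RelOptInfo *rel, const CustomPath *best_path,
                    List *custom_plans)
{
    CustomScan *cscan;

    /* Range table indexes start at 1, so a base relation has relid > 0. */
    assert(rel->is_join || rel->relid > 0);

    cscan = calloc(1, sizeof(CustomScan));   /* zeroed, like makeNode() */
    if (cscan == NULL)
        abort();
    cscan->scan.scanrelid = rel->is_join ? 0 : rel->relid;
    cscan->flags = best_path->flags;          /* same meaning as in the path */
    cscan->custom_plans = custom_plans;       /* child Plan nodes, if any */
    cscan->methods = &my_scan_methods;
    return cscan;
}
```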
+
+ <para>
+ Plan trees must be copyable with <function>copyObject</function>,
+ so all the data stored within the <quote>custom</quote> fields must consist of
+ nodes that <function>copyObject</function> can handle. Furthermore, custom scan providers
+ cannot substitute a larger structure that embeds
+ a <structname>CustomScan</structname> for the structure itself, as would be possible
+ for a <structname>CustomPath</structname> or <structname>CustomScanState</structname>.
+ </para>
+
+ <sect2 id="custom-scan-plan-callbacks">
+ <title>Custom Scan Plan Callbacks</title>
+ <para>
+<programlisting>
+Node *(*CreateCustomScanState) (CustomScan *cscan);
+</programlisting>
+ Allocate a <structname>CustomScanState</structname> for this
+ <structname>CustomScan</structname>. The actual allocation will often be larger than
+ required for an ordinary <structname>CustomScanState</structname>, because many
+ providers will wish to embed that as the first field of a larger structure.
+ The value returned must have the node tag and <structfield>methods</structfield>
+ set appropriately, but other fields should be left as zeroes at this
+ stage; after <function>ExecInitCustomScan</function> performs basic initialization,
+ the <function>BeginCustomScan</function> callback will be invoked to give the
+ custom scan provider a chance to do whatever else is needed.
+ </para>
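The sketch below shows the usual embedding technique. A hypothetical MyScanState places CustomScanState first. CreateCustomScanState allocates the larger structure zeroed and sets only the node tag and methods, and later callbacks cast the node back. The NodeTag, Node, CustomExecMethods and CustomScanState definitions are reduced stand-ins, not PostgreSQL's. A real provider would allocate with palloc0 rather than calloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced stand-ins; NOT the real PostgreSQL definitions. */
typedef enum NodeTag { T_Invalid = 0, T_CustomScanState } NodeTag;
typedef struct Node { NodeTag type; } Node;
typedef struct CustomExecMethods { const char *CustomName; } CustomExecMethods;
typedef struct CustomScan CustomScan;      /* opaque in this sketch */
typedef struct CustomScanState
{
    NodeTag     type;      /* in PostgreSQL the tag lives inside the leading ss */
    const CustomExecMethods *methods;
} CustomScanState;

/* Hypothetical provider state: CustomScanState MUST be the first member. */
typedef struct MyScanState
{
    CustomScanState css;
    long        rows_returned;   /* private; left zero until BeginCustomScan */
} MyScanState;

static const CustomExecMethods my_exec_methods = { "MyScan" };

/* CreateCustomScanState-style: zeroed allocation, only tag and methods set. */
static Node *
my_create_scan_state(CustomScan *cscan)
{
    MyScanState *state = calloc(1, sizeof(MyScanState));

    (void) cscan;
    if (state == NULL)
        abort();
    state->css.type = T_CustomScanState;
    state->css.methods = &my_exec_methods;
    return (Node *) state;
}

/* Later callbacks recover the larger struct from the embedded first member. */
static MyScanState *
my_state(CustomScanState *node)
{
    assert(node->methods == &my_exec_methods);   /* sanity check: our node */
    return (MyScanState *) node;
}
```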
+ </sect2>
+ </sect1>
+
+ <sect1 id="custom-scan-execution">
+ <title>Executing Custom Scans</title>
+
+ <para>
+ When a <structname>CustomScan</structname> is executed, its execution state is
+ represented by a <structname>CustomScanState</structname>, which is declared as
+ follows:
+<programlisting>
+typedef struct CustomScanState
+{
+    ScanState ss;
+    uint32    flags;
+    const CustomExecMethods *methods;
+} CustomScanState;
+</programlisting>
+ </para>
+
+ <para>
+ <structfield>ss</structfield> is initialized as for any other scan state,
+ except that if the scan is for a join rather than a base relation,
+ <literal>ss.ss_currentRelation</literal> is left NULL.
+ <structfield>flags</structfield> is a bit mask with the same meaning as in
+ <structname>CustomPath</structname> and <structname>CustomScan</structname>.
+ <structfield>methods</structfield> must point to a (usually statically allocated)
+ object implementing the required custom scan state methods, which are
+ further detailed below. Typically, a <structname>CustomScanState</structname>, which
+ need not support <function>copyObject</function>, will actually be a larger
+ structure embedding the above as its first member.
+ </para>
+
+ <sect2 id="custom-scan-execution-callbacks">
+ <title>Custom Scan Execution Callbacks</title>
+
+ <para>
+<programlisting>
+void (*BeginCustomScan) (CustomScanState *node,
+                         EState *estate,
+                         int eflags);
+</programlisting>
+ Complete initialization of the supplied <structname>CustomScanState</structname>.
+ Standard fields have been initialized by <function>ExecInitCustomScan</function>,
+ but any private fields should be initialized here.
+ </para>
+
+ <para>
+<programlisting>
+TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
+</programlisting>
+ Fetch the next scan tuple. If any tuples remain, it should fill
+ <literal>ps_ResultTupleSlot</literal> with the next tuple in the current scan
+ direction, and then return the tuple slot. If not,
+ <literal>NULL</literal> or an empty slot should be returned.
+ </para>
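The following is a minimal, forward-only illustration of the BeginCustomScan and ExecCustomScan contract. It assumes a hypothetical provider that does not claim CUSTOMPATH_SUPPORT_BACKWARD_SCAN. In this standalone C sketch the tuples are plain ints, a single int field stands in for ps_ResultTupleSlot, and NULL signals that no tuples remain. my_begin and my_exec are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical provider state scanning a fixed in-memory array of "tuples". */
typedef struct MyScanState
{
    const int  *rows;     /* toy data this provider returns */
    int         nrows;
    int         pos;      /* index of the next row to return */
    int         slot;     /* stands in for ps_ResultTupleSlot */
} MyScanState;

/* BeginCustomScan-style: initialize the private fields. */
static void
my_begin(MyScanState *s, const int *rows, int nrows)
{
    assert(nrows >= 0);
    s->rows = rows;
    s->nrows = nrows;
    s->pos = 0;
    s->slot = 0;
}

/* ExecCustomScan-style: fill the slot and return it, or NULL when done. */
static int *
my_exec(MyScanState *s)
{
    if (s->pos >= s->nrows)
        return NULL;                  /* no tuples remain */
    s->slot = s->rows[s->pos++];
    return &s->slot;
}
```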
+
+ <para>
+<programlisting>
+void (*EndCustomScan) (CustomScanState *node);
+</programlisting>
+ Clean up any private data associated with the <structname>CustomScanState</structname>.
+ This method is required, but it does not need to do anything if there is
+ no associated data or it will be cleaned up automatically.
+ </para>
+
+ <para>
+<programlisting>
+void (*ReScanCustomScan) (CustomScanState *node);
+</programlisting>
+ Rewind the current scan to the beginning and prepare to rescan the
+ relation.
+ </para>
+
+ <para>
+<programlisting>
+void (*MarkPosCustomScan) (CustomScanState *node);
+</programlisting>
+ Save the current scan position so that it can subsequently be restored
+ by the <function>RestrPosCustomScan</function> callback. This callback is
+ optional, and need only be supplied if the
+ <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</literal> flag is set.
+ </para>
+
+ <para>
+<programlisting>
+void (*RestrPosCustomScan) (CustomScanState *node);
+</programlisting>
+ Restore the previous scan position as saved by the
+ <function>MarkPosCustomScan</function> callback. This callback is optional,
+ and need only be supplied if the
+ <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</literal> flag is set.
+ </para>
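Rescanning and mark/restore both come down to manipulating the provider's own notion of scan position. In the standalone C sketch below, that position is an array index in a hypothetical MyScanState. my_rescan, my_mark_pos and my_restr_pos are illustrative names, and the mark/restore pair would be supplied only if the path set CUSTOMPATH_SUPPORT_MARK_RESTORE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical provider state over an in-memory array of "tuples". */
typedef struct MyScanState
{
    const int  *rows;
    int         nrows;
    int         pos;          /* index of the next row to return */
    int         marked_pos;   /* position saved by the mark callback */
    int         slot;         /* stands in for ps_ResultTupleSlot */
} MyScanState;

/* Forward-only fetch, as in an ExecCustomScan-style callback. */
static int *
my_exec(MyScanState *s)
{
    if (s->pos >= s->nrows)
        return NULL;
    s->slot = s->rows[s->pos++];
    return &s->slot;
}

/* ReScanCustomScan-style: rewind to the beginning. */
static void
my_rescan(MyScanState *s)
{
    s->pos = 0;
}

/* MarkPosCustomScan-style: remember the current position. */
static void
my_mark_pos(MyScanState *s)
{
    s->marked_pos = s->pos;
}

/* RestrPosCustomScan-style: return to the remembered position. */
static void
my_restr_pos(MyScanState *s)
{
    assert(s->marked_pos >= 0 && s->marked_pos <= s->nrows);
    s->pos = s->marked_pos;
}
```

Here pos is the index of the next row. Marking just after a row is returned and later restoring therefore makes the following fetch return the row after the marked one.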
+
+ <para>
+<programlisting>
+Size (*EstimateDSMCustomScan) (CustomScanState *node,
+                               ParallelContext *pcxt);
+</programlisting>
+ Estimate the amount of dynamic shared memory that will be required
+ for parallel operation. This may be higher than the amount that will
+ actually be used, but it must not be lower. The return value is in bytes.
+ This callback is optional, and need only be supplied if this custom
+ scan provider supports parallel execution.
+ </para>
+
+ <para>
+<programlisting>
+void (*InitializeDSMCustomScan) (CustomScanState *node,
+                                 ParallelContext *pcxt,
+                                 void *coordinate);
+</programlisting>
+ Initialize the dynamic shared memory that will be required for parallel
+ operation. <literal>coordinate</literal> points to a shared memory area of
+ size equal to the return value of <function>EstimateDSMCustomScan</function>.
+ This callback is optional, and need only be supplied if this custom
+ scan provider supports parallel execution.
+ </para>
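These two callbacks must agree on a size, so it is convenient to compute it in one place. In the standalone C sketch below, ParallelContext is reduced to its nworkers field, the node parameter is omitted, and MySharedState (a work counter plus per-worker row counts in a flexible array) is a hypothetical layout. A real provider would likely use PostgreSQL's add_size and mul_size for overflow-checked arithmetic. It would also protect the shared counter with PostgreSQL's atomics or a spinlock, since here next_chunk is a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in reduced to one field; the real ParallelContext has many more. */
typedef struct ParallelContext { int nworkers; } ParallelContext;

/* Hypothetical layout of this provider's shared state. */
typedef struct MySharedState
{
    int64_t     next_chunk;         /* next unit of work to hand out */
    int         nworkers;
    int64_t     rows_per_worker[];  /* one counter per worker */
} MySharedState;

/* EstimateDSM-style: overestimating is allowed, underestimating is not. */
static size_t
my_estimate_dsm(const ParallelContext *pcxt)
{
    assert(pcxt->nworkers >= 0);
    return offsetof(MySharedState, rows_per_worker) +
        sizeof(int64_t) * (size_t) pcxt->nworkers;
}

/* InitializeDSM-style: coordinate points at my_estimate_dsm(pcxt) bytes. */
static void
my_initialize_dsm(const ParallelContext *pcxt, void *coordinate)
{
    MySharedState *shared = coordinate;

    memset(shared, 0, my_estimate_dsm(pcxt));
    shared->nworkers = pcxt->nworkers;
}
```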
+
+ <para>
+<programlisting>
+void (*ReInitializeDSMCustomScan) (CustomScanState *node,
+                                   ParallelContext *pcxt,
+                                   void *coordinate);
+</programlisting>
+ Re-initialize the dynamic shared memory required for parallel operation
+ when the custom-scan plan node is about to be re-scanned.
+ This callback is optional, and need only be supplied if this custom
+ scan provider supports parallel execution.
+ Recommended practice is that this callback reset only shared state,
+ while the <function>ReScanCustomScan</function> callback resets only local
+ state. Currently, this callback will be called
+ before <function>ReScanCustomScan</function>, but it's best not to rely on
+ that ordering.
+ </para>
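Under the recommended split, each callback touches a disjoint set of fields, so the outcome does not depend on which one runs first. The standalone C sketch below uses hypothetical shared and local structures. The real callbacks also receive node and pcxt arguments, which are omitted or folded into the local state here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared (DSM) state and backend-local state of a provider. */
typedef struct MySharedState { int64_t next_chunk; } MySharedState;
typedef struct MyLocalState
{
    MySharedState *shared;      /* points into the DSM segment */
    int64_t     current_chunk;  /* chunk this process is working on, or -1 */
    int64_t     rows_in_chunk;  /* progress within that chunk */
} MyLocalState;

/* ReInitializeDSM-style: reset ONLY the shared state. */
static void
my_reinitialize_dsm(void *coordinate)
{
    MySharedState *shared = coordinate;

    assert(shared != NULL);
    shared->next_chunk = 0;
}

/* ReScan-style: reset ONLY this process's local state. */
static void
my_rescan(MyLocalState *node)
{
    node->current_chunk = -1;
    node->rows_in_chunk = 0;
}
```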
+
+ <para>
+<programlisting>
+void (*InitializeWorkerCustomScan) (CustomScanState *node,
+                                    shm_toc *toc,
+                                    void *coordinate);
+</programlisting>
+ Initialize a parallel worker's local state based on the shared state
+ set up by the leader during <function>InitializeDSMCustomScan</function>.
+ This callback is optional, and need only be supplied if this custom
+ scan provider supports parallel execution.
+ </para>
+
+ <para>
+<programlisting>
+void (*ShutdownCustomScan) (CustomScanState *node);
+</programlisting>
+ Release resources when it is anticipated that the node will not be executed
+ to completion. This is not called in all cases; sometimes
+ <function>EndCustomScan</function> may be called without this function having
+ been called first. Since the DSM segment used by parallel query is
+ destroyed just after this callback is invoked, custom scan providers that
+ wish to take some action before the DSM segment goes away should implement
+ this method.
+ </para>
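Because the DSM segment disappears right after this callback, a provider that wants to report per-worker statistics later, for example in EXPLAIN, can copy them into backend-local memory here. The standalone C sketch below is hypothetical throughout (MySharedState, MyScanState, MY_MAX_WORKERS). Clearing the shared pointer guards against touching the segment afterwards and makes a second call harmless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MY_MAX_WORKERS 8   /* hypothetical limit for this sketch */

/* Hypothetical shared state living in the DSM segment. */
typedef struct MySharedState
{
    int         nworkers;
    int64_t     rows_per_worker[MY_MAX_WORKERS];
} MySharedState;

/* Hypothetical leader-local state that must outlive the DSM segment. */
typedef struct MyScanState
{
    MySharedState *shared;        /* invalid once the DSM is destroyed */
    int64_t     total_worker_rows; /* local copy kept for later reporting */
} MyScanState;

/* Shutdown-style: copy what is still needed out of shared memory. */
static void
my_shutdown(MyScanState *node)
{
    if (node->shared == NULL)
        return;                   /* not parallel, or already shut down */

    assert(node->shared->nworkers <= MY_MAX_WORKERS);
    node->total_worker_rows = 0;
    for (int i = 0; i < node->shared->nworkers; i++)
        node->total_worker_rows += node->shared->rows_per_worker[i];
    node->shared = NULL;          /* never touch the segment after this */
}
```

Since this callback is not always invoked, cleanup that must always happen still belongs in EndCustomScan.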
+
+ <para>
+<programlisting>
+void (*ExplainCustomScan) (CustomScanState *node,
+                           List *ancestors,
+                           ExplainState *es);
+</programlisting>
+ Output additional information for <command>EXPLAIN</command> of a custom-scan
+ plan node. This callback is optional. Common data stored in the
+ <structname>ScanState</structname>, such as the target list and scan relation, will
+ be shown even without this callback, but the callback allows the display
+ of additional, private state.
+ </para>
+ </sect2>
+ </sect1>
+</chapter>