Diffstat (limited to 'doc/src/sgml/custom-scan.sgml')
-rw-r--r-- doc/src/sgml/custom-scan.sgml | 407
1 file changed, 407 insertions, 0 deletions
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..cd989e7
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,407 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing a Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan provider</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+
+ <para>
+  <productname>PostgreSQL</productname> supports a set of experimental facilities which
+  are intended to allow extension modules to add new scan types to the system.
+  Unlike a <link linkend="fdwhandler">foreign data wrapper</link>, which is only
+  responsible for knowing how to scan its own foreign tables, a custom scan
+  provider can provide an alternative method of scanning any relation in the
+  system. Typically, the motivation for writing a custom scan provider will
+  be to allow the use of some optimization not supported by the core
+  system, such as caching or some form of hardware acceleration. This chapter
+  outlines how to write a new custom scan provider.
+ </para>
+
+ <para>
+  Implementing a new type of custom scan is a three-step process. First,
+  during planning, it is necessary to generate access paths representing a
+  scan using the proposed strategy. Second, if one of those access paths
+  is selected by the planner as the optimal strategy for scanning a
+  particular relation, the access path must be converted to a plan.
+  Finally, it must be possible to execute the plan and generate the same
+  results that would have been generated for any other access path targeting
+  the same relation.
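+  As a rough illustration only, the hand-off between these three steps can be
+  modeled in standalone C. The sketch below uses hypothetical stand-in types
+  (<literal>ToyPath</literal>, <literal>ToyPlan</literal>, <literal>ToyState</literal>)
+  instead of the real <structname>CustomPath</structname>,
+  <structname>CustomScan</structname>, and <structname>CustomScanState</structname>,
+  and an assumed cost formula. It shows a path becoming a plan, the plan being
+  executed, and the execution returning exactly the rows of the relation.

```c
/*
 * Toy model of the three steps of a custom scan: path -> plan -> execution.
 * ToyPath, ToyPlan and ToyState are hypothetical stand-ins, NOT the real
 * PostgreSQL CustomPath, CustomScan and CustomScanState structures.
 */
#include <assert.h>
#include <stddef.h>

typedef struct ToyPath  { int relid; double total_cost; } ToyPath;
typedef struct ToyPlan  { int scanrelid; } ToyPlan;
typedef struct ToyState { const int *rows; size_t nrows; size_t pos; } ToyState;

/* Step 1 (planning): build a path; the cost formula is an assumption. */
ToyPath toy_make_path(int relid, size_t nrows)
{
    ToyPath path = { relid, 0.5 * (double) nrows };
    return path;
}

/* Step 2 (plan creation): convert the chosen path into a plan node. */
ToyPlan toy_plan_from_path(const ToyPath *path)
{
    ToyPlan plan = { path->relid };
    return plan;
}

/* Step 3a (execution setup): point the state at the relation's rows. */
void toy_begin(ToyState *state, const int *rows, size_t nrows)
{
    state->rows = rows;
    state->nrows = nrows;
    state->pos = 0;
}

/* Step 3b: stores the next row in *out and returns 1, or returns 0 when done. */
int toy_next(ToyState *state, int *out)
{
    if (state->pos >= state->nrows)
        return 0;
    *out = state->rows[state->pos++];
    return 1;
}

/* Runs all three steps, copies the scan output into out[], returns the count. */
size_t toy_run(int relid, const int *rows, size_t nrows, int *out)
{
    ToyPath  path = toy_make_path(relid, nrows);
    ToyPlan  plan = toy_plan_from_path(&path);
    ToyState state;
    size_t   n = 0;
    int      row;

    assert(plan.scanrelid == relid);    /* the plan targets the same relation */
    toy_begin(&state, rows, nrows);
    while (toy_next(&state, &row))
        out[n++] = row;
    return n;
}
```

+  Calling <literal>toy_run(1, rows, 3, out)</literal> on a three-element array
+  copies the same three rows into <literal>out</literal>, in order.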
+ </para>
+
+ <sect1 id="custom-scan-path">
+  <title>Creating Custom Scan Paths</title>
+
+  <para>
+   A custom scan provider will typically add paths for a base relation by
+   setting the following hook, which is called after the core code has
+   generated all the access paths it can for the relation (except for
+   Gather paths, which are made after this call so that they can use
+   partial paths added by the hook):
+<programlisting>
+typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
+                                            RelOptInfo *rel,
+                                            Index rti,
+                                            RangeTblEntry *rte);
+extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+</programlisting>
+  </para>
+
+  <para>
+   Although this hook function can be used to examine, modify, or remove
+   paths generated by the core system, a custom scan provider will typically
+   confine itself to generating <structname>CustomPath</structname> objects and adding
+   them to <literal>rel</literal> using <function>add_path</function>. The custom scan
+   provider is responsible for initializing the <structname>CustomPath</structname>
+   object, which is declared like this:
+<programlisting>
+typedef struct CustomPath
+{
+    Path      path;
+    uint32    flags;
+    List     *custom_paths;
+    List     *custom_private;
+    const CustomPathMethods *methods;
+} CustomPath;
+</programlisting>
+  </para>
+
+  <para>
+   <structfield>path</structfield> must be initialized as for any other path, including
+   the row-count estimate, start and total cost, and sort ordering provided
+   by this path. <structfield>flags</structfield> is a bit mask that
+   specifies which optional capabilities the scan provider supports.
+   <structfield>flags</structfield> should include
+   <literal>CUSTOMPATH_SUPPORT_BACKWARD_SCAN</literal> if the custom path can support
+   a backward scan, <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</literal> if it
+   can support mark and restore,
+   and <literal>CUSTOMPATH_SUPPORT_PROJECTION</literal> if it can perform
+   projections.
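+   Because <structfield>flags</structfield> is an ordinary bit mask, a provider
+   usually ORs together the bits for the capabilities it implements. The
+   standalone sketch below repeats the three flag names with values assumed to
+   match <filename>nodes/extensible.h</filename> (a real extension should include
+   that header instead of redefining them); the <literal>ProviderCaps</literal>
+   structure and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match nodes/extensible.h; include that header in real code. */
#define CUSTOMPATH_SUPPORT_BACKWARD_SCAN 0x0001
#define CUSTOMPATH_SUPPORT_MARK_RESTORE  0x0002
#define CUSTOMPATH_SUPPORT_PROJECTION    0x0004

/* Hypothetical description of what a provider's scan code can do. */
typedef struct ProviderCaps
{
    bool can_scan_backward;
    bool can_mark_restore;
    bool can_project;
} ProviderCaps;

/* Builds the value to store in CustomPath.flags (and later CustomScan.flags). */
uint32_t custom_path_flags(const ProviderCaps *caps)
{
    uint32_t flags = 0;

    if (caps->can_scan_backward)
        flags |= CUSTOMPATH_SUPPORT_BACKWARD_SCAN;
    if (caps->can_mark_restore)
        flags |= CUSTOMPATH_SUPPORT_MARK_RESTORE;
    if (caps->can_project)
        flags |= CUSTOMPATH_SUPPORT_PROJECTION;
    return flags;
}

/* Tests whether a capability bit is present in a flags word. */
bool has_flag(uint32_t flags, uint32_t bit)
{
    return (flags & bit) != 0;
}
```

+   The mark and restore callbacks described later in this chapter are needed
+   only when <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</literal> is included.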
+   If <literal>CUSTOMPATH_SUPPORT_PROJECTION</literal> is not
+   set, the scan node will only be asked to produce Vars of the scanned
+   relation; if that flag is set, the scan node must also be able to
+   evaluate scalar expressions over these Vars.
+   <structfield>custom_paths</structfield> is an optional list of <structname>Path</structname>
+   nodes used by this custom-path node; the planner will transform these into
+   <structname>Plan</structname> nodes.
+   <structfield>custom_private</structfield> can be used to store the custom path's
+   private data. Private data should be stored in a form that can be handled
+   by <literal>nodeToString</literal>, so that debugging routines that attempt to
+   print the custom path will work as designed. <structfield>methods</structfield> must
+   point to a (usually statically allocated) object implementing the required
+   custom path methods, which are further detailed below.
+  </para>
+
+  <para>
+   A custom scan provider can also provide join paths. Just as for base
+   relations, such a path must produce the same output as would normally be
+   produced by the join it replaces. To do this, the custom scan provider should
+   set the following hook, and then within the hook function
+   create <structname>CustomPath</structname> path(s) for the join relation.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             JoinType jointype,
+                                             JoinPathExtraData *extra);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+
+   This hook will be invoked repeatedly for the same join relation, with
+   different combinations of inner and outer relations; it is the
+   responsibility of the hook to minimize duplicated work.
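+   One way to limit duplicated work is sketched below with hypothetical
+   stand-in types rather than the real planner structures: analysis that
+   depends only on the join relation runs on the first call for that relation,
+   while later calls do only the per-combination work. The sketch also saves
+   and calls any previously installed hook, the usual convention for
+   <productname>PostgreSQL</productname> hooks. The <literal>analyzed</literal>
+   field is a placeholder for wherever a real provider keeps per-join state, for
+   example a private lookup table keyed by the join's relids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: relations identified by a bit mask of base relids. */
typedef struct ToyRel { uint64_t relids; } ToyRel;

typedef struct ToyJoinRel
{
    uint64_t relids;
    bool     analyzed;   /* placeholder for provider-private per-join state */
    int      npaths;     /* candidate paths considered so far */
} ToyJoinRel;

typedef void (*toy_join_hook_type) (ToyJoinRel *joinrel,
                                    const ToyRel *outerrel,
                                    const ToyRel *innerrel);

/* Stands in for set_join_pathlist_hook. */
toy_join_hook_type toy_join_hook = NULL;
static toy_join_hook_type prev_join_hook = NULL;
int expensive_calls = 0;

/* Work that depends only on the join relation, not on the inner/outer split. */
static void analyze_joinrel(ToyJoinRel *joinrel)
{
    expensive_calls++;
    joinrel->analyzed = true;
}

static void my_join_hook(ToyJoinRel *joinrel,
                         const ToyRel *outerrel, const ToyRel *innerrel)
{
    /* Let any previously installed hook run first. */
    if (prev_join_hook)
        prev_join_hook(joinrel, outerrel, innerrel);

    /* Do the expensive part only once per join relation. */
    if (!joinrel->analyzed)
        analyze_joinrel(joinrel);

    /* Per-combination work: here, just count one candidate path. */
    if ((outerrel->relids | innerrel->relids) == joinrel->relids)
        joinrel->npaths++;
}

/* Typically done once, from the module's initialization function. */
void install_join_hook(void)
{
    prev_join_hook = toy_join_hook;
    toy_join_hook = my_join_hook;
}
```

+   Invoking the hook three times for the same three-way join, with different
+   outer/inner splits, runs the expensive analysis once but considers three
+   candidate paths.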
+  </para>
+
+  <sect2 id="custom-scan-path-callbacks">
+   <title>Custom Scan Path Callbacks</title>
+
+   <para>
+<programlisting>
+Plan *(*PlanCustomPath) (PlannerInfo *root,
+                         RelOptInfo *rel,
+                         CustomPath *best_path,
+                         List *tlist,
+                         List *clauses,
+                         List *custom_plans);
+</programlisting>
+    Convert a custom path to a finished plan. The return value will generally
+    be a <structname>CustomScan</structname> object, which the callback must allocate and
+    initialize. See <xref linkend="custom-scan-plan"/> for more details.
+   </para>
+
+   <para>
+<programlisting>
+List *(*ReparameterizeCustomPathByChild) (PlannerInfo *root,
+                                          List *custom_private,
+                                          RelOptInfo *child_rel);
+</programlisting>
+    This callback is called while converting a path parameterized by the
+    top-most parent of the given child relation <literal>child_rel</literal>
+    to be parameterized by the child relation. The callback is used to
+    reparameterize any paths or translate any expression nodes saved in the
+    given <literal>custom_private</literal> member of a
+    <structname>CustomPath</structname>. The callback may use
+    <literal>reparameterize_path_by_child</literal>,
+    <literal>adjust_appendrel_attrs</literal> or
+    <literal>adjust_appendrel_attrs_multilevel</literal> as required.
+   </para>
+  </sect2>
+ </sect1>
+
+ <sect1 id="custom-scan-plan">
+  <title>Creating Custom Scan Plans</title>
+
+  <para>
+   A custom scan is represented in a finished plan tree using the following
+   structure:
+<programlisting>
+typedef struct CustomScan
+{
+    Scan      scan;
+    uint32    flags;
+    List     *custom_plans;
+    List     *custom_exprs;
+    List     *custom_private;
+    List     *custom_scan_tlist;
+    Bitmapset *custom_relids;
+    const CustomScanMethods *methods;
+} CustomScan;
+</programlisting>
+  </para>
+
+  <para>
+   <structfield>scan</structfield> must be initialized as for any other scan, including
+   estimated costs, target lists, qualifications, and so on.
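+   An outline of a <function>PlanCustomPath</function> callback might look like
+   the following hypothetical sketch, which uses simplified stand-in structures
+   rather than the real <structname>CustomPath</structname> and
+   <structname>CustomScan</structname>. It allocates a zeroed scan node, copies
+   <structfield>flags</structfield> and the private data from the path, points
+   <structfield>methods</structfield> at a statically allocated table, and sets the
+   scan relation index following the <structfield>scan.scanrelid</structfield> rule
+   given in this section. A real callback would allocate with
+   <function>makeNode</function> and also fill in the target list,
+   qualifications, and costs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real planner structures. */
typedef struct ToyScanMethods { const char *name; } ToyScanMethods;

typedef struct ToyCustomPath
{
    unsigned int relid;          /* range table index (PostgreSQL's Index type) */
    int          is_join;        /* nonzero if this path replaces a join */
    uint32_t     flags;
    void        *custom_private; /* in PostgreSQL, a List of nodes */
} ToyCustomPath;

typedef struct ToyCustomScan
{
    unsigned int scanrelid;      /* stands in for scan.scanrelid */
    uint32_t     flags;
    void        *custom_private;
    const ToyScanMethods *methods;
} ToyCustomScan;

/* Usually a statically allocated method table. */
static const ToyScanMethods my_scan_methods = { "MyCustomScan" };

/* Sketch of PlanCustomPath: convert the chosen path into a scan node. */
ToyCustomScan *toy_plan_custom_path(const ToyCustomPath *best_path)
{
    /* calloc stands in for makeNode(), which also returns a zeroed node */
    ToyCustomScan *cscan = calloc(1, sizeof(ToyCustomScan));

    if (cscan == NULL)
        return NULL;
    cscan->flags = best_path->flags;              /* same meaning as in the path */
    cscan->custom_private = best_path->custom_private;
    cscan->methods = &my_scan_methods;
    /* scanrelid: the relation's range table index, or 0 when replacing a join */
    cscan->scanrelid = best_path->is_join ? 0 : best_path->relid;
    return cscan;
}

/* Exposes the static method table so callers can compare against it. */
const ToyScanMethods *toy_my_methods(void)
{
    return &my_scan_methods;
}
```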
+   <structfield>flags</structfield> is a bit mask with the same meaning as in
+   <structname>CustomPath</structname>.
+   <structfield>custom_plans</structfield> can be used to store child
+   <structname>Plan</structname> nodes.
+   <structfield>custom_exprs</structfield> should be used to
+   store expression trees that will need to be fixed up by
+   <filename>setrefs.c</filename> and <filename>subselect.c</filename>, while
+   <structfield>custom_private</structfield> should be used to store other private data
+   that is only used by the custom scan provider itself.
+   <structfield>custom_scan_tlist</structfield> can be NIL when scanning a base
+   relation, indicating that the custom scan returns scan tuples that match
+   the base relation's row type. Otherwise it is a target list describing
+   the actual scan tuples. <structfield>custom_scan_tlist</structfield> must be
+   provided for joins, and may also be provided for scans if the custom scan
+   provider can compute some non-Var expressions.
+   <structfield>custom_relids</structfield> is set by the core code to the set of
+   relations (range table indexes) that this scan node handles; except when
+   this scan is replacing a join, it will have only one member.
+   <structfield>methods</structfield> must point to a (usually statically allocated)
+   object implementing the required custom scan methods, which are further
+   detailed below.
+  </para>
+
+  <para>
+   When a <structname>CustomScan</structname> scans a single relation,
+   <structfield>scan.scanrelid</structfield> must be the range table index of the table
+   to be scanned. When it replaces a join, <structfield>scan.scanrelid</structfield>
+   should be zero.
+  </para>
+
+  <para>
+   Plan trees must be copyable with <function>copyObject</function>,
+   so all the data stored within the <quote>custom</quote> fields must consist of
+   nodes that <function>copyObject</function> can handle.
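+   Why this matters can be illustrated with a hypothetical copy routine that,
+   like a node-copy function, knows only the declared size of the base
+   structure: any extra fields a provider appended in a larger wrapper are not
+   copied, while data kept in the base structure's own fields survives. The
+   types and <literal>toy_copy_node</literal> are illustrative stand-ins, not
+   <function>copyObject</function> itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a plan node whose data lives in its own fields. */
typedef struct ToyScanNode
{
    int tag;
    int scanrelid;
    int custom_private_value;   /* stands in for data kept in custom_private */
} ToyScanNode;

/* A provider's (unsupported) attempt to extend the plan node. */
typedef struct MyExtendedScan
{
    ToyScanNode base;           /* base structure first */
    int         extra_state;    /* not carried over when the node is copied */
} MyExtendedScan;

/* Like a node-copy function, this only knows sizeof(ToyScanNode). */
ToyScanNode *toy_copy_node(const ToyScanNode *src)
{
    ToyScanNode *dst = malloc(sizeof(ToyScanNode));

    if (dst != NULL)
        memcpy(dst, src, sizeof(ToyScanNode));
    return dst;
}

/* Number of bytes of an extended node that such a copy leaves behind. */
size_t bytes_lost_on_copy(void)
{
    return sizeof(MyExtendedScan) - sizeof(ToyScanNode);
}
```

+   The copy is a plain <literal>ToyScanNode</literal>; the
+   <literal>extra_state</literal> value exists only in the original.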
+   Furthermore, custom scan providers
+   cannot substitute a larger structure that embeds
+   a <structname>CustomScan</structname> for the structure itself, as would be possible
+   for a <structname>CustomPath</structname> or <structname>CustomScanState</structname>.
+  </para>
+
+  <sect2 id="custom-scan-plan-callbacks">
+   <title>Custom Scan Plan Callbacks</title>
+   <para>
+<programlisting>
+Node *(*CreateCustomScanState) (CustomScan *cscan);
+</programlisting>
+    Allocate a <structname>CustomScanState</structname> for this
+    <structname>CustomScan</structname>. The actual allocation will often be larger than
+    required for an ordinary <structname>CustomScanState</structname>, because many
+    providers will wish to embed it as the first field of a larger structure.
+    The value returned must have the node tag and <structfield>methods</structfield>
+    set appropriately, but other fields should be left as zeroes at this
+    stage; after <function>ExecInitCustomScan</function> performs basic initialization,
+    the <function>BeginCustomScan</function> callback will be invoked to give the
+    custom scan provider a chance to do whatever else is needed.
+   </para>
+  </sect2>
+ </sect1>
+
+ <sect1 id="custom-scan-execution">
+  <title>Executing Custom Scans</title>
+
+  <para>
+   When a <structname>CustomScan</structname> is executed, its execution state is
+   represented by a <structname>CustomScanState</structname>, which is declared as
+   follows:
+<programlisting>
+typedef struct CustomScanState
+{
+    ScanState ss;
+    uint32    flags;
+    const CustomExecMethods *methods;
+} CustomScanState;
+</programlisting>
+  </para>
+
+  <para>
+   <structfield>ss</structfield> is initialized as for any other scan state,
+   except that if the scan is for a join rather than a base relation,
+   <literal>ss.ss_currentRelation</literal> is left <literal>NULL</literal>.
+   <structfield>flags</structfield> is a bit mask with the same meaning as in
+   <structname>CustomPath</structname> and <structname>CustomScan</structname>.
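+   The embedding pattern used with <function>CreateCustomScanState</function>
+   can be sketched as follows, with hypothetical simplified types
+   (<literal>ToyCustomScanState</literal>, <literal>MyScanState</literal>) and a
+   made-up tag value standing in for those of <productname>PostgreSQL</productname>.
+   The provider allocates its larger structure zero-filled, sets only the tag and
+   method pointer, and returns a pointer to the embedded base structure; because
+   the base is the first member, later callbacks can cast that pointer back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the tag value is made up. */
#define TOY_T_CUSTOMSCANSTATE 42

typedef struct ToyExecMethods { const char *name; } ToyExecMethods;

typedef struct ToyCustomScanState
{
    int       type;                   /* node tag */
    uint32_t  flags;
    const ToyExecMethods *methods;
} ToyCustomScanState;

/* Provider-specific state embedding the base struct as its FIRST member. */
typedef struct MyScanState
{
    ToyCustomScanState css;
    long       rows_returned;         /* private; initialized later, in Begin */
    void      *private_buffer;
} MyScanState;

static const ToyExecMethods my_exec_methods = { "MyCustomScan" };

/* Sketch of CreateCustomScanState: zeroed allocation, tag and methods only. */
ToyCustomScanState *toy_create_scan_state(void)
{
    MyScanState *state = calloc(1, sizeof(MyScanState));

    if (state == NULL)
        return NULL;
    state->css.type = TOY_T_CUSTOMSCANSTATE;
    state->css.methods = &my_exec_methods;
    return &state->css;               /* the caller sees only the base struct */
}

/* Later callbacks recover the full structure from the base pointer. */
MyScanState *toy_my_state(ToyCustomScanState *node)
{
    return (MyScanState *) node;
}
```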
+   <structfield>methods</structfield> must point to a (usually statically allocated)
+   object implementing the required custom scan state methods, which are
+   further detailed below. Typically, a <structname>CustomScanState</structname>, which
+   need not support <function>copyObject</function>, will actually be a larger
+   structure embedding the above as its first member.
+  </para>
+
+  <sect2 id="custom-scan-execution-callbacks">
+   <title>Custom Scan Execution Callbacks</title>
+
+   <para>
+<programlisting>
+void (*BeginCustomScan) (CustomScanState *node,
+                         EState *estate,
+                         int eflags);
+</programlisting>
+    Complete initialization of the supplied <structname>CustomScanState</structname>.
+    Standard fields have been initialized by <function>ExecInitCustomScan</function>,
+    but any private fields should be initialized here.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
+</programlisting>
+    Fetch the next scan tuple. If any tuples remain, the callback should fill
+    <literal>ps_ResultTupleSlot</literal> with the next tuple in the current scan
+    direction, and then return the tuple slot. If not,
+    <literal>NULL</literal> or an empty slot should be returned.
+   </para>
+
+   <para>
+<programlisting>
+void (*EndCustomScan) (CustomScanState *node);
+</programlisting>
+    Clean up any private data associated with the <structname>CustomScanState</structname>.
+    This method is required, but it does not need to do anything if there is
+    no associated data or if that data will be cleaned up automatically.
+   </para>
+
+   <para>
+<programlisting>
+void (*ReScanCustomScan) (CustomScanState *node);
+</programlisting>
+    Rewind the current scan to the beginning and prepare to rescan the
+    relation.
+   </para>
+
+   <para>
+<programlisting>
+void (*MarkPosCustomScan) (CustomScanState *node);
+</programlisting>
+    Save the current scan position so that it can subsequently be restored
+    by the <function>RestrPosCustomScan</function> callback. This callback is
This callback is + optional, and need only be supplied if the + <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</literal> flag is set. + </para> + + <para> +<programlisting> +void (*RestrPosCustomScan) (CustomScanState *node); +</programlisting> + Restore the previous scan position as saved by the + <function>MarkPosCustomScan</function> callback. This callback is optional, + and need only be supplied if the + <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</literal> flag is set. + </para> + + <para> +<programlisting> +Size (*EstimateDSMCustomScan) (CustomScanState *node, + ParallelContext *pcxt); +</programlisting> + Estimate the amount of dynamic shared memory that will be required + for parallel operation. This may be higher than the amount that will + actually be used, but it must not be lower. The return value is in bytes. + This callback is optional, and need only be supplied if this custom + scan provider supports parallel execution. + </para> + + <para> +<programlisting> +void (*InitializeDSMCustomScan) (CustomScanState *node, + ParallelContext *pcxt, + void *coordinate); +</programlisting> + Initialize the dynamic shared memory that will be required for parallel + operation. <literal>coordinate</literal> points to a shared memory area of + size equal to the return value of <function>EstimateDSMCustomScan</function>. + This callback is optional, and need only be supplied if this custom + scan provider supports parallel execution. + </para> + + <para> +<programlisting> +void (*ReInitializeDSMCustomScan) (CustomScanState *node, + ParallelContext *pcxt, + void *coordinate); +</programlisting> + Re-initialize the dynamic shared memory required for parallel operation + when the custom-scan plan node is about to be re-scanned. + This callback is optional, and need only be supplied if this custom + scan provider supports parallel execution. 
+    Recommended practice is that this callback reset only shared state,
+    while the <function>ReScanCustomScan</function> callback resets only local
+    state. Currently, this callback will be called
+    before <function>ReScanCustomScan</function>, but it's best not to rely on
+    that ordering.
+   </para>
+
+   <para>
+<programlisting>
+void (*InitializeWorkerCustomScan) (CustomScanState *node,
+                                    shm_toc *toc,
+                                    void *coordinate);
+</programlisting>
+    Initialize a parallel worker's local state based on the shared state
+    set up by the leader during <function>InitializeDSMCustomScan</function>.
+    This callback is optional, and need only be supplied if this custom
+    scan provider supports parallel execution.
+   </para>
+
+   <para>
+<programlisting>
+void (*ShutdownCustomScan) (CustomScanState *node);
+</programlisting>
+    Release resources when it is anticipated that the node will not be executed
+    to completion. This is not called in all cases; sometimes,
+    <function>EndCustomScan</function> may be called without this function having
+    been called first. Since the DSM segment used by parallel query is
+    destroyed just after this callback is invoked, custom scan providers that
+    wish to take some action before the DSM segment goes away should implement
+    this method.
+   </para>
+
+   <para>
+<programlisting>
+void (*ExplainCustomScan) (CustomScanState *node,
+                           List *ancestors,
+                           ExplainState *es);
+</programlisting>
+    Output additional information for <command>EXPLAIN</command> of a custom-scan
+    plan node. This callback is optional. Common data stored in the
+    <structname>ScanState</structname>, such as the target list and scan relation, will
+    be shown even without this callback, but the callback allows the display
+    of additional, private state.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>