diff options
Diffstat (limited to 'doc/src/sgml/tablesample-method.sgml')
-rw-r--r-- | doc/src/sgml/tablesample-method.sgml | 300 |
1 files changed, 300 insertions, 0 deletions
diff --git a/doc/src/sgml/tablesample-method.sgml b/doc/src/sgml/tablesample-method.sgml new file mode 100644 index 0000000..c821941 --- /dev/null +++ b/doc/src/sgml/tablesample-method.sgml @@ -0,0 +1,300 @@ +<!-- doc/src/sgml/tablesample-method.sgml --> + +<chapter id="tablesample-method"> + <title>Writing a Table Sampling Method</title> + + <indexterm zone="tablesample-method"> + <primary>table sampling method</primary> + </indexterm> + + <indexterm zone="tablesample-method"> + <primary><literal>TABLESAMPLE</literal> method</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname>'s implementation of the <literal>TABLESAMPLE</literal> + clause supports custom table sampling methods, in addition to + the <literal>BERNOULLI</literal> and <literal>SYSTEM</literal> methods that are required + by the SQL standard. The sampling method determines which rows of the + table will be selected when the <literal>TABLESAMPLE</literal> clause is used. + </para> + + <para> + At the SQL level, a table sampling method is represented by a single SQL + function, typically implemented in C, having the signature +<programlisting> +method_name(internal) RETURNS tsm_handler +</programlisting> + The name of the function is the same method name appearing in the + <literal>TABLESAMPLE</literal> clause. The <type>internal</type> argument is a dummy + (always having value zero) that simply serves to prevent this function from + being called directly from an SQL command. + The result of the function must be a palloc'd struct of + type <type>TsmRoutine</type>, which contains pointers to support functions for + the sampling method. These support functions are plain C functions and + are not visible or callable at the SQL level. The support functions are + described in <xref linkend="tablesample-support-functions"/>. + </para> + + <para> + In addition to function pointers, the <type>TsmRoutine</type> struct must + provide these additional fields: + </para> + + <variablelist> + <varlistentry> + <term><literal>List *parameterTypes</literal></term> + <listitem> + <para> + This is an OID list containing the data type OIDs of the parameter(s) + that will be accepted by the <literal>TABLESAMPLE</literal> clause when this + sampling method is used. For example, for the built-in methods, this + list contains a single item with value <literal>FLOAT4OID</literal>, which + represents the sampling percentage. Custom sampling methods can have + more or different parameters. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>bool repeatable_across_queries</literal></term> + <listitem> + <para> + If <literal>true</literal>, the sampling method can deliver identical samples + across successive queries, if the same parameters + and <literal>REPEATABLE</literal> seed value are supplied each time and the + table contents have not changed. When this is <literal>false</literal>, + the <literal>REPEATABLE</literal> clause is not accepted for use with the + sampling method. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>bool repeatable_across_scans</literal></term> + <listitem> + <para> + If <literal>true</literal>, the sampling method can deliver identical samples + across successive scans in the same query (assuming unchanging + parameters, seed value, and snapshot). + When this is <literal>false</literal>, the planner will not select plans that + would require scanning the sampled table more than once, since that + might result in inconsistent query output. + </para> + </listitem> + </varlistentry> + </variablelist> + + <para> + The <type>TsmRoutine</type> struct type is declared + in <filename>src/include/access/tsmapi.h</filename>, which see for additional + details. + </para> + + <para> + The table sampling methods included in the standard distribution are good + references when trying to write your own. Look into + the <filename>src/backend/access/tablesample</filename> subdirectory of the source + tree for the built-in sampling methods, and into the <filename>contrib</filename> + subdirectory for add-on methods. + </para> + + <sect1 id="tablesample-support-functions"> + <title>Sampling Method Support Functions</title> + + <para> + The TSM handler function returns a palloc'd <type>TsmRoutine</type> struct + containing pointers to the support functions described below. Most of + the functions are required, but some are optional, and those pointers can + be NULL. + </para> + + <para> +<programlisting> +void +SampleScanGetSampleSize (PlannerInfo *root, + RelOptInfo *baserel, + List *paramexprs, + BlockNumber *pages, + double *tuples); +</programlisting> + + This function is called during planning. It must estimate the number of + relation pages that will be read during a sample scan, and the number of + tuples that will be selected by the scan. (For example, these might be + determined by estimating the sampling fraction, and then multiplying + the <literal>baserel->pages</literal> and <literal>baserel->tuples</literal> + numbers by that, being sure to round the results to integral values.) + The <literal>paramexprs</literal> list holds the expression(s) that are + parameters to the <literal>TABLESAMPLE</literal> clause. It is recommended to + use <function>estimate_expression_value()</function> to try to reduce these + expressions to constants, if their values are needed for estimation + purposes; but the function must provide size estimates even if they cannot + be reduced, and it should not fail even if the values appear invalid + (remember that they're only estimates of what the run-time values will be). + The <literal>pages</literal> and <literal>tuples</literal> parameters are outputs. + </para> + + <para> +<programlisting> +void +InitSampleScan (SampleScanState *node, + int eflags); +</programlisting> + + Initialize for execution of a SampleScan plan node. + This is called during executor startup. + It should perform any initialization needed before processing can start. + The <structname>SampleScanState</structname> node has already been created, but + its <structfield>tsm_state</structfield> field is NULL. + The <function>InitSampleScan</function> function can palloc whatever internal + state data is needed by the sampling method, and store a pointer to + it in <literal>node->tsm_state</literal>. + Information about the table to scan is accessible through other fields + of the <structname>SampleScanState</structname> node (but note that the + <literal>node->ss.ss_currentScanDesc</literal> scan descriptor is not set + up yet). + <literal>eflags</literal> contains flag bits describing the executor's + operating mode for this plan node. + </para> + + <para> + When <literal>(eflags & EXEC_FLAG_EXPLAIN_ONLY)</literal> is true, + the scan will not actually be performed, so this function should only do + the minimum required to make the node state valid for <command>EXPLAIN</command> + and <function>EndSampleScan</function>. + </para> + + <para> + This function can be omitted (set the pointer to NULL), in which case + <function>BeginSampleScan</function> must perform all initialization needed + by the sampling method. + </para> + + <para> +<programlisting> +void +BeginSampleScan (SampleScanState *node, + Datum *params, + int nparams, + uint32 seed); +</programlisting> + + Begin execution of a sampling scan. + This is called just before the first attempt to fetch a tuple, and + may be called again if the scan needs to be restarted. + Information about the table to scan is accessible through fields + of the <structname>SampleScanState</structname> node (but note that the + <literal>node->ss.ss_currentScanDesc</literal> scan descriptor is not set + up yet). + The <literal>params</literal> array, of length <literal>nparams</literal>, contains the + values of the parameters supplied in the <literal>TABLESAMPLE</literal> clause. + These will have the number and types specified in the sampling + method's <literal>parameterTypes</literal> list, and have been checked + to not be null. + <literal>seed</literal> contains a seed to use for any random numbers generated + within the sampling method; it is either a hash derived from the + <literal>REPEATABLE</literal> value if one was given, or the result + of <literal>random()</literal> if not. + </para> + + <para> + This function may adjust the fields <literal>node->use_bulkread</literal> + and <literal>node->use_pagemode</literal>. + If <literal>node->use_bulkread</literal> is <literal>true</literal>, which it is by + default, the scan will use a buffer access strategy that encourages + recycling buffers after use. It might be reasonable to set this + to <literal>false</literal> if the scan will visit only a small fraction of the + table's pages. + If <literal>node->use_pagemode</literal> is <literal>true</literal>, which it is by + default, the scan will perform visibility checking in a single pass for + all tuples on each visited page. It might be reasonable to set this + to <literal>false</literal> if the scan will select only a small fraction of the + tuples on each visited page. That will result in fewer tuple visibility + checks being performed, though each one will be more expensive because it + will require more locking. + </para> + + <para> + If the sampling method is + marked <literal>repeatable_across_scans</literal>, it must be able to + select the same set of tuples during a rescan as it did originally, that is + a fresh call of <function>BeginSampleScan</function> must lead to selecting the + same tuples as before (if the <literal>TABLESAMPLE</literal> parameters + and seed don't change). + </para> + + <para> +<programlisting> +BlockNumber +NextSampleBlock (SampleScanState *node, BlockNumber nblocks); +</programlisting> + + Returns the block number of the next page to be scanned, or + <literal>InvalidBlockNumber</literal> if no pages remain to be scanned. + </para> + + <para> + This function can be omitted (set the pointer to NULL), in which case + the core code will perform a sequential scan of the entire relation. + Such a scan can use synchronized scanning, so that the sampling method + cannot assume that the relation pages are visited in the same order on + each scan. + </para> + + <para> +<programlisting> +OffsetNumber +NextSampleTuple (SampleScanState *node, + BlockNumber blockno, + OffsetNumber maxoffset); +</programlisting> + + Returns the offset number of the next tuple to be sampled on the + specified page, or <literal>InvalidOffsetNumber</literal> if no tuples remain to + be sampled. <literal>maxoffset</literal> is the largest offset number in use + on the page. + </para> + + <note> + <para> + <function>NextSampleTuple</function> is not explicitly told which of the offset + numbers in the range <literal>1 .. maxoffset</literal> actually contain valid + tuples. This is not normally a problem since the core code ignores + requests to sample missing or invisible tuples; that should not result in + any bias in the sample. However, if necessary, the function can use + <literal>node->donetuples</literal> to examine how many of the tuples + it returned were valid and visible. + </para> + </note> + + <note> + <para> + <function>NextSampleTuple</function> must <emphasis>not</emphasis> assume + that <literal>blockno</literal> is the same page number returned by the most + recent <function>NextSampleBlock</function> call. It was returned by some + previous <function>NextSampleBlock</function> call, but the core code is allowed + to call <function>NextSampleBlock</function> in advance of actually scanning + pages, so as to support prefetching. It is OK to assume that once + sampling of a given page begins, successive <function>NextSampleTuple</function> + calls all refer to the same page until <literal>InvalidOffsetNumber</literal> is + returned. + </para> + </note> + + <para> +<programlisting> +void +EndSampleScan (SampleScanState *node); +</programlisting> + + End the scan and release resources. It is normally not important + to release palloc'd memory, but any externally-visible resources + should be cleaned up. + This function can be omitted (set the pointer to NULL) in the common + case where no such resources exist. + </para> + + </sect1> + +</chapter> |